owlen/AGENTS.md
vikingowl 4ce4ac0b0e docs(agents): rewrite AGENTS.md with detailed v0.2 execution plan
Replaces the original overview with a comprehensive execution plan for Owlen v0.2, including:
- Provider health checks and resilient model listing for Ollama and Ollama Cloud
- Cloud key‑gating, rate‑limit handling, and usage tracking
- Multi‑provider model registry and UI aggregation
- Session pipeline refactor for tool calls and partial updates
- Robust streaming JSON parser
- UI header displaying context usage percentages
- Token usage tracker with hourly/weekly limits and toasts
- New web.search tool wrapper and related commands
- Expanded command set (`:provider`, `:model`, `:limits`, `:web`) and config sections
- Additional documentation, testing guidelines, and release notes for v0.2.
2025-10-18 04:52:07 +02:00

# AGENTS.md — Owlen v0.2 Execution Plan
**Focus:** Ollama (local) ✅, Ollama Cloud (with API key) ✅, context/limits UI ✅, seamless web-search tool ✅
**Style:** Medium-sized conventional commits, each with acceptance criteria & test notes.
**Definition of Done (DoD):**
* Local Ollama works out-of-the-box (no key required).
* Ollama Cloud is available **only when** an API key is present; no more 401 loops; clear fallback to local.
* Chat header shows **context used / context window** and **%**.
* Cloud usage shows **hourly / weekly token usage** (tracked locally; limits configurable).
* “Internet?” → the model/tooling performs web search automatically; no “I can’t access the internet” replies.
---
## Release Assumptions
* Provider config has **separate entries** for local (“ollama”) and cloud (“ollama-cloud”), both optional.
* **Cloud base URL** is config-driven (default `[Inference] https://api.ollama.com`), **local** default `http://localhost:11434`.
* Cloud requests include `Authorization: Bearer <API_KEY>`.
* Token counts (`prompt_eval_count`, `eval_count`) are read from provider responses **if available**; else fallback to local estimation.
* Rate-limit quotas unknown → **user-configurable** (`hourly_quota_tokens`, `weekly_quota_tokens`); we still track actual usage.
---
## Commit Plan (Conventional Commits)
### 1) feat(provider/ollama): health checks, resilient model listing, friendly errors
* **Why:** Local should “just work”; no brittle failures if model not pulled or daemon is down.
* **Implement:**
* `GET /api/tags` with timeout & retry; on failure → user-friendly banner (“Ollama not reachable on localhost:11434”).
* If chat hits “model not found” → suggest `ollama pull <model>` (do **not** auto-pull).
* Pre-chat healthcheck endpoint with short timeout.
* **AC:** With no Ollama, UI shows actionable error, app stays usable. With Ollama up, models list renders within TTL.
* **Tests:** Mock 200/500/timeout; simulate missing model.
### 2) feat(provider/ollama-cloud): separate provider with auth header & key-gating
* **Why:** Prevent 401s; enable cloud only when key is present.
* **Implement:**
* New provider “ollama-cloud”.
* Build `reqwest::Client` with default headers incl. `Authorization: Bearer <API_KEY>`.
* Base URL from config (default `[Inference] https://api.ollama.com`).
* If no key: provider not registered; cloud models hidden.
* **AC:** With key → cloud models list & chat work; without key → no cloud provider listed.
* **Tests:** No key (hidden), invalid key (401 handled), valid key (happy path).
### 3) fix(provider/ollama-cloud): handle 401/429 gracefully, auto-degrade
* **Why:** Avoid dead UX where cloud is selectable but unusable.
* **Implement:**
* Intercept 401/403 → mark provider “unauthorized”, show toast “Cloud key invalid; using local”.
* Intercept 429 → toast “Cloud rate limit hit; retry later”; keep provider enabled.
* Auto-fallback to the last good local provider for the active chat (don’t lose the message).
* **AC:** Selecting cloud with bad key never traps the user; one click recovers to local.
* **Tests:** Simulated 401/429 mid-stream & at start.
### 4) feat(models/registry): multi-provider registry + aggregated model list
* **Why:** Show local & cloud side-by-side, clearly labeled.
* **Implement:**
* Registry holds N providers (local, cloud).
* Aggregate lists with `provider_tag` = `ollama` / `ollama-cloud`.
* De-dupe identical names by appending provider tag in UI (“qwen3:8b · local”, “qwen3:8b-cloud · cloud”).
* **AC:** Model picker groups by provider; switching respects selection.
* **Tests:** Only local; only cloud; both; de-dup checks.
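A minimal sketch of the aggregation/de-dup rule, assuming plain model-name lists per provider (`aggregate` is a hypothetical helper, not an existing Owlen function):

```rust
use std::collections::HashSet;

/// Merge local and cloud model lists; names present in both providers get
/// the provider tag appended, matching the UI labeling above.
pub fn aggregate(local: &[&str], cloud: &[&str]) -> Vec<String> {
    let local_set: HashSet<_> = local.iter().copied().collect();
    let cloud_set: HashSet<_> = cloud.iter().copied().collect();
    let mut out = Vec::new();
    for m in local {
        if cloud_set.contains(m) {
            out.push(format!("{} · local", m)); // duplicate: disambiguate
        } else {
            out.push((*m).to_string());
        }
    }
    for m in cloud {
        if local_set.contains(m) {
            out.push(format!("{} · cloud", m));
        } else {
            out.push((*m).to_string());
        }
    }
    out
}
```

Grouping by provider in the picker stays a UI concern; this only settles display labels.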
### 5) refactor(session): message pipeline to support tool-calls & partial updates
* **Why:** Prepare for search tool loop and robust streaming.
* **Implement:**
* Centralize stream parser → yields either `assistant_text` chunks **or** `tool_call` messages.
* Buffer + flush on `done` or on explicit tool boundary.
* **AC:** No regressions in normal chat; tool call boundary events visible to controller.
* **Tests:** Stream with mixed content, malformed frames, newline-split JSON.
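One illustrative shape for the centralized parser's output and the buffer/flush rule; `StreamEvent` and `ChunkBuffer` are hypothetical names, not existing Owlen types:

```rust
/// Events the stream parser yields to the controller.
#[derive(Debug, PartialEq)]
pub enum StreamEvent {
    AssistantText(String),
    ToolCall { name: String, arguments: String },
    Done,
}

/// Buffers assistant text and flushes it on `done` or at a tool boundary.
#[derive(Default)]
pub struct ChunkBuffer {
    pending: String,
}

impl ChunkBuffer {
    /// Feed one event; returns accumulated text when a boundary is reached.
    pub fn push(&mut self, event: &StreamEvent) -> Option<String> {
        match event {
            StreamEvent::AssistantText(t) => {
                self.pending.push_str(t);
                None
            }
            // Tool boundary or end of stream: flush whatever has accumulated.
            StreamEvent::ToolCall { .. } | StreamEvent::Done => {
                if self.pending.is_empty() {
                    None
                } else {
                    Some(std::mem::take(&mut self.pending))
                }
            }
        }
    }
}
```

The controller sees explicit boundary events, which is what the tool loop in commit 10 needs.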
### 6) fix(streaming): tolerant JSON lines parser & end-of-stream semantics
* **Why:** Avoid UI stalls on minor protocol hiccups.
* **Implement:**
* Accept `\n`/`\r\n` delimiters; ignore empty lines; robust `done: true` handling; final metrics extraction.
* **AC:** Long completions never hang; final token counts captured.
* **Tests:** Fuzz malformed frames.
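The delimiter-tolerance rules can be sketched as below; a real implementation would parse each frame with `serde_json`, so the `"done":true` check here is a deliberately naive stand-in:

```rust
/// Split a raw stream buffer into candidate JSON frames, accepting `\n` and
/// `\r\n` delimiters and skipping empty lines.
pub fn split_frames(raw: &str) -> Vec<&str> {
    raw.split('\n')
        .map(|l| l.trim_end_matches('\r'))
        .filter(|l| !l.trim().is_empty())
        .collect()
}

/// Naive end-of-stream check; replace with real JSON parsing in production.
pub fn is_done_frame(frame: &str) -> bool {
    frame.replace(' ', "").contains("\"done\":true")
}
```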
### 7) feat(ui/header): context usage indicator (value + %), colorized thresholds
* **Why:** Transparency: “2560 / 8192 (31%)”.
* **Implement:**
* Track last `prompt_eval_count` (or estimate) as “context used”.
* Context window: from model metadata (if available) else config default per provider/model family.
* Header shows: `Model · Context 2.6k / 8k (33%)`. Colors: <60% normal, 60–85% warn, >85% danger.
* **AC:** Live updates after each assistant turn.
* **Tests:** Various windows & counts; edge 0%/100%.
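The header math is small enough to pin down here; a sketch with the thresholds from above (function names are illustrative):

```rust
/// Context usage as a clamped percentage (handles the 0%/100% edges).
pub fn context_pct(used: u64, window: u64) -> u8 {
    if window == 0 {
        return 0;
    }
    ((used.min(window) as f64 / window as f64) * 100.0).round() as u8
}

/// Color band per the plan: <60% normal, 60–85% warn, >85% danger.
pub fn pct_color(pct: u8) -> &'static str {
    match pct {
        0..=59 => "normal",
        60..=85 => "warn",
        _ => "danger",
    }
}

/// Compact token display for the header, e.g. 2560 -> "2.6k".
pub fn humanize(n: u64) -> String {
    if n < 1000 {
        n.to_string()
    } else {
        format!("{:.1}k", n as f64 / 1000.0)
    }
}
```

Note the displayed `33%` comes from the rounded compact values (2.6k / 8k); the raw counts (2560 / 8192) round to 31%.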
### 8) feat(usage): token usage tracker (hourly/weekly), local store
* **Why:** Cloud limits visibility even without official API.
* **Implement:**
* Append per-provider token use after each response: prompt+completion.
* Rolling sums: last 60m, last 7d.
* Persist to JSON/SQLite in app data dir; prune daily.
* Configurable quotas: `providers.ollama_cloud.hourly_quota_tokens`, `weekly_quota_tokens`.
* **AC:** `:limits` shows totals & percentages; survives restart.
* **Tests:** Time-travel tests; persistence.
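A minimal tracker sketch: timestamps are injected (seconds since epoch) so the time-travel tests are trivial, and the real store would persist the `events` vector to JSON/SQLite. Names are hypothetical:

```rust
/// In-memory rolling usage tracker.
#[derive(Default)]
pub struct UsageTracker {
    events: Vec<(u64, u64)>, // (unix_secs, tokens = prompt + completion)
}

impl UsageTracker {
    pub fn record(&mut self, now_secs: u64, tokens: u64) {
        self.events.push((now_secs, tokens));
    }

    /// Sum of tokens recorded within the last `window_secs`.
    pub fn sum_since(&self, now_secs: u64, window_secs: u64) -> u64 {
        let cutoff = now_secs.saturating_sub(window_secs);
        self.events.iter().filter(|(t, _)| *t > cutoff).map(|(_, n)| n).sum()
    }

    /// Drop events older than the longest reported window (7 days).
    pub fn prune(&mut self, now_secs: u64) {
        let cutoff = now_secs.saturating_sub(7 * 24 * 3600);
        self.events.retain(|(t, _)| *t > cutoff);
    }
}
```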
### 9) feat(ui/usage): surface hourly/weekly usage + threshold toasts
* **Why:** Prevent surprises; act before you hit the wall.
* **Implement:**
* Header second line (when cloud active): `Cloud usage · 12k/50k (hour) · 90k/250k (week)` with % colors.
* Toast at 80% & 95% thresholds.
* **AC:** Visuals match tracker; toasts fire once per threshold band.
* **Tests:** Threshold crossings; reset behavior.
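The "once per threshold band" rule can be sketched as a small gate: a toast fires only when usage crosses into a higher band, and dropping back below a threshold re-arms it (type names are illustrative):

```rust
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
pub enum Band {
    Normal,
    Warn80,
    Critical95,
}

pub fn band(pct: f64) -> Band {
    if pct >= 95.0 {
        Band::Critical95
    } else if pct >= 80.0 {
        Band::Warn80
    } else {
        Band::Normal
    }
}

#[derive(Default)]
pub struct ToastGate {
    last: Option<Band>,
}

impl ToastGate {
    /// Returns true exactly when a new, higher band is entered.
    pub fn should_toast(&mut self, pct: f64) -> bool {
        let b = band(pct);
        let fire = match self.last {
            Some(prev) => b > prev,
            None => b != Band::Normal,
        };
        self.last = Some(b);
        fire
    }
}
```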
### 10) feat(tools): generic tool-calling loop (function-call adapter)
* **Why:** “Use the web” without whining.
* **Implement:**
* Internal tool schema: `{ name, parameters(JSONSchema), invoke(args)->ToolResult }`.
* Provider adapter: if model emits a tool call → pause stream, invoke tool, append `tool` message, resume.
* Guardrails: max tool hops (e.g., 3), max tokens per hop.
* **AC:** Model can call tools mid-answer; transcript shows tool result context to model (not necessarily to user).
* **Tests:** Single & chained tool calls; loop cutoff.
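The hop-limit guardrail, sketched with closures standing in for the provider adapter (`next_step`) and the tool runtime (`invoke`); both names are assumptions:

```rust
/// What the provider adapter yields on each round-trip.
pub enum Step {
    ToolCall(String), // tool name to invoke
    Final(String),    // assistant's final text
}

/// Run one assistant turn, allowing at most `max_hops` tool invocations.
pub fn run_turn(
    mut next_step: impl FnMut(Option<&str>) -> Step,
    invoke: impl Fn(&str) -> String,
    max_hops: usize,
) -> Result<String, &'static str> {
    let mut tool_result: Option<String> = None;
    let mut hops = 0;
    loop {
        match next_step(tool_result.as_deref()) {
            Step::Final(text) => return Ok(text),
            Step::ToolCall(name) => {
                hops += 1;
                if hops > max_hops {
                    return Err("tool hop limit reached");
                }
                // Append the tool result as context for the next round-trip.
                tool_result = Some(invoke(&name));
            }
        }
    }
}
```

The per-hop token cap would sit inside `invoke`/the adapter; it's omitted here for brevity.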
### 11) feat(tool/web.search): HTTP search wrapper for cloud mode; fallback stub
* **Why:** Default “internet” capability.
* **Implement:**
* Tool name: `web.search` with params `{ query: string }`.
* Cloud path: call the provider’s search endpoint (configurable path); include API key.
* Fallback (no cloud): disabled; model will not see the tool (prevents “I can’t” messaging).
* **AC:** Asking “what happened today in Rust X?” triggers one search call and integrates snippets into the answer.
* **Tests:** Happy path, 401/key missing (tool hidden), network failure (tool error surfaced to model).
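The visibility gating is the part worth pinning down: `web.search` is advertised only when a non-empty cloud key exists and the user hasn't toggled it off. A sketch (`exposed_tools` is a hypothetical helper):

```rust
/// Tools to advertise to the model for the current request.
pub fn exposed_tools(cloud_key: Option<&str>, web_enabled: bool) -> Vec<&'static str> {
    let mut tools = Vec::new();
    // No key, or `:web off` -> the model never sees the tool, so it cannot
    // produce "I can't access the internet" refusals about it.
    if web_enabled && cloud_key.map_or(false, |k| !k.is_empty()) {
        tools.push("web.search");
    }
    tools
}
```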
### 12) feat(commands): `:provider`, `:model`, `:limits`, `:web on|off`
* **Why:** Fast control for power users.
* **Implement:**
* `:provider` lists/switches active provider.
* `:model` lists/switches models under active provider.
* `:limits` shows tracked usage.
* `:web on|off` toggles exposing `web.search` to the model.
* **AC:** Commands work in both TUI and CLI modes.
* **Tests:** Command parsing & side-effects.
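The command grammar above is small; a parsing sketch, with `Command` as a hypothetical enum rather than the real Owlen type:

```rust
#[derive(Debug, PartialEq)]
pub enum Command {
    Provider(Option<String>), // `:provider` lists; `:provider <name>` switches
    Model(Option<String>),
    Limits,
    Web(bool),
}

pub fn parse_command(input: &str) -> Option<Command> {
    let mut parts = input.trim().split_whitespace();
    match parts.next()? {
        ":provider" => Some(Command::Provider(parts.next().map(String::from))),
        ":model" => Some(Command::Model(parts.next().map(String::from))),
        ":limits" => Some(Command::Limits),
        ":web" => match parts.next()? {
            "on" => Some(Command::Web(true)),
            "off" => Some(Command::Web(false)),
            _ => None, // anything but on|off is rejected
        },
        _ => None, // not a command: treat as chat input
    }
}
```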
### 13) feat(config): explicit sections & env fallbacks
* **Why:** Clear separation + easy secrets management.
* **Implement (example):**
```toml
[providers.ollama]
base_url = "http://localhost:11434"
list_ttl_secs = 60
default_context_window = 8192
[providers.ollama_cloud]
enabled = true
base_url = "https://api.ollama.com" # [Inference] default
api_key = "" # read from env OLLEN_OLLAMA_CLOUD_API_KEY if empty
hourly_quota_tokens = 50000 # configurable; visual only
weekly_quota_tokens = 250000 # configurable; visual only
list_ttl_secs = 60
```
* **AC:** Empty key → cloud disabled; env overrides work.
* **Tests:** Env vs file precedence.
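The key-resolution rule from the sample config (file value wins when non-empty, env var is the fallback) can be sketched with an injected lookup so precedence is testable without touching the process environment:

```rust
/// Resolve the cloud API key: non-empty config value wins, otherwise fall
/// back to the env var named in the sample config. `env_lookup` is injected
/// (e.g. `|k| std::env::var(k).ok()`) to keep this testable.
pub fn resolve_api_key(
    config_value: &str,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    if !config_value.is_empty() {
        return Some(config_value.to_string());
    }
    env_lookup("OLLEN_OLLAMA_CLOUD_API_KEY").filter(|v| !v.is_empty())
}
```

`None` here means "cloud disabled", matching the key-gating in commit 2.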
### 14) docs(readme): setup & troubleshooting for local + cloud
* **Why:** Reduce support pings.
* **Include:**
* Local prerequisites; healthcheck tips; “model not found” guidance.
* Cloud key setup; common 401/429 causes; privacy note.
* Context/limits UI explainer; tool behavior & toggles.
* Upgrade notes from v0.1 → v0.2.
* **AC:** New users can configure both in <5 min.
### 15) test(integration): local-only, cloud-only, mixed; auth & rate-limit sims
* **Why:** Prevent regressions.
* **Implement:**
* Wiremock stubs for cloud: `/tags`, `/chat`, search endpoint, 401/429 cases.
* Snapshot test for tool-call transcript roundtrip.
* **AC:** Green suite in CI; reproducible.
### 16) refactor(errors): typed provider errors + UI toasts
* **Why:** Replace generic “something broke”.
* **Implement:** Error enum: `Unavailable`, `Unauthorized`, `RateLimited`, `Timeout`, `Protocol`. Map to toasts/banners.
* **AC:** Each known failure surfaces a precise message; logs redact secrets.
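The error enum named above, with an illustrative mapping to toast text (the exact wording is a sketch; only the "Cloud key invalid" and "rate limit" messages appear elsewhere in this plan):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProviderError {
    Unavailable,
    Unauthorized,
    RateLimited,
    Timeout,
    Protocol,
}

impl ProviderError {
    /// User-facing message for the toast/banner layer. Secrets never pass
    /// through here; redaction happens at the logging boundary.
    pub fn toast(&self) -> &'static str {
        match self {
            ProviderError::Unavailable => "Provider not reachable; check that it is running",
            ProviderError::Unauthorized => "Cloud key invalid; using local",
            ProviderError::RateLimited => "Cloud rate limit hit; retry later",
            ProviderError::Timeout => "Provider timed out; please retry",
            ProviderError::Protocol => "Unexpected provider response; see logs",
        }
    }
}
```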
### 17) perf(models): cache model lists with TTL; invalidate on user action
* **Why:** Reduce network churn.
* **AC:** Re-opening the picker doesn’t re-fetch until TTL expiry or manual refresh.
### 18) chore(release): bump to v0.2; changelog; package metadata
* **AC:** `--version` shows v0.2; changelog includes above bullets.
---
## Missing Features vs. Codex / Claude-Code (Queued for v0.3+)
* **Multi-vendor providers:** OpenAI & Anthropic provider crates with function-calling parity and structured outputs.
* **Agentic coding ops:** Safe file edits, `git` ops, shell exec with approval queue.
* **“Thinking/Plan” pane:** First-class rendering of plan/reflect steps; optional auto-approve.
* **Plugin system:** Tool discovery & permissions; marketplace later.
* **IDE integration:** Minimal LSP/bridge (VS Code/JetBrains) to run Owlen as backend.
* **Retrieval/RAG:** Local project indexing; selective context injection with token budgeter.
---
## Acceptance Checklist (Release Gate)
* [ ] Local Ollama: healthcheck OK, models list OK, chat OK.
* [ ] Cloud appears only with valid key; no 401 loops; 429 shows toast.
* [ ] Header shows context value and %; updates after each turn.
* [ ] `:limits` shows hourly/weekly tallies; persists across restarts.
* [ ] Asking for current info triggers `web.search` (cloud on) and returns an answer without “I can’t access the internet”.
* [ ] Docs updated; sample config included; tests green.
---
## Notes & Caveats
* **Base URLs & endpoints:** Kept **config-driven**. Defaults are `[Inference]`. If your cloud endpoint differs, set it in `config.toml`.
* **Quotas:** No official token quota API assumed; we **track locally** and let you configure limits for UI display.
* **Tooling:** If a given model doesn’t emit tool calls, `web.search` won’t be visible to it. You can force exposure via `:web on`.
---
## Lightly Opinionated Guidance (aka sparring stance)
* Don’t auto-pull models on errors; teach users where control lives.
* Keep cloud totally opt-in; hide it on misconfig to reduce paper cuts.
* Treat tool-calling like `sudo`: short leash, clear logs, obvious toggles.
* Make the header a cockpit, not a billboard: model · context · usage; that’s it.