# AGENTS.md — Owlen v0.2 Execution Plan
Focus: Ollama (local) ✅, Ollama Cloud (with API key) ✅, context/limits UI ✅, seamless web-search tool ✅

Style: Medium-sized conventional commits, each with acceptance criteria & test notes.

Definition of Done (DoD):
- Local Ollama works out-of-the-box (no key required).
- Ollama Cloud is available only when an API key is present; no more 401 loops; clear fallback to local.
- Chat header shows context used / context window and %.
- Cloud usage shows hourly / weekly token usage (tracked locally; limits configurable).
- “Internet?” → the model/tooling performs web search automatically; no “I can’t access the internet” replies.
## Release Assumptions
- Provider config has separate entries for local (“ollama”) and cloud (“ollama-cloud”), both optional.
- Cloud base URL is config-driven (default [Inference] `https://api.ollama.com`); local default is `http://localhost:11434`.
- Cloud requests include `Authorization: Bearer <API_KEY>`.
- Token counts (`prompt_eval_count`, `eval_count`) are read from provider responses if available; otherwise fall back to local estimation.
- Rate-limit quotas are unknown → user-configurable (`hourly_quota_tokens`, `weekly_quota_tokens`); we still track actual usage.
## Commit Plan (Conventional Commits)
1) feat(provider/ollama): health checks, resilient model listing, friendly errors
- Why: Local should “just work”; no brittle failures if the model isn’t pulled or the daemon is down.
- Implement (sketch below):
  - `GET /api/tags` with timeout & retry; on failure → user-friendly banner (“Ollama not reachable on localhost:11434”).
  - If chat hits “model not found” → suggest `ollama pull <model>` (do not auto-pull).
  - Pre-chat healthcheck endpoint with short timeout.
- AC: With no Ollama, the UI shows an actionable error and the app stays usable. With Ollama up, the model list renders within TTL.
- Tests: Mock 200/500/timeout; simulate missing model.
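
A minimal sketch of the pre-chat healthcheck, assuming `reqwest` + `tokio`; `ProviderHealth` and `check_ollama` are illustrative names, not existing Owlen items:

```rust
use std::time::Duration;

/// Illustrative health state feeding the banner logic.
pub enum ProviderHealth {
    Reachable,
    Unreachable(String), // rendered as "Ollama not reachable on localhost:11434"
}

/// Probe `GET /api/tags` with a short timeout and a single retry.
pub async fn check_ollama(base_url: &str) -> ProviderHealth {
    let client = match reqwest::Client::builder()
        .timeout(Duration::from_secs(2)) // short, so the UI never stalls
        .build()
    {
        Ok(c) => c,
        Err(e) => return ProviderHealth::Unreachable(e.to_string()),
    };
    let url = format!("{base_url}/api/tags");
    let mut last_err = String::new();
    for _attempt in 0..2 {
        match client.get(&url).send().await {
            Ok(resp) if resp.status().is_success() => return ProviderHealth::Reachable,
            Ok(resp) => return ProviderHealth::Unreachable(format!("HTTP {}", resp.status())),
            Err(e) => last_err = e.to_string(), // retry once on connect error/timeout
        }
    }
    ProviderHealth::Unreachable(last_err)
}
```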
2) feat(provider/ollama-cloud): separate provider with auth header & key-gating
- Why: Prevent 401s; enable cloud only when a key is present.
- Implement (sketch below):
  - New provider “ollama-cloud”.
  - Build a `reqwest::Client` with default headers incl. `Authorization: Bearer <API_KEY>`.
  - Base URL from config (default [Inference] `https://api.ollama.com`).
  - If no key: provider not registered; cloud models hidden.
- AC: With key → cloud model list & chat work; without key → no cloud provider listed.
- Tests: No key (hidden), invalid key (401 handled), valid key (happy path).
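
A sketch of the key-gated client construction; returning `None` is the signal to skip registering the provider (names are illustrative):

```rust
use reqwest::header::{HeaderMap, HeaderValue, AUTHORIZATION};

/// Build the cloud HTTP client only when a non-empty key exists.
/// Callers treat `None` as "do not register ollama-cloud".
pub fn cloud_client(api_key: Option<&str>) -> Option<reqwest::Client> {
    let key = api_key.filter(|k| !k.is_empty())?;
    let mut auth = HeaderValue::from_str(&format!("Bearer {key}")).ok()?;
    auth.set_sensitive(true); // keep the key out of debug output
    let mut headers = HeaderMap::new();
    headers.insert(AUTHORIZATION, auth);
    reqwest::Client::builder()
        .default_headers(headers) // every request carries the Bearer token
        .build()
        .ok()
}
```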
3) fix(provider/ollama-cloud): handle 401/429 gracefully, auto-degrade
- Why: Avoid a dead UX where cloud is selectable but unusable.
- Implement (sketch below):
  - Intercept 401/403 → mark provider “unauthorized”, show toast “Cloud key invalid; using local”.
  - Intercept 429 → toast “Cloud rate limit hit; retry later”; keep provider enabled.
  - Auto-fallback to the last good local provider for the active chat (don’t lose the message).
- AC: Selecting cloud with a bad key never traps the user; one click recovers to local.
- Tests: Simulated 401/429 mid-stream & at start.
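
One way to centralize the status mapping so start-of-request and mid-stream failures degrade identically; a sketch, with the toast plumbing left to the UI layer:

```rust
use reqwest::StatusCode;

/// Illustrative cloud-provider states driving the toasts above.
#[derive(Debug, PartialEq)]
pub enum CloudState {
    Healthy,
    Unauthorized, // 401/403 → "Cloud key invalid; using local", fall back
    RateLimited,  // 429 → "Cloud rate limit hit; retry later", stay enabled
}

pub fn classify(status: StatusCode) -> CloudState {
    match status {
        StatusCode::UNAUTHORIZED | StatusCode::FORBIDDEN => CloudState::Unauthorized,
        StatusCode::TOO_MANY_REQUESTS => CloudState::RateLimited,
        _ => CloudState::Healthy,
    }
}
```

On `Unauthorized`, the controller re-dispatches the pending message to the last good local provider, so the in-flight message is not lost.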
4) feat(models/registry): multi-provider registry + aggregated model list
- Why: Show local & cloud side-by-side, clearly labeled.
- Implement (sketch below):
  - Registry holds N providers (local, cloud).
  - Aggregate lists with `provider_tag` = `ollama` / `ollama-cloud`.
  - De-dupe identical names by appending the provider tag in the UI (“qwen3:8b · local”, “qwen3:8b-cloud · cloud”).
- AC: Model picker groups by provider; switching respects selection.
- Tests: Only local; only cloud; both; de-dup checks.
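
A sketch of the de-dup labeling; the list shapes and the tag-to-origin mapping are illustrative:

```rust
use std::collections::HashMap;

/// Aggregate per-provider model lists; duplicate names get a provider
/// suffix for the picker (e.g. "qwen3:8b · local").
pub fn aggregate(lists: &[(&str, Vec<String>)]) -> Vec<String> {
    // Count how often each bare model name occurs across providers.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for (_, models) in lists {
        for m in models {
            *counts.entry(m.as_str()).or_default() += 1;
        }
    }
    let mut out = Vec::new();
    for (tag, models) in lists {
        let origin = if *tag == "ollama" { "local" } else { "cloud" };
        for m in models {
            out.push(if counts[m.as_str()] > 1 {
                format!("{m} · {origin}") // only duplicates get the suffix
            } else {
                m.clone()
            });
        }
    }
    out
}
```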
5) refactor(session): message pipeline to support tool-calls & partial updates
- Why: Prepare for the search tool loop and robust streaming.
- Implement (sketch below):
  - Centralize the stream parser → yields either `assistant_text` chunks or `tool_call` messages.
  - Buffer + flush on `done` or on an explicit tool boundary.
- AC: No regressions in normal chat; tool-call boundary events are visible to the controller.
- Tests: Stream with mixed content, malformed frames, newline-split JSON.
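
A sketch of the event type the centralized parser could yield; names are illustrative, and `serde_json::Value` stands in for typed tool arguments:

```rust
use serde_json::Value;

/// What the centralized stream parser yields to the session controller.
pub enum StreamEvent {
    /// Partial assistant text, buffered and flushed to the UI.
    AssistantText(String),
    /// An explicit tool boundary: pause rendering, hand off to the tool loop.
    ToolCall { name: String, arguments: Value },
    /// `done: true` frame with final metrics, when the provider reports them.
    Done { prompt_eval_count: Option<u64>, eval_count: Option<u64> },
}
```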
6) fix(streaming): tolerant JSON lines parser & end-of-stream semantics
- Why: Avoid UI stalls on minor protocol hiccups.
- Implement (sketch below):
  - Accept `\n` / `\r\n` delimiters; ignore empty lines; robust `done: true` handling; final metrics extraction.
- AC: Long completions never hang; final token counts are captured.
- Tests: Fuzz malformed frames.
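
A sketch of the tolerant frame parser; callers skip `Ok(None)` lines and may choose to skip `Err` frames too rather than abort the whole stream:

```rust
use serde_json::Value;

/// Parse one NDJSON frame. Tolerates `\r\n` delimiters and blank
/// keep-alive lines by returning `Ok(None)` instead of an error.
pub fn parse_frame(line: &str) -> Result<Option<Value>, serde_json::Error> {
    let line = line.trim_end_matches(&['\r', '\n']).trim();
    if line.is_empty() {
        return Ok(None);
    }
    serde_json::from_str::<Value>(line).map(Some)
}

/// Robust `done: true` detection; a missing or non-bool `done` means "keep going".
pub fn is_done(frame: &Value) -> bool {
    frame.get("done").and_then(Value::as_bool).unwrap_or(false)
}
```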
7) feat(ui/header): context usage indicator (value + %), colorized thresholds
- Why: Transparency: “2560 / 8192 (31%)”.
- Implement (sketch below):
  - Track the last `prompt_eval_count` (or an estimate) as “context used”.
  - Context window: from model metadata (if available), else a config default per provider/model family.
  - Header shows: `Model · Context 2.6k / 8k (33%)`. Colors: <60% normal, 60–85% warn, >85% danger.
- AC: Live updates after each assistant turn.
- Tests: Various windows & counts; edge 0%/100%.
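
The threshold math is small enough to show whole; a sketch using the color bands from the bullet above:

```rust
/// Color bands for the header: <60% normal, 60–85% warn, >85% danger.
pub enum Severity {
    Normal,
    Warn,
    Danger,
}

/// Integer percentage plus severity, e.g. (2560, 8192) → (31, Normal).
pub fn context_usage(used: u64, window: u64) -> (u8, Severity) {
    let pct: u8 = if window == 0 { 0 } else { (used * 100 / window).min(100) as u8 };
    let severity = match pct {
        0..=59 => Severity::Normal,
        60..=85 => Severity::Warn,
        _ => Severity::Danger,
    };
    (pct, severity)
}
```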
8) feat(usage): token usage tracker (hourly/weekly), local store
- Why: Cloud limits visibility even without an official API.
- Implement (sketch below):
  - Append per-provider token use after each response: prompt + completion.
  - Rolling sums: last 60m, last 7d.
  - Persist to JSON/SQLite in the app data dir; prune daily.
  - Configurable quotas: `providers.ollama_cloud.hourly_quota_tokens`, `weekly_quota_tokens`.
- AC: `:limits` shows totals & percentages; survives restart.
- Tests: Time-travel tests; persistence.
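
A sketch of the rolling-sum core; passing `now` explicitly is what makes the time-travel tests trivial (names are illustrative):

```rust
use std::time::{Duration, SystemTime};

/// One record appended per response: prompt + completion tokens.
pub struct UsageEvent {
    pub at: SystemTime,
    pub tokens: u64,
}

/// Sum tokens inside the window ending at `now` (60m or 7d).
pub fn rolling_sum(events: &[UsageEvent], window: Duration, now: SystemTime) -> u64 {
    events
        .iter()
        .filter(|e| matches!(now.duration_since(e.at), Ok(age) if age <= window))
        .map(|e| e.tokens)
        .sum()
}
```

Hourly and weekly views call this with `Duration::from_secs(3600)` and `Duration::from_secs(7 * 24 * 3600)`; daily pruning drops events older than the longest window.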
9) feat(ui/usage): surface hourly/weekly usage + threshold toasts
- Why: Prevent surprises; act before you hit the wall.
- Implement (sketch below):
  - Header second line (when cloud is active): `Cloud usage · 12k/50k (hour) · 90k/250k (week)` with % colors.
  - Toast at the 80% & 95% thresholds.
- AC: Visuals match the tracker; toasts fire once per threshold band.
- Tests: Threshold crossings; reset behavior.
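
A sketch of the fire-once-per-band rule for the 80%/95% toasts:

```rust
/// Toast bands; `Ord` lets "entered a higher band" be a simple comparison.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Band {
    Below,
    Warn80,
    Crit95,
}

pub fn band(pct: u8) -> Band {
    match pct {
        0..=79 => Band::Below,
        80..=94 => Band::Warn80,
        _ => Band::Crit95,
    }
}

/// Fire a toast only when crossing upward into a new band; dropping back
/// resets the state so the next crossing fires again.
pub fn should_toast(last: &mut Band, pct: u8) -> bool {
    let current = band(pct);
    let fire = current > *last;
    *last = current;
    fire
}
```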
10) feat(tools): generic tool-calling loop (function-call adapter)
- Why: “Use the web” without whining.
- Implement (sketch below):
  - Internal tool schema: `{ name, parameters (JSON Schema), invoke(args) -> ToolResult }`.
  - Provider adapter: if the model emits a tool call → pause the stream, invoke the tool, append a `tool` message, resume.
  - Guardrails: max tool hops (e.g., 3), max tokens per hop.
- AC: The model can call tools mid-answer; the transcript shows tool-result context to the model (not necessarily to the user).
- Tests: Single & chained tool calls; loop cutoff.
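
A sketch of the internal tool schema and the hop guardrail; the trait shape and the `serde_json::Value` stand-in for `ToolResult` are assumptions:

```rust
use serde_json::Value;

/// Internal tool schema: name + JSON Schema parameters + invoke.
pub trait Tool {
    fn name(&self) -> &str;
    fn parameters(&self) -> Value; // JSON Schema shown to the model
    fn invoke(&self, args: Value) -> Result<Value, String>;
}

const MAX_TOOL_HOPS: usize = 3; // guardrail: cut chained calls off here

/// Drive the loop: while the model keeps emitting tool calls, invoke the
/// matching tool and feed the result back, up to the hop budget.
pub fn tool_loop(
    mut next_tool_call: impl FnMut() -> Option<(String, Value)>,
    tools: &[Box<dyn Tool>],
) {
    for _hop in 0..MAX_TOOL_HOPS {
        let Some((name, args)) = next_tool_call() else { break };
        if let Some(tool) = tools.iter().find(|t| t.name() == name) {
            let _result = tool.invoke(args); // appended as a `tool` message, stream resumes
        }
    }
}
```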
11) feat(tool/web.search): HTTP search wrapper for cloud mode; fallback stub
- Why: Default “internet” capability.
- Implement (sketch below):
  - Tool name: `web.search` with params `{ query: string }`.
  - Cloud path: call the provider’s search endpoint (configurable path); include the API key.
  - Fallback (no cloud): disabled; the model will not see the tool (prevents “I can’t” messaging).
- AC: Asking “what happened today in Rust X?” triggers one search call and integrates snippets into the answer.
- Tests: Happy path; 401/key missing (tool hidden); network failure (tool error surfaced to the model).
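
A hypothetical sketch of the wrapper; the endpoint path and response shape are assumptions and stay config-driven, and the client is the key-gated one from commit 2 (Bearer header included):

```rust
use serde_json::{json, Value};

/// Hypothetical web.search wrapper; not a confirmed provider API.
pub struct WebSearch {
    pub client: reqwest::Client, // already carries the Authorization header
    pub endpoint: String,        // configured search path
}

impl WebSearch {
    pub async fn invoke(&self, args: Value) -> Result<Value, String> {
        let query = args
            .get("query")
            .and_then(Value::as_str)
            .ok_or("missing `query` argument")?;
        let resp = self
            .client
            .post(&self.endpoint)
            .json(&json!({ "query": query }))
            .send()
            .await
            .map_err(|e| e.to_string())?; // network failure → tool error to the model
        resp.json::<Value>().await.map_err(|e| e.to_string())
    }
}
```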
12) feat(commands): `:provider`, `:model`, `:limits`, `:web on|off`
- Why: Fast control for power users.
- Implement:
  - `:provider` lists/switches the active provider.
  - `:model` lists/switches models under the active provider.
  - `:limits` shows tracked usage.
  - `:web on|off` toggles exposing `web.search` to the model.
- AC: Commands work in both TUI and CLI modes.
- Tests: Command parsing & side effects.
13) feat(config): explicit sections & env fallbacks
- Why: Clear separation + easy secrets management.
- Implement (example):

```toml
[providers.ollama]
base_url = "http://localhost:11434"
list_ttl_secs = 60
default_context_window = 8192

[providers.ollama_cloud]
enabled = true
base_url = "https://api.ollama.com"  # [Inference] default
api_key = ""                         # read from env OLLEN_OLLAMA_CLOUD_API_KEY if empty
hourly_quota_tokens = 50000          # configurable; visual only
weekly_quota_tokens = 250000         # configurable; visual only
list_ttl_secs = 60
```

- AC: Empty key → cloud disabled; env overrides work.
- Tests: Env vs. file precedence (see the sketch below).
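
A sketch of deserializing that table with serde, following the sample’s comment that the env var fills in only when the file key is empty (struct names are illustrative):

```rust
use serde::Deserialize;

/// Mirrors `[providers.ollama_cloud]` above; fields follow the sample config.
#[derive(Deserialize)]
pub struct OllamaCloudConfig {
    pub enabled: bool,
    pub base_url: String,
    #[serde(default)]
    pub api_key: String,
    pub hourly_quota_tokens: u64,
    pub weekly_quota_tokens: u64,
    pub list_ttl_secs: u64,
}

impl OllamaCloudConfig {
    /// Empty file key → try the env var; `None` keeps cloud disabled.
    pub fn resolved_key(&self) -> Option<String> {
        if !self.api_key.is_empty() {
            return Some(self.api_key.clone());
        }
        std::env::var("OLLEN_OLLAMA_CLOUD_API_KEY")
            .ok()
            .filter(|k| !k.is_empty())
    }
}
```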
14) docs(readme): setup & troubleshooting for local + cloud
- Why: Reduce support pings.
- Include:
  - Local prerequisites; healthcheck tips; “model not found” guidance.
  - Cloud key setup; common 401/429 causes; privacy note.
  - Context/limits UI explainer; tool behavior & toggles.
  - Upgrade notes from v0.1 → v0.2.
- AC: New users can configure both in <5 min.
15) test(integration): local-only, cloud-only, mixed; auth & rate-limit sims
- Why: Prevent regressions.
- Implement:
  - Wiremock stubs for cloud: `/tags`, `/chat`, the search endpoint, 401/429 cases.
  - Snapshot test for tool-call transcript roundtrip.
- AC: Green suite in CI; reproducible.
16) refactor(errors): typed provider errors + UI toasts
- Why: Replace generic “something broke”.
- Implement (sketch below): Error enum: `Unavailable`, `Unauthorized`, `RateLimited`, `Timeout`, `Protocol`. Map to toasts/banners.
- AC: Each known failure surfaces a precise message; logs redact secrets.
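
A sketch of the enum with its user-facing messages; the `Display` strings are illustrative, and anything interpolated into `Protocol` should be redacted before logging:

```rust
use std::fmt;

/// Typed provider errors; each variant maps to one precise toast/banner.
#[derive(Debug)]
pub enum ProviderError {
    Unavailable,
    Unauthorized,
    RateLimited,
    Timeout,
    Protocol(String), // redact secrets from this detail before logging
}

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unavailable => write!(f, "Provider not reachable"),
            Self::Unauthorized => write!(f, "API key invalid or missing"),
            Self::RateLimited => write!(f, "Rate limit hit; retry later"),
            Self::Timeout => write!(f, "Request timed out"),
            Self::Protocol(detail) => write!(f, "Protocol error: {detail}"),
        }
    }
}
```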
17) perf(models): cache model lists with TTL; invalidate on user action
- Why: Reduce network churn.
- AC: Re-opening the picker doesn’t re-fetch until TTL expiry or a manual refresh.
18) chore(release): bump to v0.2; changelog; package metadata
- AC: `--version` shows v0.2; the changelog includes the bullets above.
## Missing Features vs. Codex / Claude-Code (Queued for v0.3+)
- Multi-vendor providers: OpenAI & Anthropic provider crates with function-calling parity and structured outputs.
- Agentic coding ops: Safe file edits, `git` ops, shell exec with an approval queue.
- “Thinking/Plan” pane: First-class rendering of plan/reflect steps; optional auto-approve.
- Plugin system: Tool discovery & permissions; marketplace later.
- IDE integration: Minimal LSP/bridge (VS Code/JetBrains) to run Owlen as backend.
- Retrieval/RAG: Local project indexing; selective context injection with token budgeter.
## Acceptance Checklist (Release Gate)
- Local Ollama: healthcheck OK, models list OK, chat OK.
- Cloud appears only with valid key; no 401 loops; 429 shows toast.
- Header shows context value and %; updates after each turn.
- `:limits` shows hourly/weekly tallies; persists across restarts.
- Asking for current info triggers `web.search` (cloud on) and returns an answer without “I can’t access the internet”.
- Docs updated; sample config included; tests green.
## Notes & Caveats
- Base URLs & endpoints: Kept config-driven. Defaults are [Inference]. If your cloud endpoint differs, set it in `config.toml`.
- Quotas: No official token-quota API assumed; we track locally and let you configure limits for UI display.
- Tooling: If a given model doesn’t emit tool calls, `web.search` won’t be visible to it. You can force exposure via `:web on`.
## Lightly Opinionated Guidance (aka sparring stance)
- Don’t auto-pull models on errors; teach users where control lives.
- Keep cloud totally opt-in; hide it on misconfig to reduce paper cuts.
- Treat tool-calling like `sudo`: short leash, clear logs, obvious toggles.
- Make the header a cockpit, not a billboard: model · context · usage; that’s it.
If you want, I can turn this into a series of codex apply-ready prompts with one commit per run.