# AGENTS.md — Owlen v0.2 Execution Plan
Focus: Ollama (local) ✅, Ollama Cloud (with API key) ✅, context/limits UI ✅, seamless web-search tool ✅

Style: Medium-sized conventional commits, each with acceptance criteria & test notes.

Definition of Done (DoD):
- Local Ollama works out-of-the-box (no key required).
- Ollama Cloud is available only when an API key is present; no more 401 loops; clear fallback to local.
- Chat header shows context used / context window and %.
- Cloud usage shows hourly / weekly token usage (tracked locally; limits configurable).
- “Internet?” → the model/tooling performs web search automatically; no “I can’t access the internet” replies.
## Release Assumptions
- Provider config has separate entries for local (“ollama”) and cloud (“ollama-cloud”), both optional.
- Cloud base URL is config-driven (default [Inference] `https://api.ollama.com`); local default is `http://localhost:11434`.
- Cloud requests include `Authorization: Bearer <API_KEY>`.
- Token counts (`prompt_eval_count`, `eval_count`) are read from provider responses if available; otherwise fall back to local estimation.
- Rate-limit quotas are unknown → user-configurable (`hourly_quota_tokens`, `weekly_quota_tokens`); we still track actual usage.
## Commit Plan (Conventional Commits)
1) feat(provider/ollama): health checks, resilient model listing, friendly errors
- Why: Local should “just work”; no brittle failures if the model isn’t pulled or the daemon is down.
- Implement (sketch below):
  - `GET /api/tags` with timeout & retry; on failure → user-friendly banner (“Ollama not reachable on localhost:11434”).
  - If chat hits “model not found” → suggest `ollama pull <model>` (do not auto-pull).
  - Pre-chat healthcheck endpoint with short timeout.
- AC: With no Ollama, the UI shows an actionable error and the app stays usable. With Ollama up, the model list renders within TTL.
- Tests: Mock 200/500/timeout; simulate missing model.
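
A minimal sketch of the pre-chat healthcheck, assuming `reqwest` + `tokio`; `ProviderHealth` and `check_ollama` are illustrative names, not existing Owlen items:

```rust
use std::time::Duration;

/// Illustrative health state feeding the banner logic.
pub enum ProviderHealth {
    Reachable,
    Unreachable(String), // rendered as "Ollama not reachable on localhost:11434"
}

/// Probe `GET /api/tags` with a short timeout and a single retry.
pub async fn check_ollama(base_url: &str) -> ProviderHealth {
    let client = match reqwest::Client::builder()
        .timeout(Duration::from_secs(2)) // short, so the UI never stalls
        .build()
    {
        Ok(c) => c,
        Err(e) => return ProviderHealth::Unreachable(e.to_string()),
    };
    let url = format!("{base_url}/api/tags");
    let mut last_err = String::new();
    for _attempt in 0..2 {
        match client.get(&url).send().await {
            Ok(resp) if resp.status().is_success() => return ProviderHealth::Reachable,
            Ok(resp) => return ProviderHealth::Unreachable(format!("HTTP {}", resp.status())),
            Err(e) => last_err = e.to_string(), // retry once on connect error/timeout
        }
    }
    ProviderHealth::Unreachable(last_err)
}
```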
2) feat(provider/ollama-cloud): separate provider with auth header & key-gating
- Why: Prevent 401s; enable cloud only when a key is present.
- Implement (sketch below):
  - New provider “ollama-cloud”.
  - Build a `reqwest::Client` with default headers incl. `Authorization: Bearer <API_KEY>`.
  - Base URL from config (default [Inference] `https://api.ollama.com`).
  - If no key: provider not registered; cloud models hidden.
- AC: With key → cloud model list & chat work; without key → no cloud provider listed.
- Tests: No key (hidden), invalid key (401 handled), valid key (happy path).
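
A sketch of the key-gated client construction; returning `None` is the signal to skip registering the provider (names are illustrative):

```rust
use reqwest::header::{HeaderMap, HeaderValue, AUTHORIZATION};

/// Build the cloud HTTP client only when a non-empty key exists.
/// Callers treat `None` as "do not register ollama-cloud".
pub fn cloud_client(api_key: Option<&str>) -> Option<reqwest::Client> {
    let key = api_key.filter(|k| !k.is_empty())?;
    let mut auth = HeaderValue::from_str(&format!("Bearer {key}")).ok()?;
    auth.set_sensitive(true); // keep the key out of debug output
    let mut headers = HeaderMap::new();
    headers.insert(AUTHORIZATION, auth);
    reqwest::Client::builder()
        .default_headers(headers) // every request carries the Bearer token
        .build()
        .ok()
}
```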
3) fix(provider/ollama-cloud): handle 401/429 gracefully, auto-degrade
- Why: Avoid a dead UX where cloud is selectable but unusable.
- Implement (sketch below):
  - Intercept 401/403 → mark provider “unauthorized”, show toast “Cloud key invalid; using local”.
  - Intercept 429 → toast “Cloud rate limit hit; retry later”; keep provider enabled.
  - Auto-fallback to the last good local provider for the active chat (don’t lose the message).
- AC: Selecting cloud with a bad key never traps the user; one click recovers to local.
- Tests: Simulated 401/429 mid-stream & at start.
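
One way to centralize the status mapping so start-of-request and mid-stream failures degrade identically; a sketch, with the toast plumbing left to the UI layer:

```rust
use reqwest::StatusCode;

/// Illustrative cloud-provider states driving the toasts above.
#[derive(Debug, PartialEq)]
pub enum CloudState {
    Healthy,
    Unauthorized, // 401/403 → "Cloud key invalid; using local", fall back
    RateLimited,  // 429 → "Cloud rate limit hit; retry later", stay enabled
}

pub fn classify(status: StatusCode) -> CloudState {
    match status {
        StatusCode::UNAUTHORIZED | StatusCode::FORBIDDEN => CloudState::Unauthorized,
        StatusCode::TOO_MANY_REQUESTS => CloudState::RateLimited,
        _ => CloudState::Healthy,
    }
}
```

On `Unauthorized`, the controller re-dispatches the pending message to the last good local provider, so the in-flight message is not lost.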
4) feat(models/registry): multi-provider registry + aggregated model list
- Why: Show local & cloud side-by-side, clearly labeled.
- Implement (sketch below):
  - Registry holds N providers (local, cloud).
  - Aggregate lists with `provider_tag` = `ollama` / `ollama-cloud`.
  - De-dupe identical names by appending the provider tag in the UI (“qwen3:8b · local”, “qwen3:8b-cloud · cloud”).
- AC: Model picker groups by provider; switching respects selection.
- Tests: Only local; only cloud; both; de-dup checks.
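
A sketch of the de-dup labeling; the list shapes and the tag-to-origin mapping are illustrative:

```rust
use std::collections::HashMap;

/// Aggregate per-provider model lists; duplicate names get a provider
/// suffix for the picker (e.g. "qwen3:8b · local").
pub fn aggregate(lists: &[(&str, Vec<String>)]) -> Vec<String> {
    // Count how often each bare model name occurs across providers.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for (_, models) in lists {
        for m in models {
            *counts.entry(m.as_str()).or_default() += 1;
        }
    }
    let mut out = Vec::new();
    for (tag, models) in lists {
        let origin = if *tag == "ollama" { "local" } else { "cloud" };
        for m in models {
            out.push(if counts[m.as_str()] > 1 {
                format!("{m} · {origin}") // only duplicates get the suffix
            } else {
                m.clone()
            });
        }
    }
    out
}
```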
5) refactor(session): message pipeline to support tool-calls & partial updates
- Why: Prepare for the search tool loop and robust streaming.
- Implement (sketch below):
  - Centralize the stream parser → yields either `assistant_text` chunks or `tool_call` messages.
  - Buffer + flush on `done` or on an explicit tool boundary.
- AC: No regressions in normal chat; tool-call boundary events are visible to the controller.
- Tests: Stream with mixed content, malformed frames, newline-split JSON.
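
A sketch of the event type the centralized parser could yield; names are illustrative, and `serde_json::Value` stands in for typed tool arguments:

```rust
use serde_json::Value;

/// What the centralized stream parser yields to the session controller.
pub enum StreamEvent {
    /// Partial assistant text, buffered and flushed to the UI.
    AssistantText(String),
    /// An explicit tool boundary: pause rendering, hand off to the tool loop.
    ToolCall { name: String, arguments: Value },
    /// `done: true` frame with final metrics, when the provider reports them.
    Done { prompt_eval_count: Option<u64>, eval_count: Option<u64> },
}
```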
6) fix(streaming): tolerant JSON lines parser & end-of-stream semantics
- Why: Avoid UI stalls on minor protocol hiccups.
- Implement (sketch below):
  - Accept `\n` / `\r\n` delimiters; ignore empty lines; robust `done: true` handling; final metrics extraction.
- AC: Long completions never hang; final token counts are captured.
- Tests: Fuzz malformed frames.
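
A sketch of the tolerant frame parser; callers skip `Ok(None)` lines and may choose to skip `Err` frames too rather than abort the whole stream:

```rust
use serde_json::Value;

/// Parse one NDJSON frame. Tolerates `\r\n` delimiters and blank
/// keep-alive lines by returning `Ok(None)` instead of an error.
pub fn parse_frame(line: &str) -> Result<Option<Value>, serde_json::Error> {
    let line = line.trim_end_matches(&['\r', '\n']).trim();
    if line.is_empty() {
        return Ok(None);
    }
    serde_json::from_str::<Value>(line).map(Some)
}

/// Robust `done: true` detection; a missing or non-bool `done` means "keep going".
pub fn is_done(frame: &Value) -> bool {
    frame.get("done").and_then(Value::as_bool).unwrap_or(false)
}
```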
7) feat(ui/header): context usage indicator (value + %), colorized thresholds
- Why: Transparency: “2560 / 8192 (31%)”.
- Implement (sketch below):
  - Track the last `prompt_eval_count` (or an estimate) as “context used”.
  - Context window: from model metadata (if available), else a config default per provider/model family.
  - Header shows: `Model · Context 2.6k / 8k (33%)`. Colors: <60% normal, 60–85% warn, >85% danger.
- AC: Live updates after each assistant turn.
- Tests: Various windows & counts; edge 0%/100%.
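
The threshold math is small enough to show whole; a sketch using the color bands from the bullet above:

```rust
/// Color bands for the header: <60% normal, 60–85% warn, >85% danger.
pub enum Severity {
    Normal,
    Warn,
    Danger,
}

/// Integer percentage plus severity, e.g. (2560, 8192) → (31, Normal).
pub fn context_usage(used: u64, window: u64) -> (u8, Severity) {
    let pct: u8 = if window == 0 { 0 } else { (used * 100 / window).min(100) as u8 };
    let severity = match pct {
        0..=59 => Severity::Normal,
        60..=85 => Severity::Warn,
        _ => Severity::Danger,
    };
    (pct, severity)
}
```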
8) feat(usage): token usage tracker (hourly/weekly), local store
- Why: Cloud limits visibility even without an official API.
- Implement (sketch below):
  - Append per-provider token use after each response: prompt + completion.
  - Rolling sums: last 60m, last 7d.
  - Persist to JSON/SQLite in the app data dir; prune daily.
  - Configurable quotas: `providers.ollama_cloud.hourly_quota_tokens`, `weekly_quota_tokens`.
- AC: `:limits` shows totals & percentages; survives restart.
- Tests: Time-travel tests; persistence.
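
A sketch of the rolling-sum core; passing `now` explicitly is what makes the time-travel tests trivial (names are illustrative):

```rust
use std::time::{Duration, SystemTime};

/// One record appended per response: prompt + completion tokens.
pub struct UsageEvent {
    pub at: SystemTime,
    pub tokens: u64,
}

/// Sum tokens inside the window ending at `now` (60m or 7d).
pub fn rolling_sum(events: &[UsageEvent], window: Duration, now: SystemTime) -> u64 {
    events
        .iter()
        .filter(|e| matches!(now.duration_since(e.at), Ok(age) if age <= window))
        .map(|e| e.tokens)
        .sum()
}
```

Hourly and weekly views call this with `Duration::from_secs(3600)` and `Duration::from_secs(7 * 24 * 3600)`; daily pruning drops events older than the longest window.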
9) feat(ui/usage): surface hourly/weekly usage + threshold toasts
- Why: Prevent surprises; act before you hit the wall.
- Implement (sketch below):
  - Header second line (when cloud is active): `Cloud usage · 12k/50k (hour) · 90k/250k (week)` with % colors.
  - Toast at the 80% & 95% thresholds.
- AC: Visuals match the tracker; toasts fire once per threshold band.
- Tests: Threshold crossings; reset behavior.
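
A sketch of the fire-once-per-band rule for the 80%/95% toasts:

```rust
/// Toast bands; `Ord` lets "entered a higher band" be a simple comparison.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Band {
    Below,
    Warn80,
    Crit95,
}

pub fn band(pct: u8) -> Band {
    match pct {
        0..=79 => Band::Below,
        80..=94 => Band::Warn80,
        _ => Band::Crit95,
    }
}

/// Fire a toast only when crossing upward into a new band; dropping back
/// resets the state so the next crossing fires again.
pub fn should_toast(last: &mut Band, pct: u8) -> bool {
    let current = band(pct);
    let fire = current > *last;
    *last = current;
    fire
}
```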
10) feat(tools): generic tool-calling loop (function-call adapter)
- Why: “Use the web” without whining.
- Implement (sketch below):
  - Internal tool schema: `{ name, parameters (JSON Schema), invoke(args) -> ToolResult }`.
  - Provider adapter: if the model emits a tool call → pause the stream, invoke the tool, append a `tool` message, resume.
  - Guardrails: max tool hops (e.g., 3), max tokens per hop.
- AC: The model can call tools mid-answer; the transcript shows tool-result context to the model (not necessarily to the user).
- Tests: Single & chained tool calls; loop cutoff.
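
A sketch of the internal tool schema and the hop guardrail; the trait shape and the `serde_json::Value` stand-in for `ToolResult` are assumptions:

```rust
use serde_json::Value;

/// Internal tool schema: name + JSON Schema parameters + invoke.
pub trait Tool {
    fn name(&self) -> &str;
    fn parameters(&self) -> Value; // JSON Schema shown to the model
    fn invoke(&self, args: Value) -> Result<Value, String>;
}

const MAX_TOOL_HOPS: usize = 3; // guardrail: cut chained calls off here

/// Drive the loop: while the model keeps emitting tool calls, invoke the
/// matching tool and feed the result back, up to the hop budget.
pub fn tool_loop(
    mut next_tool_call: impl FnMut() -> Option<(String, Value)>,
    tools: &[Box<dyn Tool>],
) {
    for _hop in 0..MAX_TOOL_HOPS {
        let Some((name, args)) = next_tool_call() else { break };
        if let Some(tool) = tools.iter().find(|t| t.name() == name) {
            let _result = tool.invoke(args); // appended as a `tool` message, stream resumes
        }
    }
}
```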
11) feat(tool/web.search): HTTP search wrapper for cloud mode; fallback stub
- Why: Default “internet” capability.
- Implement (sketch below):
  - Tool name: `web.search` with params `{ query: string }`.
  - Cloud path: call the provider’s search endpoint (configurable path); include the API key.
  - Fallback (no cloud): disabled; the model will not see the tool (prevents “I can’t” messaging).
- AC: Asking “what happened today in Rust X?” triggers one search call and integrates snippets into the answer.
- Tests: Happy path; 401/key missing (tool hidden); network failure (tool error surfaced to the model).
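
A hypothetical sketch of the wrapper; the endpoint path and response shape are assumptions and stay config-driven, and the client is the key-gated one from commit 2 (Bearer header included):

```rust
use serde_json::{json, Value};

/// Hypothetical web.search wrapper; not a confirmed provider API.
pub struct WebSearch {
    pub client: reqwest::Client, // already carries the Authorization header
    pub endpoint: String,        // configured search path
}

impl WebSearch {
    pub async fn invoke(&self, args: Value) -> Result<Value, String> {
        let query = args
            .get("query")
            .and_then(Value::as_str)
            .ok_or("missing `query` argument")?;
        let resp = self
            .client
            .post(&self.endpoint)
            .json(&json!({ "query": query }))
            .send()
            .await
            .map_err(|e| e.to_string())?; // network failure → tool error to the model
        resp.json::<Value>().await.map_err(|e| e.to_string())
    }
}
```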
12) feat(commands): `:provider`, `:model`, `:limits`, `:web on|off`
- Why: Fast control for power users.
- Implement:
  - `:provider` lists/switches the active provider.
  - `:model` lists/switches models under the active provider.
  - `:limits` shows tracked usage.
  - `:web on|off` toggles exposing `web.search` to the model.
- AC: Commands work in both TUI and CLI modes.
- Tests: Command parsing & side effects.
13) feat(config): explicit sections & env fallbacks
- Why: Clear separation + easy secrets management.
- Implement (example):

```toml
[providers.ollama]
base_url = "http://localhost:11434"
list_ttl_secs = 60
default_context_window = 8192

[providers.ollama_cloud]
enabled = true
base_url = "https://api.ollama.com"  # [Inference] default
api_key = ""                         # read from env OLLEN_OLLAMA_CLOUD_API_KEY if empty
hourly_quota_tokens = 50000          # configurable; visual only
weekly_quota_tokens = 250000         # configurable; visual only
list_ttl_secs = 60
```

- AC: Empty key → cloud disabled; env overrides work.
- Tests: Env vs. file precedence (see the sketch below).
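
A sketch of deserializing that table with serde, following the sample’s comment that the env var fills in only when the file key is empty (struct names are illustrative):

```rust
use serde::Deserialize;

/// Mirrors `[providers.ollama_cloud]` above; fields follow the sample config.
#[derive(Deserialize)]
pub struct OllamaCloudConfig {
    pub enabled: bool,
    pub base_url: String,
    #[serde(default)]
    pub api_key: String,
    pub hourly_quota_tokens: u64,
    pub weekly_quota_tokens: u64,
    pub list_ttl_secs: u64,
}

impl OllamaCloudConfig {
    /// Empty file key → try the env var; `None` keeps cloud disabled.
    pub fn resolved_key(&self) -> Option<String> {
        if !self.api_key.is_empty() {
            return Some(self.api_key.clone());
        }
        std::env::var("OLLEN_OLLAMA_CLOUD_API_KEY")
            .ok()
            .filter(|k| !k.is_empty())
    }
}
```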
14) docs(readme): setup & troubleshooting for local + cloud
- Why: Reduce support pings.
- Include:
  - Local prerequisites; healthcheck tips; “model not found” guidance.
  - Cloud key setup; common 401/429 causes; privacy note.
  - Context/limits UI explainer; tool behavior & toggles.
  - Upgrade notes from v0.1 → v0.2.
- AC: New users can configure both in <5 min.
15) test(integration): local-only, cloud-only, mixed; auth & rate-limit sims
- Why: Prevent regressions.
- Implement:
  - Wiremock stubs for cloud: `/tags`, `/chat`, the search endpoint, 401/429 cases.
  - Snapshot test for tool-call transcript roundtrip.
- AC: Green suite in CI; reproducible.
16) refactor(errors): typed provider errors + UI toasts
- Why: Replace generic “something broke”.
- Implement (sketch below): Error enum: `Unavailable`, `Unauthorized`, `RateLimited`, `Timeout`, `Protocol`. Map to toasts/banners.
- AC: Each known failure surfaces a precise message; logs redact secrets.
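
A sketch of the enum with its user-facing messages; the `Display` strings are illustrative, and anything interpolated into `Protocol` should be redacted before logging:

```rust
use std::fmt;

/// Typed provider errors; each variant maps to one precise toast/banner.
#[derive(Debug)]
pub enum ProviderError {
    Unavailable,
    Unauthorized,
    RateLimited,
    Timeout,
    Protocol(String), // redact secrets from this detail before logging
}

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unavailable => write!(f, "Provider not reachable"),
            Self::Unauthorized => write!(f, "API key invalid or missing"),
            Self::RateLimited => write!(f, "Rate limit hit; retry later"),
            Self::Timeout => write!(f, "Request timed out"),
            Self::Protocol(detail) => write!(f, "Protocol error: {detail}"),
        }
    }
}
```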
17) perf(models): cache model lists with TTL; invalidate on user action
- Why: Reduce network churn.
- AC: Re-opening the picker doesn’t re-fetch until TTL expiry or a manual refresh.
18) chore(release): bump to v0.2; changelog; package metadata
- AC: `--version` shows v0.2; the changelog includes the bullets above.
## Missing Features vs. Codex / Claude-Code (Queued for v0.3+)
- Multi-vendor providers: OpenAI & Anthropic provider crates with function-calling parity and structured outputs.
- Agentic coding ops: Safe file edits, `git` ops, shell exec with an approval queue.
- “Thinking/Plan” pane: First-class rendering of plan/reflect steps; optional auto-approve.
- Plugin system: Tool discovery & permissions; marketplace later.
- IDE integration: Minimal LSP/bridge (VS Code/JetBrains) to run Owlen as backend.
- Retrieval/RAG: Local project indexing; selective context injection with token budgeter.
## Acceptance Checklist (Release Gate)
- Local Ollama: healthcheck OK, models list OK, chat OK.
- Cloud appears only with valid key; no 401 loops; 429 shows toast.
- Header shows context value and %; updates after each turn.
- `:limits` shows hourly/weekly tallies; persists across restarts.
- Asking for current info triggers `web.search` (cloud on) and returns an answer without “I can’t access the internet”.
- Docs updated; sample config included; tests green.
## Notes & Caveats
- Base URLs & endpoints: Kept config-driven. Defaults are [Inference]. If your cloud endpoint differs, set it in `config.toml`.
- Quotas: No official token-quota API assumed; we track locally and let you configure limits for UI display.
- Tooling: If a given model doesn’t emit tool calls, `web.search` won’t be visible to it. You can force exposure via `:web on`.
## Lightly Opinionated Guidance (aka sparring stance)
- Don’t auto-pull models on errors; teach users where control lives.
- Keep cloud totally opt-in; hide it on misconfig to reduce paper cuts.
- Treat tool-calling like `sudo`: short leash, clear logs, obvious toggles.
- Make the header a cockpit, not a billboard: model · context · usage; that’s it.
If you want, I can turn this into a series of codex apply-ready prompts with one commit per run.