AGENTS.md — Owlen v0.2 Execution Plan

Focus: Ollama (local), Ollama Cloud (with API key), context/limits UI, seamless web-search tool.
Style: Medium-sized conventional commits, each with acceptance criteria & test notes.
Definition of Done (DoD):

  • Local Ollama works out-of-the-box (no key required).
  • Ollama Cloud is available only when an API key is present; no more 401 loops; clear fallback to local.
  • Chat header shows context used / context window and %.
  • Cloud usage shows hourly / weekly token usage (tracked locally; limits configurable).
  • “Internet?” → the model/tooling performs web search automatically; no “I can't access the internet” replies.

Release Assumptions

  • Provider config has separate entries for local (“ollama”) and cloud (“ollama-cloud”), both optional.
  • Cloud base URL is config-driven (default [Inference] https://api.ollama.com); local default is http://localhost:11434.
  • Cloud requests include Authorization: Bearer <API_KEY>.
  • Token counts (prompt_eval_count, eval_count) are read from provider responses if available; else fallback to local estimation.
  • Rate-limit quotas unknown → user-configurable (hourly_quota_tokens, weekly_quota_tokens); we still track actual usage.
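
When the provider response carries no prompt_eval_count/eval_count, the plan falls back to local estimation. A minimal sketch of such a fallback; the ~4-characters-per-token ratio is an assumed heuristic, not a provider guarantee:

    /// Rough token estimate, used only when the provider sends no counts.
    /// UI estimate only; ~4 chars/token is an assumption.
    fn estimate_tokens(text: &str) -> u64 {
        ((text.chars().count() as u64) + 3) / 4
    }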

Commit Plan (Conventional Commits)

1) feat(provider/ollama): health checks, resilient model listing, friendly errors

  • Why: Local should “just work”; no brittle failures if a model is not pulled or the daemon is down.

  • Implement:

    • GET /api/tags with timeout & retry; on failure → user-friendly banner (“Ollama not reachable on localhost:11434”).
    • If chat hits “model not found” → suggest ollama pull <model> (do not auto-pull).
    • Pre-chat healthcheck endpoint with short timeout.
  • AC: With no Ollama, the UI shows an actionable error and the app stays usable. With Ollama up, the models list renders within TTL.

  • Tests: Mock 200/500/timeout; simulate missing model.
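
A minimal sketch of the pre-chat health check, assuming reqwest and tokio as dependencies (the plan already names reqwest::Client); the 2-second timeout and single retry are illustrative values:

    use std::time::Duration;
    
    /// Pre-chat health check with a short timeout and one retry.
    async fn ollama_healthy(client: &reqwest::Client, base_url: &str) -> bool {
        let url = format!("{base_url}/api/tags");
        for _attempt in 0..2 {
            let resp = client
                .get(&url)
                .timeout(Duration::from_secs(2)) // assumed short timeout
                .send()
                .await;
            if matches!(resp, Ok(ref r) if r.status().is_success()) {
                return true;
            }
            // fall through: retry once, then let the caller show the
            // "Ollama not reachable on localhost:11434" banner
        }
        false
    }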

2) feat(provider/ollama-cloud): separate provider with auth header & key-gating

  • Why: Prevent 401s; enable cloud only when key is present.

  • Implement:

    • New provider “ollama-cloud”.
    • Build reqwest::Client with default headers incl. Authorization: Bearer <API_KEY>.
    • Base URL from config (default [Inference] https://api.ollama.com).
    • If no key: provider not registered; cloud models hidden.
  • AC: With key → cloud models list & chat work; without key → no cloud provider listed.

  • Tests: No key (hidden), invalid key (401 handled), valid key (happy path).
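
Key-gated client construction might look like this sketch; names are illustrative, and marking the header sensitive keeps the key out of debug logs:

    use reqwest::header::{HeaderMap, HeaderValue, AUTHORIZATION};
    
    /// Build the cloud client only when a key is present.
    /// Returning None keeps the provider unregistered.
    fn build_cloud_client(api_key: &str) -> Option<reqwest::Client> {
        if api_key.is_empty() {
            return None;
        }
        let mut headers = HeaderMap::new();
        let mut auth = HeaderValue::from_str(&format!("Bearer {api_key}")).ok()?;
        auth.set_sensitive(true); // redact the key in debug output
        headers.insert(AUTHORIZATION, auth);
        reqwest::Client::builder()
            .default_headers(headers)
            .build()
            .ok()
    }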

3) fix(provider/ollama-cloud): handle 401/429 gracefully, auto-degrade

  • Why: Avoid dead UX where cloud is selectable but unusable.

  • Implement:

    • Intercept 401/403 → mark provider “unauthorized”, show toast “Cloud key invalid; using local”.
    • Intercept 429 → toast “Cloud rate limit hit; retry later”; keep provider enabled.
    • Auto-fallback to the last good local provider for the active chat (don't lose the in-flight message).
  • AC: Selecting cloud with bad key never traps the user; one click recovers to local.

  • Tests: Simulated 401/429 mid-stream & at start.
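
A sketch of the status classification; CloudAction and its variants are illustrative names, not a final API:

    use reqwest::StatusCode;
    
    /// What the session controller should do with a cloud HTTP status.
    enum CloudAction {
        /// 401/403: mark provider unauthorized, fall back to local.
        DisableAndFallback,
        /// 429: toast "retry later" but keep the provider selectable.
        ToastRateLimited,
        /// Anything else goes through the generic error path.
        Passthrough,
    }
    
    fn classify(status: StatusCode) -> CloudAction {
        match status {
            StatusCode::UNAUTHORIZED | StatusCode::FORBIDDEN => CloudAction::DisableAndFallback,
            StatusCode::TOO_MANY_REQUESTS => CloudAction::ToastRateLimited,
            _ => CloudAction::Passthrough,
        }
    }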

4) feat(models/registry): multi-provider registry + aggregated model list

  • Why: Show local & cloud side-by-side, clearly labeled.

  • Implement:

    • Registry holds N providers (local, cloud).
    • Aggregate lists with provider_tag = ollama / ollama-cloud.
    • De-dupe identical names by appending provider tag in UI (“qwen3:8b · local”, “qwen3:8b-cloud · cloud”).
  • AC: Model picker groups by provider; switching respects selection.

  • Tests: Only local; only cloud; both; de-dup checks.
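
De-dup labeling could reduce to a sketch like this (field names illustrative):

    /// Aggregated model entry held by the registry.
    struct ModelEntry {
        name: String,
        provider_tag: &'static str, // "ollama" or "ollama-cloud"
    }
    
    /// UI label: append the provider tag so identical names stay
    /// distinct, e.g. "qwen3:8b · local".
    fn display_label(m: &ModelEntry) -> String {
        let origin = if m.provider_tag == "ollama" { "local" } else { "cloud" };
        format!("{} · {}", m.name, origin)
    }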

5) refactor(session): message pipeline to support tool-calls & partial updates

  • Why: Prepare for search tool loop and robust streaming.

  • Implement:

    • Centralize stream parser → yields either assistant_text chunks or tool_call messages.
    • Buffer + flush on done or on explicit tool boundary.
  • AC: No regressions in normal chat; tool call boundary events visible to controller.

  • Tests: Stream with mixed content, malformed frames, newline-split JSON.
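
The parser/controller boundary could be expressed as an event enum along these lines (a sketch; real tool-call payloads carry more fields):

    /// Events the centralized stream parser yields to the controller.
    enum StreamEvent {
        /// Incremental assistant text to append to the transcript.
        Text(String),
        /// A complete tool call: pause streaming, invoke, resume.
        ToolCall { name: String, arguments: serde_json::Value },
        /// End of stream, with final metrics when the provider sent them.
        Done { prompt_tokens: Option<u64>, completion_tokens: Option<u64> },
    }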

6) fix(streaming): tolerant JSON lines parser & end-of-stream semantics

  • Why: Avoid UI stalls on minor protocol hiccups.

  • Implement:

    • Accept \n/\r\n delimiters; ignore empty lines; robust done: true handling; final metrics extraction.
  • AC: Long completions never hang; final token counts captured.

  • Tests: Fuzz malformed frames.
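
A tolerant frame parser sketch, assuming serde_json; a malformed frame yields None instead of stalling the stream:

    /// Accept \n or \r\n delimiters, skip blanks, tolerate bad frames.
    fn parse_frame(line: &str) -> Option<serde_json::Value> {
        let line = line.trim(); // strips \r\n and stray whitespace
        if line.is_empty() {
            return None; // ignore keep-alive blank lines
        }
        // A malformed frame is dropped rather than hanging the UI.
        serde_json::from_str(line).ok()
    }
    
    /// `done: true` marks end-of-stream; metrics ride on that final frame.
    fn is_done(frame: &serde_json::Value) -> bool {
        frame.get("done").and_then(|d| d.as_bool()).unwrap_or(false)
    }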

7) feat(ui/header): context usage indicator (value + %), colorized thresholds

  • Why: Transparency: “2560 / 8192 (31%)”.

  • Implement:

    • Track last prompt_eval_count (or estimate) as “context used”.
    • Context window: from model metadata (if available) else config default per provider/model family.
    • Header shows: Model · Context 2.6k / 8k (33%). Colors: <60% normal, 60–85% warn, >85% danger.
  • AC: Live updates after each assistant turn.

  • Tests: Various windows & counts; edge 0%/100%.
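
Gauge arithmetic sketch using the thresholds above (names illustrative):

    enum Band { Normal, Warn, Danger }
    
    /// Percent of the context window used, plus the color band.
    fn context_gauge(used: u64, window: u64) -> (u8, Band) {
        let pct = if window == 0 { 0 } else { (used * 100 / window).min(100) as u8 };
        let band = match pct {
            0..=59 => Band::Normal,
            60..=85 => Band::Warn,
            _ => Band::Danger,
        };
        (pct, band)
    }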

8) feat(usage): token usage tracker (hourly/weekly), local store

  • Why: Cloud limits visibility even without official API.

  • Implement:

    • Append per-provider token use after each response: prompt+completion.
    • Rolling sums: last 60m, last 7d.
    • Persist to JSON/SQLite in app data dir; prune daily.
    • Configurable quotas: providers.ollama_cloud.hourly_quota_tokens, weekly_quota_tokens.
  • AC: :limits shows totals & percentages; survives restart.

  • Tests: Time-travel tests; persistence.
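
The rolling-window arithmetic might look like this sketch (persistence to JSON/SQLite omitted; names illustrative):

    use std::time::{Duration, SystemTime};
    
    /// One persisted usage record.
    struct UsageEvent {
        at: SystemTime,
        provider: String,
        tokens: u64, // prompt + completion
    }
    
    /// Sum tokens for one provider over a trailing window (60m or 7d).
    fn rolling_sum(events: &[UsageEvent], provider: &str, window: Duration) -> u64 {
        let cutoff = SystemTime::now() - window;
        events
            .iter()
            .filter(|e| e.provider == provider && e.at >= cutoff)
            .map(|e| e.tokens)
            .sum()
    }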

9) feat(ui/usage): surface hourly/weekly usage + threshold toasts

  • Why: Prevent surprises; act before you hit the wall.

  • Implement:

    • Header second line (when cloud active): Cloud usage · 12k/50k (hour) · 90k/250k (week) with % colors.
    • Toast at 80% & 95% thresholds.
  • AC: Visuals match tracker; toasts fire once per threshold band.

  • Tests: Threshold crossings; reset behavior.
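
Once-per-band firing can live in a tiny state machine; a sketch with band boundaries matching the 80%/95% thresholds:

    /// Fire each threshold toast once per band; dropping back re-arms it.
    struct ToastArm { last_band: u8 }
    
    impl ToastArm {
        fn check(&mut self, pct: u8) -> Option<&'static str> {
            let band = match pct { 0..=79 => 0, 80..=94 => 1, _ => 2 };
            let fired = match (self.last_band, band) {
                (b, 1) if b < 1 => Some("Cloud usage above 80% of quota"),
                (b, 2) if b < 2 => Some("Cloud usage above 95% of quota"),
                _ => None, // same or lower band: nothing new to say
            };
            self.last_band = band;
            fired
        }
    }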

10) feat(tools): generic tool-calling loop (function-call adapter)

  • Why: “Use the web” without whining.

  • Implement:

    • Internal tool schema: { name, parameters(JSONSchema), invoke(args)->ToolResult }.
    • Provider adapter: if model emits a tool call → pause stream, invoke tool, append tool message, resume.
    • Guardrails: max tool hops (e.g., 3), max tokens per hop.
  • AC: Model can call tools mid-answer; transcript shows tool result context to model (not necessarily to user).

  • Tests: Single & chained tool calls; loop cutoff.
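
The internal tool schema could map onto a trait like this sketch (names illustrative; the per-hop token budget is an assumed value, since the plan leaves it open):

    /// Internal tool shape; the result is appended to the transcript
    /// as a tool message before streaming resumes.
    trait Tool {
        fn name(&self) -> &str;
        /// JSON Schema for the arguments, advertised to the model.
        fn parameters(&self) -> serde_json::Value;
        fn invoke(&self, args: serde_json::Value) -> Result<serde_json::Value, String>;
    }
    
    /// Guardrails named above: bound chained calls and per-hop output.
    const MAX_TOOL_HOPS: usize = 3;
    const MAX_TOKENS_PER_HOP: usize = 2_048; // assumed budget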

11) feat(tool/web.search): HTTP search wrapper for cloud mode; fallback stub

  • Why: Default “internet” capability.

  • Implement:

    • Tool name: web.search with params { query: string }.
    • Cloud path: call the provider's search endpoint (configurable path); include API key.
    • Fallback (no cloud): disabled; model will not see the tool (prevents “I can't” messaging).
  • AC: Asking “what happened today in Rust X?” triggers one search call and integrates snippets into the answer.

  • Tests: Happy path, 401/key missing (tool hidden), network failure (tool error surfaced to model).
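
A sketch of the tool body, assuming reqwest with the json feature enabled; "/api/search" is a placeholder path, since the real endpoint is config-driven:

    /// web.search backend; the client already carries the Bearer header.
    struct WebSearch {
        client: reqwest::Client,
        endpoint: String, // configurable; "/api/search" is a placeholder
    }
    
    impl WebSearch {
        async fn search(&self, query: &str) -> Result<serde_json::Value, reqwest::Error> {
            self.client
                .post(&self.endpoint)
                .json(&serde_json::json!({ "query": query }))
                .send()
                .await?
                .error_for_status()? // 401 etc. surface as tool errors
                .json()
                .await
        }
    }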

12) feat(commands): :provider, :model, :limits, :web on|off

  • Why: Fast control for power users.

  • Implement:

    • :provider lists/switches active provider.
    • :model lists/switches models under active provider.
    • :limits shows tracked usage.
    • :web on|off toggles exposing web.search to the model.
  • AC: Commands work in both TUI and CLI modes.

  • Tests: Command parsing & side-effects.
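
Command grammar sketch (names illustrative):

    enum Command {
        Provider(Option<String>), // ":provider" lists, ":provider <name>" switches
        Model(Option<String>),
        Limits,
        Web(bool), // ":web on" / ":web off"
    }
    
    fn parse_command(input: &str) -> Option<Command> {
        let mut parts = input.strip_prefix(':')?.split_whitespace();
        match (parts.next()?, parts.next()) {
            ("provider", arg) => Some(Command::Provider(arg.map(str::to_owned))),
            ("model", arg) => Some(Command::Model(arg.map(str::to_owned))),
            ("limits", None) => Some(Command::Limits),
            ("web", Some("on")) => Some(Command::Web(true)),
            ("web", Some("off")) => Some(Command::Web(false)),
            _ => None,
        }
    }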

13) feat(config): explicit sections & env fallbacks

  • Why: Clear separation + easy secrets management.

  • Implement (example):

    [providers.ollama]
    base_url = "http://localhost:11434"
    list_ttl_secs = 60
    default_context_window = 8192
    
    [providers.ollama_cloud]
    enabled = true
    base_url = "https://api.ollama.com"        # [Inference] default
    api_key = ""                                # read from env OLLEN_OLLAMA_CLOUD_API_KEY if empty
    hourly_quota_tokens = 50000                 # configurable; visual only
    weekly_quota_tokens = 250000                # configurable; visual only
    list_ttl_secs = 60
    
  • AC: Empty key → cloud disabled; env overrides work.

  • Tests: Env vs file precedence.
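
The cloud section could deserialize into a struct like this sketch, assuming serde (with derive) and a TOML loader; the env fallback mirrors the comment in the sample above:

    use serde::Deserialize;
    
    /// Deserialized [providers.ollama_cloud] section.
    #[derive(Deserialize)]
    struct OllamaCloudConfig {
        enabled: bool,
        base_url: String,
        #[serde(default)]
        api_key: String,
        hourly_quota_tokens: u64,
        weekly_quota_tokens: u64,
        list_ttl_secs: u64,
    }
    
    impl OllamaCloudConfig {
        /// Empty file value → read the env var; still empty → cloud disabled.
        fn resolve_key(&mut self) {
            if self.api_key.is_empty() {
                if let Ok(k) = std::env::var("OWLEN_OLLAMA_CLOUD_API_KEY") {
                    self.api_key = k;
                }
            }
            self.enabled = self.enabled && !self.api_key.is_empty();
        }
    }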

14) docs(readme): setup & troubleshooting for local + cloud

  • Why: Reduce support pings.

  • Include:

    • Local prerequisites; healthcheck tips; “model not found” guidance.
    • Cloud key setup; common 401/429 causes; privacy note.
    • Context/limits UI explainer; tool behavior & toggles.
    • Upgrade notes from v0.1 → v0.2.
  • AC: New users can configure both in <5 min.

15) test(integration): local-only, cloud-only, mixed; auth & rate-limit sims

  • Why: Prevent regressions.

  • Implement:

    • Wiremock stubs for cloud: /tags, /chat, search endpoint, 401/429 cases.
    • Snapshot test for tool-call transcript roundtrip.
  • AC: Green suite in CI; reproducible.

16) refactor(errors): typed provider errors + UI toasts

  • Why: Replace generic “something broke”.
  • Implement: Error enum: Unavailable, Unauthorized, RateLimited, Timeout, Protocol. Map to toasts/banners.
  • AC: Each known failure surfaces a precise message; logs redact secrets.
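
A sketch of the enum and its toast mapping, reusing the messages from commit 3:

    /// Typed provider errors, mapped onto user-facing toasts.
    enum ProviderError {
        Unavailable,      // daemon/endpoint unreachable
        Unauthorized,     // bad or missing key (401/403)
        RateLimited,      // 429
        Timeout,
        Protocol(String), // malformed frame, unexpected payload
    }
    
    fn toast_text(err: &ProviderError) -> &'static str {
        match err {
            ProviderError::Unavailable => "Provider not reachable",
            ProviderError::Unauthorized => "Cloud key invalid; using local",
            ProviderError::RateLimited => "Cloud rate limit hit; retry later",
            ProviderError::Timeout => "Provider timed out",
            ProviderError::Protocol(_) => "Unexpected provider response",
        }
    }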

17) perf(models): cache model lists with TTL; invalidate on user action

  • Why: Reduce network churn.
  • AC: Re-opening the picker doesn't re-fetch until the TTL expires or a manual refresh.
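
A minimal TTL cache sketch (names illustrative):

    use std::time::{Duration, Instant};
    
    /// Cached model list; `refresh` doubles as the manual-invalidation hook.
    struct ModelCache {
        models: Vec<String>,
        fetched_at: Instant,
        ttl: Duration,
    }
    
    impl ModelCache {
        fn fresh(&self) -> bool {
            self.fetched_at.elapsed() < self.ttl
        }
        fn refresh(&mut self, models: Vec<String>) {
            self.models = models;
            self.fetched_at = Instant::now();
        }
    }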

18) chore(release): bump to v0.2; changelog; package metadata

  • AC: --version shows v0.2; changelog includes the bullets above.

Missing Features vs. Codex / Claude-Code (Queued for v0.3+)

  • Multi-vendor providers: OpenAI & Anthropic provider crates with function-calling parity and structured outputs.
  • Agentic coding ops: Safe file edits, git ops, shell exec with approval queue.
  • “Thinking/Plan” pane: First-class rendering of plan/reflect steps; optional auto-approve.
  • Plugin system: Tool discovery & permissions; marketplace later.
  • IDE integration: Minimal LSP/bridge (VS Code/JetBrains) to run Owlen as backend.
  • Retrieval/RAG: Local project indexing; selective context injection with token budgeter.

Acceptance Checklist (Release Gate)

  • Local Ollama: healthcheck OK, models list OK, chat OK.
  • Cloud appears only with valid key; no 401 loops; 429 shows toast.
  • Header shows context value and %; updates after each turn.
  • :limits shows hourly/weekly tallies; persists across restarts.
  • Asking for current info triggers web.search (cloud on) and returns an answer without “I can't access the internet”.
  • Docs updated; sample config included; tests green.

Notes & Caveats

  • Base URLs & endpoints: Kept config-driven. Defaults are [Inference]. If your cloud endpoint differs, set it in config.toml.
  • Quotas: No official token quota API assumed; we track locally and let you configure limits for UI display.
  • Tooling: If a given model doesn't emit tool calls, web.search won't be visible to it. You can force exposure via :web on.

Lightly Opinionated Guidance (aka sparring stance)

  • Don't auto-pull models on errors—teach users where control lives.
  • Keep cloud totally opt-in; hide it on misconfig to reduce paper cuts.
  • Treat tool-calling like sudo: short leash, clear logs, obvious toggles.
  • Make the header a cockpit, not a billboard: model · context · usage; that's it.
