---
essential: milestones
status: complete
last_updated: 2026-04-03
project: gnoma
depends_on: [vision]
---

# Milestones

## Overview

| # | Name | Core Deliverable | Deps |
|---|------|-----------------|------|
| M1 | Core Engine | Pipe mode, Mistral, tools, agentic loop | none |
| M2 | Multi-Provider | All providers, config, dynamic switching | M1 |
| M3 | Security Firewall | Request/response scanning, redaction, incognito | M2 |
| M4 | Router Foundation | Arm registry, pools, task classifier, heuristic selection | M2 |
| M5 | TUI | Bubble Tea, 6 permission modes, config screen | M3, M4 |
| M6 | Context Intelligence | Local tokenizer, full compaction (truncate + summarize) | M5 |
| M7 | Elfs | Router-integrated sub-agents, parallel work | M4, M6 |
| M8 | Extensibility | Hooks, skills, MCP client, MCP tool replaceability, plugins | M7 |
| M9 | Router Advanced | Bandit core, feedback, ensemble strategies, state persistence | M7 |
| M10 | Persistence & Serve | SQLite sessions, serve mode, coordinator | M7 |
| M11 | Task Learning | Pattern recognition, task suggestions, persistent tasks | M9 |
| M12 | Thinking, Structured Output & Notebook | Thinking modes, schema validation, NotebookEdit tool | M2 |
| M13 | Auth | OAuth PKCE, keyring, multi-account | M5 |
| M14 | Observability | Feature flags, telemetry, cost dashboards | M10 |
| M15 | Web UI | `gnoma web` CLI subcommand, browser UI via serve mode | M10 |

---

## M1: Core Engine (MVP)

**Scope:** First working assistant. CLI pipe mode. Mistral as reference provider. Bash + file tools (with 7 critical security checks). No TUI, no permissions, no config file.

**Deliverables:**

- [x] Architecture docs in `docs/essentials/`
- [ ] Foundation types (`internal/message/`)
- [ ] Streaming abstraction (`internal/stream/`)
- [ ] Provider interface + Mistral adapter
- [ ] Tool system: bash (with security checks), fs.read, fs.write, fs.edit, fs.glob, fs.grep
- [ ] Engine agentic loop (stream → tool → re-query → done)
- [ ] CLI pipe mode (`echo "list files" | gnoma`)

**Exit criteria:** Pipe a coding question in, get a response that uses tools, answer on stdout.

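The agentic loop deliverable in M1 (stream → tool → re-query → done) can be pictured with a minimal sketch. Everything named here (`Provider`, `Tool`, `RunLoop`, the `maxTurns` safety limit) is hypothetical and chosen for illustration; the real engine streams events rather than returning whole responses.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall and Response are hypothetical types for this sketch.
type ToolCall struct{ Name, Input string }

type Response struct {
	Text      string
	ToolCalls []ToolCall
}

// Provider stands in for a streaming provider; here it returns a whole response.
type Provider func(history []string) Response

// Tool executes one call and returns its result text.
type Tool func(input string) string

var ErrMaxTurns = errors.New("max turns reached")

// RunLoop queries the provider, executes any requested tools, appends the
// results to the history, and re-queries until the model stops calling tools.
func RunLoop(p Provider, tools map[string]Tool, prompt string, maxTurns int) (string, error) {
	history := []string{"user: " + prompt}
	for turn := 0; turn < maxTurns; turn++ {
		resp := p(history)
		if len(resp.ToolCalls) == 0 {
			return resp.Text, nil // done: no further tool use requested
		}
		for _, call := range resp.ToolCalls {
			result := "unknown tool: " + call.Name
			if tool, ok := tools[call.Name]; ok {
				result = tool(call.Input)
			}
			history = append(history, "tool("+call.Name+"): "+result)
		}
	}
	return "", ErrMaxTurns
}

// fakeProvider asks for one tool call, then answers from the tool result.
func fakeProvider(history []string) Response {
	if len(history) == 1 {
		return Response{ToolCalls: []ToolCall{{Name: "fs.glob", Input: "*.go"}}}
	}
	return Response{Text: "saw " + history[len(history)-1]}
}

func main() {
	tools := map[string]Tool{"fs.glob": func(string) string { return "main.go" }}
	out, err := RunLoop(fakeProvider, tools, "list files", 4)
	fmt.Println(out, err)
}
```

The turn cap is an assumption of this sketch: it keeps a model that never stops requesting tools from looping forever.
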
## M2: Multi-Provider

**Scope:** All remaining providers. TOML config with layered loading. Dynamic provider switching.

**Deliverables:**

- [ ] TOML config system (defaults → user → project → env → flags)
- [ ] API key resolution from env vars and config
- [ ] Anthropic provider (streaming + tool use + thinking blocks)
- [ ] OpenAI provider (streaming + tool use)
- [ ] Google provider (streaming + function calling, goroutine bridge)
- [ ] OpenAI-compat for Ollama and llama.cpp
- [ ] `--provider` / `--model` flag switching

**Exit criteria:** `echo "hello" | gnoma --provider openai` works. All 5+ providers functional.

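The layered loading order in M2 (defaults → user → project → env → flags) amounts to overlaying each layer on the previous one, with later layers winning. A minimal sketch, assuming a hypothetical two-field `Config` and treating an empty string as "not set"; TOML parsing is left out:

```go
package main

import "fmt"

// Config holds two illustrative settings; the field names are hypothetical.
type Config struct {
	Provider string
	Model    string
}

// merge overlays the non-empty fields of over onto base.
func merge(base, over Config) Config {
	if over.Provider != "" {
		base.Provider = over.Provider
	}
	if over.Model != "" {
		base.Model = over.Model
	}
	return base
}

// Resolve applies layers in order, so later layers take precedence.
func Resolve(layers ...Config) Config {
	var out Config
	for _, layer := range layers {
		out = merge(out, layer)
	}
	return out
}

func main() {
	defaults := Config{Provider: "mistral", Model: "model-a"} // placeholder values
	user := Config{Provider: "anthropic"}
	project := Config{}
	env := Config{Model: "model-b"}
	flags := Config{Provider: "openai"}
	fmt.Printf("%+v\n", Resolve(defaults, user, project, env, flags))
}
```

Here the flag layer decides the provider and the env layer decides the model, because no later layer overrides them.
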
## M3: Security Firewall

**Scope:** Core security layer built into gnoma. Scans outgoing LLM requests and incoming tool results for sensitive data. Redacts or blocks. Incognito mode.

**Deliverables:**

- [ ] Secret scanner (gitleaks-derived, 40+ regex patterns, Shannon entropy detection)
- [ ] Unicode sanitization (NFKC + Cf/Co/Cn stripping, recursive on nested structs)
- [ ] Redactor (replace matched groups with `[REDACTED]`, preserve context)
- [ ] Configurable rules (regex patterns, action: redact/block/warn)
- [ ] Remaining bash security checks (checks 8-23 from CC bashSecurity.ts)
- [ ] Incognito mode: no persistence, no learning, no logging, optional local-only routing
- [ ] `--incognito` CLI flag

**Exit criteria:** Provider requests with embedded API keys get redacted. Incognito suppresses all persistence. Unicode attack vectors sanitized.

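The secret scanner and redactor in M3 combine a regex match with a Shannon entropy check and replace only the matched group. A rough sketch under stated assumptions: `keyPattern` is an invented rule, not one of the gitleaks-derived patterns, and the entropy threshold is passed in by the caller.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// keyPattern is an illustrative rule, not one of gnoma's actual patterns.
// Group 1 is context to preserve; group 2 is the candidate secret.
var keyPattern = regexp.MustCompile(`(api_key=)([A-Za-z0-9_-]{16,})`)

// ShannonEntropy returns the entropy of s in bits per byte.
func ShannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

// Redact replaces the captured secret with [REDACTED] when it looks random
// enough, keeping the surrounding context ("api_key=") intact.
func Redact(text string, minEntropy float64) string {
	return keyPattern.ReplaceAllStringFunc(text, func(m string) string {
		parts := keyPattern.FindStringSubmatch(m)
		if ShannonEntropy(parts[2]) < minEntropy {
			return m // low entropy: likely a placeholder, leave it alone
		}
		return parts[1] + "[REDACTED]"
	})
}

func main() {
	fmt.Println(Redact("export api_key=s3cr3tK9xQ2mZ7pLw8Tf # token", 3.0))
	fmt.Println(Redact("export api_key=aaaaaaaaaaaaaaaaaaaa", 3.0))
}
```

The entropy gate is what keeps obvious placeholders from being flagged; the threshold of 3.0 bits is an arbitrary value for the demonstration.
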
## M4: Router Foundation

**Scope:** Arm registry, limit pools, task classification, heuristic selection. Engine switches from direct provider calls to `router.Select()`.

**Deliverables:**

- [ ] Arm type (provider+model pair) with capability introspection
- [ ] Limit pools (RPM, RPD, tokens/day, cost caps, custom units)
- [ ] Pool tracker with optimistic reservation and scarcity multipliers
- [ ] Task classifier (10 types: Boilerplate, Generation, Refactor, Review, UnitTest, Planning, Orchestration, SecurityReview, Debug, Explain)
- [ ] Complexity scoring and value scoring
- [ ] Heuristic arm selection (score = quality × value / effective_cost)
- [ ] Background provider discovery (poll ollama, llama.cpp, API providers)
- [ ] Engine integration: `router.Select()` replaces direct provider calls

**Exit criteria:** Engine routes tasks through router. Limit pools track consumption. Task classification works for 10 types.

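The heuristic selection rule in M4 (score = quality × value / effective_cost) can be sketched as follows. The `Arm` fields and the choice of effective_cost = cost × scarcity multiplier are assumptions made for illustration; the document does not define how effective cost is computed.

```go
package main

import "fmt"

// Arm is a provider+model pair; the numeric fields are illustrative.
type Arm struct {
	Name     string
	Quality  float64 // expected quality for the task type, 0..1
	Cost     float64 // nominal cost per request, must be > 0
	Scarcity float64 // >= 1; assumed to grow as the arm's limit pool runs low
}

// Score implements score = quality × value / effective_cost, taking
// effective_cost = Cost × Scarcity as an assumption of this sketch.
func Score(a Arm, value float64) float64 {
	return a.Quality * value / (a.Cost * a.Scarcity)
}

// Select returns the highest-scoring arm; ok is false for an empty slice.
func Select(arms []Arm, value float64) (best Arm, ok bool) {
	bestScore := -1.0
	for _, a := range arms {
		if s := Score(a, value); s > bestScore {
			best, bestScore, ok = a, s, true
		}
	}
	return best, ok
}

func main() {
	arms := []Arm{
		{Name: "local", Quality: 0.6, Cost: 1, Scarcity: 1},
		{Name: "cloud", Quality: 0.9, Cost: 3, Scarcity: 1},
	}
	best, _ := Select(arms, 1.0)
	fmt.Println("plenty of quota:", best.Name)

	arms[0].Scarcity = 4 // the local arm's pool is nearly exhausted
	best, _ = Select(arms, 1.0)
	fmt.Println("local pool scarce:", best.Name)
}
```

With these made-up numbers the cheaper arm wins until its scarcity multiplier rises, at which point the selection shifts to the other arm.
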
## M5: TUI

**Scope:** Interactive terminal UI. Full 6-mode permission system. Session management. In-app config. Incognito toggle.

**Deliverables:**

- [ ] Permission system with all 6 modes:
  - `default`: prompt for each tool invocation
  - `acceptEdits`: auto-allow file ops, prompt for bash/destructive
  - `bypass`: allow everything
  - `deny`: deny all unless explicit allow rule
  - `plan`: read-only tools only
  - `auto`: router task classification + tool risk scoring
- [ ] Permission rules with compound bash command decomposition (via `mvdan.cc/sh` AST)
- [ ] 7-step permission decision flow (deny gates → tool check → safety → mode → allow → passthrough → hooks)
- [ ] Bubble Tea TUI: chat panel, input, streaming output
- [ ] Status bar (provider, model, tokens, incognito indicator)
- [ ] Permission prompt overlay
- [ ] Model picker overlay
- [ ] In-app config editor (`/config` command)
- [ ] Incognito toggle (`/incognito` command)
- [ ] Interactive shell pane: `/shell` command or keybinding opens PTY-connected shell
  - For commands needing user input (sudo, ssh, git push with auth, passwd prompts)
  - Bash tool detects potentially interactive commands and suggests take-over
  - PTY-based execution for flagged commands
- [ ] Session management (channel-based)

**Exit criteria:** Launch TUI, chat interactively, 6 permission modes work, config editable in-app, incognito toggleable, `/shell` opens interactive terminal for password prompts.

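The six permission modes in M5 map a tool request to allow, deny, or prompt. The sketch below covers only the mode step of the decision flow, with hypothetical types; the `auto` mode depends on router classification and risk scoring, so it simply falls back to prompting here, and allowing read-only tools without a prompt in `plan` mode is an assumption.

```go
package main

import "fmt"

type Decision string

const (
	Allow  Decision = "allow"
	Deny   Decision = "deny"
	Prompt Decision = "prompt"
)

// Tool describes a request; the flags are hypothetical classifications.
type Tool struct {
	Name     string
	ReadOnly bool // does not modify anything (e.g. fs.read, fs.grep)
	FileOp   bool // a file operation (e.g. fs.read, fs.write, fs.edit)
}

// Decide applies only the mode step of the decision flow. explicitAllow
// reports whether an allow rule matched, which matters in deny mode.
func Decide(mode string, t Tool, explicitAllow bool) Decision {
	switch mode {
	case "bypass":
		return Allow
	case "plan":
		if t.ReadOnly {
			return Allow
		}
		return Deny
	case "deny":
		if explicitAllow {
			return Allow
		}
		return Deny
	case "acceptEdits":
		if t.FileOp {
			return Allow
		}
		return Prompt
	default: // "default", and "auto" until risk scoring is available
		return Prompt
	}
}

func main() {
	write := Tool{Name: "fs.write", FileOp: true}
	bash := Tool{Name: "bash"}
	fmt.Println(Decide("acceptEdits", write, false), Decide("acceptEdits", bash, false))
	fmt.Println(Decide("plan", write, false), Decide("deny", bash, true))
}
```
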
## M6: Context Intelligence

**Scope:** Long sessions. Local tokenizer. Full compaction with both truncation and LLM summarization.

**Deliverables:**

- [ ] Local tokenizer for accurate token counting
- [ ] Token tracker with warning states (OK / Warning / Critical)
- [ ] TruncateStrategy: drop oldest, preserve system + recent
- [ ] SummarizeStrategy: spawn compaction elf, LLM-powered summary, image stripping, boundary messages
- [ ] Auto-compaction triggers (threshold-based, reactive on 413, circuit breaker after 3 failures)
- [ ] Pre/post compact hooks
- [ ] Tool result persistence (>50KB → disk, 2KB preview + filepath)
- [ ] Deferred tool loading (`ShouldDefer()`, full schema on demand)
- [ ] Post-compact restoration budget (50K total, 5K/file, 25K/skill)

**Exit criteria:** 100+ turn conversation stays coherent. Summarization produces useful summaries. Token counting within 5% of provider.

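The TruncateStrategy in M6 (drop oldest, preserve system + recent) might look like the sketch below. The `Message` type and the per-message `Tokens` count are assumptions; in practice the counts would come from the local tokenizer.

```go
package main

import "fmt"

// Message is a hypothetical conversation entry with a precomputed token count.
type Message struct {
	Role   string
	Text   string
	Tokens int
}

// Truncate keeps every system message plus the newest non-system messages
// that still fit within budget, dropping the oldest ones first.
func Truncate(msgs []Message, budget int) []Message {
	used := 0
	for _, m := range msgs {
		if m.Role == "system" {
			used += m.Tokens
		}
	}
	// Walk backwards from the newest message, keeping what still fits.
	keepFrom := len(msgs)
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "system" {
			continue
		}
		if used+msgs[i].Tokens > budget {
			break
		}
		used += msgs[i].Tokens
		keepFrom = i
	}
	var out []Message
	for i, m := range msgs {
		if m.Role == "system" || i >= keepFrom {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Text: "rules", Tokens: 10},
		{Role: "user", Text: "old question", Tokens: 50},
		{Role: "assistant", Text: "old answer", Tokens: 50},
		{Role: "user", Text: "new question", Tokens: 20},
		{Role: "assistant", Text: "new answer", Tokens: 20},
	}
	for _, m := range Truncate(msgs, 60) {
		fmt.Println(m.Role, m.Text)
	}
}
```

With a budget of 60 tokens the system message and the two most recent messages survive; the older exchange is dropped.
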
## M7: Elfs (Router-Integrated)

**Scope:** Sub-agents using router for provider selection. Parallel work. Feedback to router.

**Deliverables:**

- [ ] Elf interface + SyncElf + BackgroundElf implementations
- [ ] ElfManager: spawn, monitor, cancel, collect results
- [ ] Router-integrated spawning (`router.Select()` picks arm per elf)
- [ ] Parent ↔ elf communication via typed channels
- [ ] Concurrent tool execution (read-only parallel via errgroup, writes serial)
- [ ] Elf results feed back to router as quality signals
- [ ] Coordinator mode: orchestrator dispatches to worker elfs

**Exit criteria:** Parent spawns 3 background elfs on different providers (chosen by router), collects and synthesizes results.

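The concurrent tool execution item in M7 (read-only calls in parallel, writes serial) can be sketched with the standard library. This version swaps in `sync.WaitGroup` for errgroup to stay dependency-free, and it simplifies scheduling by running all reads first and all writes afterwards; the `Call` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Call is a hypothetical pending tool invocation.
type Call struct {
	Name     string
	ReadOnly bool
	Run      func() string
}

// Execute runs read-only calls concurrently, then runs the remaining calls
// one at a time in their original order. Results keep the input order.
func Execute(calls []Call) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.ReadOnly {
			continue
		}
		wg.Add(1)
		go func(i int, c Call) {
			defer wg.Done()
			results[i] = c.Run() // each goroutine writes a distinct index
		}(i, c)
	}
	wg.Wait()
	for i, c := range calls {
		if !c.ReadOnly {
			results[i] = c.Run()
		}
	}
	return results
}

func main() {
	calls := []Call{
		{Name: "fs.read", ReadOnly: true, Run: func() string { return "contents of a.go" }},
		{Name: "fs.write", Run: func() string { return "wrote b.go" }},
		{Name: "fs.grep", ReadOnly: true, Run: func() string { return "3 matches" }},
	}
	for i, r := range Execute(calls) {
		fmt.Println(calls[i].Name, "->", r)
	}
}
```
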
## M8: Extensibility

**Scope:** Hooks, skills, MCP client with tool replaceability, plugin system.

**Deliverables:**

- [ ] Hook system: PreToolUse, PostToolUse, SessionStart/End, PreCompact, Stop
- [ ] Hook protocol: stdin JSON, stdout JSON, exit codes (0=allow, 2=deny)
- [ ] Hook command types: command (shell), prompt (LLM), agent (spawn elf)
- [ ] Skill loading from .gnoma/skills/, ~/.config/gnoma/skills/, bundled, plugins
- [ ] Skill frontmatter: YAML (name, description, whenToUse, allowedTools, paths)
- [ ] MCP client: JSON-RPC over stdio, tool discovery
- [ ] MCP tool naming: `mcp__{server}__{tool}`
- [ ] MCP tool replaceability: `replace_default` config swaps built-in tools
- [ ] Plugin system: plugin.json manifest, install/enable/disable lifecycle

**Exit criteria:** MCP tools appear in gnoma. `replace_default` swaps built-ins. Skills invocable. Hooks fire on tool use.

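The MCP naming scheme in M8, `mcp__{server}__{tool}`, is easy to build and parse. A small sketch, with hypothetical function names and the assumption that server names never contain a double underscore (tool names may):

```go
package main

import (
	"fmt"
	"strings"
)

const mcpPrefix = "mcp__"

// MCPToolName builds the namespaced name for a tool exposed by an MCP server.
func MCPToolName(server, tool string) string {
	return mcpPrefix + server + "__" + tool
}

// ParseMCPToolName splits a namespaced name back into server and tool.
// It assumes server names contain no "__"; everything after the first
// separator is treated as the tool name.
func ParseMCPToolName(name string) (server, tool string, ok bool) {
	if !strings.HasPrefix(name, mcpPrefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(name, mcpPrefix)
	server, tool, found := strings.Cut(rest, "__")
	if !found || server == "" || tool == "" {
		return "", "", false
	}
	return server, tool, true
}

func main() {
	name := MCPToolName("github", "create_issue") // example server and tool
	fmt.Println(name)
	server, tool, ok := ParseMCPToolName(name)
	fmt.Println(server, tool, ok)
	_, _, ok = ParseMCPToolName("bash")
	fmt.Println(ok)
}
```

A prefix check like this is one way to tell built-in tools from MCP-provided ones when deciding whether `replace_default` applies.
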
## M9: Router Advanced

**Scope:** Full bandit learning. Feedback collection. Ensemble execution strategies. State persistence.

**Deliverables:**

- [ ] Discounted Thompson Sampling (per-arm, per-task-type Beta distributions)
- [ ] Feedback collection: implicit (acceptance, edit distance, escalation) + explicit
- [ ] Delayed attribution for orchestration/planning tasks
- [ ] Execution strategies: SingleArm, CascadeWithReview, ParallelEnsemble, MultiRoundSynthesis
- [ ] Strategy selection as learned routing decision
- [ ] Background arm benchmarking (TTFT, tok/s)
- [ ] State persistence (gob, versioned schema, atomic writes, CRC32)
- [ ] Cold start: shipped default.state with embedded priors
- [ ] Heuristic fallback for <5 observations per arm-task pair

**Exit criteria:** Bandit converges after ~50 observations. Ensemble outperforms single-arm on complex tasks. State persists across restarts.

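The state persistence item in M9 (gob, versioned schema, atomic writes, CRC32) could be approached as below. The `State` layout, the trailing big-endian checksum, and the file name are assumptions of this sketch, and it omits an fsync before the rename that a production version would likely want.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"errors"
	"fmt"
	"hash/crc32"
	"os"
	"path/filepath"
)

// State is a hypothetical router snapshot; Version tags the schema.
type State struct {
	Version int
	Alpha   map[string]float64 // Beta-distribution alpha per arm/task key
	Beta    map[string]float64 // Beta-distribution beta per arm/task key
}

var ErrCorrupt = errors.New("state file failed CRC32 check")

// Encode gob-encodes s and appends a big-endian CRC32 of the payload.
func Encode(s State) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	var tail [4]byte
	binary.BigEndian.PutUint32(tail[:], crc32.ChecksumIEEE(buf.Bytes()))
	buf.Write(tail[:])
	return buf.Bytes(), nil
}

// Decode verifies the trailing checksum before decoding the payload.
func Decode(data []byte) (State, error) {
	var s State
	if len(data) < 4 {
		return s, ErrCorrupt
	}
	body, tail := data[:len(data)-4], data[len(data)-4:]
	if crc32.ChecksumIEEE(body) != binary.BigEndian.Uint32(tail) {
		return s, ErrCorrupt
	}
	err := gob.NewDecoder(bytes.NewReader(body)).Decode(&s)
	return s, err
}

// Save writes to a temp file in the target directory, then renames it into
// place so readers never observe a half-written state file.
func Save(path string, s State) error {
	data, err := Encode(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	path := filepath.Join(os.TempDir(), "gnoma-router.state") // hypothetical path
	err := Save(path, State{Version: 1, Alpha: map[string]float64{"arm/Debug": 3}})
	fmt.Println("saved:", err == nil)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.
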
## M10: Persistence & Serve

**Scope:** SQLite session persistence. Serve mode. Coordinator mode.

**Deliverables:**

- [ ] SQLite session storage (messages, parentUuid chain, tombstones)
- [ ] Session memory: background elf extracts notes from conversation
- [ ] Incognito enforcement: sessions NOT persisted
- [ ] Serve mode: Unix socket listener, spawn session goroutine per client
- [ ] Coordinator mode: orchestrator dispatches to restricted worker elfs

**Exit criteria:** Resume yesterday's conversation. External client connects via serve mode.

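The serve mode item in M10 (Unix socket listener, one session goroutine per client) follows a familiar accept-loop shape. In this sketch the session handler only echoes lines, which is a placeholder for running the engine, and the demonstration uses an in-memory `net.Pipe` instead of a real socket so it terminates on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// handleSession serves one client: each input line is answered with one line.
// A real session would drive the engine; echoing is a placeholder.
func handleSession(conn net.Conn) {
	defer conn.Close()
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		fmt.Fprintf(conn, "echo: %s\n", sc.Text())
	}
}

// Serve accepts clients on l and spawns one session goroutine per client.
// In serve mode l would come from net.Listen("unix", socketPath).
func Serve(l net.Listener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			return err // listener closed or failed
		}
		go handleSession(conn)
	}
}

// exchange sends one line to a session over an in-memory connection and
// returns the reply; it exists only to demonstrate handleSession.
func exchange(line string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go handleSession(server)
	if _, err := fmt.Fprintln(client, line); err != nil {
		return "", err
	}
	return bufio.NewReader(client).ReadString('\n')
}

func main() {
	reply, err := exchange("hello")
	fmt.Print(reply)
	if err != nil {
		fmt.Println("error:", err)
	}
}
```
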
## M11: Task Learning

**Scope:** Detect recurring task patterns. Suggest persistent tasks. Refinement loop.

**Deliverables:**

- [ ] Pattern detector: observe turn sequences, identify repeats (≥3 times)
- [ ] Task suggestion UX: prompt user to save as persistent task
- [ ] Persistent task definitions: parameterized sequences, stored in .gnoma/tasks/ or ~/.config/gnoma/tasks/
- [ ] `/task <name> [args]` execution command
- [ ] Router feedback integration: learn which arm works best per task step
- [ ] Task refinement: re-split tasks, measure improvement

**Exit criteria:** gnoma suggests a persistent task after 3+ repetitions. `/task release v1.2.0` executes a saved workflow.

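The pattern detector in M11 looks for turn sequences that repeat at least three times. One simple way to approximate this, assuming turns are reduced to step labels and a fixed window length, is to count sliding windows; the function name and the label format are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Repeats returns every window of n consecutive steps (n >= 1) that occurs
// at least minCount times in steps, sorted for stable output.
func Repeats(steps []string, n, minCount int) []string {
	counts := map[string]int{}
	for i := 0; i+n <= len(steps); i++ {
		counts[strings.Join(steps[i:i+n], " > ")]++
	}
	var out []string
	for seq, c := range counts {
		if c >= minCount {
			out = append(out, seq)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	steps := []string{
		"test", "tag", "push",
		"edit",
		"test", "tag", "push",
		"test", "tag", "push",
	}
	fmt.Println(Repeats(steps, 3, 3))
}
```

A detector built this way would suggest saving the `test > tag > push` sequence as a persistent task once it has been seen three times.
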
## M12: Thinking, Structured Output & Notebook

**Deliverables:**

- [ ] Thinking mode (disabled / enabled with budget / adaptive)
- [ ] Thinking block streaming and TUI display
- [ ] Structured output with JSON schema validation
- [ ] Retry logic for schema validation failures
- [ ] NotebookEdit tool: read/write/edit Jupyter notebook cells (.ipynb)

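The retry logic for schema validation failures in M12 can be sketched as a loop that feeds the validation error back into the next attempt. The `validate` function below is only a stand-in that checks for required keys, not a JSON Schema validator, and `Generate` is a hypothetical model call.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validate is a stand-in for JSON schema validation: it only checks that
// raw is a JSON object containing every required key.
func validate(raw string, required []string) error {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		return fmt.Errorf("not a JSON object: %w", err)
	}
	for _, k := range required {
		if _, ok := obj[k]; !ok {
			return fmt.Errorf("missing required field %q", k)
		}
	}
	return nil
}

// Generate is a hypothetical model call; feedback carries the previous
// validation error so the model can correct its output.
type Generate func(feedback string) string

// WithRetry calls gen until its output validates or maxAttempts is reached.
func WithRetry(gen Generate, required []string, maxAttempts int) (string, error) {
	feedback := ""
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out := gen(feedback)
		err := validate(out, required)
		if err == nil {
			return out, nil
		}
		feedback = err.Error()
	}
	return "", errors.New("output failed validation after retries: " + feedback)
}

func main() {
	calls := 0
	gen := func(feedback string) string {
		calls++
		if feedback == "" {
			return `{"name":"gnoma"}` // first attempt omits "version"
		}
		return `{"name":"gnoma","version":"1"}`
	}
	out, err := WithRetry(gen, []string{"name", "version"}, 3)
	fmt.Println(out, err, "attempts:", calls)
}
```
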
## M13: Auth

**Deliverables:**

- [ ] OAuth 2.0 + PKCE flow (browser → callback → token exchange)
- [ ] Proactive token refresh (before expiry)
- [ ] OS keyring integration for credential storage
- [ ] Multi-account support per provider

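For the OAuth 2.0 + PKCE flow in M13, the client-side part is generating a code verifier and its S256 challenge as defined in RFC 7636. A minimal sketch with hypothetical function names; the browser redirect, callback server, and token exchange are not shown.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// NewVerifier returns a random code verifier: 32 random bytes encoded as
// unpadded base64url, which yields 43 characters.
func NewVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// Challenge derives the S256 code challenge: BASE64URL(SHA256(verifier)).
func Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := NewVerifier()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The challenge goes into the authorization URL; the verifier is kept
	// locally and sent later with the token exchange request.
	fmt.Println("verifier length:", len(verifier))
	fmt.Println("challenge:", Challenge(verifier))
}
```
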
## M14: Observability

**Deliverables:**

- [ ] Feature flag system (local config + optional remote)
- [ ] Opt-in analytics (event queue, local-only by default)
- [ ] Usage dashboards (token spend, provider usage, tool frequency)
- [ ] Cost tracking per provider/model

## M15: Web UI

**Deliverables:**

- [ ] `gnoma web` CLI subcommand starts local web server
- [ ] Connects to serve mode backend (M10 prerequisite)
- [ ] Chat interface with streaming, tool output, permission prompts

## Future

- Voice input/output via provider audio APIs
- Collaborative sessions (multiple humans + elfs)
- Plugin marketplace
- Remote agent execution
- Federated learning for router priors (opt-in, anonymized)

## Changelog

- 2026-04-02: Initial version (M1-M11)
- 2026-04-03: Restructured to M1-M15. Split providers/TUI. Added Security (M3), Router Foundation (M4), Router Advanced (M9), Task Learning (M11). Full 6 permission modes. Full compaction. CC pattern integration.