docs: update essentials for router, security, task learning

Restructure milestones from M1-M11 to M1-M15:

- M3: Security Firewall (secret scanner, incognito mode)
- M4: Router Foundation (arm registry, pools, task classifier)
- M5: TUI with full 6 permission modes
- M6: Full compaction (truncate + LLM summarization)
- M9: Router Advanced (bandit learning, ensemble strategies)
- M11: Task Learning (pattern detection, persistent tasks)

Add ADR-007 through ADR-012 for security-as-core, router split, Thompson Sampling, MCP replaceability, task learning, incognito.

Add risks R-010 through R-015 for router, security, feedback, task learning, ensemble quality, shell parser.

Update architecture dependency graph with security, router, elf, hook, skill, mcp, plugin, tasklearn packages.

Update domain model with Router, Arm, LimitPool, Firewall entities.
2026-04-03 10:47:11 +02:00
parent efcb5a2901
commit d3990214a5
7 changed files with 462 additions and 108 deletions
--- a/docs/essentials/INDEX.md
+++ b/docs/essentials/INDEX.md
@@ -21,15 +21,15 @@ essentials:

 | # | Essential | Status | Link | Last Updated |
 |---|-----------|--------|------|-------------|
-| 1 | Vision | complete | [vision.md](vision.md) | 2026-04-02 |
-| 2 | Domain Model | complete | [domain-model.md](domain-model.md) | 2026-04-02 |
-| 3 | Architecture | complete | [architecture.md](architecture.md) | 2026-04-02 |
-| 4 | Patterns | complete | [patterns.md](patterns.md) | 2026-04-02 |
-| 5 | Process Flows | complete | [process-flows.md](process-flows.md) | 2026-04-02 |
-| 6 | UML Diagrams | complete | [uml-diagrams.md](uml-diagrams.md) | 2026-04-02 |
-| 7 | API Contracts | complete | [api-contracts.md](api-contracts.md) | 2026-04-02 |
-| 8 | Tech Stack & Conventions | complete | [tech-stack.md](tech-stack.md) | 2026-04-02 |
-| 9 | Constraints & Trade-offs | complete | [constraints.md](constraints.md) | 2026-04-02 |
-| 10 | Milestones | complete | [milestones.md](milestones.md) | 2026-04-02 |
-| 11 | Decision Log | complete | [decisions/001-initial-decisions.md](decisions/001-initial-decisions.md) | 2026-04-02 |
-| 12 | Risk / Unknowns | complete | [risks.md](risks.md) | 2026-04-02 |
+| 1 | Vision | complete | [vision.md](vision.md) | 2026-04-03 |
+| 2 | Domain Model | complete | [domain-model.md](domain-model.md) | 2026-04-03 |
+| 3 | Architecture | complete | [architecture.md](architecture.md) | 2026-04-03 |
+| 4 | Patterns | complete | [patterns.md](patterns.md) | 2026-04-03 |
+| 5 | Process Flows | complete | [process-flows.md](process-flows.md) | 2026-04-03 |
+| 6 | UML Diagrams | complete | [uml-diagrams.md](uml-diagrams.md) | 2026-04-03 |
+| 7 | API Contracts | complete | [api-contracts.md](api-contracts.md) | 2026-04-03 |
+| 8 | Tech Stack & Conventions | complete | [tech-stack.md](tech-stack.md) | 2026-04-03 |
+| 9 | Constraints & Trade-offs | complete | [constraints.md](constraints.md) | 2026-04-03 |
+| 10 | Milestones | complete | [milestones.md](milestones.md) | 2026-04-03 |
+| 11 | Decision Log | complete | [decisions/001-initial-decisions.md](decisions/001-initial-decisions.md) | 2026-04-03 |
+| 12 | Risk / Unknowns | complete | [risks.md](risks.md) | 2026-04-03 |
--- a/docs/essentials/architecture.md
+++ b/docs/essentials/architecture.md
@@ -85,9 +85,17 @@ graph TB
 | `internal/context` | Token tracking, compaction strategies, sliding window | Depends on message, provider | Internal |
 | `internal/config` | TOML layered config loading | BurntSushi/toml | Internal |
 | `internal/auth` | API key resolution from env/config | Pure Go | Internal |
-| `internal/engine` | Agentic query loop, tool execution orchestration | Depends on all above | Internal |
-| `internal/session` | Session lifecycle, channel-based UI decoupling | Depends on engine, stream | Internal |
-| `internal/tui` | Terminal UI: chat, input, status, permission dialogs | Bubble Tea, lipgloss | Internal |
+| `internal/security` | Firewall, secret scanner, unicode sanitizer, incognito mode | message, config | Security boundary |
+| `internal/router` | Smart router: arm registry, pools, task classifier, selection | provider, message, config | Internal |
+| `internal/engine` | Agentic query loop, tool execution orchestration | router, security, tool, stream, context | Internal |
+| `internal/session` | Session lifecycle, channel-based UI decoupling | engine, stream | Internal |
+| `internal/elf` | Sub-agent spawning, lifecycle, communication | engine, router, session | Internal |
+| `internal/tui` | Terminal UI: chat, input, status, permission dialogs, config screen | session, stream, permission | Internal |
+| `internal/hook` | Hook system: events, protocol, registration | message, tool | Internal |
+| `internal/skill` | Skill loading, frontmatter parsing, discovery | message | Internal |
+| `internal/mcp` | MCP client, tool discovery, tool replaceability | tool, config | External (stdio) |
+| `internal/plugin` | Plugin manifest, loader, lifecycle | config | Internal |
+| `internal/tasklearn` | Repetitive task detection, suggestions, persistent tasks | router, engine | Internal |

 ## Package Dependency Graph

@@ -98,12 +106,20 @@ graph BT
    provider["provider"]
    tool["tool"]
    permission["permission"]
+    security["security"]
+    router["router"]
    context_mgr["context"]
    config["config"]
    auth["auth"]
    engine["engine"]
    session["session"]
+    elf["elf"]
    tui["tui"]
+    hook["hook"]
+    skill["skill"]
+    mcp["mcp"]
+    plugin["plugin"]
+    tasklearn["tasklearn"]
    cmd["cmd/gnoma"]

    stream --> message
@@ -111,24 +127,44 @@ graph BT
    provider --> stream
    tool --> message
    permission --> message
+    permission --> config
+    security --> message
+    security --> config
+    router --> provider
+    router --> message
+    router --> config
    context_mgr --> message
    context_mgr --> provider
-    config --> permission
-    engine --> provider
+    engine --> router
+    engine --> security
    engine --> tool
    engine --> permission
    engine --> stream
    engine --> context_mgr
    session --> engine
    session --> stream
+    elf --> engine
+    elf --> router
+    elf --> session
+    hook --> message
+    hook --> tool
+    skill --> message
+    mcp --> tool
+    mcp --> config
+    plugin --> config
+    tasklearn --> router
+    tasklearn --> engine
    tui --> session
    tui --> stream
+    tui --> permission
    cmd --> tui
    cmd --> config
    cmd --> auth
    cmd --> session
    cmd --> provider
    cmd --> tool
+    cmd --> router
+    cmd --> security
 ```

 ## Scope
@@ -136,15 +172,19 @@ graph BT
 **In scope:**
 - Streaming chat with tool execution across 5+ LLM providers
 - Agentic loop (stream → tool calls → re-query → until done)
-- Permission system for tool execution
+- Security firewall with secret scanning, redaction, incognito mode
+- Smart router with bandit-based multi-provider collaboration
+- 6-mode permission system for tool execution
 - TUI and CLI pipe modes
 - TOML configuration with layering
-- Context management and compaction
-- Multi-agent (elfs) with per-elf provider routing
-- Hook, skill, and MCP extensibility
+- Context management and compaction (truncation + LLM summarization)
+- Multi-agent (elfs) with router-integrated provider selection
+- Hook, skill, MCP, and plugin extensibility
+- Repetitive task learning and persistent tasks
+- Session persistence (SQLite) and serve mode

 **Out of scope:**
-- Web UI (future, via serve mode)
+- Web UI (M15, via serve mode)
 - Cloud hosting / SaaS deployment
 - Training or fine-tuning models
 - IDE extension authoring (gnoma provides the backend, not the extension itself)
--- a/docs/essentials/constraints.md
+++ b/docs/essentials/constraints.md
@@ -63,6 +63,35 @@ depends_on: [domain-model]
 - **Because:** User maintains the Mistral Go SDK, knows its internals. Good baseline — similar to OpenAI's API shape. Anthropic's unique features (thinking blocks, cache tokens) are better added as an M2 extension.
 - **Consequence:** Thinking block support tested later. Cache token tracking added with Anthropic provider.

+### Security as core over plugin
+
+- **Chose:** Security firewall baked into gnoma core (`internal/security/`)
+- **Over:** MCP-based security server (optional plugin)
+- **Because:** Default-off security is no security. Every user should get secret scanning, unicode sanitization, and incognito mode out of the box.
+- **Consequence:** Core binary is larger. False positives affect all users. Mitigated by configurable sensitivity and warn-first mode.
+
+### Proper shell parsing over regex decomposition
+
+- **Chose:** `mvdan.cc/sh` (Go POSIX shell parser) for compound command decomposition
+- **Over:** Regex-based `splitCommand()` (CC approach, caps at 50 subcommands)
+- **Because:** AST-based parsing is accurate for nested structures, doesn't need arbitrary caps, handles edge cases CC's regex misses.
+- **Consequence:** Additional dependency. But `mvdan.cc/sh` is well-maintained and widely used in the Go ecosystem.
+
+### Full 6 permission modes over simplified 3
+
+- **Chose:** All 6 CC permission modes (default, acceptEdits, bypass, deny, plan, auto)
+- **Over:** Simplified 3-mode system (allow, deny, prompt)
+- **Because:** Users need fine-grained control. `acceptEdits` is crucial for trusting file tools while verifying bash. `plan` mode enables read-only exploration. `auto` mode uses router signals for smart defaults.
+- **Consequence:** More complex permission system. Testing matrix is larger (6 modes × rule types × tool types).
+
+### Router split over monolithic
+
+- **Chose:** Router in two milestones: M4 (heuristic) + M9 (bandit learning)
+- **Over:** Full router in one milestone
+- **Because:** Engine needs routing abstraction early (M4). Bandit learning needs elf feedback (M7) that doesn't exist yet. Building everything at once blocks other milestones.
+- **Consequence:** Two integration points. Heuristic → bandit migration must be seamless.
+
 ## Changelog

 - 2026-04-02: Initial version
+- 2026-04-03: Added trade-offs for security-as-core, shell parsing, 6 permission modes, router split
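
The six permission modes chosen in the trade-off above could map to decisions roughly as follows. This is a minimal sketch, not gnoma's implementation: only the mode names and their one-line semantics come from the docs, while the `ToolCall` fields, the `Risk` score, and the `autoRiskThreshold` value are assumptions made for illustration. It covers only the mode step, not the deny gates, safety checks, or hooks that the milestones mention.

```go
package main

import "fmt"

// Mode names follow the six modes listed in the trade-off.
type Mode string

const (
	ModeDefault     Mode = "default"
	ModeAcceptEdits Mode = "acceptEdits"
	ModeBypass      Mode = "bypass"
	ModeDeny        Mode = "deny"
	ModePlan        Mode = "plan"
	ModeAuto        Mode = "auto"
)

type Decision string

const (
	Allow  Decision = "allow"
	Deny   Decision = "deny"
	Prompt Decision = "prompt"
)

// ToolCall is a hypothetical summary of a pending tool invocation.
type ToolCall struct {
	Name          string
	ReadOnly      bool    // tool cannot modify anything
	FileOp        bool    // file tool such as fs.read or fs.edit
	ExplicitAllow bool    // an explicit allow rule matched
	Risk          float64 // assumed 0..1 risk score used by auto mode
}

// autoRiskThreshold is an illustrative value, not taken from the docs.
const autoRiskThreshold = 0.3

// Decide applies only the mode-specific step of the permission flow.
func Decide(mode Mode, call ToolCall) Decision {
	switch mode {
	case ModeBypass:
		return Allow
	case ModeDeny:
		if call.ExplicitAllow {
			return Allow
		}
		return Deny
	case ModePlan:
		if call.ReadOnly {
			return Allow
		}
		return Deny
	case ModeAcceptEdits:
		if call.FileOp {
			return Allow
		}
		return Prompt
	case ModeAuto:
		if call.Risk < autoRiskThreshold {
			return Allow
		}
		return Prompt
	default: // ModeDefault and unknown modes prompt for every call
		return Prompt
	}
}

func main() {
	edit := ToolCall{Name: "fs.edit", FileOp: true, Risk: 0.2}
	bash := ToolCall{Name: "bash", Risk: 0.8}
	modes := []Mode{ModeDefault, ModeAcceptEdits, ModeBypass, ModeDeny, ModePlan, ModeAuto}
	for _, m := range modes {
		fmt.Printf("%-12s fs.edit=%-6s bash=%s\n", m, Decide(m, edit), Decide(m, bash))
	}
}
```

Even in this reduced form, every mode has to be checked against each tool class, which is the larger testing matrix the trade-off lists as a consequence.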
--- a/docs/essentials/decisions/001-initial-decisions.md
+++ b/docs/essentials/decisions/001-initial-decisions.md
@@ -183,6 +183,153 @@ Multi-provider collaboration is a core feature and part of gnoma's identity. The
 **Positive:** Clear differentiator from all existing tools. Shapes architecture from day one.
 **Negative:** Elf system design must account for per-elf provider config from the start.

+---
+
+# ADR-007: Security Firewall as Core (Not Plugin)
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+gnoma needs to prevent secrets and sensitive data from leaking to LLM providers. Options: build it as an MCP server (plugin), or bake it into the core.
+
+## Decision
+
+Security firewall is a core component (`internal/security/`), not a plugin. It wraps all provider calls and tool results. Everyone benefits by default.
+
+## Alternatives Considered
+
+### Alternative A: MCP-based security server
+
+- **Pros:** Modular, replaceable, user can choose their own
+- **Cons:** Users must opt in. Default-off security is no security. MCP adds latency.
+
+## Consequences
+
+**Positive:** Every gnoma user gets secret scanning, unicode sanitization, and incognito mode out of the box.
+**Negative:** Core binary is larger. False positives affect all users (mitigated by configurable sensitivity).
+
+---
+
+# ADR-008: Router Split into Foundation + Advanced
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+The smart router is gnoma's core differentiator but is a massive system (arm registry, limit pools, task classification, bandit learning, feedback, ensemble strategies, state persistence). Building it all at once blocks other milestones.
+
+## Decision
+
+Split into M4 (foundation: arm registry, pools, task classifier, heuristic selection) and M9 (advanced: bandit, feedback, ensemble, persistence). M4 gives the engine a routing abstraction early. M9 adds learning after elfs provide real feedback signals.
+
+## Alternatives Considered
+
+### Alternative A: Full router in one milestone
+
+- **Pros:** Complete system from day one
+- **Cons:** Massive milestone, blocks TUI and other features, bandit needs elf feedback that doesn't exist yet
+
+## Consequences
+
+**Positive:** Engine routes from M4 onward. Heuristic selection is good enough for daily use. Bandit learning lands when feedback is available.
+**Negative:** Two integration points instead of one.
+
+---
+
+# ADR-009: Thompson Sampling for Multi-Armed Bandit
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+The router needs to learn which arm (provider+model) performs best per task type. Options: epsilon-greedy, UCB, LinUCB, Thompson Sampling.
+
+## Decision
+
+Discounted Thompson Sampling with per-arm, per-task-type Beta(α, β) distributions. No ML framework dependency — Beta distribution sampling via Marsaglia-Tsang Gamma (~30 lines of Go).
+
+## Alternatives Considered
+
+### Alternative A: LinUCB (contextual bandit)
+
+- **Pros:** Uses full task feature vector, theoretically optimal
+- **Cons:** Matrix inversion per decision, complex implementation, marginal gain at v1 scale
+
+### Alternative B: Epsilon-greedy
+
+- **Pros:** Simplest to implement
+- **Cons:** Fixed exploration rate, doesn't adapt, wastes budget on known-bad arms
+
+## Consequences
+
+**Positive:** Natural exploration via sampling. Handles non-stationarity with discounting. No external deps. Fast (<1ms per decision).
+**Negative:** Per-task-type, not contextual — can't generalize across task clusters. Contextual bandit (v2) planned as future upgrade.
+
+---
+
+# ADR-010: MCP Tool Replaceability via Priority Registry
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+MCP servers provide tools. Some users want MCP tools to replace gnoma's built-in tools (e.g., a custom file system tool). Need a mechanism for this.
+
+## Decision
+
+Tool registry has a priority system. MCP servers can declare `replace_default = "fs"` in config to replace all `fs.*` built-in tools. Resolution: MCP override > built-in.
+
+## Consequences
+
+**Positive:** Users can swap any built-in tool via config. No code changes needed.
+**Negative:** MCP tool must implement the same contract (same parameter schema). Mismatch → runtime errors.
+
+---
+
+# ADR-011: Task Learning as Late-Stage Feature (M11)
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+Task learning (detecting recurring patterns, suggesting persistent tasks) could be built early or late.
+
+## Decision
+
+M11 — after router advanced (M9) and persistence (M10). Task learning needs: (1) router feedback signals to understand quality, (2) session persistence to observe patterns across sessions, (3) enough real usage to detect meaningful repetitions.
+
+## Consequences
+
+**Positive:** Built on solid foundations. Feedback signals are real, not synthetic.
+**Negative:** Users don't benefit from task learning until late in the roadmap.
+
+---
+
+# ADR-012: Incognito Mode as Core Security Feature
+
+**Status:** Accepted
+**Date:** 2026-04-03
+
+## Context
+
+Users working with sensitive code need a way to prevent any data from being persisted, logged, or fed back to the learning system.
+
+## Decision
+
+Incognito mode is part of the security firewall (M3). When active: no session persistence, no router learning, no logging of content, optional local-only routing. Activated via `--incognito` flag or TUI toggle. Visual indicator in status bar.
+
+## Consequences
+
+**Positive:** Strong privacy guarantee. Users can work on sensitive projects without worrying about data leakage to disk or learning systems.
+**Negative:** No learning improvement from incognito sessions. Router stays static.
+
 ## Changelog

-- 2026-04-02: Initial decisions from architecture planning session
+- 2026-04-02: Initial decisions (ADR-001 through ADR-006)
+- 2026-04-03: Added ADR-007 through ADR-012 (security, router split, Thompson Sampling, MCP replaceability, task learning, incognito)
--- a/docs/essentials/domain-model.md
+++ b/docs/essentials/domain-model.md
@@ -79,15 +79,47 @@ classDiagram
        +Wait() ElfResult
    }

+    class Router {
+        +Select(task) RoutingDecision
+        +ClassifyTask(history) Task
+    }
+
+    class Arm {
+        +ID: ArmID
+        +Provider: Provider
+        +ModelName: string
+        +IsLocal: bool
+        +Pools: []LimitPool
+    }
+
+    class LimitPool {
+        +ID: string
+        +Kind: PoolKind
+        +TotalLimit: float64
+        +Used: float64
+        +Reserved: float64
+        +ScarcityMultiplier() float64
+    }
+
+    class Firewall {
+        +ScanOutgoing(req) req
+        +ScanToolResult(result) result
+        +Incognito: IncognitoMode
+    }
+
    Session "1" --> "1" Engine : owns
-    Engine "1" --> "1" Provider : uses
+    Engine "1" --> "1" Router : routes through
+    Engine "1" --> "1" Firewall : scans through
+    Router "1" --> "*" Arm : selects from
+    Arm "1" --> "1" Provider : wraps
+    Arm "1" --> "*" LimitPool : draws from
    Engine "1" --> "*" Tool : executes
    Engine "1" --> "*" Message : history
    Engine "1" --> "*" Turn : produces
    Message "1" --> "*" Content : contains
    Provider "1" --> "*" Stream : creates
    Stream "1" --> "*" Event : yields
-    Session "1" --> "*" Elf : spawns (future)
+    Session "1" --> "*" Elf : spawns
    Elf "1" --> "1" Engine : owns
 ```

@@ -98,7 +130,12 @@ classDiagram
 | gnoma | The host application — single binary, agentic coding assistant | `gnoma "list files"` |
 | Elf | A sub-agent (goroutine) with its own engine, history, and provider. Named after the elf owl. | Background elf exploring `auth/` on Ollama |
 | Session | A conversation boundary between UI and engine. Owns one engine, communicates via channels. | TUI session, CLI pipe session |
-| Engine | The agentic loop orchestrator. Manages history, streams from provider, executes tools, loops until done. | Engine running on Mistral with 5 tools |
+| Engine | The agentic loop orchestrator. Routes through firewall and router, executes tools, loops until done. | Engine running via router with 5 tools |
+| Router | The smart routing layer. Classifies tasks, selects arms based on quality/cost/scarcity, learns from feedback. | Router picks local Qwen for boilerplate, Claude for security review |
+| Arm | A provider+model pair registered in the router. Has capability metadata, pool memberships, and performance stats. | `ollama/mistral-7b`, `anthropic/claude-opus-4` |
+| LimitPool | A shared resource budget that arms draw from. Tracks usage with optimistic reservation and scarcity multipliers. | Daily cost cap of 5 EUR shared across API providers |
+| Firewall | Security layer that scans outgoing requests and tool results for sensitive data. Manages incognito mode. | Redacts `sk-ant-...` from prompts before sending to API |
+| Incognito | Mode where no data is persisted, logged, or fed back to the router. Optional local-only routing. | User toggles incognito for sensitive work |
 | Provider | An LLM backend adapter. Translates gnoma types to/from SDK-specific types. | Anthropic provider, OpenAI-compat provider |
 | Stream | Pull-based iterator over streaming events from a provider. Unified interface across all SDKs. | `for s.Next() { e := s.Current() }` |
 | Event | A single streaming delta — text chunk, tool call fragment, thinking trace, or usage update. | `EventTextDelta{Text: "hello"}` |
@@ -108,9 +145,11 @@ classDiagram
 | ToolResult | The output of executing a tool, correlated to a ToolCall by ID. | `{ToolCallID: "tc_1", Content: "file1.go\nfile2.go"}` |
 | Turn | The result of a complete agentic loop — may span multiple API calls and tool executions. | Turn with 3 rounds: stream → tool → stream → tool → stream → done |
 | Accumulator | Assembles a complete Response from a sequence of streaming Events. Shared across all providers. | Text fragments → complete assistant message |
+| TaskType | Classification of a task for routing purposes. 10 types from boilerplate to security review. | `TaskGeneration`, `TaskRefactor`, `TaskSecurityReview` |
 | Callback | Function the engine calls for each streaming event, enabling real-time UI updates. | `func(evt stream.Event) { ch <- evt }` |
 | Round | A single API call within a Turn. A turn with 2 tool-use loops has 3 rounds. | Round 1: initial query. Round 2: after tool results. |
 | Routing | Directing tasks to different providers based on capability, cost, or latency rules. | Complex reasoning → Claude, quick lookups → local Qwen |
+| PersistentTask | A user-confirmed recurring task pattern saved for re-execution. | `/task release v1.2.0` runs the saved release workflow |

 ## Invariants

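
The `LimitPool` entity added to the domain model tracks usage with optimistic reservation and a scarcity multiplier. A minimal sketch of how those pieces could fit together is shown below. The struct fields mirror the class diagram, but the `Reserve`/`Commit` methods, the `1 / remaining fraction` scarcity formula, and the `maxScarcity` cap are assumptions; the docs name the concept without defining the formula.

```go
package main

import (
	"errors"
	"fmt"
)

type PoolKind string

// LimitPool mirrors the fields in the class diagram.
type LimitPool struct {
	ID         string
	Kind       PoolKind
	TotalLimit float64
	Used       float64
	Reserved   float64
}

// ErrPoolExhausted and maxScarcity are illustrative, not from the docs.
var ErrPoolExhausted = errors.New("limit pool exhausted")

const maxScarcity = 10.0

// Remaining is the budget not yet used or reserved.
func (p *LimitPool) Remaining() float64 {
	return p.TotalLimit - p.Used - p.Reserved
}

// Reserve optimistically sets aside an estimated amount before a request runs.
func (p *LimitPool) Reserve(estimate float64) error {
	if estimate > p.Remaining() {
		return ErrPoolExhausted
	}
	p.Reserved += estimate
	return nil
}

// Commit releases a reservation and records the actual consumption.
func (p *LimitPool) Commit(estimate, actual float64) {
	p.Reserved -= estimate
	if p.Reserved < 0 {
		p.Reserved = 0
	}
	p.Used += actual
}

// ScarcityMultiplier grows as the pool drains (assumed formula:
// 1 / remaining fraction, capped at maxScarcity).
func (p *LimitPool) ScarcityMultiplier() float64 {
	if p.TotalLimit <= 0 {
		return maxScarcity
	}
	frac := p.Remaining() / p.TotalLimit
	if frac <= 1/maxScarcity {
		return maxScarcity
	}
	return 1 / frac
}

func main() {
	// A daily cost cap of 5 EUR, as in the glossary example.
	pool := &LimitPool{ID: "daily-cost", Kind: "cost", TotalLimit: 5}

	if err := pool.Reserve(1.0); err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	fmt.Printf("after reserve: remaining=%.2f scarcity=%.2f\n",
		pool.Remaining(), pool.ScarcityMultiplier())

	pool.Commit(1.0, 0.4) // the request cost less than estimated
	fmt.Printf("after commit:  remaining=%.2f scarcity=%.2f\n",
		pool.Remaining(), pool.ScarcityMultiplier())

	fmt.Println("oversized reserve:", pool.Reserve(10))
}
```

This sketch is not safe for concurrent use. With parallel elfs drawing from the same pool, a real tracker would need a mutex or similar synchronization around these methods.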
--- a/docs/essentials/milestones.md
+++ b/docs/essentials/milestones.md
@@ -1,24 +1,46 @@
 ---
 essential: milestones
 status: complete
-last_updated: 2026-04-02
+last_updated: 2026-04-03
 project: gnoma
 depends_on: [vision]
 ---

 # Milestones

+## Overview
+
+| # | Name | Core Deliverable | Deps |
+|---|------|-----------------|------|
+| M1 | Core Engine | Pipe mode, Mistral, tools, agentic loop | — |
+| M2 | Multi-Provider | All providers, config, dynamic switching | M1 |
+| M3 | Security Firewall | Request/response scanning, redaction, incognito | M2 |
+| M4 | Router Foundation | Arm registry, pools, task classifier, heuristic selection | M2 |
+| M5 | TUI | Bubble Tea, 6 permission modes, config screen | M3, M4 |
+| M6 | Context Intelligence | Local tokenizer, full compaction (truncate + summarize) | M5 |
+| M7 | Elfs | Router-integrated sub-agents, parallel work | M4, M6 |
+| M8 | Extensibility | Hooks, skills, MCP client, MCP tool replaceability, plugins | M7 |
+| M9 | Router Advanced | Bandit core, feedback, ensemble strategies, state persistence | M7 |
+| M10 | Persistence & Serve | SQLite sessions, serve mode, coordinator | M7 |
+| M11 | Task Learning | Pattern recognition, task suggestions, persistent tasks | M9 |
+| M12 | Thinking & Structured Output | Thinking modes, schema validation | M2 |
+| M13 | Auth | OAuth PKCE, keyring, multi-account | M5 |
+| M14 | Observability | Feature flags, telemetry, cost dashboards | M10 |
+| M15 | Web UI | `gnoma web` CLI flag, browser UI via serve mode | M10 |
+
+---
+
 ## M1: Core Engine (MVP)

-**Scope:** First working assistant. CLI pipe mode. Mistral as reference provider. Bash + file tools. No TUI, no permissions, no config file.
+**Scope:** First working assistant. CLI pipe mode. Mistral as reference provider. Bash + file tools (with 7 critical security checks). No TUI, no permissions, no config file.

 **Deliverables:**

-- [ ] Architecture docs in `docs/essentials/`
+- [x] Architecture docs in `docs/essentials/`
 - [ ] Foundation types (`internal/message/`)
 - [ ] Streaming abstraction (`internal/stream/`)
 - [ ] Provider interface + Mistral adapter
-- [ ] Tool system: bash, fs.read, fs.write, fs.edit, fs.glob, fs.grep
+- [ ] Tool system: bash (with security checks), fs.read, fs.write, fs.edit, fs.glob, fs.grep
 - [ ] Engine agentic loop (stream → tool → re-query → done)
 - [ ] CLI pipe mode (`echo "list files" | gnoma`)

@@ -26,152 +48,221 @@ depends_on: [vision]

 ## M2: Multi-Provider

-**Scope:** All remaining providers. Config file. Dynamic provider switching.
+**Scope:** All remaining providers. TOML config with layered loading. Dynamic provider switching.

 **Deliverables:**

+- [ ] TOML config system (defaults → user → project → env → flags)
+- [ ] API key resolution from env vars and config
 - [ ] Anthropic provider (streaming + tool use + thinking blocks)
 - [ ] OpenAI provider (streaming + tool use)
-- [ ] Google provider (streaming + function calling)
+- [ ] Google provider (streaming + function calling, goroutine bridge)
 - [ ] OpenAI-compat for Ollama and llama.cpp
-- [ ] TOML config (global + project + env + flags)
-- [ ] `/model provider/model` switching mid-session
+- [ ] `--provider` / `--model` flag switching

-**Exit criteria:** Chat with any configured provider via CLI pipe. Switch providers mid-session.
+**Exit criteria:** `echo "hello" | gnoma --provider openai` works. All 5+ providers functional.

-## M3: TUI
+## M3: Security Firewall

-**Scope:** Interactive terminal UI. Permission system.
+**Scope:** Core security layer built into gnoma. Scans outgoing LLM requests and incoming tool results for sensitive data. Redacts or blocks. Incognito mode.

 **Deliverables:**

-- [ ] Bubble Tea TUI: chat panel, input box, streaming output
-- [ ] Status bar (provider, model, token usage)
-- [ ] Permission system (allow / deny / prompt modes)
-- [ ] Permission dialog overlay
+- [ ] Secret scanner (gitleaks-derived, 40+ regex patterns, Shannon entropy detection)
+- [ ] Unicode sanitization (NFKC + Cf/Co/Cn stripping, recursive on nested structs)
+- [ ] Redactor (replace matched groups with `[REDACTED]`, preserve context)
+- [ ] Configurable rules (regex patterns, action: redact/block/warn)
+- [ ] Remaining bash security checks (checks 8-23 from CC bashSecurity.ts)
+- [ ] Incognito mode: no persistence, no learning, no logging, optional local-only routing
+- [ ] `--incognito` CLI flag
+
+**Exit criteria:** Provider requests with embedded API keys get redacted. Incognito suppresses all persistence. Unicode attack vectors sanitized.
+
+## M4: Router Foundation
+
+**Scope:** Arm registry, limit pools, task classification, heuristic selection. Engine switches from direct provider calls to `router.Select()`.
+
+**Deliverables:**
+
+- [ ] Arm type (provider+model pair) with capability introspection
+- [ ] Limit pools (RPM, RPD, tokens/day, cost caps, custom units)
+- [ ] Pool tracker with optimistic reservation and scarcity multipliers
+- [ ] Task classifier (10 types: Boilerplate, Generation, Refactor, Review, UnitTest, Planning, Orchestration, SecurityReview, Debug, Explain)
+- [ ] Complexity scoring and value scoring
+- [ ] Heuristic arm selection (score = quality × value / effective_cost)
+- [ ] Background provider discovery (poll ollama, llama.cpp, API providers)
+- [ ] Engine integration: `router.Select()` replaces direct provider calls
+
+**Exit criteria:** Engine routes tasks through router. Limit pools track consumption. Task classification works for 10 types.
+
+## M5: TUI
+
+**Scope:** Interactive terminal UI. Full 6-mode permission system. Session management. In-app config. Incognito toggle.
+
+**Deliverables:**
+
+- [ ] Permission system with all 6 modes:
+  - `default` — prompt for each tool invocation
+  - `acceptEdits` — auto-allow file ops, prompt for bash/destructive
+  - `bypass` — allow everything
+  - `deny` — deny all unless explicit allow rule
+  - `plan` — read-only tools only
+  - `auto` — router task classification + tool risk scoring
+- [ ] Permission rules with compound bash command decomposition (via `mvdan.cc/sh` AST)
+- [ ] 7-step permission decision flow (deny gates → tool check → safety → mode → allow → passthrough → hooks)
+- [ ] Bubble Tea TUI: chat panel, input, streaming output
+- [ ] Status bar (provider, model, tokens, incognito indicator)
+- [ ] Permission prompt overlay
 - [ ] Model picker overlay
-- [ ] Input history (up/down)
+- [ ] In-app config editor (`/config` command)
+- [ ] Incognito toggle (`/incognito` command)
+- [ ] Session management (channel-based)

-**Exit criteria:** Launch TUI, chat interactively, tools execute with permission prompts.
+**Exit criteria:** Launch TUI, chat interactively, 6 permission modes work, config editable in-app, incognito toggleable.

-## M4: Context Intelligence
+## M6: Context Intelligence

-**Scope:** Long sessions. Token tracking. Compaction. Local tokenizer.
+**Scope:** Long sessions. Local tokenizer. Full compaction with both truncation and LLM summarization.

 **Deliverables:**

-- [ ] Local tokenizer for accurate token counting without provider round-trips
-- [ ] Token tracker (cumulative usage, OK/warning/critical states)
-- [ ] Truncate compaction (drop old messages, keep system + recent)
-- [ ] Summarize compaction (LLM summarizes dropped messages)
-- [ ] Compact boundaries (transaction markers for crash recovery)
-- [ ] Deferred tool loading (non-essential tools loaded on demand)
-- [ ] Result persistence (large tool outputs written to disk)
+- [ ] Local tokenizer for accurate token counting
+- [ ] Token tracker with warning states (OK / Warning / Critical)
+- [ ] TruncateStrategy: drop oldest, preserve system + recent
+- [ ] SummarizeStrategy: spawn compaction elf, LLM-powered summary, image stripping, boundary messages
+- [ ] Auto-compaction triggers (threshold-based, reactive on 413, circuit breaker after 3 failures)
+- [ ] Pre/post compact hooks
+- [ ] Tool result persistence (>50KB → disk, 2KB preview + filepath)
+- [ ] Deferred tool loading (`ShouldDefer()`, full schema on demand)
+- [ ] Post-compact restoration budget (50K total, 5K/file, 25K/skill)

-**Exit criteria:** 100+ turn conversation stays coherent within token budget. Local token counting matches provider reports within 5%.
+**Exit criteria:** 100+ turn conversation stays coherent. Summarization produces useful summaries. Token counting within 5% of provider.

-## M5: Elfs (Multi-Agent + Multi-Provider Routing)
+## M7: Elfs (Router-Integrated)

-**Scope:** Sub-agents on different providers. Parallel work. Provider routing.
+**Scope:** Sub-agents using router for provider selection. Parallel work. Feedback to router.

 **Deliverables:**

-- [ ] Elf spawning (`Engine.SpawnElf` with per-elf provider config)
-- [ ] Background elfs (independent goroutine + engine)
+- [ ] Elf interface + SyncElf + BackgroundElf implementations
+- [ ] ElfManager: spawn, monitor, cancel, collect results
+- [ ] Router-integrated spawning (`router.Select()` picks arm per elf)
 - [ ] Parent ↔ elf communication via typed channels
-- [ ] Concurrent tool execution (read-only parallel, writes sequential)
-- [ ] Provider routing rules (route by capability, cost, latency) — research needed
-- [ ] Coordinator dispatches tasks to elfs on different providers
+- [ ] Concurrent tool execution (read-only parallel via errgroup, writes serial)
+- [ ] Elf results feed back to router as quality signals
+- [ ] Coordinator mode: orchestrator dispatches to worker elfs

-**Exit criteria:** Coordinator on Claude spawns research elf on local Qwen + review elf on OpenAI, collects and synthesizes results.
+**Exit criteria:** Parent spawns 3 background elfs on different providers (chosen by router), collects and synthesizes results.

-## M6: Extensibility
+## M8: Extensibility

-**Scope:** Hooks, skills, MCP, plugin foundation.
+**Scope:** Hooks, skills, MCP client with tool replaceability, plugin system.

 **Deliverables:**

-- [ ] Hook system (PreToolUse / PostToolUse, stdin/stdout protocol)
-- [ ] Skill loading (`.gnoma/skills/*.md` with frontmatter)
-- [ ] MCP client (JSON-RPC over stdio, tool discovery)
-- [ ] Plugin foundation (manifest, install, lifecycle)
+- [ ] Hook system: PreToolUse, PostToolUse, SessionStart/End, PreCompact, Stop
+- [ ] Hook protocol: stdin JSON, stdout JSON, exit codes (0=allow, 2=deny)
+- [ ] Hook command types: command (shell), prompt (LLM), agent (spawn elf)
+- [ ] Skill loading from .gnoma/skills/, ~/.config/gnoma/skills/, bundled, plugins
+- [ ] Skill frontmatter: YAML (name, description, whenToUse, allowedTools, paths)
+- [ ] MCP client: JSON-RPC over stdio, tool discovery
+- [ ] MCP tool naming: `mcp__{server}__{tool}`
+- [ ] MCP tool replaceability: `replace_default` config swaps built-in tools
+- [ ] Plugin system: plugin.json manifest, install/enable/disable lifecycle

-**Exit criteria:** MCP server tools appear in gnoma. Skills invocable by model. Hook logs all bash commands.
+**Exit criteria:** MCP tools appear in gnoma. `replace_default` swaps built-ins. Skills invocable. Hooks fire on tool use.

-## M7: Persistence & Serve
+## M9: Router Advanced

-**Scope:** Session persistence via SQLite. Serve mode for external clients. Coordinator mode.
+**Scope:** Full bandit learning. Feedback collection. Ensemble execution strategies. State persistence.

 **Deliverables:**

-- [ ] Session persistence with SQLite (save/restore conversations across restarts)
-- [ ] Serve mode (Unix socket listener, external UI clients)
-- [ ] Coordinator mode (orchestrator dispatches to worker elfs)
+- [ ] Discounted Thompson Sampling (per-arm, per-task-type Beta distributions)
+- [ ] Feedback collection: implicit (acceptance, edit distance, escalation) + explicit
+- [ ] Delayed attribution for orchestration/planning tasks
+- [ ] Execution strategies: SingleArm, CascadeWithReview, ParallelEnsemble, MultiRoundSynthesis
+- [ ] Strategy selection as learned routing decision
+- [ ] Background arm benchmarking (TTFT, tok/s)
+- [ ] State persistence (gob, versioned schema, atomic writes, CRC32)
+- [ ] Cold start: shipped default.state with embedded priors
+- [ ] Heuristic fallback for <5 observations per arm-task pair
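
For readers unfamiliar with discounted Thompson Sampling, the following Go sketch (not part of this commit) shows one common formulation. Each arm keeps a Beta(α, β) posterior, both parameters are multiplied by a discount before every update so old evidence fades, and the arm with the highest posterior sample is chosen. The `Arm` type, the parameter floor, and the Gamma-based Beta sampler (Marsaglia-Tsang) are assumptions; the heuristic fallback for sparsely observed arms is not modelled here.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

const minParam = 0.01 // assumed floor so discounted parameters stay positive

// Arm holds the Beta posterior for one (arm, task type) pair.
type Arm struct {
	Name        string
	Alpha, Beta float64
	N           int // observations seen
}

// Update discounts the old evidence, then adds a reward in [0, 1].
func (a *Arm) Update(reward, discount float64) {
	a.Alpha = math.Max(discount*a.Alpha+reward, minParam)
	a.Beta = math.Max(discount*a.Beta+(1-reward), minParam)
	a.N++
}

// NewRNG returns a seeded generator so runs are reproducible.
func NewRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sampleGamma draws from Gamma(shape, 1) using the Marsaglia-Tsang method.
func sampleGamma(rng *rand.Rand, shape float64) float64 {
	if shape < 1 {
		// Boost: Gamma(a) = Gamma(a+1) * U^(1/a).
		return sampleGamma(rng, shape+1) * math.Pow(rng.Float64(), 1/shape)
	}
	d := shape - 1.0/3.0
	c := 1 / math.Sqrt(9*d)
	for {
		x := rng.NormFloat64()
		v := 1 + c*x
		if v <= 0 {
			continue
		}
		v = v * v * v
		u := rng.Float64()
		if u < 1-0.0331*x*x*x*x || math.Log(u) < 0.5*x*x+d*(1-v+math.Log(v)) {
			return d * v
		}
	}
}

// sampleBeta draws from Beta(alpha, beta) as X/(X+Y) with Gamma variates.
func sampleBeta(rng *rand.Rand, alpha, beta float64) float64 {
	x, y := sampleGamma(rng, alpha), sampleGamma(rng, beta)
	if x+y == 0 {
		return 0.5
	}
	return x / (x + y)
}

// Select samples every arm's posterior and returns the arm with the best draw.
func Select(rng *rand.Rand, arms []*Arm) *Arm {
	var best *Arm
	bestScore := -1.0
	for _, a := range arms {
		if s := sampleBeta(rng, a.Alpha, a.Beta); s > bestScore {
			best, bestScore = a, s
		}
	}
	return best
}

func main() {
	rng := NewRNG(42)
	arms := []*Arm{{Name: "arm-a", Alpha: 1, Beta: 1}, {Name: "arm-b", Alpha: 1, Beta: 1}}
	trueRate := map[string]float64{"arm-a": 0.8, "arm-b": 0.4} // simulated success rates
	picks := map[string]int{}
	for i := 0; i < 500; i++ {
		a := Select(rng, arms)
		reward := 0.0
		if rng.Float64() < trueRate[a.Name] {
			reward = 1
		}
		a.Update(reward, 0.98)
		picks[a.Name]++
	}
	fmt.Println(picks)
}
```

A discount close to 1 forgets slowly; smaller values track drifting arm quality faster but keep the posterior wider.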

-**Exit criteria:** Resume yesterday's conversation. VS Code extension connects via serve mode. Coordinator parallelizes subtasks.
+**Exit criteria:** Bandit converges after ~50 observations. Ensemble outperforms single-arm on complex tasks. State persists across restarts.
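
The state persistence deliverable (gob, versioned schema, atomic writes, CRC32) could be approached along these lines. This is a sketch outside the commit, and the file layout (4-byte version, 4-byte CRC32, gob payload) and the `RouterState` fields are assumptions, not the project's actual format. Atomicity comes from writing to a temporary file in the same directory and renaming it over the target.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"errors"
	"fmt"
	"hash/crc32"
	"os"
	"path/filepath"
)

const schemaVersion uint32 = 1

// RouterState is a hypothetical snapshot of per-arm posteriors.
type RouterState struct {
	Alpha map[string]float64
	Beta  map[string]float64
}

// Save writes header + gob payload to a temp file, syncs, then renames.
func Save(path string, st RouterState) error {
	var payload bytes.Buffer
	if err := gob.NewEncoder(&payload).Encode(st); err != nil {
		return err
	}
	header := make([]byte, 8)
	binary.LittleEndian.PutUint32(header[0:4], schemaVersion)
	binary.LittleEndian.PutUint32(header[4:8], crc32.ChecksumIEEE(payload.Bytes()))

	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(append(header, payload.Bytes()...)); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// Load verifies the version and checksum before decoding.
func Load(path string) (RouterState, error) {
	var st RouterState
	raw, err := os.ReadFile(path)
	if err != nil {
		return st, err
	}
	if len(raw) < 8 {
		return st, errors.New("state file too short")
	}
	if v := binary.LittleEndian.Uint32(raw[0:4]); v != schemaVersion {
		return st, fmt.Errorf("unsupported schema version %d", v)
	}
	payload := raw[8:]
	if crc32.ChecksumIEEE(payload) != binary.LittleEndian.Uint32(raw[4:8]) {
		return st, errors.New("checksum mismatch")
	}
	err = gob.NewDecoder(bytes.NewReader(payload)).Decode(&st)
	return st, err
}

// RoundTrip saves and reloads a state in a throwaway directory.
func RoundTrip(st RouterState) (RouterState, error) {
	dir, err := os.MkdirTemp("", "router-state")
	if err != nil {
		return RouterState{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "router.state")
	if err := Save(path, st); err != nil {
		return RouterState{}, err
	}
	return Load(path)
}

func main() {
	in := RouterState{
		Alpha: map[string]float64{"arm-a/codegen": 3.5},
		Beta:  map[string]float64{"arm-a/codegen": 1.5},
	}
	out, err := RoundTrip(in)
	fmt.Println(out, err)
}
```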

-## M8: Thinking & Structured Output
+## M10: Persistence & Serve

-**Scope:** Extended thinking support across providers. Schema-validated structured output.
+**Scope:** SQLite session persistence. Serve mode. Coordinator mode.
+
+**Deliverables:**
+
+- [ ] SQLite session storage (messages, parentUuid chain, tombstones)
+- [ ] Session memory: background elf extracts notes from conversation
+- [ ] Incognito enforcement: sessions NOT persisted
+- [ ] Serve mode: Unix socket listener, spawn session goroutine per client
+- [ ] Coordinator mode: orchestrator dispatches to restricted worker elfs
+
+**Exit criteria:** Resume yesterday's conversation. External client connects via serve mode.
+
+## M11: Task Learning
+
+**Scope:** Detect recurring task patterns. Suggest persistent tasks. Refinement loop.
+
+**Deliverables:**
+
+- [ ] Pattern detector: observe turn sequences, identify repeats (≥3 times)
+- [ ] Task suggestion UX: prompt user to save as persistent task
+- [ ] Persistent task definitions: parameterized sequences, stored in .gnoma/tasks/ or ~/.config/gnoma/tasks/
+- [ ] `/task <name> [args]` execution command
+- [ ] Router feedback integration: learn which arm works best per task step
+- [ ] Task refinement: re-split tasks, measure improvement
+
+**Exit criteria:** gnoma suggests a persistent task after 3+ repetitions. `/task release v1.2.0` executes a saved workflow.
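
One simple reading of the pattern detector (observe turn sequences, flag anything seen at least 3 times) is a sliding-window count over step labels. The Go sketch below is not part of this commit and is only one possible approach: step labels, the fixed window size, and the `FindRepeats` name are assumptions, and it does no parameter extraction.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const minRepeats = 3 // threshold taken from the deliverable above

// FindRepeats returns every window of `size` consecutive steps that occurs
// at least minRepeats times across all sessions, sorted for stable output.
func FindRepeats(sessions [][]string, size int) []string {
	if size < 1 {
		return nil
	}
	counts := map[string]int{}
	for _, steps := range sessions {
		for i := 0; i+size <= len(steps); i++ {
			key := strings.Join(steps[i:i+size], " > ")
			counts[key]++
		}
	}
	var out []string
	for key, n := range counts {
		if n >= minRepeats {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	sessions := [][]string{
		{"edit", "test", "tag", "push"},
		{"test", "tag", "push"},
		{"test", "tag", "push", "notify"},
	}
	fmt.Println(FindRepeats(sessions, 3))
}
```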
+
+## M12: Thinking & Structured Output

 **Deliverables:**

 - [ ] Thinking mode (disabled / enabled with budget / adaptive)
-- [ ] Thinking block streaming and display in TUI
+- [ ] Thinking block streaming and TUI display
 - [ ] Structured output with JSON schema validation
 - [ ] Retry logic for schema validation failures

-**Exit criteria:** Extended thinking with budget works on Anthropic. Structured output validates against schema on all providers that support it.
-
-## M9: Auth
-
-**Scope:** OAuth 2.0 + PKCE for cloud providers. Credential management.
+## M13: Auth

 **Deliverables:**

-- [ ] OAuth 2.0 + PKCE flow (browser redirect → callback → token exchange)
-- [ ] Token refresh (proactive, before expiry)
-- [ ] OS keyring integration for secure credential storage
+- [ ] OAuth 2.0 + PKCE flow (browser → callback → token exchange)
+- [ ] Proactive token refresh (before expiry)
+- [ ] OS keyring integration for credential storage
 - [ ] Multi-account support per provider
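
The PKCE part of the OAuth flow reduces to two values: a random code verifier and its S256 challenge (the base64url-encoded SHA-256 of the verifier, without padding). A minimal Go sketch, not part of this commit, with hypothetical function names:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// NewVerifier returns a 43-character code verifier built from 32 random bytes.
func NewVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// Challenge derives the S256 code challenge for a verifier.
func Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := NewVerifier()
	if err != nil {
		panic(err)
	}
	// The challenge goes in the authorization request; the verifier is sent
	// later with the token exchange.
	fmt.Println("code_challenge_method=S256")
	fmt.Println("code_challenge=" + Challenge(verifier))
}
```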

-**Exit criteria:** `gnoma login anthropic` opens browser, completes OAuth flow, stores token in keyring. Automatic refresh works.
-
-## M10: Observability
-
-**Scope:** Feature flags. Opt-in telemetry and analytics.
+## M14: Observability

 **Deliverables:**

-- [ ] Feature flag system (local config + optional remote evaluation)
+- [ ] Feature flag system (local config + optional remote)
 - [ ] Opt-in analytics (event queue, local-only by default)
 - [ ] Usage dashboards (token spend, provider usage, tool frequency)
 - [ ] Cost tracking per provider/model

-**Exit criteria:** Feature flags gate experimental features. User can view their token spend breakdown. Analytics disabled by default.
-
-## M11: Web UI
-
-**Scope:** Browser-based UI as alternative to TUI. Requires serve mode (M7).
+## M15: Web UI

 **Deliverables:**

-- [ ] `gnoma web` CLI subcommand (or `gnoma --web`) starts local web server
-- [ ] Web UI connects to serve mode backend
+- [ ] `gnoma web` CLI subcommand starts local web server
+- [ ] Connects to serve mode backend (M10 prerequisite)
 - [ ] Chat interface with streaming, tool output, permission prompts
-- [ ] Responsive design for desktop browsers
-
-**Exit criteria:** `gnoma web` opens browser, full chat with streaming and tool execution. Serve mode required as prerequisite.

 ## Future

-Ideas not yet committed:
-
 - Voice input/output via provider audio APIs
 - Collaborative sessions (multiple humans + elfs)
 - Plugin marketplace
 - Remote agent execution
+- Federated learning for router priors (opt-in, anonymized)

 ## Changelog

-- 2026-04-02: Initial version — M1-M6
-- 2026-04-02: Split M2 into providers (M2) and TUI (M3). Added M8-M11 for thinking, auth, observability, web UI. Local tokenizer in M4. SQLite for session persistence in M7.
+- 2026-04-02: Initial version (M1-M11)
+- 2026-04-03: Restructured to M1-M15. Split providers/TUI. Added Security (M3), Router Foundation (M4), Router Advanced (M9), Task Learning (M11). Full 6 permission modes. Full compaction. CC pattern integration.
--- a/docs/essentials/risks.md
+++ b/docs/essentials/risks.md
@@ -19,16 +19,24 @@ depends_on: []
 | R-007 | Multi-provider routing complexity — coordinating elfs on different providers with different capabilities | High | Design routing interface early (M4), start simple (manual provider assignment), add rules incrementally | Open |
 | R-008 | Context compaction coherence — summarization may lose critical details | Medium | Truncation as safe default, summarization opt-in, compact boundaries for recovery | Open |
 | R-009 | Permission prompt UX in pipe mode — no TUI for interactive prompts | Low | Default to `allow` or `deny` in pipe mode, require explicit flag | Open |
+| R-010 | Router complexity — bandit tuning, cold start problem | High | Ship default.state with embedded priors, heuristic fallback for <5 observations | Open |
+| R-011 | Security false positives — blocking legitimate content | Medium | Warn-first mode, user override per-pattern, configurable sensitivity | Open |
+| R-012 | Feedback attribution — delayed/noisy signals for orchestration tasks | Medium | Neutral default for missing signals, ensemble contribution rank as strong signal | Open |
+| R-013 | Task learning privacy — pattern data persistence | Low | Patterns stored locally only, cleared in incognito mode | Open |
+| R-014 | Ensemble synthesis quality — depends heavily on synthesis prompt | Medium | Invest in prompt engineering, A/B test with polisher arm | Open |
+| R-015 | Shell parser dependency — `mvdan.cc/sh` for compound command decomposition | Low | Well-maintained Go package, fallback to regex-based decomposition if needed | Open |

 ## Open Questions

 - [ ] How should routing rules be expressed in config? Per-task rules, model capability tags, cost-based? — needs research before M5
 - [ ] Which local tokenizer library to use? (tiktoken port, sentencepiece, or provider-specific)
-- [ ] Serve mode protocol — choose what fits best when implementing M7
-- [x] ~~Should gnoma embed a tokenizer?~~ → Yes, include local tokenizer (M4)
-- [x] ~~Session persistence format?~~ → SQLite (M7)
+- [ ] Serve mode protocol — choose what fits best when implementing M10
+- [ ] What automated quality evaluation to use for router feedback? (compile check, linter, self-consistency, small local judge model)
+- [x] ~~Should gnoma embed a tokenizer?~~ → Yes, include local tokenizer (M6)
+- [x] ~~Session persistence format?~~ → SQLite (M10)
 - [x] ~~Mistral SDK as long-term reference?~~ → Yes for now, revisit after M2

 ## Changelog

 - 2026-04-02: Initial version
+- 2026-04-03: Added R-010 through R-015 for router, security, feedback, task learning, ensemble synthesis quality, shell parser