From c556d3172feed65f0214f7882212dc5bbf02fde9 Mon Sep 17 00:00:00 2001 From: vikingowl Date: Sun, 5 Apr 2026 21:22:26 +0200 Subject: [PATCH] =?UTF-8?q?docs:=20M6/M7=20close-out=20design=20spec=20?= =?UTF-8?q?=E2=80=94=20tool=20persistence,=20tokenizer,=20router=20feedbac?= =?UTF-8?q?k,=20coordinator?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../specs/2026-04-05-m6-m7-closeout-design.md | 296 ++++++++++++++++++ 1 file changed, 296 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-05-m6-m7-closeout-design.md diff --git a/docs/superpowers/specs/2026-04-05-m6-m7-closeout-design.md b/docs/superpowers/specs/2026-04-05-m6-m7-closeout-design.md new file mode 100644 index 0000000..786b54b --- /dev/null +++ b/docs/superpowers/specs/2026-04-05-m6-m7-closeout-design.md @@ -0,0 +1,296 @@ +# M6/M7 Close-out: Tool Persistence, Tokenizer, Router Feedback, Coordinator Mode + +**Date:** 2026-04-05 +**Milestones:** M6 (Context Intelligence), M7 (Elfs) +**Status:** Approved + +--- + +## Context + +Gnoma's M6/M7 gap audit left four items unfinished: + +1. **Tool result persistence** — currently only fires for results >50K chars and writes to `.gnoma/sessions/`. The vision is every meaningful result (>1KB) persisted to `/tmp` so tools can share state across a session. +2. **Local tokenizer** — token counting uses a `len/4` heuristic. This causes compaction to fire too early or too late and makes context window sizing inaccurate. +3. **Router feedback** — `ReportOutcome` is a `slog.Debug` stub. Elf success/failure signals are captured but never used to influence arm selection. +4. **Coordinator mode** — completely unimplemented. Needed to close M7 and unblock M8/M10 coordinator work. 
The `/tmp` tool result files become the shared artifact layer connecting all four: elfs write results to shared files, the coordinator discovers them via `list_results`, and the router uses elf outcomes (with result file references) for quality tracking.

---

## 1. Tool Result Persistence

### What changes

**New package:** `internal/tool/persist`

```
persist/
  store.go -- Store type, session dir management
  store_test.go
```

**`Store` type:**

```go
type Store struct {
	dir string // /tmp/gnoma-<session-id>/tool-results
}

func New(sessionID string) *Store
func (s *Store) Save(toolName, callID, content string) (path string, persisted bool)
func (s *Store) List(filter string) ([]ResultFile, error) // filter = glob on tool name prefix
func (s *Store) Read(path string) (string, error)         // validates path is within session dir
```

```go
type ResultFile struct {
	Path     string
	ToolName string
	CallID   string
	Size     int64
	ModTime  time.Time
}
```

**Threshold:** `len(content) >= 1024` bytes. Below this, `Save` returns `("", false)` — no file written.

**File naming:** `/tmp/gnoma-<session-id>/tool-results/<tool-name>-<call-id>.txt`
- Example: `/tmp/gnoma-20260405-150405-abc123/tool-results/bash-toolu_01AbCd.txt`

**Session ID:** `<yyyymmdd-hhmmss>-<6 random hex chars>` — generated once at engine startup, passed into `persist.New()`.

**Inline context replacement** (what the LLM sees instead of the full result):
```
[Tool result saved: /tmp/gnoma-<session-id>/tool-results/<tool-name>-<call-id>.txt]

Preview (first 2000 chars):
<first 2000 chars of result>
```

**Engine integration:**
- `engine.Engine` gains a `store *persist.Store` field
- `executeSingleTool` in `loop.go` replaces the `PersistLargeResult` call with `store.Save()`
- The existing `PersistLargeResult` function in `internal/context/persist.go` is retired (deleted)
- `elf.Manager` receives the `Store` and passes it to each elf's engine config

**Cleanup:** No explicit cleanup. 
`/tmp/gnoma-*` dirs are session-scoped; OS garbage-collects `/tmp` on reboot or via tmpwatch/systemd-tmpfiles.

### Key constraint

`Store.Read()` must canonicalize the requested path (`filepath.Clean`) and then verify the result is prefixed with the session's tool-results dir; a raw prefix check alone would let `..` segments escape. This prevents `read_result` from being used to traverse arbitrary filesystem paths.

---

## 2. Local Tokenizer (tiktoken-go)

### What changes

**New dependency:** `github.com/pkoukk/tiktoken-go`

**New package:** `internal/tokenizer`

```
tokenizer/
  tokenizer.go
  tokenizer_test.go
```

**`Tokenizer` type:**

```go
type Tokenizer struct {
	enc      *tiktoken.Tiktoken // nil until first use
	encoding string             // e.g. "cl100k_base"
	mu       sync.Mutex
}

func New(encoding string) *Tokenizer
func ForProvider(providerName string) *Tokenizer
func (t *Tokenizer) Count(text string) int
```

**Provider → encoding mapping:**

| Provider | Encoding |
|----------|----------|
| `anthropic` | `cl100k_base` |
| `openai` | `cl100k_base` |
| `mistral` | `o200k_base` |
| `google` | `o200k_base` |
| `ollama` | `o200k_base` |
| `llamacpp` | `o200k_base` |
| (unknown) | `cl100k_base` (fallback) |

**Lazy loading:** Encoding is loaded on first `Count()` call inside a `sync.Once` equivalent (`sync.Mutex` guard, check-and-initialize). Encoding files are ~2MB per vocab; tiktoken-go fetches them at runtime and caches them locally (cache dir configurable via `TIKTOKEN_CACHE_DIR`), so first use may hit the network unless an offline BPE loader is wired in.

**Fallback:** If tiktoken initialization fails (e.g., unsupported encoding, memory pressure), `Count()` falls back to `len(text)/4` and logs a `slog.Warn` once via `sync.Once`. 
+ +### Context tracker changes + +- `context.Tracker` gains a `tokenizer *tokenizer.Tokenizer` field (optional; nil → heuristic) +- `EstimateTokens(text)` replaced by `CountTokens(tok *Tokenizer, text string)` — uses tokenizer if non-nil, else heuristic +- `EstimateMessages` renamed `CountMessages`, same pattern +- Tracker initialized with tokenizer in `main.go`: `tokenizer.ForProvider(cfg.Provider.Name)` +- **Context window size fix:** `MaxTokens` set from `arm.Capabilities.ContextWindow` instead of `cfg.Provider.MaxTokens * 20`. This field is already populated for all providers. +- **Prefix token counting:** Prefix messages are counted at load time and added to the tracker's initial baseline so they're visible to compaction logic. + +--- + +## 3. Router Feedback (Heuristic Quality Tracking) + +### What changes + +**New file:** `internal/router/feedback.go` + +```go +type QualityTracker struct { + mu sync.RWMutex + scores map[string]map[TaskType]*EMAScore // armID -> taskType -> score +} + +type EMAScore struct { + Value float64 + Count int +} + +const qualityAlpha = 0.3 +const minObservations = 3 // below this, fall back to heuristic-only + +func NewQualityTracker() *QualityTracker +func (qt *QualityTracker) Record(armID string, taskType TaskType, success bool) +func (qt *QualityTracker) Quality(armID string, taskType TaskType) (score float64, hasData bool) +``` + +`Record`: +- Maps `success` to observation: `1.0` (success) or `0.0` (failure) +- EMA update: `score.Value = qualityAlpha*observation + (1-qualityAlpha)*score.Value` +- Increments `Count` + +`Quality`: +- Returns `(0, false)` when `Count < minObservations` +- Returns `(score.Value, true)` otherwise + +### Outcome struct extension + +`router.Outcome` gains one field: +```go +type Outcome struct { + ArmID string + TaskType TaskType + Success bool + Tokens int + Duration time.Duration + ResultFilePaths []string // NEW: paths to /tmp tool result files (for future M9 analysis) +} +``` + +The 
`ResultFilePaths` field is populated by the `agent`/`spawn_elfs` tools: snapshot `store.List()` before spawning the elf, snapshot again after `Wait()` returns, then diff — files present in the post-snapshot but not the pre-snapshot are attributed to that elf's run.

### Router integration

- `Router` gains a `quality *QualityTracker` field, initialized in `New()`
- `ReportOutcome` calls `qt.Record(o.ArmID, o.TaskType, o.Success)` (replaces the `slog.Debug` stub)
- `scoreArm()` updated to blend observed and heuristic quality:
  ```go
  hq := heuristicQuality(arm, task)
  if observed, hasData := r.quality.Quality(arm.ID, task.Type); hasData {
      quality = 0.7*observed + 0.3*hq
  } else {
      quality = hq
  }
  ```

### What this does NOT include (M9)

- No Thompson Sampling / Beta distributions
- No state persistence across restarts
- No delayed attribution for orchestration tasks
- No implicit feedback (edit distance, escalation signals)

---

## 4. Coordinator Mode

### What changes

**New tools in `internal/tool/agent/`:**

`list_results.go` — `ListResultsTool` (name: `list_results`):
```go
// Parameters: filter string (optional, glob on tool name prefix, e.g. "bash*")
// Returns: formatted list of result files in the session:
//   /tmp/gnoma-<session-id>/tool-results/bash-toolu_abc.txt [bash, 4.2KB, 15:04:05]
//   /tmp/gnoma-<session-id>/tool-results/fs.grep-toolu_def.txt [fs.grep, 1.1KB, 15:04:12]
// IsReadOnly: true, IsDestructive: false
```

`read_result.go` — `ReadResultTool` (name: `read_result`):
```go
// Parameters: path string (required)
// Validates: path must be prefixed with store.Dir() — no path traversal
// Returns: full file content
// IsReadOnly: true, IsDestructive: false
```

Both tools receive the `*persist.Store` as a constructor argument. 
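The path validation `read_result` relies on can be sketched as follows. The `withinStore` helper name is illustrative, not part of the spec's API; the point is that cleaning the candidate path before the prefix comparison is what actually blocks `..` traversal.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinStore reports whether candidate resolves inside storeDir.
// filepath.Clean collapses ".." segments first, so a traversal attempt
// like dir+"/../../etc/passwd" fails the prefix check afterward.
func withinStore(storeDir, candidate string) bool {
	clean := filepath.Clean(candidate)
	return clean == storeDir ||
		strings.HasPrefix(clean, storeDir+string(filepath.Separator))
}

func main() {
	dir := "/tmp/gnoma-20260405-150405-abc123/tool-results"
	fmt.Println(withinStore(dir, dir+"/bash-toolu_abc.txt"))  // true
	fmt.Println(withinStore(dir, dir+"/../../../etc/passwd")) // false: cleans to /etc/passwd
}
```

Comparing against `storeDir` plus a trailing separator also rejects sibling directories such as `tool-results-evil/`, which a bare string prefix check would accept.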
+ +**Coordinator system prompt injection** in `internal/engine/loop.go`: + +When `router.ClassifyTask()` returns `TaskOrchestration`, the engine prepends a coordinator block to the request's system prompt: + +``` +You are operating in coordinator mode. Your role is to decompose complex work into parallel tasks and orchestrate elfs. + +Rules: +- Use `spawn_elfs` to dispatch N tasks in parallel when they don't share write state. +- Use `list_results` to discover outputs produced by prior tool calls in this session. +- Pass result file paths to elfs in their prompts so they can read prior outputs with `read_result` or `fs.read`. +- Writes are serial: if two elfs would write the same file, sequence them. +- Synthesize elf outputs into a coherent final answer. +``` + +This prompt injection is conditional: only fires when `ClassifyTask(latestUserMessage).Type == TaskOrchestration`. It does not create a new engine mode. + +**Tool registration** in `main.go`: +```go +reg.Register(agent.NewListResultsTool(store)) +reg.Register(agent.NewReadResultTool(store)) +``` + +--- + +## File Map + +| File | Action | +|------|--------| +| `internal/tool/persist/store.go` | New | +| `internal/tool/persist/store_test.go` | New | +| `internal/tokenizer/tokenizer.go` | New | +| `internal/tokenizer/tokenizer_test.go` | New | +| `internal/router/feedback.go` | New | +| `internal/router/feedback_test.go` | New | +| `internal/tool/agent/list_results.go` | New | +| `internal/tool/agent/read_result.go` | New | +| `internal/engine/engine.go` | Modify: add `store`, `tokenizer` fields to `Config` | +| `internal/engine/loop.go` | Modify: replace `PersistLargeResult`, add coordinator prompt injection | +| `internal/context/tracker.go` | Modify: accept `*tokenizer.Tokenizer`, update `EstimateTokens` | +| `internal/context/window.go` | Modify: use `CountMessages`, fix `MaxTokens` derivation | +| `internal/context/persist.go` | Delete: retire `PersistLargeResult` / `TruncateToolResult` | +| 
`internal/router/router.go` | Modify: add `QualityTracker`, wire `ReportOutcome` |
| `internal/router/selector.go` | Modify: blend observed quality into `scoreArm()` |
| `internal/router/arm.go` | Modify: extend `Outcome` with `ResultFilePaths` |
| `internal/elf/manager.go` | Modify: accept and forward `*persist.Store` to elf engines |
| `cmd/gnoma/main.go` | Modify: init `Store`, `Tokenizer`, register new tools |
| `go.mod` | Modify: add `github.com/pkoukk/tiktoken-go` |

---

## Verification

1. **Persistence:** Run `echo "list all go files" | gnoma --provider anthropic` and check `/tmp/gnoma-*/tool-results/` for result files. Verify small results (<1KB) are absent, large ones present with preview in conversation.
2. **Tokenizer:** Set breakpoints or add `slog.Debug` in `Count()` to confirm tiktoken is invoked. Check that context window percentage in TUI tracks accurately against provider-reported token counts.
3. **Router feedback:** Spawn 5 elfs, mix of successes and failures. Check that `scoreArm()` values differ from pure heuristic via a debug log or test. Run `go test ./internal/router/...`.
4. **Coordinator:** Send a prompt containing "orchestrate" / "coordinate" to the TUI. Verify coordinator system prompt appears in the request (add a debug log or check via provider trace). Run a multi-elf workflow where elf B references elf A's `/tmp` output.
5. **Tests:** `make test` must pass. New packages have unit tests covering `Store`, `Tokenizer`, `QualityTracker`, and the two new tools.