diff --git a/docs/superpowers/plans/2026-04-05-m6-m7-closeout.md b/docs/superpowers/plans/2026-04-05-m6-m7-closeout.md
new file mode 100644
index 0000000..acea089
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-05-m6-m7-closeout.md
@@ -0,0 +1,1793 @@
+# M6/M7 Close-out Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Close the M6/M7 gaps: persist every meaningful tool result to `/tmp`, add an accurate tiktoken tokenizer, wire router quality feedback from elf outcomes, and implement coordinator mode (prompt injection + artifact-discovery tools).
+
+**Architecture:** A new `internal/tool/persist.Store` becomes the shared artifact layer for the session — all tool results ≥1KB land in `/tmp/gnoma-<session-id>/tool-results/`. A new `internal/tokenizer.Tokenizer` (tiktoken-go) replaces the `len/4` heuristic in the context tracker. The router's `ReportOutcome` stub is wired to an EMA-based `QualityTracker` that feeds back into `scoreArm()`. Two new read-only tools (`list_results`, `read_result`) plus a conditional system-prompt injection close coordinator mode.
+
+**Tech Stack:** Go 1.26, `github.com/pkoukk/tiktoken-go` (new dep), existing `internal/router`, `internal/elf`, `internal/engine`, `internal/context` packages.
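The artifact layout described above can be sketched standalone. This is illustrative only; `artifactPath` is a hypothetical helper mirroring the scheme Task 1 implements (session-scoped `/tmp` directory, tool names sanitized before forming the filename), not project code:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// artifactPath mirrors the naming scheme the persist.Store will use:
// results live under /tmp/gnoma-<sessionID>/tool-results/, and dots or
// slashes in tool names become underscores in the filename.
func artifactPath(sessionID, toolName, callID string) string {
	dir := filepath.Join("/tmp", "gnoma-"+sessionID, "tool-results")
	safe := strings.NewReplacer(".", "_", "/", "_").Replace(toolName)
	return filepath.Join(dir, safe+"-"+callID+".txt")
}

func main() {
	fmt.Println(artifactPath("20260405-abc123", "fs.grep", "toolu_01"))
	// /tmp/gnoma-20260405-abc123/tool-results/fs_grep-toolu_01.txt
}
```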
+ +--- + +## File Map + +| File | Action | +|------|--------| +| `internal/tool/persist/store.go` | **Create** | +| `internal/tool/persist/store_test.go` | **Create** | +| `internal/tokenizer/tokenizer.go` | **Create** | +| `internal/tokenizer/tokenizer_test.go` | **Create** | +| `internal/router/feedback.go` | **Create** | +| `internal/router/feedback_test.go` | **Create** | +| `internal/tool/agent/list_results.go` | **Create** | +| `internal/tool/agent/read_result.go` | **Create** | +| `internal/context/persist.go` | **Delete** | +| `internal/engine/engine.go` | Modify: add `Store` field to `Config` | +| `internal/engine/loop.go` | Modify: replace `PersistLargeResult`, add coordinator prompt injection, use `Tracker.CountTokens` | +| `internal/context/tracker.go` | Modify: add `tokenizer` field, `CountTokens`/`CountMessages` methods | +| `internal/context/window.go` | Modify: `doCompact` uses `Tracker.CountMessages` | +| `internal/router/router.go` | Modify: add `quality` field, extend `Outcome`, wire `ReportOutcome`, add `LookupArm` | +| `internal/router/selector.go` | Modify: blend observed quality in `scoreArm` | +| `internal/elf/manager.go` | Modify: add `Store` to `ManagerConfig`, pass to elf engines | +| `internal/tool/agent/agent.go` | Modify: add `store` field, snapshot for `ResultFilePaths` | +| `internal/tool/agent/batch.go` | Modify: add `store` field, snapshot for `ResultFilePaths` | +| `cmd/gnoma/main.go` | Modify: init `Store`+`Tokenizer`, fix context window, prefix baseline, register tools | +| `go.mod` | Modify: add `github.com/pkoukk/tiktoken-go` | + +--- + +## Task 1: `persist.Store` Package + +**Files:** +- Create: `internal/tool/persist/store.go` +- Create: `internal/tool/persist/store_test.go` + +- [ ] **Step 1: Write the failing tests** + +```go +// internal/tool/persist/store_test.go +package persist_test + +import ( + "os" + "path/filepath" + "strings" + "testing" + + "somegit.dev/Owlibou/gnoma/internal/tool/persist" +) + +func 
TestStore_SaveSkipsSmallContent(t *testing.T) {
+	s := persist.New("test-session-001")
+	t.Cleanup(func() { os.RemoveAll(s.Dir()) })
+
+	path, ok := s.Save("bash", "call-001", "small output")
+	if ok {
+		t.Errorf("expected not persisted, got path %q", path)
+	}
+	if path != "" {
+		t.Errorf("expected empty path for small content")
+	}
+}
+
+func TestStore_SavePersistsLargeContent(t *testing.T) {
+	s := persist.New("test-session-002")
+	t.Cleanup(func() { os.RemoveAll(s.Dir()) })
+
+	content := strings.Repeat("x", 1024)
+	path, ok := s.Save("fs.grep", "call-002", content)
+	if !ok {
+		t.Fatal("expected content to be persisted")
+	}
+	// Save sanitizes dots in tool names, so "fs.grep" is stored as "fs_grep"
+	if !strings.HasSuffix(path, "fs_grep-call-002.txt") {
+		t.Errorf("unexpected path: %q", path)
+	}
+	got, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("file not written: %v", err)
+	}
+	if string(got) != content {
+		t.Error("file content mismatch")
+	}
+}
+
+func TestStore_ListFilters(t *testing.T) {
+	s := persist.New("test-session-003")
+	t.Cleanup(func() { os.RemoveAll(s.Dir()) })
+
+	bigContent := strings.Repeat("y", 1024)
+	s.Save("bash", "c1", bigContent)
+	s.Save("fs.read", "c2", bigContent)
+	s.Save("bash", "c3", bigContent)
+
+	all, err := s.List("")
+	if err != nil {
+		t.Fatal(err)
+	}
+	if len(all) != 3 {
+		t.Errorf("want 3 results, got %d", len(all))
+	}
+
+	filtered, err := s.List("bash")
+	if err != nil {
+		t.Fatal(err)
+	}
+	if len(filtered) != 2 {
+		t.Errorf("want 2 bash results, got %d", len(filtered))
+	}
+}
+
+func TestStore_ReadValidatesPath(t *testing.T) {
+	s := persist.New("test-session-004")
+	t.Cleanup(func() { os.RemoveAll(s.Dir()) })
+
+	// Path outside session dir must be rejected
+	_, err := s.Read("/etc/passwd")
+	if err == nil {
+		t.Error("expected error for path outside session dir")
+	}
+
+	// Valid path (even if file doesn't exist) should pass validation
+	_, err = s.Read(filepath.Join(s.Dir(), "bash-call.txt"))
+	// os.ErrNotExist is fine — path was valid
+	if err != nil && !os.IsNotExist(err) {
+ t.Errorf("unexpected error for valid path: %v", err) + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/tool/persist/... +``` +Expected: compile error — package does not exist yet. + +- [ ] **Step 3: Implement `store.go`** + +```go +// internal/tool/persist/store.go +package persist + +import ( + "fmt" + "os" + "path/filepath" + "strings" + "time" +) + +const ( + minPersistSize = 1024 // bytes: results smaller than this are not persisted + previewSize = 2000 // chars shown inline in the LLM context +) + +// ResultFile describes a persisted tool result. +type ResultFile struct { + Path string + ToolName string + CallID string + Size int64 + ModTime time.Time +} + +// Store persists tool results to /tmp for cross-tool session sharing. +type Store struct { + dir string // /tmp/gnoma-/tool-results +} + +// New creates a Store for the given session ID. +// The directory is created on first Save. +func New(sessionID string) *Store { + return &Store{ + dir: filepath.Join("/tmp", "gnoma-"+sessionID, "tool-results"), + } +} + +// Dir returns the absolute path to the tool-results directory. +func (s *Store) Dir() string { return s.dir } + +// Save writes content to disk if len(content) >= minPersistSize. +// Returns (filePath, true) on persistence, ("", false) if content is too small. +func (s *Store) Save(toolName, callID, content string) (string, bool) { + if len(content) < minPersistSize { + return "", false + } + if err := os.MkdirAll(s.dir, 0o755); err != nil { + return "", false + } + // Sanitize tool name for filesystem (replace dots and slashes) + safeName := strings.NewReplacer(".", "_", "/", "_").Replace(toolName) + filename := safeName + "-" + callID + ".txt" + path := filepath.Join(s.dir, filename) + if err := os.WriteFile(path, []byte(content), 0o644); err != nil { + return "", false + } + return path, true +} + +// InlineReplacement returns the string that replaces the full content in LLM context. 
+func InlineReplacement(path, content string) string {
+	preview := content
+	if len([]rune(preview)) > previewSize {
+		preview = string([]rune(preview)[:previewSize])
+	}
+	return fmt.Sprintf("[Tool result saved: %s]\n\nPreview (first %d chars):\n%s",
+		path, previewSize, preview)
+}
+
+// List returns all persisted results, optionally filtered by tool name prefix.
+// An empty filter returns all results.
+func (s *Store) List(toolNameFilter string) ([]ResultFile, error) {
+	entries, err := os.ReadDir(s.dir)
+	if os.IsNotExist(err) {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, err
+	}
+	var results []ResultFile
+	for _, e := range entries {
+		if e.IsDir() || !strings.HasSuffix(e.Name(), ".txt") {
+			continue
+		}
+		toolName, callID, ok := parseFilename(e.Name())
+		if !ok {
+			continue
+		}
+		if toolNameFilter != "" && !strings.HasPrefix(toolName, toolNameFilter) {
+			continue
+		}
+		info, err := e.Info()
+		if err != nil {
+			continue
+		}
+		results = append(results, ResultFile{
+			Path:     filepath.Join(s.dir, e.Name()),
+			ToolName: toolName,
+			CallID:   callID,
+			Size:     info.Size(),
+			ModTime:  info.ModTime(),
+		})
+	}
+	return results, nil
+}
+
+// Read returns the content of a persisted result file.
+// Returns an error if path is outside the session's tool-results directory.
+func (s *Store) Read(path string) (string, error) {
+	abs, err := filepath.Abs(path)
+	if err != nil {
+		return "", fmt.Errorf("persist: invalid path: %w", err)
+	}
+	if !strings.HasPrefix(abs, s.dir+string(filepath.Separator)) && abs != s.dir {
+		return "", fmt.Errorf("persist: path %q is outside session directory", path)
+	}
+	data, err := os.ReadFile(abs)
+	if err != nil {
+		return "", err
+	}
+	return string(data), nil
+}
+
+// parseFilename extracts toolName and callID from "<toolName>-<callID>.txt".
+// Tool names may contain underscores (dots were replaced at save time).
+// callID starts with "toolu_" for Anthropic or similar patterns for other providers.
+// We split on the last "-" that precedes what looks like a call ID. +func parseFilename(name string) (toolName, callID string, ok bool) { + name = strings.TrimSuffix(name, ".txt") + // Find the boundary: last segment that starts with a typical call-ID prefix, + // or simply split at the first dash that follows the tool name. + // Robust approach: the call ID is everything after the last "-toolu_" or "-call-". + for _, sep := range []string{"-toolu_", "-call-", "-tool_"} { + if idx := strings.LastIndex(name, sep); idx > 0 { + return name[:idx], name[idx+1:], true + } + } + // Fallback: split on first dash + if idx := strings.Index(name, "-"); idx > 0 { + return name[:idx], name[idx+1:], true + } + return name, "", true +} +``` + +- [ ] **Step 4: Run tests to confirm pass** + +``` +go test ./internal/tool/persist/... +``` +Expected: `ok somegit.dev/Owlibou/gnoma/internal/tool/persist` + +- [ ] **Step 5: Commit** + +``` +git add internal/tool/persist/ +git commit -m "feat: persist.Store — session-scoped /tmp tool result persistence" +``` + +--- + +## Task 2: tiktoken-go Tokenizer Package + +**Files:** +- Create: `internal/tokenizer/tokenizer.go` +- Create: `internal/tokenizer/tokenizer_test.go` +- Modify: `go.mod` (add dependency) + +- [ ] **Step 1: Add tiktoken-go dependency** + +``` +go get github.com/pkoukk/tiktoken-go@latest +``` + +Expected: `go.mod` and `go.sum` updated with `github.com/pkoukk/tiktoken-go`. 
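The tokenizer keeps the existing `len/4` estimate as a fallback when an encoding cannot load; its rounding (a ceiling over 4-byte chunks) can be sanity-checked standalone before writing the package. `heuristicCount` here is an illustrative copy of that arithmetic, not project code:

```go
package main

import "fmt"

// heuristicCount is the len/4 fallback the tokenizer package keeps for
// when a tiktoken encoding cannot be loaded: ceil(len(text)/4).
func heuristicCount(text string) int {
	return (len(text) + 3) / 4
}

func main() {
	fmt.Println(heuristicCount(""))      // 0
	fmt.Println(heuristicCount("abcd"))  // 1
	fmt.Println(heuristicCount("abcde")) // 2
}
```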
+ +- [ ] **Step 2: Write failing tests** + +```go +// internal/tokenizer/tokenizer_test.go +package tokenizer_test + +import ( + "testing" + + "somegit.dev/Owlibou/gnoma/internal/tokenizer" +) + +func TestTokenizer_CountKnownText(t *testing.T) { + tok := tokenizer.New("cl100k_base") + + // "Hello world" is 2 tokens in cl100k_base + n := tok.Count("Hello world") + if n < 1 || n > 5 { + t.Errorf("unexpected token count for 'Hello world': %d", n) + } +} + +func TestTokenizer_FallbackOnBadEncoding(t *testing.T) { + tok := tokenizer.New("nonexistent_encoding_xyz") + // Must not panic; falls back to heuristic + n := tok.Count("some text here") + if n <= 0 { + t.Errorf("expected positive count, got %d", n) + } +} + +func TestForProvider_KnownProviders(t *testing.T) { + cases := []string{"anthropic", "openai", "mistral", "google", "ollama", "llamacpp", "unknown"} + for _, prov := range cases { + tok := tokenizer.ForProvider(prov) + n := tok.Count("test input") + if n <= 0 { + t.Errorf("provider %q: expected positive count, got %d", prov, n) + } + } +} + +func TestTokenizer_CodeCountsReasonably(t *testing.T) { + tok := tokenizer.New("cl100k_base") + code := `func main() { fmt.Println("hello") }` + n := tok.Count(code) + // Should be between 5 and 20 tokens for this snippet + if n < 5 || n > 20 { + t.Errorf("code token count out of expected range: %d", n) + } +} +``` + +- [ ] **Step 3: Run to confirm failure** + +``` +go test ./internal/tokenizer/... +``` +Expected: compile error — package does not exist. + +- [ ] **Step 4: Implement `tokenizer.go`** + +```go +// internal/tokenizer/tokenizer.go +package tokenizer + +import ( + "log/slog" + "sync" + + tiktoken "github.com/pkoukk/tiktoken-go" +) + +// Tokenizer counts tokens using a tiktoken BPE encoding. +// Falls back to len/4 heuristic if the encoding fails to load. 
+type Tokenizer struct { + encoding string + enc *tiktoken.Tiktoken + mu sync.Mutex + loaded bool + failed bool + warnOnce sync.Once +} + +// New creates a Tokenizer for the given tiktoken encoding name (e.g. "cl100k_base"). +func New(encoding string) *Tokenizer { + return &Tokenizer{encoding: encoding} +} + +// ForProvider returns a Tokenizer appropriate for the named provider. +func ForProvider(providerName string) *Tokenizer { + switch providerName { + case "anthropic", "openai": + return New("cl100k_base") + default: + // mistral, google, ollama, llamacpp, unknown + return New("o200k_base") + } +} + +// Count returns the number of tokens for text using the configured encoding. +// Falls back to len(text)/4 if encoding is unavailable. +func (t *Tokenizer) Count(text string) int { + if enc := t.getEncoding(); enc != nil { + tokens := enc.Encode(text, nil, nil) + return len(tokens) + } + // heuristic fallback + return (len(text) + 3) / 4 +} + +func (t *Tokenizer) getEncoding() *tiktoken.Tiktoken { + t.mu.Lock() + defer t.mu.Unlock() + if t.loaded { + return t.enc // may be nil if failed + } + t.loaded = true + enc, err := tiktoken.GetEncoding(t.encoding) + if err != nil { + t.warnOnce.Do(func() { + slog.Warn("tiktoken encoding unavailable, falling back to heuristic", + "encoding", t.encoding, "error", err) + }) + t.failed = true + return nil + } + t.enc = enc + return enc +} +``` + +- [ ] **Step 5: Run tests to confirm pass** + +``` +go test ./internal/tokenizer/... 
+``` +Expected: `ok somegit.dev/Owlibou/gnoma/internal/tokenizer` + +- [ ] **Step 6: Commit** + +``` +git add internal/tokenizer/ go.mod go.sum +git commit -m "feat: tiktoken tokenizer — accurate BPE token counting with provider-aware encoding" +``` + +--- + +## Task 3: Wire `Store` into Engine and Elf Manager + +**Files:** +- Modify: `internal/engine/engine.go` +- Modify: `internal/engine/loop.go` +- Modify: `internal/elf/manager.go` +- Modify: `internal/tool/agent/agent.go` +- Modify: `internal/tool/agent/batch.go` +- Delete: `internal/context/persist.go` + +- [ ] **Step 1: Add `Store` to engine `Config` and remove the `PersistLargeResult` import** + +In `internal/engine/engine.go`, add the `Store` field to `Config`: + +```go +// in Config struct, after MaxTurns: +Store *persist.Store // nil = no result persistence +``` + +Add import at top of `engine.go`: +```go +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +Remove `gnomactx "somegit.dev/Owlibou/gnoma/internal/context"` from `engine.go` only if it becomes unused (it's used in `Context *gnomactx.Window` — leave it). 
+ +- [ ] **Step 2: Replace `PersistLargeResult` in `loop.go`** + +In `internal/engine/loop.go`, replace lines 397–401 (the `PersistLargeResult` block): + +Old: +```go + // Persist large results to disk + if persisted, ok := gnomactx.PersistLargeResult(output, call.ID, ".gnoma/sessions"); ok { + e.logger.Debug("tool result persisted to disk", "name", call.Name, "size", len(output)) + output = persisted + } +``` + +New: +```go + // Persist results to /tmp for cross-tool session sharing + if e.cfg.Store != nil { + if path, ok := e.cfg.Store.Save(call.Name, call.ID, output); ok { + e.logger.Debug("tool result persisted", "name", call.Name, "path", path) + output = persist.InlineReplacement(path, output) + } + } +``` + +Add import to `loop.go`: +```go +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +Remove the `gnomactx.PersistLargeResult` call site — if `gnomactx` is still used elsewhere in `loop.go` (it is: `gnomactx.EstimateTokens`), keep the import. + +- [ ] **Step 3: Add `Store` to `elf.ManagerConfig` and pass it to elf engines** + +In `internal/elf/manager.go`: + +Add field to `ManagerConfig`: +```go +Store *persist.Store // nil = no result persistence for elfs +``` + +Add to `Manager` struct: +```go +store *persist.Store +``` + +In `NewManager`: +```go +store: cfg.Store, +``` + +In `Spawn()`, pass store to the elf engine: +```go +eng, err := engine.New(engine.Config{ + Provider: arm.Provider, + Tools: m.tools, + Permissions: elfPerms, + Firewall: m.firewall, + System: systemPrompt, + Model: arm.ModelName, + MaxTurns: maxTurns, + Store: m.store, // NEW + Logger: m.logger, +}) +``` + +In `SpawnWithProvider()`, same addition: +```go +eng, err := engine.New(engine.Config{ + Provider: prov, + Tools: m.tools, + Permissions: elfPerms, + Firewall: m.firewall, + System: systemPrompt, + Model: model, + MaxTurns: maxTurns, + Store: m.store, // NEW + Logger: m.logger, +}) +``` + +Add import: +```go +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +- [ ] 
**Step 4: Add `store` to agent tools for ResultFilePaths snapshotting**
+
+In `internal/tool/agent/agent.go`, add a `store` field:
+
+```go
+type Tool struct {
+	manager    *elf.Manager
+	ProgressCh chan<- elf.Progress
+	store      *persist.Store // for ResultFilePaths attribution
+}
+
+func New(mgr *elf.Manager, store *persist.Store) *Tool {
+	return &Tool{manager: mgr, store: store}
+}
+```
+
+Attributing `/tmp` files to an elf takes two snapshots of the store: one before `t.manager.Spawn(...)` and one after `result = <-done`, diffed to find the files the elf produced. `manager.ReportResult(result)` cannot carry those paths yet, so defer the diff: in this task add only the `preSave` snapshot and leave a `// TODO Task4: snapshot ResultFilePaths` comment where the diff will go. Task 4 extends `elf.Result` and completes it.
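The deferred snapshot diff is an ordinary set difference over file paths. A standalone sketch of that diff (`newPaths` is an illustrative helper, not the plan's API):

```go
package main

import "fmt"

// newPaths returns the paths present in post but not in pre — the same
// diff the agent tool will use to attribute /tmp files to one elf run.
func newPaths(pre, post []string) []string {
	seen := make(map[string]bool, len(pre))
	for _, p := range pre {
		seen[p] = true
	}
	var out []string
	for _, p := range post {
		if !seen[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pre := []string{"/tmp/a.txt"}
	post := []string{"/tmp/a.txt", "/tmp/b.txt"}
	fmt.Println(newPaths(pre, post)) // [/tmp/b.txt]
}
```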
+ +Add import to `agent.go`: +```go +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +Add a `preSave` snapshot **before** `t.manager.Spawn(...)` (the ResultFilePaths diff will be completed in Task 4): +```go +var preSave []persist.ResultFile +if t.store != nil { + preSave, _ = t.store.List("") +} +_ = preSave // used in Task 4 +``` + +In `internal/tool/agent/batch.go`, same change to `BatchTool`: + +```go +type BatchTool struct { + manager *elf.Manager + progressCh chan<- elf.Progress + store *persist.Store +} + +func NewBatch(mgr *elf.Manager, store *persist.Store) *BatchTool { + return &BatchTool{manager: mgr, store: store} +} +``` + +Add import to `batch.go`: +```go +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +Add the same `preSave` snapshot inside the spawn loop for each task (before `t.manager.Spawn`): +```go +var preSave []persist.ResultFile +if t.store != nil { + preSave, _ = t.store.List("") +} +_ = preSave // used in Task 4 +``` + +- [ ] **Step 5: Delete `internal/context/persist.go`** + +``` +git rm internal/context/persist.go +``` + +Check that nothing imports `gnomactx.PersistLargeResult` or `gnomactx.TruncateToolResult` anymore: +``` +grep -r "PersistLargeResult\|TruncateToolResult" --include="*.go" . +``` +Expected: no matches. + +- [ ] **Step 6: Update `main.go` to init Store and pass it through** + +In `cmd/gnoma/main.go`, generate a session ID and create the store. 
Add near the top of `main()` (after logger init, before elfMgr creation): + +```go +// Generate session-scoped ID for /tmp artifact directory +sessionID := fmt.Sprintf("%s-%06x", + time.Now().Format("20060102-150405"), + mrand.Int63()&0xffffff, // 6 hex chars +) +store := persist.New(sessionID) +logger.Debug("session store initialized", "dir", store.Dir()) +``` + +Add imports: +```go +mrand "math/rand" +"somegit.dev/Owlibou/gnoma/internal/tool/persist" +``` + +Update `elfMgr` creation: +```go +elfMgr := elf.NewManager(elf.ManagerConfig{ + Router: rtr, + Tools: reg, + Permissions: permChecker, + Firewall: fw, + Store: store, // NEW + Logger: logger, +}) +``` + +Update `agent.New` and `agent.NewBatch` calls: +```go +agentTool := agent.New(elfMgr, store) +// ... +batchTool := agent.NewBatch(elfMgr, store) +``` + +Update `engine.New` call: +```go +eng, err := engine.New(engine.Config{ + Provider: prov, + Router: rtr, + Tools: reg, + Firewall: fw, + Permissions: permChecker, + Context: ctxWindow, + System: systemPrompt, + Model: *model, + MaxTurns: *maxTurns, + Store: store, // NEW + Logger: logger, +}) +``` + +- [ ] **Step 7: Build and run tests** + +``` +make build && make test +``` +Expected: build succeeds, all tests pass. Fix any compile errors (likely missing imports or signature mismatches). 
+ +- [ ] **Step 8: Commit** + +``` +git add internal/engine/ internal/elf/ internal/tool/agent/ cmd/gnoma/main.go +git commit -m "feat: wire persist.Store into engine, elf manager, and agent tools" +``` + +--- + +## Task 4: QualityTracker and Router Feedback + +**Files:** +- Create: `internal/router/feedback.go` +- Create: `internal/router/feedback_test.go` +- Modify: `internal/router/router.go` +- Modify: `internal/router/selector.go` +- Modify: `internal/tool/agent/agent.go` +- Modify: `internal/tool/agent/batch.go` + +- [ ] **Step 1: Write failing tests for QualityTracker** + +```go +// internal/router/feedback_test.go +package router_test + +import ( + "testing" + + "somegit.dev/Owlibou/gnoma/internal/router" +) + +func TestQualityTracker_NoDataReturnsHeuristic(t *testing.T) { + qt := router.NewQualityTracker() + _, hasData := qt.Quality("arm:model", router.TaskGeneration) + if hasData { + t.Error("expected no data for unobserved arm") + } +} + +func TestQualityTracker_RecordUpdatesEMA(t *testing.T) { + qt := router.NewQualityTracker() + for i := 0; i < 3; i++ { + qt.Record("arm:model", router.TaskGeneration, true) + } + score, hasData := qt.Quality("arm:model", router.TaskGeneration) + if !hasData { + t.Fatal("expected data after 3 observations") + } + if score <= 0 || score > 1 { + t.Errorf("score out of range [0,1]: %f", score) + } +} + +func TestQualityTracker_AllFailuresLowScore(t *testing.T) { + qt := router.NewQualityTracker() + for i := 0; i < 5; i++ { + qt.Record("arm:model", router.TaskDebug, false) + } + score, _ := qt.Quality("arm:model", router.TaskDebug) + if score > 0.3 { + t.Errorf("expected low score after all failures, got %f", score) + } +} + +func TestQualityTracker_ConcurrentSafe(t *testing.T) { + qt := router.NewQualityTracker() + done := make(chan struct{}) + for i := 0; i < 10; i++ { + go func(success bool) { + qt.Record("arm:model", router.TaskReview, success) + done <- struct{}{} + }(i%2 == 0) + } + for i := 0; i < 10; i++ { + <-done + 
} + // Should not panic; score should be valid + score, _ := qt.Quality("arm:model", router.TaskReview) + if score < 0 || score > 1 { + t.Errorf("invalid score after concurrent writes: %f", score) + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/router/... -run TestQualityTracker +``` +Expected: compile error — `NewQualityTracker` not defined. + +- [ ] **Step 3: Implement `feedback.go`** + +```go +// internal/router/feedback.go +package router + +import "sync" + +const ( + qualityAlpha = 0.3 // EMA smoothing factor (~3-sample memory) + minObservations = 3 // min samples before observed score overrides heuristic +) + +// EMAScore tracks an exponential moving average quality score. +type EMAScore struct { + Value float64 + Count int +} + +// QualityTracker maintains per-arm, per-task-type quality EMA scores. +type QualityTracker struct { + mu sync.RWMutex + scores map[ArmID]map[TaskType]*EMAScore +} + +func NewQualityTracker() *QualityTracker { + return &QualityTracker{ + scores: make(map[ArmID]map[TaskType]*EMAScore), + } +} + +// Record updates the EMA score for an arm+task type pair. +// success=true contributes 1.0; false contributes 0.0. +func (qt *QualityTracker) Record(armID ArmID, taskType TaskType, success bool) { + observation := 0.0 + if success { + observation = 1.0 + } + + qt.mu.Lock() + defer qt.mu.Unlock() + + if qt.scores[armID] == nil { + qt.scores[armID] = make(map[TaskType]*EMAScore) + } + s := qt.scores[armID][taskType] + if s == nil { + s = &EMAScore{} + qt.scores[armID][taskType] = s + } + + if s.Count == 0 { + s.Value = observation + } else { + s.Value = qualityAlpha*observation + (1-qualityAlpha)*s.Value + } + s.Count++ +} + +// Quality returns the observed quality score and whether enough data exists. +// Returns (0, false) when Count < minObservations (heuristic should dominate). 
+func (qt *QualityTracker) Quality(armID ArmID, taskType TaskType) (score float64, hasData bool) { + qt.mu.RLock() + defer qt.mu.RUnlock() + + m, ok := qt.scores[armID] + if !ok { + return 0, false + } + s, ok := m[taskType] + if !ok || s.Count < minObservations { + return 0, false + } + return s.Value, true +} +``` + +- [ ] **Step 4: Run tests to confirm pass** + +``` +go test ./internal/router/... -run TestQualityTracker +``` +Expected: pass. + +- [ ] **Step 5: Extend `Outcome` and wire `ReportOutcome` in `router.go`** + +In `internal/router/router.go`: + +1. Add `quality *QualityTracker` to `Router` struct. +2. Initialize in `New()`: `quality: NewQualityTracker()`. +3. Extend `Outcome` struct with `ResultFilePaths`: + +```go +// Outcome records the result of a task execution for quality feedback. +type Outcome struct { + ArmID ArmID + TaskType TaskType + Success bool + Tokens int + Duration time.Duration + ResultFilePaths []string // paths to /tmp tool result files (for M9 analysis) +} +``` + +4. Replace the `ReportOutcome` body: + +```go +func (r *Router) ReportOutcome(o Outcome) { + r.quality.Record(o.ArmID, o.TaskType, o.Success) + r.logger.Debug("outcome recorded", + "arm", o.ArmID, + "task", o.TaskType, + "success", o.Success, + "tokens", o.Tokens, + "duration", o.Duration, + "result_files", len(o.ResultFilePaths), + ) +} +``` + +5. Add `LookupArm` method (used in Task 7 for context window sizing): + +```go +// LookupArm returns an arm by ID. +func (r *Router) LookupArm(id ArmID) (*Arm, bool) { + r.mu.RLock() + defer r.mu.RUnlock() + arm, ok := r.arms[id] + return arm, ok +} +``` + +- [ ] **Step 6: Blend observed quality in `scoreArm` in `selector.go`** + +In `internal/router/selector.go`, `scoreArm` currently: +```go +func scoreArm(arm *Arm, task Task) float64 { + quality := heuristicQuality(arm, task) + ... +``` + +The `scoreArm` function needs access to the `Router`'s `QualityTracker`. 
Change the signature: + +```go +func scoreArm(r *QualityTracker, arm *Arm, task Task) float64 { + hq := heuristicQuality(arm, task) + quality := hq + if r != nil { + if observed, hasData := r.Quality(arm.ID, task.Type); hasData { + quality = 0.7*observed + 0.3*hq + } + } + value := task.ValueScore() + cost := effectiveCost(arm, task) + if cost <= 0 { + cost = 0.001 + } + return (quality * value) / cost +} +``` + +Update `selectBest` to accept the tracker and pass it through: +```go +func selectBest(qt *QualityTracker, arms []*Arm, task Task) *Arm { + if len(arms) == 0 { + return nil + } + var best *Arm + bestScore := math.Inf(-1) + for _, arm := range arms { + score := scoreArm(qt, arm, task) + if score > bestScore { + bestScore = score + best = arm + } + } + return best +} +``` + +Update the call in `Router.Select()`: +```go +best := selectBest(r.quality, feasible, task) +``` + +- [ ] **Step 7: Add ResultFilePaths snapshot to `agent.go` and `batch.go`** + +In `internal/tool/agent/agent.go`, remove the `// TODO Task4` comment and implement the snapshot. After elf completes (before `t.manager.ReportResult`): + +```go +// Snapshot /tmp files attributed to this elf +var resultPaths []string +if t.store != nil { + postSave, _ := t.store.List("") + preSet := make(map[string]bool, len(preSave)) + for _, f := range preSave { + preSet[f.Path] = true + } + for _, f := range postSave { + if !preSet[f.Path] { + resultPaths = append(resultPaths, f.Path) + } + } +} +``` + +But `manager.ReportResult(result)` uses `router.Outcome` internally and doesn't accept `resultPaths` directly. 
We need to either:
+- Add `ResultFilePaths` to `elf.Result`, or
+- Have the agent tool report the outcome directly with the paths
+
+Cleanest: add `ResultFilePaths []string` to `elf.Result`:
+
+In `internal/elf/elf.go`, find the `Result` struct and add:
+```go
+ResultFilePaths []string // paths to /tmp results produced by this elf
+```
+
+Then in `elf/manager.go`'s `ReportResult`:
+```go
+m.router.ReportOutcome(router.Outcome{
+	ArmID:           meta.armID,
+	TaskType:        meta.taskType,
+	Success:         result.Status == StatusCompleted,
+	Tokens:          int(result.Usage.TotalTokens()),
+	Duration:        result.Duration,
+	ResultFilePaths: result.ResultFilePaths,
+})
+```
+
+And in `agent.go`'s `Execute()`, after computing `resultPaths`, set it on the result before calling `ReportResult`:
+```go
+result.ResultFilePaths = resultPaths
+t.manager.ReportResult(result)
+```
+
+Do the same pattern in `batch.go` per-elf result.
+
+- [ ] **Step 8: Run all router tests**
+
+```
+go test ./internal/router/... ./internal/elf/... ./internal/tool/agent/...
+```
+Expected: all pass. Fix any signature mismatches.
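The EMA behavior the tests assert can be replayed standalone. This sketch uses the same seeding rule and alpha as `feedback.go`; `ema` is an illustrative helper, not project code:

```go
package main

import "fmt"

const alpha = 0.3 // same smoothing factor as feedback.go's qualityAlpha

// ema replays observations through the QualityTracker's update rule:
// the first sample seeds the value, later samples blend in with weight alpha.
func ema(obs []float64) float64 {
	var v float64
	for i, o := range obs {
		if i == 0 {
			v = o
		} else {
			v = alpha*o + (1-alpha)*v
		}
	}
	return v
}

func main() {
	fmt.Println(ema([]float64{0, 0, 0, 0, 0})) // five failures stay at 0
	fmt.Println(ema([]float64{1, 1, 1}))       // three successes stay at 1
	fmt.Println(ema([]float64{1, 1, 1, 0}))    // one failure drops the score to 0.7
}
```

This is why `TestQualityTracker_AllFailuresLowScore` can safely assert a score of at most 0.3 after five failures.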
+
+- [ ] **Step 9: Commit**
+
+```
+git add internal/router/ internal/elf/ internal/tool/agent/
+git commit -m "feat: QualityTracker — EMA router feedback from elf outcomes, ResultFilePaths tracking"
+```
+
+---
+
+## Task 5: Coordinator Tools (`list_results`, `read_result`)
+
+**Files:**
+- Create: `internal/tool/agent/list_results.go`
+- Create: `internal/tool/agent/read_result.go`
+- Modify: `cmd/gnoma/main.go`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to a new file `internal/tool/agent/coordinator_test.go`:
+
+```go
+// internal/tool/agent/coordinator_test.go
+package agent_test
+
+import (
+	"context"
+	"encoding/json"
+	"os"
+	"strings"
+	"testing"
+
+	"somegit.dev/Owlibou/gnoma/internal/tool/agent"
+	"somegit.dev/Owlibou/gnoma/internal/tool/persist"
+)
+
+func makeTestStore(t *testing.T) *persist.Store {
+	t.Helper()
+	s := persist.New("test-coord-" + t.Name())
+	t.Cleanup(func() { os.RemoveAll(s.Dir()) })
+	return s
+}
+
+func TestListResultsTool_EmptyStore(t *testing.T) {
+	s := makeTestStore(t)
+	tool := agent.NewListResultsTool(s)
+	args, _ := json.Marshal(map[string]string{})
+	result, err := tool.Execute(context.Background(), args)
+	if err != nil {
+		t.Fatal(err)
+	}
+	// Accept either empty output or an explicit "no results" message
+	if result.Output != "" && !strings.Contains(result.Output, "no results") {
+		t.Errorf("expected empty output or 'no results' message, got: %s", result.Output)
+	}
+}
+
+func TestListResultsTool_ListsFiles(t *testing.T) {
+	s := makeTestStore(t)
+	big := strings.Repeat("x", 1024)
+	s.Save("bash", "toolu_aaa", big)
+	s.Save("fs.grep", "toolu_bbb", big)
+
+	tool := agent.NewListResultsTool(s)
+	args, _ := json.Marshal(map[string]string{})
+	result, err := tool.Execute(context.Background(), args)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if !strings.Contains(result.Output, "bash") {
+		t.Errorf("expected bash in output, got: %s", result.Output)
+	}
+	if !strings.Contains(result.Output, "fs") {
+		t.Errorf("expected fs.grep in output, got: %s", result.Output)
+	}
+}
+
+func
TestListResultsTool_FilterByToolName(t *testing.T) {
+	s := makeTestStore(t)
+	big := strings.Repeat("x", 1024)
+	s.Save("bash", "toolu_c1", big)
+	s.Save("fs.read", "toolu_c2", big)
+
+	tool := agent.NewListResultsTool(s)
+	args, _ := json.Marshal(map[string]string{"filter": "bash"})
+	result, err := tool.Execute(context.Background(), args)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if strings.Contains(result.Output, "fs") {
+		t.Errorf("filter should exclude fs.read, got: %s", result.Output)
+	}
+}
+
+func TestReadResultTool_ReadsFile(t *testing.T) {
+	s := makeTestStore(t)
+	big := strings.Repeat("hello\n", 200)
+	path, _ := s.Save("bash", "toolu_read1", big)
+
+	tool := agent.NewReadResultTool(s)
+	args, _ := json.Marshal(map[string]string{"path": path})
+	result, err := tool.Execute(context.Background(), args)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if !strings.Contains(result.Output, "hello") {
+		t.Errorf("expected file content in output, got: %s", result.Output)
+	}
+}
+
+func TestReadResultTool_RejectsPathTraversal(t *testing.T) {
+	s := makeTestStore(t)
+	tool := agent.NewReadResultTool(s)
+	args, _ := json.Marshal(map[string]string{"path": "/etc/passwd"})
+	result, err := tool.Execute(context.Background(), args)
+	// Path traversal: the tool should return a non-nil error
+	if err == nil {
+		// Graceful degradation: an explicit rejection message in Output is also acceptable
+		if !strings.Contains(result.Output, "outside") && !strings.Contains(result.Output, "permission") {
+			t.Errorf("expected path traversal rejection, got: %s", result.Output)
+		}
+	}
+}
+```
+
+Note: `tool.Result` has no `IsError` field; per `tool/result.go` it only carries `Output` and `Metadata`. Tools such as `bash.go` return `tool.Result{Output: "..."}, nil` for user-visible errors and a non-nil `error` for fatal ones; `executeSingleTool` maps a non-nil `error` to `IsError: true` on the `message.ToolResult`.
So tools signal user-facing errors via returning non-nil `error`, not via a field on `tool.Result`. + +Update the test: +```go +func TestReadResultTool_RejectsPathTraversal(t *testing.T) { + s := makeTestStore(t) + tool := agent.NewReadResultTool(s) + args, _ := json.Marshal(map[string]string{"path": "/etc/passwd"}) + result, err := tool.Execute(context.Background(), args) + // Path traversal: tool should return an error (not a result) + if err == nil { + // Acceptable if tool returns error message in Output for graceful degradation + if !strings.Contains(result.Output, "outside") && !strings.Contains(result.Output, "permission") { + t.Errorf("expected path traversal rejection message, got: %s", result.Output) + } + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/tool/agent/... -run TestListResults -run TestReadResult +``` +Expected: compile error. + +- [ ] **Step 3: Implement `list_results.go`** + +```go +// internal/tool/agent/list_results.go +package agent + +import ( + "context" + "encoding/json" + "fmt" + "strings" + + "somegit.dev/Owlibou/gnoma/internal/tool" + "somegit.dev/Owlibou/gnoma/internal/tool/persist" +) + +var listResultsSchema = json.RawMessage(`{ + "type": "object", + "properties": { + "filter": { + "type": "string", + "description": "Optional tool name prefix to filter results (e.g. 'bash' shows only bash results)" + } + } +}`) + +// ListResultsTool lists persisted tool result files in the current session. +type ListResultsTool struct { + store *persist.Store +} + +func NewListResultsTool(s *persist.Store) *ListResultsTool { + return &ListResultsTool{store: s} +} + +func (t *ListResultsTool) Name() string { return "list_results" } +func (t *ListResultsTool) Description() string { + return "List tool result files saved in this session. Use this to discover outputs from previous tool calls that can be passed to elfs or read with read_result." 
+} +func (t *ListResultsTool) Parameters() json.RawMessage { return listResultsSchema } +func (t *ListResultsTool) IsReadOnly() bool { return true } +func (t *ListResultsTool) IsDestructive() bool { return false } + +type listResultsArgs struct { + Filter string `json:"filter,omitempty"` +} + +func (t *ListResultsTool) Execute(_ context.Context, args json.RawMessage) (tool.Result, error) { + var a listResultsArgs + json.Unmarshal(args, &a) // ignore error — empty filter is fine + + files, err := t.store.List(a.Filter) + if err != nil { + return tool.Result{Output: fmt.Sprintf("error listing results: %v", err)}, nil + } + if len(files) == 0 { + return tool.Result{Output: "no results persisted in this session yet"}, nil + } + + var b strings.Builder + fmt.Fprintf(&b, "%d result(s) in session:\n\n", len(files)) + for _, f := range files { + fmt.Fprintf(&b, "%s [%s, %s, %s]\n", + f.Path, + f.ToolName, + formatSize(f.Size), + f.ModTime.Format("15:04:05"), + ) + } + return tool.Result{Output: b.String()}, nil +} + +func formatSize(bytes int64) string { + if bytes >= 1024*1024 { + return fmt.Sprintf("%.1fMB", float64(bytes)/1024/1024) + } + if bytes >= 1024 { + return fmt.Sprintf("%.1fKB", float64(bytes)/1024) + } + return fmt.Sprintf("%dB", bytes) +} +``` + +- [ ] **Step 4: Implement `read_result.go`** + +```go +// internal/tool/agent/read_result.go +package agent + +import ( + "context" + "encoding/json" + "fmt" + + "somegit.dev/Owlibou/gnoma/internal/tool" + "somegit.dev/Owlibou/gnoma/internal/tool/persist" +) + +var readResultSchema = json.RawMessage(`{ + "type": "object", + "properties": { + "path": { + "type": "string", + "description": "Absolute path to the result file (from list_results output)" + } + }, + "required": ["path"] +}`) + +// ReadResultTool reads a persisted tool result file from this session. 
+type ReadResultTool struct {
+	store *persist.Store
+}
+
+func NewReadResultTool(s *persist.Store) *ReadResultTool {
+	return &ReadResultTool{store: s}
+}
+
+func (t *ReadResultTool) Name() string { return "read_result" }
+func (t *ReadResultTool) Description() string {
+	return "Read the full content of a persisted tool result file. Use paths from list_results. Only files within the current session directory are accessible."
+}
+func (t *ReadResultTool) Parameters() json.RawMessage { return readResultSchema }
+func (t *ReadResultTool) IsReadOnly() bool { return true }
+func (t *ReadResultTool) IsDestructive() bool { return false }
+
+type readResultArgs struct {
+	Path string `json:"path"`
+}
+
+func (t *ReadResultTool) Execute(_ context.Context, args json.RawMessage) (tool.Result, error) {
+	var a readResultArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return tool.Result{}, fmt.Errorf("read_result: invalid args: %w", err)
+	}
+	if a.Path == "" {
+		return tool.Result{}, fmt.Errorf("read_result: path required")
+	}
+
+	content, err := t.store.Read(a.Path)
+	if err != nil {
+		return tool.Result{Output: fmt.Sprintf("error reading result: %v", err)}, nil
+	}
+	return tool.Result{Output: content}, nil
+}
+```
+
+- [ ] **Step 5: Register tools in `main.go`**
+
+In `cmd/gnoma/main.go`, after the `batchTool` registration:
+
+```go
+reg.Register(agent.NewListResultsTool(store))
+reg.Register(agent.NewReadResultTool(store))
+```
+
+- [ ] **Step 6: Run tests**
+
+```
+go test ./internal/tool/agent/... -run 'TestListResults|TestReadResult'
+```
+Expected: all pass.
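+
+The traversal test in Step 1 assumes `store.Read` refuses any path that resolves outside the session directory. As a hedged sketch (illustrative only; `withinDir` is a hypothetical helper, not a file this task creates, and the real guard belongs to Task 1's `persist.Store`), the containment check could look like:
+
+```go
+package main
+
+import (
+	"fmt"
+	"path/filepath"
+	"strings"
+)
+
+// withinDir reports whether path, after cleaning and resolving to an
+// absolute path, lies inside dir. Anything that escapes via ".." or an
+// unrelated absolute path is rejected.
+func withinDir(dir, path string) bool {
+	abs, err := filepath.Abs(filepath.Clean(path))
+	if err != nil {
+		return false
+	}
+	rel, err := filepath.Rel(dir, abs)
+	if err != nil {
+		return false
+	}
+	return rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
+}
+
+func main() {
+	dir := "/tmp/gnoma-abc/tool-results"
+	fmt.Println(withinDir(dir, dir+"/bash-toolu_1.txt")) // inside: true
+	fmt.Println(withinDir(dir, "/etc/passwd"))           // outside: false
+	fmt.Println(withinDir(dir, dir+"/../../escape"))     // traversal: false
+}
+```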
+ +- [ ] **Step 7: Commit** + +``` +git add internal/tool/agent/list_results.go internal/tool/agent/read_result.go \ + internal/tool/agent/coordinator_test.go cmd/gnoma/main.go +git commit -m "feat: list_results + read_result tools for coordinator artifact discovery" +``` + +--- + +## Task 6: Tokenizer Wiring into Context Tracker + +**Files:** +- Modify: `internal/context/tracker.go` +- Modify: `internal/context/window.go` + +- [ ] **Step 1: Write failing tests for tokenizer-aware tracker** + +In `internal/context/context_test.go` (or add a new file `tracker_tokenizer_test.go`): + +```go +// internal/context/tracker_tokenizer_test.go +package context_test + +import ( + "testing" + + gnomactx "somegit.dev/Owlibou/gnoma/internal/context" + "somegit.dev/Owlibou/gnoma/internal/tokenizer" +) + +func TestTracker_CountTokensWithTokenizer(t *testing.T) { + tok := tokenizer.New("cl100k_base") + tr := gnomactx.NewTracker(100000) + tr.SetTokenizer(tok) + + n := tr.CountTokens("Hello world") + // tiktoken gives 2; heuristic gives (11+3)/4 = 3 + if n < 1 || n > 5 { + t.Errorf("unexpected count: %d", n) + } +} + +func TestTracker_CountTokensNilTokenizerFallsBack(t *testing.T) { + tr := gnomactx.NewTracker(100000) + // nil tokenizer — should use heuristic + n := tr.CountTokens("Hello world") + if n <= 0 { + t.Errorf("expected positive count, got %d", n) + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/context/... -run TestTracker_Count +``` +Expected: compile error — `SetTokenizer`, `CountTokens` not defined. + +- [ ] **Step 3: Add tokenizer support to `tracker.go`** + +In `internal/context/tracker.go`: + +Add import and field: +```go +import ( + "somegit.dev/Owlibou/gnoma/internal/message" + "somegit.dev/Owlibou/gnoma/internal/tokenizer" +) +``` + +Add to `Tracker` struct: +```go +tok *tokenizer.Tokenizer +``` + +Add methods: +```go +// SetTokenizer sets the tokenizer used for accurate token counting. 
+func (t *Tracker) SetTokenizer(tok *tokenizer.Tokenizer) {
+	t.tok = tok
+}
+
+// CountTokens returns the token count for text using the configured tokenizer,
+// falling back to the len/4 heuristic if no tokenizer is set.
+func (t *Tracker) CountTokens(text string) int64 {
+	if t.tok != nil {
+		return int64(t.tok.Count(text))
+	}
+	return EstimateTokens(text)
+}
+
+// CountMessages returns the token count for a message slice.
+func (t *Tracker) CountMessages(msgs []message.Message) int64 {
+	var total int64
+	for _, msg := range msgs {
+		for _, c := range msg.Content {
+			switch c.Type {
+			case message.ContentText:
+				total += t.CountTokens(c.Text)
+			case message.ContentToolCall:
+				total += 50 // rough overhead for tool-call framing (name, ids)
+				if c.ToolCall != nil {
+					total += t.CountTokens(string(c.ToolCall.Arguments))
+				}
+			case message.ContentToolResult:
+				if c.ToolResult != nil {
+					total += t.CountTokens(c.ToolResult.Content)
+				}
+			case message.ContentThinking:
+				if c.Thinking != nil {
+					total += t.CountTokens(c.Thinking.Text)
+				}
+			}
+		}
+		total += 4 // rough per-message framing overhead (role, delimiters)
+	}
+	return total
+}
+```
+
+Keep `EstimateTokens` and `EstimateMessages` as-is (they're still used by `loop.go` call sites that don't have tracker access — they'll be updated in Task 7).
+
+- [ ] **Step 4: Update `window.go` to use `Tracker.CountMessages`**
+
+In `internal/context/window.go`, in `doCompact`, replace line:
+```go
+w.tracker.Set(EstimateMessages(compacted))
+```
+with:
+```go
+w.tracker.Set(w.tracker.CountMessages(compacted))
+```
+
+- [ ] **Step 5: Run tests**
+
+```
+go test ./internal/context/...
+```
+Expected: all pass including new tracker tokenizer tests.
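+
+For reference, the `len/4` fallback that `CountTokens` preserves amounts to a ceiling division, matching the `(11+3)/4 = 3` arithmetic in the Step 1 test comment. A standalone sketch, assuming `EstimateTokens` keeps this shape:
+
+```go
+package main
+
+import "fmt"
+
+// estimateTokens mirrors the documented len/4 fallback heuristic:
+// ceil(len/4) via integer arithmetic, so 11 bytes count as 3 tokens.
+func estimateTokens(text string) int64 {
+	return int64((len(text) + 3) / 4)
+}
+
+func main() {
+	fmt.Println(estimateTokens("Hello world")) // 11 bytes -> 3
+	fmt.Println(estimateTokens("abcd"))        // exactly one 4-byte chunk -> 1
+	fmt.Println(estimateTokens(""))            // empty -> 0
+}
+```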
+
+- [ ] **Step 6: Commit**
+
+```
+git add internal/context/tracker.go internal/context/window.go \
+    internal/context/tracker_tokenizer_test.go
+git commit -m "feat: tokenizer-aware Tracker.CountTokens/CountMessages replaces EstimateMessages in compaction"
+```
+
+---
+
+## Task 7: Context Window Size Fix + Prefix Baseline + main.go Wiring
+
+**Files:**
+- Modify: `cmd/gnoma/main.go`
+
+- [ ] **Step 1: Wire tokenizer into main.go and set tracker tokenizer**
+
+In `cmd/gnoma/main.go`, after `store` is initialized and after `ctxWindow` is created, add:
+
+```go
+// Initialize tokenizer for accurate context tracking
+tok := tokenizer.ForProvider(prov.Name())
+ctxWindow.Tracker().SetTokenizer(tok)
+```
+
+Add import:
+```go
+"somegit.dev/Owlibou/gnoma/internal/tokenizer"
+```
+
+- [ ] **Step 2: Fix context window size from arm capabilities**
+
+Replace the `MaxTokens` heuristic in `WindowConfig` (currently `cfg.Provider.MaxTokens * 20`).
+
+After `rtr.RegisterProvider(...)` calls and before `ctxWindow` creation:
+
+```go
+// Derive context window size from the registered arm for the primary provider+model
+contextWindowSize := int64(cfg.Provider.MaxTokens) * 20 // fallback
+{
+	modelName := *model
+	if modelName == "" {
+		modelName = prov.DefaultModel()
+	}
+	armID := router.NewArmID(prov.Name(), modelName)
+	if arm, ok := rtr.LookupArm(armID); ok && arm.Capabilities.ContextWindow > 0 {
+		contextWindowSize = int64(arm.Capabilities.ContextWindow)
+		logger.Debug("context window from arm capabilities",
+			"arm", armID,
+			"context_window", contextWindowSize,
+		)
+	}
+}
+```
+
+Update `WindowConfig`:
+```go
+ctxWindow := gnomactx.NewWindow(gnomactx.WindowConfig{
+	MaxTokens:      contextWindowSize, // was: cfg.Provider.MaxTokens * 20
+	Strategy:       compactStrategy,
+	PrefixMessages: prefixMsgs,
+	Logger:         logger,
+})
+```
+
+- [ ] **Step 3: Count prefix tokens as initial tracker baseline**
+
+After `ctxWindow` is created and the tokenizer from Step 1 is wired in, seed the tracker with the prefix cost:
+
+```go
+// Seed tracker with prefix token cost so compaction budget is accurate
+if len(prefixMsgs) > 0 {
+	prefixTokens := ctxWindow.Tracker().CountMessages(prefixMsgs)
+	ctxWindow.Tracker().Set(prefixTokens)
+	logger.Debug("prefix token baseline set", "tokens", prefixTokens)
+}
+```
+
+- [ ] **Step 4: Update `loop.go` to use tracker for task token estimation**
+
+In `internal/engine/loop.go`, replace each occurrence of:
+```go
+task.EstimatedTokens = int(gnomactx.EstimateTokens(prompt))
+```
+with:
+```go
+if e.cfg.Context != nil {
+	task.EstimatedTokens = int(e.cfg.Context.Tracker().CountTokens(prompt))
+} else {
+	task.EstimatedTokens = int(gnomactx.EstimateTokens(prompt))
+}
+```
+
+There are 3 occurrences in `loop.go` (main loop, retry block, 413 handler). Update all three.
+
+- [ ] **Step 5: Build and run all tests**
+
+```
+make build && make test
+```
+Expected: clean build, all tests pass. Fix any issues.
+
+- [ ] **Step 6: Commit**
+
+```
+git add cmd/gnoma/main.go internal/engine/loop.go
+git commit -m "feat: accurate context window sizing from arm capabilities + prefix token baseline + tokenizer wiring"
+```
+
+---
+
+## Task 8: Coordinator Prompt Injection
+
+**Files:**
+- Modify: `internal/engine/loop.go`
+
+- [ ] **Step 1: Write a test for coordinator prompt injection**
+
+In `internal/engine/` add `coordinator_test.go`:
+
+```go
+// internal/engine/coordinator_test.go
+package engine
+
+import (
+	"strings"
+	"testing"
+
+	"somegit.dev/Owlibou/gnoma/internal/router"
+)
+
+func TestCoordinatorSystemPrompt_InjectedForOrchestration(t *testing.T) {
+	prompt := coordinatorPrompt()
+	if !strings.Contains(prompt, "spawn_elfs") {
+		t.Error("coordinator prompt must mention spawn_elfs")
+	}
+	if !strings.Contains(prompt, "list_results") {
+		t.Error("coordinator prompt must mention list_results")
+	}
+}
+
+func TestShouldInjectCoordinatorPrompt(t *testing.T) {
+	cases := []struct {
+		prompt string
+		want   bool
+	}{
+		{"orchestrate the migration", true},
+		{"coordinate the refactor", true},
+		{"dispatch tasks to elfs", true},
+		{"fix the bug in main.go", false},
+		{"explain this function", false},
+		{"write unit tests for auth", false},
+	}
+	for _, c := range cases {
+		task := router.ClassifyTask(c.prompt)
+		got := task.Type == router.TaskOrchestration
+		if got != c.want {
+			t.Errorf("prompt %q: want orchestration=%v, got %v (type=%s)", c.prompt, c.want, got, task.Type)
+		}
+	}
+}
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```
+go test ./internal/engine/... -run TestCoordinator
+```
+Expected: compile error — `coordinatorPrompt` not defined.
+
+- [ ] **Step 3: Add coordinator prompt injection to `loop.go`**
+
+At the bottom of `loop.go`, add:
+
+```go
+// coordinatorPrompt returns the system prompt block injected for orchestration tasks.
+func coordinatorPrompt() string {
+	return `You are operating in coordinator mode. Your role is to decompose complex work into parallel tasks and orchestrate elfs.
+
+Rules:
+- Use spawn_elfs to dispatch N tasks in parallel when they don't share write state.
+- Use list_results to discover outputs produced by prior tool calls in this session.
+- Pass result file paths to elfs in their prompts so they can read prior outputs with read_result or fs.read.
+- Writes are serial: if two elfs would write the same file, sequence them.
+- Synthesize elf outputs into a coherent final answer.`
+}
+```
+
+In `buildRequest()`, after building `req` (after `req.Tools` is populated), add coordinator injection:
+
+```go
+// Inject coordinator guidance for orchestration tasks
+if e.cfg.Router != nil {
+	prompt := ""
+	for i := len(e.history) - 1; i >= 0; i-- {
+		if e.history[i].Role == message.RoleUser {
+			prompt = e.history[i].TextContent()
+			break
+		}
+	}
+	if router.ClassifyTask(prompt).Type == router.TaskOrchestration {
+		req.SystemPrompt = coordinatorPrompt() + "\n\n" + req.SystemPrompt
+	}
+}
+```
+
+- [ ] **Step 4: Run tests**
+
+```
+go test ./internal/engine/... -run TestCoordinator
+```
+Expected: pass.
+
+- [ ] **Step 5: Run full test suite**
+
+```
+make test
+```
+Expected: all pass.
+
+- [ ] **Step 6: Final build check**
+
+```
+make build
+```
+Expected: binary produced at `./bin/gnoma`.
+
+- [ ] **Step 7: Commit**
+
+```
+git add internal/engine/loop.go internal/engine/coordinator_test.go
+git commit -m "feat: coordinator mode — system prompt injection for orchestration tasks"
+```
+
+---
+
+## Verification Checklist
+
+After all tasks complete:
+
+1. **Persistence:** `ls /tmp/gnoma-*/tool-results/` should show files after running any tool that produces ≥1KB output. Small results (e.g. `echo hello | bash`) should NOT create files.
+
+2. **Tokenizer:** Add `slog.Debug("token count", "text", prompt[:20], "tokens", task.EstimatedTokens)` temporarily in `loop.go` and confirm it's significantly different from `len/4` for code-heavy prompts.
+
+3. **Router feedback:** Run `go test ./internal/router/...` — all QualityTracker tests pass. Use `slog.Debug` in `scoreArm` to confirm blended score differs from pure heuristic after a few elf runs.
+
+4. **Coordinator tools:** `list_results` and `read_result` appear in the tool list (visible in `/config` in TUI or via `slog.Debug` in `buildRequest`). A prompt containing "orchestrate" triggers the coordinator system prompt.
+
+5. **Context window accuracy:** After startup, `tokens: N (P%)` in TUI should reflect prefix doc size, not 0%.
+
+6. **Full test suite:** `make test` passes with no regressions.
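+
+The blended score referenced in verification item 3 comes from Task 4's EMA feedback. A minimal sketch of the update rule, useful for sanity-checking expected values by hand (the `alpha` value and the 0.5 neutral prior are assumptions for illustration, not the tracker's actual constants):
+
+```go
+package main
+
+import "fmt"
+
+// alpha weights the newest observed outcome in the exponential moving
+// average assumed for QualityTracker: q_new = alpha*outcome + (1-alpha)*q_old.
+const alpha = 0.3
+
+func emaUpdate(q, outcome float64) float64 {
+	return alpha*outcome + (1-alpha)*q
+}
+
+func main() {
+	q := 0.5 // neutral prior before any observed outcomes
+	for _, outcome := range []float64{1, 1, 0} { // two successes, then a failure
+		q = emaUpdate(q, outcome)
+	}
+	fmt.Printf("blended quality: %.3f\n", q) // ends above the 0.5 prior
+}
+```
+
+With this rule, a few successful elf runs pull the arm's observed quality above the prior, which is exactly the drift `slog.Debug` in `scoreArm` should make visible.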