docs: M6/M7 close-out design spec — tool persistence, tokenizer, router feedback, coordinator

2026-04-05 21:22:26 +02:00
parent c2502a2b39
commit c556d3172f

# M6/M7 Close-out: Tool Persistence, Tokenizer, Router Feedback, Coordinator Mode
**Date:** 2026-04-05
**Milestones:** M6 (Context Intelligence), M7 (Elfs)
**Status:** Approved
---
## Context
Gnoma's M6/M7 gap audit left four items unfinished:
1. **Tool result persistence** — currently only fires for results >50K chars and writes to `.gnoma/sessions/`. The vision is every meaningful result (>1KB) persisted to `/tmp` so tools can share state across a session.
2. **Local tokenizer** — token counting uses a `len/4` heuristic. This causes compaction to fire too early or too late and makes context window sizing inaccurate.
3. **Router feedback** — `ReportOutcome` is a `slog.Debug` stub. Elf success/failure signals are captured but never used to influence arm selection.
4. **Coordinator mode** — completely unimplemented. Needed to close M7 and unblock M8/M10 coordinator work.
The `/tmp` tool result files become the shared artifact layer connecting all four: elfs write results to shared files, the coordinator discovers them via `list_results`, and the router uses elf outcomes (with result file references) for quality tracking.
---
## 1. Tool Result Persistence
### What changes
**New package:** `internal/tool/persist`
```
persist/
  store.go       -- Store type, session dir management
  store_test.go
```
**`Store` type:**
```go
type Store struct {
	dir string // /tmp/gnoma-<sessionID>/tool-results
}

func New(sessionID string) *Store
func (s *Store) Save(toolName, callID, content string) (path string, persisted bool)
func (s *Store) List(filter string) ([]ResultFile, error) // filter = glob on tool name prefix
func (s *Store) Read(path string) (string, error)         // validates path is within session dir
```
```go
type ResultFile struct {
	Path     string
	ToolName string
	CallID   string
	Size     int64
	ModTime  time.Time
}
```
**Threshold:** `len(content) >= 1024` bytes. Below this, `Save` returns `("", false)` — no file written.
**File naming:** `/tmp/gnoma-<sessionID>/tool-results/<toolName>-<callID>.txt`
- Example: `/tmp/gnoma-20260405-150405-abc123/tool-results/bash-toolu_01AbCd.txt`
**Session ID:** `<YYYYMMDD-HHMMSS>-<6 random hex chars>` — generated once at engine startup, passed into `Store.New()`.
**Inline context replacement** (what the LLM sees instead of the full result):
```
[Tool result saved: /tmp/gnoma-<session>/tool-results/<tool>-<id>.txt]
Preview (first 2000 chars):
<truncated content>
```
**Engine integration:**
- `engine.Engine` gains a `store *persist.Store` field
- `executeSingleTool` in `loop.go` replaces the `PersistLargeResult` call with `store.Save()`
- The existing `PersistLargeResult` function in `internal/context/persist.go` is retired (deleted)
- `elf.Manager` receives the `Store` and passes it to each elf's engine config
**Cleanup:** No explicit cleanup. `/tmp/gnoma-*` dirs are session-scoped; OS garbage-collects `/tmp` on reboot or via tmpwatch/systemd-tmpfiles.
### Key constraint
`Store.Read()` must validate that the requested path is prefixed with the session's tool-results dir. This prevents `read_result` from being used to traverse arbitrary filesystem paths.
---
## 2. Local Tokenizer (tiktoken-go)
### What changes
**New dependency:** `github.com/pkoukk/tiktoken-go`
**New package:** `internal/tokenizer`
```
tokenizer/
  tokenizer.go
  tokenizer_test.go
```
**`Tokenizer` type:**
```go
type Tokenizer struct {
	enc      *tiktoken.Tiktoken // nil until first use
	encoding string             // e.g. "cl100k_base"
	mu       sync.Mutex
}
func New(encoding string) *Tokenizer
func ForProvider(providerName string) *Tokenizer
func (t *Tokenizer) Count(text string) int
```
**Provider → encoding mapping:**
| Provider | Encoding |
|----------|----------|
| `anthropic` | `cl100k_base` |
| `openai` | `cl100k_base` |
| `mistral` | `o200k_base` |
| `google` | `o200k_base` |
| `ollama` | `o200k_base` |
| `llamacpp` | `o200k_base` |
| (unknown) | `cl100k_base` (fallback) |
**Lazy loading:** The encoding is loaded on the first `Count()` call behind a `sync.Mutex` check-and-initialize guard. Encoding vocab files run to roughly 2MB each; by default tiktoken-go fetches them over the network and caches them on first use, so fully offline operation requires its companion offline BPE loader package.
**Fallback:** If tiktoken initialization fails (e.g., unsupported encoding, memory pressure), `Count()` falls back to `len(text)/4` and logs a `slog.Warn` once via `sync.Once`.
### Context tracker changes
- `context.Tracker` gains a `tokenizer *tokenizer.Tokenizer` field (optional; nil → heuristic)
- `EstimateTokens(text)` replaced by `CountTokens(tok *Tokenizer, text string)` — uses tokenizer if non-nil, else heuristic
- `EstimateMessages` renamed `CountMessages`, same pattern
- Tracker initialized with tokenizer in `main.go`: `tokenizer.ForProvider(cfg.Provider.Name)`
- **Context window size fix:** `MaxTokens` set from `arm.Capabilities.ContextWindow` instead of `cfg.Provider.MaxTokens * 20`. This field is already populated for all providers.
- **Prefix token counting:** Prefix messages are counted at load time and added to the tracker's initial baseline so they're visible to compaction logic.
---
## 3. Router Feedback (Heuristic Quality Tracking)
### What changes
**New file:** `internal/router/feedback.go`
```go
type QualityTracker struct {
	mu     sync.RWMutex
	scores map[string]map[TaskType]*EMAScore // armID -> taskType -> score
}

type EMAScore struct {
	Value float64
	Count int
}
const qualityAlpha = 0.3
const minObservations = 3 // below this, fall back to heuristic-only
func NewQualityTracker() *QualityTracker
func (qt *QualityTracker) Record(armID string, taskType TaskType, success bool)
func (qt *QualityTracker) Quality(armID string, taskType TaskType) (score float64, hasData bool)
```
`Record`:
- Maps `success` to observation: `1.0` (success) or `0.0` (failure)
- EMA update: `score.Value = qualityAlpha*observation + (1-qualityAlpha)*score.Value`
- Increments `Count`
`Quality`:
- Returns `(0, false)` when `Count < minObservations`
- Returns `(score.Value, true)` otherwise
### Outcome struct extension
`router.Outcome` gains one field:
```go
type Outcome struct {
	ArmID           string
	TaskType        TaskType
	Success         bool
	Tokens          int
	Duration        time.Duration
	ResultFilePaths []string // NEW: paths to /tmp tool result files (for future M9 analysis)
}
```
The `ResultFilePaths` field is populated by the `agent`/`spawn_elfs` tools: snapshot `store.List()` before spawning the elf, snapshot again after `Wait()` returns, then diff — files present in the post-snapshot but not the pre-snapshot are attributed to that elf's run.
### Router integration
- `Router` gains a `quality *QualityTracker` field, initialized in `New()`
- `ReportOutcome` calls `qt.Record(o.ArmID, o.TaskType, o.Success)` (replaces slog.Debug stub)
- `scoreArm()` updated to blend observed and heuristic quality:
```go
hq := heuristicQuality(arm, task)
if observed, hasData := r.quality.Quality(arm.ID, task.Type); hasData {
	quality = 0.7*observed + 0.3*hq
} else {
	quality = hq
}
```
### What this does NOT include (M9)
- No Thompson Sampling / Beta distributions
- No state persistence across restarts
- No delayed attribution for orchestration tasks
- No implicit feedback (edit distance, escalation signals)
---
## 4. Coordinator Mode
### What changes
**New tools in `internal/tool/agent/`:**
`list_results.go` — `ListResultsTool` (name: `list_results`):
```go
// Parameters: filter string (optional, glob on tool name prefix, e.g. "bash*")
// Returns: formatted list of result files in the session:
// /tmp/gnoma-<session>/tool-results/bash-toolu_abc.txt [bash, 4.2KB, 15:04:05]
// /tmp/gnoma-<session>/tool-results/fs.grep-toolu_def.txt [fs.grep, 1.1KB, 15:04:12]
// IsReadOnly: true, IsDestructive: false
```
`read_result.go` — `ReadResultTool` (name: `read_result`):
```go
// Parameters: path string (required)
// Validates: path must be prefixed with store.Dir() — no path traversal
// Returns: full file content
// IsReadOnly: true, IsDestructive: false
```
Both tools receive the `*persist.Store` as a constructor argument.
**Coordinator system prompt injection** in `internal/engine/loop.go`:
When `router.ClassifyTask()` returns `TaskOrchestration`, the engine prepends a coordinator block to the request's system prompt:
```
You are operating in coordinator mode. Your role is to decompose complex work into parallel tasks and orchestrate elfs.
Rules:
- Use `spawn_elfs` to dispatch N tasks in parallel when they don't share write state.
- Use `list_results` to discover outputs produced by prior tool calls in this session.
- Pass result file paths to elfs in their prompts so they can read prior outputs with `read_result` or `fs.read`.
- Writes are serial: if two elfs would write the same file, sequence them.
- Synthesize elf outputs into a coherent final answer.
```
This prompt injection is conditional: only fires when `ClassifyTask(latestUserMessage).Type == TaskOrchestration`. It does not create a new engine mode.
**Tool registration** in `main.go`:
```go
reg.Register(agent.NewListResultsTool(store))
reg.Register(agent.NewReadResultTool(store))
```
---
## File Map
| File | Action |
|------|--------|
| `internal/tool/persist/store.go` | New |
| `internal/tool/persist/store_test.go` | New |
| `internal/tokenizer/tokenizer.go` | New |
| `internal/tokenizer/tokenizer_test.go` | New |
| `internal/router/feedback.go` | New |
| `internal/router/feedback_test.go` | New |
| `internal/tool/agent/list_results.go` | New |
| `internal/tool/agent/read_result.go` | New |
| `internal/engine/engine.go` | Modify: add `store`, `tokenizer` fields to `Config` |
| `internal/engine/loop.go` | Modify: replace `PersistLargeResult`, add coordinator prompt injection |
| `internal/context/tracker.go` | Modify: accept `*tokenizer.Tokenizer`, update `EstimateTokens` |
| `internal/context/window.go` | Modify: use `CountMessages`, fix `MaxTokens` derivation |
| `internal/context/persist.go` | Delete: retire `PersistLargeResult` / `TruncateToolResult` |
| `internal/router/router.go` | Modify: add `QualityTracker`, wire `ReportOutcome` |
| `internal/router/selector.go` | Modify: blend observed quality into `scoreArm()` |
| `internal/router/arm.go` | Modify: extend `Outcome` with `ResultFilePaths` |
| `internal/elf/manager.go` | Modify: accept and forward `*persist.Store` to elf engines |
| `cmd/gnoma/main.go` | Modify: init `Store`, `Tokenizer`, register new tools |
| `go.mod` | Modify: add `github.com/pkoukk/tiktoken-go` |
---
## Verification
1. **Persistence:** Run `echo "list all go files" | gnoma --provider anthropic` and check `/tmp/gnoma-*/tool-results/` for result files. Verify small results (<1KB) are absent, large ones present with preview in conversation.
2. **Tokenizer:** Set breakpoints or add `slog.Debug` in `Count()` to confirm tiktoken is invoked. Check that context window percentage in TUI tracks accurately against provider-reported token counts.
3. **Router feedback:** Spawn 5 elfs with a mix of successes and failures. Check that `scoreArm()` values diverge from the pure heuristic via a debug log or test. Run `go test ./internal/router/...`.
4. **Coordinator:** Send a prompt containing "orchestrate" / "coordinate" to the TUI. Verify coordinator system prompt appears in the request (add a debug log or check via provider trace). Run a multi-elf workflow where elf B references elf A's `/tmp` output.
5. **Tests:** `make test` must pass. New packages have unit tests covering `Store`, `Tokenizer`, `QualityTracker`, and the two new tools.