Apply gofmt -w across the codebase (struct field comment realignment
only — no semantic changes) and silence two errcheck warnings on
fmt.Sscanf / fmt.Fprintf return values in internal/router/discovery
with explicit `_, _ =` discards. Required so `make check` is green
before tagging v0.1.0.
When a stream errors out before producing any user-visible content
(text, thinking, or tool calls), the engine now transparently retries
on the next-best arm instead of bubbling the error to the TUI. Covers
the case from the post-SLM screenshot: subprocess CLI agents that
exit non-zero on auth/config failures, network drops mid-stream, and
rate-limited arms whose error surfaces after Stream() has already
returned.
Mechanism: the stream-create + consume blocks are wrapped in a labeled
streamLoop. On s.Err() != nil with empty accumulator, the engine emits
a new EventFailover ("↻ <failed_arm> failed (<reason>) — retrying on
another arm"), excludes the failed arm via task.ExcludedArms, and
re-enters the loop. Cap of 4 failovers per round.
Guards:
- !acc.HasContent() — if text/tool calls already streamed, fail loud
rather than duplicate visible output on retry.
- isFailoverable(err) — deny-list approach: context.Canceled,
context.DeadlineExceeded, and HTTP 400/413 are fatal; everything else
(auth, rate limit, 5xx, subprocess exit, network) is failoverable.
- Router.ForcedArm() == "" — when the user pinned an arm via --provider,
failover is disabled by design.
- failoverAttempt < maxFailovers — bounded retry budget.
TUI renders EventFailover under the existing "cost" role styling.
shortFailReason strips the subprocess wrapper envelope so the user sees
"Invalid API key. Try again." instead of
"subprocess: exit status 1: Error: Invalid API key. Try again.".
Tests cover the classifier (isFailoverable, shortFailReason), end-to-end
auth-error failover, content-already-streamed guard, and context-cancel
guard. Deterministic across 10x -race runs by giving the failing arm
IsCLIAgent=true to anchor it in tier 0 ahead of the API-tier backup.
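For illustration, the envelope stripping done by shortFailReason could
look roughly like this. The exact wrapper format and the "Error: "
prefix handling are inferred from the single example above and are
assumptions, not the shipped implementation:

```go
package main

import (
	"regexp"
	"strings"
)

// wrapperRe matches the assumed envelope "subprocess: exit status N: ".
var wrapperRe = regexp.MustCompile(`^subprocess: exit status \d+: `)

// shortFailReason trims the subprocess envelope and a leading "Error: "
// so the TUI shows only the human-readable part of the message.
func shortFailReason(msg string) string {
	msg = wrapperRe.ReplaceAllString(msg, "")
	msg = strings.TrimPrefix(msg, "Error: ")
	return strings.TrimSpace(msg)
}
```

Messages that do not carry the envelope pass through unchanged apart
from whitespace trimming.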
Closes the last remaining 2026-05-19 audit finding by documenting the
existing transitive guarantee rather than restructuring the hook
contract.
The audit observed that PostToolUse hooks receive raw tool output
before the firewall scan runs, and proposed reordering or splitting
the event into raw-local-only and redacted-for-LLM variants. After
Wave 1 (SafeProvider boundary at every router arm + non-engine
provider consumer), the audit's threat model is closed transitively:
- Shell hooks see raw output but never reach an LLM.
- Prompt hooks route Stream calls through routerStreamer → router →
arm.Provider; every arm.Provider is now a *SafeProvider, so outgoing
messages are scanned at the boundary.
- Agent hooks spawn an elf whose engine has Firewall set;
buildRequest scans inline.
Reordering would regress legitimate shell-hook use cases (audit,
forensic, local alert) that need raw access. Splitting the contract
forces every existing hook config to migrate and introduces a
wrong-variant footgun. Neither is justified by the residual risk.
Three changes ship with the ADR:
- ADR-004 records the decision and the conditions for re-opening it.
- Doc comments on hook.PostToolUse and the dispatcher call site in
the engine point at the ADR.
- internal/hook/posttooluse_redaction_test.go locks in the invariant:
a prompt PostToolUse hook firing on a secret-bearing tool result
produces a redacted prompt at the inner provider. If this test
fails, ADR-004's Position A is no longer correct and the audit
finding re-opens.
Closes the cluster of audit findings where gnoma's incognito promise
('no persistence, no learning, local-only routing') silently broke
because state was duplicated across the CLI flag, the firewall's
IncognitoMode, the router's localOnly flag, and the TUI's local
m.incognito field. Wave 2 makes security.IncognitoMode the canonical
source of truth.
W2-1 Router.Select rejects forced non-local arms when localOnly is on
rather than short-circuiting and silently routing to cloud. Main
fails fast when --incognito + --provider <cloud> are combined; the
TUI toggle (Ctrl+X, /incognito, config panel) refuses with an
actionable message when a non-local arm is pinned. Factored the
three duplicated toggle sites into Model.attemptIncognitoToggle.
W2-2 persist.Store.Save consults an IncognitoGate (local interface,
*security.IncognitoMode satisfies it). nil gate = always persist
(legacy behaviour for tests); non-nil gate is consulted on every
Save so TUI runtime toggles take effect without reconstructing the
store. File mode 0o600, dir mode 0o700.
W2-3 tui.New seeds m.incognito from cfg.Firewall.Incognito().Active().
Fixes the Ctrl+X-on-launch-with-incognito case where the first
toggle silently turned the firewall OFF because the local flag
started false out of sync with the firewall.
W2-4 saveQuality gates on both *incognito (defensive, covers the
window before fwRef.Set fires) and fw.Incognito().ShouldLearn() (so
TUI Ctrl+X suppresses the snapshot on exit). Quality restore skipped
under --incognito. Quality file written 0o600 in dir 0o700.
engine.reportOutcome and elf.Manager.ReportResult both gate on
fw.Incognito().ShouldLearn() — bandit signal no longer leaks out of
incognito sessions.
W2-5 session files written 0o600 in dirs 0o700 (was 0o644 / 0o755).
W2-6 IncognitoMode.LocalOnly dropped — dead field with no readers;
routing local-only state lives on the router, not the firewall.
Also wires rtr.SetLocalOnly(true) when --incognito at launch — main
previously activated the firewall's flag but never told the router to
filter, so even without the forced-arm bug, launching with
--incognito alone gave you 'incognito badge but full arm pool'.
Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.
Small local SLMs (<=16k context) waste ~1500 tokens per turn on the
full tool catalogue. Two-stage routing replaces round-1 tools with a
single synthetic select_category schema; round-2+ sends only the
selected category's real tool schemas plus select_category for
re-selection.
- internal/tool/category.go: Category type, optional Categorized
interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read,
fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec.
- internal/engine/twostage.go: synthetic select_category tool,
intercept helper, per-turn selectedCategory state under e.mu.
- Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to
prose. State resets at the top and end of every runLoop.
- Activates automatically on a forced local arm with ContextWindow
<=16384, or via [router].force_two_stage TOML key.
- Integration test drives a 3-round trip and asserts: round 1 emits
exactly one schema (synthetic) with ToolChoiceRequired, round 2
contains only write-category schemas + select_category, real
fs.write executes. Invalid-category fallback round-trips back to
round-1 mode.
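The category lookup and per-round schema selection can be sketched as
follows. The Tool interface, the example tool types, and toolsForRound
are hypothetical, and reading "meta fallback" as "uncategorized tools
report the meta category" is an assumption:

```go
package main

// Category groups tools for two-stage routing.
type Category string

const (
	CategoryRead   Category = "read"
	CategoryWrite  Category = "write"
	CategorySearch Category = "search"
	CategoryExec   Category = "exec"
	CategoryMeta   Category = "meta"
)

// Tool is a minimal stand-in for the real tool interface.
type Tool interface{ Name() string }

// Categorized is the optional interface a tool may implement.
type Categorized interface{ Category() Category }

// CategoryOf returns the tool's category, falling back to meta.
func CategoryOf(t Tool) Category {
	if c, ok := t.(Categorized); ok {
		return c.Category()
	}
	return CategoryMeta
}

// toolsForRound returns the tool names to advertise. With no category
// selected yet (round 1) only the synthetic selector is sent; afterwards
// the selected category's tools plus the selector for re-selection.
func toolsForRound(all []Tool, selected Category) []string {
	if selected == "" {
		return []string{"select_category"}
	}
	var names []string
	for _, t := range all {
		if CategoryOf(t) == selected {
			names = append(names, t.Name())
		}
	}
	return append(names, "select_category")
}

// fsTool and plainTool are example tools for the sketch.
type fsTool struct {
	name string
	cat  Category
}

func (t fsTool) Name() string       { return t.name }
func (t fsTool) Category() Category { return t.cat }

type plainTool struct{ name string }

func (t plainTool) Name() string { return t.name }
```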
The TUI gave no indication that an SLM was configured or active.
You'd see the primary provider on the status line and nothing else,
even with [slm].enabled=true and a successfully booted backend.
Two surfaces added:
1. Status-bar SLM badge. The left side of the status line gains a
dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗"
when it failed, and nothing when SLM is disabled. The ⚙ marker
indicates the model advertises tool support.
2. Per-turn classifier visibility. The existing routing event already
produced "routed → <arm> (task: <type>)" lines in the chat history;
it now also reports which classifier made the decision, e.g.
"routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)".
Lets you tell in real time whether the SLM is actually classifying
or falling back to the keyword heuristic.
Plumbing:
- new tui.SLMInfo struct on tui.Config
- main.go populates it after StartBackend returns
- stream.Event gains RoutingClassifier; engine.runLoop fills it from
task.ClassifierSource on the first round
Phase 4 routing decisions depend on knowing whether the SLM classifier
is actually firing or whether the heuristic is silently doing all the
work. Adds the instrumentation to make that observable.
router.ClassifierSource enum (heuristic / slm / slm_fallback) is set
on Task by every classifier:
- HeuristicClassifier → ClassifierHeuristic
- slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when
the SLM call fails or returns unparseable output
The source is plumbed through router.Outcome to QualityTracker, which
now maintains per-source counters alongside the existing per-arm × task
EMA scores. QualitySnapshot serializes both (classifier_counts is
omitempty for back-compat with pre-feature quality.json files).
lazyClassifier logs at INFO the first time it falls back to heuristic
because the SLM hasn't booted yet — distinguishes operational fallback
from an unconfigured-SLM run.
slm.Manager.Start() now records elapsed-to-healthy and the main.go
goroutine logs it as part of the "SLM ready" event. Confirms whether
short-lived runs are racing the boot cycle.
New `gnoma router stats` subcommand prints both tables (arm × task
quality, classifier source breakdown) from quality.json with a Phase 4
trust hint when the data is too sparse or the SLM share is low.
6 new tests cover ClassifierSource string/enum, heuristic + SLM source
propagation, QualityTracker counter round-trip, and back-compat
restore from a legacy quality.json without classifier_counts.
Brings the project to a clean `make lint` baseline (0 issues).
Mechanical:
- Wrap deferred resp.Body.Close() in closures (router/discovery.go,
router/probe.go) so the unchecked return surfaces as `_ = ...`.
- Apply `_ = ...` (single or multi-return blank) to test-file calls
that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir
in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send /
LoadDir in tests that assert on side effects.
Structural:
- engine.handleRequestTooLarge drops the unused req parameter and
rebuilds the request from compacted history (SA4009 — argument was
overwritten before first use).
- provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch
to tagged switches over the discriminator (QF1002).
- tui/app.go MouseWheel + inputMode and cmd/gnoma main slm-status use
tagged switches in place of equality chains (QF1003).
- cmd/gnoma main.go merges a var decl with its immediate assignment
(S1021).
- Three empty-branch sites (dispatcher_test, loader_test,
coordinator_test) become real assertions or get the dead `if` removed
(SA9003).
Adds three lightweight per-turn detectors that fire corrective user
messages back into the conversation when the model goes off the rails:
- RepetitionDetector: sliding-window scan over streamed text deltas;
trips when a 50/80/120-char pattern repeats >= 3 times in the trailing
200 chars. Breaks the active stream and injects a correction.
- PatchFailureTracker: per-path counter for fs.edit/fs.write failures;
trips on the 4th consecutive failure and steers the model to fs.write
rather than another fs.edit on the same path. Success decrements with
a floor of 0; paths are isolated.
- DetectGreeting: narrow allowlist for "how can I help" style replies;
only consulted after a round that used tools, so first-turn greetings
don't false-positive.
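A sketch of the PatchFailureTracker counting rule as described above
(per-path counter, success decrements with a floor of 0). The Record
signature and the Threshold field are assumptions:

```go
package main

// PatchFailureTracker counts edit failures per path.
type PatchFailureTracker struct {
	Threshold int            // 4 in the description above
	failures  map[string]int // keyed by path, so paths are isolated
}

// Record notes the outcome of one edit on path and reports whether the
// tracker has tripped for that path.
func (t *PatchFailureTracker) Record(path string, ok bool) bool {
	if t.failures == nil {
		t.failures = make(map[string]int)
	}
	if ok {
		if t.failures[path] > 0 {
			t.failures[path]-- // decrement, never below zero
		}
		return false
	}
	t.failures[path]++
	return t.failures[path] >= t.Threshold
}
```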
Detector state is per-turn (declared locally in runLoop), single-
goroutine use. Corrective messages are appended as user-role text to
both engine history and the context window. Telemetry: each trigger
logs at INFO with round + path where applicable.
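One way the sliding-window repetition check could work, as a sketch
only. Pattern lengths, repeat count, and window size are parameters
here rather than the values quoted above, the matching rule (the
trailing N bytes must occur MinRepeats times in the window) is an
assumption, and the code slices bytes, not runes:

```go
package main

import "strings"

// RepetitionDetector watches streamed text for a repeating tail.
type RepetitionDetector struct {
	PatternLens []int // candidate pattern lengths, in bytes
	MinRepeats  int   // occurrences needed to trip
	Window      int   // trailing bytes kept for scanning
	tail        string
}

// Feed appends a streamed delta and reports whether the trailing window
// now ends in a pattern that occurs at least MinRepeats times
// (non-overlapping) inside the window.
func (d *RepetitionDetector) Feed(delta string) bool {
	d.tail += delta
	if len(d.tail) > d.Window {
		d.tail = d.tail[len(d.tail)-d.Window:]
	}
	for _, n := range d.PatternLens {
		if n <= 0 || n > len(d.tail) {
			continue
		}
		pattern := d.tail[len(d.tail)-n:]
		if strings.Count(d.tail, pattern) >= d.MinRepeats {
			return true
		}
	}
	return false
}
```

Under this rule a pattern can only trip if MinRepeats copies of it fit
inside the window, so the window has to be sized with the longest
pattern in mind.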
Covered by 12 unit tests for the primitives and 5 loop-level integration
tests that drive the full agentic loop via the existing eventStream
mock.
Removes five unused funcs/vars/fields that golangci-lint had been
flagging (anthropic.toolCallDoneEvent, mistral.translateMessages,
hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase),
two ineffectual assignments (tui/rendering.go visible-window loop,
subprocess stream_test setup), and a stale if/HasPrefix that's now a
strings.TrimPrefix.
Handles the errcheck findings on every subprocess / stream lifecycle
path so a failed close or shutdown is at least logged rather than
silently dropped:
- engine/loop.go: stream.Close on both the error and success paths
- mcp/manager.go: Shutdown when StartAll partial-fails; Transport
close after Initialize failure
- mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout
fallback
- slm/download.go: Close propagated as a named-return error on the
success path; explicitly discarded on the rollback path
- slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go,
config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit
ignores or error logging on Close / Shutdown / WalkDir / Scanln
Production-code errcheck and ineffassign are now zero. Remaining
golangci-lint output is test-only Close-in-defer noise plus
stylistic staticcheck QF suggestions, left alone.
Engine.history, usage, activatedTools, modelCaps, turnOpts, and
cfg.Provider/Model are now mutated and read under e.mu. The lock is
released across blocking provider.Stream calls so external setters
(SetProvider, SetHistory, InjectMessage, etc.) can interleave.
History() now returns a copy. Snapshot helpers (latestUserPrompt,
historySnapshot, snapshotTurnOpts, etc.) replace the unsynchronised
reads scattered through runLoop and buildRequest.
Closes audit finding H4. Adds a race regression test that fails under
-race before the fix and passes after.
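The locking discipline can be sketched like this. Engine is reduced to
one field, and runRound and the stream callback are hypothetical
stand-ins for the real loop and provider.Stream:

```go
package main

import "sync"

// Message is a minimal stand-in for a history entry.
type Message struct{ Role, Text string }

// Engine holds only the field needed for the sketch.
type Engine struct {
	mu      sync.Mutex
	history []Message
}

// History returns a copy, so callers cannot mutate engine state
// through the returned slice.
func (e *Engine) History() []Message {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make([]Message, len(e.history))
	copy(out, e.history)
	return out
}

// InjectMessage is an external setter that may run at any time.
func (e *Engine) InjectMessage(m Message) {
	e.mu.Lock()
	e.history = append(e.history, m)
	e.mu.Unlock()
}

// runRound snapshots under the lock, makes the blocking call with the
// lock released, then re-locks (via InjectMessage) to record the reply.
func (e *Engine) runRound(stream func([]Message) Message) {
	snapshot := e.History()
	reply := stream(snapshot) // no lock held: setters can interleave
	e.InjectMessage(reply)
}
```

If the lock were held across the stream call, any setter invoked while
the provider is streaming would block until the round finished.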
- internal/router/classifier.go: TaskClassifier interface with
Classify(ctx, prompt, history) signature. HeuristicClassifier wraps
the existing ClassifyTask() with zero behavior change.
- engine.Config.Classifier: injectable TaskClassifier; nil defaults
to HeuristicClassifier. Engine.classify() helper handles nil + error
fallback transparently.
- loop.go: all four router.ClassifyTask() call sites replaced with
e.classify(ctx, prompt). SLMClassifier slots in without further
changes to the engine.
- Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools
- Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation
- Bash is denied outright when AllowedPaths is active (unparseable command args)
- fs tools with empty path (cwd default) resolved via os.Getwd() and validated
- Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go)
- Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete
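A rough sketch of the containment check behind AllowedPaths.
pathAllowed and toolAllowed are hypothetical helpers; the real check
may differ, and this version does not resolve symlinks:

```go
package main

import (
	"path/filepath"
	"strings"
)

// pathAllowed reports whether path lies inside one of the allowed
// roots. filepath.Abs resolves an empty path against the working
// directory, matching the cwd-default case above.
func pathAllowed(path string, allowed []string) bool {
	abs, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	for _, root := range allowed {
		r, err := filepath.Abs(root)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(r, abs)
		if err != nil {
			continue
		}
		escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if !escapes {
			return true
		}
	}
	return false
}

// toolAllowed applies the per-turn restriction to one tool call.
// paths is what the tool's ExtractPaths would return.
func toolAllowed(name string, paths, allowed []string) bool {
	if len(allowed) == 0 {
		return true // restriction not active
	}
	if name == "bash" {
		return false // command arguments cannot be parsed for paths
	}
	for _, p := range paths {
		if !pathAllowed(p, allowed) {
			return false
		}
	}
	return true
}
```

Comparing via filepath.Rel rather than a string prefix avoids treating
/work/skillful as being inside /work/skill.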
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
used API format (fs_ls) — now re-sanitized in translateMessage
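The json.Valid completeness check from the first bullet can be sketched
as below. The accumulator type is hypothetical, and the sketch assumes
tool arguments are always a JSON object:

```go
package main

import (
	"encoding/json"
	"strings"
)

// toolCallAccumulator collects streamed argument fragments for one call.
type toolCallAccumulator struct {
	args strings.Builder
}

// Complete reports whether the buffered args form a whole JSON value.
// A lone "{" is not valid JSON, so later deltas are still accepted.
func (a *toolCallAccumulator) Complete() bool {
	return a.args.Len() > 0 && json.Valid([]byte(a.args.String()))
}

// Add appends a delta unless the args are already complete, which also
// drops a repeated full-args chunk sent after a complete first chunk.
func (a *toolCallAccumulator) Add(delta string) {
	if a.Complete() {
		return
	}
	a.args.WriteString(delta)
}

// Args returns the accumulated argument string.
func (a *toolCallAccumulator) Args() string { return a.args.String() }
```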
Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
(Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
- Split app.go (2091→1378 lines) into rendering.go, events.go, init.go
- Add EventRouting stream event for router arm transparency
- Add session auto-naming from first user message
- Add context window progress bar in status bar
- Add /keys cheatsheet, /replay for resumed sessions
- Add inline cost-per-turn after assistant responses
- Add diff previews in fs.write/fs.edit permission prompts
- Collapse tool output to 3 lines by default (ctrl+o expands)
- Use AddPrefix for system context instead of InjectMessage
- Handle ContentThinking and ContentToolResult in session resume
- Show session title in resume picker
- Add /model numeric selection snapshot safety
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
Gap 11 (M6): Fixed context prefix
- Window.PrefixMessages stores immutable docs (CLAUDE.md, .gnoma/GNOMA.md)
- Prefix stripped before compaction, prepended after — survives all compaction
- AllMessages() returns prefix + history for provider requests
- main.go loads CLAUDE.md and .gnoma/GNOMA.md at startup as prefix
Gap 12 (M6): Deferred tool loading
- DeferrableTool optional interface: ShouldDefer() bool
- buildRequest() skips deferred tools until activated
- Tools auto-activate on first model request (activatedTools map)
- agent + spawn_elfs marked as deferrable (large schemas, rarely needed early)
- Saves ~800 tokens per deferred tool per request
Gap 13 (M6): Pre/post compact hooks
- OnPreCompact/OnPostCompact callbacks in WindowConfig
- Called in doCompact() (shared by CompactIfNeeded + ForceCompact)
- M8 hooks system will extend these to full protocol
Engine retries transient errors (429, 5xx) up to 4 times with
1s/2s/4s/8s backoff. Respects Retry-After header from provider.
Batch tool staggers elf spawns by 300ms to avoid rate limit bursts
when all elfs hit the API simultaneously (Mistral's 1 req/s limit).
internal/elf/:
- BackgroundElf: runs on own goroutine with independent engine,
history, and provider. No shared mutable state.
- Manager: spawns elfs via router.Select() (picks best arm per
task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup().
internal/tool/agent/:
- Agent tool: LLM can call 'agent' to spawn sub-agents.
Supports task_type hint for routing, wait/background mode.
5-minute timeout, context cancellation propagated.
Concurrent tool execution:
- Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in
parallel via goroutines.
- Write tools (bash, fs.write, fs.edit) execute sequentially.
- Partition by tool.IsReadOnly().
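A sketch of the read/write partition. Call and execute are
hypothetical, and running the read-only batch before the sequential
writes is an assumption about ordering that the notes above do not
state:

```go
package main

import "sync"

// Call is a minimal stand-in for one requested tool invocation.
type Call struct {
	Name     string
	ReadOnly bool // what tool.IsReadOnly() would report
	Run      func() string
}

// execute runs read-only calls concurrently and the rest one at a
// time, returning results in the original call order.
func execute(calls []Call) []string {
	results := make([]string, len(calls))

	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.ReadOnly {
			continue
		}
		wg.Add(1)
		go func(i int, c Call) {
			defer wg.Done()
			results[i] = c.Run() // each goroutine writes its own slot
		}(i, c)
	}
	wg.Wait()

	for i, c := range calls {
		if c.ReadOnly {
			continue
		}
		results[i] = c.Run() // writes stay strictly sequential
	}
	return results
}
```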
TUI: /elf command explains how to use sub-agents.
5 elf tests. Exit criteria: parent spawns 3 background elfs on
different providers, collects and synthesizes results.
SummarizeStrategy: calls LLM to condense older messages into a
summary, preserving key decisions, file changes, tool outputs.
Falls back to truncation on failure. Keeps 6 recent messages.
Tool result persistence: outputs >50K chars saved to disk at
.gnoma/sessions/tool-results/{id}.txt with 2K preview inline.
TUI: /compact command for manual compaction, /clear now resets
engine history. Summarize strategy used by default (with
truncation fallback).
Tools now go through permission.Checker before executing:
- plan mode: denies all writes (fs.write, bash), allows reads
- bypass mode: allows all (deny rules still enforced)
- default mode: prompts user (pipe: stdin prompt, TUI: auto-approve for now)
- accept_edits: auto-allows file ops, prompts for bash
- deny mode: denies everything not covered by an allow rule
CLI flags: --permission <mode>, --incognito
Pipe mode: console Y/N prompt on stderr
TUI mode: auto-approve (proper overlay TODO)
Verified: plan mode correctly blocks fs.write, model sees error.
- Fixed: chat content no longer overflows past allocated height.
Lines are measured for physical width and hard-truncated to
exactly the chat area height. Input + status bar always visible.
- Header scrolls with chat (not pinned), only input/status fixed
- Git branch in status bar (green, via git rev-parse)
- Alt screen mode — terminal scrollback disabled
- Mouse wheel + PgUp/PgDown scroll within TUI
- New EventToolResult: tool output as dimmed indented block
- Separator lines above/below input, no status bar backgrounds