- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
heuristic baseline blended so Priority/RequiredEffort are never zeroed,
extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
- internal/router/classifier.go: TaskClassifier interface with
Classify(ctx, prompt, history) signature. HeuristicClassifier wraps
the existing ClassifyTask() with zero behavior change.
- engine.Config.Classifier: injectable TaskClassifier; nil defaults
to HeuristicClassifier. Engine.classify() helper handles nil + error
fallback transparently.
- loop.go: all four router.ClassifyTask() call sites replaced with
e.classify(ctx, prompt). SLMClassifier slots in without further
changes to the engine.
Adds explicit tier preference to arm selection so the router
deterministically prefers lower-cost arms before falling back:
tier 0: CLI agents (IsCLIAgent=true, subprocess/claude|gemini|vibe)
tier 1: local models (IsLocal=true, ollama/llamacpp)
tier 2: API providers (everything else)
Within a tier, quality/cost scoring still applies. filterFeasible still
gates on quality thresholds, so a low-quality local arm won't beat a
high-quality API arm when the task's minimum threshold rules it out.
Also adds Arm.Disabled: arms with Disabled=true are excluded from
auto-routing but remain selectable via ForceArm.
Implementation: armTier helper + selectBest refactored to try tiers in
order, bestScored picks within a tier. router.Select skips disabled arms
in allArms collection (forced arm bypasses disable check).
Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning
control, replacing the Capabilities.Thinking bool. Each provider maps
the level to its native parameter: Anthropic budget tokens (1K/8K/16K),
OpenAI reasoning_effort (low/medium/high), Google thinking budget
(1K/8K/16K). Task classification auto-infers effort from TaskType and
complexity; filterFeasible excludes arms that lack the required level.
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
used API format (fs_ls) — now re-sanitized in translateMessage
Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
(Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.
Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
atomically renames them when discovery reveals the real identity,
with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop
Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".
Quality feedback integration: TestQualityTracker_InfluencesArmSelection
verifies that 5 successes vs 5 failures tips Router.Select() to the
high-quality arm once EMA has enough observations. Companion test
confirms heuristic fallback below minObservations.
Coordinator tests expanded from 2 → 5: added guidance content check
(parallel/serial/synthesize present), false-positive table extended with
7 cases including the reordered keywords from the previous fix.
Agent tool suite: tool interface contracts for all four tools (Name,
Description, Parameters validity, IsReadOnly). Extracted duplicated
2000-char truncation into truncateOutput() helper (format.go), removing
the inline copies in agent.go and batch.go. Four boundary tests cover
empty, short, exact-max, and over-max cases.
Operational task types (debug, review, refactor, test, explain) now gate
before orchestration in the keyword cascade. Previously, prompts like
"review the orchestration layer" or "refactor the pipeline dispatch"
matched "orchestrat"/"dispatch" and misclassified as TaskOrchestration.
Planning is also moved below the operational types.
Expanded orchestration keywords to cover common intent that the original
four keywords missed: "fan out", "subtask", "delegate to", "spawn elf".
Adds regression tests for false-positive cases and positive tests for new
keywords.
provider/openai:
- Fix doubled tool call args (argsComplete flag): Ollama sends complete
args in the first streaming chunk then repeats them as delta, causing
doubled JSON and 400 errors in elfs
- Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep)
- Add Reasoning field support for Ollama thinking output
cmd/gnoma:
- Early TTY detection so logger is created with correct destination
before any component gets a reference to it (fixes slog WARN bleed
into TUI textarea)
permission:
- Exempt spawn_elfs and agent tools from safety scanner: elf prompt
text may legitimately mention .env/.ssh/credentials patterns and
should not be blocked
tui/app:
- /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge
(ask for plain text output) → TUI fallback write from streamBuf
- looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback
content before writing (reject refusals, strip narrative preambles)
- Collapse thinking output to 3 lines; ctrl+o to expand (live stream
and committed messages)
- Stream-level filter for model pseudo-tool-call blocks: suppresses
<<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|>
from entering streamBuf across chunk boundaries
- sanitizeAssistantText regex covers both block formats
- Reset streamFilterClose at every turn start
At startup, polls ollama (/api/tags) and llama.cpp (/v1/models) for
available models. Registers each as an arm in the router alongside
the CLI-specified provider.
Discovered: 7 ollama models + 1 llama.cpp model = 9 total arms.
Router can now select from multiple local models based on task type.
Discovery is non-blocking — failures logged and skipped.