a14fe8b504
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:
1. llamafile cold-start kept the SLM out of pipe-mode runs (which
always finish before the 15 s health check does)
2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
(ToolUse=false) from 9/10 task types
3. armTier hard-coded CLI agents > local > API, so even when the SLM
arm was feasible a CLI agent won
Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.
Backend layer (the bigger change)
The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:
- ollama (any local Ollama daemon)
- llamacpp (any llama.cpp server)
- llamafile (gnoma-managed, current behaviour)
- openaicompat (LM Studio, vLLM, remote API)
- auto (probes in order, picks first reachable)
- disabled
[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
Trivial-prompt heuristic (Gate 2)
ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
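
The rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real ClassifyTask: the helper name requiresTools is hypothetical, the 12-word cut-off is taken from the comment in the accompanying test file, the tokens "list" and "diff" are guesses added to match the test prompts, and the complexity score is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// TaskType mirrors a few of the router's task types (subset, for the sketch).
type TaskType int

const (
	TaskExplain TaskType = iota
	TaskGeneration
	TaskBoilerplate
	TaskDebug
)

// toolTokens lists words that imply file or shell access. The first five come
// from the commit message; "list" and "diff" are illustrative additions.
var toolTokens = map[string]bool{
	"read": true, "write": true, "run": true, "test": true, "file": true,
	"list": true, "diff": true,
}

// maxTrivialWords is the assumed length limit for a "short" prompt.
const maxTrivialWords = 12

// requiresTools reports whether a prompt should keep RequiresTools=true.
func requiresTools(prompt string, tt TaskType) bool {
	switch tt {
	case TaskExplain, TaskGeneration, TaskBoilerplate:
		// These types do not imply existing code; keep checking below.
	default:
		return true
	}
	words := strings.Fields(strings.ToLower(prompt))
	if len(words) > maxTrivialWords {
		return true
	}
	for _, w := range words {
		w = strings.Trim(w, ".,;:!?\"'()")
		// Match the token itself or a simple plural ("tests", "files").
		if toolTokens[w] || toolTokens[strings.TrimSuffix(w, "s")] {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"what is a closure", "run tests", "list files in /tmp"} {
		fmt.Printf("%q -> requiresTools=%v\n", p, requiresTools(p, TaskExplain))
	}
}
```

Whole-word matching is deliberate here: a substring match on "run" would also fire on "runtime" and push harmless questions back to the tool-capable arms.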
Complexity-aware tier ordering (Gate 3)
armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
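
A rough sketch of the ordering, assuming simplified Arm and Task types that are not the router's real definitions. In particular, treating MaxComplexity == 0 as "no ceiling" and using a float complexity score are assumptions made for the example; feasible and pick are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

type ArmKind int

const (
	KindCLI ArmKind = iota
	KindLocal
	KindAPI
)

// Arm is a simplified routing arm. MaxComplexity == 0 means "no ceiling"
// in this sketch only.
type Arm struct {
	Name          string
	Kind          ArmKind
	MaxComplexity float64
}

// Task carries just the complexity score needed for tiering.
type Task struct {
	Complexity float64
}

// feasible drops arms whose ceiling is below the task's complexity.
func feasible(arms []Arm, t Task) []Arm {
	var out []Arm
	for _, a := range arms {
		if a.MaxComplexity > 0 && t.Complexity > a.MaxComplexity {
			continue
		}
		out = append(out, a)
	}
	return out
}

// armTier returns 0 for a capped arm whose ceiling fits the task, otherwise
// the original CLI > local > API ordering shifted down by one. Lower wins.
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.Complexity <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLI:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// pick returns the feasible arm with the lowest tier.
func pick(arms []Arm, t Task) (Arm, bool) {
	cands := feasible(arms, t)
	if len(cands) == 0 {
		return Arm{}, false
	}
	sort.SliceStable(cands, func(i, j int) bool {
		return armTier(cands[i], t) < armTier(cands[j], t)
	})
	return cands[0], true
}

func main() {
	arms := []Arm{
		{Name: "cli-agent", Kind: KindCLI},
		{Name: "slm", Kind: KindLocal, MaxComplexity: 0.3},
		{Name: "api", Kind: KindAPI},
	}
	for _, c := range []float64{0.1, 0.8} {
		a, _ := pick(arms, Task{Complexity: c})
		fmt.Printf("complexity %.1f -> %s\n", c, a.Name)
	}
}
```

Keeping the exclusion in feasible and the preference in armTier means the SLM can only ever win tasks it is allowed to take; the tier function never has to reason about arms that were already filtered out.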
Eager boot with user-facing wait (Gate 1)
Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.
waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
Classifier reliability
Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
Probe + telemetry surfaces
`gnoma slm status` now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.
`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.
Tests
- 9 new backend-factory tests (httptest-backed Ollama probe, error
paths, auto-detection, capability flags)
- Tier-ordering tests cover the new "specialised small arm wins
trivial task" path
- Trivial-prompt heuristic tested for both halves (knowledge-only
flips RequiresTools=false; debug/file/run keeps it true)
Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
93 lines
2.6 KiB
Go
package router

import "testing"

func TestParseTaskType(t *testing.T) {
	cases := []struct {
		input string
		want  TaskType
	}{
		{"Debug", TaskDebug},
		{"debug", TaskDebug},
		{"DEBUG", TaskDebug},
		{"Explain", TaskExplain},
		{"explain", TaskExplain},
		{"Generation", TaskGeneration},
		{"generation", TaskGeneration},
		{"Refactor", TaskRefactor},
		{"refactor", TaskRefactor},
		{"UnitTest", TaskUnitTest},
		{"unit_test", TaskUnitTest},
		{"unitTest", TaskUnitTest},
		{"Boilerplate", TaskBoilerplate},
		{"boilerplate", TaskBoilerplate},
		{"Planning", TaskPlanning},
		{"planning", TaskPlanning},
		{"Orchestration", TaskOrchestration},
		{"orchestration", TaskOrchestration},
		{"SecurityReview", TaskSecurityReview},
		{"security_review", TaskSecurityReview},
		{"Review", TaskReview},
		{"review", TaskReview},
		// unknown falls back to TaskGeneration
		{"", TaskGeneration},
		{"unknown", TaskGeneration},
		{"gibberish", TaskGeneration},
	}

	for _, tc := range cases {
		got := ParseTaskType(tc.input)
		if got != tc.want {
			t.Errorf("ParseTaskType(%q) = %s, want %s", tc.input, got, tc.want)
		}
	}
}

func TestClassifyTask_TrivialPromptDropsRequiresTools(t *testing.T) {
	// Short, knowledge-only prompts must opt out of RequiresTools so the
	// SLM arm (ToolUse=false) is feasible.
	cases := []struct {
		prompt string
	}{
		{"what is 2+2"},
		{"what is a closure"},
		{"explain a goroutine"},
		{"how does map work"},
	}
	for _, tc := range cases {
		got := ClassifyTask(tc.prompt)
		if got.RequiresTools {
			t.Errorf("ClassifyTask(%q).RequiresTools = true, want false (trivial prompt)", tc.prompt)
		}
	}
}

func TestClassifyTask_FileVerbsKeepRequiresTools(t *testing.T) {
	// Even short prompts must keep RequiresTools=true when they reference
	// a file/shell action — those genuinely need tool execution.
	cases := []struct {
		prompt string
	}{
		{"read /etc/hosts"},
		{"list files in /tmp"},
		{"run tests"},
		{"show me the diff"},
	}
	for _, tc := range cases {
		got := ClassifyTask(tc.prompt)
		if !got.RequiresTools {
			t.Errorf("ClassifyTask(%q).RequiresTools = false, want true (file/shell verb)", tc.prompt)
		}
	}
}

func TestClassifyTask_LongPromptsKeepRequiresTools(t *testing.T) {
	// Prompts longer than 12 words shouldn't be treated as trivial even
	// if they don't reference file actions explicitly.
	prompt := "I want you to think through how a generic interpreter handles closures across multiple call sites and many layers of indirection"
	got := ClassifyTask(prompt)
	if !got.RequiresTools {
		t.Errorf("long prompt got RequiresTools=false; want true")
	}
}