gnoma/internal/router/task_test.go
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start kept the SLM out of pipe-mode runs (which
     always finished before the 15 s health check did)
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
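
A minimal sketch of the "auto" selection described above, assuming a
hypothetical candidate type with a probe callback. The real factory in
internal/slm/backend.go may be shaped differently, and the probe order
used here is only an illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable simulates a backend whose endpoint does not answer.
var errUnreachable = errors.New("connection refused")

// candidate is a hypothetical stand-in for one backend the factory can try.
// probe reports whether the backend is reachable and, if it is, whether the
// model behind it supports tool calls.
type candidate struct {
	name  string
	probe func() (toolUse bool, err error)
}

// pickAuto tries the candidates in order and returns the first one whose
// probe succeeds, together with the ToolUse flag for the SLM arm.
func pickAuto(candidates []candidate) (name string, toolUse bool, err error) {
	lastErr := errors.New("no candidates configured")
	for _, c := range candidates {
		tu, perr := c.probe()
		if perr != nil {
			lastErr = fmt.Errorf("%s: %w", c.name, perr)
			continue
		}
		return c.name, tu, nil
	}
	return "", false, fmt.Errorf("no SLM backend reachable: %w", lastErr)
}

func main() {
	// Assumed order for illustration only.
	candidates := []candidate{
		{"ollama", func() (bool, error) { return false, errUnreachable }},
		{"llamacpp", func() (bool, error) { return true, nil }},
		{"llamafile", func() (bool, error) { return false, nil }},
	}
	name, toolUse, err := pickAuto(candidates)
	fmt.Println(name, toolUse, err)
}
```

In this sketch the capability flag travels with the probe result, so a
completion-only model yields an arm with ToolUse=false without any
extra configuration.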

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
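
The heuristic can be sketched roughly as follows. This is not the
ClassifyTask implementation: requiresTools is a hypothetical helper,
the token list holds only the tokens named above, the 12-word cutoff
is borrowed from the test comment in task_test.go, and the complexity
check is left out:

```go
package main

import (
	"fmt"
	"strings"
)

// Task types that do not imply existing code (from the commit message).
var knowledgeOnlyTypes = map[string]bool{
	"Explain": true, "Generation": true, "Boilerplate": true,
}

// Partial token list: only the tokens the commit message names explicitly.
var toolTokens = map[string]bool{
	"read": true, "write": true, "run": true, "test": true, "file": true,
}

// maxTrivialWords is an assumed cutoff taken from the test comment.
const maxTrivialWords = 12

// requiresTools returns false only for short prompts of a knowledge-only
// task type that contain none of the tool-needing tokens.
func requiresTools(taskType, prompt string) bool {
	if !knowledgeOnlyTypes[taskType] {
		return true
	}
	words := strings.Fields(strings.ToLower(prompt))
	if len(words) > maxTrivialWords {
		return true
	}
	for _, w := range words {
		if toolTokens[strings.Trim(w, ".,;:!?\"'()")] {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"what is a closure", "read /etc/hosts", "run tests"} {
		fmt.Printf("%q -> RequiresTools=%v\n", p, requiresTools("Explain", p))
	}
}
```

The default stays RequiresTools=true; the sketch only opts out when
every condition for a trivial prompt holds.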

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
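
One way to read the ordering above, as a sketch. Arm, Task, ArmKind,
feasible and pick are hypothetical names, and treating MaxComplexity=0
as "no ceiling" is an assumption made for the example:

```go
package main

import "fmt"

type ArmKind int

const (
	KindCLI ArmKind = iota
	KindLocal
	KindAPI
)

// Arm and Task are simplified stand-ins for the router's types.
type Arm struct {
	Name          string
	Kind          ArmKind
	MaxComplexity float64 // assumption: 0 means "no ceiling"
}

type Task struct {
	Complexity float64
}

// feasible drops arms whose ceiling is below the task's complexity.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.Complexity <= a.MaxComplexity
}

// armTier returns 0 for an arm with a ceiling that fits the task, and
// otherwise falls back to the fixed CLI (1) > local (2) > API (3) order.
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.Complexity <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLI:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// pick returns the feasible arm with the lowest tier.
func pick(arms []Arm, t Task) (Arm, bool) {
	best, found := Arm{}, false
	for _, a := range arms {
		if !feasible(a, t) {
			continue
		}
		if !found || armTier(a, t) < armTier(best, t) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	arms := []Arm{
		{Name: "cli-agent", Kind: KindCLI},
		{Name: "slm", Kind: KindLocal, MaxComplexity: 0.3},
		{Name: "api", Kind: KindAPI},
	}
	trivial, _ := pick(arms, Task{Complexity: 0.1})
	complex, _ := pick(arms, Task{Complexity: 0.8})
	fmt.Println(trivial.Name, complex.Name)
}
```

With these inputs the SLM arm takes the trivial task, and for the
complex task it is filtered out before tiering, so the CLI agent wins
as before.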

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
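
A sketch of a health wait bounded by the caller's context rather than
a fixed ceiling. The signature, the check callback and the polling
interval are assumptions for illustration, not the actual
waitHealthy():

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitHealthy polls check until it succeeds or ctx is done. The only
// time limit is whatever deadline the caller put on ctx.
func waitHealthy(ctx context.Context, check func(context.Context) error, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		err := check(ctx)
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("SLM not healthy: %w (last probe: %v)", ctx.Err(), err)
		case <-ticker.C:
			// Try again on the next tick.
		}
	}
}

// demoWait runs waitHealthy against a fake probe that starts succeeding
// on the given attempt, with the given timeout in milliseconds.
func demoWait(succeedOnAttempt, timeoutMs int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	attempts := 0
	check := func(context.Context) error {
		attempts++
		if attempts >= succeedOnAttempt {
			return nil
		}
		return errors.New("not ready")
	}
	return waitHealthy(ctx, check, 5*time.Millisecond)
}

func main() {
	fmt.Println(demoWait(3, 1000))     // becomes healthy on the third probe
	fmt.Println(demoWait(1000000, 50)) // deadline expires first
}
```

A caller would derive ctx from [slm].startup_timeout, so a short
timeout fails fast instead of waiting out a hardcoded limit.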

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.
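
How a JSON-contract failure might be absorbed, as a sketch only. The
JSON field names, the classify signature and the exact meaning of the
slm / slm_fallback / heuristic source labels are assumptions here, not
taken from the implementation:

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// classifierTimeout mirrors the 5 s value from the commit message.
const classifierTimeout = 5 * time.Second

// classification uses hypothetical JSON field names.
type classification struct {
	TaskType      string `json:"task_type"`
	RequiresTools bool   `json:"requires_tools"`
}

type slmFunc func(ctx context.Context, prompt string) (string, error)

// classify asks the SLM first and falls back to the heuristic when the
// call fails, times out, or returns something that is not the expected JSON.
// The second return value is the assumed classifier-source label.
func classify(parent context.Context, prompt string, slm slmFunc, heuristic func(string) classification) (classification, string) {
	if slm == nil {
		return heuristic(prompt), "heuristic"
	}
	ctx, cancel := context.WithTimeout(parent, classifierTimeout)
	defer cancel()
	raw, err := slm(ctx, prompt)
	if err == nil {
		var c classification
		if json.Unmarshal([]byte(raw), &c) == nil && c.TaskType != "" {
			return c, "slm"
		}
	}
	return heuristic(prompt), "slm_fallback"
}

// demoSource returns the source label for a canned SLM reply.
func demoSource(reply string, fail bool) string {
	slm := func(context.Context, string) (string, error) {
		if fail {
			return "", errors.New("timeout")
		}
		return reply, nil
	}
	heuristic := func(string) classification {
		return classification{TaskType: "Generation", RequiresTools: true}
	}
	_, source := classify(context.Background(), "what is a closure", slm, heuristic)
	return source
}

func main() {
	fmt.Println(demoSource(`{"task_type":"Explain","requires_tools":false}`, false))
	fmt.Println(demoSource("Sure! Here is the JSON you asked for", false))
	fmt.Println(demoSource("", true))
}
```

The point of the design is that a chatty or malformed reply costs one
fallback to the heuristic rather than a failed classification.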

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` already shows the classifier-source mix (added in
the previous commit); with this change the slm / slm_fallback /
heuristic split should move from "always heuristic" to something that
reflects real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00

93 lines
2.6 KiB
Go

package router

import "testing"

// TestParseTaskType checks that parsing is case-insensitive, accepts the
// snake_case and camelCase spellings listed below, and falls back to
// TaskGeneration for unknown input.
func TestParseTaskType(t *testing.T) {
	cases := []struct {
		input string
		want  TaskType
	}{
		{"Debug", TaskDebug},
		{"debug", TaskDebug},
		{"DEBUG", TaskDebug},
		{"Explain", TaskExplain},
		{"explain", TaskExplain},
		{"Generation", TaskGeneration},
		{"generation", TaskGeneration},
		{"Refactor", TaskRefactor},
		{"refactor", TaskRefactor},
		{"UnitTest", TaskUnitTest},
		{"unit_test", TaskUnitTest},
		{"unitTest", TaskUnitTest},
		{"Boilerplate", TaskBoilerplate},
		{"boilerplate", TaskBoilerplate},
		{"Planning", TaskPlanning},
		{"planning", TaskPlanning},
		{"Orchestration", TaskOrchestration},
		{"orchestration", TaskOrchestration},
		{"SecurityReview", TaskSecurityReview},
		{"security_review", TaskSecurityReview},
		{"Review", TaskReview},
		{"review", TaskReview},
		// unknown falls back to TaskGeneration
		{"", TaskGeneration},
		{"unknown", TaskGeneration},
		{"gibberish", TaskGeneration},
	}
	for _, tc := range cases {
		got := ParseTaskType(tc.input)
		if got != tc.want {
			t.Errorf("ParseTaskType(%q) = %s, want %s", tc.input, got, tc.want)
		}
	}
}

func TestClassifyTask_TrivialPromptDropsRequiresTools(t *testing.T) {
	// Short, knowledge-only prompts must opt out of RequiresTools so the
	// SLM arm (ToolUse=false) is feasible.
	cases := []struct {
		prompt string
	}{
		{"what is 2+2"},
		{"what is a closure"},
		{"explain a goroutine"},
		{"how does map work"},
	}
	for _, tc := range cases {
		got := ClassifyTask(tc.prompt)
		if got.RequiresTools {
			t.Errorf("ClassifyTask(%q).RequiresTools = true, want false (trivial prompt)", tc.prompt)
		}
	}
}

func TestClassifyTask_FileVerbsKeepRequiresTools(t *testing.T) {
	// Even short prompts must keep RequiresTools=true when they reference
	// a file/shell action — those genuinely need tool execution.
	cases := []struct {
		prompt string
	}{
		{"read /etc/hosts"},
		{"list files in /tmp"},
		{"run tests"},
		{"show me the diff"},
	}
	for _, tc := range cases {
		got := ClassifyTask(tc.prompt)
		if !got.RequiresTools {
			t.Errorf("ClassifyTask(%q).RequiresTools = false, want true (file/shell verb)", tc.prompt)
		}
	}
}

func TestClassifyTask_LongPromptsKeepRequiresTools(t *testing.T) {
	// Prompts longer than 12 words shouldn't be treated as trivial even
	// if they don't reference file actions explicitly.
	prompt := "I want you to think through how a generic interpreter handles closures across multiple call sites and many layers of indirection"
	got := ClassifyTask(prompt)
	if !got.RequiresTools {
		t.Errorf("long prompt got RequiresTools=false; want true")
	}
}