gnoma/internal/router/task.go
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile's cold start locked out pipe-mode runs, which always
     finished before the 15 s health check did
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
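The whole-word rule for tool-needing tokens is easiest to see next to plain substring matching. The sketch copies a trimmed subset of `toolNeedingTokens` and the word loop from isTrivialPrompt; `toolTokens`, `substringHit` and `wholeWordHit` are illustrative helpers, not functions in gnoma. For example, "generate a haiku about profiles" is classified as Generation and passes the trivial-prompt override, whereas a substring check would have kept RequiresTools=true because "profiles" contains "file". The type switch in ClassifyTask itself still uses substring matching (containsAny), so the whole-word rule only applies to the override.

```go
package main

import (
	"fmt"
	"strings"
)

// toolTokens is a trimmed copy of toolNeedingTokens, for illustration only.
var toolTokens = map[string]bool{"read": true, "file": true, "run": true, "test": true, "tests": true}

// substringHit mimics containsAny-style matching.
func substringHit(lower string) bool {
	for tok := range toolTokens {
		if strings.Contains(lower, tok) {
			return true
		}
	}
	return false
}

// wholeWordHit mirrors the loop in isTrivialPrompt: split on whitespace,
// trim surrounding punctuation, then look each word up exactly.
func wholeWordHit(lower string) bool {
	for _, w := range strings.Fields(lower) {
		if toolTokens[strings.Trim(w, ".,!?;:")] {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"generate a haiku about profiles",
		"what is a tester?",
		"run the tests.",
	} {
		fmt.Printf("%-34q substring=%-5v whole-word=%v\n", p, substringHit(p), wholeWordHit(p))
	}
}
```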

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.

Probe + telemetry surfaces

`gnoma slm status` now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` (added in the previous commit) already shows the
classifier-source mix; with this change the slm / slm_fallback share
finally moves: the mix goes from "always heuristic" to something that
reflects real SLM activity.
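The share itself is plain counting over the ClassifierSource string values ("heuristic", "slm", "slm_fallback"). A sketch with made-up sample data, assuming the per-task sources are available as a list; how `gnoma router stats` actually stores and aggregates them is not shown here, and `sourceShare` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// sourceShare turns per-task classifier sources into percentage shares.
func sourceShare(sources []string) map[string]float64 {
	counts := map[string]int{}
	for _, s := range sources {
		counts[s]++
	}
	share := make(map[string]float64, len(counts))
	for s, n := range counts {
		share[s] = 100 * float64(n) / float64(len(sources))
	}
	return share
}

func main() {
	// Illustrative sample, not real telemetry.
	observed := []string{"slm", "slm", "heuristic", "slm_fallback", "slm", "heuristic", "slm", "slm"}
	share := sourceShare(observed)

	keys := make([]string, 0, len(share))
	for k := range share {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%-13s %5.1f%%\n", k, share[k])
	}
}
```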

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00

353 lines
11 KiB
Go

package router

import (
	"fmt"
	"strings"

	"somegit.dev/Owlibou/gnoma/internal/provider"
)

// TaskType classifies a task for routing purposes.
type TaskType int

const (
	TaskBoilerplate    TaskType = iota // simple scaffolding, templates
	TaskGeneration                     // new code creation
	TaskRefactor                       // restructuring existing code
	TaskReview                         // code review, analysis
	TaskUnitTest                       // writing tests
	TaskPlanning                       // architecture, design
	TaskOrchestration                  // multi-step coordination
	TaskSecurityReview                 // security-focused analysis
	TaskDebug                          // finding and fixing bugs
	TaskExplain                        // explaining code or concepts
)

func (t TaskType) String() string {
	switch t {
	case TaskBoilerplate:
		return "boilerplate"
	case TaskGeneration:
		return "generation"
	case TaskRefactor:
		return "refactor"
	case TaskReview:
		return "review"
	case TaskUnitTest:
		return "unit_test"
	case TaskPlanning:
		return "planning"
	case TaskOrchestration:
		return "orchestration"
	case TaskSecurityReview:
		return "security_review"
	case TaskDebug:
		return "debug"
	case TaskExplain:
		return "explain"
	default:
		return fmt.Sprintf("unknown(%d)", t)
	}
}

// Priority indicates task importance for routing decisions.
type Priority int

const (
	PriorityLow Priority = iota
	PriorityNormal
	PriorityHigh
	PriorityCritical
)

// ClassifierSource identifies which classifier produced a Task.
// Phase 4 routing decisions depend on knowing whether the SLM is actually
// firing or whether the heuristic is silently doing all the work.
type ClassifierSource int

const (
	ClassifierUnknown     ClassifierSource = iota // unset / pre-classification
	ClassifierHeuristic                           // router.HeuristicClassifier
	ClassifierSLM                                 // slm.Classifier (SLM call succeeded)
	ClassifierSLMFallback                         // slm.Classifier fell back internally (timeout, parse error)
)

func (s ClassifierSource) String() string {
	switch s {
	case ClassifierHeuristic:
		return "heuristic"
	case ClassifierSLM:
		return "slm"
	case ClassifierSLMFallback:
		return "slm_fallback"
	default:
		return "unknown"
	}
}

// Task represents a classified unit of work for routing.
type Task struct {
	Type             TaskType
	Priority         Priority
	EstimatedTokens  int
	RequiresTools    bool
	ComplexityScore  float64              // 0-1
	RequiredEffort   provider.EffortLevel // EffortAuto = no constraint on thinking
	ExcludedArms     []ArmID              // Arms to avoid (e.g. due to recent 429 errors)
	ClassifierSource ClassifierSource     // which classifier produced this Task
}

// ValueScore computes a routing value based on priority and type.
func (t Task) ValueScore() float64 {
	base := map[Priority]float64{
		PriorityLow:      0.5,
		PriorityNormal:   1.0,
		PriorityHigh:     2.0,
		PriorityCritical: 5.0,
	}[t.Priority]
	return base * taskTypeMultiplier[t.Type]
}

var taskTypeMultiplier = map[TaskType]float64{
	TaskBoilerplate:    0.6,
	TaskGeneration:     1.0,
	TaskRefactor:       0.9,
	TaskReview:         1.1,
	TaskUnitTest:       0.8,
	TaskPlanning:       1.4,
	TaskOrchestration:  1.5,
	TaskSecurityReview: 2.0,
	TaskDebug:          1.2,
	TaskExplain:        0.7,
}

// QualityThreshold defines minimum acceptable quality for a task type.
type QualityThreshold struct {
	Minimum    float64 // below → output is harmful, never accept
	Acceptable float64 // good enough
	Target     float64 // ideal
}

// DefaultThresholds are calibrated for M4 heuristic scores (range ~0-0.85).
// M9 will replace these with bandit-derived values once quality data accumulates.
var DefaultThresholds = map[TaskType]QualityThreshold{
	TaskBoilerplate:    {0.40, 0.55, 0.70}, // any capable arm works
	TaskGeneration:     {0.45, 0.60, 0.75},
	TaskRefactor:       {0.50, 0.65, 0.78},
	TaskReview:         {0.55, 0.68, 0.80},
	TaskUnitTest:       {0.45, 0.60, 0.75},
	TaskPlanning:       {0.60, 0.72, 0.82},
	TaskOrchestration:  {0.65, 0.75, 0.83},
	TaskSecurityReview: {0.70, 0.78, 0.84}, // requires thinking or large context window
	TaskDebug:          {0.50, 0.65, 0.78},
	TaskExplain:        {0.40, 0.55, 0.72},
}

// inferEffort derives the minimum required reasoning effort from task type and complexity.
func inferEffort(task Task) provider.EffortLevel {
	switch task.Type {
	case TaskSecurityReview:
		return provider.EffortHigh
	case TaskOrchestration:
		if task.ComplexityScore >= 0.5 {
			return provider.EffortHigh
		}
		return provider.EffortMedium
	case TaskPlanning:
		if task.ComplexityScore >= 0.7 {
			return provider.EffortHigh
		}
		return provider.EffortMedium
	case TaskDebug, TaskRefactor, TaskReview:
		if task.ComplexityScore >= 0.7 {
			return provider.EffortMedium
		}
		if task.ComplexityScore >= 0.4 {
			return provider.EffortLow
		}
		return provider.EffortAuto
	case TaskGeneration:
		if task.ComplexityScore >= 0.8 {
			return provider.EffortMedium
		}
		return provider.EffortAuto
	default:
		return provider.EffortAuto
	}
}

// ClassifyTask infers a TaskType from the user's prompt using keyword heuristics.
func ClassifyTask(prompt string) Task {
	lower := strings.ToLower(prompt)
	task := Task{
		Priority:      PriorityNormal,
		RequiresTools: true, // assume tools needed by default
	}

	// Check for task type keywords (order matters: more specific/common first).
	// Orchestration is placed late: its keywords ("dispatch", "pipeline", "orchestrat")
	// appear as nouns in non-orchestration prompts (e.g. "refactor the pipeline dispatch",
	// "review the orchestration layer"). Operational task types must gate first.
	switch {
	case containsAny(lower, "security", "vulnerability", "cve", "owasp", "xss", "injection", "audit security"):
		task.Type = TaskSecurityReview
		task.Priority = PriorityHigh
	case containsAny(lower, "debug", "fix", "troubleshoot", "not working", "error", "crash", "failing", "bug"):
		task.Type = TaskDebug
	case containsAny(lower, "review", "check", "analyze", "audit", "inspect"):
		task.Type = TaskReview
	case containsAny(lower, "refactor", "restructure", "reorganize", "clean up", "simplify"):
		task.Type = TaskRefactor
	case containsAny(lower, "test", "spec", "coverage", "assert"):
		task.Type = TaskUnitTest
	case containsAny(lower, "explain", "what is", "how does", "describe", "tell me about"):
		task.Type = TaskExplain
		task.RequiresTools = false
	case containsAny(lower, "plan", "architect", "design", "strategy", "roadmap"):
		task.Type = TaskPlanning
	case containsAny(lower, "orchestrat", "coordinate", "dispatch", "pipeline",
		"fan out", "subtask", "delegate to", "spawn elf"):
		task.Type = TaskOrchestration
		task.Priority = PriorityHigh
	case containsAny(lower, "create", "implement", "build", "add", "write", "generate", "make"):
		task.Type = TaskGeneration
	case containsAny(lower, "scaffold", "boilerplate", "template", "stub", "skeleton"):
		task.Type = TaskBoilerplate
	default:
		task.Type = TaskGeneration // default
	}

	// Estimate complexity from prompt length and keywords
	task.ComplexityScore = estimateComplexity(lower)

	// Trivial-prompt override: short, knowledge-only prompts whose task
	// type doesn't imply existing code to read or modify can run without
	// tools, making the SLM arm (ToolUse=false) feasible for genuinely
	// tiny questions like "what is 2+2?" or "explain a closure".
	if isTrivialPrompt(lower, task.Type, task.ComplexityScore) {
		task.RequiresTools = false
	}

	task.RequiredEffort = inferEffort(task)
	return task
}

// trivialEligibleTypes are the task types where a "no tools needed" verdict
// is plausible from a short prompt alone. Debug / Refactor / Review / Test /
// SecurityReview / Orchestration all imply existing code or processes to
// touch, so they keep RequiresTools=true even if the wording is brief.
var trivialEligibleTypes = map[TaskType]bool{
	TaskExplain:     true,
	TaskGeneration:  true,
	TaskBoilerplate: true,
}

// toolNeedingTokens name actions/objects that always require tool execution.
// They are matched as whole words (via strings.Fields), not substrings; this
// avoids treating "tester" as "test".
var toolNeedingTokens = map[string]bool{
	"read": true, "write": true, "edit": true, "create": true,
	"delete": true, "remove": true, "list": true, "find": true,
	"search": true, "grep": true, "open": true, "save": true,
	"run": true, "execute": true, "compile": true, "build": true,
	"test": true, "tests": true, "install": true, "commit": true,
	"push": true, "pull": true, "diff": true, "file": true, "files": true,
}

// isTrivialPrompt is true when the prompt is short, low-complexity, of a
// type compatible with knowledge-only answers, and contains no token that
// implies a file/shell action.
func isTrivialPrompt(lower string, taskType TaskType, complexity float64) bool {
	if !trivialEligibleTypes[taskType] {
		return false
	}
	if complexity > 0.15 {
		return false
	}
	fields := strings.Fields(lower)
	if len(fields) > 12 {
		return false
	}
	for _, w := range fields {
		w = strings.Trim(w, ".,!?;:")
		if toolNeedingTokens[w] {
			return false
		}
	}
	return true
}

func containsAny(s string, keywords ...string) bool {
	for _, kw := range keywords {
		if strings.Contains(s, kw) {
			return true
		}
	}
	return false
}

func estimateComplexity(prompt string) float64 {
	score := 0.0

	// Length contributes to complexity
	words := len(strings.Fields(prompt))
	score += float64(words) / 200.0 // normalize: 200 words = 1.0

	// Complexity keywords
	complexKeywords := []string{"implement", "design", "architect", "system", "integration", "migrate", "optimize"}
	for _, kw := range complexKeywords {
		if strings.Contains(prompt, kw) {
			score += 0.15
		}
	}

	// Simple keywords reduce complexity
	simpleKeywords := []string{"rename", "format", "add field", "change name", "typo", "simple"}
	for _, kw := range simpleKeywords {
		if strings.Contains(prompt, kw) {
			score -= 0.15
		}
	}

	// Clamp to [0, 1]
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return score
}

// ParseTaskType converts a string from an SLM JSON response to a TaskType.
// Matching is case-insensitive and ignores underscores ("unit_test" and
// "UnitTest" both map to TaskUnitTest). Unknown strings fall back to TaskGeneration.
func ParseTaskType(s string) TaskType {
	switch strings.ToLower(strings.ReplaceAll(s, "_", "")) {
	case "debug":
		return TaskDebug
	case "explain":
		return TaskExplain
	case "generation":
		return TaskGeneration
	case "refactor":
		return TaskRefactor
	case "unittest":
		return TaskUnitTest
	case "boilerplate":
		return TaskBoilerplate
	case "planning":
		return TaskPlanning
	case "orchestration":
		return TaskOrchestration
	case "securityreview":
		return TaskSecurityReview
	case "review":
		return TaskReview
	default:
		return TaskGeneration
	}
}