gnoma/internal/slm/manager.go
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start kept the SLM out of pipe-mode runs (those runs
     always finished before the 15 s health check did)
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects the backend; docs/slm-backends.md
documents each option with copy-paste presets. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.
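
A minimal sketch of how the "auto" selection described above could work,
assuming a probe callback that reports reachability. The names
selectBackend, probeFunc, autoOrder and both error values are hypothetical,
and the probe order shown is an assumption; the commit does not state the
real order used by internal/slm/backend.go.

```go
package main

import (
	"errors"
	"fmt"
)

// probeFunc reports whether a backend is reachable (hypothetical signature).
type probeFunc func(name string) bool

// autoOrder is an assumed probe order, not taken from the real factory.
var autoOrder = []string{"ollama", "llamacpp", "openaicompat", "llamafile"}

var (
	errDisabled  = errors.New("slm backend disabled")
	errNoBackend = errors.New("no slm backend reachable")
)

// selectBackend resolves the configured [slm].backend value to a concrete
// backend name. "auto" returns the first reachable entry of autoOrder.
func selectBackend(configured string, probe probeFunc) (string, error) {
	switch configured {
	case "disabled":
		return "", errDisabled
	case "auto":
		for _, name := range autoOrder {
			if probe(name) {
				return name, nil
			}
		}
		return "", errNoBackend
	default:
		if probe(configured) {
			return configured, nil
		}
		return "", fmt.Errorf("backend %q: %w", configured, errNoBackend)
	}
}

func main() {
	// Pretend only llama.cpp and llamafile answer their probes.
	reachable := map[string]bool{"llamacpp": true, "llamafile": true}
	probe := func(name string) bool { return reachable[name] }

	name, err := selectBackend("auto", probe)
	fmt.Println(name, err)
}
```

Injecting the probe as a function keeps the selection logic testable
without a running daemon, which is one plausible reason to structure a
factory this way.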

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.
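
The heuristic above could look roughly like the following sketch. It is
not the real ClassifyTask: the function requiresTools, the TaskType
constants, and the 12-word cutoff are assumptions made for illustration,
and word count stands in for both "short" and "low-complexity". The token
list contains only the five tokens named above; the full list is elided in
this message.

```go
package main

import (
	"fmt"
	"strings"
)

// TaskType is a hypothetical stand-in for the router's task type.
type TaskType string

const (
	TaskExplain     TaskType = "explain"
	TaskGeneration  TaskType = "generation"
	TaskBoilerplate TaskType = "boilerplate"
	TaskDebug       TaskType = "debug"
)

// maxTrivialWords is an assumed cutoff for a "short" prompt.
const maxTrivialWords = 12

// knowledgeOnly lists task types that do not imply existing code.
var knowledgeOnly = map[TaskType]bool{
	TaskExplain:     true,
	TaskGeneration:  true,
	TaskBoilerplate: true,
}

// toolTokens holds the tool-needing tokens named in the commit message.
var toolTokens = map[string]bool{
	"read": true, "write": true, "run": true, "test": true, "file": true,
}

// requiresTools defaults to true and flips to false only for short prompts
// of a knowledge-only task type that contain no tool-needing token.
func requiresTools(prompt string, tt TaskType) bool {
	if !knowledgeOnly[tt] {
		return true
	}
	words := strings.Fields(strings.ToLower(prompt))
	if len(words) > maxTrivialWords {
		return true
	}
	for _, w := range words {
		if toolTokens[strings.Trim(w, ".,;:!?\"'()")] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(requiresTools("Explain what a mutex is", TaskExplain)) // false
	fmt.Println(requiresTools("Run the test suite", TaskExplain))      // true
	fmt.Println(requiresTools("Explain closures", TaskDebug))          // true
}
```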

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.
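
To make the ordering concrete, here is a sketch under stated assumptions.
The Arm, Task and ArmKind types and the pick and feasible helpers are
hypothetical, and "MaxComplexity == 0 means no ceiling" is an assumed
convention; only the tier numbers and the exclusion rule come from the
description above.

```go
package main

import (
	"fmt"
	"sort"
)

// ArmKind distinguishes the three arm families (hypothetical type).
type ArmKind int

const (
	KindCLIAgent ArmKind = iota
	KindLocal
	KindAPI
)

// Arm is a simplified routing arm. MaxComplexity == 0 is assumed to mean
// that the arm declares no complexity ceiling.
type Arm struct {
	Name          string
	Kind          ArmKind
	MaxComplexity float64
}

// Task carries only the field this sketch needs.
type Task struct {
	Complexity float64
}

// feasible drops arms whose ceiling is below the task's complexity.
func feasible(a Arm, t Task) bool {
	return a.MaxComplexity == 0 || t.Complexity <= a.MaxComplexity
}

// armTier returns 0 for an arm whose declared ceiling fits the task;
// otherwise CLI agents get 1, local arms 2, and API arms 3.
func armTier(a Arm, t Task) int {
	if a.MaxComplexity > 0 && t.Complexity <= a.MaxComplexity {
		return 0
	}
	switch a.Kind {
	case KindCLIAgent:
		return 1
	case KindLocal:
		return 2
	default:
		return 3
	}
}

// pick returns the feasible arm with the lowest tier, if any.
func pick(arms []Arm, t Task) (Arm, bool) {
	var candidates []Arm
	for _, a := range arms {
		if feasible(a, t) {
			candidates = append(candidates, a)
		}
	}
	if len(candidates) == 0 {
		return Arm{}, false
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		return armTier(candidates[i], t) < armTier(candidates[j], t)
	})
	return candidates[0], true
}

func main() {
	arms := []Arm{
		{Name: "cli-agent", Kind: KindCLIAgent},
		{Name: "slm", Kind: KindLocal, MaxComplexity: 0.3},
		{Name: "api", Kind: KindAPI},
	}
	trivial, _ := pick(arms, Task{Complexity: 0.1})
	complex, _ := pick(arms, Task{Complexity: 0.8})
	fmt.Println(trivial.Name) // slm
	fmt.Println(complex.Name) // cli-agent
}
```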

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.
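
A sketch of deadline-governed polling in the same spirit, assuming the
caller derives its context from the startup budget. waitUntil,
waitWithBudget and healthyAfter are hypothetical helpers, and the check
function stands in for the real HTTP health probe so the example runs
without a server.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil polls check every interval until it returns true or ctx is done.
// The caller's context is the only ceiling; there is no built-in timeout.
func waitUntil(ctx context.Context, interval time.Duration, check func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("not healthy before deadline: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// waitWithBudget shows the caller side: the budget (in milliseconds, for
// example a configured startup timeout) becomes the context deadline.
func waitWithBudget(budgetMs int, check func() bool) error {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(budgetMs)*time.Millisecond)
	defer cancel()
	return waitUntil(ctx, 5*time.Millisecond, check)
}

// healthyAfter returns a fake probe that succeeds on its n-th call.
func healthyAfter(n int) func() bool {
	calls := 0
	return func() bool {
		calls++
		return calls >= n
	}
}

func main() {
	// Becomes healthy on the third probe, well inside the budget.
	fmt.Println(waitWithBudget(200, healthyAfter(3)))

	// Never becomes healthy, so the deadline ends the wait.
	fmt.Println(waitWithBudget(30, func() bool { return false }))
}
```

Selecting on the context and a ticker together means a cancelled context
ends the wait immediately rather than after the next sleep interval.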

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00

286 lines
7.3 KiB
Go

package slm

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"net"
	"net/http"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

const pidFile = "llamafile.pid"

// DefaultModelURL is the default llamafile to download when none is configured.
// TinyLlama 1.1B Chat Q5_K_M (~690 MB) — small enough to download quickly,
// sufficient for JSON classification tasks.
const DefaultModelURL = "https://huggingface.co/mozilla-ai/TinyLlama-1.1B-Chat-v1.0-llamafile/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile"

// DefaultDataDir returns the platform default SLM data directory.
// Follows XDG Base Directory Specification: $XDG_DATA_HOME/gnoma/slm,
// falling back to ~/.local/share/gnoma/slm.
func DefaultDataDir() string {
	dir := os.Getenv("XDG_DATA_HOME")
	if dir == "" {
		home, _ := os.UserHomeDir()
		dir = filepath.Join(home, ".local", "share")
	}
	return filepath.Join(dir, "gnoma", "slm")
}

// Status describes the setup state of the SLM.
type Status int

const (
	StatusNotSetUp Status = iota // no manifest on disk
	StatusReady                  // manifest + binary file both exist
	StatusMissing                // manifest exists but binary file is gone
)

func (s Status) String() string {
	switch s {
	case StatusNotSetUp:
		return "not set up"
	case StatusReady:
		return "ready"
	case StatusMissing:
		return "file missing"
	default:
		return "unknown"
	}
}

// Config holds Manager configuration.
type Config struct {
	DataDir  string // XDG data home / gnoma / slm; must be set
	ModelURL string // required for Setup
}

// Manager controls the llamafile lifecycle.
type Manager struct {
	cfg     Config
	process *os.Process
	port    int
	logger  *slog.Logger

	startupBegin    time.Time
	startupDuration time.Duration // 0 until Start() returns healthy
}

// StartupDuration returns the elapsed time from Start() invocation to the
// first successful health check. Returns 0 when llamafile is not (yet) ready.
func (m *Manager) StartupDuration() time.Duration {
	return m.startupDuration
}

// New creates a Manager. DataDir must be non-empty.
func New(cfg Config, logger *slog.Logger) *Manager {
	if logger == nil {
		logger = slog.Default()
	}
	return &Manager{cfg: cfg, logger: logger}
}

// IsSetUp returns true when Status() == StatusReady.
func (m *Manager) IsSetUp() bool {
	return m.Status() == StatusReady
}

// Status returns the current setup state by inspecting the manifest and filesystem.
func (m *Manager) Status() Status {
	mf, err := readManifest(m.cfg.DataDir)
	if err != nil {
		return StatusNotSetUp
	}
	if _, err := os.Stat(mf.FilePath); err != nil {
		return StatusMissing
	}
	return StatusReady
}

// Setup downloads the llamafile from ModelURL and writes a manifest that
// records the file's SHA-256 hash and size. It is a no-op when Status() is
// already StatusReady.
// progress receives (downloaded, total) byte counts; may be nil.
func (m *Manager) Setup(ctx context.Context, progress func(downloaded, total int64)) error {
	if m.cfg.ModelURL == "" {
		return fmt.Errorf("slm: ModelURL is required")
	}
	if m.Status() == StatusReady {
		return nil
	}
	if err := os.MkdirAll(m.cfg.DataDir, 0700); err != nil {
		return fmt.Errorf("slm: create data dir: %w", err)
	}

	// Derive the local file name from the URL, with a fixed fallback.
	name := filepath.Base(m.cfg.ModelURL)
	if name == "" || name == "." {
		name = "llamafile"
	}
	dst := filepath.Join(m.cfg.DataDir, name)

	m.logger.Info("downloading llamafile", "url", m.cfg.ModelURL, "dst", dst)
	sha256hex, size, err := download(ctx, m.cfg.ModelURL, dst, progress)
	if err != nil {
		return err
	}

	mf := &Manifest{
		ModelURL: m.cfg.ModelURL,
		FilePath: dst,
		SHA256:   sha256hex,
		Size:     size,
		SetupAt:  time.Now().UTC(),
	}
	return writeManifest(m.cfg.DataDir, mf)
}

// Start launches the llamafile subprocess and returns its base URL.
// Reaps a stale PID file from a previous run if present.
func (m *Manager) Start(ctx context.Context) (string, error) {
	m.startupBegin = time.Now()

	mf, err := readManifest(m.cfg.DataDir)
	if err != nil {
		return "", fmt.Errorf("slm: not set up: %w", err)
	}
	if _, err := os.Stat(mf.FilePath); err != nil {
		return "", fmt.Errorf("slm: llamafile missing at %s", mf.FilePath)
	}

	m.reapStalePID()

	port, err := freePort()
	if err != nil {
		return "", fmt.Errorf("slm: find free port: %w", err)
	}

	// Invoke via sh to bypass Wine binfmt_misc interception of APE polyglot binaries.
	// llamafile is a valid POSIX shell script; sh executes the embedded launcher header.
	cmd := exec.CommandContext(ctx, "sh", mf.FilePath,
		"--server",
		"--host", "127.0.0.1",
		"--port", strconv.Itoa(port),
		"--nobrowser",
	)
	if err := cmd.Start(); err != nil {
		return "", fmt.Errorf("slm: start llamafile: %w", err)
	}
	m.process = cmd.Process
	m.port = port

	if err := os.WriteFile(m.pidPath(), []byte(strconv.Itoa(cmd.Process.Pid)), 0600); err != nil {
		m.logger.Warn("failed to write pid file", "error", err)
	}

	baseURL := fmt.Sprintf("http://127.0.0.1:%d", port)
	m.logger.Info("llamafile started", "pid", cmd.Process.Pid, "url", baseURL)

	if err := waitHealthy(ctx, baseURL); err != nil {
		_ = m.Stop()
		return "", err
	}
	m.startupDuration = time.Since(m.startupBegin)
	m.logger.Info("llamafile healthy", "url", baseURL, "startup", m.startupDuration)
	return baseURL, nil
}

// Stop terminates the llamafile process and cleans up the PID file.
func (m *Manager) Stop() error {
	if m.process == nil {
		return nil
	}
	if err := m.process.Kill(); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return fmt.Errorf("slm: kill llamafile: %w", err)
	}
	// Collect the exit status so the killed child does not linger as a zombie.
	_, _ = m.process.Wait()
	m.process = nil
	m.port = 0
	_ = os.Remove(m.pidPath())
	return nil
}

// BaseURL returns the current server base URL, or "" if not running.
func (m *Manager) BaseURL() string {
	if m.process == nil || m.port == 0 {
		return ""
	}
	return fmt.Sprintf("http://127.0.0.1:%d", m.port)
}

// Manifest returns the on-disk manifest if present, or nil.
func (m *Manager) Manifest() *Manifest {
	mf, err := readManifest(m.cfg.DataDir)
	if err != nil {
		return nil
	}
	return mf
}

func (m *Manager) pidPath() string {
	return filepath.Join(m.cfg.DataDir, pidFile)
}

func (m *Manager) reapStalePID() {
	data, err := os.ReadFile(m.pidPath())
	if err != nil {
		return
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		_ = os.Remove(m.pidPath())
		return
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		_ = os.Remove(m.pidPath())
		return
	}
	_ = proc.Kill()
	_ = os.Remove(m.pidPath())
	m.logger.Debug("reaped stale llamafile process", "pid", pid)
}

// freePort binds on :0 to let the OS pick an available port, then releases it.
// There is a small TOCTOU window between release and use, which is acceptable for a local dev tool.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	port := l.Addr().(*net.TCPAddr).Port
	_ = l.Close()
	return port, nil
}

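// waitHealthy polls baseURL/health until it returns 200 or ctx is done.
// The ctx deadline governs how long we'll wait; callers should pass a
// context with a budget appropriate for first-launch cold start.
func waitHealthy(ctx context.Context, baseURL string) error {
	client := &http.Client{Timeout: 2 * time.Second}
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()
	for {
		// Bind each probe to ctx so an in-flight request is aborted at the deadline.
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/health", nil)
		if err != nil {
			return fmt.Errorf("slm: build health request: %w", err)
		}
		if resp, err := client.Do(req); err == nil {
			_ = resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Wait for the next tick, but return immediately once ctx is done
		// instead of sleeping through the cancellation.
		select {
		case <-ctx.Done():
			return fmt.Errorf("slm: health check did not pass before context deadline: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}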