gnoma

Author	SHA1	Message	Date
vikingowl	0aabd19906	feat(router): per-arm strengths + cost weight (Phase D) Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md (static portion; dynamic bandit-driven promotion deferred to D-2). Routing previously let tier ordering (CLI > local > API) dominate selection — Opus, in tier 3, would lose to a tier-1 CLI agent for SecurityReview even though Opus is empirically stronger at that task. This change introduces explicit per-arm overrides: [[arms]] id = "anthropic/claude-opus-4-7" strengths = ["security_review", "planning"] cost_weight = 0.3 Strengths gate cross-tier promotion: arms matching task.Type bypass the tier loop and compete with each other directly. Promotion is a preference, not a pin — if no strength-tagged arm is feasible (backoff, pool capacity, tool support), selection falls through to the default tier order. CostWeight linearly dampens the cost penalty in scoreArm via effectiveCost = 1 + CostWeight * (cost - 1) CostWeight=1.0 (or unset) preserves current behavior; lower values trade cheapness for quality. The earlier draft used cost^CostWeight which inverts direction for sub-1 local-arm costs (raising a fraction <1 to a fractional power makes it bigger, not smaller); a monotonicity regression test prevents that drift. - internal/router/arm.go: Strengths []TaskType, CostWeight float64, HasStrength(), ResolvedCostWeight() (zero → 1.0). - internal/router/selector.go: scoreArm strength bonus const (strengthScoreBonus = 0.15) + linear cost dampening; selectBest cross-tier promotion before tier loop. - internal/router/router.go: ArmOverride type + ApplyArmOverrides() returns unknown IDs; unknown strength names skipped with per-name warning via slog. - internal/router/task.go: ParseTaskTypeStrict() returns ok bool; ParseTaskType now delegates so the two switches stay in sync. - internal/config/config.go: ArmConfig + [[arms]] TOML wiring. - cmd/gnoma/main.go: applies overrides after all initial arms register; logs a warning when an [[arms]] id has no matching registered arm. Tests cover: predicate helpers, scoring direction across two arms, linear-formula monotonicity on both sides of cost=1, cross-tier promotion, empty-Strengths preserves tier order, promoted arm in backoff falls through via full Router.Select path, observed-quality tiebreak between two strength-tagged arms, ApplyArmOverrides happy path + unknown-ID reporting + unknown-strength skipping.	2026-05-19 21:14:45 +02:00
vikingowl	eb0583f606	fix(router): unpin config-default provider + complexity floor by task type Two routing bugs were keeping the SLM out of every real prompt and, once it was eligible, pulling complex tasks into it as well. Bug 1: ForceArm was called unconditionally when a primary provider was configured (cmd/gnoma/main.go:378). That short-circuited the entire router — every prompt went straight to whatever was set as [provider].default, regardless of tier, score, or feasibility. The SLM arm appeared in `gnoma router stats` registration logs but had zero observations after dozens of prompts. Fix: only pin when the user passed --provider on the command line. Config defaults register the arm but don't force it; the router picks freely. Verified end-to-end — trivial prompts now reach slm/ollama via the tier-0 priority. Bug 2: A short prompt like "refactor the SLM module" classifies as TaskRefactor with complexity 0.015 — well under the SLM arm's 0.3 ceiling. The arm became eligible despite the task being inherently non-trivial. Once eligible, tier-0 priority then pulled it in over the CLI agents. Fix: add MinComplexityForType, applied in both ClassifyTask (heuristic path) and slm.Classifier.Classify (SLM-overlay path). The floor is per-task-type: - TaskSecurityReview, TaskOrchestration → 0.60 - TaskRefactor, TaskPlanning, TaskDebug → 0.40 - TaskUnitTest, TaskReview → 0.35 Tasks like Explain/Generation/Boilerplate keep their organic complexity score so trivial knowledge prompts (≤0.15) still fall to the SLM. Tasks that imply existing code or multi-step reasoning are clamped above the SLM's MaxComplexity, naturally routing them to a bigger arm. After both fixes, observed routing in a clean run: What is 2+2? → slm/ollama (complexity 0.015) Define a closure → slm/ollama (complexity 0.015) What is HTTP? → slm/ollama (complexity 0.015) Refactor the SLM module → subprocess/gemini (complexity 0.40) Audit for race conditions → subprocess/gemini (complexity 0.35) Plan a migration → subprocess/gemini (complexity 0.40)	2026-05-19 19:22:16 +02:00
vikingowl	a14fe8b504	feat(slm): pluggable backends + trivial-prompt routing The SLM had two intended jobs — classify every prompt and execute the small ones itself — but in practice three independent gates kept it out of nearly all real work: 1. llamafile cold-start blocked pipe-mode runs (always faster than the 15 s health check) 2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm (ToolUse=false) from 9/10 task types 3. armTier hard-coded CLI agents > local > API, so even when the SLM arm was feasible a CLI agent won Each gate is addressed below. The result is an SLM that actually does its job — small stuff stays local, complex stuff routes up — gated by arm capability rather than by accidents of the boot order. Backend layer (the bigger change) The original implementation hard-coded llamafile. That's fine if you have nothing else, but most users with a local model setup already run Ollama or llama.cpp. The new factory at internal/slm/backend.go picks between: - ollama (any local Ollama daemon) - llamacpp (any llama.cpp server) - llamafile (gnoma-managed, current behaviour) - openaicompat (LM Studio, vLLM, remote API) - auto (probes in order, picks first reachable) - disabled [slm].backend in config.toml selects which. Documented in docs/slm-backends.md with copy-paste presets for each. The factory probes the underlying model's actual capabilities (Ollama /api/show, llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the arm picks up simple file-read style tasks on tool-capable models and stays knowledge-only on completion-only models. Trivial-prompt heuristic (Gate 2) ClassifyTask now flips RequiresTools=false for short, low-complexity prompts whose task type doesn't imply existing code (Explain, Generation, Boilerplate). Tool-needing tokens (read, write, run, test, file, …) keep RequiresTools=true even when the prompt is brief. Complexity-aware tier ordering (Gate 3) armTier takes a Task and returns tier 0 for arms whose MaxComplexity ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3. For trivial tasks the SLM arm wins; for complex tasks the SLM falls out of the feasible set (MaxComplexity exclusion) and the original ordering reasserts. Eager boot with user-facing wait (Gate 1) Removed the original goroutine-only path. SLM startup now blocks synchronously inside the factory; for llamafile that means up to [slm].startup_timeout (default 5 s) of waiting on the first invocation, with "Starting SLM…" → "SLM ready (backend, model, tools, boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp backends boot instantly because the daemon is already running. waitHealthy() now respects the caller's context deadline instead of its old hardcoded 15 s ceiling. Classifier reliability Classifier timeout bumped 2 s → 5 s for thinking-mode models like Qwen3-distilled Tiny3.5. System prompt includes /no_think directive for the same family. These help but don't eliminate small-model JSON-contract failures — see the docs section on picking a model. Probe + telemetry surfaces gnoma slm status now prints the configured backend + model + a live probe result (✓/✗) instead of just the llamafile manifest state. `gnoma router stats` already (from the previous commit) shows the classifier-source mix; with this change you can finally see slm / slm_fallback / heuristic share rise from "always heuristic" to something reflecting real SLM activity. Tests - 9 new backend-factory tests (httptest-backed Ollama probe, error paths, auto-detection, capability flags) - Tier-ordering tests cover the new "specialised small arm wins trivial task" path - Trivial-prompt heuristic tested for both halves (knowledge-only flips RequiresTools=false; debug/file/run keeps it true) Deletes the dead SLMManager field from the TUI Config — it was declared but never read.	2026-05-19 18:53:32 +02:00
vikingowl	58beb7ce3c	feat(router): classifier-source telemetry + router stats command Phase 4 routing decisions depend on knowing whether the SLM classifier is actually firing or whether the heuristic is silently doing all the work. Adds the instrumentation to make that observable. router.ClassifierSource enum (heuristic / slm / slm_fallback) is set on Task by every classifier: - HeuristicClassifier → ClassifierHeuristic - slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when the SLM call fails or returns unparseable output The source is plumbed through router.Outcome to QualityTracker, which now maintains per-source counters alongside the existing per-arm × task EMA scores. QualitySnapshot serializes both (classifier_counts is omitempty for back-compat with pre-feature quality.json files). lazyClassifier logs at INFO the first time it falls back to heuristic because the SLM hasn't booted yet — distinguishes operational fallback from an unconfigured-SLM run. slm.Manager.Start() now records elapsed-to-healthy and the main.go goroutine logs it as part of the "SLM ready" event. Confirms whether short-lived runs are racing the boot cycle. New `gnoma router stats` subcommand prints both tables (arm × task quality, classifier source breakdown) from quality.json with a Phase 4 trust hint when the data is too sparse or the SLM share is low. 6 new tests cover ClassifierSource string/enum, heuristic + SLM source propagation, QualityTracker counter round-trip, and back-compat restore from a legacy quality.json without classifier_counts.	2026-05-19 18:18:22 +02:00
vikingowl	135c8afe80	feat: various improvements to engine, router, and TUI - engine/loop: enhanced loop handling - router: dynamic model discovery and task improvements - tui: suggestion box, input mode indicator, completions enhancements	2026-05-07 22:51:50 +02:00
vikingowl	a9213ec382	feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status - slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback, heuristic baseline blended so Priority/RequiredEffort are never zeroed, extractJSON strips markdown fences from small-model responses - router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration - router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior); filterFeasible excludes arms when task.ComplexityScore > MaxComplexity - config.SLMSection: [slm] enabled / model_url / data_dir - openaicompat.NewLlamafile: no API key, model = "default", no retries - slm.Manager: DefaultDataDir() (XDG), Manifest() accessor - cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm registered with MaxComplexity=0.3 when enabled + set up - tui: /config shows slm status (ready/missing/not set up + base URL if running) - docs: roadmap updated to reflect llamafile pivot from Ollama	2026-05-07 16:44:32 +02:00
vikingowl	7fbb5454ee	feat(router): normalize effort/thinking abstraction across providers Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning control, replacing the Capabilities.Thinking bool. Each provider maps the level to its native parameter: Anthropic budget tokens (1K/8K/16K), OpenAI reasoning_effort (low/medium/high), Google thinking budget (1K/8K/16K). Task classification auto-infers effort from TaskType and complexity; filterFeasible excludes arms that lack the required level.	2026-05-07 14:08:50 +02:00
vikingowl	07a976c32a	fix: ClassifyTask priority ordering — orchestration below operational types Operational task types (debug, review, refactor, test, explain) now gate before orchestration in the keyword cascade. Previously, prompts like "review the orchestration layer" or "refactor the pipeline dispatch" matched "orchestrat"/"dispatch" and misclassified as TaskOrchestration. Planning is also moved below the operational types. Expanded orchestration keywords to cover common intent that the original four keywords missed: "fan out", "subtask", "delegate to", "spawn elf". Adds regression tests for false-positive cases and positive tests for new keywords.	2026-04-06 00:58:54 +02:00
vikingowl	4f1e0cf567	feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes provider/openai: - Fix doubled tool call args (argsComplete flag): Ollama sends complete args in the first streaming chunk then repeats them as delta, causing doubled JSON and 400 errors in elfs - Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep) - Add Reasoning field support for Ollama thinking output cmd/gnoma: - Early TTY detection so logger is created with correct destination before any component gets a reference to it (fixes slog WARN bleed into TUI textarea) permission: - Exempt spawn_elfs and agent tools from safety scanner: elf prompt text may legitimately mention .env/.ssh/credentials patterns and should not be blocked tui/app: - /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge (ask for plain text output) → TUI fallback write from streamBuf - looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback content before writing (reject refusals, strip narrative preambles) - Collapse thinking output to 3 lines; ctrl+o to expand (live stream and committed messages) - Stream-level filter for model pseudo-tool-call blocks: suppresses <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call\|> from entering streamBuf across chunk boundaries - sanitizeAssistantText regex covers both block formats - Reset streamFilterClose at every turn start	2026-04-05 19:24:51 +02:00
vikingowl	847735a9f7	feat: add router foundation with task classification and arm selection internal/router/ — core routing layer: - Task classification: 10 types (boilerplate, generation, refactor, review, unit_test, planning, orchestration, security_review, debug, explain) with keyword heuristics and complexity scoring - Arm registry: provider+model pairs with capabilities and cost - Limit pools: shared resource budgets with scarcity multipliers, optimistic reservation, use-it-or-lose-it discounting - Heuristic selector: score = (quality × value) / effective_cost Prefers tools, thinking for planning, penalizes small models on complex tasks - Router: Select() picks best feasible arm, ForceArm() for CLI override Engine now routes through router.Select() when configured. Wired into CLI — arm registered per --provider/--model flags. 20 router tests. 173 tests total across 13 packages.	2026-04-03 14:23:15 +02:00

10 Commits