gnoma

Author	SHA1	Message	Date
vikingowl	43ea2e562d	feat(engine): two-stage tool routing for small local arms Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md. Small local SLMs (<=16k context) waste ~1500 tokens per turn on the full tool catalogue. Two-stage routing replaces round-1 tools with a single synthetic select_category schema; round-2+ sends only the selected category's real tool schemas plus select_category for re-selection. - internal/tool/category.go: Category type, optional Categorized interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read, fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec. - internal/engine/twostage.go: synthetic select_category tool, intercept helper, per-turn selectedCategory state under e.mu. - Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to prose. State resets at the top and end of every runLoop. - Activates automatically on a forced local arm with ContextWindow <=16384, or via [router].force_two_stage TOML key. - Integration test drives a 3-round trip and asserts: round 1 emits exactly one schema (synthetic) with ToolChoiceRequired, round 2 contains only write-category schemas + select_category, real fs.write executes. Invalid-category fallback round-trips back to round-1 mode.	2026-05-19 20:53:21 +02:00
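The category mapping and per-round tool filtering described in the commit above might look roughly like the Go sketch below. It works only from the commit message. The identifiers `categoryOf`, `roundTools` and `useTwoStage`, and the `meta` fallback value, are hypothetical. Tools are represented by name instead of real schemas.

```go
package main

import "fmt"

// Category groups tools for two-stage routing (names are hypothetical).
type Category string

const (
	CategoryRead   Category = "read"
	CategoryWrite  Category = "write"
	CategorySearch Category = "search"
	CategoryExec   Category = "exec"
	CategoryMeta   Category = "meta" // fallback for tools with no known category
)

const selectCategory = "select_category" // synthetic round-1 tool

// Categorized is an optional interface a tool can implement to declare its category.
type Categorized interface {
	Category() Category
}

// builtinCategories mirrors the mapping listed in the commit message.
var builtinCategories = map[string]Category{
	"fs.read": CategoryRead, "fs.ls": CategoryRead,
	"fs.write": CategoryWrite, "fs.edit": CategoryWrite,
	"fs.glob": CategorySearch, "fs.grep": CategorySearch,
	"bash": CategoryExec,
}

// categoryOf prefers the tool's own declaration, then the built-in table,
// and finally falls back to CategoryMeta.
func categoryOf(name string, tool any) Category {
	if c, ok := tool.(Categorized); ok {
		return c.Category()
	}
	if c, ok := builtinCategories[name]; ok {
		return c
	}
	return CategoryMeta
}

// roundTools returns the tool names to advertise in a given round.
// Round 1 (or no/invalid selection) sends only the synthetic selector;
// later rounds send the selected category plus the selector for re-selection.
func roundTools(round int, selected Category, all []string) []string {
	if round <= 1 || selected == "" {
		return []string{selectCategory}
	}
	var out []string
	for _, name := range all {
		if categoryOf(name, nil) == selected {
			out = append(out, name)
		}
	}
	if len(out) == 0 {
		// Unknown category: fall back to round-1 mode.
		return []string{selectCategory}
	}
	return append(out, selectCategory)
}

// useTwoStage applies the activation rule: a forced local arm with a
// context window of at most 16384 tokens, or an explicit override.
func useTwoStage(forcedLocal bool, contextWindow int, force bool) bool {
	return force || (forcedLocal && contextWindow <= 16384)
}

func main() {
	all := []string{"fs.read", "fs.ls", "fs.write", "fs.edit", "fs.glob", "fs.grep", "bash"}
	fmt.Println(roundTools(1, "", all))            // [select_category]
	fmt.Println(roundTools(2, CategoryWrite, all)) // [fs.write fs.edit select_category]
}
```

Keeping `select_category` in every later round lets the model switch categories without the engine resending the full catalogue.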
vikingowl	0b4de6054d	feat(tui): surface SLM backend + per-turn classifier in status bar The TUI gave no indication that an SLM was configured or active. You'd see the primary provider on the status line and nothing else, even with [slm].enabled=true and a successfully booted backend. Two surfaces added: 1. Status-bar SLM badge. The left side of the status line gains a dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗" when it failed, and nothing when SLM is disabled. The ⚙ marker indicates the model advertises tool support. 2. Per-turn classifier visibility. The existing routing event already produced "routed → <arm> (task: <type>)" lines in the chat history; it now also reports which classifier made the decision, e.g. "routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)". Lets you tell in real time whether the SLM is actually classifying or falling back to the keyword heuristic. Plumbing: - new tui.SLMInfo struct on tui.Config - main.go populates it after StartBackend returns - stream.Event gains RoutingClassifier; engine.runLoop fills it from task.ClassifierSource on the first round	2026-05-19 19:06:26 +02:00
vikingowl	58beb7ce3c	feat(router): classifier-source telemetry + router stats command Phase 4 routing decisions depend on knowing whether the SLM classifier is actually firing or whether the heuristic is silently doing all the work. Adds the instrumentation to make that observable. router.ClassifierSource enum (heuristic / slm / slm_fallback) is set on Task by every classifier: - HeuristicClassifier → ClassifierHeuristic - slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when the SLM call fails or returns unparseable output The source is plumbed through router.Outcome to QualityTracker, which now maintains per-source counters alongside the existing per-arm × task EMA scores. QualitySnapshot serializes both (classifier_counts is omitempty for back-compat with pre-feature quality.json files). lazyClassifier logs at INFO the first time it falls back to heuristic because the SLM hasn't booted yet — distinguishes operational fallback from an unconfigured-SLM run. slm.Manager.Start() now records elapsed-to-healthy and the main.go goroutine logs it as part of the "SLM ready" event. Confirms whether short-lived runs are racing the boot cycle. New `gnoma router stats` subcommand prints both tables (arm × task quality, classifier source breakdown) from quality.json with a Phase 4 trust hint when the data is too sparse or the SLM share is low. 6 new tests cover ClassifierSource string/enum, heuristic + SLM source propagation, QualityTracker counter round-trip, and back-compat restore from a legacy quality.json without classifier_counts.	2026-05-19 18:18:22 +02:00
vikingowl	ec9433d783	chore(lint): clear remaining errcheck and staticcheck findings Brings the project to a clean `make lint` baseline (0 issues). Mechanical: - Wrap deferred resp.Body.Close() in closures (router/discovery.go, router/probe.go) so the unchecked return surfaces as `_ = ...`. - Apply `_ = ...` (single or multi-return blank) to test-file calls that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send / LoadDir in tests that assert on side effects. Structural: - engine.handleRequestTooLarge drops the unused req parameter and rebuilds the request from compacted history (SA4009 — argument was overwritten before first use). - provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch to tagged switches over the discriminator (QF1002). - tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use tagged switches in place of equality chains (QF1003). - cmd/gnoma main.go merges a var decl with its immediate assignment (S1021). - Three empty-branch sites (dispatcher_test, loader_test, coordinator_test) become real assertions or get the dead `if` removed (SA9003).	2026-05-19 17:53:42 +02:00
vikingowl	397a39250c	feat(engine): early-stop detection for runaway agent loops Adds three lightweight per-turn detectors that fire corrective user messages back into the conversation when the model goes off the rails: - RepetitionDetector: sliding-window scan over streamed text deltas; trips when a 50/80/120-char pattern repeats >= 3 times in the trailing 200 chars. Breaks the active stream and injects a correction. - PatchFailureTracker: per-path counter for fs.edit/fs.write failures; trips on the 4th consecutive failure and steers the model to fs.write rather than another fs.edit on the same path. Success decrements with a floor of 0; paths are isolated. - DetectGreeting: narrow allowlist for "how can I help" style replies; only consulted after a round that used tools, so first-turn greetings don't false-positive. Detector state is per-turn (declared locally in runLoop), single- goroutine use. Corrective messages are appended as user-role text to both engine history and the context window. Telemetry: each trigger logs at INFO with round + path where applicable. Covered by 12 unit tests for the primitives and 5 loop-level integration tests that drive the full agentic loop via the existing eventStream mock.	2026-05-19 17:39:35 +02:00
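Below is a minimal sketch of the PatchFailureTracker rules in this commit: per-path counters, tripping on the 4th failure, and success decrementing with a floor of 0. The type and method names are illustrative and are not gnoma's actual API.

```go
package main

import "fmt"

// PatchFailureTracker counts fs.edit/fs.write failures per path
// (illustrative names; threshold and floor-at-zero rule from the commit).
type PatchFailureTracker struct {
	threshold int
	failures  map[string]int
}

func NewPatchFailureTracker() *PatchFailureTracker {
	return &PatchFailureTracker{threshold: 4, failures: make(map[string]int)}
}

// Fail records a failure for path and reports whether the tracker tripped.
func (t *PatchFailureTracker) Fail(path string) bool {
	t.failures[path]++
	return t.failures[path] >= t.threshold
}

// Succeed decrements the counter for path without going below zero.
func (t *PatchFailureTracker) Succeed(path string) {
	if t.failures[path] > 0 {
		t.failures[path]--
	}
}

func main() {
	t := NewPatchFailureTracker()
	for i := 1; i <= 4; i++ {
		fmt.Println(i, t.Fail("main.go")) // trips only on the 4th failure
	}
	fmt.Println(t.Fail("other.go")) // false: paths are tracked independently
}
```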
vikingowl	13b2f5e14d	chore(lint): clear dead code and tighten lifecycle errcheck Removes five unused funcs/vars/fields that golangci-lint had been flagging (anthropic.toolCallDoneEvent, mistral.translateMessages, hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase), two ineffectual assignments (tui/rendering.go visible-window loop, subprocess stream_test setup), and a stale if/HasPrefix that's now a strings.TrimPrefix. Wires errcheck onto every subprocess / stream lifecycle path so a failed close or shutdown is at least logged rather than silently dropped: - engine/loop.go: stream.Close on both the error and success paths - mcp/manager.go: Shutdown when StartAll partial-fails; Transport close after Initialize failure - mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout fallback - slm/download.go: Close propagated as a named-return error on the success path; explicitly discarded on the rollback path - slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go, config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit ignores or error logging on Close / Shutdown / WalkDir / Scanln Production-code errcheck and ineffassign are now zero. Remaining golangci-lint output is test-only Close-in-defer noise plus stylistic staticcheck QF suggestions, left alone.	2026-05-19 17:05:54 +02:00
vikingowl	5cd3ccd931	fix(engine): guard mutable state with a mutex Engine.history, usage, activatedTools, modelCaps, turnOpts, and cfg.Provider/Model are now mutated and read under e.mu. The lock is released across blocking provider.Stream calls so external setters (SetProvider, SetHistory, InjectMessage, etc.) can interleave. History() now returns a copy. Snapshot helpers (latestUserPrompt, historySnapshot, snapshotTurnOpts, etc.) replace the unsynchronised reads scattered through runLoop and buildRequest. Closes audit finding H4. Adds a race regression test that fails under -race before the fix and passes after.	2026-05-19 16:18:17 +02:00
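The locking pattern in this commit can be sketched in Go as follows. State is read and mutated under `e.mu`, `History()` returns a copy, and the lock is not held across the blocking stream call. This `Engine` is a hypothetical cut-down stand-in. The method names echo the commit, but the signatures are simplified assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine is a cut-down stand-in that only guards its history.
type Engine struct {
	mu      sync.Mutex
	history []string
}

// InjectMessage appends under the lock, so callers on other goroutines are safe.
func (e *Engine) InjectMessage(msg string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.history = append(e.history, msg)
}

// History returns a copy so callers cannot mutate engine state.
func (e *Engine) History() []string {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make([]string, len(e.history))
	copy(out, e.history)
	return out
}

// Turn snapshots history under the lock, calls the (possibly slow) stream
// function without holding it, then re-locks to append the reply.
func (e *Engine) Turn(stream func([]string) string) {
	snapshot := e.History()
	reply := stream(snapshot) // lock released: setters can interleave here
	e.InjectMessage(reply)
}

func main() {
	e := &Engine{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			e.InjectMessage(fmt.Sprintf("msg %d", i))
		}(i)
	}
	wg.Wait()
	e.Turn(func(h []string) string { return fmt.Sprintf("saw %d messages", len(h)) })
	fmt.Println(len(e.History())) // 11
}
```

Running this under `go run -race` should report no data races, because every access to `history` goes through the mutex.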
vikingowl	135c8afe80	feat: various improvements to engine, router, and TUI - engine/loop: enhanced loop handling - router: dynamic model discovery and task improvements - tui: suggestion box, input mode indicator, completions enhancements	2026-05-07 22:51:50 +02:00
vikingowl	8b2202e8ec	feat(classifier): Wave A — TaskClassifier interface + HeuristicClassifier - internal/router/classifier.go: TaskClassifier interface with Classify(ctx, prompt, history) signature. HeuristicClassifier wraps the existing ClassifyTask() with zero behavior change. - engine.Config.Classifier: injectable TaskClassifier; nil defaults to HeuristicClassifier. Engine.classify() helper handles nil + error fallback transparently. - loop.go: all four router.ClassifyTask() call sites replaced with e.classify(ctx, prompt). SLMClassifier slots in without further changes to the engine.	2026-05-07 16:11:20 +02:00
vikingowl	176926924c	feat(engine): M8 cleanup — Wave B skill enforcement - Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools - Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation - Bash is denied outright when AllowedPaths is active (unparseable command args) - fs tools with empty path (cwd default) resolved via os.Getwd() and validated - Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go) - Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete	2026-05-07 15:29:33 +02:00
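Here is one way the AllowedPaths check could work, sketched in Go under stated assumptions. An empty path resolves to the working directory, each path must fall inside an allowed root, and bash is refused whenever a restriction is active. The helper names are hypothetical. Containment is tested with `filepath.Rel` on absolute paths, and symlinks are not resolved.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// withinAllowed reports whether path resolves inside one of the allowed
// roots. An empty path means "current directory", as with fs tools that
// default to cwd. Symlinks are not resolved in this sketch.
func withinAllowed(path string, allowed []string) (bool, error) {
	if path == "" {
		wd, err := os.Getwd()
		if err != nil {
			return false, err
		}
		path = wd
	}
	abs, err := filepath.Abs(path)
	if err != nil {
		return false, err
	}
	for _, root := range allowed {
		rootAbs, err := filepath.Abs(root)
		if err != nil {
			return false, err
		}
		rel, err := filepath.Rel(rootAbs, abs)
		if err != nil {
			continue
		}
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true, nil
		}
	}
	return false, nil
}

// toolPermitted denies bash outright when AllowedPaths is active, because
// its arguments cannot be reliably parsed into paths.
func toolPermitted(tool string, paths, allowed []string) (bool, error) {
	if len(allowed) == 0 {
		return true, nil
	}
	if tool == "bash" {
		return false, nil
	}
	for _, p := range paths {
		ok, err := withinAllowed(p, allowed)
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}

func main() {
	allowed := []string{"/srv/app"}
	fmt.Println(toolPermitted("fs.read", []string{"/srv/app/main.go"}, allowed))        // true <nil>
	fmt.Println(toolPermitted("fs.write", []string{"/srv/app/../etc/passwd"}, allowed)) // false <nil>
	fmt.Println(toolPermitted("bash", nil, allowed))                                    // false <nil>
}
```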
vikingowl	9fb520fba6	feat(engine): M8 cleanup — Wave A wiring gaps - Remove stale TODO(P0c) comment from main.go (resolved by P0c tier routing) - Wire config.Provider.Temperature → engine.Config.Temperature → provider.Request - Add WithMaxFileSize option to fs.write; wire cfg.Tools.MaxFileSize in main.go - Wire router.ReportOutcome after each runLoop return (success = err == nil) - Fix nil-callback guard on EventRouting dispatch (pre-existing bug exposed by new test)	2026-05-07 15:22:22 +02:00
vikingowl	d71bd942c4	feat: local model reliability — SDK retries, capability probing, init skill, context compaction Three compounding bugs prevented tool calling with llama.cpp: - Stream parser set argsComplete on partial JSON (e.g. "{"), dropping subsequent argument deltas — fix: use json.Valid to detect completeness - Missing tool_choice default — llama.cpp needs explicit "auto" to activate its GBNF grammar constraint; now set when tools are present - Tool names in history used internal format (fs.ls) while definitions used API format (fs_ls) — now re-sanitized in translateMessage Additional changes: - Disable SDK retries for local providers (500s are deterministic) - Dynamic capability probing via /props (llama.cpp) and /api/show (Ollama), replacing hardcoded model prefix list - Engine respects forced arm ToolUse capability when router is active - Bundled /init skill with Go template blocks, context-aware for local vs cloud models, deduplication rules against CLAUDE.md - Tool result compaction for local models — previous round results replaced with size markers to stay within small context windows - Text-only fallback when tool-parse errors occur on local models - "text-only" TUI indicator when model lacks tool support - Session ResetError for retry after stream failures - AllowedTools per-turn filtering in engine buildRequest	2026-04-13 02:01:01 +02:00
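The first fix above treats tool-call arguments as complete only once the accumulated text is valid JSON. Below is a sketch of that check using `encoding/json.Valid`. It also ignores fragments that arrive after completion, which covers the Ollama repeated-payload case noted in commit 4f1e0cf567. The `toolCallArgs` type is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolCallArgs accumulates streamed argument fragments for one tool call.
type toolCallArgs struct {
	buf      strings.Builder
	complete bool
}

// Append adds a fragment unless the arguments already form valid JSON.
// Marking completion only when json.Valid succeeds avoids treating "{" as
// done; ignoring later fragments drops a repeated full payload.
func (a *toolCallArgs) Append(delta string) {
	if a.complete {
		return
	}
	a.buf.WriteString(delta)
	a.complete = json.Valid([]byte(a.buf.String()))
}

func main() {
	var a toolCallArgs
	for _, d := range []string{"{", `"path":`, `"main.go"}`, `{"path":"main.go"}`} {
		a.Append(d)
		fmt.Printf("%-22q complete=%v\n", d, a.complete)
	}
	fmt.Println(a.buf.String()) // {"path":"main.go"}
}
```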
vikingowl	ce5f9d3dc9	feat(tui): Tier 3-4 UX improvements — split, routing, session naming, context bar - Split app.go (2091→1378 lines) into rendering.go, events.go, init.go - Add EventRouting stream event for router arm transparency - Add session auto-naming from first user message - Add context window progress bar in status bar - Add /keys cheatsheet, /replay for resumed sessions - Add inline cost-per-turn after assistant responses - Add diff previews in fs.write/fs.edit permission prompts - Collapse tool output to 3 lines by default (ctrl+o expands) - Use AddPrefix for system context instead of InjectMessage - Handle ContentThinking and ContentToolResult in session resume - Show session title in resume picker - Add /model numeric selection snapshot safety	2026-04-12 05:13:16 +02:00
vikingowl	c07ec63419	feat(skill): enhanced coordinator prompt with fan-out and concurrency guidance	2026-04-07 02:24:49 +02:00
vikingowl	1ec90b0ad7	feat: engine hook integration — PreToolUse, PostToolUse, Stop	2026-04-07 01:02:55 +02:00
vikingowl	8d86bc75fd	test: M7 audit — quality feedback, coordinator, agent tool coverage Quality feedback integration: TestQualityTracker_InfluencesArmSelection verifies that 5 successes vs 5 failures tips Router.Select() to the high-quality arm once EMA has enough observations. Companion test confirms heuristic fallback below minObservations. Coordinator tests expanded from 2 → 5: added guidance content check (parallel/serial/synthesize present), false-positive table extended with 7 cases including the reordered keywords from the previous fix. Agent tool suite: tool interface contracts for all four tools (Name, Description, Parameters validity, IsReadOnly). Extracted duplicated 2000-char truncation into truncateOutput() helper (format.go), removing the inline copies in agent.go and batch.go. Four boundary tests cover empty, short, exact-max, and over-max cases.	2026-04-06 00:59:12 +02:00
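The test above relies on an EMA quality score that is used only after enough observations, with the heuristic taking over below `minObservations`. A minimal Go sketch of such a tracker follows. The values `alpha = 0.25` and `minObservations = 5` are assumptions chosen for illustration, not gnoma's settings.

```go
package main

import "fmt"

// armStats holds an exponential moving average of success (1) / failure (0).
type armStats struct {
	ema   float64
	count int
}

// QualityTracker is an illustrative EMA tracker; alpha and
// minObservations values are assumptions, not gnoma's settings.
type QualityTracker struct {
	alpha           float64
	minObservations int
	stats           map[string]*armStats
}

func NewQualityTracker() *QualityTracker {
	return &QualityTracker{alpha: 0.25, minObservations: 5, stats: map[string]*armStats{}}
}

// Record folds one outcome into the arm's EMA (the first outcome seeds it).
func (q *QualityTracker) Record(arm string, success bool) {
	x := 0.0
	if success {
		x = 1.0
	}
	s, ok := q.stats[arm]
	if !ok {
		q.stats[arm] = &armStats{ema: x, count: 1}
		return
	}
	s.ema = q.alpha*x + (1-q.alpha)*s.ema
	s.count++
}

// Score returns the EMA and true once enough observations exist; false
// tells the caller to fall back to the heuristic selector.
func (q *QualityTracker) Score(arm string) (float64, bool) {
	s, ok := q.stats[arm]
	if !ok || s.count < q.minObservations {
		return 0, false
	}
	return s.ema, true
}

func main() {
	q := NewQualityTracker()
	for i := 0; i < 5; i++ {
		q.Record("good", true)
		q.Record("bad", false)
	}
	g, _ := q.Score("good")
	b, _ := q.Score("bad")
	fmt.Println(g > b) // true
}
```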
vikingowl	b421439087	feat: Engine.SetHistory/SetUsage/SetActivatedTools for session restore	2026-04-05 23:39:38 +02:00
vikingowl	26666e6d2c	feat: coordinator mode — system prompt injection for orchestration tasks	2026-04-05 23:07:56 +02:00
vikingowl	f7a2228765	feat: coordinator mode — system prompt injection for orchestration tasks	2026-04-05 23:06:23 +02:00
vikingowl	350b7bbe05	feat: accurate context window sizing from arm capabilities + prefix token baseline + tokenizer wiring	2026-04-05 22:26:31 +02:00
vikingowl	dae2c488e5	feat: wire persist.Store into engine, elf manager, and agent tools	2026-04-05 21:59:55 +02:00
vikingowl	4f1e0cf567	feat: Ollama/gemma4 compat — /init flow, stream filter, safety fixes provider/openai: - Fix doubled tool call args (argsComplete flag): Ollama sends complete args in the first streaming chunk then repeats them as delta, causing doubled JSON and 400 errors in elfs - Handle fs: prefix (gemma4 uses fs:grep instead of fs.grep) - Add Reasoning field support for Ollama thinking output cmd/gnoma: - Early TTY detection so logger is created with correct destination before any component gets a reference to it (fixes slog WARN bleed into TUI textarea) permission: - Exempt spawn_elfs and agent tools from safety scanner: elf prompt text may legitimately mention .env/.ssh/credentials patterns and should not be blocked tui/app: - /init retry chain: no-tool-calls → spawn_elfs nudge → write nudge (ask for plain text output) → TUI fallback write from streamBuf - looksLikeAgentsMD + extractMarkdownDoc: validate and clean fallback content before writing (reject refusals, strip narrative preambles) - Collapse thinking output to 3 lines; ctrl+o to expand (live stream and committed messages) - Stream-level filter for model pseudo-tool-call blocks: suppresses <<tool_code>>...</tool_code>> and <<function_call>>...<tool_call|> from entering streamBuf across chunk boundaries - sanitizeAssistantText regex covers both block formats - Reset streamFilterClose at every turn start	2026-04-05 19:24:51 +02:00
vikingowl	95dfd0cf0c	feat: M1-M7 gap audit phase 3 — context prefix, deferred tools, compact hooks Gap 11 (M6): Fixed context prefix - Window.PrefixMessages stores immutable docs (CLAUDE.md, .gnoma/GNOMA.md) - Prefix stripped before compaction, prepended after — survives all compaction - AllMessages() returns prefix + history for provider requests - main.go loads CLAUDE.md and .gnoma/GNOMA.md at startup as prefix Gap 12 (M6): Deferred tool loading - DeferrableTool optional interface: ShouldDefer() bool - buildRequest() skips deferred tools until activated - Tools auto-activate on first model request (activatedTools map) - agent + spawn_elfs marked as deferrable (large schemas, rarely needed early) - Saves ~800 tokens per deferred tool per request Gap 13 (M6): Pre/post compact hooks - OnPreCompact/OnPostCompact callbacks in WindowConfig - Called in doCompact() (shared by CompactIfNeeded + ForceCompact) - M8 hooks system will extend these to full protocol	2026-04-04 20:46:50 +02:00
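The deferred tool loading in Gap 12 can be sketched with the optional `DeferrableTool` interface named in the commit. Request building skips a deferrable tool until its name appears in the activated set. The `Tool` stand-in, the two concrete types and `requestTools` are hypothetical, and how a tool becomes activated is not modeled here.

```go
package main

import "fmt"

// Tool is a minimal stand-in for gnoma's tool interface.
type Tool interface{ Name() string }

// DeferrableTool is the optional interface named in the commit.
type DeferrableTool interface{ ShouldDefer() bool }

type basicTool struct{ name string }

func (t basicTool) Name() string { return t.name }

type largeTool struct{ name string }

func (t largeTool) Name() string    { return t.name }
func (largeTool) ShouldDefer() bool { return true }

// requestTools lists tool names to send, skipping deferred tools
// that have not been activated yet.
func requestTools(tools []Tool, activated map[string]bool) []string {
	var names []string
	for _, t := range tools {
		if d, ok := t.(DeferrableTool); ok && d.ShouldDefer() && !activated[t.Name()] {
			continue
		}
		names = append(names, t.Name())
	}
	return names
}

func main() {
	tools := []Tool{basicTool{"fs.read"}, largeTool{"agent"}, largeTool{"spawn_elfs"}}
	activated := map[string]bool{}
	fmt.Println(requestTools(tools, activated)) // [fs.read]
	activated["agent"] = true
	fmt.Println(requestTools(tools, activated)) // [fs.read agent]
}
```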
vikingowl	de1798ff5c	fix: M1-M7 gap audit phase 1 — bug fix + 5 quick wins Bug fix: - window.go: token ratio after compaction used len(w.messages) after reassignment, always producing ratio ~1.0. Fixed by saving original length before assignment. Gap 1 (M3): Scanner patterns 13 → 47 - Added 34 new patterns: Azure, DigitalOcean, HuggingFace, Grafana, GitHub extended (app/oauth/refresh), Shopify, Twilio, SendGrid, NPM, PyPI, Databricks, Pulumi, Postman, Sentry, Anthropic admin, OpenAI extended, Vault, Supabase, Telegram, Discord, JWT, Heroku, Mailgun, Figma Gap 2 (M3): Config security section - SecuritySection with EntropyThreshold + custom PatternConfig - Wire custom patterns from TOML into scanner at startup Gap 3 (M4): Polling discovery loop - StartDiscoveryLoop with 30s ticker, reconciles arms vs discovered - Router.RemoveArm for disappeared local models Gap 4 (M5): Incognito LocalOnly enforcement - Router.SetLocalOnly filters non-local arms in Select() - TUI incognito toggle (Ctrl+X, /incognito) sets local-only routing Gap 5 (M6): Reactive 413 compaction - Window.ForceCompact() bypasses ShouldCompact threshold - Engine handles 413 with emergency compact + retry	2026-04-03 23:11:08 +02:00
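The window.go bug described above is easiest to see with the length captured before and after reassignment. Here is a minimal sketch of the corrected order. The `window` type is hypothetical, and rescaling the token estimate by the ratio of messages kept is an assumption about what the ratio is used for.

```go
package main

import "fmt"

// window is a hypothetical stand-in for the context window.
type window struct {
	messages []string
	tokens   int
}

// compact keeps the newest `keep` messages and rescales the token estimate
// by the fraction of messages retained. The original length is captured
// before reassignment; reading len(w.messages) afterwards gives ~1.0.
func (w *window) compact(keep int) {
	before := len(w.messages)
	if before == 0 || keep < 0 || keep >= before {
		return
	}
	w.messages = w.messages[before-keep:]
	ratio := float64(len(w.messages)) / float64(before)
	w.tokens = int(float64(w.tokens) * ratio)
}

func main() {
	w := &window{messages: make([]string, 10), tokens: 1000}
	w.compact(4)
	fmt.Println(len(w.messages), w.tokens) // 4 400
}
```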
vikingowl	6aea2a9e3a	fix: retry with exponential backoff on 429, stagger elf spawns Engine retries transient errors (429, 5xx) up to 4 times with 1s/2s/4s/8s backoff. Respects Retry-After header from provider. Batch tool staggers elf spawns by 300ms to avoid rate limit bursts when all elfs hit the API simultaneously (Mistral's 1 req/s limit).	2026-04-03 21:08:20 +02:00
vikingowl	13db7521b1	feat: M7 Elfs — sub-agents with router-integrated spawning internal/elf/: - BackgroundElf: runs on own goroutine with independent engine, history, and provider. No shared mutable state. - Manager: spawns elfs via router.Select() (picks best arm per task type), tracks lifecycle, WaitAll(), CancelAll(), Cleanup(). internal/tool/agent/: - Agent tool: LLM can call 'agent' to spawn sub-agents. Supports task_type hint for routing, wait/background mode. 5-minute timeout, context cancellation propagated. Concurrent tool execution: - Read-only tools (fs.read, fs.grep, fs.glob, etc.) execute in parallel via goroutines. - Write tools (bash, fs.write, fs.edit) execute sequentially. - Partition by tool.IsReadOnly(). TUI: /elf command explains how to use sub-agents. 5 elf tests. Exit criteria: parent spawns 3 background elfs on different providers, collects and synthesizes results.	2026-04-03 19:16:46 +02:00
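The read/write partition above can be sketched as follows. Read-only calls run on goroutines, write calls run one at a time, and results keep the order of the calls. The `call` type and `execute` helper are hypothetical, and running every read before any write is a simplification made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// call is a hypothetical tool invocation with its read-only flag.
type call struct {
	Name     string
	ReadOnly bool
}

// execute runs read-only calls concurrently, then write calls one at a
// time in their original order. Results keep the input order. Running all
// reads before any write is a simplification made for this sketch.
func execute(calls []call, run func(call) string) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.ReadOnly {
			continue
		}
		wg.Add(1)
		go func(i int, c call) {
			defer wg.Done()
			results[i] = run(c) // each goroutine writes a distinct index
		}(i, c)
	}
	wg.Wait()
	for i, c := range calls {
		if !c.ReadOnly {
			results[i] = run(c)
		}
	}
	return results
}

func main() {
	calls := []call{{"fs.read", true}, {"fs.write", false}, {"fs.grep", true}, {"bash", false}}
	out := execute(calls, func(c call) string { return "ran " + c.Name })
	fmt.Println(out)
}
```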
vikingowl	63f4c1389e	feat: M6 complete — summarize strategy + tool result persistence SummarizeStrategy: calls LLM to condense older messages into a summary, preserving key decisions, file changes, tool outputs. Falls back to truncation on failure. Keeps 6 recent messages. Tool result persistence: outputs >50K chars saved to disk at .gnoma/sessions/tool-results/{id}.txt with 2K preview inline. TUI: /compact command for manual compaction, /clear now resets engine history. Summarize strategy used by default (with truncation fallback).	2026-04-03 18:51:28 +02:00
vikingowl	704f3a7302	feat: M6 context intelligence — token tracker + truncation compaction internal/context/: - Tracker: monitors token usage with OK/Warning/Critical states (thresholds from CC: 20K warning buffer, 13K autocompact buffer) - TruncateStrategy: drops oldest messages, preserves system prompt + recent N turns, adds compaction boundary marker - Window: manages message history with auto-compaction trigger, circuit breaker after 3 consecutive failures Engine integration: - Context window tracks usage per turn - Auto-compacts when critical threshold reached - History syncs with context window after compaction TUI status bar: - Token count with percentage (tokens: 1234 (5%)) - Color-coded: green=ok, yellow=warning, red=critical Session Status extended: TokensMax, TokenPercent, TokenState. 7 context tests.	2026-04-03 18:46:03 +02:00
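The tracker states and compaction circuit breaker can be sketched in a few lines of Go. The 20K and 13K buffers come from the commit. Reading them as distances below the context-window limit is an assumption, and all identifiers here are hypothetical.

```go
package main

import "fmt"

type TokenState int

const (
	TokenOK TokenState = iota
	TokenWarning
	TokenCritical
)

// Buffers from the commit message; interpreting them as distances below
// the context window limit is an assumption of this sketch.
const (
	warningBuffer     = 20_000
	autocompactBuffer = 13_000
)

// stateFor classifies token usage against the window limit.
func stateFor(used, limit int) TokenState {
	switch {
	case used >= limit-autocompactBuffer:
		return TokenCritical
	case used >= limit-warningBuffer:
		return TokenWarning
	default:
		return TokenOK
	}
}

// compactBreaker stops auto-compaction after 3 consecutive failures.
type compactBreaker struct{ failures int }

func (b *compactBreaker) Allow() bool { return b.failures < 3 }

func (b *compactBreaker) Record(ok bool) {
	if ok {
		b.failures = 0
		return
	}
	b.failures++
}

func main() {
	for _, used := range []int{90_000, 110_000, 120_000} {
		fmt.Println(used, stateFor(used, 128_000)) // 0=OK, 1=Warning, 2=Critical
	}
}
```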
vikingowl	9a78af7b05	feat: inject mode changes into engine conversation history Engine.InjectMessage() appends messages to history without triggering a turn. When permission mode or incognito changes, the notification is injected as a user+assistant pair so the model sees it as context. Fixes: model now knows permissions changed and will retry tool calls instead of remembering old denials from previous mode.	2026-04-03 16:42:52 +02:00
vikingowl	24b5126d66	feat: wire permission checker into engine tool execution Tools now go through permission.Checker before executing: - plan mode: denies all writes (fs.write, bash), allows reads - bypass mode: allows all (deny rules still enforced) - default mode: prompts user (pipe: stdin prompt, TUI: auto-approve for now) - accept_edits: auto-allows file ops, prompts for bash - deny mode: denies all without allow rules CLI flags: --permission <mode>, --incognito Pipe mode: console Y/N prompt on stderr TUI mode: auto-approve (proper overlay TODO) Verified: plan mode correctly blocks fs.write, model sees error.	2026-04-03 16:15:41 +02:00
vikingowl	5b14b0ac84	fix: TUI overflow, scrollable header, tool output, git branch - Fixed: chat content no longer overflows past allocated height. Lines are measured for physical width and hard-truncated to exactly the chat area height. Input + status bar always visible. - Header scrolls with chat (not pinned), only input/status fixed - Git branch in status bar (green, via git rev-parse) - Alt screen mode — terminal scrollback disabled - Mouse wheel + PgUp/PgDown scroll within TUI - New EventToolResult: tool output as dimmed indented block - Separator lines above/below input, no status bar backgrounds	2026-04-03 15:53:42 +02:00
vikingowl	847735a9f7	feat: add router foundation with task classification and arm selection internal/router/ — core routing layer: - Task classification: 10 types (boilerplate, generation, refactor, review, unit_test, planning, orchestration, security_review, debug, explain) with keyword heuristics and complexity scoring - Arm registry: provider+model pairs with capabilities and cost - Limit pools: shared resource budgets with scarcity multipliers, optimistic reservation, use-it-or-lose-it discounting - Heuristic selector: score = (quality × value) / effective_cost Prefers tools, thinking for planning, penalizes small models on complex tasks - Router: Select() picks best feasible arm, ForceArm() for CLI override Engine now routes through router.Select() when configured. Wired into CLI — arm registered per --provider/--model flags. 20 router tests. 173 tests total across 13 packages.	2026-04-03 14:23:15 +02:00
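The selector formula `score = (quality × value) / effective_cost` can be sketched as follows. Taking effective_cost as cost multiplied by the limit-pool scarcity multiplier is an assumption. So is the small cost floor that keeps free local arms from dividing by zero. The `arm` fields and helper names are hypothetical.

```go
package main

import "fmt"

// arm is an illustrative provider+model pair with scoring inputs.
type arm struct {
	Name     string
	Quality  float64 // expected task quality, 0..1
	Value    float64 // task-specific value weight
	Cost     float64 // nominal cost per request
	Scarcity float64 // limit-pool multiplier, >= 1 when a budget is tight
}

// minCost keeps free local arms from dividing by zero (assumption).
const minCost = 0.01

// score implements (quality × value) / effective_cost, taking
// effective_cost as cost × scarcity with a floor.
func score(a arm) float64 {
	eff := a.Cost * a.Scarcity
	if eff < minCost {
		eff = minCost
	}
	return a.Quality * a.Value / eff
}

// selectArm returns the arm with the highest score.
func selectArm(arms []arm) (arm, bool) {
	var best arm
	found := false
	for _, a := range arms {
		if !found || score(a) > score(best) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	arms := []arm{
		{Name: "cloud/large", Quality: 0.9, Value: 1, Cost: 1.0, Scarcity: 1},
		{Name: "cloud/small", Quality: 0.6, Value: 1, Cost: 0.2, Scarcity: 1},
		{Name: "cloud/tight", Quality: 0.9, Value: 1, Cost: 0.2, Scarcity: 10},
	}
	best, _ := selectArm(arms)
	fmt.Println(best.Name) // cloud/small
}
```

Raising the scarcity multiplier pushes an otherwise strong arm down the ranking, as with `cloud/tight` above. That is how shared budgets can steer traffic away from a nearly exhausted pool.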
vikingowl	09f102bdec	feat: add security firewall with secret scanning and incognito mode internal/security/ — core security layer baked into gnoma: - Secret scanner: gitleaks-derived regex patterns (Anthropic, OpenAI, AWS, GitHub, GitLab, Slack, Stripe, private keys, DB URLs, generic secrets) + Shannon entropy detection for unknown formats - Redactor: replaces matched secrets with [REDACTED], merges overlapping ranges, preserves surrounding context - Unicode sanitizer: NFKC normalization, strips Cf/Co categories, tag characters (ASCII smuggling), zero-width chars, RTL overrides - Incognito mode: suppresses persistence, learning, content logging - Firewall: wraps engine, scans outgoing messages + system prompt + tool results before they reach the provider Wired into engine and CLI. 21 security tests.	2026-04-03 14:07:50 +02:00
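This sketch covers two of the primitives above: Shannon entropy over a string's characters, and a redactor that merges overlapping match ranges before substituting `[REDACTED]`. The function names and the byte-offset `span` type are hypothetical. Comparing entropy against a threshold to flag a token is left out.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// shannonEntropy returns bits per character: -Σ p·log2(p) over rune frequencies.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// span is a half-open byte range [start, end) of a detected secret.
type span struct{ start, end int }

// redact merges overlapping spans, then replaces each merged range with
// [REDACTED], keeping the surrounding text.
func redact(s string, spans []span) string {
	if len(spans) == 0 {
		return s
	}
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })
	merged := []span{sorted[0]}
	for _, sp := range sorted[1:] {
		last := &merged[len(merged)-1]
		if sp.start <= last.end {
			if sp.end > last.end {
				last.end = sp.end
			}
			continue
		}
		merged = append(merged, sp)
	}
	var b strings.Builder
	prev := 0
	for _, sp := range merged {
		b.WriteString(s[prev:sp.start])
		b.WriteString("[REDACTED]")
		prev = sp.end
	}
	b.WriteString(s[prev:])
	return b.String()
}

func main() {
	fmt.Printf("%.2f %.2f\n", shannonEntropy("aaaaaaaa"), shannonEntropy("k9X2pQ7vL4mZ8rT1")) // 0.00 4.00
	fmt.Println(redact("key=abcdef token=xyz", []span{{4, 8}, {6, 10}, {17, 20}}))            // key=[REDACTED] token=[REDACTED]
}
```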
vikingowl	69f5dba091	feat: complete M1 — core engine with Mistral provider Mistral provider adapter with streaming, tool calls (single-chunk pattern), stop reason inference, model listing, capabilities, and JSON output support. Tool system: bash (7 security checks, shell alias harvesting for bash/zsh/fish), file ops (read, write, edit, glob, grep, ls). Alias harvesting collects 300+ aliases from user's shell config. Engine agentic loop: stream → tool execution → re-query → until done. Tool gating on model capabilities. Max turns safety limit. CLI pipe mode: echo "prompt" | gnoma streams response to stdout. Flags: --provider, --model, --system, --api-key, --max-turns, --verbose, --version. Provider interface expanded: Models(), DefaultModel(), Capabilities (ToolUse, JSONOutput, Vision, Thinking, ContextWindow, MaxOutput), ResponseFormat with JSON schema support. Live verified: text streaming + tool calling with devstral-small. 117 tests across 8 packages, 10MB binary.	2026-04-03 12:01:55 +02:00

34 Commits