Commit Graph

135 Commits

Author SHA1 Message Date
vikingowl 635dad660c feat(config): per-profile config layering with --profile flag (Phase C-1)
Adds opt-in user profiles for swapping API keys, CLI binaries, and
permission modes between contexts (work/private/experiment/...).

Profile mode engages only when ~/.config/gnoma/profiles/ exists, so
existing single-config installations are untouched. Selection order:
--profile flag → default_profile in base config → fatal error.

Layering: defaults → ~/.config/gnoma/config.toml → profiles/<name>.toml
→ <projectRoot>/.gnoma/config.toml → env. Map sections merge per-key;
[[arms]] and [[mcp_servers]] merge by id/name; [[hooks]] appends.

Per-profile data: quality-<name>.json and sessions/<name>/ keep the
bandit and session list from cross-contaminating between profiles.

Profile names restricted to [A-Za-z0-9_-] to block --profile=../foo
path traversal into derived paths.
2026-05-19 21:35:33 +02:00
vikingowl 0aabd19906 feat(router): per-arm strengths + cost weight (Phase D)
Plan D from docs/superpowers/plans/2026-05-19-post-slm-unlock.md
(static portion; dynamic bandit-driven promotion deferred to D-2).

Routing previously let tier ordering (CLI > local > API) dominate
selection — Opus, in tier 3, would lose to a tier-1 CLI agent for
SecurityReview even though Opus is empirically stronger at that task.
This change introduces explicit per-arm overrides:

  [[arms]]
  id = "anthropic/claude-opus-4-7"
  strengths = ["security_review", "planning"]
  cost_weight = 0.3

Strengths gate cross-tier promotion: arms matching task.Type bypass
the tier loop and compete with each other directly. Promotion is a
preference, not a pin — if no strength-tagged arm is feasible
(backoff, pool capacity, tool support), selection falls through to
the default tier order.

CostWeight linearly dampens the cost penalty in scoreArm via
  effectiveCost = 1 + CostWeight * (cost - 1)
CostWeight=1.0 (or unset) preserves current behavior; lower values
trade cheapness for quality. The earlier draft used cost^CostWeight
which inverts direction for sub-1 local-arm costs (raising a
fraction <1 to a fractional power makes it bigger, not smaller); a
monotonicity regression test prevents that drift.

- internal/router/arm.go: Strengths []TaskType, CostWeight float64,
  HasStrength(), ResolvedCostWeight() (zero → 1.0).
- internal/router/selector.go: scoreArm strength bonus const
  (strengthScoreBonus = 0.15) + linear cost dampening; selectBest
  cross-tier promotion before tier loop.
- internal/router/router.go: ArmOverride type + ApplyArmOverrides()
  returns unknown IDs; unknown strength names skipped with per-name
  warning via slog.
- internal/router/task.go: ParseTaskTypeStrict() returns ok bool;
  ParseTaskType now delegates so the two switches stay in sync.
- internal/config/config.go: ArmConfig + [[arms]] TOML wiring.
- cmd/gnoma/main.go: applies overrides after all initial arms
  register; logs a warning when an [[arms]] id has no matching
  registered arm.

Tests cover: predicate helpers, scoring direction across two arms,
linear-formula monotonicity on both sides of cost=1, cross-tier
promotion, empty-Strengths preserves tier order, promoted arm in
backoff falls through via full Router.Select path, observed-quality
tiebreak between two strength-tagged arms, ApplyArmOverrides happy
path + unknown-ID reporting + unknown-strength skipping.
2026-05-19 21:14:45 +02:00
vikingowl b331dcd61a feat(subprocess): per-agent binary override via [cli_agents] config
Plan B from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Users with aliased CLI binaries (claude-priv, claude-work,
gemini-personal) can now point gnoma's auto-discovery at them
without renaming. The override flows through to the actual subprocess
spawn at internal/provider/subprocess/provider.go:56, so routing
through the alias is functional, not cosmetic.

Config:
  [cli_agents]
  claude = "claude-priv"   # discovery uses claude-priv instead of claude
  gemini = ""              # empty value = no override (fall back to canonical)
  # vibe is absent = canonical name used

- internal/config/config.go: CLIAgentsSection map[string]string;
  TOML [cli_agents] key.
- internal/provider/subprocess/agent.go:
  - Package-level lookPath = exec.LookPath for test injection.
  - resolveAgentBinary(canonical, override) → (path, binName, err).
    Override='' falls back to canonical. Override set but missing from
    PATH returns an error (no silent fallback — masks user typos).
  - DiscoveredAgent.OverrideBinary records the override binary name
    when one was used; empty otherwise.
  - DiscoverCLIAgents(ctx, overrides) signature; warning logged when
    an override is configured but the binary isn't on PATH.
- cmd/gnoma/main.go: both call sites pass cfg.CLIAgents. The
  `gnoma providers` listing renders `claude-priv (via [cli_agents].claude)`
  when an override is in effect.

Tests cover: 5 resolver cases (no override, override set, empty
override falls back, override missing, canonical missing); 4
discovery cases (no overrides, override resolves alias, empty value
falls back, override missing skips agent); 2 config round-trip cases.
2026-05-19 21:02:16 +02:00
vikingowl 342b3903e1 test(slm): align HappyPath with task-type complexity floor
The Debug floor (0.4) added in eb0583f was bumping the SLM-returned
0.25 up, breaking the HappyPath assertion. Bump the SLM value to 0.55
so the test still verifies "SLM value preserved" (its original
intent), and add a dedicated TestClassifier_AppliesTaskTypeFloor that
exercises the under-reporting case the floor was added to handle.
2026-05-19 20:54:27 +02:00
vikingowl 43ea2e562d feat(engine): two-stage tool routing for small local arms
Plan A from docs/superpowers/plans/2026-05-19-post-slm-unlock.md.

Small local SLMs (<=16k context) waste ~1500 tokens per turn on the
full tool catalogue. Two-stage routing replaces round-1 tools with a
single synthetic select_category schema; round-2+ sends only the
selected category's real tool schemas plus select_category for
re-selection.

- internal/tool/category.go: Category type, optional Categorized
  interface, CategoryOf() with meta fallback. fs.read/fs.ls -> read,
  fs.write/fs.edit -> write, fs.glob/fs.grep -> search, bash -> exec.
- internal/engine/twostage.go: synthetic select_category tool,
  intercept helper, per-turn selectedCategory state under e.mu.
- Engine round 1 forces ToolChoiceRequired so SLMs don't fall back to
  prose. State resets at the top and end of every runLoop.
- Activates automatically on a forced local arm with ContextWindow
  <=16384, or via [router].force_two_stage TOML key.
- Integration test drives a 3-round trip and asserts: round 1 emits
  exactly one schema (synthetic) with ToolChoiceRequired, round 2
  contains only write-category schemas + select_category, real
  fs.write executes. Invalid-category fallback round-trips back to
  round-1 mode.
2026-05-19 20:53:21 +02:00
vikingowl eb0583f606 fix(router): unpin config-default provider + complexity floor by task type
Two routing bugs were keeping the SLM out of every real prompt and,
once it was eligible, pulling complex tasks into it as well.

Bug 1: ForceArm was called unconditionally when a primary provider was
configured (cmd/gnoma/main.go:378). That short-circuited the entire
router — every prompt went straight to whatever was set as
[provider].default, regardless of tier, score, or feasibility. The SLM
arm appeared in `gnoma router stats` registration logs but had zero
observations after dozens of prompts.

Fix: only pin when the user passed --provider on the command line.
Config defaults register the arm but don't force it; the router picks
freely. Verified end-to-end — trivial prompts now reach slm/ollama
via the tier-0 priority.

Bug 2: A short prompt like "refactor the SLM module" classifies as
TaskRefactor with complexity 0.015 — well under the SLM arm's 0.3
ceiling. The arm became eligible despite the task being inherently
non-trivial. Once eligible, tier-0 priority then pulled it in over
the CLI agents.

Fix: add MinComplexityForType, applied in both ClassifyTask
(heuristic path) and slm.Classifier.Classify (SLM-overlay path). The
floor is per-task-type:

  - TaskSecurityReview, TaskOrchestration  → 0.60
  - TaskRefactor, TaskPlanning, TaskDebug  → 0.40
  - TaskUnitTest, TaskReview               → 0.35

Tasks like Explain/Generation/Boilerplate keep their organic
complexity score so trivial knowledge prompts (≤0.15) still fall to
the SLM. Tasks that imply existing code or multi-step reasoning are
clamped above the SLM's MaxComplexity, naturally routing them to a
bigger arm.

After both fixes, observed routing in a clean run:

  What is 2+2?              → slm/ollama (complexity 0.015)
  Define a closure          → slm/ollama (complexity 0.015)
  What is HTTP?             → slm/ollama (complexity 0.015)
  Refactor the SLM module   → subprocess/gemini (complexity 0.40)
  Audit for race conditions → subprocess/gemini (complexity 0.35)
  Plan a migration          → subprocess/gemini (complexity 0.40)
2026-05-19 19:22:16 +02:00
vikingowl 0b4de6054d feat(tui): surface SLM backend + per-turn classifier in status bar
The TUI gave no indication that an SLM was configured or active.
You'd see the primary provider on the status line and nothing else,
even with [slm].enabled=true and a successfully booted backend.

Two surfaces added:

1. Status-bar SLM badge. The left side of the status line gains a
   dim " · slm: <model> ⚙" suffix when the backend booted, " · slm: ✗"
   when it failed, and nothing when SLM is disabled. The ⚙ marker
   indicates the model advertises tool support.

2. Per-turn classifier visibility. The existing routing event already
   produced "routed → <arm> (task: <type>)" lines in the chat history;
   it now also reports which classifier made the decision, e.g.
   "routed → ollama/ministral-3:3b (task: explain, by: slm_fallback)".
   Lets you tell in real time whether the SLM is actually classifying
   or falling back to the keyword heuristic.

Plumbing:
  - new tui.SLMInfo struct on tui.Config
  - main.go populates it after StartBackend returns
  - stream.Event gains RoutingClassifier; engine.runLoop fills it from
    task.ClassifierSource on the first round
2026-05-19 19:06:26 +02:00
vikingowl a14fe8b504 feat(slm): pluggable backends + trivial-prompt routing
The SLM had two intended jobs — classify every prompt and execute the
small ones itself — but in practice three independent gates kept it
out of nearly all real work:

  1. llamafile cold-start blocked pipe-mode runs (always faster than
     the 15 s health check)
  2. ClassifyTask defaulted RequiresTools=true, excluding the SLM arm
     (ToolUse=false) from 9/10 task types
  3. armTier hard-coded CLI agents > local > API, so even when the SLM
     arm was feasible a CLI agent won

Each gate is addressed below. The result is an SLM that actually does
its job — small stuff stays local, complex stuff routes up — gated by
arm capability rather than by accidents of the boot order.

Backend layer (the bigger change)

The original implementation hard-coded llamafile. That's fine if you
have nothing else, but most users with a local model setup already run
Ollama or llama.cpp. The new factory at internal/slm/backend.go picks
between:

  - ollama (any local Ollama daemon)
  - llamacpp (any llama.cpp server)
  - llamafile (gnoma-managed, current behaviour)
  - openaicompat (LM Studio, vLLM, remote API)
  - auto (probes in order, picks first reachable)
  - disabled

[slm].backend in config.toml selects which. Documented in
docs/slm-backends.md with copy-paste presets for each. The factory
probes the underlying model's actual capabilities (Ollama /api/show,
llama.cpp /props) and sets the SLM arm's ToolUse accordingly — so the
arm picks up simple file-read style tasks on tool-capable models and
stays knowledge-only on completion-only models.

Trivial-prompt heuristic (Gate 2)

ClassifyTask now flips RequiresTools=false for short, low-complexity
prompts whose task type doesn't imply existing code (Explain,
Generation, Boilerplate). Tool-needing tokens (read, write, run, test,
file, …) keep RequiresTools=true even when the prompt is brief.

Complexity-aware tier ordering (Gate 3)

armTier takes a Task and returns tier 0 for arms whose MaxComplexity
ceiling fits the task. CLI agents drop to tier 1, local to 2, API to 3.
For trivial tasks the SLM arm wins; for complex tasks the SLM falls
out of the feasible set (MaxComplexity exclusion) and the original
ordering reasserts.

Eager boot with user-facing wait (Gate 1)

Removed the original goroutine-only path. SLM startup now blocks
synchronously inside the factory; for llamafile that means up to
[slm].startup_timeout (default 5 s) of waiting on the first
invocation, with "Starting SLM…" → "SLM ready (backend, model, tools,
boot=N)" / "SLM unavailable: …" messages on stderr. Ollama / llamacpp
backends boot instantly because the daemon is already running.

waitHealthy() now respects the caller's context deadline instead of
its old hardcoded 15 s ceiling.

Classifier reliability

Classifier timeout bumped 2 s → 5 s for thinking-mode models like
Qwen3-distilled Tiny3.5. System prompt includes /no_think directive
for the same family. These help but don't eliminate small-model
JSON-contract failures — see the docs section on picking a model.

Probe + telemetry surfaces

gnoma slm status now prints the configured backend + model + a live
probe result (✓/✗) instead of just the llamafile manifest state.

`gnoma router stats` already (from the previous commit) shows the
classifier-source mix; with this change you can finally see slm /
slm_fallback / heuristic share rise from "always heuristic" to
something reflecting real SLM activity.

Tests

  - 9 new backend-factory tests (httptest-backed Ollama probe, error
    paths, auto-detection, capability flags)
  - Tier-ordering tests cover the new "specialised small arm wins
    trivial task" path
  - Trivial-prompt heuristic tested for both halves (knowledge-only
    flips RequiresTools=false; debug/file/run keeps it true)

Deletes the dead SLMManager field from the TUI Config — it was
declared but never read.
2026-05-19 18:53:32 +02:00
vikingowl 58beb7ce3c feat(router): classifier-source telemetry + router stats command
Phase 4 routing decisions depend on knowing whether the SLM classifier
is actually firing or whether the heuristic is silently doing all the
work. Adds the instrumentation to make that observable.

router.ClassifierSource enum (heuristic / slm / slm_fallback) is set
on Task by every classifier:
- HeuristicClassifier → ClassifierHeuristic
- slm.Classifier → ClassifierSLM on success, ClassifierSLMFallback when
  the SLM call fails or returns unparseable output

The source is plumbed through router.Outcome to QualityTracker, which
now maintains per-source counters alongside the existing per-arm × task
EMA scores. QualitySnapshot serializes both (classifier_counts is
omitempty for back-compat with pre-feature quality.json files).

lazyClassifier logs at INFO the first time it falls back to heuristic
because the SLM hasn't booted yet — distinguishes operational fallback
from an unconfigured-SLM run.

slm.Manager.Start() now records elapsed-to-healthy and the main.go
goroutine logs it as part of the "SLM ready" event. Confirms whether
short-lived runs are racing the boot cycle.

New `gnoma router stats` subcommand prints both tables (arm × task
quality, classifier source breakdown) from quality.json with a Phase 4
trust hint when the data is too sparse or the SLM share is low.

6 new tests cover ClassifierSource string/enum, heuristic + SLM source
propagation, QualityTracker counter round-trip, and back-compat
restore from a legacy quality.json without classifier_counts.
2026-05-19 18:18:22 +02:00
vikingowl 9388479b03 feat(openai): lexical repair for malformed tool-call arguments
Local-model servers (Ollama, llama.cpp, llamafile) routed through the
OpenAI-compatible path frequently emit tool-call arguments that are
*almost* valid JSON — wrapped in markdown fences, padded with prose, or
trailing a stray comma. Strict parsing fails, the engine receives empty
args, and the agent loop has to retry or escalate.

Adds repairArgs(raw) at the EventToolCallDone boundary: strict-parse
first, then apply cheap lexical fixes (strip ```json fences, drop
trailing commas before }/], extract the first balanced {...} block with
proper string/escape awareness). On success, the repaired bytes flow
through unchanged; on failure, the original is returned and downstream
parsing surfaces the error as before.

Frontier providers (OpenAI proper, Anthropic, Mistral, Google) are
unaffected — their SDKs return structured args that pass strict parse.
The repair only does work when the upstream output is malformed.

11 unit tests cover: valid passthrough, empty, trailing commas,
single/double-line fences, prose-wrapped, braces-inside-strings,
multiple top-level objects (takes the first), and unrepairable input.
A stream-level test verifies the wiring through flushNextToolCall.
2026-05-19 17:59:05 +02:00
vikingowl ec9433d783 chore(lint): clear remaining errcheck and staticcheck findings
Brings the project to a clean `make lint` baseline (0 issues).

Mechanical:
- Wrap deferred resp.Body.Close() in closures (router/discovery.go,
  router/probe.go) so the unchecked return surfaces as `_ = ...`.
- Apply `_ = ...` (single or multi-return blank) to test-file calls
  that intentionally ignore errors: os.MkdirAll / os.WriteFile / os.Chdir
  in setup paths, Close / Shutdown in teardown, Submit / Spawn / Send /
  LoadDir in tests that assert on side effects.

Structural:
- engine.handleRequestTooLarge drops the unused req parameter and
  rebuilds the request from compacted history (SA4009 — argument was
  overwritten before first use).
- provider.ClassifyHTTPStatus and google.applyCapabilityOverrides switch
  to tagged switches over the discriminator (QF1002).
- tui.app.go MouseWheel + inputMode and cmd/gnoma main slm-status use
  tagged switches in place of equality chains (QF1003).
- cmd/gnoma main.go merges a var decl with its immediate assignment
  (S1021).
- Three empty-branch sites (dispatcher_test, loader_test,
  coordinator_test) become real assertions or get the dead `if` removed
  (SA9003).
2026-05-19 17:53:42 +02:00
vikingowl 397a39250c feat(engine): early-stop detection for runaway agent loops
Adds three lightweight per-turn detectors that fire corrective user
messages back into the conversation when the model goes off the rails:

- RepetitionDetector: sliding-window scan over streamed text deltas;
  trips when a 50/80/120-char pattern repeats >= 3 times in the trailing
  200 chars. Breaks the active stream and injects a correction.
- PatchFailureTracker: per-path counter for fs.edit/fs.write failures;
  trips on the 4th consecutive failure and steers the model to fs.write
  rather than another fs.edit on the same path. Success decrements with
  a floor of 0; paths are isolated.
- DetectGreeting: narrow allowlist for "how can I help" style replies;
  only consulted after a round that used tools, so first-turn greetings
  don't false-positive.

Detector state is per-turn (declared locally in runLoop), single-
goroutine use. Corrective messages are appended as user-role text to
both engine history and the context window. Telemetry: each trigger
logs at INFO with round + path where applicable.

Covered by 12 unit tests for the primitives and 5 loop-level integration
tests that drive the full agentic loop via the existing eventStream
mock.
2026-05-19 17:39:35 +02:00
vikingowl 13b2f5e14d chore(lint): clear dead code and tighten lifecycle errcheck
Removes five unused funcs/vars/fields that golangci-lint had been
flagging (anthropic.toolCallDoneEvent, mistral.translateMessages,
hook.newError, subprocess.vibeParser.lastAssistantMsgID, tui.cBase),
two ineffectual assignments (tui/rendering.go visible-window loop,
subprocess stream_test setup), and a stale if/HasPrefix that's now a
strings.TrimPrefix.

Wires errcheck onto every subprocess / stream lifecycle path so a
failed close or shutdown is at least logged rather than silently
dropped:

- engine/loop.go: stream.Close on both the error and success paths
- mcp/manager.go: Shutdown when StartAll partial-fails; Transport
  close after Initialize failure
- mcp/transport.go: stdin.Close + syscall.Kill on graceful-timeout
  fallback
- slm/download.go: Close propagated as a named-return error on the
  success path; explicitly discarded on the rollback path
- slm/classifier.go, slm/manager.go, hook/prompt.go, context/summarize.go,
  config/write.go, cmd/gnoma/main.go, tool/fs/grep.go: explicit
  ignores or error logging on Close / Shutdown / WalkDir / Scanln

Production-code errcheck and ineffassign are now zero. Remaining
golangci-lint output is test-only Close-in-defer noise plus
stylistic staticcheck QF suggestions, left alone.
2026-05-19 17:05:54 +02:00
vikingowl bb7892c0c2 chore(audit): polish remaining audit findings (M2, H1, H3)
- M2: stop echoing the matched pattern name in the user-visible
  [BLOCKED: ...] message returned by the firewall. The pattern (and
  the matched secret class) still appear in the operator log, but the
  string sent back into the prompt is now generic.

- H1: document Rule.Pattern semantics on the Rule type and pin them
  with a regression test. Pattern is a case-sensitive, exact substring
  match against the JSON-serialised tool arguments — not a glob,
  regex, or whitespace-insensitive match. The new test exercises both
  matches and the documented gotchas (double-space, case drift, tab).

- H3: every code path in CommandExecutor.Execute that converts a hook
  failure into Allow via FailOpen now emits a WARN naming the hook
  and the failure mode (timeout / launch_error / parse_error), so
  chronic hook failure or abuse is visible in operator logs.

Also tightens errcheck on permission/rule.go (Printer.Print on a
strings.Builder cannot error in practice; make the intent explicit).
2026-05-19 17:05:39 +02:00
vikingowl dc438ea181 feat(plugin): trust-on-first-use manifest pinning
Plugins are now verified against ~/.config/gnoma/plugins.pins.toml at
load time. Each plugin's plugin.json bytes are hashed (SHA-256) and:

- recorded automatically on first load (TOFU) with a prominent warning
- compared on subsequent loads
- refused with a clear error if the hash drifted, without overwriting
  the pin so the user can review and re-enrol deliberately

Pin-store I/O failures degrade to load-without-pinning rather than
locking the user out of previously-trusted plugins.

Closes audit finding C2. See ADR-003 for the decision rationale and
docs/plugins-trust.md for the end-user trust model.
2026-05-19 16:44:09 +02:00
vikingowl c44db99b41 fix(hook): execute hook Exec as a binary, not via sh -c
Plugin loader resolves HookSpec.Exec as a relative path joined to the
plugin directory, and manifest.checkSafePath rejects absolute paths and
'..' traversal — Exec was always meant to be an executable path.

The hook executor was wrapping it in 'sh -c', adding a redundant shell
interpretation step that turned any space, quote, or metacharacter in
the path into command-injection surface. Switch to exec.Command(path)
with no shell wrapping.

Closes audit finding C3. Adds a regression test that fails under the
old 'sh -c' code path: a canary file created via shell sequencing
remains absent when the executor treats Exec as a literal filename.

Hook command tests now write small /bin/sh scripts to t.TempDir and
point Exec at those — matching production semantics (resolved binary
path) rather than inline shell strings.
2026-05-19 16:30:23 +02:00
vikingowl beaa09a154 test(permission): lock in elf safety-pattern inheritance
Audit finding H2 hypothesised that spawn_elfs/agent's safetyCheck
exemption could be reached as a bypass route if the spawned elf failed
to enforce the same patterns. Verified by inspection that:

1. WithDenyPrompt copies safetyDenyPatterns into the elf checker.
2. Check() runs safetyCheck (Step 2) before ModeBypass (Step 3),
   so bypass cannot skip safety.
3. main.go always passes the parent permChecker to the elf Manager.

H2 is not exploitable in current code. This test pins the contract so
future refactors of WithDenyPrompt cannot silently drop pattern
inheritance.
2026-05-19 16:21:53 +02:00
vikingowl 2bf700eec2 test(elf): make mockProvider.calls atomic
Race detector flagged concurrent access to mockProvider.calls during
TestManager_SpawnAndList and TestManager_WaitAll, where multiple spawned
engines share the same mock. Switch to atomic.Int64.

Closes audit finding L1. `go test -race ./...` is now fully green.
2026-05-19 16:19:40 +02:00
vikingowl 5cd3ccd931 fix(engine): guard mutable state with a mutex
Engine.history, usage, activatedTools, modelCaps, turnOpts, and
cfg.Provider/Model are now mutated and read under e.mu. The lock is
released across blocking provider.Stream calls so external setters
(SetProvider, SetHistory, InjectMessage, etc.) can interleave.

History() now returns a copy. Snapshot helpers (latestUserPrompt,
historySnapshot, snapshotTurnOpts, etc.) replace the unsynchronised
reads scattered through runLoop and buildRequest.

Closes audit finding H4. Adds a race regression test that fails under
-race before the fix and passes after.
2026-05-19 16:18:17 +02:00
vikingowl b60aa02bfd feat(fs): enforce workspace boundary on fs tools
Adds a Guard that resolves every path against an allowlist of absolute
roots (default: cwd) and rejects anything escaping via relative segments,
absolute paths outside the root, or symlinks (including symlinked
parents on writes).

Closes audit finding C1: fs.read/fs.write/fs.edit/fs.glob/fs.grep/fs.ls
previously accepted any absolute path; the only protection was a
substring denylist (.env, .ssh/, ...) which missed /etc/shadow, kube
configs, IDE secrets, and anything reachable via symlink.
2026-05-19 16:07:29 +02:00
vikingowl 135c8afe80 feat: various improvements to engine, router, and TUI
- engine/loop: enhanced loop handling
- router: dynamic model discovery and task improvements
- tui: suggestion box, input mode indicator, completions enhancements
2026-05-07 22:51:50 +02:00
vikingowl 0d2d825e52 feat: add dynamic model discovery within providers
- OpenAI provider: use Models.ListAutoPaging() to discover available models
- Anthropic provider: use Models.ListAutoPaging() to discover available models
- Google provider: use Models.All() iterator to discover available models
- All providers fall back to hardcoded lists if API calls fail
- Add capability inference functions for each provider based on model ID
- Add tests for model discovery fallback behavior

This enables gnoma to dynamically discover new models as they become available
from cloud providers, while maintaining backward compatibility with fallback
lists for offline use or API failures.
2026-05-07 22:27:24 +02:00
vikingowl befcbdcfef feat(tui): suggestion box above input, input mode indicator, ! execute
- Suggestion dropdown now renders between separator and input (not in
  chat area) — no more box at the top of an empty chat
- Ghost text suppressed when dropdown is visible (eliminates the
  'fig' / trailing text on the right)
- Bottom separator shows purple 'cmd' label when typing '/' and
  yellow 'exec' label when typing '!'
- '! <cmd>' prefix executes a raw shell command inline and shows
  output in the chat (same as /shell but one-shot)
2026-05-07 17:35:45 +02:00
vikingowl d2139c6f0c perf+feat: parallel startup discovery + slash-command suggestion dropdown
Startup: HarvestAliases, HarvestInventory, DiscoverCLIAgents, and
DiscoverLocalModels now run concurrently. Worst case latency drops
from sum(all) to max(all) — eliminates the 15s inventory timeout
from blocking the main path.

TUI: typing '/co' now shows a bordered dropdown of all matching
commands with descriptions. ↑↓ navigate, Tab/Enter accepts the
highlighted entry, Esc dismisses. Ghost-text still works for
unique unambiguous matches.
2026-05-07 17:30:16 +02:00
vikingowl f8867f5d78 feat(tui): /config opens interactive settings panel
Replaces the text dump with a navigable bordered overlay.
↑↓ to move, Enter to cycle/toggle values, Esc to close.
Shows: Model (cycles through discovered arms), Permission mode,
Incognito toggle.
2026-05-07 17:23:43 +02:00
vikingowl 9037a0d195 fix(slm): skip re-download when already set up
Setup() now returns early if Status() == StatusReady.
CLI also prints the existing path/size instead of starting a download.
2026-05-07 17:10:16 +02:00
vikingowl 329610209a fix(slm): invoke llamafile via sh to bypass Wine binfmt_misc
APE polyglot binaries start with MZ magic bytes which Wine's
binfmt_misc rule intercepts on Linux. llamafile is also a valid
POSIX shell script; running it via 'sh' bypasses the kernel's
binfmt_misc lookup entirely.
2026-05-07 17:08:52 +02:00
vikingowl 0a1730943f fix: provider-agnostic startup + slm setup auto-config
Remove the hardcoded mistral default so gnoma starts without any
provider configured. TUI mode uses a stubProvider that lets CLI agent
arms (claude, gemini, etc.) handle routing; pipe mode prints a clear
setup message.

Also: gnoma slm setup now auto-writes the default model_url to the
global config when none is set, instead of erroring.
2026-05-07 17:05:06 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
2026-05-07 16:44:32 +02:00
vikingowl d1a5c79fa4 feat(slm): Wave B — Manager, Manifest, download, subprocess lifecycle
- Manifest: JSON read/write with atomic rename; presence = ready invariant
- download: HTTP fetch with SHA256 computation, progress callback, cleanup on failure
- Manager: Status (NotSetUp/Ready/Missing), Setup (download + manifest write),
  Start (freePort, exec, PID file, health check), Stop, BaseURL
- waitHealthy: polls /health with 15s ceiling and context cancellation
- reapStalePID: kills stale process from previous run on next Start
- 28 tests; all pass
2026-05-07 16:23:46 +02:00
vikingowl 8b2202e8ec feat(classifier): Wave A — TaskClassifier interface + HeuristicClassifier
- internal/router/classifier.go: TaskClassifier interface with
  Classify(ctx, prompt, history) signature. HeuristicClassifier wraps
  the existing ClassifyTask() with zero behavior change.

- engine.Config.Classifier: injectable TaskClassifier; nil defaults
  to HeuristicClassifier. Engine.classify() helper handles nil + error
  fallback transparently.

- loop.go: all four router.ClassifyTask() call sites replaced with
  e.classify(ctx, prompt). SLMClassifier slots in without further
  changes to the engine.
2026-05-07 16:11:20 +02:00
vikingowl 0b1392cf6b feat(pty): Phase 2 — interactive shell and bash interactive detection
- /shell [cmd]: launch user's $SHELL via tea.ExecProcess (PTY handoff)
  hands terminal to the shell and restores TUI on exit.
  /shell <cmd> runs that command in the shell directly.
  Detects $SHELL > $COMSPEC > /bin/sh|powershell.exe in order.

- bash tool: detect interactive commands before execution
  Prefix-interactive: sudo, ssh, passwd, vim/vi/nano, less/more,
  htop/top, mysql/psql, ftp/sftp, git push.
  Exact-interactive (REPL): python3/python/node/irb/iex/ghci/julia.
  Returns a tool result with interactive=true metadata and a hint to
  use /shell instead of hanging or erroring.

- completions: add /shell to builtin command list
- help: document /shell [cmd]
2026-05-07 15:52:56 +02:00
vikingowl 176926924c feat(engine): M8 cleanup — Wave B skill enforcement
- Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools
- Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation
- Bash is denied outright when AllowedPaths is active (unparseable command args)
- fs tools with empty path (cwd default) resolved via os.Getwd() and validated
- Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go)
- Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete
2026-05-07 15:29:33 +02:00
vikingowl 9fb520fba6 feat(engine): M8 cleanup — Wave A wiring gaps
- Remove stale TODO(P0c) comment from main.go (resolved by P0c tier routing)
- Wire config.Provider.Temperature → engine.Config.Temperature → provider.Request
- Add WithMaxFileSize option to fs.write; wire cfg.Tools.MaxFileSize in main.go
- Wire router.ReportOutcome after each runLoop return (success = err == nil)
- Fix nil-callback guard on EventRouting dispatch (pre-existing bug exposed by new test)
2026-05-07 15:22:22 +02:00
vikingowl 6883c2a041 feat(router): tier-based routing — CLI > local > API, disabled arms
Adds explicit tier preference to arm selection so the router
deterministically prefers lower-cost arms before falling back:

  tier 0: CLI agents (IsCLIAgent=true, subprocess/claude|gemini|vibe)
  tier 1: local models (IsLocal=true, ollama/llamacpp)
  tier 2: API providers (everything else)

Within a tier, quality/cost scoring still applies. filterFeasible still
gates on quality thresholds, so a low-quality local arm won't beat a
high-quality API arm when the task's minimum threshold rules it out.

Also adds Arm.Disabled: arms with Disabled=true are excluded from
auto-routing but remain selectable via ForceArm.

Implementation: armTier helper + selectBest refactored to try tiers in
order, bestScored picks within a tier. router.Select skips disabled arms
in allArms collection (forced arm bypasses disable check).
2026-05-07 14:36:36 +02:00
vikingowl 44d0bdc032 feat(provider): subprocess CLI provider for claude, gemini, vibe
Adds internal/provider/subprocess — a provider.Provider that spawns CLI
agents (claude, gemini, vibe) as subprocesses and streams their output.

- FormatParser interface + three parsers for claude-stream-json,
  gemini-stream-json, and vibe-streaming formats; fixtures captured from
  real binaries
- subprocessStream: pull-based stream.Stream over subprocess stdout with
  bounded stderr capture (8KB) and guarded reap() to prevent double-Wait
- DiscoverCLIAgents: parallel PATH scan with 10s timeout, stable ordering
- Provider: only the last user message is passed as --prompt; all other
  request fields (history, tools, system prompt) are intentionally ignored
  (see package doc)
- main.go: discover and register CLI arms at startup; TODO(P0c) for
  tier-based routing to enforce preference order explicitly
2026-05-07 14:29:34 +02:00
vikingowl 7fbb5454ee feat(router): normalize effort/thinking abstraction across providers
Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning
control, replacing the Capabilities.Thinking bool. Each provider maps
the level to its native parameter: Anthropic budget tokens (1K/8K/16K),
OpenAI reasoning_effort (low/medium/high), Google thinking budget
(1K/8K/16K). Task classification auto-infers effort from TaskType and
complexity; filterFeasible excludes arms that lack the required level.
2026-05-07 14:08:50 +02:00
vikingowl d71bd942c4 feat: local model reliability — SDK retries, capability probing, init skill, context compaction
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
  subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
  activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
  used API format (fs_ls) — now re-sanitized in translateMessage

Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
  (Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
  vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
  replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
2026-04-13 02:01:01 +02:00
vikingowl 2093beea58 fix: deterministic 500 retry, OpenAI error wrapping, local /init prompt
Stop retrying llama.cpp 500s that are deterministic tool-parse failures
by inspecting the error message body (ClassifyHTTPError). Wrap OpenAI SDK
errors as ProviderError so the engine's retry logic classifies them. Add
localInitPrompt for local models that uses sequential fs_* calls instead
of spawn_elfs (which local models can't produce reliably).
2026-04-12 18:35:18 +02:00
vikingowl 0caab0fed1 fix(router): discovery loop removes forced arm, breaking routing
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.

Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
  model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
  atomically renames them when discovery reveals the real identity,
  with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop

Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".
2026-04-12 17:51:30 +02:00
vikingowl ce5f9d3dc9 feat(tui): Tier 3-4 UX improvements — split, routing, session naming, context bar
- Split app.go (2091→1378 lines) into rendering.go, events.go, init.go
- Add EventRouting stream event for router arm transparency
- Add session auto-naming from first user message
- Add context window progress bar in status bar
- Add /keys cheatsheet, /replay for resumed sessions
- Add inline cost-per-turn after assistant responses
- Add diff previews in fs.write/fs.edit permission prompts
- Collapse tool output to 3 lines by default (ctrl+o expands)
- Use AddPrefix for system context instead of InjectMessage
- Handle ContentThinking and ContentToolResult in session resume
- Show session title in resume picker
- Add /model numeric selection snapshot safety
2026-04-12 05:13:16 +02:00
vikingowl 48e63a9bc0 feat(tui): Tier 1-2 UX improvements — completions, usage, provider status
Tier 1 (launch blockers):
- Remove /shell from /help (advertised but unimplemented)
- Kill dead _ = closeLen assignment
- Cache glamour renderer by width — no longer recreated on every
  WindowSizeMsg when width hasn't changed

Tier 2 (ship-quality UX):
- Slash command ghost-text completion with Tab accept. Sources: static
  command list + dynamic skill names. /permission gets arg completion
  for the 6 modes.
- /compact reports before/after token counts (e.g. "32k → 18k tokens")
- /provider shows all registered arms grouped by provider, not just
  "restart required"
- /usage command: input/output/total tokens, context %, provider, turns
- Widen Ctrl+C quit window from 1s to 2s
- "new content below" indicator when scrolled up during streaming
- Permission prompt: inline chat notification when approval needed,
  so the user notices even if focused on input
2026-04-12 04:19:55 +02:00
vikingowl e04cacc215 fix: append mutation, pipe-mode hang, Mistral regex false positives
- Fix append footgun: allHooks/allMCPServers allocated fresh to avoid
  mutating cfg's backing array (lines 391/413 in main.go)
- Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny
  instead of blocking forever on fmt.Scanln EOF
- Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched
  commit hashes, UUIDs) to context-gated pattern requiring "mistral"
  keyword nearby. Added scanner test for positives and negatives.
- Remove README demo GIF TODO placeholder
- Unify version string: pass buildVersion from ldflags into tui.Config
  instead of hardcoding "v0.1.0-dev"
- Populate benchmarks doc with actual Go benchmark results
2026-04-12 03:49:47 +02:00
vikingowl 6bb9c33d04 fix(m8): replace_default map, error UX, benchmarks, and launch prep
- Fix replace_default positional bug: []string → map[string]string for
  explicit MCP tool → built-in name mapping
- Improve error messages for missing API keys (3 actionable options) and
  unknown providers (early validation with available list)
- Remove python3 dependency from MCP tests (pure bash grep/sed parsing)
- Add router benchmark scaffold (6 benchmarks in bench_test.go + docs)
- Add .goreleaser.yml for cross-platform binary releases with ldflags
- Add launch-ready README with quickstart, extensibility docs, GIF placeholder
- Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)
2026-04-12 03:34:58 +02:00
vikingowl 6c47f8643b feat(m8): MCP client, tool replaceability, and plugin system
Complete the remaining M8 extensibility deliverables:

- MCP client with JSON-RPC 2.0 over stdio transport, protocol
  lifecycle (initialize/tools-list/tools-call), and process group
  management for clean shutdown
- MCP tool adapter implementing tool.Tool with mcp__{server}__{tool}
  naming convention and replace_default for swapping built-in tools
- MCP manager for multi-server orchestration with parallel startup,
  tool discovery, and registry integration
- Plugin system with plugin.json manifest (name/version/capabilities),
  directory-based discovery (global + project scopes with precedence),
  loader that merges skills/hooks/MCP configs into existing registries,
  and install/uninstall/list lifecycle manager
- Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out
  enabled/disabled resolution
- TUI /plugins command for listing installed plugins
- 54 tests across internal/mcp and internal/plugin packages
2026-04-12 03:09:05 +02:00
vikingowl c07ec63419 feat(skill): enhanced coordinator prompt with fan-out and concurrency guidance 2026-04-07 02:24:49 +02:00
vikingowl 48c7b7aad4 feat(skill): pipe mode support and main.go wiring 2026-04-07 02:19:42 +02:00
vikingowl 893880039b feat(skill): TUI integration — /skillname invokes skills, /skills lists them 2026-04-07 02:18:12 +02:00
vikingowl b60daf9940 feat(skill): registry with multi-directory loading and precedence 2026-04-07 02:17:17 +02:00
vikingowl 61adb24773 feat(skill): bundled /batch skill with go:embed 2026-04-07 02:16:35 +02:00