
140 Commits

Author SHA1 Message Date
vikingowl 2bf700eec2 test(elf): make mockProvider.calls atomic
Race detector flagged concurrent access to mockProvider.calls during
TestManager_SpawnAndList and TestManager_WaitAll, where multiple spawned
engines share the same mock. Switch to atomic.Int64.

Closes audit finding L1. `go test -race ./...` is now fully green.
2026-05-19 16:19:40 +02:00
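The switch to atomic.Int64 described in the commit above can be sketched as follows. This is a minimal illustration, not the project's test code: the `mockProvider` shape, the `Stream` method body, and the `countCalls` helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// mockProvider is a hypothetical stand-in for the shared test double.
// Only the idea of an atomic calls counter comes from the commit message.
type mockProvider struct {
	calls atomic.Int64
}

// Stream records one invocation; safe to call from many goroutines at once.
func (m *mockProvider) Stream() {
	m.calls.Add(1)
}

// countCalls (hypothetical helper) shares one mock across n goroutines
// and returns the final counter value.
func countCalls(n int) int64 {
	var m mockProvider
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.Stream()
		}()
	}
	wg.Wait()
	return m.calls.Load()
}

func main() {
	fmt.Println(countCalls(100))
}
```

With a plain `int` field and `m.calls++`, the same program would be reported by `go test -race`; the atomic counter needs no extra mutex.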
vikingowl 5cd3ccd931 fix(engine): guard mutable state with a mutex
Engine.history, usage, activatedTools, modelCaps, turnOpts, and
cfg.Provider/Model are now mutated and read under e.mu. The lock is
released across blocking provider.Stream calls so external setters
(SetProvider, SetHistory, InjectMessage, etc.) can interleave.

History() now returns a copy. Snapshot helpers (latestUserPrompt,
historySnapshot, snapshotTurnOpts, etc.) replace the unsynchronised
reads scattered through runLoop and buildRequest.

Closes audit finding H4. Adds a race regression test that fails under
-race before the fix and passes after.
2026-05-19 16:18:17 +02:00
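A rough sketch of the locking pattern this commit describes: state is read and written under a mutex, History() hands out a copy, and the lock is not held while the blocking stream call runs. The `Engine` fields, `runTurn`, and the `stream` callback are simplified assumptions, not the real engine API.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine is a reduced, hypothetical version of the engine: one mutex, one
// guarded slice.
type Engine struct {
	mu      sync.Mutex
	history []string
}

// History returns a copy so callers cannot mutate engine state.
func (e *Engine) History() []string {
	e.mu.Lock()
	defer e.mu.Unlock()
	return append([]string(nil), e.history...)
}

// InjectMessage is an external setter that may run at any time.
func (e *Engine) InjectMessage(msg string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.history = append(e.history, msg)
}

// runTurn snapshots state under the lock, calls the blocking stream function
// WITHOUT the lock, then re-locks only to record the reply.
func (e *Engine) runTurn(stream func(snapshot []string) string) {
	snapshot := e.History()   // lock held only inside History
	reply := stream(snapshot) // lock not held: setters can interleave
	e.InjectMessage(reply)
}

func main() {
	e := &Engine{}
	e.InjectMessage("hello")
	e.runTurn(func(snapshot []string) string {
		// Would deadlock if runTurn held e.mu across this call.
		e.InjectMessage("interleaved")
		return "reply"
	})
	fmt.Println(e.History())
}
```

The trade-off is that state may change while the stream is in flight, which is why the commit replaces scattered direct reads with snapshot helpers.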
vikingowl b60aa02bfd feat(fs): enforce workspace boundary on fs tools
Adds a Guard that resolves every path against an allowlist of absolute
roots (default: cwd) and rejects anything escaping via relative segments,
absolute paths outside the root, or symlinks (including symlinked
parents on writes).

Closes audit finding C1: fs.read/fs.write/fs.edit/fs.glob/fs.grep/fs.ls
previously accepted any absolute path; the only protection was a
substring denylist (.env, .ssh/, ...) which missed /etc/shadow, kube
configs, IDE secrets, and anything reachable via symlink.
2026-05-19 16:07:29 +02:00
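The boundary check described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual Guard: `NewGuard`, `Resolve`, `resolveExisting`, `ErrOutsideRoot`, and `demo` are hypothetical names, and the real implementation may differ in how it treats missing paths.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ErrOutsideRoot is a hypothetical sentinel error for this sketch.
var ErrOutsideRoot = errors.New("path escapes workspace roots")

// Guard holds symlink-resolved absolute roots that every path must stay inside.
type Guard struct{ roots []string }

func NewGuard(roots ...string) (*Guard, error) {
	g := &Guard{}
	for _, r := range roots {
		abs, err := filepath.Abs(r)
		if err != nil {
			return nil, err
		}
		resolved, err := filepath.EvalSymlinks(abs)
		if err != nil {
			return nil, err
		}
		g.roots = append(g.roots, resolved)
	}
	return g, nil
}

// resolveExisting resolves symlinks on the longest existing prefix of p, so a
// write to a not-yet-created file still has its symlinked parents checked.
func resolveExisting(p string) (string, error) {
	resolved, err := filepath.EvalSymlinks(p)
	if err == nil {
		return resolved, nil
	}
	parent := filepath.Dir(p)
	if !os.IsNotExist(err) || parent == p {
		return "", err
	}
	resolvedParent, err := resolveExisting(parent)
	if err != nil {
		return "", err
	}
	return filepath.Join(resolvedParent, filepath.Base(p)), nil
}

// Resolve returns the real path if it lies inside an allowed root.
func (g *Guard) Resolve(path string) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	resolved, err := resolveExisting(abs)
	if err != nil {
		return "", err
	}
	for _, root := range g.roots {
		rel, err := filepath.Rel(root, resolved)
		if err == nil && rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return resolved, nil
		}
	}
	return "", ErrOutsideRoot
}

// demo builds a throwaway workspace containing a symlink that points outside it.
func demo() (insideOK, dotDotRejected, symlinkRejected bool, err error) {
	tmp, err := os.MkdirTemp("", "guard-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(tmp)
	root := filepath.Join(tmp, "ws")
	if err = os.Mkdir(root, 0o755); err != nil {
		return
	}
	if err = os.Symlink(tmp, filepath.Join(root, "link")); err != nil {
		return
	}
	g, err := NewGuard(root)
	if err != nil {
		return
	}
	_, e1 := g.Resolve(filepath.Join(root, "new", "file.txt"))
	_, e2 := g.Resolve(filepath.Join(root, "..", "secret"))
	_, e3 := g.Resolve(filepath.Join(root, "link", "secret"))
	return e1 == nil, errors.Is(e2, ErrOutsideRoot), errors.Is(e3, ErrOutsideRoot), nil
}

func main() {
	fmt.Println(demo())
}
```

Comparing resolved paths with `filepath.Rel` avoids the weakness of substring denylists: the decision depends on where the path actually lands, not on what its text contains.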
vikingowl 135c8afe80 feat: various improvements to engine, router, and TUI
- engine/loop: enhanced loop handling
- router: dynamic model discovery and task improvements
- tui: suggestion box, input mode indicator, completions enhancements
2026-05-07 22:51:50 +02:00
vikingowl 0d2d825e52 feat: add dynamic model discovery within providers
- OpenAI provider: use Models.ListAutoPaging() to discover available models
- Anthropic provider: use Models.ListAutoPaging() to discover available models
- Google provider: use Models.All() iterator to discover available models
- All providers fall back to hardcoded lists if API calls fail
- Add capability inference functions for each provider based on model ID
- Add tests for model discovery fallback behavior

This enables gnoma to dynamically discover new models as they become available
from cloud providers, while maintaining backward compatibility with fallback
lists for offline use or API failures.
2026-05-07 22:27:24 +02:00
vikingowl befcbdcfef feat(tui): suggestion box above input, input mode indicator, ! execute
- Suggestion dropdown now renders between separator and input (not in
  chat area) — no more box at the top of an empty chat
- Ghost text suppressed when dropdown is visible (eliminates the
  'fig' / trailing text on the right)
- Bottom separator shows purple 'cmd' label when typing '/' and
  yellow 'exec' label when typing '!'
- '! <cmd>' prefix executes a raw shell command inline and shows
  output in the chat (same as /shell but one-shot)
2026-05-07 17:35:45 +02:00
vikingowl d2139c6f0c perf+feat: parallel startup discovery + slash-command suggestion dropdown
Startup: HarvestAliases, HarvestInventory, DiscoverCLIAgents, and
DiscoverLocalModels now run concurrently. Worst-case latency drops
from sum(all) to max(all), so the 15s inventory timeout no longer
blocks the main path.

TUI: typing '/co' now shows a bordered dropdown of all matching
commands with descriptions. ↑↓ navigate, Tab/Enter accept the
highlighted entry, Esc dismisses. Ghost-text still works for
unambiguous matches.
2026-05-07 17:30:16 +02:00
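The sum(all) to max(all) change in the startup path above comes from running independent discovery steps side by side. A minimal sketch, assuming each step can be modelled as a function returning a string; `runConcurrently` and the task bodies are hypothetical, and only the step names come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runConcurrently starts every task in its own goroutine and waits for all of
// them. Each goroutine writes only to its own slot, so no mutex is needed.
func runConcurrently(tasks ...func() string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() string) {
			defer wg.Done()
			results[i] = task()
		}(i, task)
	}
	wg.Wait()
	return results
}

// slow simulates a discovery step that takes d to finish.
func slow(name string, d time.Duration) func() string {
	return func() string {
		time.Sleep(d)
		return name
	}
}

func main() {
	start := time.Now()
	out := runConcurrently(
		slow("aliases", 20*time.Millisecond),
		slow("inventory", 40*time.Millisecond),
		slow("cli-agents", 30*time.Millisecond),
		slow("local-models", 10*time.Millisecond),
	)
	// Elapsed time tracks the slowest task, not the sum of all four.
	fmt.Println(out, time.Since(start).Round(10*time.Millisecond))
}
```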
vikingowl f8867f5d78 feat(tui): /config opens interactive settings panel
Replaces the text dump with a navigable bordered overlay.
↑↓ to move, Enter to cycle/toggle values, Esc to close.
Shows: Model (cycles through discovered arms), Permission mode,
Incognito toggle.
2026-05-07 17:23:43 +02:00
vikingowl 71f31559c2 feat(cli): add 'gnoma providers' subcommand
Lists configured provider, auto-discovered CLI agents (claude/gemini/vibe),
running local models (ollama/llamacpp), and SLM status in one shot.
2026-05-07 17:15:46 +02:00
vikingowl adb4f5db5d fix(slm): start llamafile in background; use lazyClassifier
Blocking Start() call (up to 15s) no longer delays TUI startup.
lazyClassifier falls back to heuristic until llamafile is healthy,
then atomically swaps in the SLM classifier.
2026-05-07 17:13:56 +02:00
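One way to picture the lazyClassifier fallback and atomic swap from this commit is sketched below. The `Classifier` interface, `markReady`, and the `fixed` type are assumptions for illustration; the real classifier signature is richer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Classifier is a simplified, hypothetical interface for this sketch.
type Classifier interface {
	Classify(prompt string) string
}

// fixed is a toy classifier that always returns its own name.
type fixed string

func (f fixed) Classify(string) string { return string(f) }

// lazyClassifier answers with the fallback until a ready classifier has been
// published, then uses the published one for every later call.
type lazyClassifier struct {
	fallback Classifier
	ready    atomic.Pointer[Classifier]
}

func (l *lazyClassifier) Classify(prompt string) string {
	if c := l.ready.Load(); c != nil {
		return (*c).Classify(prompt)
	}
	return l.fallback.Classify(prompt)
}

// markReady atomically swaps in the real classifier. In the scenario above it
// would be called from the background goroutine once the health check passes.
func (l *lazyClassifier) markReady(c Classifier) {
	l.ready.Store(&c)
}

func main() {
	l := &lazyClassifier{fallback: fixed("heuristic")}
	fmt.Println(l.Classify("refactor this function"))
	l.markReady(fixed("slm"))
	fmt.Println(l.Classify("refactor this function"))
}
```

Callers never block on startup: they see either the fallback or the fully initialised classifier, never a half-constructed one.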
vikingowl 9037a0d195 fix(slm): skip re-download when already set up
Setup() now returns early if Status() == StatusReady.
CLI also prints the existing path/size instead of starting a download.
2026-05-07 17:10:16 +02:00
vikingowl 329610209a fix(slm): invoke llamafile via sh to bypass Wine binfmt_misc
APE polyglot binaries start with MZ magic bytes which Wine's
binfmt_misc rule intercepts on Linux. llamafile is also a valid
POSIX shell script; running it via 'sh' bypasses the kernel's
binfmt_misc lookup entirely.
2026-05-07 17:08:52 +02:00
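The workaround above amounts to making `sh` the program the kernel executes and passing the llamafile as its first argument. A minimal sketch; `llamafileCmd`, the path, and the argument are placeholders, not the project's actual invocation.

```go
package main

import (
	"fmt"
	"os/exec"
)

// llamafileCmd builds a command that runs the binary through sh. The kernel
// only sees "sh" as the executable, so no binfmt_misc rule is matched against
// the llamafile's MZ header.
func llamafileCmd(path string, args ...string) *exec.Cmd {
	return exec.Command("sh", append([]string{path}, args...)...)
}

func main() {
	// Placeholder path and argument: nothing is executed here.
	cmd := llamafileCmd("/path/to/model.llamafile", "placeholder-arg")
	fmt.Println(cmd.Args)
}
```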
vikingowl 0a1730943f fix: provider-agnostic startup + slm setup auto-config
Remove the hardcoded mistral default so gnoma starts without any
provider configured. TUI mode uses a stubProvider that lets CLI agent
arms (claude, gemini, etc.) handle routing; pipe mode prints a clear
setup message.

Also: gnoma slm setup now auto-writes the default model_url to the
global config when none is set, instead of erroring.
2026-05-07 17:05:06 +02:00
vikingowl 062566a23d fix(cli): three UX issues — help output, TUI startup, setup command
- Custom flag.Usage: shows subcommands and usage patterns; -h is no longer useless
- system flag default is now '' (applies built-in at runtime); flag help no longer
  spews the entire system prompt
- API key check skips hard-exit in TUI mode; TUI starts and surfaces auth errors
  inline on first request instead of blocking at launch
- gnoma slm setup: progress shows speed (bytes/s), no hardcoded model URL in
  error message, points to llamafile releases page instead
2026-05-07 16:53:57 +02:00
vikingowl a9213ec382 feat(slm): Wave C — SLM classifier, MaxComplexity routing, CLI subcommands, TUI status
- slm.Classifier: openaicompat → llamafile, 2s timeout + heuristic fallback,
  heuristic baseline blended so Priority/RequiredEffort are never zeroed,
  extractJSON strips markdown fences from small-model responses
- router.ParseTaskType: case-insensitive string → TaskType, unknown → TaskGeneration
- router.Arm.MaxComplexity: zero = no ceiling (preserves existing arm behavior);
  filterFeasible excludes arms when task.ComplexityScore > MaxComplexity
- config.SLMSection: [slm] enabled / model_url / data_dir
- openaicompat.NewLlamafile: no API key, model = "default", no retries
- slm.Manager: DefaultDataDir() (XDG), Manifest() accessor
- cmd/gnoma: `gnoma slm setup` / `gnoma slm status` subcommands; SLM arm
  registered with MaxComplexity=0.3 when enabled + set up
- tui: /config shows slm status (ready/missing/not set up + base URL if running)
- docs: roadmap updated to reflect llamafile pivot from Ollama
2026-05-07 16:44:32 +02:00
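The MaxComplexity rule listed in this commit (zero means no ceiling; otherwise the arm is excluded when the task's score exceeds it) can be sketched like this. The `Arm` and `Task` structs are trimmed to the two fields involved, and the arm names are illustrative.

```go
package main

import "fmt"

// Arm and Task are reduced, hypothetical versions of the router types.
type Arm struct {
	Name          string
	MaxComplexity float64 // zero = no ceiling
}

type Task struct {
	ComplexityScore float64
}

// filterFeasible drops arms whose complexity ceiling is below the task's score.
func filterFeasible(arms []Arm, t Task) []Arm {
	var out []Arm
	for _, a := range arms {
		if a.MaxComplexity > 0 && t.ComplexityScore > a.MaxComplexity {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	arms := []Arm{
		{Name: "slm", MaxComplexity: 0.3}, // ceiling from the commit message
		{Name: "api"},                     // no ceiling
	}
	fmt.Println(filterFeasible(arms, Task{ComplexityScore: 0.2}))
	fmt.Println(filterFeasible(arms, Task{ComplexityScore: 0.5}))
}
```

Treating the zero value as "unlimited" keeps arms that never set the field working exactly as before.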
vikingowl d1a5c79fa4 feat(slm): Wave B — Manager, Manifest, download, subprocess lifecycle
- Manifest: JSON read/write with atomic rename; presence = ready invariant
- download: HTTP fetch with SHA256 computation, progress callback, cleanup on failure
- Manager: Status (NotSetUp/Ready/Missing), Setup (download + manifest write),
  Start (freePort, exec, PID file, health check), Stop, BaseURL
- waitHealthy: polls /health with 15s ceiling and context cancellation
- reapStalePID: kills stale process from previous run on next Start
- 28 tests; all pass
2026-05-07 16:23:46 +02:00
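The "atomic rename" manifest write mentioned above is commonly done by writing to a temporary file in the same directory and renaming it over the target. The sketch below assumes a hypothetical `Manifest` with three fields; the real manifest layout is not shown in the log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Manifest fields are assumptions for this sketch.
type Manifest struct {
	Path   string `json:"path"`
	SHA256 string `json:"sha256"`
	Size   int64  `json:"size"`
}

// writeManifest writes to a temp file next to the target and renames it into
// place, so readers see either the old file, no file, or the complete new one.
func writeManifest(path string, m Manifest) error {
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".manifest-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// readManifest loads the manifest; a missing file means "not set up".
func readManifest(path string) (Manifest, error) {
	var m Manifest
	data, err := os.ReadFile(path)
	if err != nil {
		return m, err
	}
	err = json.Unmarshal(data, &m)
	return m, err
}

// roundTrip (hypothetical helper) writes and re-reads m in a scratch directory.
func roundTrip(m Manifest) (Manifest, error) {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		return Manifest{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "manifest.json")
	if err := writeManifest(path, m); err != nil {
		return Manifest{}, err
	}
	return readManifest(path)
}

func main() {
	fmt.Println(roundTrip(Manifest{Path: "model.llamafile", SHA256: "abc123", Size: 42}))
}
```

Because the file only appears once it is complete, its presence can serve as the readiness signal, which matches the "presence = ready" invariant.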
vikingowl 8b2202e8ec feat(classifier): Wave A — TaskClassifier interface + HeuristicClassifier
- internal/router/classifier.go: TaskClassifier interface with
  Classify(ctx, prompt, history) signature. HeuristicClassifier wraps
  the existing ClassifyTask() with zero behavior change.

- engine.Config.Classifier: injectable TaskClassifier; nil defaults
  to HeuristicClassifier. Engine.classify() helper handles nil + error
  fallback transparently.

- loop.go: all four router.ClassifyTask() call sites replaced with
  e.classify(ctx, prompt). SLMClassifier slots in without further
  changes to the engine.
2026-05-07 16:11:20 +02:00
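The injectable classifier with nil and error fallback described in this commit could be shaped roughly as follows. The return type of `Classify`, the `TaskDebug` constant, the toy heuristic rule, and the `failing` and `demoClassify` helpers are assumptions; only the `Classify(ctx, prompt, history)` parameter list and the fallback behaviour come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

type TaskType string

const (
	TaskGeneration TaskType = "generation"
	TaskDebug      TaskType = "debug" // hypothetical second type for the demo
)

// TaskClassifier mirrors the signature named in the commit message.
type TaskClassifier interface {
	Classify(ctx context.Context, prompt string, history []string) (TaskType, error)
}

// heuristic stands in for the existing ClassifyTask(); the rule is invented.
func heuristic(prompt string) TaskType {
	if strings.Contains(strings.ToLower(prompt), "fix") {
		return TaskDebug
	}
	return TaskGeneration
}

type HeuristicClassifier struct{}

func (HeuristicClassifier) Classify(_ context.Context, prompt string, _ []string) (TaskType, error) {
	return heuristic(prompt), nil
}

type Engine struct{ Classifier TaskClassifier }

// classify hides both fallbacks: a nil classifier and a failing classifier
// each end up on the heuristic path.
func (e *Engine) classify(ctx context.Context, prompt string) TaskType {
	c := e.Classifier
	if c == nil {
		c = HeuristicClassifier{}
	}
	t, err := c.Classify(ctx, prompt, nil)
	if err != nil {
		return heuristic(prompt)
	}
	return t
}

// failing simulates an unavailable classifier.
type failing struct{}

func (failing) Classify(context.Context, string, []string) (TaskType, error) {
	return "", errors.New("classifier unavailable")
}

func demoClassify() (nilCase, errCase TaskType) {
	ctx := context.Background()
	nilCase = (&Engine{}).classify(ctx, "fix the failing test")
	errCase = (&Engine{Classifier: failing{}}).classify(ctx, "write a parser")
	return nilCase, errCase
}

func main() {
	fmt.Println(demoClassify())
}
```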
vikingowl 0b1392cf6b feat(pty): Phase 2 — interactive shell and bash interactive detection
- /shell [cmd]: launch user's $SHELL via tea.ExecProcess (PTY handoff),
  which hands the terminal to the shell and restores the TUI on exit.
  /shell <cmd> runs that command in the shell directly.
  Detects $SHELL > $COMSPEC > /bin/sh|powershell.exe in order.

- bash tool: detect interactive commands before execution
  Prefix-interactive: sudo, ssh, passwd, vim/vi/nano, less/more,
  htop/top, mysql/psql, ftp/sftp, git push.
  Exact-interactive (REPL): python3/python/node/irb/iex/ghci/julia.
  Returns a tool result with interactive=true metadata and a hint to
  use /shell instead of hanging or erroring.

- completions: add /shell to builtin command list
- help: document /shell [cmd]
2026-05-07 15:52:56 +02:00
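The two detection classes in the bash tool change above (prefix matches and exact REPL matches) might be checked along these lines. The lists are copied from the commit message; the whitespace normalisation and whole-word prefix rule are assumptions about how matching is done.

```go
package main

import (
	"fmt"
	"strings"
)

// Commands treated as interactive whenever they lead the command line.
var prefixInteractive = []string{
	"sudo", "ssh", "passwd", "vim", "vi", "nano", "less", "more",
	"htop", "top", "mysql", "psql", "ftp", "sftp", "git push",
}

// Commands treated as interactive only when run bare (they open a REPL).
var exactInteractive = []string{
	"python3", "python", "node", "irb", "iex", "ghci", "julia",
}

// isInteractive reports whether running the command would likely wait on a
// terminal. Prefixes match whole words, so "topology" does not match "top".
func isInteractive(command string) bool {
	norm := strings.Join(strings.Fields(command), " ")
	for _, e := range exactInteractive {
		if norm == e {
			return true
		}
	}
	for _, p := range prefixInteractive {
		if norm == p || strings.HasPrefix(norm, p+" ") {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []string{"python3", "python3 script.py", "git push origin main", "git status"} {
		fmt.Printf("%q interactive=%v\n", c, isInteractive(c))
	}
}
```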
vikingowl 176926924c feat(engine): M8 cleanup — Wave B skill enforcement
- Add tool.PathSensitiveTool interface (ExtractPaths); implement on all 6 fs tools
- Add engine.TurnOptions.AllowedPaths: restricts tool filesystem access per skill invocation
- Bash is denied outright when AllowedPaths is active (unparseable command args)
- fs tools with empty path (cwd default) resolved via os.Getwd() and validated
- Add engine.TurnOptions.AllowedTools + AllowedPaths wiring in pipe mode (main.go) and TUI skill dispatch (tui/app.go)
- Remove TODO(M8.3) from skill.Frontmatter — enforcement is now complete
2026-05-07 15:29:33 +02:00
vikingowl 9fb520fba6 feat(engine): M8 cleanup — Wave A wiring gaps
- Remove stale TODO(P0c) comment from main.go (resolved by P0c tier routing)
- Wire config.Provider.Temperature → engine.Config.Temperature → provider.Request
- Add WithMaxFileSize option to fs.write; wire cfg.Tools.MaxFileSize in main.go
- Wire router.ReportOutcome after each runLoop return (success = err == nil)
- Fix nil-callback guard on EventRouting dispatch (pre-existing bug exposed by new test)
2026-05-07 15:22:22 +02:00
vikingowl 5569d4fb86 docs: consolidated roadmap, ADR-013, drop stale plans
- New 7-phase roadmap (2026-05-07-gnoma-roadmap.md) covering M8 cleanup,
  PTY interactive shell, SLM classifier, router revisit, USP security,
  ELF support, and distribution
- ADR-013 (002-slm-routing.md): SLM-first routing supersedes ADR-009;
  Thompson Sampling deferred pending SLM production data
- ADR-009 status updated to "Superseded by ADR-013"
- gemma-integration-analysis.md: header note that Node.js specifics
  (LiteRT-LM, daemon, PID) don't apply to gnoma's Go implementation
- TODO.md replaced with thin pointer to roadmap + stable backlog
- Deleted stale plan/spec files: m6-m7-closeout, m8-hooks-design
2026-05-07 15:06:54 +02:00
vikingowl 19c196eedd docs: note routing revisit after SLM integration
2026-05-07 14:41:37 +02:00
vikingowl 6883c2a041 feat(router): tier-based routing — CLI > local > API, disabled arms
Adds explicit tier preference to arm selection so the router
deterministically prefers lower-cost arms before falling back:

  tier 0: CLI agents (IsCLIAgent=true, subprocess/claude|gemini|vibe)
  tier 1: local models (IsLocal=true, ollama/llamacpp)
  tier 2: API providers (everything else)

Within a tier, quality/cost scoring still applies. filterFeasible still
gates on quality thresholds, so a low-quality local arm won't beat a
high-quality API arm when the task's minimum threshold rules it out.

Also adds Arm.Disabled: arms with Disabled=true are excluded from
auto-routing but remain selectable via ForceArm.

Implementation: armTier helper + selectBest refactored to try tiers in
order, bestScored picks within a tier. router.Select skips disabled arms
in allArms collection (forced arm bypasses disable check).
2026-05-07 14:36:36 +02:00
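The tier preference in this commit can be pictured as "try tier 0, then 1, then 2, and score only within the first non-empty tier". In the sketch below the helper names follow the commit message, but the `Arm` struct is reduced and the single `Score` field is a stand-in for the real quality/cost scoring.

```go
package main

import "fmt"

// Arm is a reduced, hypothetical version of the router arm.
type Arm struct {
	Name       string
	IsCLIAgent bool
	IsLocal    bool
	Disabled   bool
	Score      float64 // stand-in for quality/cost scoring
}

// armTier maps an arm to its preference tier: CLI agents, then local models,
// then API providers.
func armTier(a Arm) int {
	switch {
	case a.IsCLIAgent:
		return 0
	case a.IsLocal:
		return 1
	default:
		return 2
	}
}

// bestScored returns the highest-scoring enabled arm within one tier.
func bestScored(arms []Arm, tier int) (Arm, bool) {
	var best Arm
	found := false
	for _, a := range arms {
		if a.Disabled || armTier(a) != tier {
			continue
		}
		if !found || a.Score > best.Score {
			best, found = a, true
		}
	}
	return best, found
}

// selectBest walks the tiers in order and stops at the first one with a candidate.
func selectBest(arms []Arm) (Arm, bool) {
	for tier := 0; tier <= 2; tier++ {
		if a, ok := bestScored(arms, tier); ok {
			return a, true
		}
	}
	return Arm{}, false
}

func main() {
	arms := []Arm{
		{Name: "api/large", Score: 0.9},
		{Name: "local/small", IsLocal: true, Score: 0.5},
		{Name: "cli/agent", IsCLIAgent: true, Disabled: true, Score: 0.7},
	}
	best, ok := selectBest(arms)
	fmt.Println(best.Name, ok)
}
```

In the example the disabled CLI arm is skipped, so the local arm wins even though the API arm has the higher score.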
vikingowl 44d0bdc032 feat(provider): subprocess CLI provider for claude, gemini, vibe
Adds internal/provider/subprocess — a provider.Provider that spawns CLI
agents (claude, gemini, vibe) as subprocesses and streams their output.

- FormatParser interface + three parsers for claude-stream-json,
  gemini-stream-json, and vibe-streaming formats; fixtures captured from
  real binaries
- subprocessStream: pull-based stream.Stream over subprocess stdout with
  bounded stderr capture (8KB) and guarded reap() to prevent double-Wait
- DiscoverCLIAgents: parallel PATH scan with 10s timeout, stable ordering
- Provider: only the last user message is passed as --prompt; all other
  request fields (history, tools, system prompt) are intentionally ignored
  (see package doc)
- main.go: discover and register CLI arms at startup; TODO(P0c) for
  tier-based routing to enforce preference order explicitly
2026-05-07 14:29:34 +02:00
vikingowl 7fbb5454ee feat(router): normalize effort/thinking abstraction across providers
Add EffortLevel (auto/low/medium/high) as a provider-agnostic reasoning
control, replacing the Capabilities.Thinking bool. Each provider maps
the level to its native parameter: Anthropic budget tokens (1K/8K/16K),
OpenAI reasoning_effort (low/medium/high), Google thinking budget
(1K/8K/16K). Task classification auto-infers effort from TaskType and
complexity; filterFeasible excludes arms that lack the required level.
2026-05-07 14:08:50 +02:00
vikingowl 83240e907c docs: update TODO with Native SLM Runtime integration
- Replace Gemma Integration with expanded SLM Preflight Engine section
- Add Deep Intent Routing (Skill Decomposer, Context Flattener, HITL toggle)
- Add Security & Iron Law Integration (USP Pre-Audit, Hallucination Gate)
- Include Recommended Tiny Stack table (Gemma 3 270M, ollama/llm, Q4_K_M GGUF)
- Document the Integrated Flow for local vs frontier routing
2026-05-07 11:36:00 +02:00
vikingowl 488201b908 docs: add TODO roadmap for gemma routing, USP integration, local tmp, and ELF support 2026-05-07 00:21:52 +02:00

vikingowl d71bd942c4 feat: local model reliability — SDK retries, capability probing, init skill, context compaction
Three compounding bugs prevented tool calling with llama.cpp:
- Stream parser set argsComplete on partial JSON (e.g. "{"), dropping
  subsequent argument deltas — fix: use json.Valid to detect completeness
- Missing tool_choice default — llama.cpp needs explicit "auto" to
  activate its GBNF grammar constraint; now set when tools are present
- Tool names in history used internal format (fs.ls) while definitions
  used API format (fs_ls) — now re-sanitized in translateMessage

Additional changes:
- Disable SDK retries for local providers (500s are deterministic)
- Dynamic capability probing via /props (llama.cpp) and /api/show
  (Ollama), replacing hardcoded model prefix list
- Engine respects forced arm ToolUse capability when router is active
- Bundled /init skill with Go template blocks, context-aware for local
  vs cloud models, deduplication rules against CLAUDE.md
- Tool result compaction for local models — previous round results
  replaced with size markers to stay within small context windows
- Text-only fallback when tool-parse errors occur on local models
- "text-only" TUI indicator when model lacks tool support
- Session ResetError for retry after stream failures
- AllowedTools per-turn filtering in engine buildRequest
2026-04-13 02:01:01 +02:00
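The first of the three fixes above replaces a premature "arguments complete" flag with a json.Valid check on the accumulated buffer. A minimal sketch of that idea; `accumulate` and the delta values are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// accumulate appends streamed argument fragments and reports the index of the
// delta at which the buffer first becomes valid JSON, or -1 if it never does.
// A lone "{" is not valid JSON, so it no longer ends accumulation early.
func accumulate(deltas []string) (args string, completeAt int) {
	var buf strings.Builder
	for i, d := range deltas {
		buf.WriteString(d)
		if json.Valid([]byte(buf.String())) {
			return buf.String(), i
		}
	}
	return buf.String(), -1
}

func main() {
	args, at := accumulate([]string{"{", `"path":`, `"."}`})
	fmt.Println(args, at)
}
```

A strict prefix of a JSON object is never itself valid JSON, so the check cannot fire before the closing brace arrives.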
vikingowl 2093beea58 fix: deterministic 500 retry, OpenAI error wrapping, local /init prompt
Stop retrying llama.cpp 500s that are deterministic tool-parse failures
by inspecting the error message body (ClassifyHTTPError). Wrap OpenAI SDK
errors as ProviderError so the engine's retry logic classifies them. Add
localInitPrompt for local models that uses sequential fs_* calls instead
of spawn_elfs (which local models can't produce reliably).
2026-04-12 18:35:18 +02:00
vikingowl 0caab0fed1 fix(router): discovery loop removes forced arm, breaking routing
The discovery loop's reconcileArms removed the CLI-forced arm
(llamacpp/default) because the llama.cpp server reports the real model
name (e.g. gemma-26b), creating a mismatch. After 30s the forced arm
disappeared and all subsequent requests failed.

Three-layer fix:
- Eager: query the specific provider at startup to resolve the real
  model name before registering the forced arm
- Lazy: reconcileArms detects placeholder "default" arm names and
  atomically renames them when discovery reveals the real identity,
  with an onReconcile callback to update the session and TUI
- Guard: the forced arm is never garbage-collected by the removal loop

Also fixes misleading /init error messaging — failed inits now show
"loaded from disk (init failed)" instead of "AGENTS.md written to".
2026-04-12 17:51:30 +02:00
vikingowl ce5f9d3dc9 feat(tui): Tier 3-4 UX improvements — split, routing, session naming, context bar
- Split app.go (2091→1378 lines) into rendering.go, events.go, init.go
- Add EventRouting stream event for router arm transparency
- Add session auto-naming from first user message
- Add context window progress bar in status bar
- Add /keys cheatsheet, /replay for resumed sessions
- Add inline cost-per-turn after assistant responses
- Add diff previews in fs.write/fs.edit permission prompts
- Collapse tool output to 3 lines by default (ctrl+o expands)
- Use AddPrefix for system context instead of InjectMessage
- Handle ContentThinking and ContentToolResult in session resume
- Show session title in resume picker
- Add /model numeric selection snapshot safety
2026-04-12 05:13:16 +02:00
vikingowl 48e63a9bc0 feat(tui): Tier 1-2 UX improvements — completions, usage, provider status
Tier 1 (launch blockers):
- Remove /shell from /help (advertised but unimplemented)
- Kill dead _ = closeLen assignment
- Cache glamour renderer by width — no longer recreated on every
  WindowSizeMsg when width hasn't changed

Tier 2 (ship-quality UX):
- Slash command ghost-text completion with Tab accept. Sources: static
  command list + dynamic skill names. /permission gets arg completion
  for the 6 modes.
- /compact reports before/after token counts (e.g. "32k → 18k tokens")
- /provider shows all registered arms grouped by provider, not just
  "restart required"
- /usage command: input/output/total tokens, context %, provider, turns
- Widen Ctrl+C quit window from 1s to 2s
- "new content below" indicator when scrolled up during streaming
- Permission prompt: inline chat notification when approval needed,
  so the user notices even if focused on input
2026-04-12 04:19:55 +02:00
vikingowl e04cacc215 fix: append mutation, pipe-mode hang, Mistral regex false positives
- Fix append footgun: allHooks/allMCPServers allocated fresh to avoid
  mutating cfg's backing array (lines 391/413 in main.go)
- Fix pipe-mode permission prompt: detect no-TTY stdin and auto-deny
  instead of blocking forever on fmt.Scanln EOF
- Tighten Mistral API key regex from bare [a-zA-Z0-9]{32} (matched
  commit hashes, UUIDs) to context-gated pattern requiring "mistral"
  keyword nearby. Added scanner test for positives and negatives.
- Remove README demo GIF TODO placeholder
- Unify version string: pass buildVersion from ldflags into tui.Config
  instead of hardcoding "v0.1.0-dev"
- Populate benchmarks doc with actual Go benchmark results
2026-04-12 03:49:47 +02:00
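The append footgun from the first bullet above arises when a slice with spare capacity is appended to more than once: both results share the same backing array. The sketch below reproduces the aliasing and the fresh-allocation fix; the function names and values are illustrative, not taken from main.go.

```go
package main

import "fmt"

// mergeAliased appends directly to base. If base has spare capacity, the
// result shares base's backing array with every other such append.
func mergeAliased(base, extra []string) []string {
	return append(base, extra...)
}

// mergeFresh allocates a new slice first, so the caller's backing array is
// never written to.
func mergeFresh(base, extra []string) []string {
	out := make([]string, 0, len(base)+len(extra))
	out = append(out, base...)
	return append(out, extra...)
}

func main() {
	base := make([]string, 1, 4) // length 1, capacity 4: room to alias
	base[0] = "cfg-hook"

	a := mergeAliased(base, []string{"hook-a"})
	b := mergeAliased(base, []string{"hook-b"})
	fmt.Println(a, b) // a's second element was overwritten by the second append

	c := mergeFresh(base, []string{"hook-a"})
	d := mergeFresh(base, []string{"hook-b"})
	fmt.Println(c, d)
}
```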
vikingowl 6bb9c33d04 fix(m8): replace_default map, error UX, benchmarks, and launch prep
- Fix replace_default positional bug: []string → map[string]string for
  explicit MCP tool → built-in name mapping
- Improve error messages for missing API keys (3 actionable options) and
  unknown providers (early validation with available list)
- Remove python3 dependency from MCP tests (pure bash grep/sed parsing)
- Add router benchmark scaffold (6 benchmarks in bench_test.go + docs)
- Add .goreleaser.yml for cross-platform binary releases with ldflags
- Add launch-ready README with quickstart, extensibility docs, GIF placeholder
- Add CONTRIBUTING.md and Gitea issue templates (bug report, feature request)
2026-04-12 03:34:58 +02:00
vikingowl 6c47f8643b feat(m8): MCP client, tool replaceability, and plugin system
Complete the remaining M8 extensibility deliverables:

- MCP client with JSON-RPC 2.0 over stdio transport, protocol
  lifecycle (initialize/tools-list/tools-call), and process group
  management for clean shutdown
- MCP tool adapter implementing tool.Tool with mcp__{server}__{tool}
  naming convention and replace_default for swapping built-in tools
- MCP manager for multi-server orchestration with parallel startup,
  tool discovery, and registry integration
- Plugin system with plugin.json manifest (name/version/capabilities),
  directory-based discovery (global + project scopes with precedence),
  loader that merges skills/hooks/MCP configs into existing registries,
  and install/uninstall/list lifecycle manager
- Config additions: MCPServerConfig, PluginsSection with opt-in/opt-out
  enabled/disabled resolution
- TUI /plugins command for listing installed plugins
- 54 tests across internal/mcp and internal/plugin packages
2026-04-12 03:09:05 +02:00
vikingowl 8d97c6cd39 docs: mark M8.2 skill system deliverables complete in milestones.md
2026-04-07 02:25:29 +02:00
vikingowl c07ec63419 feat(skill): enhanced coordinator prompt with fan-out and concurrency guidance
2026-04-07 02:24:49 +02:00
vikingowl 48c7b7aad4 feat(skill): pipe mode support and main.go wiring
2026-04-07 02:19:42 +02:00
vikingowl 893880039b feat(skill): TUI integration — /skillname invokes skills, /skills lists them
2026-04-07 02:18:12 +02:00
vikingowl b60daf9940 feat(skill): registry with multi-directory loading and precedence
2026-04-07 02:17:17 +02:00
vikingowl 61adb24773 feat(skill): bundled /batch skill with go:embed
2026-04-07 02:16:35 +02:00
vikingowl ead91e6ccf feat(skill): template rendering with Go text/template
2026-04-07 02:15:51 +02:00
vikingowl edc0e97efc feat(skill): core Skill type and YAML frontmatter parser
2026-04-07 02:05:49 +02:00
vikingowl 7a0b3c5887 chore: add gopkg.in/yaml.v3 for skill frontmatter parsing
2026-04-07 02:04:36 +02:00
vikingowl 24f4a739a6 docs: mark M8.1 hook system deliverables complete in milestones.md
2026-04-07 01:09:07 +02:00
vikingowl 8d9c521f7a feat: wire hook dispatcher in main.go — SessionStart, SessionEnd, PreCompact
2026-04-07 01:08:40 +02:00
vikingowl 1ec90b0ad7 feat: engine hook integration — PreToolUse, PostToolUse, Stop
2026-04-07 01:02:55 +02:00
vikingowl 50bb5f2f6b feat: AgentExecutor — elf-based hook evaluation via elf.Manager
2026-04-07 00:55:19 +02:00
vikingowl 45c0d0c43e feat: PromptExecutor — LLM-based hook evaluation via router
2026-04-07 00:53:53 +02:00
vikingowl 685e3b97f2 feat: ParseHookDefs — config to HookDef conversion with validation
2026-04-07 00:52:00 +02:00