Initial project documentation for workflow-miner — a Rust CLI + zsh plugin that mines recurring command workflows from Atuin shell history.
13 KiB
13 KiB
Architecture
Overview
A Rust CLI + zsh plugin that reads Atuin's shell history SQLite database, mines recurring multi-step command workflows using sequential pattern mining, and surfaces them as reusable recipes.
System Diagram
┌─────────────────────────────────────────────────────┐
│ CLI binary (wfm) │
├──────────┬──────────┬───────────┬───────────────────┤
│ ingest │ mine │ store │ output │
│ │ │ │ │
│ read │ prefix │ own │ ┌─ tui dashboard │
│ atuin │ span │ sqlite │ ├─ zsh widget │
│ sqlite │ + BIDE │ for │ ├─ justfile gen │
│ (r/o) │ │ patterns │ ├─ json/yaml │
│ │ cluster │ │ └─ llm describe │
│ abstract│ │ │ (optional) │
│ commands│ │ │ │
└──────────┴──────────┴───────────┴───────────────────┘
▲ │
│ ▼
~/.local/share/atuin/history.db ~/.config/wfm/
~/.local/share/wfm/
workflows.db
justfile (gen)
Crate Structure (Cargo workspace)
wfm/
├── Cargo.toml # workspace root
├── crates/
│ ├── wfm-core/ # domain logic, no I/O
│ │ ├── src/
│ │ │ ├── lib.rs
│ │ │ ├── abstraction.rs # command normalization
│ │ │ ├── prefixspan.rs # PrefixSpan algorithm
│ │ │ ├── clustering.rs # similar pattern dedup
│ │ │ ├── context.rs # project/cwd/git inference
│ │ │ └── workflow.rs # Workflow struct + scoring
│ │ └── Cargo.toml
│ ├── wfm-store/ # persistence (read atuin, write own db)
│ │ ├── src/
│ │ │ ├── lib.rs
│ │ │ ├── atuin.rs # read atuin history.db
│ │ │ ├── workflows.rs # read/write workflows.db
│ │ │ └── models.rs # shared DB models
│ │ └── Cargo.toml
│ ├── wfm-cli/ # CLI binary
│ │ ├── src/
│ │ │ ├── main.rs
│ │ │ ├── commands/
│ │ │ │ ├── mod.rs
│ │ │ │ ├── scan.rs # mine workflows from history
│ │ │ │ ├── list.rs # list discovered workflows
│ │ │ │ ├── show.rs # show workflow detail
│ │ │ │ ├── export.rs # justfile / json / yaml export
│ │ │ │ ├── suggest.rs # suggest next command (for zsh)
│ │ │ │ ├── forget.rs # dismiss/hide a workflow
│ │ │ │ └── describe.rs # LLM naming (optional)
│ │ │ └── tui/
│ │ │ ├── mod.rs
│ │ │ ├── app.rs # ratatui app state
│ │ │ ├── dashboard.rs # main view
│ │ │ ├── detail.rs # workflow detail view
│ │ │ └── export.rs # export dialog
│ │ └── Cargo.toml
│ └── wfm-zsh/ # zsh plugin (shell scripts, not Rust)
│ ├── wfm.plugin.zsh # main plugin entry
│ ├── functions/
│ │ └── _wfm # completions
│ └── README.md
└── README.md
Core Data Flow
Phase 1: Ingest
atuin history.db
│
▼
SELECT command, timestamp, cwd, session, duration, exit
FROM history WHERE deleted_at IS NULL
ORDER BY session, timestamp
│
▼
Group by session → Vec<Session { commands: Vec<Entry> }>
│
▼
Abstract each command:
"git commit -m 'fix typo'" → ["git", "commit", "-m", "<arg>"]
"docker compose up -d" → ["docker", "compose", "up", "-d"]
"cargo test -- --nocapture"→ ["cargo", "test", "--", "<arg>"]
│
▼
Optionally enrich with context:
- infer project root from cwd (walk up for .git, Cargo.toml, package.json)
- record git branch if .git exists (shell out to git rev-parse)
Phase 2: Mine
Abstracted session sequences
│
▼
PrefixSpan(min_support=2, max_gap=5, min_length=2, max_length=20)
│
▼
Frequent sequential patterns with support counts
│
▼
BIDE filter → closed patterns only (remove redundant sub-patterns)
│
▼
Cluster similar patterns (edit distance on abstract command sequences)
│
▼
Score patterns:
- frequency (how often)
- recency (when last seen)
- consistency (same order every time? or variations?)
- context specificity (always in same cwd/project? or global?)
│
▼
Ranked list of Workflow candidates
Phase 3: Store
Own SQLite database at ~/.local/share/wfm/workflows.db:
CREATE TABLE workflows (
id INTEGER PRIMARY KEY,
name TEXT, -- auto-generated or LLM-named
description TEXT, -- optional, LLM-generated
commands TEXT NOT NULL, -- JSON array of abstracted commands
raw_example TEXT NOT NULL, -- JSON: one concrete instance
support INTEGER NOT NULL, -- how many sessions contain this
first_seen INTEGER NOT NULL, -- timestamp
last_seen INTEGER NOT NULL, -- timestamp
avg_duration INTEGER, -- total workflow duration
contexts TEXT, -- JSON array of {cwd, project, branch}
score REAL NOT NULL, -- composite ranking score
status TEXT DEFAULT 'active', -- active | dismissed | exported
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
CREATE TABLE workflow_instances (
id INTEGER PRIMARY KEY,
workflow_id INTEGER REFERENCES workflows(id),
session TEXT NOT NULL,
timestamp INTEGER NOT NULL,
commands TEXT NOT NULL, -- JSON: concrete commands for this instance
cwd TEXT,
duration INTEGER
);
CREATE INDEX idx_workflows_score ON workflows(score DESC);
CREATE INDEX idx_workflows_status ON workflows(status);
CREATE INDEX idx_instances_workflow ON workflow_instances(workflow_id);
Phase 4: Output
CLI commands
wfm scan # mine workflows from atuin history
wfm scan --since 30d # only recent history
wfm scan --project . # only current project context
wfm list # list discovered workflows, ranked
wfm list --context . # workflows relevant to current dir
wfm show <id> # detail view of a workflow
wfm export justfile # generate Justfile from top workflows
wfm export json # JSON catalog
wfm export yaml # YAML catalog
wfm suggest # what should I do next? (context-aware)
wfm forget <id> # dismiss a workflow
wfm describe <id> # name/describe with LLM (optional)
wfm describe --all # batch LLM naming
wfm tui # interactive dashboard
Command Abstraction Strategy
The abstractor converts concrete commands into mineable tokens:
Level 0 (raw): git commit -m "fix: handle null response"
Level 1 (words): ["git", "commit", "-m", "fix: handle null response"]
Level 2 (abstract):["git", "commit", "-m", "<arg>"]
Level 3 (verb): ["git", "commit"]
Mining uses Level 2 by default (preserves flags, abstracts values). Level 3 is a fallback for very noisy histories.
Special handling:
- Pipelines: split on
|— abstract each segment independently - Compound: split on
&&,||,;— treat as sub-sequence - Subshells/redirections: strip
>,>>,2>&1,<()for pattern matching - Common tools with known flag semantics: git, docker, kubectl, cargo, npm — maintain a small registry of "flags that take values" for better abstraction
Scoring Formula
score = w_freq * log(support)
+ w_recency * decay(days_since_last_seen)
+ w_consistency * (1 - variance_in_ordering)
+ w_specificity * context_specificity_ratio
Default weights: w_freq=0.4, w_recency=0.3, w_consistency=0.2, w_specificity=0.1
TUI Dashboard (ratatui)
┌─ Workflow Miner ─────────────────────────────────────┐
│ [Workflows] [Contexts] [Export] │
├──────────────────────────────────────────────────────┤
│ # │ Name │ Steps │ Freq │ Last Seen │
│ 1 │ deploy-staging │ 5 │ 23 │ 2h ago │
│ 2 │ test-and-commit │ 3 │ 87 │ 15m ago │
│ 3 │ docker-rebuild │ 4 │ 12 │ 1d ago │
│ 4 │ debug-api-logs │ 6 │ 8 │ 3d ago │
│ ►5 │ db-migration │ 4 │ 5 │ 1w ago │
├──────────────────────────────────────────────────────┤
│ db-migration (5 occurrences, avg 45s) │
│ │
│ 1. cargo sqlx prepare │
│ 2. cargo sqlx migrate run │
│ 3. cargo test --test db_tests │
│ 4. git add migrations/ │
│ │
│ Context: ~/projects/myapp (branch: main, feature/*) │
│ │
│ [e]xport [d]escribe [f]orget [Enter] instances │
└──────────────────────────────────────────────────────┘
zsh Plugin
Suggestion modes (configurable)
- keybinding (default): User presses
Ctrl+Spaceto request a suggestion. Zero overhead. - ambient:
precmdhook callswfm suggestafter every command. Must stay under 50ms. - both: Ambient display + keybinding to accept.
Widget sketch
_wfm_suggest() {
local suggestion
suggestion=$("${WFM_BIN:-wfm}" suggest \
--cwd "$PWD" \
--last-cmd "$_wfm_last_cmd" \
--session "$_wfm_session" \
--format oneline 2>/dev/null)
if [[ -n "$suggestion" ]]; then
_wfm_suggestion="$suggestion"
zle reset-prompt
fi
}
_wfm_accept() {
if [[ -n "$_wfm_suggestion" ]]; then
BUFFER="$_wfm_suggestion"
CURSOR=${#BUFFER}
_wfm_suggestion=""
zle reset-prompt
fi
}
zle -N _wfm_suggest
zle -N _wfm_accept
bindkey '^@' _wfm_accept # Ctrl+Space
Configuration
~/.config/wfm/config.toml:
[source]
atuin_db = "~/.local/share/atuin/history.db" # auto-detected
[mining]
min_support = 2 # minimum occurrences to be a workflow
max_gap = 5 # max intervening commands between steps
min_length = 2 # minimum workflow steps
max_length = 20 # maximum workflow steps
scan_window = "90d" # how far back to look by default
[scoring]
freq_weight = 0.4
recency_weight = 0.3
consistency_weight = 0.2
specificity_weight = 0.1
[suggest]
enabled = true
mode = "keybinding" # keybinding | ambient | both
min_score = 0.5 # don't suggest low-confidence workflows
show_in = "rprompt" # rprompt | inline | notification
[llm]
enabled = false # opt-in
command = "llm" # or "ollama run llama3.2"
model = "" # provider-specific model name
[export]
justfile_path = "./Justfile.wfm"
Dependencies
[workspace.dependencies]
rusqlite = { version = "0.38", features = ["bundled"] }
shell-words = "1"
ratatui = "0.30"
crossterm = "0.28"
chrono = { version = "0.4", features = ["serde"] }
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_yaml = "0.9"
anyhow = "1"
dirs = "6"
Implementation Phases
Phase 1: Foundation (MVP)
- Cargo workspace setup
- Atuin DB reader (rusqlite, read-only)
- Command abstractor (shell-words + custom normalizer)
- PrefixSpan implementation
- Own SQLite storage for discovered workflows
wfm scanandwfm listcommands
Phase 2: Output
wfm show <id>with detailed viewwfm export justfile— generate Justfile recipeswfm export json/wfm export yaml- Scoring and ranking
Phase 3: Interactive
- TUI dashboard (ratatui)
wfm suggestcommand- zsh plugin with precmd hook + suggestion widget
wfm forgetfor dismissing irrelevant patterns
Phase 4: Enrichment
- BIDE closed pattern filter (reduce noise)
- Pattern clustering (edit distance dedup)
- Project/git context inference
- Optional LLM integration (
wfm describe)