Reddit Reader — Design Spec

Overview

A Go TUI application that monitors subreddits for interesting posts, adds them to a reading list, and generates 5-bullet summaries using a local LLM (Ollama/llama.cpp) or Mistral Small 4 as fallback. Runs as a systemd user service for continuous monitoring; the TUI connects on launch.

Architecture

Single Go binary with three subcommands:

  • reddit-reader serve — monitor daemon + gRPC server
  • reddit-reader tui — Bubble Tea client, connects via gRPC
  • reddit-reader setup — interactive first-run wizard

Package Layout

cmd/
  serve.go          — cobra subcommand: starts monitor + gRPC server
  tui.go            — cobra subcommand: launches TUI client
  setup.go          — cobra subcommand: first-run wizard
  root.go           — cobra root command
internal/
  monitor/          — Reddit polling loop, orchestrates filter pipeline
  filter/           — keyword/regex pre-filter + LLM relevance scoring
  llm/              — Summarizer interface, Ollama/llama.cpp + Mistral backends
  store/            — SQLite operations (modernc.org/sqlite, pure Go)
  grpc/
    server/         — gRPC service implementation
    client/         — gRPC client used by TUI
  tui/              — Bubble Tea views and models
  config/           — TOML config parsing, env var overlay, first-run setup
proto/
  redditreader.proto — protobuf service definition

Data Flow

Monitor Loop (runs in serve)

every 2min, for each subreddit:
  1. go-reddit fetches /new or /hot listings
  2. Dedup: skip posts already in SQLite (keyed by reddit fullname t3_xxxxx)
  3. Keyword/regex pre-filter: match title/flair against configured patterns (cheap, no API calls)
  4. LLM relevance scoring: "rate 0.0-1.0 how relevant to [interests]" — includes recent feedback as few-shot context
  5. Posts above relevance threshold get 5-bullet summary from LLM
  6. Insert post + summary + score into SQLite
  7. Push to connected TUI clients via gRPC streaming
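Steps 2–3 run before any LLM call, so they must be cheap. A minimal sketch of that front half of the pipeline (type and function names here are illustrative, not the actual internal/monitor API):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Post is a minimal stand-in for the fields the pipeline needs.
type Post struct {
	Fullname string // reddit fullname, e.g. "t3_abc123"
	Title    string
	Flair    string
}

// seen answers step 2 (dedup) — in the real daemon this is a SQLite lookup.
func seen(known map[string]bool, p Post) bool { return known[p.Fullname] }

// preFilter answers step 3: match title/flair against keyword or regex
// patterns, with no API calls involved.
func preFilter(p Post, keywords []string, patterns []*regexp.Regexp) bool {
	text := strings.ToLower(p.Title + " " + p.Flair)
	for _, kw := range keywords {
		if strings.Contains(text, strings.ToLower(kw)) {
			return true
		}
	}
	for _, re := range patterns {
		if re.MatchString(text) {
			return true
		}
	}
	return false
}

func main() {
	known := map[string]bool{"t3_old111": true}
	posts := []Post{
		{Fullname: "t3_old111", Title: "Go 1.23 released"},
		{Fullname: "t3_new222", Title: "NixOS flakes question"},
		{Fullname: "t3_new333", Title: "cat pictures"},
	}
	keywords := []string{"nixos", "go "}
	var survivors []Post
	for _, p := range posts {
		if seen(known, p) || !preFilter(p, keywords, nil) {
			continue // dropped before any LLM call
		}
		survivors = append(survivors, p) // proceeds to LLM scoring (steps 4–6)
	}
	fmt.Println(len(survivors)) // only t3_new222 survives
}
```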

LLM Call Budget

With 10-25 subreddits polled every 2 minutes, only posts passing the keyword pre-filter reach the LLM. Expected: 5-15 LLM calls per cycle, well within local model throughput and Mistral free-tier limits.

Feedback Loop

User thumbs-up/down votes in TUI are stored in SQLite. Recent feedback examples become few-shot context in the relevance scoring prompt ("posts like X were marked interesting, posts like Y were not"). No fine-tuning — prompt engineering with history.

SQLite Schema

CREATE TABLE subreddits (
    name        TEXT PRIMARY KEY,
    enabled     INTEGER DEFAULT 1,
    poll_sort   TEXT DEFAULT 'new',
    added_at    TEXT DEFAULT (datetime('now'))
);

CREATE TABLE filters (
    id          INTEGER PRIMARY KEY,
    subreddit   TEXT REFERENCES subreddits(name),
    pattern     TEXT NOT NULL,
    is_regex    INTEGER DEFAULT 0
);

CREATE TABLE posts (
    id          TEXT PRIMARY KEY,    -- reddit fullname t3_xxxxx
    subreddit   TEXT NOT NULL,
    title       TEXT NOT NULL,
    author      TEXT,
    url         TEXT,
    selftext    TEXT,
    score       INTEGER,
    created_utc TEXT,
    fetched_at  TEXT DEFAULT (datetime('now')),
    relevance   REAL,
    summary     TEXT,
    read        INTEGER DEFAULT 0,
    starred     INTEGER DEFAULT 0,
    dismissed   INTEGER DEFAULT 0
);

CREATE TABLE feedback (
    id          INTEGER PRIMARY KEY,
    post_id     TEXT REFERENCES posts(id),
    vote        INTEGER NOT NULL,   -- +1 interesting, -1 not
    created_at  TEXT DEFAULT (datetime('now'))
);

gRPC Service

service RedditReader {
  rpc StreamPosts(StreamRequest)        returns (stream Post);
  rpc ListPosts(ListRequest)            returns (ListResponse);
  rpc UpdatePost(UpdateRequest)         returns (Post);
  rpc SubmitFeedback(FeedbackRequest)   returns (FeedbackResponse);
  rpc ListSubreddits(Empty)             returns (SubredditList);
  rpc AddSubreddit(AddSubredditRequest) returns (Subreddit);
  rpc RemoveSubreddit(RemoveRequest)    returns (Empty);
  rpc UpdateFilters(FilterRequest)      returns (FilterResponse);
  rpc Status(Empty)                     returns (StatusResponse);
}
  • StreamPosts: server-side stream, TUI subscribes on launch for real-time pushes
  • ListPosts: supports filtering by subreddit, read/unread, starred, date range
  • All mutations go through gRPC — single writer to SQLite, no lock contention
  • Socket path: $XDG_RUNTIME_DIR/reddit-reader.sock (fallback /tmp/reddit-reader.sock)

LLM Abstraction

type Summarizer interface {
    Score(ctx context.Context, post Post, interests Interests) (float64, error)
    Summarize(ctx context.Context, post Post) (string, error)
}

Backends

  • Ollama — OpenAI-compatible HTTP at localhost:11434. Default; setup probes for it.
  • llama.cpp server — OpenAI-compatible HTTP at a configurable port. Alternative local backend.
  • Mistral API — somegit.dev/vikingowl/mistral-go-sdk. Fallback when no local model is available.

Ollama and llama.cpp share one implementation (same OpenAI-compatible API, different base URLs). Mistral uses the dedicated SDK.

Backend Selection (in setup)

  1. Probe localhost:11434 — if Ollama responds, use it and ask which model to run (default mistral-small)
  2. Probe configurable llama.cpp endpoint if set
  3. Fall back to Mistral API — prompt for API key
  4. Store choice in config, overridable via env vars

Relevance Prompt Includes

  • User's declared interests (from config)
  • Last N feedback examples as few-shot context (from SQLite)
  • Post title + first ~500 chars of selftext

TUI

Built with Bubble Tea + Lip Gloss.

Views

  • Reading List — default view, scrollable post list sorted by relevance, unread first
  • Starred — favorited posts
  • Archive — dismissed and read posts
  • Settings — manage subreddits, keywords, LLM backend, relevance threshold (via gRPC)

Post List

  • * unread / o read indicators
  • Shows subreddit, relevance score, relative time
  • Enter expands to show 5-bullet summary in detail pane

Keybindings

  • j/k navigate, g/G top/bottom
  • enter expand/collapse summary
  • s star, d dismiss
  • o open in browser
  • +/- vote on relevance
  • / filter, ? help
  • tab switch views

On Launch

  1. Connect to gRPC Unix socket
  2. If connection fails and socket activation is configured, systemd starts daemon
  3. ListPosts populates initial view
  4. Subscribe to StreamPosts for live updates

Configuration

Config File

~/.config/reddit-reader/config.toml

[reddit]
client_id = ""
client_secret = ""
username = ""
password = ""

[llm]
backend = "ollama"
endpoint = "localhost:11434"
model = "mistral-small"
api_key = ""
relevance_threshold = 0.6

[interests]
description = ""  # free-text, e.g. "Go programming, NixOS, systems programming, Linux kernel"

[monitor]
poll_interval = "2m"
max_posts_per_poll = 25

[grpc]
socket = "$XDG_RUNTIME_DIR/reddit-reader.sock"

Env var overrides: REDDIT_READER_REDDIT_CLIENT_ID, REDDIT_READER_LLM_API_KEY, etc.
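A sketch of how the env overlay might work, assuming a flat "section.key" view of the parsed TOML (the representation and helper name are illustrative, not the actual internal/config API):

```go
package main

import (
	"fmt"
	"strings"
)

// overlayEnv applies REDDIT_READER_<SECTION>_<KEY> overrides on top of values
// parsed from config.toml, e.g. REDDIT_READER_LLM_API_KEY -> "llm.api_key".
func overlayEnv(cfg map[string]string, environ []string) map[string]string {
	const prefix = "REDDIT_READER_"
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		out[k] = v
	}
	for _, kv := range environ {
		name, val, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(name, prefix) {
			continue // not one of ours
		}
		rest := strings.ToLower(strings.TrimPrefix(name, prefix))
		section, key, ok := strings.Cut(rest, "_") // split at first underscore
		if !ok {
			continue
		}
		out[section+"."+key] = val
	}
	return out
}

func main() {
	cfg := map[string]string{"llm.api_key": "", "llm.model": "mistral-small"}
	env := []string{"REDDIT_READER_LLM_API_KEY=sk-123", "PATH=/usr/bin"}
	merged := overlayEnv(cfg, env)
	fmt.Println(merged["llm.api_key"], merged["llm.model"]) // sk-123 mistral-small
}
```

Note one ambiguity this scheme has to resolve: REDDIT_READER_REDDIT_CLIENT_ID splits at the first underscore after the prefix, yielding section "reddit" and key "client_id".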

First-Run Setup (reddit-reader setup)

Interactive terminal wizard:

  1. Reddit OAuth — walk through creating a script app, prompt for credentials
  2. LLM backend — probe local, let user pick or enter Mistral key
  3. Subreddits — add initial subreddits with keyword filters
  4. Interests — free-text description for relevance prompts
  5. Validate — test Reddit auth, test LLM responds, create SQLite DB
  6. Systemd — optionally write and enable service + socket units

Systemd Units

reddit-reader.service

[Unit]
Description=Reddit Reader Monitor
After=network-online.target

[Service]
Type=simple
ExecStart=%h/.local/bin/reddit-reader serve
Restart=on-failure

[Install]
WantedBy=default.target

reddit-reader.socket

[Unit]
Description=Reddit Reader Socket

[Socket]
ListenStream=%t/reddit-reader.sock

[Install]
WantedBy=sockets.target

The daemon can be started manually (systemctl --user start reddit-reader.service). The socket unit is always enabled — if the daemon isn't already running, systemd starts it on the first TUI connection.
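When started via the socket unit, systemd passes the listening socket using the sd_listen_fds protocol: LISTEN_PID must equal the daemon's pid and LISTEN_FDS holds the fd count (fds start at 3). A dependency-free sketch of that startup check (function name illustrative; the real daemon would then wrap fd 3 with os.NewFile and net.FileListener):

```go
package main

import (
	"fmt"
	"strconv"
)

// activationFDs reports how many listening sockets systemd passed to this
// process. Zero means "not socket-activated": the daemon should create its
// own Unix socket instead.
func activationFDs(listenPid, listenFds string, myPid int) int {
	pid, err := strconv.Atoi(listenPid)
	if err != nil || pid != myPid {
		return 0 // started manually, or fds meant for another process
	}
	n, err := strconv.Atoi(listenFds)
	if err != nil || n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(activationFDs("1234", "1", 1234)) // prints 1
	fmt.Println(activationFDs("", "", 5678))      // prints 0: manual start
}
```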

Error Handling

  • Reddit API failures: exponential backoff per subreddit, log warnings. After 5 consecutive failures, disable subreddit and notify TUI via gRPC stream.
  • LLM unavailable: store posts with relevance = NULL, summary = NULL. Retry on next cycle. TUI shows "pending summary" state.
  • SQLite write errors: fatal for daemon. Fail fast, let systemd restart.
  • gRPC connection lost: TUI shows disconnected state, retries with backoff, resyncs via ListPosts on reconnect.
  • Config missing/invalid: serve and tui check on startup, point to reddit-reader setup.

Testing Strategy

  • Unit tests: filter pipeline (keyword, regex), config parsing, LLM prompt construction, SQLite store operations (in-memory SQLite)
  • Integration tests: gRPC server/client round-trips with real SQLite, monitor loop with mocked Reddit API responses
  • No mocking of SQLite — use real in-memory databases
  • TDD: tests first for store operations, filter logic, gRPC service methods
  • Interfaces for boundaries: Summarizer, Reddit client, store — mock only at system boundaries

Dependencies

  • github.com/vartanbeno/go-reddit/v2 — Reddit API client
  • somegit.dev/vikingowl/mistral-go-sdk — Mistral API backend
  • modernc.org/sqlite — pure-Go SQLite driver
  • github.com/charmbracelet/bubbletea — TUI framework
  • github.com/charmbracelet/lipgloss — TUI styling
  • github.com/spf13/cobra — CLI subcommands
  • github.com/pelletier/go-toml/v2 — config parsing
  • google.golang.org/grpc — gRPC
  • google.golang.org/protobuf — protobuf codegen

Go Version

Go 1.26.1 — use range-over-func iterators (stable since Go 1.23) and other modern language features where appropriate.