marktvogt.de

Author	SHA1	Message	Date
vikingowl	3ddfd87408	feat(ai): migrate to Google Gemini 2.5 Flash-Lite, drop Mistral/Ollama
Replace the Mistral + Ollama AI stack with a single Google Gemini provider backed by google.golang.org/genai. The API key moves from env/Helm to the DB (AES-256-GCM, key derived from JWT_SECRET via HKDF), so it can be rotated via the admin UI without a pod restart.
New:
- pkg/crypto/secretbox: AES-256-GCM encrypt/decrypt for secrets at rest
- pkg/ai/gemini: GeminiProvider with grounding, structured output, usage recording, and hot-reload (Reinitialize swaps the client under a mutex)
- pkg/ai/usage: UsageRecorder interface + UsageEvent struct
- domain/settings/store: DB-backed settings (model, grounding toggle, key)
- domain/settings/usage: UsageRepo implementing UsageRecorder; ai_usage table
- migrations 000021 (system_settings) + 000022 (ai_usage)
- settings API: GET /ai, POST /ai/key, POST /ai/model, POST /ai/grounding, GET /ai/usage
- admin UI: 4-card settings page (provider status, model selector, grounding toggle with quota, usage rollups + recent-calls table)
Removed:
- pkg/ai/ollama, mistral_provider, ratelimiter (+ tests)
- Helm env vars AI_API_KEY, AI_PROVIDER, AI_MODEL_COMPLEX, AI_AGENT_DISCOVERY, AI_RATE_LIMIT_RPS
Call sites set Grounded + CallType: research (true/"research"), enrich Pass B (true/"enrich_b"), similarity (false/"similarity"). The integration test now uses a stub ai.Provider instead of a fake Ollama HTTP server.	2026-04-25 09:54:49 +02:00
vikingowl	24e072b63d	feat(ai): pluggable provider interface, Ollama + Mistral impls, migrate Pass2 sites
Replaces the Mistral-only ai.Client with an ai.Provider interface backed by Ollama and Mistral implementations. Migrates enrichment + similarity callers to ai.Provider.Chat. Research endpoint returns 501 until commit 2 reinstates it on the new orchestrator.	2026-04-24 16:35:18 +02:00
vikingowl	126cc58cbf	refactor(discovery-eval): share JSON helpers, trim narration, tighten signatures
- Extract readJSONFile + writeJSONAtomic in cache.go; the category cache reuses them (saveCategoryCache is one line, loadCategoryCache uses the standard load-or-empty shape).
- Drop the dead errMsg param from scoreCategoryResult (it was always "").
- Wrap writeCategoryReport errors with context for consistency.
- Bundle the 5 per-mode flags of runSimilarityMode / runCategoryMode into an evalConfig struct so params don't drift.
- Promote validModes to a package-level var.
- Remove the redundant cache = new...() fallback after load* (both load helpers already return a non-nil empty cache on error).
- Strip narrating / diff-referencing comments per CLAUDE.md; keep the one genuine WHY on normalizeCategory (its divergence from normalize.Name).
Net -54 lines across 4 files; go build + go vet + tests green.	2026-04-24 12:59:06 +02:00
vikingowl	88d0ae9d96	feat(discovery): category eval mode for the LLM enricher
Ship 2 MR 5b. Extends discovery-eval with a second mode that grades MistralLLMEnricher's category output against labelled ground truth. Accuracy + a per-label confusion matrix make mix-ups between similar categories (mittelaltermarkt vs ritterfest, weihnachtsmarkt vs kirchweih) visible at a glance.
Usage:
-mode similarity: existing MR 5 path, unchanged.
-mode category: new. Scrapes quellen URLs, asks the LLM for {category, opening_hours, description}, scores category only.
Structure
- main.go: split into runSimilarityMode + runCategoryMode. Both share ai.Client construction and the ctx timeout (bumped to 15min for category mode, since scraping adds I/O). The mode is dispatched on the -mode flag; unknown modes exit 2.
- category.go: fixture / cache / run / metrics / report, parallel to the similarity files but not shared, because the data shapes differ enough that generics would add more noise than they save. The cache key is sha256(markt_name_lower|stadt_lower|year|model), separate from SimilarityPairKey since that one takes two rows.
- fixtures/category.json: 10 hand-labelled DACH-market rows exercising the categories we expect the LLM to produce: mittelaltermarkt, weihnachtsmarkt, ritterfest, ritterturnier, handwerkermarkt, schlossfest, kirchweih. Each row lists a quelle URL the enricher scrapes live (first run only; the cache takes over after).
- normalizeCategory: strips casing, German umlauts, and the -märkte plural drift, so a correctly categorised row isn't scored wrong because of cosmetic variation in the LLM output.
Metrics: accuracy + per-label confusion matrix. The confusion format is `want → predictions`, with `!` markers on off-diagonal predictions: readable in a terminal, machine-parseable in the JSON report. Mismatches are listed at the end as want/got pairs so operators can spot prompt failures and patch either the prompt or the fixture.
The threshold gate reads accuracy (not F1): category is multi-class, so there is no single positive class for precision/recall to refer to.
Tests: normalisation edge cases (casing, umlaut, plural, trimming), scoring drift tolerance, metrics counts + confusion matrix shape, errors excluded from confusion, cache round-trip + model scoping, missing/corrupt file handling. .gitignore adds .cat-eval-cache.json and cat-eval-report.json.
Follow-ups (MR 5c / later): opening_hours and description scoring. Both need fuzzier matching (regex structure vs LLM judge), which is its own design problem.	2026-04-24 12:44:26 +02:00
vikingowl	cf5408ab66	feat(discovery): eval harness for the AI similarity classifier
Ship 2 MR 5. Adds a CLI that measures MistralSimilarityClassifier against a labelled fixture: precision, recall, F1, accuracy, plus a confidence calibration table so we can tell whether "90% confident" verdicts are actually right 90% of the time.
Usage: go run ./backend/cmd/discovery-eval -fixture ... -cache ... -threshold 0.8 -report eval-report.json
Structure
- main.go: arg parsing + wiring (ai.Client, classifier, cache, metrics). The work happens in realMain(), which returns an exit code so deferred calls still run on error paths (os.Exit skips defers).
- fixture.go: parses the labelled-pairs JSON. Fixture authors only need to fill in name/stadt/year; name_normalized falls back to name when omitted.
- cache.go: file-backed map keyed by SimilarityPairKey + model string. Symmetric: (a,b) == (b,a). Atomic writes (temp file + rename), so a crashed run cannot corrupt the cache. Loading a corrupt file returns an empty, usable cache and reports the parse error.
- run.go: executes each pair through the classifier, populating the cache. Individual classify errors are downgraded to "not correct" and logged; the run always finishes so the operator sees whatever data is available.
- metrics.go: confusion matrix, P/R/F1/accuracy, per-confidence-bucket calibration ([0-0.5), [0.5-0.75), [0.75-0.9), [0.9-1.0]). Prints a human-readable summary and surfaces the highest-confidence mismatches first (most actionable for prompt iteration). Optional JSON report.
- Threshold gate: -threshold N exits non-zero when F1 < N. Default 0 (gating disabled until we have a baseline F1).
Fixture: seeds 15 hand-crafted DACH-market pairs covering the edge cases we actually care about: umlaut/ß drift (Straßburg/Strassburg), year difference on a recurring series, word reordering, distinct events at the same venue, historical proper names (Striezelmarkt), and the same city with multiple distinct Christmas markets. Operators extend it over time; each pair carries a `note` explaining the case it locks in.
.gitignore adds .eval-cache.json and eval-report.json; neither should land in the repo.
Tests cover metrics edge cases (all correct, imbalanced, no positive predictions without NaN, calibration bucket assignment, cache accounting, empty input) and cache behaviour (round-trip, symmetric lookup, model-scoped invalidation, missing/corrupt file handling, atomic write leaves no temp files).
Out of scope for MR 5: enrichment field accuracy (fuzzy text scoring is its own problem, tracked for a follow-up) and CI wiring (needs a baseline F1 first).	2026-04-24 12:26:18 +02:00
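Commit 3ddfd87408 stores the Gemini API key encrypted with AES-256-GCM under a key derived from JWT_SECRET via HKDF. A minimal sketch of that scheme using only the Go standard library follows. The function names, the salt/info values, and the nonce-prefixed ciphertext layout are assumptions, not the actual pkg/crypto/secretbox API. HKDF-SHA256 (RFC 5869) is written out by hand for a single 32-byte output block, which is exactly the key size AES-256 needs.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"io"
)

// deriveKey is a hand-rolled HKDF-SHA256 (RFC 5869) producing one 32-byte block.
// The salt and info values used by the real secretbox package are unknown.
func deriveKey(secret, salt []byte, info string) []byte {
	// Extract step: PRK = HMAC-SHA256(salt, secret).
	extract := hmac.New(sha256.New, salt)
	extract.Write(secret)
	prk := extract.Sum(nil)

	// Expand step: T(1) = HMAC-SHA256(PRK, info || 0x01); 32 bytes = one block.
	expand := hmac.New(sha256.New, prk)
	expand.Write([]byte(info))
	expand.Write([]byte{0x01})
	return expand.Sum(nil)
}

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Seal appends to its first argument, so the nonce becomes a prefix.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then authenticates and decrypts the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("secretbox: ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}
```

Under this sketch's assumptions, JWT_SECRET is the root of the derivation, so rotating it makes every previously stored ciphertext undecryptable; the stored API key would have to be re-entered or re-encrypted as part of such a rotation.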
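normalizeCategory in 88d0ae9d96 is described as stripping casing, German umlauts, and the -märkte plural drift. One hedged reading of those rules is sketched below; the transliteration table (ä to ae, ß to ss, and so on) and the exact suffix handling are assumptions, since the commit does not spell them out.

```go
package main

import "strings"

// umlautFold transliterates German umlauts and ß (assumed mapping).
var umlautFold = strings.NewReplacer("ä", "ae", "ö", "oe", "ü", "ue", "ß", "ss")

// normalizeCategory lower-cases, trims, folds umlauts, and collapses the
// "-märkte" plural to the singular "-markt" (hypothetical rules).
func normalizeCategory(s string) string {
	s = umlautFold.Replace(strings.ToLower(strings.TrimSpace(s)))
	if strings.HasSuffix(s, "maerkte") {
		s = strings.TrimSuffix(s, "maerkte") + "markt"
	}
	return s
}
```

If both the fixture label and the LLM output pass through the same function before comparison, the choice between folding ä to "ae" or to "a" only has to be consistent, not linguistically exact.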
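88d0ae9d96 keys the category cache on sha256(markt_name_lower|stadt_lower|year|model), and cf5408ab66 requires the pair cache to be symmetric, (a,b) == (b,a), and scoped to the model. A sketch of both keys follows. categoryCacheKey follows the formula from the commit; pairCacheKey is a hypothetical stand-in for SimilarityPairKey, whose real construction is not shown, and gets its symmetry by ordering the two row keys before hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// categoryCacheKey hashes the lower-cased name and city, the year, and the
// model. Including the model means switching models invalidates old verdicts.
func categoryCacheKey(name, stadt string, year int, model string) string {
	raw := fmt.Sprintf("%s|%s|%d|%s", strings.ToLower(name), strings.ToLower(stadt), year, model)
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// pairCacheKey orders the two row keys so that (a, b) and (b, a) map to the
// same entry on purpose, then appends the model for model-scoped invalidation.
func pairCacheKey(rowA, rowB, model string) string {
	if rowA > rowB {
		rowA, rowB = rowB, rowA
	}
	sum := sha256.Sum256([]byte(rowA + "|" + rowB + "|" + model))
	return hex.EncodeToString(sum[:])
}
```

A literal `|` inside a market name could make two different inputs produce the same raw string; length-prefixing each field before hashing would rule that out if it ever matters.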
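cf5408ab66 reports precision, recall, F1, and accuracy, guards the no-positive-predictions case against NaN, and gates on -threshold (default 0, meaning disabled). The sketch below uses the standard formulas with zero-denominator guards. The type and function names are hypothetical, and the exit code 1 is an assumption: the commit only says the gate exits non-zero.

```go
package main

// confusion holds binary counts where "positive" means "same event".
type confusion struct{ TP, FP, TN, FN int }

// ratio returns num/den, or 0 when den is 0, so empty classes never yield NaN.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

func (c confusion) Precision() float64 { return ratio(c.TP, c.TP+c.FP) }
func (c confusion) Recall() float64    { return ratio(c.TP, c.TP+c.FN) }
func (c confusion) Accuracy() float64  { return ratio(c.TP+c.TN, c.TP+c.FP+c.TN+c.FN) }

// F1 is the harmonic mean of precision and recall (0 if both are 0).
func (c confusion) F1() float64 {
	p, r := c.Precision(), c.Recall()
	if p+r == 0 {
		return 0
	}
	return 2 * p * r / (p + r)
}

// tally builds confusion counts; want and got must have the same length.
func tally(want, got []bool) confusion {
	var c confusion
	for i := range want {
		switch {
		case want[i] && got[i]:
			c.TP++
		case !want[i] && got[i]:
			c.FP++
		case !want[i] && !got[i]:
			c.TN++
		default:
			c.FN++
		}
	}
	return c
}

// gateExitCode fails the run when F1 is below a positive threshold;
// a threshold of 0 disables the gate, matching the commit's default.
func gateExitCode(f1, threshold float64) int {
	if threshold > 0 && f1 < threshold {
		return 1
	}
	return 0
}
```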
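The calibration table in cf5408ab66 groups verdicts into the buckets [0-0.5), [0.5-0.75), [0.75-0.9), and [0.9-1.0] and compares each bucket's share of correct verdicts with its confidence range. A minimal sketch of the bucket assignment and tally follows; all names are hypothetical.

```go
package main

// bucketEdges are the exclusive upper bounds of the first three buckets;
// anything at or above 0.9 (including 1.0) falls into the last bucket.
var bucketEdges = []float64{0.5, 0.75, 0.9}

var bucketLabels = []string{"[0-0.5)", "[0.5-0.75)", "[0.75-0.9)", "[0.9-1.0]"}

// bucketIndex maps a confidence in [0, 1] to its bucket.
func bucketIndex(conf float64) int {
	for i, edge := range bucketEdges {
		if conf < edge {
			return i
		}
	}
	return len(bucketEdges)
}

type bucketStats struct {
	Total, Correct int
}

// HitRate is the observed accuracy in a bucket (0 for an empty bucket).
func (b bucketStats) HitRate() float64 {
	if b.Total == 0 {
		return 0
	}
	return float64(b.Correct) / float64(b.Total)
}

// calibrate tallies verdicts per bucket; confs[i] pairs with correct[i].
func calibrate(confs []float64, correct []bool) []bucketStats {
	stats := make([]bucketStats, len(bucketLabels))
	for i, c := range confs {
		b := &stats[bucketIndex(c)]
		b.Total++
		if correct[i] {
			b.Correct++
		}
	}
	return stats
}
```

For a well-calibrated classifier, each bucket's hit rate falls inside its confidence range; a [0.9-1.0] bucket whose hit rate is well below 0.9 is the "90% confident but not 90% right" case the commit wants to expose.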

5 Commits