Commit Graph

268 Commits

67b2eb5d74 feat(market): in-backend research orchestrator with SearxNG + schema-validated LLM
Adds pkg/search (SearxNG impl), domain/market/research (orchestrator + embedded
German prompt and JSON schema), and reinstates POST /markets/:id/research on
top of the new pipeline. Seeds URLs from crawler provenance; falls back to
search when fewer than two distinct seed domains are known.
2026-04-24 17:06:04 +02:00
24e072b63d feat(ai): pluggable provider interface, Ollama + Mistral impls, migrate Pass2 sites
Replaces the Mistral-only ai.Client with an ai.Provider interface backed by
Ollama and Mistral implementations. Migrates enrichment + similarity callers
to ai.Provider.Chat. Research endpoint returns 501 until commit 2 reinstates
it on the new orchestrator.
2026-04-24 16:35:18 +02:00
2adb4882c7 docs(planning): add implementation plan for pluggable AI provider migration 2026-04-24 15:43:12 +02:00
020f4069b5 docs(planning): add spec for pluggable AI provider and local research orchestrator 2026-04-24 15:31:47 +02:00
7a2e81c8c9 Merge branch 'feat/reverse-geocode' into 'main'
feat(market): reverse geocoding

See merge request vikingowl/marktvogt.de!23
2026-04-24 13:01:00 +00:00
c9a2f8622f feat(market): reverse geocoding — lat/lng to address
Complements the existing forward geocoder with Nominatim's /reverse
endpoint so the admin edit form can populate the address from
coordinates (useful when a crawl gave us lat/lng but no street,
e.g. after running crawl-enrich).

Backend:
- geocode.Reverse(ctx, lat, lng) hits Nominatim /reverse with
  addressdetails=1 and accept-language=de, reuses the 1 rps mutex
  already guarding forward calls. Falls through city → town →
  village → municipality → hamlet for small places. Returns nil
  when Nominatim has no match so callers can distinguish "no hit"
  from "all-empty address."
- New DTOs ReverseGeocodeRequest/Response.
- GeocodeHandler.ReverseGeocode wired at POST /reverse-geocode
  behind the same geocodeLimit middleware as /geocode.

Frontend:
- /api/reverse-geocode SvelteKit proxy mirrors /api/geocode.
- MarketForm gets a second button next to "Koordinaten aus Adresse
  ermitteln" — "Adresse aus Koordinaten ermitteln". Writes non-empty
  street/city/zip back into the form; empty result surfaces
  "Keine Adresse gefunden."
2026-04-24 15:00:23 +02:00
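The city → town → village → municipality → hamlet fall-through in this commit can be sketched against Nominatim's `address` object, which names small places under different keys. This is an illustrative stdlib sketch — `localityFrom` and the map-based shape are assumptions, not the repo's actual `geocode` package:

```go
package main

import "fmt"

// localityFrom picks the first non-empty locality key from a Nominatim
// /reverse address object, in the fall-through order the commit
// describes. Hypothetical helper; the real code decodes a typed struct.
func localityFrom(addr map[string]string) string {
	for _, key := range []string{"city", "town", "village", "municipality", "hamlet"} {
		if v := addr[key]; v != "" {
			return v
		}
	}
	return "" // no locality at all — caller can treat this as "no hit"
}

func main() {
	addr := map[string]string{"village": "Kaltenberg", "state": "Bayern"}
	fmt.Println(localityFrom(addr)) // no city/town present, so the village wins
}
```

Returning the empty string only when every key is absent mirrors the commit's "no hit" vs "all-empty address" distinction.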
a250fddbc2 Merge branch 'fix/discovery-remove-auto-research' into 'main'
fix(discovery): stop auto-firing research on Accept

See merge request vikingowl/marktvogt.de!22
2026-04-24 12:56:43 +00:00
38834c56a3 fix(discovery): stop auto-firing research on Accept
Accepting a row triggered a background POST to
/admin/markets/<edition>/research. The intent was to "warm up" the
edit page, but the result was discarded (fire-and-forget), the edit
page only renders research from its own form action, and the
backend's 5-minute-per-market cooldown still got set — so the
operator's first manual "Mit KI recherchieren" click hit "Bitte
warte 5 Minuten zwischen Recherche-Aufrufen" instead.

Removes the auto-fire. Research runs on user click. If we want
prefetched suggestions later, that needs server-side caching + a
load-time fetch, not fire-and-forget.
2026-04-24 14:56:12 +02:00
b6d7ebd2b1 Merge branch 'fix/discovery-enrich-timeout' into 'main'
fix(discovery): enrich-all timeout + partial progress

See merge request vikingowl/marktvogt.de!21
2026-04-24 12:12:35 +00:00
9cbe654d55 fix(discovery): raise enrich-all timeout + surface partial progress
Pain: a 1400+ row pending queue can't finish crawl-enrich inside the
old 10-minute cap (Nominatim's 1 rps means ~23m minimum). Operators
saw a scary red "Crawl-enrich fehlgeschlagen: context deadline
exceeded" banner even though the pipeline is resumable.

- Introduce enrichAllTimeout constant (45m) sized for ~2700 rows per
  press; the original 10m assumed 600 rows worst-case.
- On context.DeadlineExceeded, translate to a user-facing message
  ("Zeitlimit erreicht nach N von M Zeilen. Erneut starten, um die
  verbleibenden Zeilen zu bearbeiten.") instead of raw Go error.
- Always stash the summary in handler state, even on error, so the
  UI can show partial progress (N/M processed) alongside the message.
- Service: populate DurationMs on early-return too, so the status
  endpoint's duration reflects the partial run instead of zero.

Behavior unchanged when a run finishes cleanly; the queue remains
resumable across presses as before.
2026-04-24 14:11:38 +02:00
950d01e3d4 Merge branch 'fix/discovery-accept-redirect-path' into 'main'
fix(discovery): Accept redirect 404

See merge request vikingowl/marktvogt.de!20
2026-04-24 11:59:53 +00:00
20055acd2e fix(discovery): correct redirect path after Accept
The accept action redirected to /admin/maerkte/<id>/edit, but the
route is /admin/maerkte/[id]/bearbeiten — every other admin link
uses the German segment. Reviewers hit a 404 after every Accept.
2026-04-24 13:59:25 +02:00
8528af8492 Merge branch 'feat/discovery-public-preview' into 'main'
Discovery preview modal

See merge request vikingowl/marktvogt.de!19
2026-04-24 11:51:20 +00:00
2c0154e4ce feat(web): discovery preview modal
Adds a Vorschau button to the detail drawer header that opens a
full-width modal showing an approximate public /markt/[slug] layout
for the candidate row. Lets reviewers sanity-check the user-facing
result before clicking Accept.

- DiscoveryPreview.svelte: renders title, date range, venue/PLZ/city
  location line, organizer, description, opening hours, website link
  and a Leaflet map pin (if lat/lng present). Banner calls out which
  fields (street, admission prices, title image) will come later from
  the organizer so the preview's gaps are not mistaken for bugs.
- DetailDrawer.svelte: adds previewOpen state, an eye-icon Vorschau
  button next to Accept/Reject, and an overlay at z-60 over the
  drawer. Backdrop click or ✕ closes the preview without closing the
  drawer.
2026-04-24 13:50:43 +02:00
c94289a758 chore: ignore local .claude tool state 2026-04-24 13:50:32 +02:00
ef89cc283e Merge branch 'chore/discovery-drawer-polish-and-rank-fix' into 'main'
Discovery drawer polish + rank fix

See merge request vikingowl/marktvogt.de!18
2026-04-24 11:41:03 +00:00
a2dffcb112 fix(discovery): sort source_contributions by rank on read
MergePendingSources re-aggregates the jsonb array with
ORDER BY source_name for DB determinism, but the admin UI treats
index 0 as "Rang 1 = winning source." Legacy auto-merged rows were
therefore surfacing mittelalterkalender (alphabetically first) as
Rang 1 instead of the actual rank-1 source mittelaltermarkt_online.

- Export crawler.SourceRank (was unexported rankOf) so other packages
  in the discovery domain can reference the canonical rank map.
- scanDiscoveredMarket: sort.SliceStable SourceContributions by rank
  after unmarshal. Every read path now sees contributions in rank
  order regardless of how they were persisted; legacy rows
  self-correct on next read, no migration needed.
2026-04-24 13:38:23 +02:00
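The read-path fix above — alphabetical persistence, rank order on scan — boils down to a `sort.SliceStable` over a rank map. A sketch with illustrative names (the real rank map lives in the crawler package as `SourceRank`):

```go
package main

import (
	"fmt"
	"sort"
)

// sourceRank stands in for the exported crawler.SourceRank map; lower
// number = higher-priority source.
var sourceRank = map[string]int{
	"mittelaltermarkt_online": 1,
	"mittelalterkalender":     2,
}

type contribution struct{ SourceName string }

// sortByRank re-orders contributions by canonical rank after unmarshal,
// so index 0 is always the rank-1 ("winning") source regardless of the
// alphabetical order the DB aggregation produced.
func sortByRank(cs []contribution) {
	sort.SliceStable(cs, func(i, j int) bool {
		return sourceRank[cs[i].SourceName] < sourceRank[cs[j].SourceName]
	})
}

func main() {
	cs := []contribution{{"mittelalterkalender"}, {"mittelaltermarkt_online"}}
	sortByRank(cs)
	fmt.Println(cs[0].SourceName) // the actual rank-1 source, not the alphabetically first
}
```

`SliceStable` keeps insertion order among sources with equal (or missing) ranks, which makes legacy rows self-correct deterministically.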
0d2c9c0f7f fix(web): silence svelte 5 warnings, add missing enrichment proxy
- Wrap $state initializers that read props (MarketForm, ResearchPanel,
  maerkte +page) in untrack() so Svelte 5 stops warning about
  state_referenced_locally. Intent stays "take an initial snapshot of
  the prop" — the warning existed to make that intent explicit.
- Add enrichment/crawl-all-status/+server.ts proxy route; the admin
  discovery page was polling this path and getting 404s in a tight
  loop because the equivalent SvelteKit proxy only existed for the
  plain /crawl-status endpoint.
2026-04-24 13:38:03 +02:00
2fdd8e8222 feat(web): polish discovery admin page and drawer
Discovery drawer
- Wrap each section in a rounded card so boundaries are visible without
  parsing the uppercase headers.
- Header: N Quellen and enrichment_status become consistent pills,
  matching the existing konfidenz pill treatment.
- Enrichment: replace the inline "(llm)"/"(crawl)" trailing text with a
  color-coded badge on the label side (purple = llm, sky = crawl).
- Empty enrichment state now tells the operator how to trigger it.
- Audit timestamp uses a local-time helper so the displayed time
  matches the browser timezone (was UTC-as-local).
- Quellen list: prefix each URL with its hostname for scannability;
  long URLs truncated with full URL in the title attribute.

ContributionsPanel
- Amber border/background now only on conflict rows; every row
  previously got border-amber-100 unconditionally, which diluted the
  conflict signal. Rang 1 badge flipped to emerald so it reads as a
  positive "winner" marker, not a warning.

Discovery page
- Remove dead dateInputValue() function and the stale
  a11y_click_events_have_key_events suppression — both flagged by
  eslint after earlier refactors.
- Render crawl/enrich timestamps in the browser's local timezone via a
  new fmtLocalStamp helper; the previous .slice(0,16).replace('T',' ')
  treated the ISO UTC string as if it were local time.
2026-04-24 13:37:39 +02:00
fcebc37bcb Merge branch 'fix/discovery-route-param-collision' into 'main'
fix(discovery): route param name collision in ClassifySimilarPair

See merge request vikingowl/marktvogt.de!17
2026-04-24 11:08:50 +00:00
c69fe4c07d fix(discovery): route param name collision in ClassifySimilarPair
gin panics at startup with:
  ':aid' in new path '/api/v1/admin/discovery/queue/:aid/similar/:bid/classify'
  conflicts with existing wildcard ':id' in existing prefix
  '/api/v1/admin/discovery/queue/:id'

Gin's trie requires identical parameter names at the same prefix position.
All sibling routes use :id; the tiebreak route was registered with :aid,
crashing the server on every deploy since e0b73ac. Prod has been running
the pre-tiebreak image (52f3e4c0) the whole time because every Helm
upgrade crash-looped and rolled back.

Rename :aid to :id in both the route and the handler's c.Param read.
:bid is in a different slot and stays.
2026-04-24 13:06:08 +02:00
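The gin constraint behind this panic — identical parameter names at the same prefix position — can be illustrated with a toy segment-by-segment check. This is not gin's trie code, just a sketch of the rule it enforces:

```go
package main

import (
	"fmt"
	"strings"
)

// paramConflict reports whether two route patterns would collide under
// gin's rule: at the same segment position, both routes use a parameter
// but under different names. Toy model for illustration only.
func paramConflict(a, b string) bool {
	as, bs := strings.Split(a, "/"), strings.Split(b, "/")
	n := len(as)
	if len(bs) < n {
		n = len(bs)
	}
	for i := 0; i < n; i++ {
		pa, pb := strings.HasPrefix(as[i], ":"), strings.HasPrefix(bs[i], ":")
		if pa && pb && as[i] != bs[i] {
			return true // :aid vs :id in the same slot — gin panics at startup
		}
		if as[i] != bs[i] && !pa && !pb {
			return false // routes diverge on a literal first; no shared prefix
		}
	}
	return false
}

func main() {
	fmt.Println(paramConflict(
		"/api/v1/admin/discovery/queue/:id",
		"/api/v1/admin/discovery/queue/:aid/similar/:bid/classify",
	)) // true — renaming :aid to :id resolves it; :bid sits in its own slot
}
```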
24675cf176 Merge branch 'refactor/discovery-eval-simplify' into 'main'
refactor(discovery-eval): share JSON helpers, trim narration, tighten signatures

See merge request vikingowl/marktvogt.de!16
2026-04-24 11:00:03 +00:00
126cc58cbf refactor(discovery-eval): share JSON helpers, trim narration, tighten signatures
- Extract readJSONFile + writeJSONAtomic in cache.go; category cache
  reuses them (saveCategoryCache is one line, loadCategoryCache uses
  the standard load-or-empty shape).
- Drop dead errMsg param from scoreCategoryResult (always "").
- Wrap writeCategoryReport errors with context for consistency.
- Wrap runSimilarityMode / runCategoryMode's 5 per-mode flags into an
  evalConfig struct so params don't drift.
- Promote validModes to a package-level var.
- Remove redundant cache = new...() fallback after load* (both load
  helpers already return a non-nil empty cache on error).
- Strip narrating / diff-referencing comments per CLAUDE.md; keep the
  one genuine WHY on normalizeCategory (divergence from normalize.Name).

Net -54 lines across 4 files; go build + go vet + tests green.
2026-04-24 12:59:06 +02:00
95d5eabdb5 Merge branch 'feat/discovery-enrichment-eval' — MR 5b category eval mode for LLM enricher 2026-04-24 12:44:42 +02:00
88d0ae9d96 feat(discovery): category eval mode for the LLM enricher
Ship 2 MR 5b. Extends discovery-eval with a second mode that grades
MistralLLMEnricher's category output against labelled ground truth.
Accuracy + per-label confusion matrix so mix-ups between similar
categories (mittelaltermarkt vs ritterfest, weihnachtsmarkt vs
kirchweih) are visible at a glance.

Usage:
  -mode similarity  — existing MR 5 path, unchanged.
  -mode category    — new: scrapes quellen URLs, asks LLM for
                       {category, opening_hours, description},
                       scores category only.

Structure
- main.go: split into runSimilarityMode + runCategoryMode. Both
  share ai.Client construction and the ctx timeout (bumped to 15min
  for category mode since scraping adds I/O). Mode dispatched on
  -mode flag; unknown modes exit 2.
- category.go: fixture / cache / run / metrics / report — parallel
  to the similarity files, not shared because the data shapes differ
  enough that generics would add more noise than they save. Cache
  key is sha256(markt_name_lower|stadt_lower|year|model); separate
  from SimilarityPairKey since that one takes two rows.
- fixtures/category.json: 10 hand-labelled DACH-market rows
  exercising the categories we expect the LLM to produce —
  mittelaltermarkt, weihnachtsmarkt, ritterfest, ritterturnier,
  handwerkermarkt, schlossfest, kirchweih. Each row lists a quelle
  URL the enricher will scrape live (first run only; cache takes
  over after).
- normalizeCategory: strips casing + German umlauts + the -märkte
  plural drift so a correctly-categorised row doesn't get scored
  wrong for cosmetic LLM output variation.

Metrics: Accuracy + per-label confusion matrix. Confusion format is
`want → predictions` with `!` markers on off-diagonal predictions —
readable in a terminal, machine-parseable in the JSON report.
Mismatches are listed at the end with want/got pairs so operators
can spot prompt failures and patch either the prompt or the fixture.

Threshold gate reads accuracy (not F1) — category is multi-class,
precision/recall don't have a single-label meaning.

Tests: normalisation edge cases (casing, umlaut, plural, trimming),
scoring drift tolerance, metrics counts + confusion matrix shape,
errors excluded from confusion, cache round-trip + model scoping,
missing/corrupt file handling.

.gitignore adds .cat-eval-cache.json and cat-eval-report.json.

Follow-ups (MR 5c / later): opening_hours and description scoring.
Both need fuzzier matching (regex structure vs LLM judge) which is
its own design problem.
2026-04-24 12:44:26 +02:00
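The normalisation this commit describes — casing, umlauts, the `-märkte` plural drift — can be sketched in a few lines. The exact umlaut folding (ä→a rather than ä→ae) is an assumption; the repo's own `normalizeCategory` may differ in detail:

```go
package main

import (
	"fmt"
	"strings"
)

// umlauts folds German umlauts to plain ASCII (assumed folding rule).
var umlauts = strings.NewReplacer("ä", "a", "ö", "o", "ü", "u", "ß", "ss")

// normalizeCategory lowercases, trims, folds umlauts, and collapses the
// "-märkte" plural back to "-markt" so cosmetic LLM output variation
// doesn't score a correct category as wrong.
func normalizeCategory(s string) string {
	s = umlauts.Replace(strings.ToLower(strings.TrimSpace(s)))
	if strings.HasSuffix(s, "markte") { // plural drift: "…märkte" → "…markt"
		s = strings.TrimSuffix(s, "markte") + "markt"
	}
	return s
}

func main() {
	fmt.Println(normalizeCategory("  Mittelaltermärkte ")) // mittelaltermarkt
}
```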
169fa1b3c4 Merge branch 'feat/discovery-keyboard-shortcuts' — MR 8 keyboard shortcuts for queue review 2026-04-24 12:40:22 +02:00
ef6e1def3d feat(discovery): keyboard shortcuts for queue review
Ship 2 MR 8. Operator-productivity layer on top of the detail drawer:
j/k to walk rows, Enter to open, a/r to accept-reject the selection,
e/s to jump into the drawer with AI enrich / Similar already visible,
? for a help modal listing everything. Escape closes the drawer (or
the help modal if it's open).

Implementation
- selectedId $state drives a subtle indigo ring on the highlighted
  row. Follows drawerId when the drawer opens so Esc → j leaves you
  on the same row. Auto-resets to queue[0] if the selected row
  scrolls off the page (pagination / refresh).
- Global <svelte:window onkeydown> listener. isTypingTarget() bails
  out when focus is inside an input/textarea/select/contenteditable
  so typing in the drawer's edit form doesn't trigger shortcuts.
  Cmd/Ctrl/Alt combos also skipped so browser shortcuts stay intact.
- selectRelative() updates selectedId + scrolls the row into view
  (block: 'nearest') so keyboard-driven scanning through a long
  queue keeps the highlight visible.
- submitRowAction() builds + submits a hidden <form> for a/r so the
  SvelteKit action pipeline (invalidations, form result propagation)
  runs the same way a button click would.

Decisions baked in
- 'e' (AI enrich) and 's' (Similar) open the drawer rather than
  firing the LLM call directly. LLM calls cost money; keeping the
  UI explicit avoids hidden side effects from a misclick.
- Persistent '?' button bottom-right for discoverability — operators
  shouldn't have to read docs to find the help.
- Modal uses click-outside-to-dismiss + Esc + ✕ button, all three.

No backend changes. Frontend-only.
2026-04-24 12:40:12 +02:00
3516999345 Merge branch 'feat/discovery-detail-drawer' — MR 6 detail drawer replaces inline panels 2026-04-24 12:37:47 +02:00
5476578373 feat(discovery): per-row detail drawer replaces inline panels
Ship 2 MR 6. Consolidates every market-specific action that used to
expand into the queue table into a single side drawer. Queue rows
keep Accept/Reject for fast-path review; clicking anywhere else on a
row opens the drawer with the full context.

State via URL param ?drawer=<id>. F5 preserves the open row; links
like /admin/discovery?drawer=<uuid>&sort=konfidenz are shareable and
compose with existing pagination/sort state.

DetailDrawer.svelte (new) sections:
- Header: name, konfidenz, source count, Accept/Reject, close (✕)
- Identity: editable form (name, stadt, bundesland, start/end, website)
- Enrichment: full payload with per-field provenance tags + AI enrich
  button; "Noch keine Enrichment-Daten" empty state
- Quellen: URL list (link-out)
- Quellen-Vergleich: per-source contribution diff (reuses
  ContributionsPanel) — only rendered when >=2 sources
- Similar: candidates loaded lazily on drawer open; AI? tiebreak
  button per candidate shows ✓ same / ✗ diff chips with LLM reason
- Audit: discovered_at, agent_status, hinweis

+page.svelte: removed the three inline <tr> panels (Similar,
Quellen-Vergleich, expanded) and their associated state (expandedId,
similarOpenId, quellenVergleichOpenId, similarLoading, similarEntries,
similarVerdicts, similarClassifying, toggleSimilar, classifySimilar,
toggleQuellenVergleich). Row actions collapsed from 5 buttons
(Accept/Reject/Similar/AI/Quellen-Vergleich) to 2 (Accept/Reject).
The chevron glyph stays as a visual affordance but is inert — the
whole row is clickable. Buttons/forms/links inside the row stop
propagation via a closest()-based guard so fast-path Accept/Reject
don't accidentally open the drawer.

No backend changes; the drawer consumes existing queue data +
existing endpoints (similar, similar/classify, enrich).

Follow-ups: MR 8 adds keyboard shortcuts that naturally compose with
the drawer (j/k navigation, Enter opens, Esc closes).
2026-04-24 12:37:38 +02:00
6218710453 Merge branch 'feat/discovery-eval-harness' — MR 5 eval harness for AI similarity classifier 2026-04-24 12:31:11 +02:00
cf5408ab66 feat(discovery): eval harness for the AI similarity classifier
Ship 2 MR 5. Adds a CLI that measures MistralSimilarityClassifier
against a labelled fixture: precision, recall, F1, accuracy, plus a
confidence calibration table so we can tell whether "90% confident"
verdicts are actually right 90% of the time.

Usage: go run ./backend/cmd/discovery-eval -fixture ... -cache ...
-threshold 0.8 -report eval-report.json.

Structure
- main.go: arg parsing + wiring (ai.Client, classifier, cache,
  metrics). The work happens in realMain() which returns an exit code
  — keeps defers running on error paths.
- fixture.go: parses labelled pairs JSON. Fixture authors only need to
  fill in name/stadt/year; name_normalized falls back to name when
  omitted.
- cache.go: file-backed map keyed by SimilarityPairKey + model string.
  Symmetric (a,b) == (b,a). Atomic writes (temp file + rename) so a
  crashed run cannot corrupt the cache. Corrupt-file load returns an
  empty usable cache and reports the parse error.
- run.go: executes each pair through the classifier, populating the
  cache. Individual classify errors are downgraded to "not correct"
  and logged — the run always finishes so the operator sees whatever
  data is available.
- metrics.go: confusion matrix, P/R/F1/accuracy, per-confidence-
  bucket calibration ([0-0.5), [0.5-0.75), [0.75-0.9), [0.9-1.0]).
  Prints human summary + surfaces highest-confidence mismatches
  first (most actionable for prompt iteration). Optional JSON report.
- Threshold gate: -threshold N exits non-zero when F1<N. Default 0
  (gating disabled until we have a baseline F1).

Fixture: seeds 15 hand-crafted DACH-market pairs covering the edge
cases we actually care about — umlaut drift (Straßburg/Strassburg),
year difference on a recurring series, word-reordering, distinct
events at the same venue, historical proper names (Striezelmarkt),
same city with multiple distinct Christmas markets. Operators extend
it over time; each pair carries a `note` explaining the case it locks.

.gitignore adds .eval-cache.json and eval-report.json — neither
should land in the repo.

Tests cover metrics edge cases (all correct, imbalanced,
no-positive-predictions-no-NaN, calibration bucket assignment,
cache accounting, empty input) and cache behaviour (round-trip,
symmetric lookup, model-scoped invalidation, missing/corrupt file
handling, atomic-write leaves no temp files).

Out of scope for MR 5: enrichment field accuracy (fuzzy text
scoring is its own problem — tracked for a follow-up), CI wiring
(needs a baseline F1 first).
2026-04-24 12:26:18 +02:00
525a20b79c Merge branch 'feat/discovery-auto-merge-crawl' — Ship 2 MRs 2–7 (enrichment foundation, crawl-enrich, LLM enrich, AI similarity, auto-merge)
Brings the full Ship 2 feature stack (except the eval harness and detail
drawer) into main. Conflicts resolved:

- repository.go: kept MR 1's sort params + queueOrderByClause builder on
  ListQueue, AND MR 7's FindPendingMatch + MergePendingSources (MR 7
  removed the old QueueHasPending). ListQueue SELECT keeps the enrichment
  columns MR 2 added.
- mock_repo_test.go: kept both MR 1's listQueueCalls capture and the
  MR 2-4 enrichment/similarity hooks.
- service_test.go: ListPendingQueuePaged uses MR 1's sort-param signature;
  NewService uses the MR 2-7 seven-arg form.
- handler_test.go: TestListQueueSortParamWhitelist's NewService call
  bumped from 4 args to 7 (nil geocoder, nil llm enricher, nil sim
  classifier).

Features landing on main:
- MR 2: enrichment schema (migration 000019), jsonb payload, enrich
  package with Merge/CacheKey/NoopLLMEnricher.
- MR 3: manual crawl-enrich-all button + async 202 status endpoint.
- MR 3b: per-row LLM enrich via scrape-then-prompt (pkg/scrape +
  MistralLLMEnricher).
- MR 4: AI similarity tiebreak (migration 000020), MistralSimilarityClassifier,
  per-candidate AI? button in the Similar panel.
- MR 7: cross-crawl auto-merge for new sources on pending queue rows
  (FindPendingMatch + MergePendingSources, AutoMerged counter).
2026-04-24 12:13:30 +02:00
28202c71df Merge branch 'feat/discovery-queue-sort' — MR 1 sortable queue, default konfidenz desc 2026-04-24 12:05:16 +02:00
c06788a63d feat(discovery): auto-merge queue rows across crawl runs
Ship 2 MR 7. Replaces the "drop on duplicate" branch of the crawl
loop with a cross-run auto-merge: when a new crawl brings a source
that a pending queue row doesn't yet carry, the new source's data
merges into the existing row instead of spawning a second entry.
Operator review burden stays bounded to one row per market even as
coverage grows across sources.

Konfidenz upgrades come for free: a row that starts with one source
at konfidenz=mittel flips to hoch the moment a second independent
source confirms the same (name, city, start_date) triple.

Repo changes
- QueueHasPending (bool) replaced by FindPendingMatch returning
  *DiscoveredMarket. Same exact-tuple lookup; now callers see the
  full match so they can merge.
- MergePendingSources appends new sources/quellen/contributions onto
  a pending row using set-union semantics. source_contributions
  dedupe by SourceName so repeat crawls don't stack duplicate entries.
  Konfidenz and hinweis are overwritten with caller-computed values.
- Idempotent: send the same delta twice, nothing changes the second
  time.

Service.Crawl flow
- On match + incoming source already on the row -> DedupedQueue.
  Same semantic as before, just more tightly scoped (same source
  re-emits an event; previously any match counted as dedup).
- On match + incoming source not yet on the row -> auto-merge path:
  compute the source/quellen/contribution delta, call
  MergePendingSources, count in summary.AutoMerged.
- The crawlerKonfidenz helper is now a thin wrapper over a shared
  konfidenzForSources(sources []string), reused by the merge path.
  Source-name constants extracted to un-hardcode the switch cases
  and the test references.

Summary + UI
- CrawlSummary gains AutoMerged int. Logged alongside the other
  counters.
- +page.svelte crawl-result grid gets an "Auto-merged" tile.

Tests
- Same-source redundant pickup -> DedupedQueue=1, no MergePendingSources
  call, no insert.
- New-source auto-merge -> AutoMerged=1, MergePendingSources called with
  exact delta (addSources=[new only], addQuellen=[new only], addContribs
  labelled with new source_name), konfidenz upgraded to hoch.
- Existing TestServiceCrawlDedupQueue renamed to
  TestServiceCrawlDedupQueue_SameSourceRedundant reflecting the
  tightened semantic.

No migration — existing text[] and jsonb columns support the union
operations via SQL.
2026-04-24 12:01:01 +02:00
e0b73acfd6 feat(discovery): AI tiebreak for ambiguous similarity matches
Ship 2 MR 4. Adds per-pair AI-backed classification for operator use
inside the existing Similar panel: an "AI?" button next to each
candidate asks Mistral whether the two queue rows refer to the same
underlying market. Result shown inline as a green "✓ same N%" or
grey "✗ diff N%" chip with the LLM's reason on hover.

No scraping — the classifier works from (name, city, year) alone,
which is enough for the common cases (same venue on two calendars,
typos, cross-year recurrence). Call is short (usually <3s) so the
handler is synchronous, 15s deadline.

Caching
- Migration 000020 adds similarity_ai_cache keyed on a content hash
  over (normalized_name|stadt|year) for both rows, sorted for
  symmetry. Survives queue row accept/reject because the hash is
  about markt-content, not queue-row lifecycle.
- enrich.SimilarityPairKey computes the key. Classify(a,b) and
  Classify(b,a) hit the same entry. Stadt casing drift doesn't
  invalidate.
- Repo methods GetSimilarityCache / SetSimilarityCache + corresponding
  mock hooks. DefaultSimilarityCacheTTL=30d.

Mistral integration
- enrich.MistralSimilarityClassifier reuses the same aiPass2
  interface as the enricher. English system prompt asks for
  JSON-only output with {same_market, confidence 0..1, reason}.
  Confidence clamped to [0,1] because models occasionally return
  1.2 or -0.1. Reason is short German justification.
- NoopSimilarityClassifier returns an error — callers must check
  ai.Enabled() before deciding which binding to pass.

Service.ClassifySimilarPair loads both rows, computes pair key,
cache-first, calls classifier on miss, writes cache, returns
verdict. Rejects self-comparison (pair-key collapses). Handler
POST /admin/discovery/queue/:aid/similar/:bid/classify.

UI: new AI? column inside the Similar panel. Per-candidate pending
state via Set<string>, disabled button while in-flight, inline
verdict chip after response. Tooltip shows the LLM's reason.

Tests: pair-key symmetry + differentiation + casing tolerance;
Mistral classifier happy path, clamping edge cases, error
propagation, bad-JSON handling, Noop rejection. Service tests:
happy path writes cache, cache-hit skips LLM, self-comparison
rejected, classifier errors don't poison the cache.

NewService signature grows by one param (sim
enrich.SimilarityClassifier). All 14 existing callers (routes.go + tests)
updated; tests pass nil.
2026-04-24 11:04:15 +02:00
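The symmetric, casing-tolerant pair key described above can be sketched as: collapse each row to a lowercased `name|stadt|year` tuple, sort the two halves, hash. Field order and separator are assumptions; the real `enrich.SimilarityPairKey` may differ:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// pairKey hashes both rows' (name, stadt, year) tuples after sorting
// them, so Classify(a,b) and Classify(b,a) share one cache entry and
// city casing drift doesn't invalidate.
func pairKey(nameA, stadtA string, yearA int, nameB, stadtB string, yearB int) string {
	ka := fmt.Sprintf("%s|%s|%d", strings.ToLower(nameA), strings.ToLower(stadtA), yearA)
	kb := fmt.Sprintf("%s|%s|%d", strings.ToLower(nameB), strings.ToLower(stadtB), yearB)
	if ka > kb {
		ka, kb = kb, ka // sort the halves for symmetry
	}
	return fmt.Sprintf("%x", sha256.Sum256([]byte(ka+"\n"+kb)))
}

func main() {
	a := pairKey("Striezelmarkt", "Dresden", 2026, "striezelmarkt dresden", "dresden", 2025)
	b := pairKey("striezelmarkt dresden", "Dresden", 2025, "Striezelmarkt", "DRESDEN", 2026)
	fmt.Println(a == b) // argument order and casing don't matter
}
```

Because the key is content-addressed, it survives queue-row accept/reject exactly as the commit notes — nothing in it refers to row lifecycle.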
ce32f76731 feat(discovery): per-row LLM enrichment via scrape-then-prompt
Completes the manual two-pass enrichment flow: the crawl-enrich-all
button (MR 3) fills deterministic fields across the queue; this MR
adds a per-row "AI" button that scrapes the row's quellen URLs and
asks Mistral to fill category, opening_hours, description.

Flow per click:
  1. Load row, compute CacheKey(name_normalized, stadt, year).
  2. Cache hit -> skip LLM, merge cached payload onto current
     crawl-enrich base, persist, return.
  3. Miss -> scrape up to 5 quellen URLs via pkg/scrape (goquery
     text extraction, 4000-char truncation), concatenate into labeled
     blocks, call ai.Client.Pass2 with JSON response format.
  4. Parse response into Enrichment{category, opening_hours,
     description}, stamp provenance=llm + model + token counts.
  5. Cache the raw LLM payload (not the merged one) under the tuple
     key with DefaultCacheTTL=30d, so later re-crawls can layer new
     crawl-enrich bases on the same cached answer.
  6. Merge(crawl, llm) -- crawl fields survive. Persist via
     SetEnrichment(status=done). Return merged to the operator.

ErrNoScrapedContent fails fast when zero URLs return usable text;
LLMs without grounding hallucinate, and a 400-style operator error is
better than inventing details. Individual scrape failures don't halt
the flow as long as at least one source succeeds.

pkg/scrape (new, reusable)
- Client.Fetch: HTTP GET, strip script/style/nav/footer/aside via
  goquery, gather body text, collapse whitespace, truncate.
  DefaultTimeout=10s, DefaultMaxChars=4000. User-Agent configurable.
- Tests cover noise stripping, whitespace collapsing, truncation,
  body-less fragments.

enrich.MistralLLMEnricher
- Takes ai.Client + Scraper (both injectable; tests use stubs).
- Prompt: English system instructions asking for JSON-only output
  with category/opening_hours/description in German. User prompt
  includes markt identifiers, already-filled fields (so the LLM
  doesn't waste tokens re-deriving them), and scraped blocks.
- Tests: happy path, all-scrapes-fail (-> ErrNoScrapedContent),
  partial-scrape-success, empty LLM fields yield no provenance,
  URL cap at 5.

Service.RunLLMEnrichOne + handler POST
/admin/discovery/queue/:id/enrich (sync, 30s timeout). NewService
gains llm enrich.LLMEnricher
param; routes.go constructs a MistralLLMEnricher when ai.Client is
enabled, falls back to NoopLLMEnricher otherwise.

UI: per-row AI button next to Similar, tracks per-row pending state
via a Set<string>, disables the button while the request is in
flight and shows "AI..." label. Success invalidates the page, the
row's expanded view picks up the new category/opening_hours/
description fields with llm provenance tags. Inline error message on
the row if the enrich action fails.
2026-04-24 10:46:28 +02:00
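The text cleanup in pkg/scrape — collapse whitespace, truncate to a character budget — can be sketched with the stdlib alone. (The real package also strips script/style/nav/footer/aside via goquery, which this sketch omits; names are illustrative.)

```go
package main

import (
	"fmt"
	"strings"
)

// cleanText collapses all whitespace runs to single spaces and
// truncates to maxChars (rune-safe, so a multi-byte umlaut is never cut
// in half), keeping one huge page from blowing the prompt budget.
func cleanText(s string, maxChars int) string {
	s = strings.Join(strings.Fields(s), " ")
	if r := []rune(s); len(r) > maxChars {
		s = string(r[:maxChars])
	}
	return s
}

func main() {
	fmt.Println(cleanText("  Ritterfest\n\n  auf   Schloss\tKaltenberg ", 4000))
}
```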
afe9d916d6 feat(discovery): manual crawl-enrich-all button + payload display
Replaces the originally-planned async-worker design with operator-
triggered bulk runs (see memory/project_ship2_enrichment.md). Crawl-
enrichment is cheap enough to always run against the whole list but
runs only when the admin clicks — the flow stays predictable and the
crawl itself stays fast.

Endpoints
- POST /admin/discovery/enrichment/crawl-all — 202 + goroutine, mirrors
  the crawl pattern. Per-process CAS gate prevents concurrent runs.
- GET  /admin/discovery/enrichment/crawl-all-status — polled shape
  identical to /crawl-status for UI reuse.

Service RunCrawlEnrichAll iterates enrichment_status='pending' rows,
builds an enrich.Input from each, runs CrawlEnrich (consolidation +
Nominatim geocoding via the shared geocoder), and persists via
SetEnrichment(status=done). Per-row errors count toward Failed and
append to a bounded Errors slice; the pass never halts.

Enrich package refactor
- Enrichment, Sources, Provenance constants moved from discovery ->
  enrich (they are the enrich package's own types; discovery previously
  held them for historical reasons).
- CrawlEnrich now takes a narrow enrich.Input / enrich.Contribution so
  the enrich package no longer imports the parent discovery package.
  This breaks the import cycle that appeared once discovery needed to
  call enrich (the MR 2 structure only worked because no caller went
  in that direction yet).
- LLMEnricher takes an LLMRequest (primitives) instead of a
  DiscoveredMarket. NoopLLMEnricher updated; real Mistral impl lands
  in MR 3b.
- CacheKey signature switched from (DiscoveredMarket) to primitive
  (nameNormalized, stadt, year).

Service geocoder wiring: discovery.NewService gains a Geocoder param
(routes.go passes the shared Nominatim client; the interface lives in
discovery to avoid another circular edge with enrich).

UI: "Run crawl-enrich" button next to "Run crawl"; identical poll +
summary card pattern. Expanding a queue row shows the enrichment
status badge plus the PLZ/Venue/Organizer/Lat-Lng fields inline,
each with a per-field provenance tag.

Tests: three new service tests (happy path, per-row SetEnrichment
failure, empty-queue no-op). Existing enrich package tests updated
for the primitive input signature. All 13 test NewService call-sites
updated for the new geocoder param.
2026-04-24 10:29:58 +02:00
dcbf38f6e9 feat(discovery): enrichment foundation — schema, types, crawl-enrich, cache
Lays the infrastructure for Ship 2 crawl-time enrichment. Design principles
(see memory/project_ship2_enrichment.md):
- async worker (not inline in crawl) — MR 3 wires it up
- single enrichment jsonb column, not typed columns — shape still in flux
- per-row LLM budget, global soft cap logged
- crawl-enrich runs first; LLM only fills gaps it cannot reach

Migration 000019: adds discovered_markets.enrichment{,_status,_attempts}
and enriched_at; partial index on enrichment_status for the worker's
claim query; enrichment_cache table keyed by sha256(name|city|year).

enrich package:
- crawl.go — pure consolidator over SourceContributions (PLZ, venue,
  organizer), first non-empty wins. Optional Geocoder pulls lat/lng via
  Nominatim; failures are non-fatal. Everything marked provenance=crawl.
- llm.go — LLMEnricher interface + NoopLLMEnricher. Real Mistral-backed
  impl lands in MR 3 along with the worker.
- enrich.go — Merge(base, overlay) with base-wins semantics, enforcing
  the crawl-over-llm invariant at the type level: even a confident LLM
  pass can't overwrite a crawl-populated field.
- cache.go — CacheKey() stable across re-crawls; DefaultCacheTTL=30d.
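The sha256(name|city|year) key from migration 000019 and cache.go can be sketched like this; the normalisation steps and the "|" separator are assumptions, not the real CacheKey's exact behaviour:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheKey is stable across re-crawls because it depends only on the
// normalised name, the city, and the event year — not on crawl state.
func cacheKey(nameNormalized, stadt string, year int) string {
	raw := fmt.Sprintf("%s|%s|%d",
		strings.ToLower(strings.TrimSpace(nameNormalized)),
		strings.ToLower(strings.TrimSpace(stadt)),
		year)
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	k1 := cacheKey("weihnachtsmarkt", "Lüneburg", 2026)
	k2 := cacheKey(" Weihnachtsmarkt ", "lüneburg", 2026)
	fmt.Println(k1 == k2, len(k1)) // prints true 64
}
```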

Repository: scan/persist the new columns, GetEnrichmentCache /
SetEnrichmentCache / SetEnrichment. The SetEnrichment UPDATE increments
attempts server-side and stamps enriched_at only for terminal states
(done|failed) — 'skipped' keeps the previous timestamp.

No UI changes and no worker binary yet. Noop LLM enricher in place so
MR 3 can wire the worker without refactoring shape.
2026-04-24 09:55:38 +02:00
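The base-wins Merge invariant above ("even a confident LLM pass can't overwrite a crawl-populated field") can be sketched as below; the Enrichment field set is illustrative, not the actual jsonb shape:

```go
package main

import "fmt"

type Enrichment struct {
	PLZ       string
	Venue     string
	Organizer string
}

// merge keeps every non-empty field of base (crawl data) and lets the
// overlay (LLM data) fill only the gaps — base-wins semantics enforce
// the crawl-over-llm invariant structurally, not by convention.
func merge(base, overlay Enrichment) Enrichment {
	pick := func(b, o string) string {
		if b != "" {
			return b
		}
		return o
	}
	return Enrichment{
		PLZ:       pick(base.PLZ, overlay.PLZ),
		Venue:     pick(base.Venue, overlay.Venue),
		Organizer: pick(base.Organizer, overlay.Organizer),
	}
}

func main() {
	crawl := Enrichment{PLZ: "21335"}
	llm := Enrichment{PLZ: "99999", Venue: "Rathausplatz"}
	// crawl PLZ wins; the LLM overlay only fills the empty Venue
	fmt.Println(merge(crawl, llm))
}
```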
65027ca9aa feat(discovery): sortable queue columns, default konfidenz desc
Admin queue table gains clickable sort on Markt, Stadt, Datum, Quellen
(count), and Konfidenz. Default on page load is konfidenz desc with
start_datum ASC NULLS LAST as the within-tier tiebreaker — operators
see highest-confidence, soonest-upcoming markets first. URL state
(?sort=&order=) is the single source of truth; F5 preserves it, and
localStorage is not used.

Backend: ListQueue takes (sortBy, order); repository builds ORDER BY
from a closed whitelist — konfidenz uses a CASE rank (hoch=3, mittel=2,
niedrig=1), quellen_count uses cardinality(quellen). Handler
normalisers reject anything off the whitelist and echo the effective
values in meta.sort / meta.order so the UI can render arrows. Unit
tests lock the emitted SQL per combination and assert raw input cannot
leak into ORDER BY.
2026-04-24 09:38:53 +02:00
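The closed-whitelist ORDER BY builder can be sketched as follows; the map keys and column names are assumptions drawn from the commit text, not the repository's actual identifiers:

```go
package main

import "fmt"

// orderByClauses maps allowed sort keys to the exact SQL fragment to
// emit — raw user input never reaches the query string.
var orderByClauses = map[string]string{
	"konfidenz": "CASE konfidenz WHEN 'hoch' THEN 3 WHEN 'mittel' THEN 2 WHEN 'niedrig' THEN 1 ELSE 0 END",
	"quellen":   "cardinality(quellen)",
	"markt":     "name",
	"stadt":     "stadt",
	"datum":     "start_datum",
}

// buildOrderBy falls back to the default konfidenz-desc sort for
// anything off the whitelist, mirroring the handler's normalisation.
func buildOrderBy(sortBy, order string) string {
	expr, ok := orderByClauses[sortBy]
	if !ok {
		expr = orderByClauses["konfidenz"]
	}
	dir := "ASC"
	if order != "asc" {
		dir = "DESC"
	}
	// within-tier tiebreaker: soonest upcoming markets first
	return fmt.Sprintf("ORDER BY %s %s, start_datum ASC NULLS LAST", expr, dir)
}

func main() {
	fmt.Println(buildOrderBy("quellen", "asc"))
	fmt.Println(buildOrderBy("'; DROP TABLE--", "")) // off-whitelist -> default
}
```

Because only whitelist values are ever interpolated, injection via sort parameters is impossible by construction, which is what the unit tests lock down.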
52f3e4c009 chore: replace personal emails with contact@marktvogt.de 2026-04-21 10:56:07 +02:00
d6b65501ec security: redact agent ID from helm values; gitignore superpowers docs
Remove Mistral agent ID from agentDiscovery comment in helm values.yaml.
Add docs/superpowers/ to .gitignore to prevent re-tracking internal AI plans.
2026-04-21 09:48:32 +02:00
9232203dd3 Merge branch 'chore/access-ttl-and-ship2-handoff' — Ship 2 handoff + TTL bump 2026-04-19 01:06:14 +02:00
b52ac7d861 docs(ship-2): handoff note + chore(helm): bump JWT access TTL 15m to 2h
Handoff captures end-of-Ship-1 state and Ship 2 scope (§4.10 expanded
product additions: crawl-time enrichment, AI-augmented similarity,
inline enrich-before-accept, detail drawer, eval harness, enrichment
cache, auto-merge during crawl, keyboard shortcuts). §4.12 tracks the
admin auth refresh-on-401 fix; pending that work JWT_ACCESS_TTL bumped
from 15m to 2h as interim relief.
2026-04-19 01:05:52 +02:00
95a3dfdef8 Merge branch 'fix/queue-pagination-envelope' — queue UI renders rows again
MR 6's backend + MR 7's UI had mismatched envelope assumptions: the
backend returned pagination as sibling fields of data, while the UI's
ApiResponse<T> wrapper typed only data, so 'body.data' (the queue rows)
was undefined at runtime.
2026-04-19 00:46:49 +02:00
bddab60686 fix(admin): queue response uses meta envelope; UI reads total from meta
MR 6 backend returned {data, total, limit, offset} as siblings but the
shared ApiResponse<T> envelope only types the data field. The UI's load
function treated queueRes.data as a wrapper and read body.data (undefined)
as the row list. Result: empty queue in UI despite 1384 pending rows
in the DB.

Fix: backend moves total/limit/offset into meta (matches PaginationMeta
convention from web/src/lib/api/types.ts). UI casts to read the meta
slot alongside typed data.
2026-04-19 00:46:05 +02:00
b42a35c049 Merge branch 'feat/merge-conflict-display' — MR 7 per-source contributions visible
Migration 000018 adds sources text[] + source_contributions jsonb to
discovered_markets. Crawler preserves raw per-source RawEvents through
Merge() and service persists them alongside the merged row. Admin UI
gains a merged-sources chip + Datumskonflikt badge and an expandable
Quellen-Vergleich panel showing per-field comparison across sources
with conflicting values highlighted.
2026-04-19 00:28:24 +02:00
cc6c4f2efb feat(discovery): persist and display per-source contributions for merged queue rows
Migration 000018 adds sources text[] + source_contributions jsonb
columns to discovered_markets. Crawler's merger now preserves the raw
per-source RawEvents through Merge() so they can be stored alongside
the merged row. Admin UI gains two surfaces: (a) compact "merged from
source1 + source2" chip + amber Datumskonflikt badge when hinweis
flags it, (b) expandable Quellen-Vergleich panel showing a per-field
comparison table with diverging fields highlighted. Forensic visibility
into what each source said vs what the merger picked.
2026-04-19 00:27:34 +02:00
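The per-field divergence check behind the Datumskonflikt badge can be sketched like this; the Contribution shape is a hypothetical stand-in for the persisted source_contributions jsonb:

```go
package main

import "fmt"

// Contribution is one source's claim about a field; only StartDatum
// is modelled here for brevity.
type Contribution struct {
	Source     string
	StartDatum string
}

// dateConflict reports whether two sources disagree on a non-empty
// date — an empty value is a gap, not a conflict.
func dateConflict(contribs []Contribution) bool {
	var seen string
	for _, c := range contribs {
		if c.StartDatum == "" {
			continue
		}
		if seen == "" {
			seen = c.StartDatum
		} else if seen != c.StartDatum {
			return true
		}
	}
	return false
}

func main() {
	contribs := []Contribution{
		{Source: "source1", StartDatum: "2026-11-28"},
		{Source: "source2", StartDatum: "2026-11-27"},
	}
	fmt.Println(dateConflict(contribs)) // prints true
}
```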
f22a141615 Merge branch 'feat/admin-queue-pagination-and-similar' — MR 6 queue UX
Queue endpoint returns {data, total, limit, offset}; admin UI exposes
prev/next + page-size + Showing X-Y of Z. Per-row Similar button
fetches MR 5's /queue/:id/similar via a SvelteKit proxy and renders
matches inline. Essential for reviewing the 1000+ row queue post-fix.
2026-04-19 00:14:52 +02:00
2acd0cdc06 feat(admin): queue pagination + per-row Show similar button
Queue endpoint now returns {data, total, limit, offset}. Admin UI
reads ?page + ?limit from URL, renders prev/next + page-size selector
+ "Showing X-Y of Z" label. Per-row Similar button fetches the MR 5
/queue/:id/similar endpoint via a new SvelteKit proxy route and
renders matches inline with score/name/city/date. Essential for
navigating the 1000+ row queue after MR 5's crawl fixes.
2026-04-18 23:59:18 +02:00
5c363944b2 Merge branch 'feat/crawl-similarity-and-fixes' — MR 5 crawler cleanup + similarity
Drops link-check from crawl path (was timing-bound, misleading counter).
Fixes suendenfrei pagination footer-link infinite loop. Adds similarity
helper with Levenshtein-based fuzzy name match + city match + date
proximity, exposed as GET /queue/:id/similar for admin duplicate review.
2026-04-18 20:05:44 +02:00
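A minimal sketch of the Levenshtein distance underlying the fuzzy name match; the actual helper's weighting against city match and date proximity is not reproduced here:

```go
package main

import "fmt"

// levenshtein computes the edit distance between two strings using a
// single rolling row of the DP matrix; []rune keeps umlauts intact.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min(prev[j]+1, min(cur[j-1]+1, prev[j-1]+cost))
		}
		prev = cur
	}
	return prev[len(rb)]
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	// umlaut vs transliterated spelling stays within a small distance
	fmt.Println(levenshtein("Weihnachtsmarkt Lüneburg", "Weihnachtsmarkt Lueneburg")) // prints 2
}
```

A small distance on normalised names, combined with a city match and date proximity, is what lets the admin flag likely duplicates without exact-string equality.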