ce32f7673131fabae8d995474c6070ffe913d0a5
Completes the manual two-pass enrichment flow: the crawl-enrich-all
button (MR 3) fills deterministic fields across the queue; this MR
adds a per-row "AI" button that scrapes the row's quellen URLs and
asks Mistral to fill category, opening_hours, description.

Flow per click:
1. Load row, compute CacheKey(name_normalized, stadt, year).
2. Cache hit -> skip LLM, merge cached payload onto current
   crawl-enrich base, persist, return.
3. Miss -> scrape up to 5 quellen URLs via pkg/scrape (goquery
   text extraction, 4000-char truncation), concatenate into labeled
   blocks, call ai.Client.Pass2 with JSON response format.
4. Parse response into Enrichment{category, opening_hours,
   description}, stamp provenance=llm + model + token counts.
5. Cache the raw LLM payload (not the merged one) under the tuple
   key with DefaultCacheTTL=30d, so later re-crawls can layer new
   crawl-enrich bases on the same cached answer.
6. Merge(crawl, llm) -- crawl fields survive. Persist via
   SetEnrichment(status=done). Return merged to the operator.
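
A rough sketch of steps 1, 5 and 6, to show why the raw LLM payload is cached rather than the merged result. The Enrichment struct layout, the key format, and the map standing in for the TTL cache are assumptions for illustration, not the code in this MR; the sample values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// Enrichment holds the three LLM-fillable fields named above.
// The field layout is assumed for this sketch.
type Enrichment struct {
	Category     string
	OpeningHours string
	Description  string
}

// CacheKey builds the tuple key. Separator and lowercasing are assumptions.
func CacheKey(nameNormalized, stadt string, year int) string {
	return fmt.Sprintf("%s|%s|%d", strings.ToLower(nameNormalized), strings.ToLower(stadt), year)
}

// Merge starts from the LLM payload and lets every non-empty crawl
// field overwrite it, so crawl fields survive.
func Merge(crawl, llm Enrichment) Enrichment {
	out := llm
	if crawl.Category != "" {
		out.Category = crawl.Category
	}
	if crawl.OpeningHours != "" {
		out.OpeningHours = crawl.OpeningHours
	}
	if crawl.Description != "" {
		out.Description = crawl.Description
	}
	return out
}

func main() {
	// A plain map stands in for the real cache (which has a 30d TTL).
	cache := map[string]Enrichment{}
	key := CacheKey("wochenmarkt altstadt", "Köln", 2025)

	// Step 5: store the raw LLM payload, not the merged result.
	cache[key] = Enrichment{Category: "Wochenmarkt", Description: "Markt in der Altstadt."}

	// A later re-crawl yields a new base; the cached answer fills the gaps.
	crawl := Enrichment{OpeningHours: "Sa 08:00-14:00"}
	fmt.Printf("%+v\n", Merge(crawl, cache[key]))
}
```

Because the cached value never contains crawl data, a fresh crawl-enrich base can be merged onto it without stale deterministic fields leaking back in.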
ErrNoScrapedContent fails fast when zero URLs return usable text;
LLMs without grounding hallucinate, and a 400-style operator error is
better than inventing details. Individual scrape failures don't halt
the flow as long as at least one source succeeds.
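
The fail-fast rule could look roughly like the following. The Scraper interface shape (no context argument), the block label format, and the stub are assumptions made to keep the sketch self-contained; only ErrNoScrapedContent and the cap of 5 come from this MR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNoScrapedContent is returned when no source yields usable text.
var ErrNoScrapedContent = errors.New("no scraped content")

// maxURLs mirrors the cap of 5 quellen URLs.
const maxURLs = 5

// Scraper is a simplified, hypothetical version of the injectable scraper.
type Scraper interface {
	Fetch(url string) (string, error)
}

// gatherBlocks scrapes up to maxURLs sources and joins the usable ones
// into labeled blocks. The label format is an assumption.
func gatherBlocks(s Scraper, urls []string) (string, error) {
	if len(urls) > maxURLs {
		urls = urls[:maxURLs]
	}
	var blocks []string
	for i, u := range urls {
		text, err := s.Fetch(u)
		if err != nil || strings.TrimSpace(text) == "" {
			continue // one failed source does not halt the flow
		}
		blocks = append(blocks, fmt.Sprintf("[Source %d: %s]\n%s", i+1, u, text))
	}
	if len(blocks) == 0 {
		return "", ErrNoScrapedContent // never call the LLM ungrounded
	}
	return strings.Join(blocks, "\n\n"), nil
}

// stubScraper maps URLs to canned text; unknown URLs fail.
type stubScraper map[string]string

func (s stubScraper) Fetch(url string) (string, error) {
	if t, ok := s[url]; ok {
		return t, nil
	}
	return "", errors.New("fetch failed")
}

func main() {
	s := stubScraper{"https://example.com/a": "Geöffnet samstags 8-14 Uhr"}

	out, err := gatherBlocks(s, []string{"https://example.com/missing", "https://example.com/a"})
	fmt.Println(out, err)

	_, err = gatherBlocks(s, []string{"https://example.com/missing"})
	fmt.Println(errors.Is(err, ErrNoScrapedContent))
}
```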
pkg/scrape (new, reusable)
- Client.Fetch: HTTP GET, strip script/style/nav/footer/aside via
goquery, gather body text, collapse whitespace, truncate.
DefaultTimeout=10s, DefaultMaxChars=4000. User-Agent configurable.
- Tests cover noise stripping, whitespace collapsing, truncation,
body-less fragments.
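
The post-extraction cleanup (collapse whitespace, then truncate) can be sketched with the standard library alone. The goquery stripping step is left out here, and cutting on runes rather than bytes is an assumption; cleanText is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultMaxChars matches the 4000-char limit named above.
const DefaultMaxChars = 4000

// cleanText collapses every run of whitespace into a single space and
// truncates the result to at most maxChars runes, so multi-byte
// characters such as umlauts are never split.
func cleanText(raw string, maxChars int) string {
	collapsed := strings.Join(strings.Fields(raw), " ")
	runes := []rune(collapsed)
	if len(runes) > maxChars {
		return string(runes[:maxChars])
	}
	return collapsed
}

func main() {
	fmt.Printf("%q\n", cleanText("  Öffnungszeiten:\n\t Mo-Fr   10-18 Uhr ", DefaultMaxChars))

	long := strings.Repeat("ä ", 5000)
	fmt.Println(len([]rune(cleanText(long, DefaultMaxChars))))
}
```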
enrich.MistralLLMEnricher
- Takes ai.Client + Scraper (both injectable; tests use stubs).
- Prompt: English system instructions asking for JSON-only output
with category/opening_hours/description in German. User prompt
includes markt identifiers, already-filled fields (so the LLM
doesn't waste tokens re-deriving them), and scraped blocks.
- Tests: happy path, all-scrapes-fail (-> ErrNoScrapedContent),
partial-scrape-success, empty LLM fields yield no provenance,
URL cap at 5.
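
The "empty LLM fields yield no provenance" case might be handled along these lines. The llmFields struct, the provenance map shape, and parseWithProvenance are hypothetical names for this sketch; model and token-count stamping is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// llmFields mirrors the three JSON keys the prompt asks for.
type llmFields struct {
	Category     string `json:"category"`
	OpeningHours string `json:"opening_hours"`
	Description  string `json:"description"`
}

// parseWithProvenance decodes the model's JSON reply and returns a
// provenance map that lists only the fields the model actually filled.
func parseWithProvenance(raw []byte) (llmFields, map[string]string, error) {
	var f llmFields
	if err := json.Unmarshal(raw, &f); err != nil {
		return llmFields{}, nil, fmt.Errorf("decode llm response: %w", err)
	}
	prov := map[string]string{}
	for name, val := range map[string]string{
		"category":      f.Category,
		"opening_hours": f.OpeningHours,
		"description":   f.Description,
	} {
		if val != "" {
			prov[name] = "llm" // empty fields get no provenance entry
		}
	}
	return f, prov, nil
}

func main() {
	reply := []byte(`{"category":"Wochenmarkt","opening_hours":"","description":"Frische Produkte."}`)
	f, prov, err := parseWithProvenance(reply)
	fmt.Println(f, prov, err)
}
```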
Service.RunLLMEnrichOne + handler
POST /admin/discovery/queue/:id/enrich (sync, 30s timeout).
NewService gains an llm enrich.LLMEnricher param; routes.go
constructs a MistralLLMEnricher when ai.Client is enabled and falls
back to NoopLLMEnricher otherwise.
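
The wiring in routes.go could follow a pattern like the one below. Everything except the type names LLMEnricher, MistralLLMEnricher and NoopLLMEnricher is assumed: the Enrich signature, the Enabled method, ErrLLMDisabled, and what the no-op variant returns are placeholders, not the real API.

```go
package main

import (
	"errors"
	"fmt"
)

// LLMEnricher is a simplified stand-in for enrich.LLMEnricher.
type LLMEnricher interface {
	Enrich(rowID string) error
}

// ErrLLMDisabled is a hypothetical error for the no-op path.
var ErrLLMDisabled = errors.New("llm enrichment disabled")

// NoopLLMEnricher is used when no AI client is configured.
type NoopLLMEnricher struct{}

func (NoopLLMEnricher) Enrich(string) error { return ErrLLMDisabled }

// aiClient is a stub for ai.Client; the Enabled method is assumed.
type aiClient struct{ enabled bool }

// Enabled is safe to call on a nil client.
func (c *aiClient) Enabled() bool { return c != nil && c.enabled }

// MistralLLMEnricher would scrape and call the model; stubbed here.
type MistralLLMEnricher struct{ client *aiClient }

func (*MistralLLMEnricher) Enrich(string) error { return nil }

// newLLMEnricher picks the implementation once at startup, so the
// service never has to check whether the AI client is configured.
func newLLMEnricher(c *aiClient) LLMEnricher {
	if c.Enabled() {
		return &MistralLLMEnricher{client: c}
	}
	return NoopLLMEnricher{}
}

func main() {
	fmt.Printf("%T\n", newLLMEnricher(&aiClient{enabled: true}))
	fmt.Printf("%T\n", newLLMEnricher(nil))
}
```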
UI: per-row AI button next to Similar. It tracks per-row pending
state via a Set<string>, disables the button while the request is in
flight, and shows an "AI..." label. On success the page is
invalidated and the row's expanded view picks up the new
category/opening_hours/description fields with llm provenance tags.
If the enrich action fails, an inline error message appears on the
row.