vikingowl dcbf38f6e9 feat(discovery): enrichment foundation — schema, types, crawl-enrich, cache
Lays infrastructure for Ship 2 crawl-time enrichment. Design principles
(see memory/project_ship2_enrichment.md):
- async worker (not inline in crawl) — MR 3 wires it up
- single enrichment jsonb column, not typed columns — shape still in flux
- per-row LLM budget, global soft cap logged
- crawl-enrich runs first; LLM only fills gaps it cannot reach

Migration 000019: adds discovered_markets.enrichment{,_status,_attempts}
and enriched_at; partial index on enrichment_status for the worker's
claim query; enrichment_cache table keyed by sha256(name|city|year).
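
A minimal sketch of how a sha256(name|city|year) key could be derived. The
signature, the trim/lowercase normalisation, and the hex encoding are
assumptions for illustration; the commit only states the hashed fields.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// CacheKey hashes "name|city|year" with SHA-256 and returns the hex digest.
// Assumption: inputs are trimmed and lowercased first so that cosmetic
// differences between crawls map to the same key.
func CacheKey(name, city string, year int) string {
	norm := func(s string) string { return strings.ToLower(strings.TrimSpace(s)) }
	payload := fmt.Sprintf("%s|%s|%d", norm(name), norm(city), year)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := CacheKey("Weihnachtsmarkt", "Dresden", 2026)
	b := CacheKey("  weihnachtsmarkt ", "DRESDEN", 2026)
	fmt.Println(a == b, len(a)) // prints "true 64"
}
```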

enrich package:
- crawl.go — pure consolidator over SourceContributions (PLZ, venue,
  organizer), first non-empty wins. Optional Geocoder pulls lat/lng via
  Nominatim; failures are non-fatal. Everything marked provenance=crawl.
- llm.go — LLMEnricher interface + NoopLLMEnricher. Real Mistral-backed
  impl lands in MR 3 along with the worker.
- enrich.go — Merge(base, overlay) with base-wins semantics, enforcing
  the crawl-over-llm invariant at the type level: even a confident LLM
  pass can't overwrite a crawl-populated field.
- cache.go — CacheKey() stable across re-crawls; DefaultCacheTTL=30d.
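
The base-wins merge described for enrich.go can be sketched roughly as
follows. The Field and Enrichment types are hypothetical stand-ins, and this
version checks values at runtime; it does not reproduce the type-level
enforcement the commit mentions.

```go
package main

import "fmt"

// Provenance records which stage produced a field.
type Provenance string

const (
	ProvCrawl Provenance = "crawl"
	ProvLLM   Provenance = "llm"
)

// Field and Enrichment are hypothetical shapes for illustration only.
type Field struct {
	Value      string
	Provenance Provenance
}

type Enrichment map[string]Field

// Merge returns a new map in which every non-empty field of base is kept
// and overlay only fills keys that base leaves empty or missing.
func Merge(base, overlay Enrichment) Enrichment {
	out := make(Enrichment, len(base)+len(overlay))
	for k, f := range overlay {
		if f.Value != "" {
			out[k] = f
		}
	}
	// Written second, so base overrides overlay wherever both are set.
	for k, f := range base {
		if f.Value != "" {
			out[k] = f
		}
	}
	return out
}

func main() {
	crawl := Enrichment{"plz": {Value: "01067", Provenance: ProvCrawl}}
	llm := Enrichment{
		"plz":       {Value: "99999", Provenance: ProvLLM},
		"organizer": {Value: "Stadt Dresden", Provenance: ProvLLM},
	}
	merged := Merge(crawl, llm)
	fmt.Println(merged["plz"].Value, merged["organizer"].Provenance) // prints "01067 llm"
}
```

Passing the crawl result as base and the LLM result as overlay is what keeps
crawl-populated fields from being overwritten in this sketch.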

Repository: scan/persist the new columns, GetEnrichmentCache /
SetEnrichmentCache / SetEnrichment. The SetEnrichment UPDATE increments
attempts server-side and stamps enriched_at only for terminal states
(done|failed) — 'skipped' keeps the previous timestamp.
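
For illustration, the UPDATE semantics above might look like the statement
below, with a small in-memory model of the same rule. The id column, the
parameter order, and the row type are assumptions; only the column names and
the done|failed rule come from this commit.

```go
package main

import "fmt"

// setEnrichmentSQL is an assumed shape for the repository's UPDATE.
// The id column and parameter order are hypothetical.
const setEnrichmentSQL = `
UPDATE discovered_markets
SET enrichment          = $2,
    enrichment_status   = $3,
    enrichment_attempts = enrichment_attempts + 1,
    enriched_at         = CASE WHEN $3 IN ('done', 'failed')
                               THEN now() ELSE enriched_at END
WHERE id = $1`

// row is a hypothetical in-memory mirror of the affected columns.
type row struct {
	Status     string
	Attempts   int
	EnrichedAt int64 // Unix seconds; 0 means never enriched
}

// isTerminal reports whether a status should stamp enriched_at.
func isTerminal(status string) bool {
	return status == "done" || status == "failed"
}

// applySetEnrichment models the UPDATE: attempts always increments,
// enriched_at only moves for terminal states.
func applySetEnrichment(r row, status string, now int64) row {
	r.Status = status
	r.Attempts++
	if isTerminal(status) {
		r.EnrichedAt = now
	}
	return r
}

func main() {
	r := applySetEnrichment(row{}, "done", 100)
	r = applySetEnrichment(r, "skipped", 200)
	fmt.Println(r.Attempts, r.EnrichedAt, r.Status) // prints "2 100 skipped"
}
```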

No UI changes and no worker binary yet. A no-op LLM enricher is in place so
MR 3 can wire up the worker without refactoring the shape.
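
A rough sketch of what an interface plus no-op implementation of this kind
can look like. The method signature and the Market and Fields types are
assumptions; the commit names only LLMEnricher and NoopLLMEnricher.

```go
package main

import (
	"context"
	"fmt"
)

// Market and Fields are hypothetical shapes for illustration only.
type Market struct {
	Name, City string
	Year       int
}

type Fields map[string]string

// LLMEnricher fills enrichment gaps that crawl-enrich could not reach.
// The signature is an assumption.
type LLMEnricher interface {
	Enrich(ctx context.Context, m Market) (Fields, error)
}

// NoopLLMEnricher satisfies the interface and contributes nothing.
type NoopLLMEnricher struct{}

func (NoopLLMEnricher) Enrich(context.Context, Market) (Fields, error) {
	return Fields{}, nil
}

// Compile-time check that the no-op implements the interface.
var _ LLMEnricher = NoopLLMEnricher{}

// fillGaps stands in for the future worker's call site (hypothetical).
func fillGaps(e LLMEnricher, m Market) (Fields, error) {
	return e.Enrich(context.Background(), m)
}

func main() {
	got, err := fillGaps(NoopLLMEnricher{}, Market{Name: "Weihnachtsmarkt", City: "Dresden", Year: 2026})
	fmt.Println(len(got), err) // prints "0 <nil>"
}
```

Because the call site depends only on the interface, a real implementation
could later be swapped in without touching callers.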
2026-04-24 09:55:38 +02:00
2026-02-21 07:10:30 +01:00