dcbf38f6e93057b947f23b847111d870cbd90f8b
Lays infrastructure for Ship 2 crawl-time enrichment. Design principles
(see memory/project_ship2_enrichment.md):
- async worker (not inline in crawl) — MR 3 wires it up
- single enrichment jsonb column, not typed columns — shape still in flux
- per-row LLM budget, global soft cap logged
- crawl-enrich runs first; LLM only fills gaps the crawl cannot reach
Migration 000019: adds discovered_markets.enrichment{,_status,_attempts}
and enriched_at; partial index on enrichment_status for the worker's
claim query; enrichment_cache table keyed by sha256(name|city|year).
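
A minimal sketch of how a key of the form sha256(name|city|year) could be
derived. The trim-and-lowercase normalization, the hex encoding, and the exact
signature are assumptions for illustration, not necessarily what cache.go does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// CacheKey returns the hex-encoded SHA-256 of "name|city|year".
// Assumption: inputs are trimmed and lowercased first so that cosmetic
// differences between crawls map to the same key.
func CacheKey(name, city string, year int) string {
	norm := func(s string) string { return strings.ToLower(strings.TrimSpace(s)) }
	raw := norm(name) + "|" + norm(city) + "|" + strconv.Itoa(year)
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := CacheKey("Weihnachtsmarkt", "Dresden", 2025)
	b := CacheKey("  weihnachtsmarkt ", "DRESDEN", 2025)
	// A SHA-256 digest is 32 bytes, so the hex form has 64 characters.
	fmt.Println(len(a), a == b) // prints "64 true"
}
```

Because the key depends only on name, city, and year, a re-crawl of the same
market resolves to the same cache row.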
enrich package:
- crawl.go — pure consolidator over SourceContributions (PLZ, venue,
organizer), first non-empty wins. Optional Geocoder pulls lat/lng via
Nominatim; failures are non-fatal. Everything marked provenance=crawl.
- llm.go — LLMEnricher interface + NoopLLMEnricher. Real Mistral-backed
impl lands in MR 3 along with the worker.
- enrich.go — Merge(base, overlay) with base-wins semantics, enforcing
the crawl-over-llm invariant at the type level: even a confident LLM
pass can't overwrite a crawl-populated field.
- cache.go — CacheKey() stable across re-crawls; DefaultCacheTTL=30d.
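
The "first non-empty wins" consolidation and the base-wins Merge described
above can be sketched roughly as follows. Field, Enrichment, firstNonEmpty and
pick are hypothetical names for illustration. This sketch enforces crawl-over-llm
only through argument order, whereas the real package is described as enforcing
it at the type level.

```go
package main

import "fmt"

// Field is a hypothetical value tagged with where it came from.
type Field struct {
	Value      string
	Provenance string // "crawl" or "llm"
}

// Enrichment is a hypothetical subset of the enrichment payload.
type Enrichment struct {
	PLZ, Venue, Organizer *Field
}

// firstNonEmpty returns the first non-empty candidate, marked as
// provenance=crawl, or nil when every candidate is empty.
func firstNonEmpty(values ...string) *Field {
	for _, v := range values {
		if v != "" {
			return &Field{Value: v, Provenance: "crawl"}
		}
	}
	return nil
}

// pick keeps the base field when it is populated, else the overlay.
func pick(base, overlay *Field) *Field {
	if base != nil && base.Value != "" {
		return base
	}
	return overlay
}

// Merge applies base-wins semantics field by field.
func Merge(base, overlay Enrichment) Enrichment {
	return Enrichment{
		PLZ:       pick(base.PLZ, overlay.PLZ),
		Venue:     pick(base.Venue, overlay.Venue),
		Organizer: pick(base.Organizer, overlay.Organizer),
	}
}

func main() {
	crawl := Enrichment{PLZ: firstNonEmpty("", "01067", "01069")}
	llm := Enrichment{
		PLZ:   &Field{Value: "99999", Provenance: "llm"},
		Venue: &Field{Value: "Altmarkt", Provenance: "llm"},
	}
	merged := Merge(crawl, llm)
	fmt.Println(merged.PLZ.Value, merged.PLZ.Provenance)     // prints "01067 crawl"
	fmt.Println(merged.Venue.Value, merged.Venue.Provenance) // prints "Altmarkt llm"
	fmt.Println(merged.Organizer == nil)                     // prints "true"
}
```

The crawl result is always passed as base, so the LLM overlay can only fill
fields the crawl left empty.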
Repository: scan/persist the new columns, GetEnrichmentCache /
SetEnrichmentCache / SetEnrichment. The SetEnrichment UPDATE increments
attempts server-side and stamps enriched_at only for terminal states
(done|failed) — 'skipped' keeps the previous timestamp.
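
One way the SetEnrichment UPDATE could look, as a hedged sketch. The column
names follow migration 000019; the id column, the parameter order, and the
isTerminal helper are assumptions, and the actual query in the repository may
differ.

```go
package main

import "fmt"

// setEnrichmentSQL is a hypothetical PostgreSQL statement. It increments the
// attempt counter server-side and stamps enriched_at only for terminal states.
const setEnrichmentSQL = `
UPDATE discovered_markets
SET enrichment          = $2,
    enrichment_status   = $3,
    enrichment_attempts = enrichment_attempts + 1,
    enriched_at         = CASE
                            WHEN $3 IN ('done', 'failed') THEN now()
                            ELSE enriched_at
                          END
WHERE id = $1`

// isTerminal mirrors the CASE expression above for use in Go code.
func isTerminal(status string) bool {
	return status == "done" || status == "failed"
}

func main() {
	for _, s := range []string{"done", "failed", "skipped"} {
		fmt.Println(s, isTerminal(s))
	}
	_ = setEnrichmentSQL // would be passed to a database driver's Exec call
}
```

Incrementing in SQL rather than in Go avoids a read-modify-write race if two
workers ever touch the same row.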
No UI changes and no worker binary yet. NoopLLMEnricher is in place so
MR 3 can wire up the worker without changing the interface shape.