108 Commits

Author SHA1 Message Date
5ad8126b81 fix(web): use proper umlauts in remaining server-side error messages
All checks were successful
ci/someci/push/web Pipeline was successful
Sweep of server action error strings — eight ASCII fallbacks replaced with
ä/ö/ü/ß across three +page.server.ts files: Pruefung -> Prüfung,
bestaetige -> bestätige, waehle -> wähle, fuelle -> fülle, Loeschen ->
Löschen, Statusaenderung -> Statusänderung, Ungueltiger -> Ungültiger.

Discovery agent_status enum literals ('bestaetigt', 'unklar', etc.) are
intentionally left as ASCII — they must match the LLM schema constants on
the backend.
2026-04-28 23:07:15 +02:00
8d8d96c231 fix(web): use proper umlauts in feedback dialog text
All checks were successful
ci/someci/push/web Pipeline was successful
Replace ASCII fallbacks (vollstaendig, uebermittelt, geprueft, Schliessen,
fuer, Rueckfragen, Datenschutzerklaerung) with proper German characters.
The ASCII-only convention applies to planning docs, not user-facing UI.
2026-04-28 22:43:10 +02:00
709fb6663a fix(web): center feedback dialog (Tailwind 4 preflight resets dialog margin)
All checks were successful
ci/someci/push/web Pipeline was successful
Native <dialog>.showModal() relies on the user-agent's margin: auto to
center the modal, but Tailwind 4's preflight resets margin to 0 on every
element, leaving the dialog pinned to the top-left edge of the viewport.

Add m-auto to the dialog class to restore the intended centering. Only one
dialog in the app, so a scoped class fix is sufficient — no global override
needed.
2026-04-28 22:35:44 +02:00
75a626b127 chore: switch CI to monolithic chart, delete old per-service charts
Some checks failed
ci/someci/push/backend Pipeline failed
ci/someci/push/web Pipeline failed
CI deploy steps now target helm/marktvogt with --reset-then-reuse-values,
preserving the other service's image tag across pipeline runs. Each pipeline
sets only its own X.image.tag.

App-level secrets (smtp/turnstile/discovery/ai/JWT/oauth) moved out of CI's
--set chain in the previous phase — now pre-created via
scripts/k8s-secrets-sync.sh from .env.helm. The chart's conditional secret
templates remain for backward-compat with the live release's stored values
but will be removed in a follow-up once those values are cleared.

Old per-service chart directories deleted; only the monolithic
helm/marktvogt/ remains.

MIGRATION.md updated with the actual procedure that worked, including the
several pitfalls hit during the live tenant-2 migration on 2026-04-28
(helm uninstall trap, SSA field-manager swap for CRDs, kyverno hostname
allowlist for new subdomains).
2026-04-28 16:33:53 +02:00
5b34252132 feat(web): add explicit /healthz endpoint for liveness/readiness probes
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 15:13:48 +02:00
921f329dab chore(web): switch runtime to bun (drop-in node replacement, ~50MB lighter)
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 14:51:17 +02:00
5f96daf7f3 feat(market): admin edit link and public feedback form 2026-04-28 13:43:22 +02:00
3d62ba9526 chore(web): integrate @typescript/native-preview (tsgo) for type-checking
Adds the native (Go) TypeScript compiler as a devDep and routes
svelte-check through it via --tsgo. Local pnpm run check goes from
~5s to ~3s on this codebase; the pre-commit hook inherits the speedup
automatically.

The linux-x64 prebuilt is a statically linked Go binary (~25MB), so the
musl-based alpine builder in web/Dockerfile installs it cleanly (no glibc
dependency), even though it never invokes svelte-check during the image
build.
2026-04-28 12:56:41 +02:00
ba4dce1f76 fix(ai): per-model cost calc + thinking toggle and token tracking
estimateCost ignored the model name and billed every Gemini call at
hardcoded flash-lite rates ($0.10 / $0.40 per 1M), under-counting Pro
calls by ~12-25x. Switch to priceFor(model) and prefer resp.ModelVersion
so aliases like gemini-pro-latest resolve to their concrete family.

Capture ThoughtsTokenCount as a separate ThinkingTokens column on
ai_usage (migration 000030) and bill it at the output rate.

Add a global thinking on/off toggle that mirrors the grounding pattern:
provider holds an in-memory cache (read at startup from settings.Store),
handler keeps it in sync, Chat() applies ThinkingConfig.ThinkingBudget=0
only when disabled. Default true preserves SDK behavior. Grounding+
thinking get/set helpers folded into shared getBool/setBool to keep
goconst happy.

Web admin settings: new "Modell-Reasoning" toggle card; usage panel sums
include thinking tokens. Types are optional with `?? 0` defaults so a
brief web-before-backend rollout window cannot render NaN.
2026-04-28 12:56:04 +02:00
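
The per-model lookup this commit describes can be sketched as follows —
an illustrative TypeScript sketch, where only the flash-lite rates
($0.10 / $0.40 per 1M) come from the commit text; the other model names,
prices, and function shapes are assumptions, and the real priceFor lives
in the Go backend:

```typescript
// Hypothetical price table; all rates except flash-lite are illustrative.
type Price = { inputPerM: number; outputPerM: number };

const priceTable: Record<string, Price> = {
  "gemini-2.5-flash-lite": { inputPerM: 0.10, outputPerM: 0.40 },
  "gemini-2.5-flash": { inputPerM: 0.30, outputPerM: 2.50 },
  "gemini-2.5-pro": { inputPerM: 1.25, outputPerM: 10.00 },
};

function priceFor(model: string): Price | undefined {
  // Longest-prefix match, so a concrete version such as
  // "gemini-2.5-flash-lite-preview" resolves past the shorter
  // "gemini-2.5-flash" entry to its actual family.
  let best: { len: number; price: Price } | undefined;
  for (const [prefix, price] of Object.entries(priceTable)) {
    if (model.startsWith(prefix) && (!best || prefix.length > best.len)) {
      best = { len: prefix.length, price };
    }
  }
  return best?.price;
}

function estimateCost(
  model: string,
  inTok: number,
  outTok: number,
  thinkTok: number,
): number {
  const p = priceFor(model);
  if (!p) return 0;
  // Thinking tokens are billed at the output rate, per the commit.
  return (inTok * p.inputPerM + (outTok + thinkTok) * p.outputPerM) / 1_000_000;
}
```

The longest-prefix rule is what prevents the old bug class: an unknown
or aliased model either resolves to its family or returns undefined
instead of silently billing at flash-lite rates.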
c6cdc11693 feat(auth): D5 cleanup + W3 web refresh UX
D5 — backend cleanup:
- Migration 000029 drops legacy token_hash column from sessions
- JWT_SECRET renamed to APP_SECRET (fallback + deprecation warning)

W3 — web session UX:
- AuthData type: session_token→refresh_token, remove expires_in
- cookies.ts: refresh_token cookie, non-HttpOnly access_expires_at
- client.server.ts: sends X-Refresh-Token header (not JSON body)
- hooks.server.ts: simplified two-path SSR refresh logic
- refresh.ts: single-flight client-side refresh
- client.ts: proactive refresh + 401 retry on non-auth paths
- /api/auth/refresh: SvelteKit proxy for HttpOnly cookie refresh
- OAuth callback, Datenschutz page updated to new cookie names
2026-04-26 13:25:48 +02:00
24dc46eeb8 fix(merge-plan): snapshot proposal prop to avoid structuredClone proxy throw
structuredClone on a Svelte 5 reactive Proxy throws DataCloneError during
component init, causing MergeProposalPanel to silently fail to mount.
Replace with $state.snapshot, which is the documented way to deep-copy a
reactive prop into local editable state.
2026-04-26 00:28:21 +02:00
131d8c8ff0 fix(merge-plan): extend poll timeout to 270s + guard undefined proposal
Frontend budget was 180s — equal to the backend goroutine cap — so a race
determined which side timed out first. Bumped to 270s to guarantee the frontend
outlasts the backend's 3-minute window.

Added explicit null guard on result.proposal: if the LLM ever returns a
done-status without a proposal body the UI now surfaces a clear error instead
of silently assigning undefined (which kept the panel hidden with no feedback).

Also guards field_merges ?? {} in MergeProposalPanel to avoid Object.keys(null)
if the model returns a null map.
2026-04-26 00:11:31 +02:00
643ee77600 feat(merge-plan): convert to async polling to bypass nginx 60s timeout
POST /admin/markets/:id/merge-plan now returns 202 + job_id immediately
and runs the Gemini advisor in a detached goroutine. Frontend polls
GET .../merge-plan/:job_id until done, with backoff up to 3 minutes.

Adds in-memory job registry (keyed map + RWMutex, 5-min TTL sweep) and
handler tests covering the full pending→done and error paths.
2026-04-25 23:37:03 +02:00
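
The 202-then-poll flow can be sketched as below — an illustrative
client-side loop under stated assumptions: checkJob stands in for
GET .../merge-plan/:job_id, and the delay values are made up except for
the 270s budget mentioned in the later timeout fix:

```typescript
type JobResult<T> =
  | { status: "pending" }
  | { status: "done"; result: T }
  | { status: "error"; message: string };

async function pollUntilDone<T>(
  checkJob: () => Promise<JobResult<T>>,
  {
    initialDelayMs = 500,
    maxDelayMs = 5000,
    budgetMs = 270_000,
    sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms)),
  } = {},
): Promise<T> {
  const deadline = Date.now() + budgetMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const job = await checkJob();
    if (job.status === "done") return job.result;
    if (job.status === "error") throw new Error(job.message);
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
  throw new Error("merge-plan polling timed out");
}
```

Injecting sleep keeps the loop testable without real timers.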
caaad8adf4 fix(web): SSR calls use cluster-internal backend URL to bypass nginx timeout
All serverFetch calls were going to https://api.marktvogt.de (public
gateway), creating a second nginx hop for every SSR operation. Slow LLM
calls (merge-plan, research-plan) hit the 60s proxy_read_timeout.

- Add PRIVATE_API_BASE_URL=http://marktvogt-backend to web Helm config
- serverFetch now builds SERVER_API_BASE from PRIVATE_API_BASE_URL at
  runtime (falls back to PUBLIC_API_BASE_URL when not set)
- apiFetch accepts optional baseURL param; client-side calls unchanged
2026-04-25 22:25:31 +02:00
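
The base-URL selection reduces to a simple fallback — sketched here with
the env names from the commit, though reading a plain record is a
simplification of SvelteKit's $env modules:

```typescript
// Prefer the cluster-internal URL when set; otherwise fall back to the
// public gateway URL so local/dev setups keep working.
function serverApiBase(env: Record<string, string | undefined>): string {
  return env.PRIVATE_API_BASE_URL ?? env.PUBLIC_API_BASE_URL ?? "";
}
```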
4916b0d6af fix(infra): increase gateway timeout for admin+market routes to 120s
Merge-plan and research-plan both call Gemini which can take >60s.
The default gateway timeout was killing connections with 504.

- Web HTTPRoute: add /admin/ rule with 120s request+backendRequest timeout
- Backend HTTPRoute: add /api/v1/admin/markets/ rule with 120s timeout
- MergePlan handler: add 110s context deadline for graceful degradation
  before the gateway cuts the upstream connection
2026-04-25 22:03:20 +02:00
3d922e50bf perf(admin): stream duplicate check async — don't block page render
LLM tiebreaker can take several seconds; return the duplicates fetch
as an unawaited Promise so the page renders immediately with market
data. Template uses {#await} to render the panel when it resolves.
2026-04-25 21:08:56 +02:00
11377b8463 fix(research): return full body from plan proxy, not res.data
The Plan handler returns {plan, research_result} directly without a
data wrapper. apiFetch casts the body to ApiResponse<T>, so res.data
was undefined, json(undefined) produced an empty response, and the
client either crashed (JSON.parse) or silently got a null plan.
2026-04-25 20:53:57 +02:00
73c30d2f5f feat(admin/dedup): merge UI + enrich enum fix + robust JSON parse
H1: Drop empty string from enricher_schema.json category enum —
Gemini rejects enum[7]: cannot be empty (Error 400). Remove category
from required so the model can omit it when no category fits.

H2: Research-plan/apply client reads response as text before
JSON.parse; empty or HTML error bodies now surface the actual HTTP
status instead of crashing with "unexpected end of data".

I: Dedup UI for approved markets:
- DuplicatesPanel: LLM verdict pills (same/not-same, confidence),
  llm_reason, per-candidate Merge-planen button
- MergeProposalPanel: summary, confidence, flags, per-field
  decisions with editable source radio (a/b/combined), current
  value context, confirm() before destructive apply
- Two SvelteKit proxy routes: merge-plan/ and merge-into/[targetId]/
- [id]/+page.svelte: wired with full state; navigates to survivor
  after successful merge
- [id]/+page.server.ts: load duplicates for all non-merged editions
  (was gated to status=rumored only)
- types.ts: DuplicateMarket gains llm_same/llm_confidence/llm_reason;
  add MarketMergeProposal + MergeFieldDecision; add merged to
  EditionStatus
2026-04-25 19:34:49 +02:00
9b308639fd feat(admin/ui): three-section merge plan UI + plan/apply proxy endpoints (D6) 2026-04-25 18:45:40 +02:00
eb169689d5 fix(admin): submit save form after applying research suggestions
applyResearch() populated form fields but never triggered a save.
After applying all suggestions and appending the KI-Recherche note,
call requestSubmit() on form[action="?/save"] so the data is persisted.
2026-04-25 17:41:19 +02:00
33539b703a feat(research): add logo_url field + require per-field hints
- Add logo_url as a distinct DB column (migration 000023) and expose it
  through model, DTOs, repository, service, and all frontend types
- Update KI-Recherche prompt and both JSON schemas: logo_url field rule,
  clarified bild_url rule, hinweis now mandatory non-null (maxLength 200)
- imageURLReachable now also verifies Content-Type: image/* for both
  bild_url and logo_url before surfacing suggestions
- MarketCard: image-first with cover style, logo fallback with contain
  style, city-initial placeholder as last resort
- /markt/[slug]: hero section follows same image→logo→nothing precedence;
  OG/JSON-LD updated accordingly
- Map view on search page: pagination hidden, map height increased to 600px
- Fix einstellungen Svelte warning: wrap showKeyInput init in untrack()
2026-04-25 16:58:47 +02:00
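
The image → logo → city-initial precedence can be sketched as a small
helper — the names and shapes here are illustrative, not the app's
actual component code:

```typescript
type MarketMedia = { image_url?: string | null; logo_url?: string | null; city: string };
type CardMedia =
  | { kind: "image" | "logo"; url: string }
  | { kind: "initial"; letter: string };

function cardMedia(m: MarketMedia): CardMedia {
  if (m.image_url) return { kind: "image", url: m.image_url }; // cover style
  if (m.logo_url) return { kind: "logo", url: m.logo_url };    // contain style
  // Last resort: city-initial placeholder keeps card heights uniform.
  return { kind: "initial", letter: (m.city[0] ?? "?").toUpperCase() };
}
```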
bde41be767 fix(research): surface errors to UI + proceed without pages when all fetches fail 2026-04-25 13:59:05 +02:00
0bff6771ce feat(admin): filter markets by missing fields + row indicators
- Add missing= query param (description/image/website/location) to
  AdminSearchParams; both AdminSearch and AdminSearchGrouped apply the
  SQL condition
- Add has_description/has_image/has_website/has_location booleans to
  AdminMarketSummary, populated in ToAdminSummary from existing Market fields
- Dropdown filter in the admin market list routes to the missing param
- Coloured dot indicators per row (amber=image, orange=desc, red=website,
  purple=location) with title tooltips
2026-04-25 13:46:14 +02:00
e166ad5e48 content(impressum): add data accuracy disclaimer section 2026-04-25 13:34:32 +02:00
9d9520bcad feat(ui): image display improvements across admin and public views
- MarketCard: object-fit contain with padding instead of cropped 16:9;
  city-initial placeholder so all cards are uniform height in the grid;
  imgFailed state falls back to placeholder on broken URLs
- Admin market detail: show image thumbnail + Bild-URL link in Details
- Admin edit form: live image preview below Bild-URL input
- Public detail page: contain + max-height 250px instead of cover crop
- onerror handlers hide broken images on public card and detail pages
- Time inputs changed to text + pattern for reliable 24h display
2026-04-25 12:46:13 +02:00
6b3c673cd0 feat(ai): tighter Gemini model filter with per-model pricing
- Replace ListModelNames with ListModels returning ModelInfo structs
- Name-based filter: require gemini- prefix, drop tuned models, block
  EOL 2.0 family, TTS/image/live/audio/robotics/embedding, Gemma/Imagen/Veo
- Static pricing table with longest-prefix match; stable vs preview flag
- Settings handler validates SetModel against allowed list (degrade-open)
- Frontend dropdown shows input/output price per 1M tokens + Preview tag
- Table-driven unit tests for filter, sort order and pricing lookup
2026-04-25 12:42:53 +02:00
da9754cb2f fix(research): move all form fields to reactive state, add setField dispatcher
All Input fields used market?.xxx as initial value, so a Svelte re-render
triggered by researchResult=null would reset them back to the server-loaded
value, wiping every applied research suggestion.

Replace all research-applicable fields with $state variables and route all
apply calls through setField() instead of querySelector+dispatch. Country
name->code mapping added for LLM-returned values like "Deutschland" -> "DE".
writeReverseResult also updated to use setField.
2026-04-25 11:25:21 +02:00
c5c84ff297 fix(research): apply description via reactive state, add name correction
Description wasn't being applied because querySelector-then-assign runs before
Svelte's reactive flush of researchResult=null, which resets the textarea to
its initial market.description value. Fix: reactive state + exported setter
(same pattern as setHours/setAdmission).

Also add markt_name to felder in both schemas and the prompt so the LLM can
suggest a name correction. Name suggestions are gated to extraktion=direkt
(high confidence only) and guarded on the frontend with setName().
2026-04-25 11:15:16 +02:00
282d59e6c1 fix(research): add beschreibung to prompt, auto-note on apply
The beschreibung field was schema-required but absent from ## Felder,
causing the LLM to always return null. Add explicit extraction instruction.

Also reword the opening line, which said "Keine Beschreibungstexte" ("no
description texts") — contradicting the field we actually want.

On apply, append "KI-Recherche: DD.MM.YYYY HH:MM" to admin_notes so
there's a permanent audit trail of when research was run.
2026-04-25 11:05:27 +02:00
dd9a5ae9cc fix(research): convert LLM schema shapes to form-compatible types on apply
Researcher emits {datum_von,von,bis} for opening hours and [{name,betrag,waehrung}]
for admission info — both incompatible with the form's {day,open,close} and
AdmissionInfo shapes. Normalize on apply; extend normalizeDayName to handle
ISO YYYY-MM-DD dates the LLM produces. ResearchPanel renders both LLM and
form-native formats with dedicated table/list views.
2026-04-25 11:01:18 +02:00
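
The normalization this commit describes can be sketched as follows — the
LLM-side field names ({datum_von, von, bis}) and the form shape
({day, open, close}) come from the commit, while the weekday handling is
a simplified assumption (the real normalizeDayName covers more cases):

```typescript
type LlmHours = { datum_von: string; von: string; bis: string };
type FormHours = { day: string; open: string; close: string };

const dayNames = [
  "Sonntag", "Montag", "Dienstag", "Mittwoch",
  "Donnerstag", "Freitag", "Samstag",
];

function normalizeDayName(value: string): string {
  // The LLM sometimes emits ISO YYYY-MM-DD dates; map them to a weekday.
  if (/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    return dayNames[new Date(value + "T00:00:00Z").getUTCDay()];
  }
  return value; // already a day name
}

function toFormHours(llm: LlmHours[]): FormHours[] {
  return llm.map((h) => ({
    day: normalizeDayName(h.datum_von),
    open: h.von,
    close: h.bis,
  }));
}
```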
016d7a0792 fix(settings): handle missing migrations gracefully, guard AI status page
factory.go: treat DB errors from GetGeminiAPIKey as "no key" and fall
back to the GEMINI_API_KEY env var instead of propagating the error
(which caused a panic/crash when migrations haven't been run yet).

gemini.go: ListModelNames returns a ProviderError when the client is
nil so that connected=false is reported correctly in GetAI instead of
the previous nil,nil→connected=true false positive.

+page.server.ts: catch fetch errors so a backend outage doesn't 500 the
whole page. +page.svelte: guard all data.ai access with {#if data.ai}
so the page renders an error banner instead of crashing on null access.
2026-04-25 10:41:25 +02:00
3ddfd87408 feat(ai): migrate to Google Gemini 2.5 Flash-Lite, drop Mistral/Ollama
Replace the Mistral + Ollama AI stack with a single Google Gemini provider
backed by google.golang.org/genai. API key moves from env/Helm to the DB
(AES-256-GCM, key derived from JWT_SECRET via HKDF) so it can be rotated
via the admin UI without a pod restart.

New:
- pkg/crypto/secretbox — AES-256-GCM encrypt/decrypt for secrets at rest
- pkg/ai/gemini — GeminiProvider with grounding, structured output, usage
  recording, and hot-reload (Reinitialize swaps client under mutex)
- pkg/ai/usage — UsageRecorder interface + UsageEvent struct
- domain/settings/store — DB-backed settings (model, grounding toggle, key)
- domain/settings/usage — UsageRepo implementing UsageRecorder; ai_usage table
- migrations 000021 (system_settings) + 000022 (ai_usage)
- settings API: GET /ai, POST /ai/key, POST /ai/model, POST /ai/grounding,
  GET /ai/usage
- admin UI: 4-card settings page — provider status, model selector, grounding
  toggle with quota, usage rollups + recent-calls table

Removed:
- pkg/ai/ollama, mistral_provider, ratelimiter (+ tests)
- Helm AI_API_KEY, AI_PROVIDER, AI_MODEL_COMPLEX, AI_AGENT_DISCOVERY,
  AI_RATE_LIMIT_RPS env vars

Call sites set Grounded+CallType: research (true/"research"), enrich Pass B
(true/"enrich_b"), similarity (false/"similarity"). Integration test updated
to use a stub ai.Provider instead of a fake Ollama HTTP server.
2026-04-25 09:54:49 +02:00
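
The secretbox scheme (AES-256-GCM with an HKDF-derived key) can be
illustrated in Node's crypto API — a sketch only: the real implementation
is Go's pkg/crypto/secretbox, and details like the HKDF info label and
the nonce||ciphertext||tag layout are assumptions here:

```typescript
import { hkdfSync, createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function deriveKey(appSecret: string): Buffer {
  // 32-byte key for AES-256; the "settings-encryption" label is made up.
  return Buffer.from(hkdfSync("sha256", appSecret, Buffer.alloc(0), "settings-encryption", 32));
}

function seal(key: Buffer, plaintext: string): Buffer {
  const nonce = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([nonce, ct, cipher.getAuthTag()]); // nonce || ct || tag
}

function open(key: Buffer, sealed: Buffer): string {
  const nonce = sealed.subarray(0, 12);
  const tag = sealed.subarray(sealed.length - 16);
  const ct = sealed.subarray(12, sealed.length - 16);
  const decipher = createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAuthTag(tag); // GCM authenticates; tampering throws in final()
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

Deriving the key from the app secret via HKDF (rather than using the
secret directly) is what lets the same secret safely back multiple
purposes.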
c4207865c8 feat(settings): Ollama connection status + runtime model selector
Add /admin/settings/ai endpoint (GET status + available models, POST
model switch). OllamaProvider gains SetModel/Model/ListModels with a
RWMutex so the active model can be swapped at runtime without restart.
New /admin/einstellungen page shows provider, connection badge, and a
model dropdown that calls the API on submit.
2026-04-25 08:29:38 +02:00
f13cd55393 feat(research): wire LLM output to ResearchResult, add beschreibung field
Transform raw LLM felder output into FieldSuggestion[] for the UI panel.
Skip suggestions identical to current market values. Add beschreibung to
both schemas, the Go struct, and the transformation mapping so description
is extracted during research. Fix field labels (Land, Startdatum, Enddatum)
in ResearchPanel.
2026-04-25 08:12:28 +02:00
c9a2f8622f feat(market): reverse geocoding — lat/lng to address
Complements the existing forward geocoder with Nominatim's /reverse
endpoint so the admin edit form can populate the address from
coordinates (useful when a crawl gave us lat/lng but no street,
e.g. after running crawl-enrich).

Backend:
- geocode.Reverse(ctx, lat, lng) hits Nominatim /reverse with
  addressdetails=1 and accept-language=de, reuses the 1 rps mutex
  already guarding forward calls. Falls through city → town →
  village → municipality → hamlet for small places. Returns nil
  when Nominatim has no match so callers can distinguish "no hit"
  from "all-empty address."
- New DTOs ReverseGeocodeRequest/Response.
- GeocodeHandler.ReverseGeocode wired at POST /reverse-geocode
  behind the same geocodeLimit middleware as /geocode.

Frontend:
- /api/reverse-geocode SvelteKit proxy mirrors /api/geocode.
- MarketForm gets a second button next to "Koordinaten aus Adresse
  ermitteln" — "Adresse aus Koordinaten ermitteln". Writes non-empty
  street/city/zip back into the form; empty result surfaces
  "Keine Adresse gefunden."
2026-04-24 15:00:23 +02:00
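
The locality fallback chain can be sketched as below — the address keys
mirror Nominatim's addressdetails=1 response, but this helper is
illustrative, not the app's Go code:

```typescript
type NominatimAddress = Partial<
  Record<"city" | "town" | "village" | "municipality" | "hamlet", string>
>;

function placeName(addr: NominatimAddress): string | undefined {
  // Fall through city → town → village → municipality → hamlet so small
  // places still yield a usable locality name.
  return addr.city ?? addr.town ?? addr.village ?? addr.municipality ?? addr.hamlet;
}
```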
38834c56a3 fix(discovery): stop auto-firing research on Accept
Accepting a row triggered a background POST to
/admin/markets/<edition>/research. The intent was to "warm up" the
edit page, but the result was discarded (fire-and-forget), the edit
page only renders research from its own form action, and the
backend's 5-minute-per-market cooldown still got set — so the
operator's first manual "Mit KI recherchieren" click hit "Bitte
warte 5 Minuten zwischen Recherche-Aufrufen" instead.

Removes the auto-fire. Research runs on user click. If we want
prefetched suggestions later, that needs server-side caching + a
load-time fetch, not fire-and-forget.
2026-04-24 14:56:12 +02:00
20055acd2e fix(discovery): correct redirect path after Accept
The accept action redirected to /admin/maerkte/<id>/edit, but the
route is /admin/maerkte/[id]/bearbeiten — every other admin link
uses the German segment. Reviewers hit a 404 after every Accept.
2026-04-24 13:59:25 +02:00
2c0154e4ce feat(web): discovery preview modal
Adds a Vorschau button to the detail drawer header that opens a
full-width modal showing an approximate public /markt/[slug] layout
for the candidate row. Lets reviewers sanity-check the user-facing
result before clicking Accept.

- DiscoveryPreview.svelte: renders title, date range, venue/PLZ/city
  location line, organizer, description, opening hours, website link
  and a Leaflet map pin (if lat/lng present). Banner calls out which
  fields (street, admission prices, title image) will come later from
  the organizer so the preview's gaps are not mistaken for bugs.
- DetailDrawer.svelte: adds previewOpen state, an eye-icon Vorschau
  button next to Accept/Reject, and an overlay at z-60 over the
  drawer. Backdrop click or ✕ closes the preview without closing the
  drawer.
2026-04-24 13:50:43 +02:00
0d2c9c0f7f fix(web): silence svelte 5 warnings, add missing enrichment proxy
- Wrap $state initializers that read props (MarketForm, ResearchPanel,
  maerkte +page) in untrack() so Svelte 5 stops warning about
  state_referenced_locally. Intent stays "take an initial snapshot of
  the prop" — the warning existed to make that intent explicit.
- Add enrichment/crawl-all-status/+server.ts proxy route; the admin
  discovery page was polling this path and getting 404s in a tight
  loop because the equivalent SvelteKit proxy only existed for the
  plain /crawl-status endpoint.
2026-04-24 13:38:03 +02:00
2fdd8e8222 feat(web): polish discovery admin page and drawer
Discovery drawer
- Wrap each section in a rounded card so boundaries are visible without
  parsing the uppercase headers.
- Header: N Quellen and enrichment_status become consistent pills,
  matching the existing konfidenz pill treatment.
- Enrichment: replace the inline "(llm)"/"(crawl)" trailing text with a
  color-coded badge on the label side (purple = llm, sky = crawl).
- Empty enrichment state now tells the operator how to trigger it.
- Audit timestamp uses a local-time helper so the displayed time
  matches the browser timezone (was UTC-as-local).
- Quellen list: prefix each URL with its hostname for scannability;
  long URLs truncated with full URL in the title attribute.

ContributionsPanel
- Amber border/background now only on conflict rows; every row
  previously got border-amber-100 unconditionally, which diluted the
  conflict signal. Rang 1 badge flipped to emerald so it reads as a
  positive "winner" marker, not a warning.

Discovery page
- Remove dead dateInputValue() function and the stale
  a11y_click_events_have_key_events suppression — both flagged by
  eslint after earlier refactors.
- Render crawl/enrich timestamps in the browser's local timezone via a
  new fmtLocalStamp helper; the previous .slice(0,16).replace('T',' ')
  treated the ISO UTC string as if it were local time.
2026-04-24 13:37:39 +02:00
ef6e1def3d feat(discovery): keyboard shortcuts for queue review
Ship 2 MR 8. Operator-productivity layer on top of the detail drawer:
j/k to walk rows, Enter to open, a/r to accept-reject the selection,
e/s to jump into the drawer with AI enrich / Similar already visible,
? for a help modal listing everything. Escape closes the drawer (or
the help modal if it's open).

Implementation
- selectedId $state drives a subtle indigo ring on the highlighted
  row. Follows drawerId when the drawer opens so Esc → j leaves you
  on the same row. Auto-resets to queue[0] if the selected row
  scrolls off the page (pagination / refresh).
- Global <svelte:window onkeydown> listener. isTypingTarget() bails
  out when focus is inside an input/textarea/select/contenteditable
  so typing in the drawer's edit form doesn't trigger shortcuts.
  Cmd/Ctrl/Alt combos also skipped so browser shortcuts stay intact.
- selectRelative() updates selectedId + scrolls the row into view
  (block: 'nearest') so keyboard-driven scanning through a long
  queue keeps the highlight visible.
- submitRowAction() builds + submits a hidden <form> for a/r so the
  SvelteKit action pipeline (invalidations, form result propagation)
  runs the same way a button click would.

Decisions baked in
- 'e' (AI enrich) and 's' (Similar) open the drawer rather than
  firing the LLM call directly. LLM calls cost money; keeping the
  UI explicit avoids hidden side effects from a misclick.
- Persistent '?' button bottom-right for discoverability — operators
  shouldn't have to read docs to find the help.
- Modal uses click-outside-to-dismiss + Esc + ✕ button, all three.

No backend changes. Frontend-only.
2026-04-24 12:40:12 +02:00
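
The shortcut guard described above can be sketched with minimal DOM
typing so it runs outside a browser — a simplification: the real guard
also walks contenteditable ancestors via closest():

```typescript
type MinimalTarget = { tagName?: string; isContentEditable?: boolean } | null;

function isTypingTarget(target: MinimalTarget): boolean {
  if (!target || !target.tagName) return false;
  if (target.isContentEditable) return true;
  return ["INPUT", "TEXTAREA", "SELECT"].includes(target.tagName);
}

function shouldHandleShortcut(e: {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
  target: MinimalTarget;
}): boolean {
  // Skip modifier combos so browser shortcuts (Cmd+R etc.) stay intact.
  if (e.metaKey || e.ctrlKey || e.altKey) return false;
  return !isTypingTarget(e.target);
}
```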
5476578373 feat(discovery): per-row detail drawer replaces inline panels
Ship 2 MR 6. Consolidates every market-specific action that used to
expand into the queue table into a single side drawer. Queue rows
keep Accept/Reject for fast-path review; clicking anywhere else on a
row opens the drawer with the full context.

State via URL param ?drawer=<id>. F5 preserves the open row; links
like /admin/discovery?drawer=<uuid>&sort=konfidenz are shareable and
compose with existing pagination/sort state.

DetailDrawer.svelte (new) sections:
- Header: name, konfidenz, source count, Accept/Reject, close (✕)
- Identity: editable form (name, stadt, bundesland, start/end, website)
- Enrichment: full payload with per-field provenance tags + AI enrich
  button; "Noch keine Enrichment-Daten" empty state
- Quellen: URL list (link-out)
- Quellen-Vergleich: per-source contribution diff (reuses
  ContributionsPanel) — only rendered when >=2 sources
- Similar: candidates loaded lazily on drawer open; AI? tiebreak
  button per candidate shows ✓ same / ✗ diff chips with LLM reason
- Audit: discovered_at, agent_status, hinweis

+page.svelte: removed the three inline <tr> panels (Similar,
Quellen-Vergleich, expanded) and their associated state (expandedId,
similarOpenId, quellenVergleichOpenId, similarLoading, similarEntries,
similarVerdicts, similarClassifying, toggleSimilar, classifySimilar,
toggleQuellenVergleich). Row actions collapsed from 5 buttons
(Accept/Reject/Similar/AI/Quellen-Vergleich) to 2 (Accept/Reject).
The chevron glyph stays as a visual affordance but is inert — the
whole row is clickable. Buttons/forms/links inside the row stop
propagation via a closest()-based guard so fast-path Accept/Reject
don't accidentally open the drawer.

No backend changes; the drawer consumes existing queue data +
existing endpoints (similar, similar/classify, enrich).

Follow-ups: MR 8 adds keyboard shortcuts that naturally compose with
the drawer (j/k navigation, Enter opens, Esc closes).
2026-04-24 12:37:38 +02:00
525a20b79c Merge branch 'feat/discovery-auto-merge-crawl' — Ship 2 MRs 2–7 (enrichment foundation, crawl-enrich, LLM enrich, AI similarity, auto-merge)
Brings the full Ship 2 feature stack (except the eval harness and detail
drawer) into main. Conflicts resolved:

- repository.go: kept MR 1's sort params + queueOrderByClause builder on
  ListQueue, AND MR 7's FindPendingMatch + MergePendingSources (MR 7
  removed the old QueueHasPending). ListQueue SELECT keeps the enrichment
  columns MR 2 added.
- mock_repo_test.go: kept both MR 1's listQueueCalls capture and the
  MR 2-4 enrichment/similarity hooks.
- service_test.go: ListPendingQueuePaged uses MR 1's sort-param signature;
  NewService uses the MR 2-7 seven-arg form.
- handler_test.go: TestListQueueSortParamWhitelist's NewService call
  bumped from 4 args to 7 (nil geocoder, nil llm enricher, nil sim
  classifier).

Features landing on main:
- MR 2: enrichment schema (migration 000019), jsonb payload, enrich
  package with Merge/CacheKey/NoopLLMEnricher.
- MR 3: manual crawl-enrich-all button + async 202 status endpoint.
- MR 3b: per-row LLM enrich via scrape-then-prompt (pkg/scrape +
  MistralLLMEnricher).
- MR 4: AI similarity tiebreak (migration 000020), MistralSimilarityClassifier,
  per-candidate AI? button in the Similar panel.
- MR 7: cross-crawl auto-merge for new sources on pending queue rows
  (FindPendingMatch + MergePendingSources, AutoMerged counter).
2026-04-24 12:13:30 +02:00
c06788a63d feat(discovery): auto-merge queue rows across crawl runs
Ship 2 MR 7. Replaces the "drop on duplicate" branch of the crawl
loop with a cross-run auto-merge: when a new crawl brings a source
that a pending queue row doesn't yet carry, the new source's data
merges into the existing row instead of spawning a second entry.
Operator review burden stays bounded to one row per market even as
coverage grows across sources.

Konfidenz upgrades come for free: a row that starts with one source
at konfidenz=mittel flips to hoch the moment a second independent
source confirms the same (name, city, start_date) triple.

Repo changes
- QueueHasPending (bool) replaced by FindPendingMatch returning
  *DiscoveredMarket. Same exact-tuple lookup; now callers see the
  full match so they can merge.
- MergePendingSources appends new sources/quellen/contributions onto
  a pending row using set-union semantics. source_contributions
  dedupe by SourceName so repeat crawls don't stack duplicate entries.
  Konfidenz and hinweis are overwritten with caller-computed values.
- Idempotent: send the same delta twice, nothing changes the second
  time.

Service.Crawl flow
- On match + incoming source already on the row -> DedupedQueue.
  Same semantic as before, just more tightly scoped (same source
  re-emits an event; previously any match counted as dedup).
- On match + incoming source not yet on the row -> auto-merge path:
  compute the source/quellen/contribution delta, call
  MergePendingSources, count in summary.AutoMerged.
- The crawlerKonfidenz helper is now a thin wrapper over a shared
  konfidenzForSources(sources []string), reused by the merge path.
  Source-name constants extracted to un-hardcode the switch cases
  and the test references.

Summary + UI
- CrawlSummary gains AutoMerged int. Logged alongside the other
  counters.
- +page.svelte crawl-result grid gets an "Auto-merged" tile.

Tests
- Same-source redundant pickup -> DedupedQueue=1, no MergePendingSources
  call, no insert.
- New-source auto-merge -> AutoMerged=1, MergePendingSources called with
  exact delta (addSources=[new only], addQuellen=[new only], addContribs
  labelled with new source_name), konfidenz upgraded to hoch.
- Existing TestServiceCrawlDedupQueue renamed to
  TestServiceCrawlDedupQueue_SameSourceRedundant reflecting the
  tightened semantic.

No migration — existing text[] and jsonb columns support the union
operations via SQL.
2026-04-24 12:01:01 +02:00
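
The set-union merge semantics can be sketched as follows — field names
follow the commit (sources/quellen unioned, contributions deduped by
source name), but the shapes are simplified assumptions and the real
MergePendingSources is Go + SQL:

```typescript
type Contribution = { sourceName: string; fields: Record<string, unknown> };
type PendingRow = { sources: string[]; quellen: string[]; contributions: Contribution[] };

function mergePendingSources(row: PendingRow, delta: PendingRow): PendingRow {
  // Contributions dedupe by source name so repeat crawls don't stack
  // duplicate entries; later deltas win for the same source.
  const byName = new Map<string, Contribution>();
  for (const c of [...row.contributions, ...delta.contributions]) {
    byName.set(c.sourceName, c);
  }
  return {
    sources: [...new Set([...row.sources, ...delta.sources])],
    quellen: [...new Set([...row.quellen, ...delta.quellen])],
    contributions: [...byName.values()],
  };
}
```

Set-union is what makes the operation idempotent: applying the same
delta twice changes nothing the second time.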
e0b73acfd6 feat(discovery): AI tiebreak for ambiguous similarity matches
Ship 2 MR 4. Adds per-pair AI-backed classification for operator use
inside the existing Similar panel: an "AI?" button next to each
candidate asks Mistral whether the two queue rows refer to the same
underlying market. Result shown inline as a green "✓ same N%" or
grey "✗ diff N%" chip with the LLM's reason on hover.

No scraping — the classifier works from (name, city, year) alone,
which is enough for the common cases (same venue on two calendars,
typos, cross-year recurrence). Call is short (usually <3s) so the
handler is synchronous, 15s deadline.

Caching
- Migration 000020 adds similarity_ai_cache keyed on a content hash
  over (normalized_name|stadt|year) for both rows, sorted for
  symmetry. Survives queue row accept/reject because the hash is
  about markt-content, not queue-row lifecycle.
- enrich.SimilarityPairKey computes the key. Classify(a,b) and
Classify(b,a) hit the same entry. Stadt casing drift doesn't
invalidate the cache entry.
- Repo methods GetSimilarityCache / SetSimilarityCache + corresponding
  mock hooks. DefaultSimilarityCacheTTL=30d.
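The symmetric, casing-tolerant key can be sketched like this; the separator, field order, and hash choice are assumptions, not the actual SimilarityPairKey implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// pairKey sketches enrich.SimilarityPairKey: one content half per row
// over (normalized_name|stadt|year), the two halves sorted so (a,b)
// and (b,a) yield the same key, stadt lowercased so casing drift does
// not produce a different cache entry.
func pairKey(nameA, stadtA string, yearA int, nameB, stadtB string, yearB int) string {
	halves := []string{
		fmt.Sprintf("%s|%s|%d", nameA, strings.ToLower(stadtA), yearA),
		fmt.Sprintf("%s|%s|%d", nameB, strings.ToLower(stadtB), yearB),
	}
	sort.Strings(halves) // symmetry: argument order doesn't matter
	sum := sha256.Sum256([]byte(strings.Join(halves, "||")))
	return hex.EncodeToString(sum[:])
}
```

Because the key is built from markt content only, it survives queue-row accept/reject, as the commit notes.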

Mistral integration
- enrich.MistralSimilarityClassifier reuses the same aiPass2
  interface as the enricher. English system prompt asks for
  JSON-only output with {same_market, confidence 0..1, reason}.
  Confidence clamped to [0,1] because models occasionally return
  1.2 or -0.1. Reason is short German justification.
- NoopSimilarityClassifier returns an error — callers must check
  ai.Enabled() before deciding which binding to pass.
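The verdict parsing plus clamping can be sketched as follows; the struct shape mirrors the JSON fields named above, but the exact types are an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Verdict mirrors the JSON-only output the system prompt asks for.
type Verdict struct {
	SameMarket bool    `json:"same_market"`
	Confidence float64 `json:"confidence"`
	Reason     string  `json:"reason"`
}

// parseVerdict decodes the model output and clamps confidence to
// [0,1], since models occasionally return values like 1.2 or -0.1.
func parseVerdict(raw []byte) (Verdict, error) {
	var v Verdict
	if err := json.Unmarshal(raw, &v); err != nil {
		return Verdict{}, err
	}
	if v.Confidence < 0 {
		v.Confidence = 0
	} else if v.Confidence > 1 {
		v.Confidence = 1
	}
	return v, nil
}

func main() {
	v, _ := parseVerdict([]byte(`{"same_market":true,"confidence":1.2,"reason":"gleicher Ort"}`))
	fmt.Println(v.SameMarket, v.Confidence) // true 1
}
```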

Service.ClassifySimilarPair loads both rows, computes pair key,
cache-first, calls classifier on miss, writes cache, returns
verdict. Rejects self-comparison (pair-key collapses). Handler
POST /admin/discovery/queue/:aid/similar/:bid/classify.

UI: new AI? column inside the Similar panel. Per-candidate pending
state via Set<string>, disabled button while in-flight, inline
verdict chip after response. Tooltip shows the LLM's reason.

Tests: pair-key symmetry + differentiation + casing tolerance;
Mistral classifier happy path, clamping edge cases, error
propagation, bad-JSON handling, Noop rejection. Service tests:
happy path writes cache, cache-hit skips LLM, self-comparison
rejected, classifier errors don't poison the cache.

NewService signature grows by one param
(sim enrich.SimilarityClassifier). All 14 existing callers
(routes.go + tests) updated; tests pass nil.
2026-04-24 11:04:15 +02:00
ce32f76731 feat(discovery): per-row LLM enrichment via scrape-then-prompt
Completes the manual two-pass enrichment flow: the crawl-enrich-all
button (MR 3) fills deterministic fields across the queue; this MR
adds a per-row "AI" button that scrapes the row's quellen URLs and
asks Mistral to fill category, opening_hours, description.

Flow per click:
  1. Load row, compute CacheKey(name_normalized, stadt, year).
  2. Cache hit -> skip LLM, merge cached payload onto current
     crawl-enrich base, persist, return.
  3. Miss -> scrape up to 5 quellen URLs via pkg/scrape (goquery
     text extraction, 4000-char truncation), concatenate into labeled
     blocks, call ai.Client.Pass2 with JSON response format.
  4. Parse response into Enrichment{category, opening_hours,
     description}, stamp provenance=llm + model + token counts.
  5. Cache the raw LLM payload (not the merged one) under the tuple
     key with DefaultCacheTTL=30d, so later re-crawls can layer new
     crawl-enrich bases on the same cached answer.
  6. Merge(crawl, llm) -- crawl fields survive. Persist via
     SetEnrichment(status=done). Return merged to the operator.
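The merge rule in step 6 can be sketched as a per-field "crawl wins, LLM fills gaps" helper. The struct and field set are a sketch, not the real Enrichment type:

```go
package main

import "fmt"

// Enrichment holds the fields this flow asks the LLM to fill.
type Enrichment struct {
	Category     string
	OpeningHours string
	Description  string
}

// merge layers the LLM payload under the crawl-enrich base: a field
// already filled by the deterministic crawl pass survives, and the
// LLM value only fills gaps. Illustrative, not the real Merge.
func merge(crawl, llm Enrichment) Enrichment {
	pick := func(crawlVal, llmVal string) string {
		if crawlVal != "" {
			return crawlVal
		}
		return llmVal
	}
	return Enrichment{
		Category:     pick(crawl.Category, llm.Category),
		OpeningHours: pick(crawl.OpeningHours, llm.OpeningHours),
		Description:  pick(crawl.Description, llm.Description),
	}
}

func main() {
	crawl := Enrichment{Category: "wochenmarkt"}
	llm := Enrichment{Category: "flohmarkt", Description: "Beschreibung vom LLM"}
	fmt.Printf("%+v\n", merge(crawl, llm))
}
```

Caching the raw LLM payload (step 5) rather than this merged result is what lets a later re-crawl layer a fresh crawl base onto the same cached answer.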

ErrNoScrapedContent fails fast when zero URLs return usable text;
LLMs without grounding hallucinate, and a 400-style operator error is
better than inventing details. Individual scrape failures don't halt
the flow as long as at least one source succeeds.

pkg/scrape (new, reusable)
- Client.Fetch: HTTP GET, strip script/style/nav/footer/aside via
  goquery, gather body text, collapse whitespace, truncate.
  DefaultTimeout=10s, DefaultMaxChars=4000. User-Agent configurable.
- Tests cover noise stripping, whitespace collapsing, truncation,
  body-less fragments.
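The text-normalization tail of Fetch (after the goquery node stripping, which is omitted here) can be sketched with the stdlib alone; the rune-safe cut is an assumption about how truncation is done:

```go
package main

import (
	"fmt"
	"strings"
)

// collapseAndTruncate sketches the post-extraction steps of
// scrape.Client.Fetch: collapse runs of whitespace to single spaces,
// then cut at maxChars (DefaultMaxChars=4000 per the commit).
func collapseAndTruncate(text string, maxChars int) string {
	// strings.Fields splits on any whitespace run, so joining with a
	// single space collapses newlines, tabs, and repeated spaces.
	collapsed := strings.Join(strings.Fields(text), " ")
	runes := []rune(collapsed)
	if len(runes) > maxChars {
		collapsed = string(runes[:maxChars])
	}
	return collapsed
}

func main() {
	fmt.Println(collapseAndTruncate("  Markt\n\n am   Dom  ", 100))
}
```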

enrich.MistralLLMEnricher
- Takes ai.Client + Scraper (both injectable; tests use stubs).
- Prompt: English system instructions asking for JSON-only output
  with category/opening_hours/description in German. User prompt
  includes markt identifiers, already-filled fields (so the LLM
  doesn't waste tokens re-deriving them), and scraped blocks.
- Tests: happy path, all-scrapes-fail (-> ErrNoScrapedContent),
  partial-scrape-success, empty LLM fields yield no provenance,
  URL cap at 5.

Service.RunLLMEnrichOne + handler
POST /admin/discovery/queue/:id/enrich (sync, 30s timeout).
NewService gains an llm enrich.LLMEnricher param; routes.go
constructs a MistralLLMEnricher when ai.Client is enabled, falls
back to NoopLLMEnricher otherwise.

UI: per-row AI button next to Similar, tracks per-row pending state
via a Set<string>, disables the button while the request is in
flight and shows "AI..." label. Success invalidates the page, the
row's expanded view picks up the new
category/opening_hours/description fields with llm provenance
tags. Inline error message on the row if the enrich action fails.
2026-04-24 10:46:28 +02:00
afe9d916d6 feat(discovery): manual crawl-enrich-all button + payload display
Replaces the originally-planned async-worker design with operator-
triggered bulk runs (see memory/project_ship2_enrichment.md). Crawl-
enrichment is cheap enough to always run against the whole list but
runs only when the admin clicks — the flow stays predictable and the
crawl itself stays fast.

Endpoints
- POST /admin/discovery/enrichment/crawl-all — 202 + goroutine, mirrors
  the crawl pattern. Per-process CAS gate prevents concurrent runs.
- GET  /admin/discovery/enrichment/crawl-all-status — polled shape
  identical to /crawl-status for UI reuse.

Service RunCrawlEnrichAll iterates enrichment_status='pending' rows,
builds an enrich.Input from each, runs CrawlEnrich (consolidation +
Nominatim geocoding via the shared geocoder), and persists via
SetEnrichment(status=done). Per-row errors count toward Failed and
append to a bounded Errors slice; the pass never halts.

Enrich package refactor
- Enrichment, Sources, Provenance constants moved from discovery ->
  enrich (they are the enrich package's own types; discovery previously
  held them for historical reasons).
- CrawlEnrich now takes a narrow enrich.Input / enrich.Contribution so
  the enrich package no longer imports the parent discovery package.
  This breaks the import cycle that appeared once discovery needed to
  call enrich (the MR 2 structure only worked because no caller went
  in that direction yet).
- LLMEnricher takes an LLMRequest (primitives) instead of a
  DiscoveredMarket. NoopLLMEnricher updated; real Mistral impl lands
  in MR 3b.
- CacheKey signature switched from (DiscoveredMarket) to primitive
  (nameNormalized, stadt, year).

Service geocoder wiring: discovery.NewService gains a Geocoder param
(routes.go passes the shared Nominatim client; the interface lives in
discovery to avoid another circular edge with enrich).

UI: "Run crawl-enrich" button next to "Run crawl"; identical poll +
summary card pattern. Queue row expand shows enrichment status badge
plus the PLZ/Venue/Organizer/Lat-Lng fields inline with per-field
provenance tag.

Tests: three new service tests (happy path, per-row SetEnrichment
failure, empty-queue no-op). Existing enrich package tests updated
for the primitive input signature. All 13 test NewService call-sites
updated for the new geocoder param.
2026-04-24 10:29:58 +02:00
65027ca9aa feat(discovery): sortable queue columns, default konfidenz desc
Admin queue table gains clickable sort on Markt, Stadt, Datum, Quellen
(count), and Konfidenz. Default on page load is konfidenz desc with
start_datum ASC NULLS LAST as the within-tier tiebreaker — operators
see highest-confidence, soonest-upcoming markets first. URL state
(?sort=&order=) is the single source of truth; F5 preserves it,
and localStorage is not used.

Backend: ListQueue takes (sortBy, order); repository builds ORDER BY
from a closed whitelist — konfidenz uses a CASE rank (hoch=3, mittel=2,
niedrig=1), quellen_count uses cardinality(quellen). Handler
normalisers reject anything off the whitelist and echo the effective
values in meta.sort / meta.order so the UI can render arrows. Unit
tests lock the emitted SQL per combination and assert raw input cannot
leak into ORDER BY.
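The closed-whitelist ORDER BY builder can be sketched as a fixed map from sort key to SQL fragment; the column names and exact fragments below follow the commit text but their precise spelling is an assumption:

```go
package main

import "fmt"

// orderBy maps a whitelisted sort key to a fixed SQL fragment;
// anything off the whitelist falls back to the default (konfidenz
// desc with start_datum ASC NULLS LAST as tiebreaker). Raw input can
// never reach the SQL string.
func orderBy(sortBy, order string) string {
	cols := map[string]string{
		"markt":     "name_normalized",
		"stadt":     "stadt",
		"datum":     "start_datum",
		"quellen":   "cardinality(quellen)",
		"konfidenz": "CASE konfidenz WHEN 'hoch' THEN 3 WHEN 'mittel' THEN 2 WHEN 'niedrig' THEN 1 ELSE 0 END",
	}
	col, ok := cols[sortBy]
	dir := map[string]string{"asc": "ASC", "desc": "DESC"}[order]
	if !ok || dir == "" {
		return "ORDER BY " + cols["konfidenz"] + " DESC, start_datum ASC NULLS LAST"
	}
	return fmt.Sprintf("ORDER BY %s %s, start_datum ASC NULLS LAST", col, dir)
}
```

Only values baked into the map ever appear in the clause, which is what the unit tests lock down.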
2026-04-24 09:38:53 +02:00
52f3e4c009 chore: replace personal emails with contact@marktvogt.de 2026-04-21 10:56:07 +02:00
bddab60686 fix(admin): queue response uses meta envelope; UI reads total from meta
MR 6 backend returned {data, total, limit, offset} as siblings but the
shared ApiResponse<T> envelope only types the data field. The UI's load
function treated queueRes.data as a wrapper and read body.data (undefined)
as the row list. Result: empty queue in UI despite 1384 pending rows
in the DB.

Fix: backend moves total/limit/offset into meta (matches PaginationMeta
convention from web/src/lib/api/types.ts). UI casts to read the meta
slot alongside typed data.
2026-04-19 00:46:05 +02:00