Deletes the Mistral Pass 0 code path from discovery, flips the k8s
CronJob to the crawler endpoint on a daily schedule, and adds a
Run crawl button to the admin UI that renders CrawlSummary.
Net change: ~-900 lines / +150 lines. Mistral remains wired for Pass 1
and Pass 2 research — only Pass 0 discovery is replaced by the deterministic
5-source Go crawler.
Deletes agent_client.go, agent_client_test.go, and the discovery-compare
diagnostic CLI. Removes Tick/PickBuckets/processOneBucket/processBucketResponse
from Service; renames NewServiceWithCrawler to NewService. Drops BatchSize,
ForwardMonths, AgentDiscovery config fields and their env reads. PickStaleBuckets
and UpdateBucketQueried removed from Repository interface (no callers). Stats
hardcodes forwardMonths=12. /tick route removed; /crawl is now the only machine
path, still protected by requireTickToken middleware.
- Service.Crawl derives Konfidenz from merged source count + rank instead of
hardcoded "mittel". Two+ sources -> "hoch"; single curated source ->
"mittel"; single suendenfrei (prose regex) -> "niedrig".
- New AgentStatus constant "crawler" replaces "bestaetigt" for crawler rows
so the validator's agent-specific rules don't fire on them and operators
can filter the queue by origin. Added Konfidenz* and AgentStatus*
constants to model.go.
- Default EndDatum to StartDatum when a source reports a single date
(festival_alarm one-day events, suendenfrei lines without a "bis" range).
Avoids Service.Accept rejecting nil-EndDatum rows.
- Sort PerSource names before assembling raw events for merge — makes
merged output order deterministic across runs.
- NewHandler: manualRateLimitPerHour <= 0 now explicitly disables the
rate limit (previously silently floored to 1/hour). Documented behavior
for all three cases in a constructor comment.
- Added four new tests for Service.Crawl failure/quality paths:
LinkCheckFailed, DedupedQueue, EndDatum default, multi-source Konfidenz.
- Documented the substring-match approximation in
cmd/discovery-compare/main.go's groupCrawlerByBucket — diagnostic-only,
not safe for production routing.
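A minimal sketch of the Konfidenz derivation rule described above. The function name and signature are illustrative (the real constants live in model.go); the branch logic mirrors the two-or-more / single-curated / single-suendenfrei cases:

```go
package main

import "fmt"

// Konfidenz levels (assumed constant names; the real ones are in model.go).
const (
	KonfidenzHoch    = "hoch"
	KonfidenzMittel  = "mittel"
	KonfidenzNiedrig = "niedrig"
)

// deriveKonfidenz sketches the merge rule: two or more agreeing sources
// mean "hoch"; a single curated source means "mittel"; a single
// suendenfrei hit (prose regex, least reliable) means "niedrig".
func deriveKonfidenz(sourceCount int, onlySuendenfrei bool) string {
	switch {
	case sourceCount >= 2:
		return KonfidenzHoch
	case onlySuendenfrei:
		return KonfidenzNiedrig
	default:
		return KonfidenzMittel
	}
}

func main() {
	fmt.Println(deriveKonfidenz(2, false)) // hoch
	fmt.Println(deriveKonfidenz(1, true))  // niedrig
}
```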
Exposes Service.Crawl via two HTTP routes: a bearer-token path that
bypasses the manual rate limit, and an admin-session path subject to a
configurable per-hour cap. A sync.Mutex blocks concurrent runs.
Includes handler tests for mutex reentry and rate limit enforcement.
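The concurrency guard can be sketched as follows — handler and method names are assumptions; the point is that sync.Mutex.TryLock (Go 1.18+) rejects an overlapping run with 409 instead of queueing it behind a slow crawl:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

type crawlHandler struct {
	mu sync.Mutex
}

// runOnce executes fn only when no crawl is in flight; overlapping
// callers are rejected instead of queued.
func (h *crawlHandler) runOnce(fn func()) bool {
	if !h.mu.TryLock() {
		return false
	}
	defer h.mu.Unlock()
	fn()
	return true
}

func (h *crawlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ok := h.runOnce(func() {
		// ... run Service.Crawl and write the CrawlSummary JSON ...
	})
	if !ok {
		http.Error(w, "crawl already running", http.StatusConflict)
		return
	}
	fmt.Fprintln(w, `{"status":"ok"}`)
}

func main() {
	h := &crawlHandler{}
	fmt.Println(h.runOnce(func() {})) // true: the lock was free
}
```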
Extract normalize helpers into discovery/normalize subpackage to break
the otherwise circular import (discovery/crawler → discovery → crawler).
NormalizeName/NormalizeCity in discovery become thin wrappers; merger.go
switches to discovery/normalize directly.
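For illustration, a plausible shape of the shared normalizer (the actual rules in discovery/normalize are not shown here; lower-casing plus whitespace collapsing is an assumption). Because the package depends on nothing above it, both the crawler's merger and discovery's thin NormalizeName wrapper can import it without a cycle:

```go
package main

import (
	"fmt"
	"strings"
)

// Name sketches what discovery/normalize might export: lower-case,
// trim, and collapse inner whitespace. Assumed behavior, not the
// project's actual rule set.
func Name(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

func main() {
	fmt.Println(Name("  Weihnachtsmarkt   Köln ")) // weihnachtsmarkt köln
}
```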
Adds crawlerRunner interface, NewServiceWithCrawler constructor, CrawlSummary/
SourceSummary types, and Service.Crawl which wires the crawler output through
link-verify, dedup, validation, and insert — same pipeline as processBucketResponse
but without a bucket context (BucketID is nil on crawler-produced rows).
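The seam can be sketched like this — the RawEvent/CrawlSummary fields are trimmed to illustrative minimums, and the pipeline stages are elided; what matters is that Service depends on a narrow crawlerRunner interface so tests can inject a stub:

```go
package main

import "fmt"

type RawEvent struct {
	Name, City string
}

type CrawlSummary struct {
	Found, Inserted int
}

// crawlerRunner is the narrow seam Service depends on (assumed shape).
type crawlerRunner interface {
	Run() ([]RawEvent, error)
}

type Service struct {
	crawler crawlerRunner
}

func NewServiceWithCrawler(c crawlerRunner) *Service { return &Service{crawler: c} }

// Crawl wires crawler output through the same stages as
// processBucketResponse: link-verify, dedup, validate, insert
// (all elided; BucketID stays nil on crawler-produced rows).
func (s *Service) Crawl() (CrawlSummary, error) {
	events, err := s.crawler.Run()
	if err != nil {
		return CrawlSummary{}, err
	}
	// ... link-verify, dedup, validate, InsertDiscovered ...
	return CrawlSummary{Found: len(events), Inserted: len(events)}, nil
}

type stubCrawler struct{}

func (stubCrawler) Run() ([]RawEvent, error) {
	return []RawEvent{{Name: "Markt", City: "Bonn"}}, nil
}

func main() {
	sum, _ := NewServiceWithCrawler(stubCrawler{}).Crawl()
	fmt.Println(sum.Found) // 1
}
```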
Pass 1 and Pass 2 now detect Mistral web_search rate limits (shared with
the Pass 0 CronJob) and return a proper HTTP 429 with Retry-After: 60
instead of a generic 500 "AI research failed". Pass 2 is enrichment-only,
so a rate limit there falls through with the Pass 1 results intact.
- pkg/ai: new shared IsRateLimit helper + DefaultRetryAfterSeconds=60.
discovery/service.go drops its local copy and imports the shared one.
- apierror.TooManyRequests now accepts an optional custom message so the
response body can include "try again in ~60s".
- market/research.go: respondRateLimited helper sets Retry-After,
downgrades the log line from ERROR to WARN (rate-limits are expected
state, not a fault), and returns 429 with a structured rate_limited
code the admin UI can key off of.
Pass 0 agents produce schema-valid but semantically wrong output: markets
claimed in the wrong bundesland, status 'bestaetigt' with a hinweis about
Vorjahresdaten, etc. The schema alone can't catch these. This validator
does, as a blocking gate before InsertDiscovered.
Checks (Pass 0 scope):
- bundesland_mismatch: agent's bundesland must equal bucket.region, with
a light normalizer for CH 'Kanton X' prefix so Phase B can refine the
Schweiz seed without a signature break.
- status_hinweis_inconsistent: if agent_status=='bestaetigt' AND hinweis
contains 'vorjahr' (case-insensitive), the agent contradicted itself.
Errors drop the market (counted as summary.validation_failed); warnings
would get merged into hinweis — no warning-level checks exist yet at
Pass 0 scope, placeholder reserved.
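The two blocking checks can be sketched as below. Function names and the exact Kanton normalization are assumptions; the behavior follows the bullets above (errors drop the market):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRegion strips the CH "Kanton X" prefix so the Schweiz seed
// can be refined later without a signature break (assumed rule).
func normalizeRegion(s string) string {
	return strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(s), "Kanton "))
}

// validatePass0 returns the error codes that drop a market.
func validatePass0(bundesland, bucketRegion, agentStatus, hinweis string) []string {
	var errs []string
	if normalizeRegion(bundesland) != normalizeRegion(bucketRegion) {
		errs = append(errs, "bundesland_mismatch")
	}
	// Agent claims "bestaetigt" but the hinweis admits it's Vorjahresdaten.
	if agentStatus == "bestaetigt" && strings.Contains(strings.ToLower(hinweis), "vorjahr") {
		errs = append(errs, "status_hinweis_inconsistent")
	}
	return errs
}

func main() {
	fmt.Println(validatePass0("Kanton Zürich", "Zürich", "bestaetigt",
		"Termin aus dem Vorjahr")) // [status_hinweis_inconsistent]
}
```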
Phase B (research agent) checks will extend this file: oeffnungszeiten
dedup, start_datum window coverage, full quellen liveness for Pass 1.
Pass 0 splits every month into two halves (H1 = days 1-15, H2 = 16-EOM)
so each agent call fits within Mistral's 4096 max_tokens budget. The
response schema now carries richer per-market signals, and dead agent
URLs are filtered out before they land in the admin queue.
DB:
- 000015: add halbmonat char(2) to discovery_buckets, widen unique key,
backfill existing rows as H1 + insert H2 siblings (624 → 1248 rows).
- 000016: rename discovered_markets.extraktion → konfidenz with
best-effort value mapping (verbatim→hoch, abgeleitet→mittel); add
agent_status column.
Backend:
- model: Bucket gains Halbmonat; Pass0Bucket same. Pass0Market renames
Extraktion → Konfidenz and adds AgentStatus (JSON tag "status").
DiscoveredMarket mirrors both fields; queue-lifecycle Status column
stays distinct from agent-reported AgentStatus.
- repository: all SELECT/INSERT touched to use the new columns; picker
orders by year_month, halbmonat so H1 runs before H2 in the same
month.
- agent client: prompt now injects halbmonat and recherche_datum (today)
so the agent has explicit date context.
- link verification: new LinkChecker does concurrent HEAD (GET fallback
on 405) with a 5s timeout. FilterURLs runs before InsertDiscovered —
markets whose quellen all fail are dropped and counted as
link_check_failed in TickSummary. Failing website URLs are cleared
but don't block insert.
- Service.linkChecker is a narrow interface so tests inject a noop
stub instead of hitting the network.
Web:
- DiscoveredMarket type gains konfidenz + agent_status, drops extraktion.
- Queue column renames "Extraktion" → "Konfidenz" with three-level
coloring (hoch=emerald, mittel=amber, niedrig=red, else neutral).
- A small pill next to markt_name surfaces agent_status when it's not
"bestaetigt" — red for "abgesagt", amber for "unklar" and
"vorjahr_unbestaetigt" — so risky entries are obvious before accept.
Expanding any row in the discovery queue now reveals:
- Quellen as clickable URLs (was just a count)
- Hinweis if the agent emitted one
- Inline edit form for markt_name, stadt, bundesland, start/end date,
and website — the fields the Pass 0 agent gets wrong most often
Backend:
- PATCH /admin/discovery/queue/:id applies a partial update to pending
entries via a COALESCE-based SQL update. Only fields that were set
are written.
- Service recomputes name_normalized when markt_name or stadt change so
dedup stays consistent after edits.
- Status check ensures only 'pending' entries are mutable.
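The COALESCE pattern can be sketched like this — table/column names and the QueuePatch shape are assumptions; the mechanism is that nil pointers pass through as SQL NULL, so COALESCE keeps the stored value, while the WHERE clause enforces pending-only mutation:

```go
package main

import "fmt"

// Illustrative query: unset fields leave their column unchanged;
// non-pending rows are never touched.
const patchQuery = `
UPDATE discovered_markets
   SET markt_name = COALESCE($2, markt_name),
       stadt      = COALESCE($3, stadt),
       bundesland = COALESCE($4, bundesland)
 WHERE id = $1 AND status = 'pending'`

type QueuePatch struct {
	MarktName  *string
	Stadt      *string
	Bundesland *string
}

// args builds positional parameters; nil pointers become NULL.
func (p QueuePatch) args(id int64) []any {
	return []any{id, p.MarktName, p.Stadt, p.Bundesland}
}

func main() {
	name := "Christkindlmarkt"
	a := QueuePatch{MarktName: &name}.args(7) // stadt/bundesland stay as stored
	fmt.Println(len(a), *a[1].(*string))      // 4 Christkindlmarkt
}
```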
Web:
- Row state $expandedId holds at most one open drawer at a time.
- Dates round-trip through <input type="date"> using the shared
dateInputValue helper; form action converts back to RFC3339 for Go.
- Existing Accept/Reject buttons untouched — workflow is edit-then-accept.
Rate limits (Mistral web_search 429) used to get counted as hard errors,
marking the bucket as queried and bumping the Errors(24h) strip — even
though the right behavior is to wait and try again later.
Backend:
- isRateLimit() matches "rate limit" / "status 429" in the error string.
- On persistent rate-limit after one 10s retry: leave last_queried_at
unchanged (bucket stays eligible for next tick) and abort the
remainder of this tick — Mistral's web_search budget is shared, no
point hammering more buckets in the same batch.
- TickSummary gains rate_limited counter; Errors stays for real failures.
Frontend:
- Dates: RFC3339 → 'DD.MM.YYYY' German format, range rendered as
'DD.MM.YYYY – DD.MM.YYYY'.
- Queue table: cell horizontal padding, uppercase compact headers,
scrollable on narrow viewports, dark-mode variants on every color
(emerald/amber badges, link color, reject button), Region folds
bundesland||land into a single column (Land was always 'Deutschland'
for DACH anyway).
Without snake_case json tags, Go serializes struct fields under their
exported Go names (ID, MarktName, etc.) — but the Svelte frontend reads
snake_case. Every
row.id on the client was undefined, which made Svelte 5 see identical
'undefined' keys across the {#each queue as row (row.id)} loop and
throw each_key_duplicate.
Adds explicit snake_case tags to Bucket, DiscoveredMarket, and
RejectedDiscovery to match what the TypeScript types already expect.
Service listens on port 80 (target: container 8080). The CronJob was
curling :8080 directly, which isn't exposed by the Service — every tick
timed out after ~135s with "Could not connect to server".
Switch to {{ .Values.service.port }} so the template always tracks the
actual Service port.
Go's nil slice marshals as JSON null, not [], which crashed the Svelte
page's .length access on fresh installs where no discovery tick has
happened yet. Reproduced in production: /admin/discovery → 500 because
data.queue was null and {queue.length} dereferenced it.
Backend: initialize every returning slice in repository.go via
make([]T, 0) so zero rows serialize as [] consistently. Also applies to
PickStaleBuckets, ListSeriesByCity, and Stats.RecentErrors.
Web: coalesce data.queue / data.stats.recent_errors at the top of the
Svelte script with `?? []` so future nil-slice regressions don't take
the whole page down.
Surfaces CronJob health signals without needing kubectl: last tick time
(stale-amber if > 6h), buckets due now, errors in the last 24h (with an
expandable list of the most recent failing buckets), and queue size.
Also wires the previously-orphaned /admin/discovery route into the admin
sidebar next to Märkte.
- backend: new GET /admin/discovery/stats endpoint; Stats + BucketError
types; repository Stats() aggregates four counters + top 5 failing
buckets.
- web: +page.server.ts fetches stats in parallel with queue;
+page.svelte renders a 4-card strip above the queue table.
Previous deploys emitted 4 warnings on the discovery-tick Pod template
against the restricted:latest policy. Today they are warnings; if the
namespace enforcement tightens, admission will silently drop the Pod.
Pod-level: runAsNonRoot, runAsUser/runAsGroup 100 (curlimages/curl's
built-in non-root UID), seccompProfile RuntimeDefault.
Container-level: allowPrivilegeEscalation false, capabilities drop ALL.