Commit Graph

354 Commits

Author SHA1 Message Date
208f76f9cc docs(claude): refresh post-migration — somegit, Woodpecker, helm/marktvogt, Bun 2026-04-28 17:11:35 +02:00
d293dd9182 fix(ci): use --set-string for image tags (avoids float coercion of all-digit SHAs)
All checks were successful
ci/someci/push/backend Pipeline was successful
ci/someci/push/web Pipeline was successful
2026-04-28 17:04:54 +02:00
1539879098 fix(ci): retry helm upgrade on race + revert workflow depends_on
Some checks failed
ci/someci/push/backend Pipeline was successful
ci/someci/push/web Pipeline failed
depends_on: [backend] in web.yaml blocked web from triggering on
single-subtree web commits (backend's path filter excluded it from the
pipeline, so web's dependency was never satisfied; web never ran).

Switching to retry loop: 3 attempts with 30s backoff. When a cross-subtree
commit triggers both pipelines, the loser of the helm release-lock race
sleeps and retries — works whether or not the other workflow ran.
2026-04-28 17:01:40 +02:00
99a0d13ab9 chore(ci): web workflow depends_on backend to serialize helm upgrades 2026-04-28 16:58:36 +02:00
f2973dc905 fix(ci): drop --install flag (release exists; helm 4.1 errors on it)
Some checks failed
ci/someci/push/backend Pipeline was successful
ci/someci/push/web Pipeline failed
2026-04-28 16:40:52 +02:00
36db6f08ed fix(ci): use --reuse-values instead of --reset-then-reuse-values
Some checks failed
ci/someci/push/web Pipeline failed
ci/someci/push/backend Pipeline failed
helm 4.1 in alpine/helm:4.1 errored 'release: already exists' on the latter
flag despite the release being deployed. --reuse-values is the older,
universally-supported variant — preserves the other service's image tag
between pipeline runs the same way.
2026-04-28 16:39:16 +02:00
75a626b127 chore: switch CI to monolithic chart, delete old per-service charts
Some checks failed
ci/someci/push/backend Pipeline failed
ci/someci/push/web Pipeline failed
CI deploy steps now target helm/marktvogt with --reset-then-reuse-values,
preserving the other service's image tag across pipeline runs. Each pipeline
sets only its own X.image.tag.

App-level secrets (smtp/turnstile/discovery/ai/JWT/oauth) moved out of CI's
--set chain in the previous phase — now pre-created via
scripts/k8s-secrets-sync.sh from .env.helm. The chart's conditional secret
templates remain for backward-compat with the live release's stored values
but will be removed in a follow-up once those values are cleared.

Old per-service chart directories deleted; only the monolithic
helm/marktvogt/ remains.

MIGRATION.md updated with the actual procedure that worked, including the
several pitfalls hit during the live tenant-2 migration on 2026-04-28
(helm uninstall trap, SSA field-manager swap for CRDs, kyverno hostname
allowlist for new subdomains).
2026-04-28 16:33:53 +02:00
d3982c1d73 feat(helm): add monolithic marktvogt chart + secrets sync script
New unified helm chart at helm/marktvogt/ that combines backend (Go API,
Postgres, Dragonfly, migrate hook, discovery cron) and web (SvelteKit SSR)
into a single release. Replaces the per-service charts at backend/deploy/helm
and web/deploy/helm — kept in place until the live migration is verified
(see helm/marktvogt/MIGRATION.md).

Selector labels and resource names match the existing per-service charts
exactly so migration is by re-annotation rather than recreate; CNPG cluster
and Dragonfly survive the cutover with no data loss.

Adds scripts/k8s-secrets-sync.sh + .env.helm.example for reproducible
out-of-band secret creation. .env.helm itself is gitignored.
2026-04-28 15:57:30 +02:00
ae8e06c7b0 fix(ci): revert corepack on node:25-alpine (no longer bundled)
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 15:31:13 +02:00
08aa34141f chore(ci): split web install/check, corepack, pin platforms
Some checks failed
ci/someci/push/web Pipeline failed
ci/someci/push/backend Pipeline was successful
2026-04-28 15:25:06 +02:00
5b34252132 feat(web): add explicit /healthz endpoint for liveness/readiness probes
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 15:13:48 +02:00
921f329dab chore(web): switch runtime to bun (drop-in node replacement, ~50MB lighter)
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 14:51:17 +02:00
2758e315a8 fix(ci): prune step keeps 8-char SHAs and guards against empty history
All checks were successful
ci/someci/push/web Pipeline was successful
ci/someci/push/backend Pipeline was successful
2026-04-28 14:48:07 +02:00
2acfeed12e chore(ci): prune old SHA-tagged images, keep last 10 per pipeline
All checks were successful
ci/someci/push/web Pipeline was successful
ci/someci/push/backend Pipeline was successful
2026-04-28 14:37:13 +02:00
ec76bc2528 fix(ci): cd web once in check step (cwd persists across commands)
All checks were successful
ci/someci/push/web Pipeline was successful
2026-04-28 14:26:50 +02:00
c857544a12 chore(ci): drop environment from plugin steps to satisfy schema
Some checks failed
ci/someci/push/web Pipeline failed
ci/someci/push/backend Pipeline was successful
2026-04-28 14:21:54 +02:00
fbaa598ae7 chore(ci): switch woodpecker pipelines to plugin-docker-buildx
Some checks failed
ci/someci/push/web Pipeline failed
ci/someci/push/backend Pipeline failed
2026-04-28 14:18:58 +02:00
8fd3e53fe6 chore(ci): add woodpecker pipelines for backend and web 2026-04-28 14:04:16 +02:00
5f96daf7f3 feat(market): admin edit link and public feedback form 2026-04-28 13:43:22 +02:00
f30a963329 fix(market): marshal empty merge-plan buckets as [] not null
Nil slices in MergePlan.AutoApply/ReviewRequired/Rejected serialized to
JSON null, causing the admin research panel to crash with
"can't access property 'map', plan.review_required is null". Initialize
the buckets as empty slices so the wire contract is always an array.
Tightened the empty-buckets test to assert the JSON shape.
2026-04-28 13:09:26 +02:00
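The nil-versus-empty distinction behind f30a963329 can be reproduced in a few lines. This is a minimal sketch: the element type and the `auto_apply` and `rejected` JSON keys are assumptions (only `review_required` appears in the quoted error), and `newMergePlan` and `mustJSON` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MergePlan mirrors the three buckets named in the commit message.
// Field types and two of the JSON keys are assumed for illustration.
type MergePlan struct {
	AutoApply      []string `json:"auto_apply"`
	ReviewRequired []string `json:"review_required"`
	Rejected       []string `json:"rejected"`
}

// newMergePlan returns a plan whose buckets are empty but non-nil,
// so encoding/json emits [] instead of null for each of them.
func newMergePlan() MergePlan {
	return MergePlan{
		AutoApply:      []string{},
		ReviewRequired: []string{},
		Rejected:       []string{},
	}
}

// mustJSON marshals v and panics on error; good enough for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(MergePlan{}))    // zero value: nil slices
	fmt.Println(mustJSON(newMergePlan())) // initialized: empty slices
}
```

A client that calls `.map` on each bucket works with the second shape and throws on the first.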
3d62ba9526 chore(web): integrate @typescript/native-preview (tsgo) for type-checking
Adds the native (Go) TypeScript compiler as a devDep and routes
svelte-check through it via --tsgo. Local pnpm run check goes from
~5s to ~3s on this codebase; pre-commit hook inherits the speed
automatically.

The linux-x64 prebuilt is a statically-linked Go binary (~25MB), so
the alpine builder in web/Dockerfile installs it cleanly even though
it never invokes svelte-check during the image build.
2026-04-28 12:56:41 +02:00
2de9bdf6c3 chore(db): backfill historical ai_usage costs after pricing fix
Re-prices every existing ai_usage row using the correct $/1M token rates
per model family. CASE clauses ordered specific-first (flash-lite before
flash) to mirror the longest-prefix-match in priceFor(). Aliases
(gemini-*-latest) resolve to the 2.5 family, the only one in production
during the affected window.

The grounding-fee component ($35/1k above 1500/day free tier) is not
recomputed: historical traffic shows zero grounded calls in the window,
so the adjustment would be 0. Down is a no-op (irreversible by design: the
original miscalculated values are not preserved).
2026-04-28 12:56:32 +02:00
ba4dce1f76 fix(ai): per-model cost calc + thinking toggle and token tracking
estimateCost ignored the model name and billed every Gemini call at
hardcoded flash-lite rates ($0.10 / $0.40 per 1M), under-counting Pro
calls by ~12-25x. Switch to priceFor(model) and prefer resp.ModelVersion
so aliases like gemini-pro-latest resolve to their concrete family.

Capture ThoughtsTokenCount as a separate ThinkingTokens column on
ai_usage (migration 000030) and bill it at the output rate.

Add a global thinking on/off toggle that mirrors the grounding pattern:
provider holds an in-memory cache (read at startup from settings.Store),
handler keeps it in sync, Chat() applies ThinkingConfig.ThinkingBudget=0
only when disabled. Default true preserves SDK behavior. Grounding+
thinking get/set helpers folded into shared getBool/setBool to keep
goconst happy.

Web admin settings: new "Modell-Reasoning" toggle card; usage panel sums
include thinking tokens. Types are optional with `?? 0` defaults so a
brief web-before-backend rollout window cannot render NaN.
2026-04-28 12:56:04 +02:00
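The longest-prefix lookup and the "thinking tokens bill at the output rate" rule from ba4dce1f76 might look roughly like this. Only the flash-lite rates ($0.10 / $0.40 per 1M) come from the commit message; the other rates are placeholders rather than real pricing, and the signatures of `priceFor` and `estimateCost` are assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// rate holds USD prices per 1M input and output tokens.
type rate struct{ inPerM, outPerM float64 }

// rates is keyed by model-family prefix. Placeholder values are marked.
var rates = map[string]rate{
	"gemini-2.5-flash-lite": {0.10, 0.40}, // from the commit message
	"gemini-2.5-flash":      {1.00, 2.00}, // placeholder, not real pricing
	"gemini-2.5-pro":        {3.00, 4.00}, // placeholder, not real pricing
}

// priceFor picks the longest prefix that matches model, so that
// "gemini-2.5-flash-lite-x" resolves to flash-lite and not to flash.
func priceFor(model string) (rate, bool) {
	best := ""
	for prefix := range rates {
		if strings.HasPrefix(model, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return rate{}, false
	}
	return rates[best], true
}

// estimateCost bills thinking tokens at the output rate. Unknown models
// cost 0 here; a real implementation would probably log or fall back.
func estimateCost(model string, in, out, thinking int) float64 {
	r, ok := priceFor(model)
	if !ok {
		return 0
	}
	return (float64(in)*r.inPerM + float64(out+thinking)*r.outPerM) / 1e6
}

func main() {
	fmt.Printf("%.4f\n", estimateCost("gemini-2.5-flash-lite", 1_000_000, 500_000, 500_000))
}
```

Choosing the longest match makes the result independent of map iteration order, which is the same reason the SQL backfill orders its CASE clauses specific-first.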
34a3da6e8b fix(auth): include legacy expires_at column in session INSERT
The original sessions table has expires_at TIMESTAMPTZ NOT NULL with no
default. Migration 000027 added the new columns but did not drop this one,
so CreateSession must still supply a value. Using AbsoluteExpiresAt.
2026-04-26 14:10:19 +02:00
bf4d8eb71d chore(settings): update stale JWT_SECRET comment to APP_SECRET 2026-04-26 13:57:19 +02:00
38401ca802 docs(app): auth migration notes for Flutter interceptor update 2026-04-26 13:39:59 +02:00
c6cdc11693 feat(auth): D5 cleanup + W3 web refresh UX
D5 — backend cleanup:
- Migration 000029 drops legacy token_hash column from sessions
- JWT_SECRET renamed to APP_SECRET (fallback + deprecation warning)

W3 — web session UX:
- AuthData type: session_token→refresh_token, remove expires_in
- cookies.ts: refresh_token cookie, non-HttpOnly access_expires_at
- client.server.ts: sends X-Refresh-Token header (not JSON body)
- hooks.server.ts: simplified two-path SSR refresh logic
- refresh.ts: single-flight client-side refresh
- client.ts: proactive refresh + 401 retry on non-auth paths
- /api/auth/refresh: SvelteKit proxy for HttpOnly cookie refresh
- OAuth callback, Datenschutz page updated to new cookie names
2026-04-26 13:25:48 +02:00
515a72e6e8 feat(auth): D4 TOTP backup codes + session management
- Backup codes: 10 × Crockford base32 (XXXXX-XXXXX), SHA-256 hashed,
  single-use; regenerate requires current TOTP code
- Login accepts BackupCode field alongside TOTPCode
- Session management: list, revoke-by-id (ownership-checked),
  revoke-all-except-current; password change revokes other sessions
- New routes: POST /auth/2fa/backup-codes/regenerate,
  GET /auth/sessions, DELETE /auth/sessions, DELETE /auth/sessions/:id
- fakeRepo extended with backup code + session management stubs
- Tests cover: code format/count, hash storage, regen invalidates old,
  login with valid/used code, session list isolation, revoke ownership,
  password change session revocation
2026-04-26 12:33:47 +02:00
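A minimal sketch of the backup-code format from 515a72e6e8: ten codes shaped XXXXX-XXXXX over the Crockford base32 alphabet, stored as SHA-256 hashes. The function names and the hex encoding of the hash are assumptions.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// crockford is the Crockford base32 alphabet (no I, L, O, or U).
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// newBackupCode returns one code shaped XXXXX-XXXXX. Masking a random
// byte with 31 is unbiased because 256 is a multiple of 32.
func newBackupCode() (string, error) {
	buf := make([]byte, 10)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	out := make([]byte, 0, 11)
	for i, b := range buf {
		if i == 5 {
			out = append(out, '-')
		}
		out = append(out, crockford[b&31])
	}
	return string(out), nil
}

// isBackupCode checks the shape and alphabet of a code.
func isBackupCode(code string) bool {
	if len(code) != 11 || code[5] != '-' {
		return false
	}
	for i := 0; i < len(code); i++ {
		if i != 5 && strings.IndexByte(crockford, code[i]) < 0 {
			return false
		}
	}
	return true
}

// generateBackupCodes returns n plaintext codes (shown to the user once)
// and their hex SHA-256 hashes (the only form that would be stored).
func generateBackupCodes(n int) ([]string, []string, error) {
	codes := make([]string, 0, n)
	hashes := make([]string, 0, n)
	for i := 0; i < n; i++ {
		code, err := newBackupCode()
		if err != nil {
			return nil, nil, err
		}
		sum := sha256.Sum256([]byte(code))
		codes = append(codes, code)
		hashes = append(hashes, hex.EncodeToString(sum[:]))
	}
	return codes, hashes, nil
}

func main() {
	codes, hashes, err := generateBackupCodes(10)
	if err != nil {
		panic(err)
	}
	fmt.Println(codes[0], hashes[0][:8])
}
```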
492bbb350e feat(auth): D2/D3 opaque-token session model — drop JWT
Replace HS256 JWT access tokens with two opaque 32-byte random tokens
(access + refresh), both stored as SHA-256 hashes in sessions + Valkey.

Key changes:
- GenerateOpaqueToken() replaces JWT issuance; TokenService removed
- Sessions now carry access_token_hash, refresh_token_hash, family_id,
  parent_session_id, access_expires_at, absolute_expires_at, last_used_at,
  revoked_at — per migration 000027 (updated to add access_expires_at)
- Refresh rotation is atomic (UPDATE...RETURNING); reuse detection kills
  the entire token family and returns auth.refresh_reuse_detected
- RequireAuth/OptionalAuth now take SessionLookup (Valkey→Postgres) instead
  of *TokenService; sets session_id in context alongside user_id
- last_used_at is bumped on each request, throttled to writes >60s old
- AuthConfig{AccessTTL,RefreshIdleTTL,RefreshAbsoluteTTL} replaces JWT TTL env
  vars (AUTH_ACCESS_TTL=30m, AUTH_REFRESH_IDLE_TTL=168h, AUTH_REFRESH_ABSOLUTE_TTL=720h)
- JWT_SECRET kept for AI-settings key derivation (drops from auth flow)

Forced logout on deploy (D3 behaviour); pre-launch so acceptable.
2026-04-26 12:15:57 +02:00
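The token scheme in 492bbb350e (32 random bytes handed to the client, only a SHA-256 hash persisted) can be sketched as follows. `GenerateOpaqueToken` is named in the commit, but its signature, the base64url encoding, and the `HashToken` helper are assumptions.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// HashToken returns the hex SHA-256 of a token. Lookups hash the
// presented token and compare against the stored hash.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// GenerateOpaqueToken returns a token built from 32 random bytes
// (base64url without padding, an assumed encoding) and the hash to store.
func GenerateOpaqueToken() (token, hash string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	token = base64.RawURLEncoding.EncodeToString(raw)
	return token, HashToken(token), nil
}

func main() {
	token, hash, err := GenerateOpaqueToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(token), len(hash))
}
```

Storing only the hash means a leaked sessions table does not yield usable tokens, and an unsalted hash is adequate here because the input is already 256 bits of randomness.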
0997d4befa feat(auth): D1 non-breaking security foundations
- CORS: rewrite middleware with Vary: Origin, regex origin patterns,
  startup validation, and prod boot-fail on empty allowlist; shared
  CORSConfig exported for CSRF reuse
- CSRF: new Origin/Referer check middleware sharing CORS allowlist;
  Bearer-token clients exempt; mounts globally after CORS
- Argon2id: new password package with PHC format, bcrypt dispatch, and
  NeedsRehash; lazy upgrade on login in auth service
- Rate limiting: add RateLimitByKey with custom key function; apply
  per-route limits to /auth/login, /refresh, /2fa/verify,
  /auth/magic-link, and /auth/password
- apierror: add CSRFMismatch and RefreshReuse error constructors
- Migrations: 000027 (session model schema columns for D2/D3),
  000028 (TOTP secret_v2 column + totp_backup_codes table)
- cmd/totp-encrypt: one-shot job to encrypt existing TOTP secrets
2026-04-26 11:54:37 +02:00
49a31bca02 chore: add .worktrees/ to .gitignore 2026-04-26 11:36:27 +02:00
24dc46eeb8 fix(merge-plan): snapshot proposal prop to avoid structuredClone proxy throw
structuredClone on a Svelte 5 reactive Proxy throws DataCloneError during
component init, causing MergeProposalPanel to silently fail to mount.
Replace with $state.snapshot which is the documented way to deep-copy a
reactive prop into a local editable state.
2026-04-26 00:28:21 +02:00
131d8c8ff0 fix(merge-plan): extend poll timeout to 270s + guard undefined proposal
Frontend budget was 180s — equal to the backend goroutine cap — so a race
determined which side timed out first. Bumped to 270s to guarantee the frontend
outlasts the backend's 3-minute window.

Added explicit null guard on result.proposal: if the LLM ever returns a
done-status without a proposal body the UI now surfaces a clear error instead
of silently assigning undefined (which kept the panel hidden with no feedback).

Also guards field_merges ?? {} in MergeProposalPanel to avoid Object.keys(null)
if the model returns a null map.
2026-04-26 00:11:31 +02:00
643ee77600 feat(merge-plan): convert to async polling to bypass nginx 60s timeout
POST /admin/markets/:id/merge-plan now returns 202 + job_id immediately
and runs the Gemini advisor in a detached goroutine. Frontend polls
GET .../merge-plan/:job_id until done, with backoff up to 3 minutes.

Adds in-memory job registry (keyed map + RWMutex, 5-min TTL sweep) and
handler tests covering the full pending→done and error paths.
2026-04-25 23:37:03 +02:00
caaad8adf4 fix(web): SSR calls use cluster-internal backend URL to bypass nginx timeout
All serverFetch calls were going to https://api.marktvogt.de (public
gateway), creating a second nginx hop for every SSR operation. Slow LLM
calls (merge-plan, research-plan) hit the 60s proxy_read_timeout.

- Add PRIVATE_API_BASE_URL=http://marktvogt-backend to web Helm config
- serverFetch now builds SERVER_API_BASE from PRIVATE_API_BASE_URL at
  runtime (falls back to PUBLIC_API_BASE_URL when not set)
- apiFetch accepts optional baseURL param; client-side calls unchanged
2026-04-25 22:25:31 +02:00
4916b0d6af fix(infra): increase gateway timeout for admin+market routes to 120s
Merge-plan and research-plan both call Gemini which can take >60s.
The default gateway timeout was killing connections with 504.

- Web HTTPRoute: add /admin/ rule with 120s request+backendRequest timeout
- Backend HTTPRoute: add /api/v1/admin/markets/ rule with 120s timeout
- MergePlan handler: add 110s context deadline for graceful degradation
  before the gateway cuts the upstream connection
2026-04-25 22:03:20 +02:00
e6445b5db8 fix(dedup): wire merge advisor JSON schema + flexible field_merges parser
Gemini returned field_merges as an array without structure constraint,
causing json.Unmarshal to fail with "cannot unmarshal array into Go struct
field of type map[string]mergeFieldDecision".

- Pass merge_advisor_schema.json via JSONSchema instead of bare JSONMode
- Add parseFieldMerges() that accepts both object and array LLM formats
- Validate target_id is one of the two input market IDs after parsing
- Fix schemaFromMap: minimum/maximum are supported by genai.Schema v1.54
2026-04-25 21:33:58 +02:00
3d922e50bf perf(admin): stream duplicate check async — don't block page render
LLM tiebreaker can take several seconds; return the duplicates fetch
as an unawaited Promise so the page renders immediately with market
data. Template uses {#await} to render the panel when it resolves.
2026-04-25 21:08:56 +02:00
11377b8463 fix(research): return full body from plan proxy, not res.data
The Plan handler returns {plan, research_result} directly without a
data wrapper. apiFetch casts the body to ApiResponse<T>, so res.data
was undefined, json(undefined) produced an empty response, and the
client either crashed (JSON.parse) or silently got a null plan.
2026-04-25 20:53:57 +02:00
73c30d2f5f feat(admin/dedup): merge UI + enrich enum fix + robust JSON parse
H1: Drop empty string from enricher_schema.json category enum —
Gemini rejects enum[7]: cannot be empty (Error 400). Remove category
from required so the model can omit it when no category fits.

H2: Research-plan/apply client reads response as text before
JSON.parse; empty or HTML error bodies now surface the actual HTTP
status instead of crashing with "unexpected end of data".

I: Dedup UI for approved markets:
- DuplicatesPanel: LLM verdict pills (same/not-same, confidence),
  llm_reason, per-candidate Merge-planen button
- MergeProposalPanel: summary, confidence, flags, per-field
  decisions with editable source radio (a/b/combined), current
  value context, confirm() before destructive apply
- Two SvelteKit proxy routes: merge-plan/ and merge-into/[targetId]/
- [id]/+page.svelte: wired with full state; navigates to survivor
  after successful merge
- [id]/+page.server.ts: load duplicates for all non-merged editions
  (was gated to status=rumored only)
- types.ts: DuplicateMarket gains llm_same/llm_confidence/llm_reason;
  add MarketMergeProposal + MergeFieldDecision; add merged to
  EditionStatus
2026-04-25 19:34:49 +02:00
77e150f122 feat(dedup): E5+E2+E2b — merge advisor LLM + merge-plan/merge-into endpoints
MergeAdvisor calls Gemini with a German system prompt to propose how to merge
two duplicate market editions. It guards against confident non-duplicates via
ErrNotDuplicate (same=false AND confidence>0.5).

POST /:id/merge-plan generates a MarketMergeProposal (read-only).
POST /:id/merge-into/:target_id applies the merge: updates target fields,
marks source as status=merged with merged_into_id set, reparents discovered_markets,
and writes a market_merge_log audit row — all in one transaction.

AdminHandler gains advisor and updated constructor. VersionMergeAdvisor added
to pkg/ai versions.
2026-04-25 19:05:52 +02:00
5a643098d1 feat(dedup): E3+E1 — merged status, LLM tiebreaker in FindDuplicates
Migration 000026 adds merged_into_id + merged_at to market_editions and
extends the status CHECK constraint to include 'merged'. FindSimilar now
excludes merged editions from candidates.

AdminHandler gains a SimilarityClassifier field; FindDuplicates enriches
the top 5 pg_trgm candidates with LLM same/confidence/reason verdicts.
simClassifier from routes.go is passed through to avoid a second instance.
2026-04-25 18:56:58 +02:00
9b308639fd feat(admin/ui): three-section merge plan UI + plan/apply proxy endpoints (D6) 2026-04-25 18:45:40 +02:00
65c8c4bf96 feat(research/merge): add merge planner, validators, plan+apply endpoints, audit log (D1-D5) 2026-04-25 18:39:01 +02:00
1b991518a4 feat(ai): warn on unsupported schema keys + enrich grounding gate
schemaFromMap now logs a warning when keys genai.Schema ignores
(pattern, minLength, $ref, etc.) are present, keeping the workaround
visible. LLMEnricher skips Google Search grounding when total scraped
chars >= 1500, conserving free-tier quota on content-rich pages.
2026-04-25 18:10:37 +02:00
66aee62646 feat(ai): add PromptHash to ProviderError + log on schema violation
promptHashShort(system+"\x00"+user)[:12] computed on ErrSchemaViolation
and attached to ProviderError.PromptHash. research.go schema-violation
log now includes prompt_hash for cross-referencing ai_usage rows.
2026-04-25 18:09:28 +02:00
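The expression `promptHashShort(system+"\x00"+user)[:12]` from 66aee62646 suggests a hex digest truncated to 12 characters. A minimal sketch follows, assuming SHA-256 as the hash (the commit does not name it); `promptHash` is a hypothetical wrapper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// promptHashShort returns a hex digest of s. SHA-256 is an assumption.
func promptHashShort(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// promptHash joins the two prompts with a NUL byte before hashing, so
// that ("ab", "c") and ("a", "bc") produce different inputs, then keeps
// the first 12 hex characters for logging.
func promptHash(system, user string) string {
	return promptHashShort(system + "\x00" + user)[:12]
}

func main() {
	fmt.Println(promptHash("You are a market researcher.", "Ronneburg"))
}
```

Twelve hex characters are 48 bits, enough to cross-reference log lines with ai_usage rows but not meant to be collision-resistant.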
ad1da8be66 feat(ai): add prompt_version to ai_usage + wire version constants
Migration 000024 adds prompt_version column + partial index.
PromptVersion plumbed through ChatRequest -> UsageEvent ->
buildUsageEvent -> settings INSERT/SELECT. Version constants
defined in ai/versions.go and wired at all three call sites.
2026-04-25 18:08:53 +02:00
69c6453e26 feat(similarity): confidence calibration anchors + Ronneburg failure-case fixtures (B4-B5)
- Add confidence scale (0.95-1.00 / 0.70-0.90 / 0.50-0.70 / 0.00-0.50)
  with four annotated few-shot examples to the similarity system prompt
- Add two Ronneburg real-world pairs to similarity.json: descriptive-prefix
  swap and low-trigram-overlap rename, both expected same=true
2026-04-25 18:00:39 +02:00
b25ae09bd2 feat(enrich): full category taxonomy, tighter description + opening_hours rules (B1-B3)
- Replace 3-example inline comment with 7-label taxonomy block so the
  model knows all valid categories instead of guessing from partial hints
- Tighten description constraint to 60-220 chars with explicit word bans
- Mark opening_hours as a rough guide, not authoritative for booking
2026-04-25 17:58:08 +02:00
f98ecf8790 fix(discovery): auto-trigger Pass B (LLM enrich) after post-crawl Pass A
Adds ListEnrichedNeedingLLM to the Repository interface and RunLLMEnrichBacklog
to Service, then wires RunLLMEnrichBacklog into the post-crawl goroutine so
LLM enrichment runs automatically after every crawl without manual triggers.
2026-04-25 17:53:56 +02:00