vikingowl/marktvogt.de

feat(security): close audit waves 1-4 (C1-C6, H1, H2, H4, H11, H13, H14, H16) #1

Merged

vikingowl merged 8 commits from feat/security-audit-waves-1-4-remediation into main

2026-05-01 00:02:10 +02:00

Author	SHA1	Message	Date
vikingowl	0d5788c951	docs(security): record waves 1-4 remediation status in audit report All checks were successful ci/someci/push/backend Pipeline was successful Details ci/someci/pr/backend Pipeline was successful Details	2026-04-30 23:41:56 +02:00
vikingowl	5821547a73	feat(security): close audit waves 1-4 (C1-C6, H1, H2, H4, H11, H13, H14, H16) Implements the remediation pass described in planning/19-security-audit-2026-04-30.md. All Critical findings and the Wave 1-4 High findings are closed; PoC tests added; full backend test suite green; helm chart lints clean. Wave 1 - Auth & identity - C1 OAuth state nonce: PutOAuthState / ConsumeOAuthState (valkey, GETDEL single-use, 15min TTL); Callback rejects missing/forged/cross- provider state before token exchange. - C2 OAuth identity linking: refuse silent linking to existing user unless info.EmailVerified is true. fetchGitHubUser now consults the /user/emails endpoint for the verified flag (no more hardcoded true); fetchFacebookUser sets EmailVerified=false (FB exposes no per-email verification flag). - H1 Magic-link verify: replaced Get + MarkUsed with a single atomic UPDATE...RETURNING (ConsumeMagicLink) - TOCTOU-free. - H2 TOTP code replay: MarkTOTPCodeConsumed (valkey SET NX, 120s TTL) prevents replay of a successfully validated code; fails closed on transient store errors. - H3 Backup-code orphan: DisableTOTP now also wipes totp_backup_codes. Wave 2 - Middleware & network - C3 CORS/CSRF regex anchoring: NewCORSConfig wraps each pattern with \A...\z so substring spoofing of origins is impossible. - H4 ClientIP: server reads APP_TRUSTED_PROXIES; gin SetTrustedProxies is called explicitly (empty default = no proxy trust). - H11 Body limit + DisallowUnknownFields: BodyLimitBytes middleware (1 MiB default) wraps every request; validate.BindJSON now uses a json.Decoder with DisallowUnknownFields and rejects trailing tokens; 413 envelope on body-limit overflow. - H16 NetworkPolicy: backend.networkPolicy.enabled defaults to true; new web-networkpolicy.yaml restricts web pod ingress to nginx-gateway and egress to backend service + DNS + 443. Wave 3 - Encryption at rest - C4 TOTP secrets: CreateTOTPSecret writes encrypted secret_v2; GetTOTPSecret prefers v2 with legacy fallback. - C5 OAuth tokens: migration 000033 adds *_v2 columns; CreateOAuthAccount and UpdateOAuthTokens write encrypted; GetOAuthAccount reads v2 with legacy fallback. - M1 Domain separation: crypto.DeriveKeyFor(secret, purpose) replaces single-purpose DeriveKey; settings, totp, oauth each use a distinct HKDF-derived subkey. DeriveKey kept as back-compat alias for settings. Wave 4 - Input & AI safety - C6 SSRF: new pkg/safehttp refuses to dial RFC1918, loopback, link- local, ULA, multicast, unspecified, or cloud-metadata IPs; scheme allowlist (http/https). Wired into pkg/scrape, discovery LinkChecker, and imageURLReachable. NewForTesting opt-in for httptest. - H13 PromptGuard German + Unicode: NFKC + Cf-class strip pre-pass closes zero-width and full-width-homoglyph bypasses; new German rules for ignoriere/missachte/vergiss/role-escalation/prompt-exfil/verbatim; Gemma-style and pipe-delimited chat-template tokens covered; source-fence rule prevents '=== Quelle:' splice in scraped text. - H14 BudgetGate: new ai.BudgetGate interface; UsageRepo.CheckBudget reads today's SUM(estimated_cost_usd) (10s cache) and refuses calls when AI_DAILY_CAP_USD is exceeded; GeminiProvider.Chat checks the gate before contacting Gemini. OAuth routes remain disabled in server/routes.go, so C1/C2 are not actively reachable today; fixes ensure correctness when re-enabled.	2026-04-30 23:41:48 +02:00
vikingowl	bef8657d81	chore(tooling): migrate pre-commit framework to husky Replaces the Python pre-commit framework (.pre-commit-config.yaml) with husky 9, kept faithful to the existing checks: - Block direct commits to main (was: no-commit-to-branch). - git diff --cached --check covers trailing-whitespace and merge-conflict marker detection. - Custom large-file check (>500KB), excluding crawler test fixtures. - Backend Go: gofmt -l (fail on diff), go vet, golangci-lint — only when backend/*.go is staged. - Backend deps: go mod tidy -diff — only when go.mod/go.sum is staged. - Web: prettier --check, eslint, svelte-check — only when web/ is staged. lint-staged was intentionally not adopted — the previous config ran hooks tree-wide (pass_filenames: false), so per-file optimisation would be a behaviour change. Install: pnpm install at repo root (the prepare script wires husky into .git/hooks via core.hooksPath=.husky/_).	2026-04-30 22:18:14 +02:00
vikingowl	dee4cee23c	fix(auth): invalidate valkey cache on session revoke RevokeSession, RevokeSessionsByFamilyID, DeleteUserSessions, RevokeOtherSessions, and ConsumeRefreshToken updated revoked_at in Postgres but did not invalidate the valkey access-token cache. The cache serves the original Session JSON (RevokedAt: null) until its TTL expires (JWT_ACCESS_TTL = 2h), so logout / admin-revoke / refresh-reuse-detection took up to 2h to actually invalidate. Fix: each revocation path now uses RETURNING access_token_hash and DELs the cache key via new helper invalidateCachedSessions. revokeBulk handles multi-row revocations. Adds three router-level negative tests for the admin auth chain (RequireAuth + RequireRole("admin")): - TestAdminChain_UserRole_Returns403 — user role rejected with 403 - TestAdminChain_AdminRole_Passes — admin role accepted - TestAdminChain_NoBearerToken_Returns401 — missing token rejected with 401 (auth runs before role check) Repository-level regression test for the cache invalidation requires real Valkey + Postgres, currently not in test harness — flagged as TODO in planning/18-security-threat-model.md. Audit findings H1, E (negative tests for session validation, authz).	2026-04-30 22:12:09 +02:00
vikingowl	b7c88dd86a	docs(planning): add security threat model and abuse-case audit First security audit of the marktvogt backend. Covers custom auth, sessions, OAuth, TOTP+backup codes, magic links, admin endpoints, LLM pipeline, and AI-cost endpoints. Findings: - H1: revoked sessions stay valid in valkey cache (FIXED in this branch) - H2: prompt-injection via aggregator scrapes (MITIGATED via promptguard) - H3: no threat-model artefact existed (RESOLVED by this doc) - M1: VPA lost in monolithic-chart migration (FIXED in this branch) - M2: fmt.Printf used for valkey-cache failure (FIXED in this branch) - M3: AI per-day cost cap is logged-only, not enforcing (OPEN) - M4: replicaCount=1 + PDB disabled for backend+web (OPEN) - M5: no body-tampering tests on admin discovery endpoints (OPEN) - L1: panic in startup paths (acceptable, documented) - I1: Stripe pre-implementation guards (verify-first, idempotency) Audit finding H3.	2026-04-30 22:11:54 +02:00
vikingowl	6181adbba4	feat(helm): port VerticalPodAutoscaler resources into monolithic chart VPA was added to per-service charts (backend/deploy/helm, web/deploy/helm) on 2026-04-20 but lost when those charts were deleted in the 2026-04-28 monolithic chart migration. The orphan branch gitlab/feat/helm-vpa-off-mode never made it into helm/marktvogt/. Restores VPA gated under <svc>.vpa.enabled (default false), updateMode "Off" so the recommender observes without eviction. Activate via: helm upgrade --reuse-values --set backend.vpa.enabled=true \ --set web.vpa.enabled=true After ~1 week of recommender data, decide: tune resources.requests manually, or flip updateMode to "Auto" (requires PDB + replicaCount>=2). When flipping to Auto with HPA on CPU, drop "cpu" from controlledResources to avoid the HPA+VPA-on-same-metric anti-pattern. Audit finding M1.	2026-04-30 22:11:43 +02:00
vikingowl	c2bcdf0881	feat(promptguard): redact prompt-injection patterns in LLM input New pkg/promptguard.Sanitize strips known structural injection patterns (role labels, override directives, chat-template tokens, llama tokens, prompt-exfil) from third-party scraped content before it reaches Gemini. Wired into both LLM call sites: - discovery/enrich.ProviderLLMEnricher.EnrichMissing (per-source quellen) - market/research.buildUserPrompt (quellePage title + text) Defense-in-depth on top of existing structural framing (JSON envelope in research, JSON-Schema constrained decoding in enrich_b). Audit finding H2.	2026-04-30 22:11:20 +02:00
vikingowl	c1430e66b0	fix(auth): use structured logger for valkey-cache failure Replace fmt.Printf to stderr with slog.Warn so cache-degradation events are captured in Loki and queryable. Audit finding M2.	2026-04-30 22:11:02 +02:00