Implements the remediation pass described in
planning/19-security-audit-2026-04-30.md. All Critical findings and the
Wave 1-4 High findings are closed; PoC tests added; full backend test
suite green; helm chart lints clean.
Wave 1 - Auth & identity
- C1 OAuth state nonce: PutOAuthState / ConsumeOAuthState (valkey,
GETDEL single-use, 15min TTL); Callback rejects missing/forged/cross-
provider state before token exchange.
- C2 OAuth identity linking: refuse silent linking to existing user
unless info.EmailVerified is true. fetchGitHubUser now consults the
/user/emails endpoint for the verified flag (no more hardcoded true);
fetchFacebookUser sets EmailVerified=false (FB exposes no per-email
verification flag).
- H1 Magic-link verify: replaced Get + MarkUsed with a single atomic
UPDATE...RETURNING (ConsumeMagicLink) - TOCTOU-free.
- H2 TOTP code replay: MarkTOTPCodeConsumed (valkey SET NX, 120s TTL)
prevents replay of a successfully validated code; fails closed on
transient store errors.
- H3 Backup-code orphan: DisableTOTP now also wipes totp_backup_codes.
Wave 2 - Middleware & network
- C3 CORS/CSRF regex anchoring: NewCORSConfig wraps each pattern with
\A...\z so substring spoofing of origins is impossible.
- H4 ClientIP: server reads APP_TRUSTED_PROXIES; gin SetTrustedProxies
is called explicitly (empty default = no proxy trust).
- H11 Body limit + DisallowUnknownFields: BodyLimitBytes middleware
(1 MiB default) wraps every request; validate.BindJSON now uses a
json.Decoder with DisallowUnknownFields and rejects trailing tokens;
413 envelope on body-limit overflow.
- H16 NetworkPolicy: backend.networkPolicy.enabled defaults to true;
new web-networkpolicy.yaml restricts web pod ingress to nginx-gateway
and egress to backend service + DNS + 443.
Wave 3 - Encryption at rest
- C4 TOTP secrets: CreateTOTPSecret writes encrypted secret_v2;
GetTOTPSecret prefers v2 with legacy fallback.
- C5 OAuth tokens: migration 000033 adds *_v2 columns; CreateOAuthAccount
and UpdateOAuthTokens write encrypted; GetOAuthAccount reads v2 with
legacy fallback.
- M1 Domain separation: crypto.DeriveKeyFor(secret, purpose) replaces
single-purpose DeriveKey; settings, totp, oauth each use a distinct
HKDF-derived subkey. DeriveKey kept as back-compat alias for settings.
Wave 4 - Input & AI safety
- C6 SSRF: new pkg/safehttp refuses to dial RFC1918, loopback, link-
local, ULA, multicast, unspecified, or cloud-metadata IPs; scheme
allowlist (http/https). Wired into pkg/scrape, discovery LinkChecker,
and imageURLReachable. NewForTesting opt-in for httptest.
- H13 PromptGuard German + Unicode: NFKC + Cf-class strip pre-pass
closes zero-width and full-width-homoglyph bypasses; new German rules
for ignoriere/missachte/vergiss/role-escalation/prompt-exfil/verbatim;
Gemma-style and pipe-delimited chat-template tokens covered;
source-fence rule prevents '=== Quelle:' splice in scraped text.
- H14 BudgetGate: new ai.BudgetGate interface; UsageRepo.CheckBudget
reads today's SUM(estimated_cost_usd) (10s cache) and refuses calls
when AI_DAILY_CAP_USD is exceeded; GeminiProvider.Chat checks the
gate before contacting Gemini.
OAuth routes remain disabled in server/routes.go, so C1/C2 are not
actively reachable today; fixes ensure correctness when re-enabled.
Replaces the Python pre-commit framework (.pre-commit-config.yaml) with
husky 9, kept faithful to the existing checks:
- Block direct commits to main (was: no-commit-to-branch).
- git diff --cached --check covers trailing-whitespace and merge-conflict
marker detection.
- Custom large-file check (>500KB), excluding crawler test fixtures.
- Backend Go: gofmt -l (fail on diff), go vet, golangci-lint — only when
backend/*.go is staged.
- Backend deps: go mod tidy -diff — only when go.mod/go.sum is staged.
- Web: prettier --check, eslint, svelte-check — only when web/ is
staged.
lint-staged was intentionally not adopted — the previous config ran
hooks tree-wide (pass_filenames: false), so per-file optimisation would
be a behaviour change.
Install: pnpm install at repo root (the prepare script wires husky into
.git/hooks via core.hooksPath=.husky/_).
RevokeSession, RevokeSessionsByFamilyID, DeleteUserSessions,
RevokeOtherSessions, and ConsumeRefreshToken updated revoked_at in
Postgres but did not invalidate the valkey access-token cache. The cache
serves the original Session JSON (RevokedAt: null) until its TTL expires
(JWT_ACCESS_TTL = 2h), so logout / admin-revoke / refresh-reuse-detection
took up to 2h to actually invalidate.
Fix: each revocation path now uses RETURNING access_token_hash and DELs
the cache key via new helper invalidateCachedSessions. revokeBulk handles
multi-row revocations.
Adds three router-level negative tests for the admin auth chain
(RequireAuth + RequireRole("admin")):
- TestAdminChain_UserRole_Returns403 — user role rejected with 403
- TestAdminChain_AdminRole_Passes — admin role accepted
- TestAdminChain_NoBearerToken_Returns401 — missing token rejected with
401 (auth runs before role check)
Repository-level regression test for the cache invalidation requires
real Valkey + Postgres, currently not in test harness — flagged as TODO
in planning/18-security-threat-model.md.
Audit findings H1, E (negative tests for session validation, authz).
First security audit of the marktvogt backend. Covers custom auth,
sessions, OAuth, TOTP+backup codes, magic links, admin endpoints,
LLM pipeline, and AI-cost endpoints.
Findings:
- H1: revoked sessions stay valid in valkey cache (FIXED in this branch)
- H2: prompt-injection via aggregator scrapes (MITIGATED via promptguard)
- H3: no threat-model artefact existed (RESOLVED by this doc)
- M1: VPA lost in monolithic-chart migration (FIXED in this branch)
- M2: fmt.Printf used for valkey-cache failure (FIXED in this branch)
- M3: AI per-day cost cap is logged-only, not enforcing (OPEN)
- M4: replicaCount=1 + PDB disabled for backend+web (OPEN)
- M5: no body-tampering tests on admin discovery endpoints (OPEN)
- L1: panic in startup paths (acceptable, documented)
- I1: Stripe pre-implementation guards (verify-first, idempotency)
Audit finding H3.
VPA was added to per-service charts (backend/deploy/helm, web/deploy/helm)
on 2026-04-20 but lost when those charts were deleted in the 2026-04-28
monolithic chart migration. The orphan branch gitlab/feat/helm-vpa-off-mode
never made it into helm/marktvogt/.
Restores VPA gated under <svc>.vpa.enabled (default false), updateMode
"Off" so the recommender observes without eviction. Activate via:
helm upgrade --reuse-values --set backend.vpa.enabled=true \
--set web.vpa.enabled=true
After ~1 week of recommender data, decide: tune resources.requests
manually, or flip updateMode to "Auto" (requires PDB + replicaCount>=2).
When flipping to Auto with HPA on CPU, drop "cpu" from
controlledResources to avoid the HPA+VPA-on-same-metric anti-pattern.
Audit finding M1.
New pkg/promptguard.Sanitize strips known structural injection patterns
(role labels, override directives, chat-template tokens, llama tokens,
prompt-exfil) from third-party scraped content before it reaches Gemini.
Wired into both LLM call sites:
- discovery/enrich.ProviderLLMEnricher.EnrichMissing (per-source quellen)
- market/research.buildUserPrompt (quellePage title + text)
Defense-in-depth on top of existing structural framing (JSON envelope in
research, JSON-Schema constrained decoding in enrich_b).
Audit finding H2.