feat(security): close audit waves 1-4 (C1-C6, H1, H2, H4, H11, H13, H14, H16) #1
Implements the four-wave remediation plan from planning/19-security-audit-2026-04-30.md. All Critical findings (C1-C6) plus the Wave 1-4 Highs (H1, H2, H4, H11, H13, H14, H16) are closed in one branch. PoC security tests were added per wave; the full backend test suite is green; the helm chart lints clean; golangci-lint shows 0 issues.

Wave summary
pkg/safehttp blocks SSRF; promptguard German rules + NFKC + Cf-strip; ai.BudgetGate enforces AI_DAILY_CAP_USD pre-call

Test plan
- go build ./... clean
- go test ./... 18 packages pass; 8 new *_security_test.go files
- go vet ./... clean
- golangci-lint 0 issues
- helm lint helm/marktvogt 0 failures
- helm template helm/marktvogt renders both NetworkPolicies
- cmd/totp-encrypt to backfill any pre-migration TOTP rows; OAuth tokens self-rotate on next refresh
- APP_TRUSTED_PROXIES to ingress CIDR (currently empty = no proxy trust); set AI_DAILY_CAP_USD (currently 0 = no cap)

Notes
- server/routes.go:60-61. C1/C2 fixes ensure correctness when re-enabled; nothing reachable on main is regressed.
- planning/19-security-audit-2026-04-30.md.

VPA was added to the per-service charts (backend/deploy/helm, web/deploy/helm) on 2026-04-20 but lost when those charts were deleted in the 2026-04-28 monolithic chart migration. The orphan branch gitlab/feat/helm-vpa-off-mode never made it into helm/marktvogt/.

Restores VPA gated under <svc>.vpa.enabled (default false), with updateMode "Off" so the recommender observes without eviction. Activate via:

    helm upgrade --reuse-values --set backend.vpa.enabled=true \
      --set web.vpa.enabled=true

After ~1 week of recommender data, decide: tune resources.requests manually, or flip updateMode to "Auto" (requires PDB + replicaCount>=2). When flipping to Auto with HPA on CPU, drop "cpu" from controlledResources to avoid the HPA+VPA-on-same-metric anti-pattern.

Audit finding M1.

RevokeSession, RevokeSessionsByFamilyID, DeleteUserSessions, RevokeOtherSessions, and ConsumeRefreshToken updated revoked_at in Postgres but did not invalidate the Valkey access-token cache. The cache serves the original Session JSON (RevokedAt: null) until its TTL expires (JWT_ACCESS_TTL = 2h), so logout / admin-revoke / refresh-reuse-detection took up to 2h to actually invalidate.

Fix: each revocation path now uses RETURNING access_token_hash and DELs the cache key via the new helper invalidateCachedSessions. revokeBulk handles multi-row revocations.

Adds three router-level negative tests for the admin auth chain (RequireAuth + RequireRole("admin")):

- TestAdminChain_UserRole_Returns403: user role rejected with 403
- TestAdminChain_AdminRole_Passes: admin role accepted
- TestAdminChain_NoBearerToken_Returns401: missing token rejected with 401 (auth runs before role check)

A repository-level regression test for the cache invalidation requires real Valkey + Postgres, which are currently not in the test harness; flagged as TODO in planning/18-security-threat-model.md.
Audit findings H1, E (negative tests for session validation, authz).