CI deploy steps now target helm/marktvogt with --reset-then-reuse-values,
preserving the other service's image tag across pipeline runs. Each pipeline
sets only its own X.image.tag.
App-level secrets (smtp/turnstile/discovery/ai/JWT/oauth) moved out of CI's
--set chain in the previous phase; they are now pre-created via
scripts/k8s-secrets-sync.sh from .env.helm. The chart's conditional secret
templates remain for backward-compat with the live release's stored values
but will be removed in a follow-up once those values are cleared.
Old per-service chart directories deleted; only the monolithic
helm/marktvogt/ remains.
MIGRATION.md updated with the actual procedure that worked, including
several pitfalls hit during the live tenant-2 migration on 2026-04-28
(helm uninstall trap, SSA field-manager swap for CRDs, kyverno hostname
allowlist for new subdomains).
Merge-plan and research-plan both call Gemini, which can take >60s; the
default gateway timeout was killing those connections with a 504.
- Web HTTPRoute: add /admin/ rule with 120s request+backendRequest timeout
- Backend HTTPRoute: add /api/v1/admin/markets/ rule with 120s timeout
- MergePlan handler: add 110s context deadline for graceful degradation
before the gateway cuts the upstream connection
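A minimal sketch of that handler-side deadline, assuming a plain net/http
handler (handler and service names here are hypothetical): the 110s budget
sits deliberately under the route's 120s so the handler can emit a real
error body before the gateway severs the stream.

```go
package admin

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"
	"time"
)

// Planner stands in for the real merge-plan service (hypothetical name).
type Planner interface {
	MergePlan(ctx context.Context, marketID string) (any, error)
}

type Handler struct{ planner Planner }

func (h *Handler) MergePlan(w http.ResponseWriter, r *http.Request) {
	// 110s < the route's 120s timeout: our deadline fires first, so the
	// client gets a JSON error instead of a cut connection.
	ctx, cancel := context.WithTimeout(r.Context(), 110*time.Second)
	defer cancel()

	plan, err := h.planner.MergePlan(ctx, r.PathValue("id")) // Go 1.22+ routing
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		http.Error(w, "merge-plan timed out, try again", http.StatusGatewayTimeout)
	case err != nil:
		http.Error(w, err.Error(), http.StatusInternalServerError)
	default:
		_ = json.NewEncoder(w).Encode(plan)
	}
}
```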
Replace the Mistral + Ollama AI stack with a single Google Gemini provider
backed by google.golang.org/genai. API key moves from env/Helm to the DB
(AES-256-GCM, key derived from JWT_SECRET via HKDF) so it can be rotated
via the admin UI without a pod restart.
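A sketch of what that at-rest scheme can look like with the standard
library plus golang.org/x/crypto/hkdf; the function names and the HKDF info
label are illustrative, not the actual pkg/crypto/secretbox API.

```go
package secretbox

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"io"

	"golang.org/x/crypto/hkdf"
)

// DeriveKey stretches JWT_SECRET into a dedicated 32-byte AES-256 key via
// HKDF-SHA256 so the signing secret is never used directly as a cipher key.
func DeriveKey(jwtSecret []byte) ([]byte, error) {
	key := make([]byte, 32)
	kdf := hkdf.New(sha256.New, jwtSecret, nil, []byte("ai-api-key-v1")) // label is illustrative
	if _, err := io.ReadFull(kdf, key); err != nil {
		return nil, err
	}
	return key, nil
}

// Seal encrypts plaintext with AES-256-GCM and prepends the random nonce.
func Seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open splits off the nonce and decrypts; a wrong key or tampered
// ciphertext fails GCM authentication and returns an error.
func Open(key, box []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(box) < gcm.NonceSize() {
		return nil, errors.New("secretbox: ciphertext too short")
	}
	return gcm.Open(nil, box[:gcm.NonceSize()], box[gcm.NonceSize():], nil)
}
```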
New:
- pkg/crypto/secretbox — AES-256-GCM encrypt/decrypt for secrets at rest
- pkg/ai/gemini — GeminiProvider with grounding, structured output, usage
recording, and hot-reload (Reinitialize swaps the client under a mutex;
sketched after this list)
- pkg/ai/usage — UsageRecorder interface + UsageEvent struct
- domain/settings/store — DB-backed settings (model, grounding toggle, key)
- domain/settings/usage — UsageRepo implementing UsageRecorder; ai_usage table
- migrations 000021 (system_settings) + 000022 (ai_usage)
- settings API: GET /ai, POST /ai/key, POST /ai/model, POST /ai/grounding,
GET /ai/usage
- admin UI: 4-card settings page — provider status, model selector, grounding
toggle with quota, usage rollups + recent-calls table
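The hot-reload bullet above references this sketch: a client swap under an
RWMutex, assuming the google.golang.org/genai constructor; every name
beyond Reinitialize is a guess at the real GeminiProvider.

```go
package gemini

import (
	"context"
	"sync"

	"google.golang.org/genai"
)

// Provider approximates GeminiProvider's hot-reload: Reinitialize builds a
// client for the freshly decrypted key and swaps it in under a write lock,
// so a key rotated in the admin UI takes effect without a pod restart.
type Provider struct {
	mu     sync.RWMutex
	client *genai.Client
	model  string
}

func (p *Provider) Reinitialize(ctx context.Context, apiKey, model string) error {
	c, err := genai.NewClient(ctx, &genai.ClientConfig{APIKey: apiKey})
	if err != nil {
		return err
	}
	p.mu.Lock()
	p.client, p.model = c, model
	p.mu.Unlock()
	return nil
}

// snapshot pins one client/model pair for the duration of a single call;
// in-flight requests keep the old client, new ones see the rotated key.
func (p *Provider) snapshot() (*genai.Client, string) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.client, p.model
}
```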
Removed:
- pkg/ai/ollama, mistral_provider, ratelimiter (+ tests)
- Helm AI_API_KEY, AI_PROVIDER, AI_MODEL_COMPLEX, AI_AGENT_DISCOVERY,
AI_RATE_LIMIT_RPS env vars
Call sites set Grounded+CallType: research (true/"research"), enrich Pass B
(true/"enrich_b"), similarity (false/"similarity"). Integration test updated
to use a stub ai.Provider instead of a fake Ollama HTTP server.
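A plausible shape for that stub, assuming the ai.Provider contract roughly
mirrors the Grounded/CallType fields mentioned above (the real types likely
carry more):

```go
package ai_test

import (
	"context"
	"fmt"
)

// Request and Response approximate the ai package's types; only the fields
// named in this log (Grounded, CallType) are taken from it.
type Request struct {
	Prompt   string
	Grounded bool
	CallType string
}

type Response struct{ Text string }

type Provider interface {
	Complete(ctx context.Context, req Request) (Response, error)
}

// stubProvider returns canned responses keyed by CallType, so the
// integration test exercises the pipeline without any HTTP server.
type stubProvider struct{ responses map[string]Response }

func (s *stubProvider) Complete(_ context.Context, req Request) (Response, error) {
	if resp, ok := s.responses[req.CallType]; ok {
		return resp, nil
	}
	return Response{}, fmt.Errorf("stub: unexpected call type %q", req.CallType)
}
```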
Adds pkg/search (SearxNG impl), domain/market/research (orchestrator + embedded
German prompt and JSON schema), and reinstates POST /markets/:id/research on
top of the new pipeline. Seeds URLs from crawler provenance; falls back to
search when fewer than two distinct seed domains are known.
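The seeding rule might look like this sketch; Searcher, Market, and the
orchestrator wiring are hypothetical, and only the "fewer than two distinct
domains" threshold comes from the change itself:

```go
package research

import (
	"context"
	"net/url"
)

// Searcher abstracts pkg/search's SearxNG client (shape assumed).
type Searcher interface {
	Query(ctx context.Context, q string) ([]string, error)
}

type Market struct {
	Name, City     string
	ProvenanceURLs []string
}

type Orchestrator struct{ search Searcher }

// distinctDomains counts unique hostnames among crawler-provenance URLs.
func distinctDomains(seeds []string) int {
	hosts := map[string]struct{}{}
	for _, s := range seeds {
		if u, err := url.Parse(s); err == nil && u.Hostname() != "" {
			hosts[u.Hostname()] = struct{}{}
		}
	}
	return len(hosts)
}

// seedURLs prefers provenance and falls back to web search when fewer than
// two distinct seed domains are known.
func (o *Orchestrator) seedURLs(ctx context.Context, m Market) ([]string, error) {
	if distinctDomains(m.ProvenanceURLs) >= 2 {
		return m.ProvenanceURLs, nil
	}
	return o.search.Query(ctx, m.Name+" "+m.City)
}
```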
- HTTPRoute: add 300s request+backendRequest timeout rule for
/api/v1/admin/discovery/crawl; default rule unchanged. nginx-gateway's
60s default was cutting the connection mid-crawl.
- Service.Crawl: detach the insert pipeline from the HTTP request context
with a 3-minute internal timeout (sketched below, after this list).
Previously a canceled request ctx cascaded into the link-verifier, failing
every URL check and counting every merged event as LinkCheckFailed.
Inserts now complete even if the gateway cut the connection.
- Log CrawlSummary at INFO on completion so outcomes are visible in
backend logs without needing the HTTP response body.
- New test: TestServiceCrawlDetachesInsertContextFromRequestCtx.
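The detach in the Service.Crawl bullet can be sketched with
context.WithoutCancel (Go 1.21+); names other than the 3-minute budget are
illustrative:

```go
package discovery

import (
	"context"
	"time"
)

type MergedEvent struct{ URL string }

type inserter interface {
	Run(ctx context.Context, events []MergedEvent) error
}

type Service struct{ inserts inserter }

// insertDetached keeps the request context's values (trace IDs, auth) but
// drops its cancellation, then applies an internal 3-minute deadline. The
// downstream link-verifier sees a live context even after the gateway has
// cut the HTTP connection, so URL checks no longer fail wholesale.
func (s *Service) insertDetached(reqCtx context.Context, events []MergedEvent) error {
	ctx, cancel := context.WithTimeout(context.WithoutCancel(reqCtx), 3*time.Minute)
	defer cancel()
	return s.inserts.Run(ctx, events)
}
```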
Service listens on port 80 (target: container 8080). The CronJob was
curling :8080 directly, a port the Service doesn't expose, so every tick
timed out after ~135s with "Could not connect to server".
Switch to {{ .Values.service.port }} so the template always tracks the
actual Service port.
Previous deploys emitted four warnings on the discovery-tick Pod template
against the restricted:latest policy. Today these are only warnings; if
namespace enforcement tightens, admission will silently drop the Pod.
Pod-level: runAsNonRoot, runAsUser/runAsGroup 100 (curlimages/curl's
built-in non-root UID), seccompProfile RuntimeDefault.
Container-level: allowPrivilegeEscalation false, capabilities drop ALL.
Adds a batch/v1 CronJob that POSTs to /api/v1/admin/discovery/tick on a
configurable schedule (default every 4h). Wires DISCOVERY_TOKEN into the
ci-secrets Secret and projects discovery/AI env vars into the backend
Deployment.
- Set resources req=limit (100m/128Mi) for Guaranteed QoS class
- Add ConfigMap checksum annotation to trigger rollouts on config changes
- Add retry limit (60 attempts) to migration init container
- Use TARGETARCH in Dockerfile for multi-arch build support
- Set GOMAXPROCS and GOMEMLIMIT from cgroup limits to prevent
thread oversubscription and unbounded GC memory growth (see the
sketch below)
- Add startup probe (60s budget) to gate liveness/readiness during
connection pool initialization
- Increase liveness failureThreshold to 5 to avoid restarts on
transient issues
- Remove initialDelaySeconds (startup probe replaces this)
- Upgrade CI from alpine/helm:3.17 to alpine/helm:4.1
- Replace deprecated --atomic with --rollback-on-failure + --wait=watcher
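The GOMAXPROCS/GOMEMLIMIT bullet above references this sketch. The log
doesn't say whether the limits are wired via env vars in the chart or set
inside the binary; one common in-process approach is the
automaxprocs/automemlimit pair, shown here as an assumption:

```go
package main

import (
	// Both packages act in init(), reading cgroup v1/v2 limits.
	_ "github.com/KimMachineGun/automemlimit" // GOMEMLIMIT ~90% of the memory limit
	_ "go.uber.org/automaxprocs"              // GOMAXPROCS = container CPU quota (min 1)
)

func main() {
	// Start the backend as usual; the scheduler and GC now respect the
	// pod's 100m CPU / 128Mi limits instead of the node's full capacity.
}
```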
BusyBox 1.37 nc -z is broken (outputs "punt!" and never exits),
causing the wait-for-cache init container to loop indefinitely.
The cache is healthy; the backend should handle reconnects itself.
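Assuming the backend talks to Dragonfly through go-redis (the cache speaks
the Redis protocol; the client library is a guess), owning reconnects
instead of an init-container wait can look like:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// New dials the cache and tolerates it coming up after the backend:
// go-redis reconnects on its own, so a bounded readiness ping replaces the
// init-container wait. The address and budgets here are assumptions.
func New(ctx context.Context, addr string) (*redis.Client, error) {
	c := redis.NewClient(&redis.Options{
		Addr:            addr,
		MaxRetries:      5,
		MinRetryBackoff: 100 * time.Millisecond,
		MaxRetryBackoff: 2 * time.Second,
	})
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	for {
		if err := c.Ping(ctx).Err(); err == nil {
			return c, nil
		}
		select {
		case <-ctx.Done():
			// Hand back the client anyway: later calls retry once the
			// cache is reachable, matching "handle reconnects itself".
			return c, ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```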
Prevents the backend from starting before the DragonflyDB operator
has the cache pod ready and reachable. Mirrors the existing
wait-for-postgres pattern in the migration job.
Replace manual Valkey Deployment+Service with DragonflyDB operator CRD.
Add sectionName to HTTPRoute for HTTPS listener pinning and a separate
HTTP→HTTPS 301 redirect route. Update resources from req=limit to
request/limit separation for pay-as-you-go billing. Fix NetworkPolicy
cache pod selector to match operator-managed labels.
Add Woodpecker secrets for AI_API_KEY, AI_AGENT_SIMPLE, and
TURNSTILE_SECRET_KEY. Create ci-secrets.yaml template and wire
them through the deploy step alongside existing SMTP secrets.
Set CPU and memory requests equal to limits (100m/100Mi) for backend,
cache, and web. Switch rolling update strategy to maxSurge=1,
maxUnavailable=0 so new pods start before old ones terminate.
Add readiness probe to cache deployment.
maxSurge=1 requires a second pod during rollout, but the tenant
ResourceQuota (1 CPU limit) is already at 900m; the extra 250m for the
surge pod exceeds the cap and the pod can't schedule, causing a 5min
timeout.
Switch to maxSurge=0/maxUnavailable=1 (kill-then-start) to stay
within quota. Matches the web deployment strategy.
- Add SMTP_PORT, SMTP_FROM, ADMIN_EMAIL, FRONTEND_URL to ConfigMap
- Add Helm-managed SMTP secret for credentials (host, user, password)
- Wire Woodpecker secrets into deploy step via --set flags
- SMTP secret conditionally created only when values are provided
- Admin CRUD endpoints for markets with role-based middleware
- Anonymous market submission with Cloudflare Turnstile verification
- SMTP email notifications on new submissions (LogSender fallback)
- Market status workflow (pending/approved/rejected) with admin notes
- Nullable location column for submissions without coordinates
- CLI tool for promoting users to admin role
- Slug generation package extracted from seed
- Rate limiting on the submission endpoint (3/hour per IP; see the
sketch below)
- Mailpit added to docker-compose for local email testing
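One way to implement the 3/hour per-IP limit referenced in the list above,
using golang.org/x/time/rate (a sketch; eviction of idle IPs and
proxy-header handling are omitted):

```go
package submit

import (
	"net/http"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// ipLimiter hands out one token bucket per client IP: burst 3, refilling
// one submission every 20 minutes, i.e. 3/hour on average.
type ipLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func newIPLimiter() *ipLimiter {
	return &ipLimiter{limiters: map[string]*rate.Limiter{}}
}

func (l *ipLimiter) allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	lim, ok := l.limiters[ip]
	if !ok {
		lim = rate.NewLimiter(rate.Every(20*time.Minute), 3)
		l.limiters[ip] = lim
	}
	return lim.Allow()
}

// Middleware rejects over-quota submissions with 429.
func (l *ipLimiter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.allow(r.RemoteAddr) { // real code would strip the port / honor X-Forwarded-For
			http.Error(w, "too many submissions", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```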
Single-replica deployment with tight CPU quota (1 core) cannot run two
pods simultaneously during a rolling update. Recreate kills the old pod
before starting the new one.
Tenant SA lacks dragonflydb.io CRD permissions. Use a standard
Valkey Deployment+Service instead. Also re-enable CNPG (created via
kubectl) and the migrate job, and add a seccompProfile to the migrate pod.