ADR-006: Compute Platform Strategy — Kubernetes, Knative & Scale-to-Zero
Status
Proposed — Pending engineering team review
Context
The platform runs ~35 services on GKE (Kubernetes 1.30.4) with dedicated clusters per tenant. ADR-004 proposes consolidating to a shared cluster. This ADR evaluates whether GKE/Kubernetes remains the right compute platform, and whether services can be optimized with Knative for scale-to-zero to reduce costs during low-traffic periods.
Current Compute Profile
| Metric | Current |
|---|---|
| GKE clusters | 4 production + 2 dev |
| Node machine type | n1-standard-8 (8 vCPU, 30 GB RAM) |
| Autoscaling | 0-3 nodes per pool |
| Services per tenant | ~28-52 pods always running |
| CastAI | Spot instances default, 25+ services forced on-demand |
| Min replicas | 1 per service (always at least 1 pod running) |
| Traffic pattern | Consumer-facing (fans) — likely peaks during evenings/weekends |
Cost Structure Concern
With ~28 services × 4 tenants = ~112 pods running 24/7 at a minimum, many of these pods belong to low-traffic services that receive few requests during off-hours. Services like onsite-event, journey, webinar (between events), shoutout-bpm, org-manager, and tracking likely see near-zero traffic for most of the day.
Decision
Adopt a hybrid compute strategy: Keep GKE Kubernetes for core always-on services, and deploy low-traffic services on Knative (Cloud Run for Anthos or standalone Cloud Run) to enable scale-to-zero.
Service Tiering
| Tier | Criteria | Compute Target | Scale-to-Zero |
|---|---|---|---|
| Tier 1: Always-On | Real-time, high-traffic, stateful connections | GKE Deployment (current model) | No |
| Tier 2: Elastic | Moderate traffic, tolerates cold start | Knative / GKE + KEDA | Yes (min 0, scale on traffic) |
| Tier 3: Event-Driven | RabbitMQ consumers, batch processing | Knative / KEDA ScaledObject | Yes (scale on queue depth) |
Service Classification
Tier 1: Always-On (GKE Deployment)
| Service | Rationale |
|---|---|
| identity-service | Auth flows must be instant; Keycloak dependency |
| payment-service | Stripe webhooks must respond <5s; financial SLA |
| sse-service | Persistent SSE connections; cannot cold-start |
| chat-service | Real-time messaging; Stream Chat webhooks |
| notification-service | High-frequency message consumer; delivery SLA |
| search-service | User-facing search latency sensitive |
| content-service | Core content delivery; high traffic |
| Keycloak | Identity infrastructure; must always be ready |
| PgBouncer | Database infrastructure |
| RabbitMQ | Messaging infrastructure |
Tier 2: Elastic (Knative / Scale-to-Zero)
| Service | Traffic Pattern | Cold Start Tolerance |
|---|---|---|
| webinar-service | Spike before/during events, idle otherwise | Yes — events scheduled in advance |
| class-catalog-service | Browsing hours only | Yes — catalog pages can tolerate 1-2s |
| event-service (onsite) | Event days only | Yes — check-in tolerates slight delay |
| platform-services | Admin/analytics operations | Yes — internal-facing |
| shoutout-service | Sporadic (on-demand orders) | Yes — order processing is async |
| inventory-service | API calls during browsing | Marginal — evaluate cold start impact |
| message-board-service | Read-heavy, infrequent writes | Yes — SSE handles real-time |
Tier 3: Event-Driven (KEDA ScaledObject)
| Service | Trigger | Scale Metric |
|---|---|---|
| purchase-workflow | RabbitMQ queue depth | Messages in purchase-request queue |
| shoutout-bpm (merged) | RabbitMQ queue depth | Messages in shoutout queue |
| email consumer | RabbitMQ queue depth | Messages in email queue |
| sms consumer | RabbitMQ queue depth | Messages in sms queue |
Knative Implementation
```yaml
# Example: webinar-service as a Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
  namespace: tenant-agile-network
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
        autoscaling.knative.dev/target: "100"          # target concurrent requests per pod
        autoscaling.knative.dev/scale-down-delay: "5m"
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
```
KEDA Integration for RabbitMQ Consumers
The existing rabbitmq-queue-monitor (Bash script) would be replaced by KEDA ScaledObjects:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: purchase-workflow-scaler
spec:
  scaleTargetRef:
    name: purchase-workflow
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq:5672
        queueName: purchase-request
        queueLength: "5"
```
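The host above is shown inline for brevity; in practice KEDA would read the full AMQP connection string (including credentials and vhost) from a Secret referenced by a TriggerAuthentication. A minimal sketch, assuming a hypothetical Secret named rabbitmq-credentials with a host key:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host              # maps to the rabbitmq trigger's "host" field
      name: rabbitmq-credentials   # hypothetical Secret holding the AMQP URI
      key: host
```

The ScaledObject's trigger would then reference authenticationRef: { name: rabbitmq-trigger-auth } instead of carrying an inline host.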
Cloud Run as Alternative
For Tier 2 services, Cloud Run (serverless) is an alternative to Knative-on-GKE:
| Factor | Knative on GKE | Cloud Run |
|---|---|---|
| Networking | Same cluster, Istio mesh | Requires VPC connector for Cloud SQL/RabbitMQ |
| Cold start | ~2-5s (JVM) | ~2-5s (JVM), similar |
| Cost | Pay for node + scale-to-zero saves within cluster | Pay per request (no idle cost) |
| Ops complexity | Knative CRDs on existing cluster | Fully managed, separate deploy pipeline |
| Service mesh | Full Istio mTLS | No Istio — must implement auth differently |
| Database access | Direct via PgBouncer | Cloud SQL Auth Proxy / connector |
Recommendation: Start with Knative on GKE for Tier 2 services (preserves Istio mesh, PgBouncer access, existing Helm deployment model). Evaluate Cloud Run for new services or if Knative operational overhead proves high.
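For reference, the same service on Cloud Run would use Knative-compatible YAML with run.googleapis.com annotations in place of mesh-level networking. A minimal sketch only; the VPC connector and Cloud SQL instance paths are placeholders, not existing resources:

```yaml
# Sketch of webinar-service on Cloud Run (deployable with
# `gcloud run services replace service.yaml`). Connector and instance
# names below are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # Cloud Run uses the camelCase annotation form
        autoscaling.knative.dev/maxScale: "5"
        run.googleapis.com/vpc-access-connector: projects/PROJECT_ID/locations/us-central1/connectors/CONNECTOR_NAME
        run.googleapis.com/cloudsql-instances: PROJECT_ID:us-central1:INSTANCE_NAME
    spec:
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080
```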
JVM Cold Start Mitigation
Spring Boot on JVM has 2-5 second cold starts. Mitigation strategies:
- GraalVM native images — Reduces cold start to 100-500ms. Requires build pipeline changes and compatibility testing with Spring Boot 3.x + core-lib.
- CRaC (Coordinated Restore at Checkpoint) — Java 21 feature. Checkpoint JVM state, restore from snapshot. Reduces cold start to ~200ms without native compilation.
- Keep-warm with min-scale=1 during peak hours — Schedule Knative min-scale to 1 during business hours, 0 overnight. Balances cost and latency.
- Delayed scale-down — Use Knative's scale-down-delay of 5-15 minutes to keep recently idle pods warm and avoid scale thrashing.
Recommendation: Start with option 3 (scheduled min-scale) for immediate savings, evaluate CRaC as a Phase 2 optimization.
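One possible implementation of the scheduled min-scale is a pair of CronJobs that patch the Knative Service's min-scale annotation: one raises it to "1" before the evening peak, and a mirror job (identical apart from schedule and value) drops it back to "0" overnight. A minimal sketch; the ServiceAccount, schedule, and image are illustrative, and the ServiceAccount would need RBAC permission to patch Knative Services:

```yaml
# Hypothetical CronJob raising webinar-service's min-scale ahead of the
# evening traffic peak. A second CronJob (not shown) would set it back to "0".
apiVersion: batch/v1
kind: CronJob
metadata:
  name: webinar-minscale-up
  namespace: tenant-agile-network
spec:
  schedule: "0 16 * * *"   # 16:00 UTC, before the evening peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: knative-minscale-patcher   # illustrative; needs patch rights on ksvc
          restartPolicy: OnFailure
          containers:
            - name: patch
              image: bitnami/kubectl:1.30
              command:
                - kubectl
                - patch
                - ksvc
                - webinar-service
                - -n
                - tenant-agile-network
                - --type=merge
                - -p
                - '{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"1"}}}}}'
```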
Hypothesis Background
Primary: A hybrid Kubernetes + Knative model reduces compute costs by 40-60% by enabling scale-to-zero for low-traffic services while maintaining reliability for core services.
- Evidence: At least 10 of ~18 consolidated services have traffic patterns amenable to scale-to-zero (sporadic, event-driven, or admin-only).
- Evidence: KEDA is already referenced in the common Helm chart (a _keda.yaml template exists), indicating the team has considered event-driven scaling.
- Evidence: CastAI already optimizes node costs with spot instances, but the 25+ services forced to on-demand reduce those savings. Scale-to-zero eliminates idle pods entirely.
Alternative 1: Stay with standard GKE Deployments for all services.
- Not rejected for Tier 1 services, but wasteful for Tier 2/3. The platform pays for ~112 always-running pods when many could be at zero replicas for hours.
Alternative 2: Move entirely to Cloud Run (serverless).
- Rejected: Core services (SSE, chat, payments, identity) need persistent connections, sub-second latency, and Istio mesh features that Cloud Run doesn’t support. Cloud Run also complicates database access (no direct PgBouncer).
Alternative 3: Move entirely off Kubernetes to a PaaS (e.g., Google App Engine, Heroku).
- Rejected: The platform’s complexity (35+ services, Istio mesh, multi-tenant namespace isolation, RabbitMQ, custom Helm charts) exceeds what PaaS platforms handle well. The existing GKE investment (Terraform modules, Helm charts, ArgoCD GitOps) is mature and shouldn’t be abandoned.
Alternative 4: Replace GKE with ECS/Fargate (AWS) or Azure Container Apps.
- Rejected: The platform is deeply GCP-native (Cloud SQL, GCS, Cloud DNS, Secret Manager, Artifact Registry, CastAI). A cloud migration would be a multi-quarter effort with no architectural benefit. The question is whether to optimize within GCP, not whether to change clouds.
Falsifiability Criteria
- If Knative cold starts exceed 5s for Tier 2 services under production load → move affected services back to always-on GKE Deployments
- If KEDA ScaledObjects cause message processing delays >30s during scale-from-zero → set minReplicaCount=1 for affected consumers
- If Knative CRD management overhead exceeds operational cost savings → remove Knative, keep standard Deployments with HPA
- If GraalVM/CRaC evaluation shows >20% of core-lib features are incompatible → abandon native images, rely on scheduled min-scale
Evidence Quality
| Evidence | Assurance |
|---|---|
| KEDA template exists in common Helm chart | L2 (verified in chart templates) |
| CastAI on-demand overrides reduce spot savings | L1 (25+ services on-demand, from infra docs) |
| Low-traffic service identification | L1 (inferred from service descriptions and traffic patterns) |
| Knative on GKE compatibility | L1 (GKE documentation, community adoption) |
| Spring Boot cold start on Knative | L1 (well-documented: 2-5s JVM, 100-500ms native) |
| Actual traffic patterns per service | L0 (no production metrics access) |
| Cost savings estimate (40-60%) | L0 (need actual billing data + traffic analysis) |
Overall: L1 (WLNK capped by actual traffic patterns L0 and cost data L0)
Bounded Validity
- Scope: All application services after ADR-001 consolidation. Does not apply to infrastructure services (Keycloak, PgBouncer, RabbitMQ, ArgoCD, Istio).
- Expiry: Re-evaluate if Cloud Run gains Istio mesh support and VPC connector performance matches direct networking, making full serverless viable.
- Review trigger: If actual production traffic analysis shows Tier 2 services have consistently high traffic (>50 req/s sustained), move them to Tier 1. If Tier 1 services show long idle periods, consider moving to Tier 2.
- Monitoring: Track pod uptime, cold start latency (p99), scale-from-zero events, KEDA scaling lag, and monthly compute cost per service tier.
Consequences
Positive:
- Estimated 40-60% compute cost reduction for Tier 2/3 services (pods at zero during idle periods)
- Replaces the custom rabbitmq-queue-monitor with standard KEDA (maintained, Kubernetes-native)
- Better resource utilization (cluster nodes freed when pods scale down)
- Natural pressure to keep services stateless and fast-starting
Negative:
- Knative adds CRD complexity to the cluster
- Cold starts may impact user experience for Tier 2 services (mitigated by scheduled min-scale)
- KEDA requires RabbitMQ metrics endpoint access
- Two deployment models (Deployment + Knative Service) increase cognitive load
Mitigated by: Clear tier classification documented in Helm values. Common chart already supports KEDA. Knative CRDs are well-maintained. Cold start can be masked with loading states on frontend.
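A hypothetical shape for that per-service tier classification in the common chart's values (key names are illustrative; the chart's actual schema may differ):

```yaml
# Hypothetical values.yaml fragment — key names are illustrative only.
webinar-service:
  tier: elastic            # always-on | elastic | event-driven
  knative:
    enabled: true
    minScale: 0
    maxScale: 5
purchase-workflow:
  tier: event-driven
  keda:
    enabled: true          # would render the existing _keda.yaml template
    queueName: purchase-request
    queueLength: 5
identity-service:
  tier: always-on          # standard Deployment with HPA, as today
```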
Decision date: 2026-01-31
Review by: 2026-07-31