ADR-006: Compute Platform Strategy — Kubernetes, Knative & Scale-to-Zero

Last updated: 2026-02-01

Status

Proposed — Pending engineering team review

Context

The platform runs ~35 services on GKE (Kubernetes 1.30.4) with dedicated clusters per tenant. ADR-004 proposes consolidating to a shared cluster. This ADR evaluates whether GKE/Kubernetes remains the right compute platform, and whether services can be optimized with Knative for scale-to-zero to reduce costs during low-traffic periods.

Current Compute Profile

Metric | Current
GKE clusters | 4 production + 2 dev
Node machine type | n1-standard-8 (8 vCPU, 30 GB RAM)
Autoscaling | 0-3 nodes per pool
Services per tenant | ~28-52 pods always running
CastAI | Spot instances default, 25+ services forced on-demand
Min replicas | 1 per service (always at least 1 pod running)
Traffic pattern | Consumer-facing (fans) — likely peaks during evenings/weekends

Cost Structure Concern

With ~28 services × 4 tenants = ~112 pods minimum running 24/7, many of these are low-traffic services that receive few requests during off-hours. Services like onsite-event, journey, webinar (between events), shoutout-bpm, org-manager, and tracking likely have near-zero traffic for most of the day.

Decision

Adopt a hybrid compute strategy: keep GKE Kubernetes for core always-on services, and deploy low-traffic services on Knative (Knative Serving on GKE or fully managed Cloud Run) to enable scale-to-zero.

Service Tiering

Tier | Criteria | Compute Target | Scale-to-Zero
Tier 1: Always-On | Real-time, high-traffic, stateful connections | GKE Deployment (current model) | No
Tier 2: Elastic | Moderate traffic, tolerates cold start | Knative / GKE + KEDA | Yes (min 0, scale on traffic)
Tier 3: Event-Driven | RabbitMQ consumers, batch processing | Knative / KEDA ScaledObject | Yes (scale on queue depth)

Service Classification

Tier 1: Always-On (GKE Deployment)

Service | Rationale
identity-service | Auth flows must be instant; Keycloak dependency
payment-service | Stripe webhooks must respond <5s; financial SLA
sse-service | Persistent SSE connections; cannot cold-start
chat-service | Real-time messaging; Stream Chat webhooks
notification-service | High-frequency message consumer; delivery SLA
search-service | User-facing search; latency-sensitive
content-service | Core content delivery; high traffic
Keycloak | Identity infrastructure; must always be ready
PgBouncer | Database infrastructure
RabbitMQ | Messaging infrastructure

Tier 2: Elastic (Knative / Scale-to-Zero)

Service | Traffic Pattern | Cold Start Tolerance
webinar-service | Spike before/during events, idle otherwise | Yes — events scheduled in advance
class-catalog-service | Browsing hours only | Yes — catalog pages can tolerate 1-2s
event-service (onsite) | Event days only | Yes — check-in tolerates slight delay
platform-services | Admin/analytics operations | Yes — internal-facing
shoutout-service | Sporadic (on-demand orders) | Yes — order processing is async
inventory-service | API calls during browsing | Marginal — evaluate cold start impact
message-board-service | Read-heavy, infrequent writes | Yes — SSE handles real-time

Tier 3: Event-Driven (KEDA ScaledObject)

Service | Trigger | Scale Metric
purchase-workflow | RabbitMQ queue depth | Messages in purchase-request queue
shoutout-bpm (merged) | RabbitMQ queue depth | Messages in shoutout queue
email consumer | RabbitMQ queue depth | Messages in email queue
sms consumer | RabbitMQ queue depth | Messages in sms queue

Knative Implementation

# Example: webinar-service as Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
  namespace: tenant-agile-network
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"           # scale to zero when idle
        autoscaling.knative.dev/max-scale: "5"
        autoscaling.knative.dev/target: "100"            # target concurrent requests per pod
        autoscaling.knative.dev/scale-down-delay: "5m"   # linger before scaling down to avoid thrashing
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi

KEDA Integration for RabbitMQ Consumers

The existing rabbitmq-queue-monitor (Bash script) would be replaced by KEDA ScaledObjects:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: purchase-workflow-scaler
spec:
  scaleTargetRef:
    name: purchase-workflow
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq:5672
        queueName: purchase-request
        queueLength: "5"
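
In practice the AMQP connection string usually carries credentials and should come from a Secret rather than being hard-coded in the trigger. A minimal sketch of that wiring, assuming a hypothetical Secret named rabbitmq-scaler-secret holding the full AMQP URI under the key amqpUri:

# Illustrative only: Secret name and key are placeholders
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host                 # injected as the scaler's host parameter
      name: rabbitmq-scaler-secret    # hypothetical Secret, e.g. amqp://user:pass@rabbitmq:5672/vhost
      key: amqpUri

The ScaledObject trigger would then omit the inline host and reference this object via authenticationRef: {name: rabbitmq-trigger-auth}.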

Cloud Run as Alternative

For Tier 2 services, Cloud Run (serverless) is an alternative to Knative-on-GKE:

Factor | Knative on GKE | Cloud Run
Networking | Same cluster, Istio mesh | Requires VPC connector for Cloud SQL/RabbitMQ
Cold start | ~2-5s (JVM) | ~2-5s (JVM), similar
Cost | Pay for nodes; scale-to-zero saves within cluster | Pay per request (no idle cost)
Ops complexity | Knative CRDs on existing cluster | Fully managed, separate deploy pipeline
Service mesh | Full Istio mTLS | No Istio — must implement auth differently
Database access | Direct via PgBouncer | Cloud SQL Auth Proxy / connector

Recommendation: Start with Knative on GKE for Tier 2 services (preserves Istio mesh, PgBouncer access, existing Helm deployment model). Evaluate Cloud Run for new services or if Knative operational overhead proves high.
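
For comparison, a fully managed Cloud Run deployment of the same service would use the Knative-style manifest with Cloud Run-specific annotations for VPC access. This is a sketch only; the VPC connector path is a placeholder, not an existing resource:

# Illustrative only: webinar-service on fully managed Cloud Run (connector path is a placeholder)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"      # Cloud Run uses the camelCase annotation names
        autoscaling.knative.dev/maxScale: "5"
        run.googleapis.com/vpc-access-connector: projects/favedom-dev/locations/us-central1/connectors/svc-connector
        run.googleapis.com/vpc-access-egress: private-ranges-only   # reach Cloud SQL/RabbitMQ over the VPC
    spec:
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080

Such a manifest would be applied with gcloud run services replace rather than through the existing Helm/ArgoCD pipeline, which is part of the operational trade-off noted in the table above.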

JVM Cold Start Mitigation

Spring Boot on JVM has 2-5 second cold starts. Mitigation strategies:

  1. GraalVM native images — Reduces cold start to 100-500ms. Requires build pipeline changes and compatibility testing with Spring Boot 3.x + core-lib.
  2. CRaC (Coordinated Restore at Checkpoint) — OpenJDK checkpoint/restore project; requires a CRaC-enabled JDK build and is supported by Spring Boot 3.2+. Checkpoint JVM state, restore from snapshot. Reduces cold start to ~200ms without native compilation.
  3. Keep-warm with min-scale=1 during peak hours — Schedule Knative min-scale to 1 during business hours, 0 overnight. Balances cost and latency.
  4. Proactive scaling — Use a Knative scale-down-delay of 5-15 minutes to avoid thrashing.

Recommendation: Start with option 3 (scheduled min-scale) for immediate savings, evaluate CRaC as a Phase 2 optimization.
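
A minimal sketch of the scheduled min-scale approach: a CronJob patches the Knative Service annotation before the evening peak, and a mirror-image job would set it back to "0" overnight. The ServiceAccount, image tag, and schedule are illustrative, and the ServiceAccount needs RBAC permission to patch Knative Services:

# Illustrative only: raise min-scale ahead of the evening traffic window
apiVersion: batch/v1
kind: CronJob
metadata:
  name: webinar-minscale-up
  namespace: tenant-agile-network
spec:
  schedule: "0 22 * * *"                     # illustrative; before the evening traffic window
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: knative-scaler   # hypothetical SA with patch rights on Knative Services
          restartPolicy: OnFailure
          containers:
            - name: patch-min-scale
              image: bitnami/kubectl:1.30      # any image with kubectl works
              command:
                - kubectl
                - patch
                - ksvc
                - webinar-service
                - --type=merge
                - -p
                - '{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"1"}}}}}'

Note that patching the revision template creates a new revision each time the value changes.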

Hypothesis Background

Primary: A hybrid Kubernetes + Knative model reduces compute costs by 40-60% by enabling scale-to-zero for low-traffic services while maintaining reliability for core services.

Alternative 1: Stay with standard GKE Deployments for all services. Not rejected for Tier 1 services, but wasteful for Tier 2/3: the platform pays for ~112 always-running pods when many could be at zero replicas for hours.

Alternative 2: Move entirely to Cloud Run (serverless). Rejected: core services (SSE, chat, payments, identity) need persistent connections, sub-second latency, and Istio mesh features that Cloud Run doesn’t support. Cloud Run also complicates database access (no direct PgBouncer).

Alternative 3: Move entirely off Kubernetes to a PaaS (e.g., Google App Engine, Heroku). Rejected: the platform’s complexity (35+ services, Istio mesh, multi-tenant namespace isolation, RabbitMQ, custom Helm charts) exceeds what PaaS platforms handle well. The existing GKE investment (Terraform modules, Helm charts, ArgoCD GitOps) is mature and shouldn’t be abandoned.

Alternative 4: Replace GKE with ECS/Fargate (AWS) or Azure Container Apps. Rejected: the platform is deeply GCP-native (Cloud SQL, GCS, Cloud DNS, Secret Manager, Artifact Registry, CastAI). Cloud migration would be a multi-quarter effort with no architectural benefit; the question is whether to optimize within GCP, not whether to change clouds.

Falsifiability Criteria

Evidence Quality

Evidence | Assurance
KEDA template exists in common Helm chart | L2 (verified in chart templates)
CastAI on-demand overrides reduce spot savings | L1 (25+ services on-demand, from infra docs)
Low-traffic service identification | L1 (inferred from service descriptions and traffic patterns)
Knative on GKE compatibility | L1 (GKE documentation, community adoption)
Spring Boot cold start on Knative | L1 (well-documented: 2-5s JVM, 100-500ms native)
Actual traffic patterns per service | L0 (no production metrics access)
Cost savings estimate (40-60%) | L0 (need actual billing data + traffic analysis)

Overall: L1 (WLNK capped by actual traffic patterns L0 and cost data L0)

Bounded Validity

Consequences

Positive:
  - Estimated 40-60% compute cost reduction for Tier 2/3 services (pods at zero during idle periods)
  - Replaces custom rabbitmq-queue-monitor with standard KEDA (maintained, Kubernetes-native)
  - Better resource utilization (cluster nodes freed when pods scale down)
  - Natural pressure to keep services stateless and fast-starting

Negative:
  - Knative adds CRD complexity to the cluster
  - Cold starts may impact user experience for Tier 2 services (mitigated by scheduled min-scale)
  - KEDA requires RabbitMQ metrics endpoint access
  - Two deployment models (Deployment + Knative Service) increase cognitive load

Mitigated by: Clear tier classification documented in Helm values. Common chart already supports KEDA. Knative CRDs are well-maintained. Cold start can be masked with loading states on frontend.
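
As a purely illustrative sketch of how the tier classification could be expressed in the common chart's per-service values (the key names below are assumptions, not the chart's current schema):

# Hypothetical values.yaml fragment for a Tier 2 service
tier: elastic              # always-on | elastic | event-driven
knative:
  enabled: true
  minScale: 0
  maxScale: 5
  scaleDownDelay: 5m
keda:
  enabled: false           # Tier 3 services enable this instead
  queueName: ""
  queueLength: 5

A values structure like this could let the common chart render a standard Deployment, a Knative Service, or a KEDA ScaledObject from the same values file.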


Decision date: 2026-01-31 | Review by: 2026-07-31