ADR-006: Compute Platform Strategy — Kubernetes, Knative & Scale-to-Zero

Last updated: 2026-02-01

Status

Proposed — Pending engineering team review

Context

The platform runs ~35 services on GKE (Kubernetes 1.30.4) with dedicated clusters per tenant. ADR-004 proposes consolidating to a shared cluster. This ADR evaluates whether GKE/Kubernetes remains the right compute platform, and whether services can be optimized with Knative for scale-to-zero to reduce costs during low-traffic periods.

Current Compute Profile

Metric | Current
GKE clusters | 4 production + 2 dev
Node machine type | n1-standard-8 (8 vCPU, 30 GB RAM)
Autoscaling | 0-3 nodes per pool
Services per tenant | ~28-52 pods always running
CastAI | Spot instances default, 25+ services forced on-demand
Min replicas | 1 per service (always at least 1 pod running)
Traffic pattern | Consumer-facing (fans) — likely peaks during evenings/weekends

Cost Structure Concern

With ~28 services × 4 tenants = ~112 pods minimum running 24/7, many of these are low-traffic services that receive few requests during off-hours. Services like onsite-event, journey, webinar (between events), shoutout-bpm, org-manager, and tracking likely have near-zero traffic for most of the day.

Decision

Adopt a hybrid compute strategy: keep GKE Kubernetes for core always-on services, and deploy low-traffic services on Knative (Knative Serving on GKE or fully managed Cloud Run) to enable scale-to-zero.

Service Tiering

Tier | Criteria | Compute Target | Scale-to-Zero
Tier 1: Always-On | Real-time, high-traffic, stateful connections | GKE Deployment (current model) | No
Tier 2: Elastic | Moderate traffic, tolerates cold start | Knative / GKE + KEDA | Yes (min 0, scale on traffic)
Tier 3: Event-Driven | RabbitMQ consumers, batch processing | Knative / KEDA ScaledObject | Yes (scale on queue depth)

Service Classification

Tier 1: Always-On (GKE Deployment)

Service | Rationale
identity-service | Auth flows must be instant; Keycloak dependency
payment-service | Stripe webhooks must respond <5s; financial SLA
sse-service | Persistent SSE connections; cannot cold-start
chat-service | Real-time messaging; Stream Chat webhooks
notification-service | High-frequency message consumer; delivery SLA
search-service | User-facing search; latency-sensitive
content-service | Core content delivery; high traffic
Keycloak | Identity infrastructure; must always be ready
PgBouncer | Database infrastructure
RabbitMQ | Messaging infrastructure

Tier 2: Elastic (Knative / Scale-to-Zero)

Service | Traffic Pattern | Cold Start Tolerance
webinar-service | Spike before/during events, idle otherwise | Yes — events scheduled in advance
class-catalog-service | Browsing hours only | Yes — catalog pages can tolerate 1-2s
event-service (onsite) | Event days only | Yes — check-in tolerates slight delay
platform-services | Admin/analytics operations | Yes — internal-facing
shoutout-service | Sporadic (on-demand orders) | Yes — order processing is async
inventory-service | API calls during browsing | Marginal — evaluate cold start impact
message-board-service | Read-heavy, infrequent writes | Yes — SSE handles real-time

Tier 3: Event-Driven (KEDA ScaledObject)

Service | Trigger | Scale Metric
purchase-workflow | RabbitMQ queue depth | Messages in purchase-request queue
shoutout-bpm (merged) | RabbitMQ queue depth | Messages in shoutout queue
email consumer | RabbitMQ queue depth | Messages in email queue
sms consumer | RabbitMQ queue depth | Messages in sms queue

Knative Implementation

# Example: webinar-service as Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
  namespace: tenant-agile-network
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"           # scale to zero when idle
        autoscaling.knative.dev/max-scale: "5"
        autoscaling.knative.dev/target: "100"            # target concurrent requests per pod
        autoscaling.knative.dev/scale-down-delay: "5m"   # linger before scaling down to avoid thrashing
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi

KEDA Integration for RabbitMQ Consumers

The existing rabbitmq-queue-monitor (Bash script) would be replaced by KEDA ScaledObjects:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: purchase-workflow-scaler
spec:
  scaleTargetRef:
    name: purchase-workflow
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq:5672
        queueName: purchase-request
        queueLength: "5"
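
In practice the AMQP connection string usually carries credentials and should come from a Secret rather than being hard-coded in the trigger. A minimal sketch of that wiring, assuming a hypothetical Secret named rabbitmq-scaler-secret holding the full AMQP URI under the key amqpUri:

# Illustrative only: Secret name and key are placeholders
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host                 # injected as the scaler's host parameter
      name: rabbitmq-scaler-secret    # hypothetical Secret, e.g. amqp://user:pass@rabbitmq:5672/vhost
      key: amqpUri

The ScaledObject trigger would then omit the inline host and reference this object via authenticationRef: {name: rabbitmq-trigger-auth}.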

Cloud Run as Alternative

For Tier 2 services, Cloud Run (serverless) is an alternative to Knative-on-GKE:

Factor | Knative on GKE | Cloud Run
Networking | Same cluster, Istio mesh | Requires VPC connector for Cloud SQL/RabbitMQ
Cold start | ~2-5s (JVM) | ~2-5s (JVM), similar
Cost | Pay for nodes; scale-to-zero saves within cluster | Pay per request (no idle cost)
Ops complexity | Knative CRDs on existing cluster | Fully managed, separate deploy pipeline
Service mesh | Full Istio mTLS | No Istio — must implement auth differently
Database access | Direct via PgBouncer | Cloud SQL Auth Proxy / connector

Recommendation: Start with Knative on GKE for Tier 2 services (preserves Istio mesh, PgBouncer access, existing Helm deployment model). Evaluate Cloud Run for new services or if Knative operational overhead proves high.
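
For comparison, a fully managed Cloud Run deployment of the same service would use the Knative-style manifest with Cloud Run-specific annotations for VPC access. This is a sketch only; the VPC connector path is a placeholder, not an existing resource:

# Illustrative only: webinar-service on fully managed Cloud Run (connector path is a placeholder)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webinar-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"      # Cloud Run uses the camelCase annotation names
        autoscaling.knative.dev/maxScale: "5"
        run.googleapis.com/vpc-access-connector: projects/favedom-dev/locations/us-central1/connectors/svc-connector
        run.googleapis.com/vpc-access-egress: private-ranges-only   # reach Cloud SQL/RabbitMQ over the VPC
    spec:
      containers:
        - image: us-central1-docker.pkg.dev/favedom-dev/docker/webinar:latest
          ports:
            - containerPort: 8080

Such a manifest would be applied with gcloud run services replace rather than through the existing Helm/ArgoCD pipeline, which is part of the operational trade-off noted in the table above.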

JVM Cold Start Mitigation

Spring Boot on JVM has 2-5 second cold starts. Mitigation strategies:

  1. GraalVM native images — Reduces cold start to 100-500ms. Requires build pipeline changes and compatibility testing with Spring Boot 3.x + core-lib.
  2. CRaC (Coordinated Restore at Checkpoint) — OpenJDK checkpoint/restore project; requires a CRaC-enabled JDK build and is supported by Spring Boot 3.2+. Checkpoint JVM state, restore from snapshot. Reduces cold start to ~200ms without native compilation.
  3. Keep-warm with min-scale=1 during peak hours — Schedule Knative min-scale to 1 during business hours, 0 overnight. Balances cost and latency.
  4. Proactive scaling — Use a Knative scale-down-delay of 5-15 minutes to avoid thrashing.

Recommendation: Start with option 3 (scheduled min-scale) for immediate savings, evaluate CRaC as a Phase 2 optimization.
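
A minimal sketch of the scheduled min-scale approach: a CronJob patches the Knative Service annotation before the evening peak, and a mirror-image job would set it back to "0" overnight. The ServiceAccount, image tag, and schedule are illustrative, and the ServiceAccount needs RBAC permission to patch Knative Services:

# Illustrative only: raise min-scale ahead of the evening traffic window
apiVersion: batch/v1
kind: CronJob
metadata:
  name: webinar-minscale-up
  namespace: tenant-agile-network
spec:
  schedule: "0 22 * * *"                     # illustrative; before the evening traffic window
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: knative-scaler   # hypothetical SA with patch rights on Knative Services
          restartPolicy: OnFailure
          containers:
            - name: patch-min-scale
              image: bitnami/kubectl:1.30      # any image with kubectl works
              command:
                - kubectl
                - patch
                - ksvc
                - webinar-service
                - --type=merge
                - -p
                - '{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"1"}}}}}'

Note that patching the revision template creates a new revision each time the value changes.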

Hypothesis Background

Primary: A hybrid Kubernetes + Knative model reduces compute costs by 40-60% by enabling scale-to-zero for low-traffic services while maintaining reliability for core services.

Alternative 1: Stay with standard GKE Deployments for all services. Not rejected for Tier 1 services, but wasteful for Tier 2/3: the platform pays for ~112 always-running pods when many could be at zero replicas for hours.

Alternative 2: Move entirely to Cloud Run (serverless). Rejected: core services (SSE, chat, payments, identity) need persistent connections, sub-second latency, and Istio mesh features that Cloud Run doesn’t support. Cloud Run also complicates database access (no direct PgBouncer).

Alternative 3: Move entirely off Kubernetes to a PaaS (e.g., Google App Engine, Heroku). Rejected: the platform’s complexity (35+ services, Istio mesh, multi-tenant namespace isolation, RabbitMQ, custom Helm charts) exceeds what PaaS platforms handle well. The existing GKE investment (Terraform modules, Helm charts, ArgoCD GitOps) is mature and shouldn’t be abandoned.

Alternative 4: Replace GKE with ECS/Fargate (AWS) or Azure Container Apps. Rejected: the platform is deeply GCP-native (Cloud SQL, GCS, Cloud DNS, Secret Manager, Artifact Registry, CastAI). Cloud migration would be a multi-quarter effort with no architectural benefit; the question is whether to optimize within GCP, not whether to change clouds.

Falsifiability Criteria

Evidence Quality

Evidence | Assurance
KEDA template exists in common Helm chart | L2 (verified in chart templates)
CastAI on-demand overrides reduce spot savings | L1 (25+ services on-demand, from infra docs)
Low-traffic service identification | L1 (inferred from service descriptions and traffic patterns)
Knative on GKE compatibility | L1 (GKE documentation, community adoption)
Spring Boot cold start on Knative | L1 (well-documented: 2-5s JVM, 100-500ms native)
Actual traffic patterns per service | L0 (no production metrics access)
Cost savings estimate (40-60%) | L0 (need actual billing data + traffic analysis)

Overall: L1 (WLNK capped by actual traffic patterns L0 and cost data L0)

Bounded Validity

Consequences

Positive:
  - Estimated 40-60% compute cost reduction for Tier 2/3 services (pods at zero during idle periods)
  - Replaces custom rabbitmq-queue-monitor with standard KEDA (maintained, Kubernetes-native)
  - Better resource utilization (cluster nodes freed when pods scale down)
  - Natural pressure to keep services stateless and fast-starting

Negative:
  - Knative adds CRD complexity to the cluster
  - Cold starts may impact user experience for Tier 2 services (mitigated by scheduled min-scale)
  - KEDA requires RabbitMQ metrics endpoint access
  - Two deployment models (Deployment + Knative Service) increase cognitive load

Mitigated by: Clear tier classification documented in Helm values. Common chart already supports KEDA. Knative CRDs are well-maintained. Cold start can be masked with loading states on frontend.
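
As a purely illustrative sketch of how the tier classification could be expressed in the common chart's per-service values (the key names below are assumptions, not the chart's current schema):

# Hypothetical values.yaml fragment for a Tier 2 service
tier: elastic              # always-on | elastic | event-driven
knative:
  enabled: true
  minScale: 0
  maxScale: 5
  scaleDownDelay: 5m
keda:
  enabled: false           # Tier 3 services enable this instead
  queueName: ""
  queueLength: 5

A values structure like this could let the common chart render a standard Deployment, a Knative Service, or a KEDA ScaledObject from the same values file.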


Decision date: 2026-01-31 | Review by: 2026-07-31