ADR-004: Multi-Brand Architecture

Last updated: 2026-02-01

Status

Proposed — Pending engineering team review

Context

The platform serves 3 production brands (The Agile Network, NIL Game Plan, VT NIL) on a cluster-per-tenant model. A fourth brand, Speed of AI, is planned but has no ArgoCD or Terraform production configuration as of 2026-01-31. H11 (verified at assurance level L2) confirmed that all tenant differentiation is config-only: the same Docker images, Helm charts, and code serve every tenant. Under the cluster-per-tenant model, infrastructure cost scales linearly with tenant count.

Decision

Consolidate to a shared GKE cluster with namespace-per-tenant isolation, preserving config-only tenant differentiation.

Architecture

Target State (current 3 production tenants)

Shared Regional GKE Cluster
├── namespace: tenant-agile-network → ~18 pods
├── namespace: tenant-nil-game-plan → ~18 pods
├── namespace: tenant-vt-nil → ~18 pods
├── namespace: platform → Keycloak, Istio, ArgoCD, monitoring
└── Isolation: NetworkPolicies + Istio AuthPolicy + RabbitMQ vhosts

Future State (when Speed of AI reaches production)

Shared Regional GKE Cluster
├── namespace: tenant-agile-network → ~18 pods
├── namespace: tenant-nil-game-plan → ~18 pods
├── namespace: tenant-vt-nil → ~18 pods
├── namespace: tenant-speed-of-ai → ~18 pods  ← Planned
├── namespace: platform → Keycloak, Istio, ArgoCD, monitoring
└── Isolation: NetworkPolicies + Istio AuthPolicy + RabbitMQ vhosts
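The network and mesh isolation named in the layouts above could be generated per tenant namespace along the following lines. This is a minimal sketch, not the project's actual manifests: the policy names (`default-deny`, `allow-same-tenant`), the choice to let the `platform` namespace call into tenants, and the `security.istio.io/v1` API version are assumptions for illustration.

```python
import json

# Namespace names are taken from the layout above.
PLATFORM_NAMESPACE = "platform"
TENANT_NAMESPACES = [
    "tenant-agile-network",
    "tenant-nil-game-plan",
    "tenant-vt-nil",
]


def default_deny_policy(namespace):
    """Kubernetes NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},  # name is hypothetical
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def same_tenant_authz_policy(namespace):
    """Istio AuthorizationPolicy allowing callers from the tenant's own
    namespace and from the platform namespace only (assumed rule)."""
    return {
        "apiVersion": "security.istio.io/v1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "allow-same-tenant", "namespace": namespace},  # name is hypothetical
        "spec": {
            "action": "ALLOW",
            "rules": [
                {"from": [{"source": {"namespaces": [namespace, PLATFORM_NAMESPACE]}}]}
            ],
        },
    }


def isolation_manifests(namespaces):
    """Return one NetworkPolicy and one AuthorizationPolicy per namespace."""
    manifests = []
    for ns in namespaces:
        manifests.append(default_deny_policy(ns))
        manifests.append(same_tenant_authz_policy(ns))
    return manifests


if __name__ == "__main__":
    print(json.dumps(isolation_manifests(TENANT_NAMESPACES), indent=2))
```

A default-deny policy on its own also blocks DNS and intra-namespace traffic, so a real rollout would need explicit allow rules on top of it. Matching on source namespaces in Istio also depends on mutual TLS being enabled between workloads.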

Isolation Mechanisms

| Layer | Current | Target |
| --- | --- | --- |
| Compute | Separate clusters | Namespace + ResourceQuota |
| Network | Physical isolation | NetworkPolicies (default-deny) |
| Service mesh | Separate Istio | Shared Istio + AuthorizationPolicy |
| Database | Separate Cloud SQL | Shared Cloud SQL + separate schemas |
| Messaging | Separate RabbitMQ | Shared RabbitMQ + vhosts |
| Cache | Separate Redis | Key prefixing or separate Redis |
| Secrets | Cluster-scoped | Namespace-scoped (AVP) |
| Config | values-globals.yaml per cluster | values-globals.yaml per namespace |
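To make the target column more concrete, the sketch below derives per-tenant names for the shared backing services and a ResourceQuota for the tenant namespace. The naming rules (schema, vhost, and key prefix derived from a tenant slug) and the quota numbers are hypothetical; the ADR does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantResources:
    """Per-tenant identifiers on the shared infrastructure."""
    namespace: str
    db_schema: str
    rabbitmq_vhost: str
    redis_key_prefix: str


def tenant_resources(slug):
    """Derive resource names from a tenant slug such as 'vt-nil'.

    The naming scheme is an assumption for illustration only.
    """
    return TenantResources(
        namespace=f"tenant-{slug}",
        db_schema=slug.replace("-", "_"),  # hyphens need quoting in SQL identifiers
        rabbitmq_vhost=slug,
        redis_key_prefix=f"{slug}:",
    )


def resource_quota(namespace, max_pods=30, cpu="8", memory="16Gi"):
    """Kubernetes ResourceQuota capping a tenant namespace.

    Default limits are placeholders, not measured values.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "pods": str(max_pods),
                "requests.cpu": cpu,
                "requests.memory": memory,
            }
        },
    }


if __name__ == "__main__":
    res = tenant_resources("vt-nil")
    print(res)
    print(resource_quota(res.namespace))
```

Quota values would need to be sized from observed usage (roughly 18 pods per tenant today) with headroom for rollouts.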

Hypothesis Background

Primary: Config-only multi-brand (H11, L2) enables safe cluster consolidation.
- Same Docker images across all tenants; no code-level branching.
- values-globals.yaml provides all tenant-specific config.
- Namespace isolation provides equivalent security to cluster isolation for this workload.
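The config-only claim can be illustrated with a small sketch: one shared set of base values, a per-tenant override standing in for values-globals.yaml, and a check that every tenant resolves to the same image. All keys, the image reference, and the domains are hypothetical; the merge roughly mirrors how layered Helm values files combine, but it is not Helm's implementation.

```python
import copy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared chart defaults (hypothetical keys and image reference).
BASE_VALUES = {
    "image": {"repository": "registry.example.com/platform/api", "tag": "1.0.0"},
    "brand": {"name": "", "domain": ""},
}

# Stand-ins for each tenant's values-globals.yaml (hypothetical content).
TENANT_GLOBALS = {
    "tenant-agile-network": {"brand": {"name": "The Agile Network", "domain": "agile.example.com"}},
    "tenant-nil-game-plan": {"brand": {"name": "NIL Game Plan", "domain": "nilgp.example.com"}},
    "tenant-vt-nil": {"brand": {"name": "VT NIL", "domain": "vtnil.example.com"}},
}


def render_values(tenant):
    """Effective values for one tenant: base defaults plus tenant globals."""
    return deep_merge(BASE_VALUES, TENANT_GLOBALS[tenant])


def images_identical(tenants):
    """True if every tenant resolves to the same image repository and tag."""
    images = {tuple(sorted(render_values(t)["image"].items())) for t in tenants}
    return len(images) == 1


if __name__ == "__main__":
    for name in TENANT_GLOBALS:
        print(name, render_values(name)["brand"]["name"])
    print("same image everywhere:", images_identical(TENANT_GLOBALS))
```

A check like `images_identical` could run in CI to flag any tenant override that drifts away from the shared image.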

Alternative 1: Keep cluster-per-tenant model.
- Rejected: Cost scales linearly. Adding a 4th brand (Speed of AI) or beyond requires an entire new cluster + Cloud SQL + RabbitMQ + Redis. Current model is not economically scalable.

Alternative 2: Full multi-tenancy (single set of services, tenant ID in requests).
- Rejected: Requires code changes to add tenant context to every service, database query, and message handler. Risk is disproportionate to benefit. Namespace isolation achieves the cost savings with minimal code changes.

Falsifiability Criteria

Evidence Quality

| Evidence | Assurance |
| --- | --- |
| Config-only differentiation | L2 (H11, verified across all domains + infra) |
| Same Docker images | L2 (verified from ArgoCD + Helm charts) |
| NetworkPolicies supported by GKE | L2 (GCP documentation) |
| Cost scaling is linear | L1 (inferred from 3 production clusters) |
| Data volumes manageable per shared instance | L0 (H8, need actual data) |

Overall: L1 (WLNK capped by H8)

Bounded Validity

Consequences

Positive: ~67% infrastructure cost reduction (3 clusters → 1), simplified operations (single cluster to manage), easier to add new brands like Speed of AI (new namespace, not new cluster).

Negative: Blast radius increases (cluster failure affects all tenants), noisy-neighbor risk, more complex RBAC/NetworkPolicy configuration.

Mitigated by: Regional GKE (multi-zone HA), ResourceQuotas per namespace, monitoring per tenant.
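The ~67% figure is consistent with simple cluster-count arithmetic, sketched below. It assumes every per-tenant stack costs the same and that the shared cluster costs about as much as one of them; the `shared_overhead` parameter is a hypothetical knob for relaxing the second assumption, since a shared cluster sized for all tenants will likely cost more than one single-tenant cluster.

```python
def per_tenant_cost(n_tenants, stack_cost=1.0):
    """Cluster-per-tenant model: cost grows linearly with tenant count."""
    return n_tenants * stack_cost


def shared_cost(stack_cost=1.0, shared_overhead=1.0):
    """Shared-cluster model: one stack, scaled by an assumed overhead factor."""
    return stack_cost * shared_overhead


def savings_fraction(n_tenants, shared_overhead=1.0):
    """Fraction of cost saved by consolidating n_tenants stacks into one."""
    before = per_tenant_cost(n_tenants)
    after = shared_cost(shared_overhead=shared_overhead)
    return 1 - after / before


if __name__ == "__main__":
    for n in (3, 4):
        print(f"{n} tenants: {savings_fraction(n):.0%} saved")
    print(f"3 tenants, 1.5x shared overhead: {savings_fraction(3, 1.5):.0%} saved")
```

With 3 tenants and no overhead the sketch gives about 67%, and 75% with a 4th brand. With a 1.5x shared cluster the saving for 3 tenants drops to 50%, which is why the L0 evidence on data volumes (H8) matters for the final number.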


Decision date: 2026-01-30
Review by: 2026-07-30