ADR-020: Network Security Model

Last updated: 2026-02-01 | Decisions

Status

Proposed — Pending engineering team review

Context

The platform has no NetworkPolicies, a permissive CORS configuration, no WAF, and some publicly accessible GCS buckets. All pods can communicate freely within each GKE cluster. Moving to a shared cluster (ADR-002) makes this gap critical: without network isolation, tenant A’s pods could reach tenant B’s services.

Current State

| Component | Current | Gap |
|---|---|---|
| NetworkPolicies | None | All pods communicate freely; lateral movement possible |
| CORS | Allow all origins | Cross-origin attacks possible on all 24+ GraphQL endpoints |
| WAF | None | No application-layer attack filtering (SQL injection, XSS) |
| GCS buckets | Some public access | Unauthorized content access; data exposure risk |
| Istio mTLS | PERMISSIVE mode | Not all traffic is encrypted in transit |
| Namespace isolation | Single namespace per cluster | No tenant isolation within cluster |
| Secret rotation | None automated | Secrets persist indefinitely once created |

Impact

Decision

Implement defense-in-depth network security across four layers: edge (WAF), mesh (mTLS + AuthorizationPolicy), cluster (NetworkPolicies), and storage (signed URLs).

Layer 1: Edge Security (Cloud Armor WAF)

| Rule | Purpose |
|---|---|
| OWASP Top 10 managed rule set | Block SQL injection, XSS, RCE |
| Rate limiting | 1000 req/min per IP (adjustable per path) |
| Geo-restriction | Optional; restrict to operating regions |
| Bot management | Block known malicious user agents |
| Custom rules | Block requests larger than 10 MB (except file upload paths) |

Applied at the GCP load balancer, in front of the Istio IngressGateway.
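
One way the attachment point could look on GKE, as a minimal sketch: assuming the IngressGateway Service is exposed through a GKE Ingress, a BackendConfig can reference a Cloud Armor security policy by name. The BackendConfig name `edge-waf` and the policy name `platform-edge-waf` are hypothetical. The rules from the table (OWASP rule set, rate limiting, and so on) would be defined on the policy itself, outside this manifest.

```yaml
# Sketch only: binds an existing Cloud Armor policy to the load balancer
# backend that fronts the Istio IngressGateway. All names are hypothetical.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: edge-waf
  namespace: istio-system
spec:
  securityPolicy:
    name: platform-edge-waf   # assumed name of the Cloud Armor security policy
```

The IngressGateway Service would then point at it with the annotation `cloud.google.com/backend-config: '{"default": "edge-waf"}'`.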

Layer 2: Service Mesh (Istio)

| Policy | Scope | Effect |
|---|---|---|
| PeerAuthentication | Mesh-wide | STRICT mTLS: all service-to-service traffic encrypted |
| AuthorizationPolicy | Per namespace | Only named services can reach each other |
| RequestAuthentication | IngressGateway | Validate JWT from Keycloak before routing |

```yaml
# Example: Strict mTLS mesh-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```
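
The AuthorizationPolicy and RequestAuthentication rows of the table might be sketched as follows. This is illustrative only: the policy names, the workload label `app: graphql-api`, the gateway service account, and the Keycloak host and realm are assumptions, not values taken from the platform.

```yaml
# Sketch: only the ingress gateway identity may call the (hypothetical)
# graphql-api workload in one tenant namespace. Matching on principals
# relies on mTLS, which STRICT mode guarantees.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-ingressgateway          # hypothetical name
  namespace: tenant-agile-network
spec:
  selector:
    matchLabels:
      app: graphql-api                # hypothetical workload label
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              # assumed service account of the IngressGateway
              - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
---
# Sketch: validate Keycloak-issued JWTs at the IngressGateway.
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: keycloak-jwt                  # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway           # assumed gateway pod label
  jwtRules:
    # hypothetical Keycloak host and realm
    - issuer: https://keycloak.example.com/realms/platform
      jwksUri: https://keycloak.example.com/realms/platform/protocol/openid-connect/certs
```

Note that RequestAuthentication on its own only rejects requests carrying an invalid token. Requests with no token at all still pass unless an AuthorizationPolicy additionally requires a request principal.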

Layer 3: Cluster Network (NetworkPolicies)

Default-deny ingress for all namespaces, with explicit allow rules:

| Namespace | Allowed Ingress From | Rationale |
|---|---|---|
| tenant-{name} | Istio IngressGateway only | All external traffic enters via mesh |
| platform (Keycloak) | All tenant namespaces | All services validate JWT |
| monitoring | All namespaces (metrics scrape) | Prometheus needs pod access |
| Within tenant namespace | Same namespace only | Services within one tenant can communicate |

```yaml
# Default deny all ingress per tenant namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-agile-network
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```
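
Because NetworkPolicies are additive, the allow rules from the table can sit alongside the default-deny policy. A minimal sketch for one tenant namespace, assuming the IngressGateway runs in `istio-system` with the pod label `istio: ingressgateway` and that Prometheus runs in a namespace named `monitoring`. Both are assumptions to adjust to the actual installation. The monitoring rule reads the table row as "Prometheus must be able to scrape tenant pods".

```yaml
# Sketch: explicit ingress allow rules for a single tenant namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-monitoring-same-ns   # hypothetical name
  namespace: tenant-agile-network
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Pods in the same tenant namespace
        - podSelector: {}
        # IngressGateway pods (namespace AND pod label must both match)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
          podSelector:
            matchLabels:
              istio: ingressgateway        # assumed gateway pod label
        # Prometheus scraping from the monitoring namespace (assumed name)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
```

On GKE these objects are only enforced when network policy enforcement (or Dataplane V2) is enabled on the cluster.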

Layer 4: Storage Security (GCS Signed URLs)

| Current | Target |
|---|---|
| Public GCS buckets | Private buckets + signed URLs |
| Direct GCS URLs in database | Signed URL generation at read time |
| No expiry | URLs expire after configurable TTL (default 1 hour) |

Content delivery flow:

  1. Frontend requests content via GraphQL
  2. Service generates signed URL (GCS IAM)
  3. Frontend receives time-limited URL
  4. CDN caches signed URL response (not the signature)

CORS Hardening

| Current | Target |
|---|---|
| Access-Control-Allow-Origin: * | Explicit tenant domains only |
| No preflight caching | Access-Control-Max-Age: 3600 |
| All methods allowed | Only GET, POST, OPTIONS for GraphQL |

Allowed origins derived from values-globals.yaml per tenant:

  - https://theagilenetwork.com
  - https://nilgameplan.com
  - https://vtnil.com
  - https://speedofai.com
  - Plus *.preview.app for staging/preview environments
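
If CORS were enforced at the mesh rather than in each service, the target state could be expressed with Istio's `corsPolicy`. This is a hedged sketch, not the platform's configuration: the VirtualService name, host `api.theagilenetwork.com`, gateway `istio-system/public-gateway`, destination `graphql-api`, port, and allowed headers are all hypothetical, and the regex is one possible reading of `*.preview.app`.

```yaml
# Sketch: CORS target state for one tenant, enforced at the gateway route.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: graphql-cors                     # hypothetical name
  namespace: tenant-agile-network
spec:
  hosts:
    - api.theagilenetwork.com            # hypothetical API host
  gateways:
    - istio-system/public-gateway        # hypothetical Gateway resource
  http:
    - corsPolicy:
        allowOrigins:
          - exact: https://theagilenetwork.com
          # assumed interpretation of "*.preview.app"
          - regex: 'https://[a-z0-9-]+\.preview\.app'
        allowMethods:
          - GET
          - POST
          - OPTIONS
        allowHeaders:                    # assumed header set
          - authorization
          - content-type
        maxAge: 3600s                    # Access-Control-Max-Age: 3600
      route:
        - destination:
            host: graphql-api            # hypothetical service
            port:
              number: 8080               # hypothetical port
```

Browsers accept only a single exact origin or `*` in `Access-Control-Allow-Origin`, so a wildcard entry such as `*.preview.app` has to be matched server-side, with the matching origin echoed back in the response.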

Implementation Priority

  1. NetworkPolicies — prerequisite for shared cluster (ADR-002)
  2. CORS hardening — low effort, high impact
  3. mTLS STRICT — service mesh security
  4. Cloud Armor WAF — edge protection
  5. GCS signed URLs — storage security
  6. Secret rotation — operational hygiene

Hypothesis Background

Primary: Defense-in-depth network security with four layers provides adequate isolation for multi-tenant shared cluster operation.

Alternative: Keep cluster-per-tenant (no NetworkPolicy needed).

  - Rejected for cost reasons (ADR-002). Cluster-per-tenant costs scale linearly with tenants.

Falsifiability Criteria

Evidence Quality

| Evidence | Assurance |
|---|---|
| No NetworkPolicies in any cluster | L2 (verified from Terraform/Helm) |
| CORS allows all origins | L2 (verified from service code) |
| Public GCS buckets | L1 (observed, not exhaustively audited) |
| Integration patterns (allow list) | L1 (documented in integration-patterns.md) |
| Cloud Armor effectiveness | L0 (not tested) |

Overall: L1 (WLNK capped by untested Cloud Armor configuration)

Bounded Validity

Consequences

Positive:

  - Enables shared cluster multi-tenancy (ADR-002 prerequisite)
  - Defense-in-depth: WAF + mesh + network + storage
  - Eliminates public GCS bucket exposure
  - CORS hardening prevents cross-origin attacks
  - mTLS encrypts all service-to-service traffic

Negative:

  - NetworkPolicy maintenance as services change
  - Signed URL generation adds compute overhead
  - Cloud Armor has per-request cost
  - CORS changes may break preview/development environments initially


Decision date: 2026-02-01 | Review by: 2026-08-01