ADR-018: API Strategy — GraphQL Federation & Schema Evolution

Last updated: 2026-02-01 | Decisions

Status

Proposed — Pending engineering team review

Context

The platform exposes 24+ separate GraphQL gateways, one per service, behind Istio path-based routing (/api/{service}). Each service owns its own schema. There is no API versioning, no enforced deprecation policy, no rate limiting, and no centralized schema registry, so frontend clients must know which service owns which data.

Service consolidation (ADR-001) reduces services from ~35 to ~18, which naturally reduces gateway count. This is the right time to decide whether to federate.

Current State

| Metric | Value |
| --- | --- |
| GraphQL gateways | 24+ (one per service) |
| API versioning | None |
| Schema registry | None |
| Rate limiting | None (Istio routing only) |
| Deprecation policy | Ad hoc (@deprecated directive, no enforcement) |
| Frontend API knowledge | Must know which service owns each query/mutation |
| Dead API calls | ~17% of frontend code calls non-existent services |

Decision

Adopt GraphQL schema stitching via a lightweight API gateway, consolidating the 24+ separate endpoints into a single unified GraphQL endpoint. Use schema evolution best practices (additive-only changes, @deprecated directives with sunset dates) rather than versioning.

Why Schema Stitching (Not Full Federation)

| Factor | Schema Stitching | Apollo Federation |
| --- | --- | --- |
| Complexity | Moderate: gateway merges schemas | High: requires Apollo Router, subgraph protocol changes |
| Code changes | Minimal: services keep existing GraphQL endpoints | Significant: each service needs @key, @external directives |
| Operational overhead | Single gateway service | Apollo Router + registry + composition pipeline |
| Cross-service references | Handled via client-side joins or gateway resolvers | Native entity references across subgraphs |
| Fit for our scale | 18 services, moderate query complexity | Enterprise scale, complex entity graphs |
| Existing patterns | Services already expose standalone GraphQL | Would require schema restructuring |

At 18 services with moderate query complexity, full Apollo Federation would be over-engineering. Schema stitching via a gateway (graphql-mesh, graphql-tools, or a custom Spring GraphQL gateway) provides a unified endpoint without restructuring every service.

Target Architecture

Frontend (Next.js / React Native)
    │
    └── Single GraphQL endpoint: /api/graphql
            │
            ├── API Gateway (Spring GraphQL or graphql-mesh)
            │   ├── Rate limiting
            │   ├── Auth validation (Keycloak JWT)
            │   ├── Schema stitching (merges all service schemas)
            │   ├── Query complexity limiting
            │   └── Deprecation enforcement
            │
            ├── identity-service /api/identity/graphql
            ├── content-service /api/content/graphql
            ├── payment-service /api/payment/graphql
            ├── shoutout-service /api/shoutout/graphql
            └── ... (18 total)

Frontend talks to ONE endpoint. Gateway routes to the correct service. Services keep their existing GraphQL schemas unchanged.
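
The routing idea can be sketched in a few lines. This is a minimal illustration, not the gateway implementation: the root field names (`me`, `feed`, `purchases`, and so on) are hypothetical, and a real gateway built on graphql-mesh, graphql-tools, or Spring GraphQL would derive them by introspecting each service rather than from a hard-coded map. The sketch shows two properties the design relies on: each root field resolves to exactly one service, and a query for a field no service provides is rejected at the gateway.

```python
from typing import Dict, List

# Hypothetical root fields per service endpoint. In practice these would be
# discovered by introspecting each service's existing GraphQL schema.
SERVICE_ROOT_FIELDS: Dict[str, List[str]] = {
    "/api/identity/graphql": ["me", "userById"],
    "/api/content/graphql": ["contentById", "feed"],
    "/api/payment/graphql": ["purchases"],
}


def build_routing_table(services: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each root field to the single service endpoint that owns it."""
    table: Dict[str, str] = {}
    for endpoint, fields in services.items():
        for field in fields:
            if field in table:
                # Two services exposing the same root field cannot be merged
                # without a rename or a namespace prefix.
                raise ValueError(
                    f"root field '{field}' is defined by both "
                    f"{table[field]} and {endpoint}"
                )
            table[field] = endpoint
    return table


def route(table: Dict[str, str], root_field: str) -> str:
    """Return the owning endpoint, or fail for a field no service provides."""
    try:
        return table[root_field]
    except KeyError:
        raise LookupError(f"unknown root field '{root_field}'") from None
```

Building the table once at startup (or on schema change) keeps the per-request cost to a dictionary lookup; the `LookupError` path is where calls to removed or never-existing services would surface.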

Schema Evolution Policy

| Rule | Enforcement |
| --- | --- |
| Additive only | New fields, types, queries are always safe. Never remove or rename without deprecation. |
| Deprecation directive | @deprecated(reason: "Use X instead. Sunset: YYYY-MM-DD") |
| Sunset period | Minimum 90 days between deprecation and removal |
| Breaking change gate | CI checks schema diff and blocks removal of non-deprecated fields |
| Schema changelog | Auto-generated from schema diffs per PR |
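
As a rough sketch of how the breaking change gate could work, the check below compares two schema snapshots and reports policy violations. It assumes a deliberately simplified, hypothetical snapshot format (a dict from `Type.field` to its `@deprecated` reason, or `None`), and it treats the day a deprecation first appears in a diff as the deprecation date. A real CI job would work from the actual SDL or introspection result.

```python
import re
from datetime import date, timedelta
from typing import Dict, List, Optional

MIN_SUNSET = timedelta(days=90)  # minimum sunset period from the policy
SUNSET_RE = re.compile(r"Sunset:\s*(\d{4}-\d{2}-\d{2})")

# Hypothetical snapshot format: {"Type.field": deprecation reason or None}.
Schema = Dict[str, Optional[str]]


def sunset_date(reason: Optional[str]) -> Optional[date]:
    """Extract the sunset date from a @deprecated reason string, if present."""
    if reason is None:
        return None
    match = SUNSET_RE.search(reason)
    return date.fromisoformat(match.group(1)) if match else None


def breaking_changes(old: Schema, new: Schema, today: date) -> List[str]:
    """List policy violations; an empty list means the change may merge."""
    problems: List[str] = []
    # Removals: allowed only for fields whose sunset date has passed.
    for field, reason in old.items():
        if field in new:
            continue
        sunset = sunset_date(reason)
        if sunset is None:
            problems.append(f"{field}: removed without a deprecation sunset date")
        elif today < sunset:
            problems.append(f"{field}: removed before sunset {sunset.isoformat()}")
    # New deprecations: the sunset must be at least 90 days away.
    for field, reason in new.items():
        if reason is not None and old.get(field) is None:
            sunset = sunset_date(reason)
            if sunset is None or sunset - today < MIN_SUNSET:
                problems.append(f"{field}: sunset must be at least 90 days out")
    return problems
```

Renames show up here as a removal plus an addition, so they are blocked unless the old name went through the deprecation window first.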

Rate Limiting

| Tier | Limit | Applies To |
| --- | --- | --- |
| Authenticated | 1000 req/min | Normal users |
| Admin | 5000 req/min | Admin dashboard |
| Webhook | No limit | Stripe/Mux webhook callbacks |
| Query complexity | Max depth 10, max complexity 500 | All queries |
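
To make the limits concrete, here is a minimal in-memory sketch of the two checks, using the tier values from the table. The fixed-window strategy, the class and function names, and the brace-counting depth estimate are all assumptions chosen for brevity; a production gateway would more likely use a shared store for counters and a real GraphQL parser for depth and complexity. The complexity score (max 500) is not modeled here.

```python
import time
from typing import Dict, Optional, Tuple

# Per-minute limits from the rate limiting table; None means no limit.
TIER_LIMITS: Dict[str, Optional[int]] = {
    "authenticated": 1000,
    "admin": 5000,
    "webhook": None,
}
MAX_DEPTH = 10


class FixedWindowLimiter:
    """Counts requests per (client, minute) window. Old windows are never
    evicted in this sketch, so it is not suitable for long-running use."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, client_id: str, tier: str, now: Optional[float] = None) -> bool:
        limit = TIER_LIMITS[tier]
        if limit is None:
            return True
        window = int((time.time() if now is None else now) // 60)
        key = (client_id, window)
        count = self._counts.get(key, 0)
        if count >= limit:
            return False
        self._counts[key] = count + 1
        return True


def query_depth(query: str) -> int:
    """Approximate nesting depth by counting braces. Braces inside string
    literals or input objects would be miscounted; a parser avoids that."""
    depth = deepest = 0
    for ch in query:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest
```

A fixed window allows short bursts at window boundaries (up to twice the limit across two adjacent minutes); a sliding window or token bucket would smooth that out at the cost of more state.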

Hypothesis Background

Primary: A unified GraphQL gateway with schema stitching provides a better developer experience and operational model than 24+ separate endpoints, without the complexity of full federation.

Alternative 1: Apollo Federation. Not rejected permanently; evaluate it if cross-service entity references become common (e.g., "load a user's purchases and their content in a single query"). Currently, queries are mostly scoped to a single service.

Alternative 2: Keep separate endpoints (status quo with fewer services). Viable, but it misses the opportunity to simplify the frontend. After consolidation, 18 endpoints is manageable, but a single endpoint gives a better developer experience.

Falsifiability Criteria

Evidence Quality

| Evidence | Assurance |
| --- | --- |
| 24+ separate GraphQL endpoints | L2 (verified from service catalog) |
| ~17% dead frontend API calls | L2 (verified from code analysis) |
| No API versioning | L2 (verified: no version annotations found) |
| Schema stitching works at our scale | L1 (documented pattern, not tested) |
| Gateway latency overhead | L0 (needs benchmarking) |

Overall: L1 (WLNK capped by untested gateway performance)

Bounded Validity

Consequences

Positive:
- Single GraphQL endpoint for frontend, which simplifies client code
- Centralized rate limiting, auth validation, and schema enforcement
- Schema evolution policy prevents breaking changes
- Dead API calls eliminated (gateway validates schema)
- Foundation for API analytics and monitoring

Negative:
- Gateway is a new service to operate and scale
- Schema stitching can have edge cases with conflicting type names
- Additional network hop for every query
- Gateway must be updated when services add/change schemas
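
The type name conflict risk can be checked before stitching. The sketch below is illustrative only: it assumes each service's types are available as a hypothetical mapping from type name to its set of field names, and it flags any type that two services define differently. Identical definitions are treated as mergeable.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input shape: service name -> {type name -> field names}.
ServiceTypes = Dict[str, Dict[str, FrozenSet[str]]]


def conflicting_types(services: ServiceTypes) -> List[str]:
    """Return type names that two or more services define with different fields."""
    seen: Dict[str, FrozenSet[str]] = {}
    conflicts = set()
    for service in sorted(services):  # sorted for deterministic behavior
        for type_name, fields in services[service].items():
            if type_name in seen and seen[type_name] != fields:
                conflicts.add(type_name)
            # Remember the first definition encountered for later comparison.
            seen.setdefault(type_name, fields)
    return sorted(conflicts)
```

Any name this reports would need a resolution strategy at the gateway, such as prefixing types per service or agreeing on a shared definition.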


Decision date: 2026-02-01
Review by: 2026-08-01