ADR-001: Service Consolidation Strategy
Status
Proposed — Pending engineering team review
Context
The platform runs ~28 backend application services per tenant in production, with ~40 total ArgoCD applications per tenant when including infrastructure and frontends. The platform serves 3 production tenants (agilenetwork, nilgameplan, vtnil), each deploying a near-identical set of services with config-only differentiation (H11 L2).
ArgoCD Production App Breakdown (per tenant, verified from agilenetwork)
| Category | Count | Examples |
|---|---|---|
| Backend services | ~28 | celebrity, chat, class-catalog, content, email, fan, group-profile, identityx-26, inventory, journey, media, message-board, notifications, org-manager, purchase-request-bpm, search, shoutout, shoutout-bpm, sms, sse, stripe, subscriptions, tags, tracking, transaction, users, wallet, webinar |
| Frontend apps | ~3 | admin-fe, celeb-fe, mono-web |
| Infrastructure/platform | ~9 | pgbouncer, rabbitmq, redis, pvcs, secrets, cdn, kube-fledged, site-maintenance, superset |
| Total ArgoCD apps | ~40 | |
Tenant-Specific Variations
Not all services deploy to all tenants:
| Service | agilenetwork | nilgameplan | vtnil | Notes |
|---|---|---|---|---|
| chat | Yes | No | No | agilenetwork only |
| search | Yes | No | No | agilenetwork only |
| message-board | Yes | No | Yes | Not in nilgameplan |
| nilgp-partnerportal-be/fe | No | Yes | No | Partner portal (nilgameplan only) |
| investordeck | No | Yes | No | Investor-facing app (nilgameplan only) |
| onsite-event | No | No | Yes | In-person event mgmt (vtnil only) |
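The two tables above can be combined into a per-tenant service set. The following is a minimal sketch, not the platform's actual tooling: `services_for` is a hypothetical helper, and it assumes that any backend service without a row in the variations table deploys to all three tenants.

```python
# Backend services from the per-tenant breakdown table (verified on agilenetwork).
BASE_SERVICES = frozenset({
    "celebrity", "chat", "class-catalog", "content", "email", "fan",
    "group-profile", "identityx-26", "inventory", "journey", "media",
    "message-board", "notifications", "org-manager", "purchase-request-bpm",
    "search", "shoutout", "shoutout-bpm", "sms", "sse", "stripe",
    "subscriptions", "tags", "tracking", "transaction", "users", "wallet",
    "webinar",
})

TENANTS = frozenset({"agilenetwork", "nilgameplan", "vtnil"})

# Rows of the tenant variations table: service -> tenants that deploy it.
# "nilgp-partnerportal-be/fe" is kept as one entry, as the table writes it.
VARIATIONS = {
    "chat": {"agilenetwork"},
    "search": {"agilenetwork"},
    "message-board": {"agilenetwork", "vtnil"},
    "nilgp-partnerportal-be/fe": {"nilgameplan"},
    "investordeck": {"nilgameplan"},
    "onsite-event": {"vtnil"},
}


def services_for(tenant: str) -> set[str]:
    """Return the services from the two tables that deploy to `tenant`."""
    if tenant not in TENANTS:
        raise ValueError(f"unknown tenant: {tenant}")
    # Assumption: a base service without a variations row deploys everywhere.
    shared = {s for s in BASE_SERVICES if tenant in VARIATIONS.get(s, TENANTS)}
    # Tenant-specific services that are not part of the shared base set.
    extras = {
        s for s, tenants in VARIATIONS.items()
        if s not in BASE_SERVICES and tenant in tenants
    }
    return shared | extras
```

Calling `services_for("nilgameplan")` drops chat, search, and message-board from the shared set and adds the partner portal and investordeck entries.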
Analysis (Sessions 0-10) identified over-decomposed services, consistent patterns across all services, and clean domain boundaries. The question is how to reduce operational complexity while preserving functionality.
Decision
Consolidate from ~28 backend services to ~18 services per tenant along domain-driven boundaries, using incremental in-place merges rather than rewrites. This consolidation applies across all 3 production tenants (agilenetwork, nilgameplan, vtnil).
Consolidation Map
| Target | Source Services | Rationale |
|---|---|---|
| identity-service | celebrity, fan, users | Same domain, shared Keycloak |
| content-service | content, media | Shared Mux, overlapping video |
| payment-service | stripe, subscriptions, wallet, transaction | Same financial domain |
| purchase-workflow | purchase-request-bpm (state machine) | CIB Seven replacement |
| shoutout-service | shoutout, shoutout-bpm | Absorb BPM into service |
| class-catalog-service | class-catalog, journey | Same learning domain |
| notification-service | email, sms, notifications | Shared DB, delivery pipeline |
| platform-services | tags, tracking, group-profile, org-manager | Small supporting services |
Services Kept Separate
inventory (cross-cutting hub), webinar (Zoom lifecycle), chat (Stream SaaS), message-board (Redis SSE), sse (platform infrastructure), search (Elasticsearch), event (distinct domain), keycloak, identityx-26 (identity provider integration).
Note: Tenant-specific services (nilgp-partnerportal-be/fe, investordeck, onsite-event) will be evaluated separately as part of the multi-brand consolidation (see ADR-004).
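The consolidation map and the kept-separate list can be checked mechanically against the backend service list from the Context section. The sketch below is illustrative only: `check_coverage` is a hypothetical helper, and the data is transcribed from the tables in this ADR rather than read from ArgoCD.

```python
from collections import Counter

# Backend services from the per-tenant breakdown table.
BACKEND_SERVICES = {
    "celebrity", "chat", "class-catalog", "content", "email", "fan",
    "group-profile", "identityx-26", "inventory", "journey", "media",
    "message-board", "notifications", "org-manager", "purchase-request-bpm",
    "search", "shoutout", "shoutout-bpm", "sms", "sse", "stripe",
    "subscriptions", "tags", "tracking", "transaction", "users", "wallet",
    "webinar",
}

# Target service -> source services, transcribed from the consolidation map.
CONSOLIDATION_MAP = {
    "identity-service": ["celebrity", "fan", "users"],
    "content-service": ["content", "media"],
    "payment-service": ["stripe", "subscriptions", "wallet", "transaction"],
    "purchase-workflow": ["purchase-request-bpm"],
    "shoutout-service": ["shoutout", "shoutout-bpm"],
    "class-catalog-service": ["class-catalog", "journey"],
    "notification-service": ["email", "sms", "notifications"],
    "platform-services": ["tags", "tracking", "group-profile", "org-manager"],
}

# Services listed under "Services Kept Separate".
KEPT_SEPARATE = {
    "inventory", "webinar", "chat", "message-board", "sse", "search",
    "event", "keycloak", "identityx-26",
}


def check_coverage() -> dict[str, list[str]]:
    """Report services that are unplaced, placed twice, or not in the backend list."""
    placements = Counter(
        source for sources in CONSOLIDATION_MAP.values() for source in sources
    )
    placements.update(KEPT_SEPARATE)
    placed = set(placements)
    return {
        "unplaced": sorted(BACKEND_SERVICES - placed),
        "duplicated": sorted(s for s, n in placements.items() if n > 1),
        "not_in_backend_list": sorted(placed - BACKEND_SERVICES),
    }
```

With the lists as written, every backend service is placed exactly once. `event` and `keycloak` are reported under `not_in_backend_list` because they appear in the kept-separate list but not in the backend row of the breakdown table.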
Hypothesis Background
Primary: Consolidation reduces operational overhead without losing functionality.
- Evidence: All services use identical patterns (core-lib, GraphQL, RabbitMQ), so merging is additive code organization, not rewriting (H13 L1).
- Database-per-service boundaries are clean (H6 L1), with no hidden data coupling.
Alternative 1: Keep all ~28 backend services, upgrade each independently.
- Rejected: Operational overhead of ~28 services × 3 production tenants × CI/CD × monitoring is disproportionate to service complexity. 6 services have <5 endpoints.
Alternative 2: Full rewrite to fewer services with new patterns.
- Rejected: H14 falsified. Existing patterns are sound, so a rewrite duplicates work without benefit.
Falsifiability Criteria
- If consolidated services show >20% latency increase → revert, investigate coupling
- If merged databases exceed Cloud SQL connection limits → reconsider database strategy
- If >3 consolidations fail due to unexpected coupling → pause and re-evaluate remaining merges
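The three criteria above could be encoded as an automated check. This is a sketch under stated assumptions, not an existing tool: `ConsolidationMetrics` and `triggered_actions` are hypothetical names, the ADR does not say which latency statistic is meant (mean, p95, or other), and the Cloud SQL connection limit is passed in by the caller rather than looked up.

```python
from dataclasses import dataclass

LATENCY_INCREASE_LIMIT = 0.20    # ">20% latency increase"
FAILED_CONSOLIDATION_LIMIT = 3   # ">3 consolidations fail"


@dataclass(frozen=True)
class ConsolidationMetrics:
    baseline_latency_ms: float      # latency before the merge (statistic is an assumption)
    consolidated_latency_ms: float  # same statistic, measured after the merge
    peak_db_connections: int        # observed peak on the merged database
    db_connection_limit: int        # Cloud SQL limit for the instance, supplied by the caller
    failed_consolidations: int      # merges abandoned due to unexpected coupling


def triggered_actions(m: ConsolidationMetrics) -> list[str]:
    """Return the follow-up actions whose falsifiability criterion is met."""
    if m.baseline_latency_ms <= 0:
        raise ValueError("baseline latency must be positive")
    actions = []
    increase = (m.consolidated_latency_ms - m.baseline_latency_ms) / m.baseline_latency_ms
    if increase > LATENCY_INCREASE_LIMIT:
        actions.append("revert and investigate coupling")
    if m.peak_db_connections > m.db_connection_limit:
        actions.append("reconsider database strategy")
    if m.failed_consolidations > FAILED_CONSOLIDATION_LIMIT:
        actions.append("pause and re-evaluate remaining merges")
    return actions
```

All comparisons are strict, matching the ">" wording of the criteria: an increase of exactly 20% or exactly 3 failed consolidations does not trigger an action.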
Evidence Quality
| Evidence | Assurance |
|---|---|
| Pattern consistency across services | L1 (H13) |
| Clean service boundaries | L1 (H6) |
| Over-decomposed services identified | L1 (tech debt inventory) |
| RabbitMQ contracts discoverable | L2 (H12) |
| Multi-brand is config-only | L2 (H11) |
| ArgoCD production manifests reviewed | L2 (IaC verification — 40 apps enumerated per tenant) |
| Terraform production config reviewed | L2 (IaC verification — 3 production tenants confirmed) |
| Data volumes | L0 (H8 — partial) |
Overall: L1 (WLNK capped by data volume uncertainty)
Bounded Validity
- Scope: All application services except Keycloak and infrastructure
- Expiry: Re-evaluate if service count grows >25 (new features requiring new domains)
- Review trigger: If >2 consolidations require rollback
Consequences
Positive: ~35% fewer backend deployments per tenant (~28 → ~18), simpler CI/CD, reduced monitoring surface, lower infrastructure cost.
Negative: Larger deployment units per service, more complex codebase per repo, risk of coupling within consolidated services.
Mitigated by: Module boundaries within consolidated services (Maven multi-module), separate Flyway migration folders.
Decision date: 2026-01-30
Review by: 2026-07-30