ADR

ADR-005: Database Consolidation & Optimization

Last updated: 2026-02-01 | Decisions

ADR-005: Database Consolidation & Optimization

Status

Proposed — Pending engineering team review

Context

The platform runs 35 PostgreSQL databases per tenant across 4 production tenants (140 total). Each microservice has its own database, routed through PgBouncer (41 entries, 3 replicas). Many databases are tiny — wallet has 3 tables, transaction has 1 table, tags has 2 tables. All share a single Cloud SQL instance per tenant (db-custom-2-6656, 750 max connections). ADR-001 proposed consolidating ~35 services to ~18 and noted databases would merge with services. This ADR goes deeper: not every consolidated service needs its own database either.

Current Database Inventory (Per Tenant)

Database Tables/Migrations Service Domain
celebrity-db 22 migrations celebrity Identity
fan-db 11 migrations fan Identity
users-db (Keycloak bridge) users Identity
content-db 15+ migrations content Content
media-db 10+ migrations media Content
stripe-db 12+ migrations stripe Payments
subscriptions-db 8+ migrations subscriptions Payments
wallet-db 3 migrations wallet Payments
transaction-db 6 migrations transaction Payments
shoutout-db 15+ migrations shoutout Commerce
shoutout-bpm-db (BPM state) shoutout-bpm Commerce
class-catalog-db 10+ migrations class-catalog Learning
journey-db 5+ migrations journey Learning
notification-service-db (shared) notifications Communication
email-db (shared w/ notifications) email Communication
sms-db (shared w/ notifications) sms Communication
tags-db 2+ migrations tags Platform
tracking-db 3+ migrations tracking Platform
group-profile-db 5+ migrations group-profile Platform
org-manager-db 5+ migrations org-manager Platform
inventory-db 10+ migrations inventory Cross-cutting
message-board-db 5 migrations message-board Communication
sse-db 3 migrations sse Infrastructure
search-db (ES metadata) search Infrastructure
onsite-event-db 2 migrations onsite-event Events
webinar-db 8+ migrations webinar Content
chat-db 2 migrations chat Communication
stream-db (inactive) Dead
broadcast-db (inactive) Dead
+ others Various

Decision

Consolidate from 35 databases to 6 domain-aligned databases per tenant, using PostgreSQL schemas for logical isolation within each database. Services within the same domain share a database but use separate schemas to maintain clear boundaries.

Consolidation Map

Target Database Source Databases Schema Strategy
identity_db celebrity, fan, users celebrity.*, fan.*, users.* schemas
commerce_db stripe, subscriptions, wallet, transaction, shoutout, shoutout-bpm, inventory, onsite-event payments.*, shoutout.*, inventory.*, events.* schemas
content_db content, media, webinar, class-catalog, journey content.*, media.*, webinar.*, learning.* schemas
communication_db notification-service, email, sms, message-board, chat, sse notifications.*, messageboard.*, chat.*, sse.* schemas
platform_db tags, tracking, group-profile, org-manager, search tags.*, tracking.*, profiles.*, org.*, search.* schemas
keycloak_db identityx-26 public.* (Keycloak-managed)

Database Count Reduction

Metric Current After ADR-001 (~18) This ADR (6)
Databases per tenant 35 ~18 6
Total production databases 140 ~72 24
PgBouncer routing entries 41 ~18 6
Cloud SQL instances (with ADR-004) 4 1 1
Max connections consumed 750 shared across 35 750 shared across 18 750 shared across 6

Migration Approach

  1. Schema-per-service within shared database — Each service keeps its own Flyway migration folder and schema prefix. No cross-schema foreign keys.
  2. PgBouncer simplification — Reduce from 41 routing entries to 6. Connection pooling becomes far more efficient.
  3. Flyway continues — Each service module manages its own migrations. Flyway schemas config points to the service’s schema within the shared database.
  4. No data model changes — Tables move into schemas but structure is unchanged. ALTER TABLE SET SCHEMA operations during maintenance window.
  5. Rollback strategy — Keep old database as read-only backup during transition. Point PgBouncer back if issues arise.

Connection Pool Optimization

With 6 databases instead of 35, PgBouncer can allocate connections far more efficiently:

Metric Current (35 DBs) Target (6 DBs)
Connections per database ~21 avg ~125 avg
Connection overhead High (35 × min pool) Low (6 × min pool)
Idle connection waste Significant (many idle DBs) Minimal
Failover complexity 35 databases to failover 6 databases to failover

Hypothesis Background

Primary: Domain-aligned database consolidation reduces operational overhead and improves resource utilization without introducing coupling.

Alternative 1: Keep database-per-service (current ADR-001 plan of ~18 databases). - Not rejected entirely, but represents a missed optimization. 18 databases is better than 35 but still carries unnecessary overhead for small services. The incremental effort to go from 18 to 6 is low since PgBouncer and Flyway config changes are mechanical.

Alternative 2: Single database with all schemas. - Rejected: Too much blast radius. A single database failure takes down everything. 6 domain-aligned databases provide a balance between isolation and efficiency. Also, Keycloak must remain separate (it manages its own schema lifecycle).

Alternative 3: Move to managed database service (AlloyDB, Aurora equivalent). - Not rejected but deferred. AlloyDB offers better performance and HA but is a more complex migration. Can be evaluated independently of the consolidation strategy.

Falsifiability Criteria

Evidence Quality

Evidence Assurance
Clean service boundaries (no cross-DB joins) L1 (H6)
Shared notification DB works in production L2 (verified — email/sms/notifications)
Small databases are over-provisioned L1 (migration count analysis)
PgBouncer can route by schema L1 (PgBouncer documentation)
Flyway supports schema-scoped migrations L2 (Flyway documentation, schemas config)
Actual data volumes per database L0 (H8 — need production access)
Airbyte CDC compatibility with schemas L0 (needs testing)

Overall: L1 (WLNK capped by data volumes L0 and CDC compatibility L0)

Bounded Validity

Consequences

Positive: - 83% reduction in database count (35 → 6) — dramatically simpler operations - PgBouncer routing simplified from 41 entries to 6 - Better connection pool utilization (fewer idle connections per small database) - Faster Cloud SQL failover (6 databases vs 35) - Simpler backup/restore procedures - Reduced Terraform/IaC complexity - Lower Airbyte CDC source count (20 → 6)

Negative: - Larger blast radius per database (a commerce_db failure affects payments, shoutouts, inventory, and events) - Flyway migration ordering may need coordination within a database (mitigated by schema isolation) - Schema permission management adds a layer of complexity

Mitigated by: Schema-level isolation preserves service boundaries. Each service’s Flyway migrations only touch its own schema. No cross-schema foreign keys allowed. Database-level backups capture all schemas atomically (advantage for consistency).


Decision date: 2026-01-31 Review by: 2026-07-31