ADR-005: Database Consolidation & Optimization
Status
Proposed — Pending engineering team review
Context
The platform runs 35 PostgreSQL databases per tenant across 4 production tenants (140 total). Each microservice has its own database, routed through PgBouncer (41 entries, 3 replicas). Many databases are tiny — wallet has 3 tables, transaction has 1 table, tags has 2 tables. All share a single Cloud SQL instance per tenant (db-custom-2-6656, 750 max connections). ADR-001 proposed consolidating ~35 services to ~18 and noted databases would merge with services. This ADR goes deeper: not every consolidated service needs its own database either.
Current Database Inventory (Per Tenant)
| Database | Tables/Migrations | Service | Domain |
|---|---|---|---|
| celebrity-db | 22 migrations | celebrity | Identity |
| fan-db | 11 migrations | fan | Identity |
| users-db | (Keycloak bridge) | users | Identity |
| content-db | 15+ migrations | content | Content |
| media-db | 10+ migrations | media | Content |
| stripe-db | 12+ migrations | stripe | Payments |
| subscriptions-db | 8+ migrations | subscriptions | Payments |
| wallet-db | 3 migrations | wallet | Payments |
| transaction-db | 6 migrations | transaction | Payments |
| shoutout-db | 15+ migrations | shoutout | Commerce |
| shoutout-bpm-db | (BPM state) | shoutout-bpm | Commerce |
| class-catalog-db | 10+ migrations | class-catalog | Learning |
| journey-db | 5+ migrations | journey | Learning |
| notification-service-db | (shared) | notifications | Communication |
| email-db | (shared w/ notifications) | email | Communication |
| sms-db | (shared w/ notifications) | sms | Communication |
| tags-db | 2+ migrations | tags | Platform |
| tracking-db | 3+ migrations | tracking | Platform |
| group-profile-db | 5+ migrations | group-profile | Platform |
| org-manager-db | 5+ migrations | org-manager | Platform |
| inventory-db | 10+ migrations | inventory | Cross-cutting |
| message-board-db | 5 migrations | message-board | Communication |
| sse-db | 3 migrations | sse | Infrastructure |
| search-db | (ES metadata) | search | Infrastructure |
| onsite-event-db | 2 migrations | onsite-event | Events |
| webinar-db | 8+ migrations | webinar | Content |
| chat-db | 2 migrations | chat | Communication |
| stream-db | (inactive) | — | Dead |
| broadcast-db | (inactive) | — | Dead |
| + others | — | — | Various |
Decision
Consolidate from 35 databases to 6 domain-aligned databases per tenant, using PostgreSQL schemas for logical isolation within each database. Services within the same domain share a database but use separate schemas to maintain clear boundaries.
Consolidation Map
| Target Database | Source Databases | Schema Strategy |
|---|---|---|
| identity_db | celebrity, fan, users | celebrity.*, fan.*, users.* schemas |
| commerce_db | stripe, subscriptions, wallet, transaction, shoutout, shoutout-bpm, inventory, onsite-event | payments.*, shoutout.*, inventory.*, events.* schemas |
| content_db | content, media, webinar, class-catalog, journey | content.*, media.*, webinar.*, learning.* schemas |
| communication_db | notification-service, email, sms, message-board, chat, sse | notifications.*, messageboard.*, chat.*, sse.* schemas |
| platform_db | tags, tracking, group-profile, org-manager, search | tags.*, tracking.*, profiles.*, org.*, search.* schemas |
| keycloak_db | identityx-26 | public.* (Keycloak-managed) |
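As a concrete illustration of the schema strategy column above, a minimal SQL sketch of schema-plus-role isolation inside identity_db. The role names (svc_celebrity, svc_fan) and the choice to pin search_path per role are illustrative assumptions, not settled conventions:

```sql
-- Illustrative sketch only: schema-per-service isolation inside a shared
-- identity_db. Role and schema names are placeholders, not final conventions.
CREATE SCHEMA IF NOT EXISTS celebrity;
CREATE SCHEMA IF NOT EXISTS fan;

-- One login role per service, privileged only on its own schema.
CREATE ROLE svc_celebrity LOGIN PASSWORD 'change-me';
CREATE ROLE svc_fan       LOGIN PASSWORD 'change-me';

GRANT USAGE, CREATE ON SCHEMA celebrity TO svc_celebrity;
GRANT USAGE, CREATE ON SCHEMA fan       TO svc_fan;

-- Default the search_path so application code keeps using unqualified table names.
ALTER ROLE svc_celebrity IN DATABASE identity_db SET search_path = celebrity;
ALTER ROLE svc_fan       IN DATABASE identity_db SET search_path = fan;
```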
Database Count Reduction
| Metric | Current | After ADR-001 (~18) | This ADR (6) |
|---|---|---|---|
| Databases per tenant | 35 | ~18 | 6 |
| Total production databases | 140 | ~72 | 24 |
| PgBouncer routing entries | 41 | ~18 | 6 |
| Cloud SQL instances (with ADR-004) | 4 | 1 | 1 |
| Max connections consumed | 750 shared across 35 | 750 shared across 18 | 750 shared across 6 |
Migration Approach
- Schema-per-service within shared database — Each service keeps its own Flyway migration folder and schema prefix. No cross-schema foreign keys.
- PgBouncer simplification — Reduce from 41 routing entries to 6. Connection pooling becomes far more efficient.
- Flyway continues — Each service module manages its own migrations. The Flyway schemas config points to the service's schema within the shared database.
- No data model changes — Tables move into schemas but structure is unchanged, via ALTER TABLE ... SET SCHEMA operations during a maintenance window (see the sketch after this list).
- Rollback strategy — Keep the old database as a read-only backup during the transition. Point PgBouncer back if issues arise.
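A minimal sketch of the maintenance-window move for one small database (wallet into commerce_db), assuming the legacy tables have first been restored into the target database's public schema via pg_dump/pg_restore; the table names are hypothetical examples, not the actual wallet data model:

```sql
-- Illustrative only: wallet tables restored into commerce_db's public schema
-- are relocated into the payments schema. Structure and data are untouched.
BEGIN;

CREATE SCHEMA IF NOT EXISTS payments;

ALTER TABLE public.wallet         SET SCHEMA payments;
ALTER TABLE public.wallet_balance SET SCHEMA payments;
ALTER TABLE public.wallet_audit   SET SCHEMA payments;

COMMIT;
```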
Connection Pool Optimization
With 6 databases instead of 35, PgBouncer can allocate connections far more efficiently:
| Metric | Current (35 DBs) | Target (6 DBs) |
|---|---|---|
| Connections per database | ~21 avg | ~125 avg |
| Connection overhead | High (35 × min pool) | Low (6 × min pool) |
| Idle connection waste | Significant (many idle DBs) | Minimal |
| Failover complexity | 35 databases to failover | 6 databases to failover |
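The per-database figures above are arithmetic averages (750 max connections / 6 databases ≈ 125); actual pool sizes should be tuned against observed utilization. A sketch of a server-side check using PostgreSQL's pg_stat_activity view, intended as a starting point rather than the final monitoring query:

```sql
-- Snapshot of server-side connections per database, to validate PgBouncer
-- pool sizing against the 750-connection Cloud SQL limit.
SELECT datname,
       count(*)                                 AS total_connections,
       count(*) FILTER (WHERE state = 'idle')   AS idle_connections,
       count(*) FILTER (WHERE state = 'active') AS active_connections
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname
ORDER BY total_connections DESC;
```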
Hypothesis Background
Primary: Domain-aligned database consolidation reduces operational overhead and improves resource utilization without introducing coupling.
- Evidence: H6 (L1) confirmed clean service boundaries — no cross-database joins or backdoors exist today. Schema-level isolation within a shared database preserves this property.
- Evidence: Many databases are tiny (wallet: 3 tables, transaction: 1 table, tags: 2 tables, chat: 2 tables). The overhead of managing separate Cloud SQL connections, PgBouncer routing, Flyway configs, and backup policies for each is disproportionate.
- Evidence: notification-service already shares a database across email/sms/notifications — proves the pattern works.
Alternative 1: Keep database-per-service (current ADR-001 plan of ~18 databases).
- Not rejected entirely, but represents a missed optimization. 18 databases is better than 35 but still carries unnecessary overhead for small services. The incremental effort to go from 18 to 6 is low since the PgBouncer and Flyway config changes are mechanical.
Alternative 2: Single database with all schemas.
- Rejected: Too much blast radius. A single database failure takes down everything. 6 domain-aligned databases provide a balance between isolation and efficiency. Also, Keycloak must remain separate (it manages its own schema lifecycle).
Alternative 3: Move to a managed database service (AlloyDB, Aurora equivalent).
- Not rejected but deferred. AlloyDB offers better performance and HA but is a more complex migration. Can be evaluated independently of the consolidation strategy.
Falsifiability Criteria
- If schema-level isolation causes Flyway migration conflicts between services in the same database → split back to separate databases for affected services
- If PgBouncer performance degrades with larger per-database connection pools → tune pool_size or revert
- If Cloud SQL connection limits (750) are insufficient with 6 databases at higher per-DB concurrency → upgrade instance tier or add read replicas
- If Airbyte CDC replication breaks with schema-per-service model → adjust CDC source configuration or maintain separate databases for replicated services
Evidence Quality
| Evidence | Assurance |
|---|---|
| Clean service boundaries (no cross-DB joins) | L1 (H6) |
| Shared notification DB works in production | L2 (verified — email/sms/notifications) |
| Small databases are over-provisioned | L1 (migration count analysis) |
| PgBouncer can route by schema | L1 (PgBouncer documentation) |
| Flyway supports schema-scoped migrations | L2 (Flyway documentation, schemas config) |
| Actual data volumes per database | L0 (H8 — need production access) |
| Airbyte CDC compatibility with schemas | L0 (needs testing) |
Overall: L1 (weakest-link capped by data volumes L0 and CDC compatibility L0)
Bounded Validity
- Scope: All application databases except Keycloak. Applies after ADR-001 service consolidation.
- Expiry: Re-evaluate if data volumes (H8) show any single domain database exceeding Cloud SQL instance capacity, or if any domain needs independent scaling (read replicas, different instance tiers).
- Review trigger: If Airbyte CDC cannot handle schema-based source configuration, or if any service needs to move to a different database engine.
- Monitoring: Track per-schema table sizes, connection utilization per database, and Flyway migration execution time.
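For the per-schema size tracking named in the monitoring item above, a sketch of a query that could run inside each domain database; thresholds and scheduling are left open:

```sql
-- Total on-disk size per schema (tables, indexes, TOAST), one row per service schema.
SELECT n.nspname                                          AS schema_name,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')  -- ordinary tables and materialized views
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC;
```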
Consequences
Positive:
- 83% reduction in database count (35 → 6) — dramatically simpler operations
- PgBouncer routing simplified from 41 entries to 6
- Better connection pool utilization (fewer idle connections per small database)
- Faster Cloud SQL failover (6 databases vs 35)
- Simpler backup/restore procedures
- Reduced Terraform/IaC complexity
- Lower Airbyte CDC source count (20 → 6)

Negative:
- Larger blast radius per database (a commerce_db failure affects payments, shoutouts, inventory, and events)
- Flyway migration ordering may need coordination within a database (mitigated by schema isolation)
- Schema permission management adds a layer of complexity

Mitigated by: Schema-level isolation preserves service boundaries. Each service’s Flyway migrations only touch its own schema. No cross-schema foreign keys allowed. Database-level backups capture all schemas atomically (advantage for consistency).
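The no-cross-schema-foreign-keys rule can also be checked mechanically. A sketch of a guardrail query against PostgreSQL's pg_constraint catalog, which the plan expects to return zero rows; where it runs (CI, scheduled job) is an open assumption:

```sql
-- Flags any foreign key whose referencing and referenced tables live in
-- different schemas; zero rows means the boundary rule holds.
SELECT con.conname                          AS constraint_name,
       src_ns.nspname || '.' || src.relname AS referencing_table,
       dst_ns.nspname || '.' || dst.relname AS referenced_table
FROM pg_constraint con
JOIN pg_class src        ON src.oid = con.conrelid
JOIN pg_namespace src_ns ON src_ns.oid = src.relnamespace
JOIN pg_class dst        ON dst.oid = con.confrelid
JOIN pg_namespace dst_ns ON dst_ns.oid = dst.relnamespace
WHERE con.contype = 'f'
  AND src_ns.nspname <> dst_ns.nspname;
```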
Decision date: 2026-01-31
Review by: 2026-07-31