Modernization

Engineering Kickoff Package

Last updated: 2026-02-01 | Modernization

Key Takeaways

  1. Go recommendation: CONDITIONAL GO — Proceed with Wave 1 (Foundation) immediately. The evidence base is strong (4 L2, 5 L1 hypotheses), the target architecture is validated, and the migration strategy is phased with rollback at every step. Two blockers remain (H8 data volumes, H9 PCI scope) but neither prevents Wave 1 from starting.
  2. First migration domain: Notification pipeline (email + sms + notifications) — lowest risk (shared DB already, no external ID coupling, self-contained), highest immediate value (fixes deprecated Mandrill library, reduces 3 services to 1).
  3. Sprint 0 scope: 6 items — regional GKE upgrade, CI security enforcement, OpenTelemetry setup, integration test framework, BPM state machine POC, notification service consolidation POC.
  4. 15 prioritized engineering stories spanning all 4 migration waves — ready for backlog grooming.
  5. 2 blockers to resolve before Wave 2: obtain production database row counts (H8), confirm Stripe PCI scope (H9).

Migration Decision Question

Is the team ready to start Sprint 0 of the modernization?


Go/No-Go Recommendation

Verdict: CONDITIONAL GO

Go for Wave 1 (Foundation, including the BPM state machine POC). Proceed immediately.

Conditional on resolving before Wave 2:

  1. H8: Obtain actual production database row counts and table sizes (needed for migration window planning)
  2. H9: Confirm PCI scope via Stripe dashboard (likely SAQ-A but must verify)
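
The gating rule above can be sketched in a few lines of plain Java. This is only an illustration: the class name `WaveGate`, the `Blocker` enum, and the assumption that Waves 3 and 4 inherit Wave 2's prerequisites (because the waves run in order) are not taken from the migration strategy.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the conditional-go rule: Wave 1 is unblocked, later waves need H8 and H9. */
final class WaveGate {

    /** The two open blockers named in this package. */
    enum Blocker { H8_DATA_VOLUMES, H9_PCI_SCOPE }

    /**
     * Returns true when the given wave may start.
     * Assumption: Waves 3 and 4 inherit Wave 2's prerequisites.
     */
    static boolean canStart(int wave, Set<Blocker> resolved) {
        if (wave < 1 || wave > 4) {
            throw new IllegalArgumentException("Unknown wave: " + wave);
        }
        if (wave == 1) {
            return true; // neither blocker prevents Wave 1
        }
        return resolved.containsAll(EnumSet.allOf(Blocker.class));
    }

    public static void main(String[] args) {
        // Example: PCI scope confirmed, data volumes still unknown.
        Set<Blocker> resolved = EnumSet.of(Blocker.H9_PCI_SCOPE);
        for (int wave = 1; wave <= 4; wave++) {
            System.out.println("Wave " + wave + " can start: " + canStart(wave, resolved));
        }
    }
}
```

With only H9 resolved, the sketch reports that Wave 1 can start and Waves 2 to 4 cannot.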

Evidence Supporting Go

| Factor | Evidence | Confidence |
| --- | --- | --- |
| Architecture is sound | H14 falsified: upgrade, not rewrite | L1 |
| Service boundaries clean | H6: no shared DB backdoors | L1 |
| Multi-brand is config-only | H11: verified across all domains | L2 |
| Contracts discoverable | H12: 75+ message types mapped | L2 |
| Shared libraries stable | H13: core-lib proven foundation | L1 |
| No feature parity gaps | All Gen 1 features deprecated or replaced | L1 |
| Dead code identified | ~110 repos archivable, 12+ dead services | L1 |
| Migration strategy phased | Every phase has rollback plan | L1 |

Risks Accepted

| Risk | Mitigation |
| --- | --- |
| Near-zero test coverage (H7 falsified) | Add tests as part of each migration phase |
| Data volumes unknown (H8 L0) | Start with small-data domains first (Wave 2) |
| PCI scope unconfirmed (H9 L0) | Payment domain is Wave 2, not Wave 1 |

Prioritized Migration Backlog

Sprint 0: Foundation (Wave 1)

| # | Story | Priority | Effort | Acceptance Criteria |
| --- | --- | --- | --- | --- |
| 1 | Upgrade GKE to regional | P0 | L | Regional cluster operational, automatic zone failover tested, all tenants migrated |
| 2 | Enforce CI security scanning | P0 | S | Trivy blocks on high/critical, Qwiet fails on critical findings, all pipelines updated |
| 3 | Deploy OpenTelemetry auto-instrumentation | P0 | M | OTel agent in common Helm chart, traces visible in Grafana for all services |
| 4 | Set up integration test framework | P0 | M | Test runner in CI, first 5 service tests passing, test database provisioning automated |
| 5 | POC: Spring State Machine for purchase workflow | P1 | M | State machine replicates all 10 purchase states, unit tests pass, shadow validation against CIB Seven |
| 6 | POC: Notification service consolidation | P1 | M | Merged email+sms+notifications with maintained Mandrill client, all 15 RabbitMQ handlers, integration tests |

Wave 2: Low-Risk Migrations

| # | Story | Priority | Effort | Acceptance Criteria |
| --- | --- | --- | --- | --- |
| 7 | Consolidate notification-service | P0 | M | email+sms+notifications merged, lutung replaced, all notification channels tested, deployed to staging |
| 8 | Consolidate payment-service | P1 | L | stripe+subscriptions+wallet+transaction merged, financial reconciliation validated, Stripe webhooks tested |
| 9 | Replace purchase-request BPM | P1 | M | Spring State Machine in production, CIB Seven drained, Keycloak plugin removed |

Wave 3: Medium-Risk Migrations

| # | Story | Priority | Effort | Acceptance Criteria |
| --- | --- | --- | --- | --- |
| 10 | Consolidate identity-service | P1 | M | celebrity+fan+users merged, all profile operations tested, Keycloak integration preserved |
| 11 | Consolidate content-service | P1 | L | content+media merged, Mux integration unified, NFS→GCS migration started |
| 12 | Consolidate shoutout-service | P1 | M | shoutout+shoutout-bpm merged (BPM state machine from story 9 pattern), Mux video workflow tested |
| 13 | Upgrade class-catalog | P2 | M | Arlo dead code removed, journey merged, learning credits tested |

Wave 4: High-Risk & Frontend

| # | Story | Priority | Effort | Acceptance Criteria |
| --- | --- | --- | --- | --- |
| 14 | Unify frontend monorepo | P1 | XL | Shared component library, CSS standardization, repo merged, dead code removed |
| 15 | Shared cluster consolidation | P1 | XL | All tenants in shared regional cluster, NetworkPolicies enforced, cost reduction measured |

First Migration Domain: Notification Pipeline

Why Notification First

| Criterion | Notification Score | Alternative (Payment) Score |
| --- | --- | --- |
| Database migration needed | None (shared DB already) | Yes (4 DBs → 1) |
| External ID coupling | None | Stripe IDs (high risk) |
| Dependencies on other domains | Low (inbound only via RabbitMQ) | Medium (inventory, BPM) |
| Immediate value | High (fixes deprecated Mandrill lib) | High (reduces 4 services) |
| Rollback complexity | Low (re-deploy 3 services) | Medium (restore 4 databases) |
| Test coverage needed | Low (async message processing) | High (financial transactions) |

Notification Consolidation Plan

  1. Create notification-service with all 3 source services’ code as Maven modules
  2. Replace lutung 0.0.8 with direct Mandrill HTTP API client
  3. Merge RabbitMQ handlers (15 inbound total)
  4. Add integration tests for all notification channels
  5. Deploy to staging, validate with synthetic traffic
  6. Deploy to production with Istio traffic shifting
  7. Drain old services, remove deployments
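
Step 3 of the plan (merging the RabbitMQ handlers) can be pictured as a single registry keyed by message type. The sketch below uses plain Java rather than the core-lib messaging API, which is not shown in this package; `NotificationRouter` and the three message type names are hypothetical. Rejecting duplicate registrations is one way to surface handler collisions between the three source services early.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch: one registry holding the handlers merged from email, sms, and notifications. */
final class NotificationRouter {

    private final Map<String, Consumer<String>> handlers = new LinkedHashMap<>();

    /** Registers a handler; a duplicate message type fails fast instead of being overwritten. */
    void register(String messageType, Consumer<String> handler) {
        if (handlers.putIfAbsent(messageType, handler) != null) {
            throw new IllegalStateException("Duplicate handler for " + messageType);
        }
    }

    /** Dispatches a payload; returns false when no handler is registered for the type. */
    boolean dispatch(String messageType, String payload) {
        Consumer<String> handler = handlers.get(messageType);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }

    int handlerCount() {
        return handlers.size();
    }

    public static void main(String[] args) {
        NotificationRouter router = new NotificationRouter();
        // Hypothetical message types, one per source service.
        router.register("email.send", p -> System.out.println("email: " + p));
        router.register("sms.send", p -> System.out.println("sms: " + p));
        router.register("notification.push", p -> System.out.println("push: " + p));

        router.dispatch("sms.send", "order shipped");
        System.out.println("handlers registered: " + router.handlerCount());
    }
}
```

In the merged service, a check that `handlerCount()` equals the 15 inbound handlers listed in the plan would be a cheap integration test for step 4.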

Sprint 0 Scope

Infrastructure Prerequisites

| Item | Owner | Definition of Done |
| --- | --- | --- |
| Regional GKE cluster | Platform team | All tenants on regional clusters, zone failover tested |
| CI security enforcement | DevOps | All pipelines fail on high/critical vulnerabilities |
| OpenTelemetry | Platform team | Traces in Grafana for all services, 10% sampling rate |
| Integration test framework | Engineering | Test runner in CI, database provisioning, first 5 services covered |

Application POCs

| Item | Owner | Definition of Done |
| --- | --- | --- |
| Purchase state machine | Backend team | Replicates all CIB Seven states, unit + integration tests, shadow validation |
| Notification consolidation | Backend team | Merged service with all handlers, replaced Mandrill library, staging deployment |
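
Shadow validation for the purchase state machine POC could work by replaying state sequences recorded from the legacy engine against the new transition table and reporting the first disagreement. The sketch below is plain Java, not Spring State Machine; the five states and their transitions are hypothetical placeholders (the real workflow has 10 states, which this package does not list).

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: replay a legacy state trace against the new machine's transition table. */
final class PurchaseShadowCheck {

    // Hypothetical subset of purchase states, for illustration only.
    enum State { CREATED, PAYMENT_PENDING, PAID, FULFILLED, CANCELLED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.CREATED, EnumSet.of(State.PAYMENT_PENDING, State.CANCELLED));
        ALLOWED.put(State.PAYMENT_PENDING, EnumSet.of(State.PAID, State.CANCELLED));
        ALLOWED.put(State.PAID, EnumSet.of(State.FULFILLED));
        ALLOWED.put(State.FULFILLED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.CANCELLED, EnumSet.noneOf(State.class));
    }

    static boolean isAllowed(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    /**
     * Returns the index of the first state in the legacy trace that the new
     * transition table would not have allowed, or -1 when the trace is accepted.
     */
    static int firstMismatch(List<State> legacyTrace) {
        for (int i = 1; i < legacyTrace.size(); i++) {
            if (!isAllowed(legacyTrace.get(i - 1), legacyTrace.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<State> accepted = List.of(
                State.CREATED, State.PAYMENT_PENDING, State.PAID, State.FULFILLED);
        List<State> rejected = List.of(State.CREATED, State.PAID);
        System.out.println(firstMismatch(accepted)); // -1
        System.out.println(firstMismatch(rejected)); // 1
    }
}
```

A mismatch index other than -1 means either the new table is missing a transition or the legacy engine allowed a path that should be retired; both cases are worth reviewing before the legacy engine is drained.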

Resolve Before Wave 2

| Item | Owner | Definition of Done |
| --- | --- | --- |
| H8: Production data volumes | Platform/DBA | Row counts and table sizes for all 35 databases (at least 1 tenant) |
| H9: PCI scope | Engineering lead | Stripe dashboard accessed, SAQ level confirmed, documented |

Team Skills Assessment

Skills Needed for Migration

| Skill | Current State | Gap |
| --- | --- | --- |
| Java 21 / Spring Boot 3.5 | All Gen 2 services use this | No gap |
| Angular 18 / Nx | Both frontend repos use this | No gap |
| Spring GraphQL | All services use this | No gap |
| RabbitMQ / core-lib messaging | All services use this | No gap |
| Keycloak administration | In use across 4 tenants | No gap |
| Terraform / Atlantis | Mature IaC workflow | No gap |
| Helm / ArgoCD | Common chart v0.0.179, GitOps mature | No gap |
| Istio service mesh | Path routing, mTLS | No gap |
| Spring State Machine | Not currently used | Training needed |
| OpenTelemetry | Not currently adopted | Training needed |
| NetworkPolicies | Not currently deployed | Training needed |
| Tailwind CSS (for frontends team) | peeq-mono uses it; frontends team uses Bootstrap | Training needed |
| Integration testing | Very low coverage (H7) | Practice needed |

Training Recommendations

| Topic | Audience | Format |
| --- | --- | --- |
| Spring State Machine | Backend developers | Workshop + POC (Sprint 0 story #5) |
| OpenTelemetry for Java | All backend developers | Documentation review + enablement |
| Kubernetes NetworkPolicies | Platform/DevOps | Hands-on lab + staging deployment |
| Tailwind CSS | Frontend (frontends repo team) | Pair programming during CSS migration |
| Integration testing patterns | All developers | Test framework setup (Sprint 0 story #4) + code reviews |

Final Hypotheses Scorecard

| # | Hypothesis | Final Assurance | Verdict |
| --- | --- | --- | --- |
| H1 | Broadcast not in production | L2 (Verified) | Confirmed; archive related repos |
| H2 | Dwolla inactive | L2 (Verified) | Confirmed; archive repos |
| H3 | Gen 1 fully replaced by Gen 2 | L1 (Validated) | Only infra Gen 1 remains (retire) |
| H4 | Frontend unification feasible | L1 (Validated) | CSS restyling, not logic rewrite |
| H5 | >50% repos archivable | L1 (Validated) | ~110 of 191 (58%) |
| H6 | No shared DB backdoors | L1 (Validated) | Clean boundaries confirmed |
| H7 | >60% test coverage | L0 (Falsified) | 2-3 test files per service; major gap |
| H8 | Data volumes manageable | L0 (Partial) | DB tier known, row counts needed |
| H9 | No compliance blockers | L0 (Partial) | Likely SAQ-A, need Stripe confirmation |
| H10 | APIs backward-compatible | L0 (Partial) | GraphQL additive pattern supports it |
| H11 | Multi-brand is config-only | L2 (Verified) | All domains + infrastructure confirmed |
| H12 | RabbitMQ contracts discoverable | L2 (Verified) | ~75 message types fully mapped |
| H13 | core-lib stable foundation | L1 (Validated) | Consistent across all services |
| H14 | Gen 3 rewrite justified | L1 (Falsified) | Incremental upgrade recommended |

Scorecard Summary
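
The tally behind the "4 L2, 5 L1" figure in the Key Takeaways can be recomputed from the scorecard rows. The sketch below is only a convenience for checking the arithmetic; the `Hypothesis` record and `ScorecardTally` class are illustrative names, and the rows are copied from the scorecard table.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: count scorecard rows by assurance level and status. */
final class ScorecardTally {

    record Hypothesis(String id, String level, String status) {}

    // Rows copied from the Final Hypotheses Scorecard.
    static final List<Hypothesis> SCORECARD = List.of(
            new Hypothesis("H1", "L2", "Verified"),
            new Hypothesis("H2", "L2", "Verified"),
            new Hypothesis("H3", "L1", "Validated"),
            new Hypothesis("H4", "L1", "Validated"),
            new Hypothesis("H5", "L1", "Validated"),
            new Hypothesis("H6", "L1", "Validated"),
            new Hypothesis("H7", "L0", "Falsified"),
            new Hypothesis("H8", "L0", "Partial"),
            new Hypothesis("H9", "L0", "Partial"),
            new Hypothesis("H10", "L0", "Partial"),
            new Hypothesis("H11", "L2", "Verified"),
            new Hypothesis("H12", "L2", "Verified"),
            new Hypothesis("H13", "L1", "Validated"),
            new Hypothesis("H14", "L1", "Falsified"));

    /** Groups rows by "level status" and counts each group, sorted by key. */
    static Map<String, Integer> tally(List<Hypothesis> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hypothesis h : rows) {
            counts.merge(h.level() + " " + h.status(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        tally(SCORECARD).forEach((group, count) -> System.out.println(group + ": " + count));
    }
}
```

Run against the 14 rows, this gives 4 L2 (Verified), 5 L1 (Validated), 1 L1 (Falsified, H14), 1 L0 (Falsified, H7), and 3 L0 (Partial: H8, H9, H10).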


Knowledge Base Deliverables Summary

Phase 2/3 Documents (Sessions 0-9)

| Document | Session | Lines | Purpose |
| --- | --- | --- | --- |
| domain-model/glossary.md | 0 | ~100 | Canonical domain terms |
| architecture/current-state.md | 0 | ~220 | Platform overview + Mermaid |
| architecture/service-catalog.md | 0 | ~360 | All 191 repos cataloged |
| frontend/frontend-architecture.md | 1 | ~400 | Frontend analysis + API inventory |
| architecture/user-identity.md | 2 | ~350 | Keycloak + identity domain |
| architecture/content-streaming.md | 3 | ~400 | Content + media + webinar |
| architecture/payment-processing.md | 4-5 | ~500 | Billing + wallet + transactions |
| architecture/events-business-logic.md | 6 | ~400 | Shoutout + inventory + classes |
| architecture/communication-infrastructure.md | 7 | ~400 | Email + SMS + chat + SSE |
| architecture/infrastructure-devops.md | 8 | ~770 | GKE + Terraform + CI/CD + Helm |
| architecture/integration-patterns.md | 9 | ~490 | Cross-domain synthesis + RabbitMQ map |
| architecture/data-models.md | 9 | ~305 | Database inventory + data lineage |

Phase 4 Documents (Sessions 10-13)

| Document | Session | Lines | Purpose |
| --- | --- | --- | --- |
| modernization/migration-decisions.md | 0-12 | ~80 | Progressive migration register (27 entries) |
| modernization/gap-analysis.md | 10 | ~260 | Constraints + gaps + buy-vs-build |
| modernization/tech-debt-inventory.md | 10 | ~470 | 32 prioritized debt items |
| modernization/target-architecture.md | 11 | ~560 | H14 evaluation + consolidation map |
| modernization/migration-strategy.md | 12 | ~370 | Strangler fig + 4 waves + rollback |
| modernization/engineering-kickoff.md | 13 | This doc | Backlog + Sprint 0 + Go/No-Go |
| decisions/ADR-001-service-consolidation.md | 12 | ~80 | ~35 → ~18 services |
| decisions/ADR-002-frontend-unification.md | 12 | ~80 | Single Angular monorepo |
| decisions/ADR-003-java-standardization.md | 12 | ~75 | Java 21 LTS alignment |
| decisions/ADR-004-multi-brand-architecture.md | 12 | ~85 | Shared cluster + namespaces |

Total: 18 documents + 4 ADRs listed above, across 14 sessions (0-13)


Next Steps

  1. Immediately: Archive ~110 repos (script in service-catalog.md archive candidate list)
  2. This week: Remove frontend dead code (5 API gateways — low effort, high clarity)
  3. Sprint 0: Execute 6 foundation stories (regional GKE, CI security, OTel, test framework, 2 POCs)
  4. Before Wave 2: Resolve H8 (data volumes) and H9 (PCI scope)
  5. Engineering review: Present 4 ADRs to team for approval/feedback

Last updated: 2026-01-30 (Session 13) | Review by: 2026-04-30 | Staleness risk: High; kickoff package should be actioned promptly