Gap Analysis & Constraints

Last updated: 2026-02-01 | Modernization

Key Takeaways

  1. CIB Seven (Camunda 7) EOL is the highest-urgency constraint — Camunda 7 CE support ended October 2025. Two BPM services (purchase-request-bpm, shoutout-bpm) and the Keycloak identity sync plugin depend on it. Replacement is mandatory regardless of other modernization decisions.
  2. 11 active external API integrations are immovable — Stripe, Mux, Zoom, Stream Chat, Twilio, Mandrill, Google Calendar, Elasticsearch, LogRocket, CastAI, and Airbyte/Snowflake. API keys, webhook URLs, and external IDs must be preserved during any migration.
  3. ~110 of 191 repos are archivable — 63 repos untouched since before 2025, plus ~47 Gen 1 services with confirmed Gen 2 replacements. Only ~35 application services and ~15 infrastructure repos are active.
  4. Observability has critical gaps — No alerting configuration, APM disabled, no SLOs, no distributed tracing adoption, no error budget tracking. Monitoring infrastructure exists (Prometheus, Grafana, Elasticsearch) but is under-utilized.
  5. Buy-vs-build analysis covers 4 delivery and infrastructure services: three replacement candidates and one keep. Email delivery (Mandrill → SendGrid/Postmark), BPM engine (CIB Seven → Temporal/lightweight state machine), and the logging pipeline (peeq-logging → Cloud Logging native) are candidates for replacement; SMS stays on Twilio, which is already SaaS.

Migration Decision Question

What are the highest-impact gaps and debts blocking modernization?


1. Immovable Constraints

These cannot be changed by the modernization effort — the architecture must accommodate them.

1.1 External API Integrations (Must Preserve)

| External Service | Platform Services | What Must Be Preserved | Risk |
|---|---|---|---|
| Stripe | stripe, subscriptions, purchase-request-bpm | Customer IDs, product IDs, subscription IDs, webhook URLs, Checkout Sessions | High (payment disruption) |
| Mux | content, media, shoutout | Asset IDs, playback IDs, webhook URLs, signed URLs | High (video content loss) |
| Zoom | webinar | Meeting IDs, registrant IDs, API credentials | Medium (event disruption) |
| Stream Chat | chat | Channel IDs, API keys, user tokens | Medium (chat history) |
| Twilio | sms | Account SID, phone numbers, messaging service | Low (stateless) |
| Mandrill | email | API key, templates, sender domains | Low (stateless) |
| Google Calendar | webinar | Event IDs, OAuth credentials | Low (reconstructible) |
| Elasticsearch | search, peeq-logging | Indices, Kibana dashboards | Medium (search downtime) |
| LogRocket | frontends, peeq-mono | App IDs, session config | Low (frontend only) |
| CastAI | GKE clusters | Agent config, node policies | Low (infra only) |
| Airbyte/Snowflake | analytics | CDC connections (20 DBs), warehouse schemas | Medium (analytics gap) |
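
One way to check that a migration actually preserved these identifiers is to snapshot them before and after cutover and diff the two snapshots. The sketch below is illustrative only: the snapshot shape, the function name `find_preservation_breaks`, and every URL and ID are assumptions, not values from the platform.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: {external_service: {identifier_name: value}}
Snapshot = Dict[str, Dict[str, str]]


def find_preservation_breaks(before: Snapshot, after: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (service, identifier, problem) for each value that changed or vanished."""
    breaks = []
    for service, identifiers in before.items():
        migrated = after.get(service, {})
        for name, value in identifiers.items():
            if name not in migrated:
                breaks.append((service, name, "missing"))
            elif migrated[name] != value:
                breaks.append((service, name, "changed"))
    return breaks


# Illustrative values only; real snapshots would come from each provider's dashboard or API.
before = {
    "stripe": {"webhook_url": "https://example.test/stripe/webhook", "product_id": "prod_123"},
    "mux": {"webhook_url": "https://example.test/mux/webhook"},
}
after = {
    "stripe": {"webhook_url": "https://example.test/v3/stripe/webhook", "product_id": "prod_123"},
}

for service, name, problem in find_preservation_breaks(before, after):
    print(f"{service}.{name}: {problem}")
```

An empty result is the pass condition; any entry is a candidate blocker for the high-risk rows (Stripe, Mux) in particular.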

1.2 Multi-Brand Architecture (Must Support)

1.3 Keycloak as Universal Identity (Must Migrate Carefully)

1.4 Database-Per-Service Isolation (Must Respect)

1.5 Data Volume (Partially Unknown)


2. Dead Code & Inactive Services Assessment

2.1 Confirmed Dead Backend Services (Archive Immediately)

| Service | Evidence | Session |
|---|---|---|
| peeq-dwolla | Not in ArgoCD, no Gen 2 references, last commit Jan 2023 | 5 |
| peeq-mux-livestream | Mux Spaces API deprecated, no ArgoCD app | 3 |
| peeq-jitsi-meet | Docker container only, webinar uses Zoom | 3 |
| peeq-meet-and-greet-bpm | Camunda 7.17, Jitsi dependency, no Gen 2 | 6 |
| peeq-custom-tixr | Gen 1 Tixr integration, inactive | 6 |
| peeq-conference-sse | Gen 1, meet-and-greet only | 7 |
| peeq-websocket | Gen 1 Node.js, single EC2 SPOF, Jitsi | 7 |
| peeq-sse | Gen 1, replaced by Gen 2 SSE | 7 |
| peeq-logging | Gen 1 Node.js, superseded by Cloud Logging | 8 |
| peeq-shared-secret | Gen 1 Java 11, superseded by AVP + GCP SM | 8 |
| broadcast (all variants) | Never deployed, Mux Spaces deprecated | 3 |
| conference | Never deployed, no ArgoCD app | 3 |

Total: 12+ services confirmed dead or replaced

2.2 Frontend Dead Code

| Dead Code | Location | Evidence |
|---|---|---|
| BroadcastGateway | peeq-mono | Calls non-existent broadcast backend |
| ConferenceGateway | peeq-mono | Calls non-existent conference backend |
| StreamGateway | peeq-mono | Calls non-existent stream backend |
| DwollaService | peeq-mono + frontends | Calls non-existent Dwolla backend |
| LoggingGateway | peeq-mono | Calls non-existent logging API |

Impact: ~17% of frontend API gateway code targets non-existent production services.

2.3 Backend Dead Code

| Dead Code | Service | Evidence |
|---|---|---|
| Arlo LMS integration | class-catalog | Deprecated, migration complete |
| Deprecated GraphQL queries | celebrity | 3 queries marked deprecated |
| Email GraphQL API | email | All marked deprecated, migrating to Keycloak |
| mux-sync Python ETL | standalone | Utility, not a production service |

2.4 Archive Candidate Summary

| Category | Count | Action |
|---|---|---|
| Gen 1 services with Gen 2 replacement | ~35 | Archive repos |
| Inactive/never-deployed services | ~15 | Archive repos |
| Legacy DB repos (schemas in Gen 2 Flyway) | ~20 | Archive repos |
| POC/experiment repos | ~20 | Archive repos |
| Legacy frontend repos | ~15 | Archive repos |
| Total archivable | ~105 | H5 validated |
| Active repos | ~50 | 35 app + 15 infra |
| Unclear | ~36 | Need further review |
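
The arithmetic behind this summary can be checked directly. The short tally below uses only the approximate counts from the table; it shows that the five categories sum to ~105 and that ~105 + ~50 + ~36 accounts for all 191 repos. The Key Takeaways and H5 cite ~110 (~58%) instead; either figure clears the H5 threshold of more than 50%.

```python
# Approximate counts copied from the archive candidate summary.
archivable = {
    "Gen 1 services with Gen 2 replacement": 35,
    "Inactive/never-deployed services": 15,
    "Legacy DB repos": 20,
    "POC/experiment repos": 20,
    "Legacy frontend repos": 15,
}
active = 50
unclear = 36

total_archivable = sum(archivable.values())
total_repos = total_archivable + active + unclear
share = total_archivable / total_repos

print(total_archivable, total_repos, f"{share:.0%}")  # 105 191 55%
```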

3. Feature Parity Gaps

3.1 Gen 1 Features Not in Gen 2

| Feature | Gen 1 Service | Gen 2 Status | Gap Impact |
|---|---|---|---|
| Live broadcasting | peeq-mux-livestream | Not built | None (Mux Spaces deprecated, feature not in use) |
| Meet-and-greet | peeq-meet-and-greet-bpm | Not built | None (feature deprecated, Jitsi dependency) |
| Tixr ticketing | peeq-custom-tixr | Not built | None (no evidence of active use) |
| Socket.IO real-time | peeq-websocket | Replaced by SSE | None (SSE covers use cases) |
| Jitsi video conferencing | peeq-jitsi-meet | Not built | None (Zoom replaced all video conferencing) |

Finding: No feature parity gaps. All Gen 1 unique features were either deprecated, replaced by SaaS (Zoom), or replaced by Gen 2 alternatives (SSE).

3.2 Missing Capabilities (Not in Gen 1 or Gen 2)

| Capability | Current State | Impact on Modernization |
|---|---|---|
| API gateway | Istio path-based routing only; no rate limiting, versioning, or API keys | Need API management layer for Gen 3 |
| Distributed tracing | Istio Stackdriver config exists but not adopted | Cannot debug cross-service requests |
| Alerting | Prometheus deployed, no alert rules configured | Zero automated incident detection |
| SLO/SLI definitions | None | Cannot measure service reliability |
| Error budgets | None | Cannot balance velocity vs. reliability |
| Feature flags | None (tenant config only) | Cannot do gradual rollouts or A/B testing |
| API versioning | No versioning in GraphQL schemas | Breaking changes affect all consumers simultaneously |
| Double-entry bookkeeping | Transaction service uses single-table log | Financial reporting limited; refunds/chargebacks hard to audit |
| Automated testing | 2-3 test files per service (H7 falsified for coverage) | Regression risk during migration |
| Circuit breakers | None in service code | Cascading failure risk |
| Request retry/idempotency | Not observed in RabbitMQ consumers | Duplicate message processing possible |
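
To make the idempotency gap concrete, the sketch below shows one common way a consumer can tolerate redelivered messages: deduplicating on a message ID. It is a minimal illustration, not the platform's code. The `message_id` field, the class name, and the in-memory set are all assumptions; a real consumer would need a durable store (for example a database table with a unique constraint) so that deduplication survives restarts.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a message handler so each message ID is processed at most once."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        # Assumption: in-memory only. Production use needs durable storage.
        self._seen: Set[str] = set()

    def handle(self, message: Dict) -> bool:
        """Process the message; return False if it was a duplicate and skipped."""
        message_id = message["message_id"]  # hypothetical field name
        if message_id in self._seen:
            return False
        self._handler(message)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can still be retried on redelivery.
        self._seen.add(message_id)
        return True


sent = []
consumer = IdempotentConsumer(lambda msg: sent.append(msg["body"]))
consumer.handle({"message_id": "m-1", "body": "receipt email"})
consumer.handle({"message_id": "m-1", "body": "receipt email"})  # redelivery, skipped
print(sent)
```

Recording the ID after the handler runs favors at-least-once delivery over silent loss: a crash between the two steps causes a repeat, not a dropped message.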

4. Coupling Hotspots

4.1 Highest-Coupling Services

| Service | Inbound Dependencies | Outbound Dependencies | Coupling Score |
|---|---|---|---|
| Keycloak | 28+ services (JWT) | Magic Link SPIs | Critical |
| Inventory | stripe, subscriptions, shoutout, class-catalog, purchase-request-bpm | Tags, product catalog | High |
| SSE | 7+ publishing services | Redis, PostgreSQL | High |
| Notifications | 7 inbound message types | email, sms, SSE | High |
| Purchase-Request BPM | stripe (triggers) | wallet, inventory, email, SSE | High |
| Email | 5 inbound message types | Mandrill API | Medium |
| Stripe | frontend, webhooks | Inventory, subscriptions | Medium |

4.2 Coupling-Driven Migration Constraints

  1. Keycloak must be last — changing identity affects all 28+ services simultaneously
  2. Inventory requires facade — 5 dependents cannot all migrate at once; need backward-compatible API during transition
  3. BPM instances must drain — purchase-request-bpm and shoutout-bpm have in-flight state that cannot be cold-migrated
  4. SSE is infrastructure — treat as platform service, upgrade in place rather than migrate
  5. Notification pipeline is a chain — notifications → email/sms → Mandrill/Twilio. Consolidate before migrating dependents
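
The Inventory facade in constraint 2 could take roughly the following shape: a thin layer that routes each calling service to the legacy or the replacement backend, so the five dependents can move one at a time. This is a sketch under stated assumptions. The class, method names, and backend signature are hypothetical; only the caller names come from the coupling table.

```python
from typing import Callable, Dict, Set

# Hypothetical backend signature: item ID in, item record out.
Backend = Callable[[str], Dict]


class InventoryFacade:
    """Routes each caller to the legacy or replacement backend during transition."""

    def __init__(self, legacy: Backend, replacement: Backend) -> None:
        self._legacy = legacy
        self._replacement = replacement
        self._migrated: Set[str] = set()

    def mark_migrated(self, caller: str) -> None:
        """Switch one dependent service over to the replacement backend."""
        self._migrated.add(caller)

    def get_item(self, caller: str, item_id: str) -> Dict:
        backend = self._replacement if caller in self._migrated else self._legacy
        return backend(item_id)


facade = InventoryFacade(
    legacy=lambda item_id: {"id": item_id, "source": "gen2"},
    replacement=lambda item_id: {"id": item_id, "source": "gen3"},
)
facade.mark_migrated("class-catalog")

print(facade.get_item("stripe", "sku-1"))         # still served by the legacy backend
print(facade.get_item("class-catalog", "sku-1"))  # served by the replacement
```

Defaulting unknown callers to the legacy backend keeps the existing API as the safe path until each dependent is explicitly switched.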

5. Buy-vs-Build Analysis

5.1 SaaS Replacement Candidates

| Service | Current | SaaS Candidate | Effort to Switch | Recommendation |
|---|---|---|---|---|
| Email delivery | Mandrill (via deprecated lutung library) | SendGrid, Postmark, Amazon SES | Low (API surface simple, templates portable) | Replace library (keep Mandrill or switch provider) |
| SMS delivery | Twilio (already SaaS) | Keep Twilio | None (already SaaS) | Keep |
| Chat | Stream Chat (already SaaS) | Keep Stream Chat | None (already SaaS) | Keep |
| BPM engine | CIB Seven 2.0 (EOL) | Temporal, Conductor, state machine | Medium (2 workflows, ~10 states each) | Build lightweight state machine (workflows too simple for heavy BPM) |
| Logging pipeline | peeq-logging (Node.js) → Elasticsearch | GCP Cloud Logging native | Low (remove custom pipeline) | Replace with Cloud Logging |
| Search | Self-managed Elasticsearch 7.x | Elastic Cloud, Algolia, Meilisearch | Medium (indices need rebuild) | Evaluate: Cloud Logging may eliminate log search need; content search is a separate question |
| Video transcoding | Mux (already SaaS) | Keep Mux | None (already SaaS) | Keep |
| Webinar | Zoom (already SaaS) | Keep Zoom | None (already SaaS) | Keep |
| Analytics | Airbyte + Snowflake (already SaaS) | Keep | None (already SaaS) | Keep |
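
For a sense of scale, the "lightweight state machine" recommended for the BPM engine can be as small as a transition table plus a guard. The sketch below is illustrative only: the states and transitions are invented for the example and are not the actual purchase-request or shoutout workflows, which have roughly 10 states each.

```python
from typing import Dict, List, Set

# Hypothetical states and allowed transitions, for illustration only.
TRANSITIONS: Dict[str, Set[str]] = {
    "requested": {"paid", "cancelled"},
    "paid": {"fulfilled", "refunded"},
    "fulfilled": set(),
    "refunded": set(),
    "cancelled": set(),
}


class WorkflowInstance:
    """Tracks one workflow instance and rejects transitions not in the table."""

    def __init__(self, state: str = "requested") -> None:
        if state not in TRANSITIONS:
            raise ValueError(f"unknown state: {state}")
        self.state = state
        self.history: List[str] = [state]

    def advance(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def is_terminal(self) -> bool:
        """True once no further transitions are possible."""
        return not TRANSITIONS[self.state]


order = WorkflowInstance()
order.advance("paid")
order.advance("fulfilled")
print(order.history, order.is_terminal)
```

A terminal-state check like `is_terminal` is also what a drain procedure would poll: cutover is safe only once every in-flight instance on the old engine has reached a terminal state.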

5.2 Services Already Using SaaS

The platform already delegates significant functionality to SaaS:

- Payments: Stripe (Checkout, Elements, Billing Portal)
- Video: Mux (transcoding, delivery, signed URLs)
- Video conferencing: Zoom (webinars, registrations)
- Chat: Stream Chat (messaging, channels)
- SMS: Twilio (messaging, verification)
- Email: Mandrill (transactional email)
- Analytics: Airbyte + Snowflake (CDC + warehouse)
- Session replay: LogRocket (frontend monitoring)
- Cost optimization: CastAI (GKE node management)

Finding: The platform already follows a SaaS-delegation pattern extensively. Gen 3 should continue this pattern. The only custom infrastructure at risk is the BPM engine and logging pipeline.


6. Security Gaps

| Gap | Current State | Risk Level | Remediation |
|---|---|---|---|
| Security scanning non-enforcing | Trivy + Qwiet scan but don’t fail builds | High | Enforce scan gates in CI pipeline |
| No Binary Authorization | Any container can deploy to GKE | High | Add Binary Auth with signing |
| No NetworkPolicies | All pods can communicate freely | Medium | Add namespace-level NetworkPolicies |
| CORS allows all origins | Observed in service configs | Medium | Restrict to known tenant domains |
| Public GCS buckets | Some storage buckets are public | Medium | Audit and restrict bucket ACLs |
| No WAF | Istio provides routing only | Medium | Add Cloud Armor or WAF rules |
| CIB Seven EOL | Camunda 7 CE support ended Oct 2025 | High | Replace BPM engine (security patches stopping) |
| Elasticsearch 7.x | End of life | Medium | Upgrade to 8.x or replace with Cloud Logging |
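
As an illustration of the CORS remediation, restricting origins to known tenant domains usually comes down to an exact-match allowlist check on the request's `Origin` header. The sketch below assumes hypothetical tenant hosts and a hypothetical helper name; in practice the check would live in each service's CORS configuration or at the mesh/gateway layer.

```python
from urllib.parse import urlsplit

# Hypothetical tenant hosts; the real list would come from tenant configuration.
ALLOWED_HOSTS = frozenset({"brand-a.example", "brand-b.example"})


def is_allowed_origin(origin: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Accept only HTTPS origins whose host exactly matches a known tenant domain."""
    parts = urlsplit(origin)
    # Exact host match: suffix or substring matching would let
    # "brand-a.example.evil.test" through. The port is ignored here.
    return parts.scheme == "https" and parts.hostname in allowed_hosts


for candidate in (
    "https://brand-a.example",
    "http://brand-a.example",
    "https://brand-a.example.evil.test",
    "null",
):
    print(candidate, is_allowed_origin(candidate))
```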

7. Compliance Gaps

| Area | Current State | Risk | H9 Status |
|---|---|---|---|
| PCI scope | Stripe handles all card data (likely SAQ-A) | Low | Need Stripe dashboard confirmation |
| PII handling | PII in 5 locations (Keycloak, checkout recipients, profiles, shoutout recipients) | Medium | No data classification policy documented |
| Data retention | No retention policies observed | Medium | Soft deletes exist but no automated purge |
| Audit logging | Created_on/updated_on timestamps exist; no audit trail service | Medium | Cannot trace who changed what |
| GDPR/right to delete | Soft delete flags exist but no data subject request workflow | Medium | Manual process likely required |
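
The missing automated purge could be a scheduled job that hard-deletes rows soft-deleted longer ago than a retention window. The sketch below uses an in-memory SQLite database so it runs standalone. The `profiles` table, the `deleted_on` column, and the 30-day window are assumptions for illustration; the platform's databases and actual retention periods are not specified in this analysis.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def purge_soft_deleted(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Hard-delete profiles rows soft-deleted more than retention_days before now."""
    # Timestamps are stored as UTC ISO-8601 text, so string comparison orders them correctly.
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cursor = conn.execute(
        "DELETE FROM profiles WHERE deleted_on IS NOT NULL AND deleted_on < ?",
        (cutoff,),
    )
    conn.commit()
    return cursor.rowcount


def demo() -> Tuple[int, List[int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, email TEXT, deleted_on TEXT)")
    rows = [
        (1, "active@example.test", None),  # never deleted
        (2, "old@example.test", datetime(2025, 12, 1, tzinfo=timezone.utc).isoformat()),
        (3, "recent@example.test", datetime(2026, 1, 20, tzinfo=timezone.utc).isoformat()),
    ]
    conn.executemany("INSERT INTO profiles VALUES (?, ?, ?)", rows)
    purged = purge_soft_deleted(conn, retention_days=30, now=datetime(2026, 2, 1, tzinfo=timezone.utc))
    remaining = [row[0] for row in conn.execute("SELECT id FROM profiles ORDER BY id")]
    return purged, remaining


print(demo())  # (1, [1, 3])
```

Passing `now` explicitly keeps the job deterministic under test; a data subject request workflow would be a separate, per-user deletion path.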

8. Hypothesis Summary for Gap Analysis

| # | Hypothesis | Assurance | Gap Implication |
|---|---|---|---|
| H1 | Broadcast not in production | L2 | No gap (feature not needed) |
| H2 | Dwolla inactive | L2 | No gap (archive repos) |
| H3 | Gen 1 fully replaced by Gen 2 | L1 | Only infra Gen 1 services remain (retire) |
| H5 | >50% repos archivable | L1 | ~110 repos archivable (~58%) |
| H6 | No shared DB backdoors | L1 | Clean service boundaries enable independent migration |
| H7 | >60% test coverage | L0 Falsified | Major gap: very low test coverage increases migration regression risk |
| H8 | Data volumes manageable | L0 Partial | Gap: cannot plan migration windows without actual data |
| H9 | No compliance blockers | L0 | Gap: PCI scope unconfirmed, no data classification |

Last updated: 2026-01-30 (Session 10)
Review by: 2026-04-30
Staleness risk: Medium (gaps may be addressed as modernization progresses)