Gap Analysis & Constraints
Key Takeaways
- CIB Seven (Camunda 7) EOL is the highest-urgency
constraint — Camunda 7 CE support ended October 2025. Two BPM
services (purchase-request-bpm, shoutout-bpm) and the Keycloak identity
sync plugin depend on it. Replacement is mandatory regardless of other
modernization decisions.
- 11 active external API integrations are immovable —
Stripe, Mux, Zoom, Stream Chat, Twilio, Mandrill, Google Calendar,
Elasticsearch, LogRocket, CastAI, and Airbyte/Snowflake. API keys,
webhook URLs, and external IDs must be preserved during any
migration.
- ~110 of 191 repos are archivable — 63 repos
untouched since before 2025, plus ~47 Gen 1 services with confirmed Gen
2 replacements. Only ~35 application services and ~15 infrastructure
repos are active.
- Observability has critical gaps — No alerting
configuration, APM disabled, no SLOs, no distributed tracing adoption,
no error budget tracking. Monitoring infrastructure exists (Prometheus,
Grafana, Elasticsearch) but is under-utilized.
- Buy-vs-build analysis identifies 4 SaaS replacement
candidates — Email delivery (Mandrill → SendGrid/Postmark), SMS
(Twilio already SaaS — keep), BPM engine (CIB Seven →
Temporal/lightweight state machine), logging pipeline (peeq-logging →
Cloud Logging native).
Migration Decision Question
What are the highest-impact gaps and debts blocking
modernization?
1. Immovable Constraints
These cannot be changed by the modernization effort — the
architecture must accommodate them.
1.1 External API Integrations (Must Preserve)
| External Service | Platform Services | What Must Be Preserved | Risk |
| Stripe | stripe, subscriptions, purchase-request-bpm | Customer IDs, product IDs, subscription IDs, webhook URLs, Checkout Sessions | High: payment disruption |
| Mux | content, media, shoutout | Asset IDs, playback IDs, webhook URLs, signed URLs | High: video content loss |
| Zoom | webinar | Meeting IDs, registrant IDs, API credentials | Medium: event disruption |
| Stream Chat | chat | Channel IDs, API keys, user tokens | Medium: chat history |
| Twilio | sms | Account SID, phone numbers, messaging service | Low: stateless |
| Mandrill | email | API key, templates, sender domains | Low: stateless |
| Google Calendar | webinar | Event IDs, OAuth credentials | Low: reconstructible |
| Elasticsearch | search, peeq-logging | Indices, Kibana dashboards | Medium: search downtime |
| LogRocket | frontends, peeq-mono | App IDs, session config | Low: frontend only |
| CastAI | GKE clusters | Agent config, node policies | Low: infra only |
| Airbyte/Snowflake | analytics | CDC connections (20 DBs), warehouse schemas | Medium: analytics gap |
1.2 Multi-Brand Architecture (Must Support)
- 4 production brands: The Agile Network, NIL Game Plan, VT NIL, Speed of AI
- All differentiation is config-only (H11 L2 Verified): same Docker images, same Helm charts
- Config mechanism: values-globals.yaml per tenant (domain, Keycloak realm, API keys, feature toggles)
- Implication: Any Gen 3 architecture must support multi-tenancy. The current cluster-per-tenant model works but is expensive.
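As a rough illustration of config-only differentiation, the sketch below layers one tenant's overrides on top of shared defaults with a recursive merge, similar in spirit to how Helm combines values files. The key names, the existence of a shared defaults layer, and the sample values are all assumptions for illustration; the real contents of values-globals.yaml are not shown in this document.

```python
from copy import deepcopy
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared defaults (same images and charts for every brand).
shared_defaults = {
    "image": {"tag": "1.0.0"},
    "features": {"chat": True, "webinar": True},
}

# Hypothetical per-tenant overrides, standing in for one values-globals.yaml.
tenant_globals = {
    "domain": "tenant.example.com",
    "keycloak": {"realm": "tenant-realm"},
    "features": {"webinar": False},
}

effective = merge_values(shared_defaults, tenant_globals)
print(effective["features"])
```

Because the inputs are copied rather than mutated, the same defaults can be reused for every tenant, which is the property a shared-cluster multi-tenant model would rely on.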
1.3 Keycloak as Universal Identity (Must Migrate Carefully)
- 28+ services validate JWT tokens against
Keycloak
- All user IDs are Keycloak UUIDs — no local identity
generation anywhere
- Custom SPIs: Magic Link (passwordless email/SMS)
and Session Restrictor must be preserved
- CIB Seven plugin: Keycloak→Camunda identity sync —
must be replaced when CIB Seven is replaced
- Implication: Keycloak must be the LAST thing
migrated, or must remain backward-compatible throughout
1.4 Database-Per-Service Isolation (Must Respect)
- 35 PostgreSQL databases per tenant — no
cross-database foreign keys
- All inter-service data references use Keycloak
UUIDs or external IDs
- Shared DB exception: email/sms/notifications share
peeq-notification-service-db
- Implication: Services CAN be migrated independently
(good), but data consistency depends on messaging contracts (must not
break)
1.5 Data Volume (Partially Unknown)
- Cloud SQL tier: db-custom-2-6656 (2 vCPU, 6.5 GB
RAM)
- Max connections: 750 shared across 35
databases
- H8 still L0: Actual row counts and table sizes
unknown — need production DB access
- Implication: Cannot estimate migration windows or
decide database consolidation without actual volume data
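The connection ceiling can be put in perspective with simple arithmetic. The sketch below assumes an even split of the 750 connections across the 35 databases, which is only an approximation since real services will not consume connections uniformly; the `reserved` headroom value is a hypothetical parameter, not a documented setting.

```python
MAX_CONNECTIONS = 750  # Cloud SQL max connections, from the figures above
DATABASES = 35         # PostgreSQL databases per tenant


def per_database_budget(max_connections: int, databases: int, reserved: int = 0) -> int:
    """Connections available to each database under an even split."""
    if databases <= 0:
        raise ValueError("databases must be positive")
    return (max_connections - reserved) // databases


even_split = per_database_budget(MAX_CONNECTIONS, DATABASES)
# Hypothetical: hold back 50 connections for admin and migration tooling.
with_headroom = per_database_budget(MAX_CONNECTIONS, DATABASES, reserved=50)
print(even_split, with_headroom)  # prints "21 20"
```

Roughly 20 connections per database leaves little slack for a migration that runs old and new services side by side, which is one more reason the H8 volume data is needed before planning.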
2. Dead Code & Inactive Services Assessment
| Service | Evidence | Session |
| peeq-dwolla | Not in ArgoCD, no Gen 2 references, last commit Jan 2023 | 5 |
| peeq-mux-livestream | Mux Spaces API deprecated, no ArgoCD app | 3 |
| peeq-jitsi-meet | Docker container only, webinar uses Zoom | 3 |
| peeq-meet-and-greet-bpm | Camunda 7.17, Jitsi dependency, no Gen 2 | 6 |
| peeq-custom-tixr | Gen 1 Tixr integration, inactive | 6 |
| peeq-conference-sse | Gen 1, meet-and-greet only | 7 |
| peeq-websocket | Gen 1 Node.js, single EC2 SPOF, Jitsi | 7 |
| peeq-sse | Gen 1, replaced by Gen 2 SSE | 7 |
| peeq-logging | Gen 1 Node.js, superseded by Cloud Logging | 8 |
| peeq-shared-secret | Gen 1 Java 11, superseded by AVP + GCP SM | 8 |
| broadcast (all variants) | Never deployed, Mux Spaces deprecated | 3 |
| conference | Never deployed, no ArgoCD app | 3 |
Total: 12+ services confirmed dead or replaced
2.2 Frontend Dead Code
| Dead Code | Location | Evidence |
| BroadcastGateway | peeq-mono | Calls non-existent broadcast backend |
| ConferenceGateway | peeq-mono | Calls non-existent conference backend |
| StreamGateway | peeq-mono | Calls non-existent stream backend |
| DwollaService | peeq-mono + frontends | Calls non-existent Dwolla backend |
| LoggingGateway | peeq-mono | Calls non-existent logging API |
Impact: ~17% of frontend API gateway code targets
non-existent production services.
2.3 Backend Dead Code
| Dead Code | Service | Evidence |
| Arlo LMS integration | class-catalog | Deprecated, migration complete |
| Deprecated GraphQL queries | celebrity | 3 queries marked deprecated |
| Email GraphQL API | email | All marked deprecated, migrating to Keycloak |
| mux-sync Python ETL | standalone | Utility, not a production service |
2.4 Archive Candidate Summary
| Category | Count | Action |
| Gen 1 services with Gen 2 replacement | ~35 | Archive repos |
| Inactive/never-deployed services | ~15 | Archive repos |
| Legacy DB repos (schemas in Gen 2 Flyway) | ~20 | Archive repos |
| POC/experiment repos | ~20 | Archive repos |
| Legacy frontend repos | ~15 | Archive repos |
| Total archivable | ~105 | H5 validated |
| Active repos | ~50 | 35 app + 15 infra |
| Unclear | ~36 | Need further review |
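The tally in this table can be checked with a few lines of arithmetic. The sketch below uses the table's approximate counts as exact integers, which is an assumption; it confirms that the categories account for all 191 repos and gives the archivable share implied by the table's own rows. The ~110 figure quoted elsewhere in this document comes from a different breakdown (63 untouched repos plus ~47 Gen 1 services), so the two totals are not expected to match exactly.

```python
TOTAL_REPOS = 191

# Approximate counts from the archive candidate table, treated as exact here.
archive_categories = {
    "Gen 1 services with Gen 2 replacement": 35,
    "Inactive/never-deployed services": 15,
    "Legacy DB repos": 20,
    "POC/experiment repos": 20,
    "Legacy frontend repos": 15,
}
active = 50    # 35 app + 15 infra
unclear = 36   # need further review

archivable = sum(archive_categories.values())
accounted = archivable + active + unclear
archivable_share = archivable / TOTAL_REPOS

print(archivable, accounted, f"{archivable_share:.0%}")  # prints "105 191 55%"
```

Either total clears the H5 threshold of more than 50% of repos being archivable.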
3. Feature Parity Gaps
3.1 Gen 1 Features Not in Gen 2
| Feature | Gen 1 Service | Gen 2 Status | Gap Impact |
| Live broadcasting | peeq-mux-livestream | Not built | None: Mux Spaces deprecated, feature not in use |
| Meet-and-greet | peeq-meet-and-greet-bpm | Not built | None: feature deprecated, Jitsi dependency |
| Tixr ticketing | peeq-custom-tixr | Not built | None: no evidence of active use |
| Socket.IO real-time | peeq-websocket | Replaced by SSE | None: SSE covers use cases |
| Jitsi video conferencing | peeq-jitsi-meet | Not built | None: Zoom replaced all video conferencing |
Finding: No feature parity gaps. All Gen 1 unique
features were either deprecated, replaced by SaaS (Zoom), or replaced by
Gen 2 alternatives (SSE).
3.2 Missing Capabilities (Not in Gen 1 or Gen 2)
| Capability | Current State | Impact on Modernization |
| API gateway | Istio path-based routing only; no rate limiting, versioning, or API keys | Need API management layer for Gen 3 |
| Distributed tracing | Istio Stackdriver config exists but not adopted | Cannot debug cross-service requests |
| Alerting | Prometheus deployed, no alert rules configured | Zero automated incident detection |
| SLO/SLI definitions | None | Cannot measure service reliability |
| Error budgets | None | Cannot balance velocity vs. reliability |
| Feature flags | None (tenant config only) | Cannot do gradual rollouts or A/B testing |
| API versioning | No versioning in GraphQL schemas | Breaking changes affect all consumers simultaneously |
| Double-entry bookkeeping | Transaction service uses single-table log | Financial reporting limited; refunds/chargebacks hard to audit |
| Automated testing | 2-3 test files per service (H7 falsified for coverage) | Regression risk during migration |
| Circuit breakers | None in service code | Cascading failure risk |
| Request retry/idempotency | Not observed in RabbitMQ consumers | Duplicate message processing possible |
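To make the idempotency gap concrete, the sketch below shows one common way a consumer can tolerate redelivered messages: track processed message IDs and skip repeats. It is a minimal illustration, not the platform's code. The `IdempotentConsumer` class, the message ID scheme, and the in-memory set are all assumptions; a real service would more likely persist processed IDs in its own database so the check survives restarts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdempotentConsumer:
    """Wraps a handler so each message ID triggers side effects at most once."""

    handler: Callable[[dict], None]
    _seen: set[str] = field(default_factory=set)  # hypothetical in-memory store

    def on_message(self, message_id: str, payload: dict) -> bool:
        """Process payload once per message_id; return True if it was handled."""
        if message_id in self._seen:
            return False  # duplicate delivery: skip side effects
        self.handler(payload)
        self._seen.add(message_id)  # record only after the handler succeeds
        return True


processed: list[dict] = []
consumer = IdempotentConsumer(handler=processed.append)

consumer.on_message("msg-1", {"order": 42})
consumer.on_message("msg-1", {"order": 42})  # simulated redelivery
print(len(processed))  # prints "1"
```

Recording the ID only after the handler returns means a crash mid-handler leads to a retry rather than a silently dropped message, at the cost of requiring the handler itself to be safe to re-run.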
4. Coupling Hotspots
4.1 Highest-Coupling Services
| Service | Inbound Dependencies | Outbound Dependencies | Coupling Score |
| Keycloak | 28+ services (JWT) | Magic Link SPIs | Critical |
| Inventory | stripe, subscriptions, shoutout, class-catalog, purchase-request-bpm | Tags, product catalog | High |
| SSE | 7+ publishing services | Redis, PostgreSQL | High |
| Notifications | 7 inbound message types | email, sms, SSE | High |
| Purchase-Request BPM | stripe (triggers) | wallet, inventory, email, SSE | High |
| Email | 5 inbound message types | Mandrill API | Medium |
| Stripe | frontend, webhooks | Inventory, subscriptions | Medium |
4.2 Coupling-Driven Migration Constraints
- Keycloak must be last — changing identity affects
all 28+ services simultaneously
- Inventory requires facade — 5 dependents cannot all
migrate at once; need backward-compatible API during transition
- BPM instances must drain — purchase-request-bpm and
shoutout-bpm have in-flight state that cannot be cold-migrated
- SSE is infrastructure — treat as platform service,
upgrade in place rather than migrate
- Notification pipeline is a chain — notifications →
email/sms → Mandrill/Twilio. Consolidate before migrating
dependents
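The facade constraint for Inventory can be sketched as a thin routing layer that keeps one stable interface while callers move over one at a time. Everything named here (`InventoryFacade`, `get_product`, the two backend classes, and the caller labels) is hypothetical and chosen only to illustrate the pattern; the actual Inventory API is not described in this document.

```python
from typing import Protocol


class InventoryBackend(Protocol):
    def get_product(self, product_id: str) -> dict: ...


class LegacyInventory:
    """Stand-in for the existing service."""

    def get_product(self, product_id: str) -> dict:
        return {"id": product_id, "source": "gen2"}


class NewInventory:
    """Stand-in for the replacement service."""

    def get_product(self, product_id: str) -> dict:
        return {"id": product_id, "source": "gen3"}


class InventoryFacade:
    """Routes each caller to the old or new backend behind one stable API."""

    def __init__(self, legacy: InventoryBackend, new: InventoryBackend,
                 migrated_callers: set[str]) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated = migrated_callers

    def get_product(self, caller: str, product_id: str) -> dict:
        backend = self._new if caller in self._migrated else self._legacy
        return backend.get_product(product_id)


# Hypothetical rollout state: only the stripe service has been moved so far.
facade = InventoryFacade(LegacyInventory(), NewInventory(), migrated_callers={"stripe"})
print(facade.get_product("stripe", "p-1")["source"])    # prints "gen3"
print(facade.get_product("shoutout", "p-1")["source"])  # prints "gen2"
```

Moving a dependent is then a matter of adding it to the migrated set, and rolling back is removing it, so the five dependents never have to switch in the same release.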
5. Buy-vs-Build Analysis
5.1 SaaS Replacement Candidates
| Service | Current | SaaS Candidate | Effort to Switch | Recommendation |
| Email delivery | Mandrill (via deprecated lutung library) | SendGrid, Postmark, Amazon SES | Low: API surface simple, templates portable | Replace library (keep Mandrill or switch provider) |
| SMS delivery | Twilio (already SaaS) | Keep Twilio | None: already SaaS | Keep |
| Chat | Stream Chat (already SaaS) | Keep Stream Chat | None: already SaaS | Keep |
| BPM engine | CIB Seven 2.0 (EOL) | Temporal, Conductor, state machine | Medium: 2 workflows, ~10 states each | Build lightweight state machine (workflows too simple for heavy BPM) |
| Logging pipeline | peeq-logging (Node.js) → Elasticsearch | GCP Cloud Logging native | Low: remove custom pipeline | Replace with Cloud Logging |
| Search | Self-managed Elasticsearch 7.x | Elastic Cloud, Algolia, Meilisearch | Medium: indices need rebuild | Evaluate: Cloud Logging may eliminate the log search need; content search is a separate question |
| Video transcoding | Mux (already SaaS) | Keep Mux | None: already SaaS | Keep |
| Webinar | Zoom (already SaaS) | Keep Zoom | None: already SaaS | Keep |
| Analytics | Airbyte + Snowflake (already SaaS) | Keep | None: already SaaS | Keep |
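To give a sense of scale for the "lightweight state machine" recommendation, the sketch below models a workflow as a transition table keyed by (state, event). The states and events are invented for illustration and do not reflect the actual purchase-request or shoutout BPMN models; a real replacement would also need persistence, timers, and a plan for draining in-flight instances.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    PAYMENT_PENDING = "payment_pending"
    PAID = "paid"
    FULFILLED = "fulfilled"
    CANCELLED = "cancelled"


# Hypothetical transitions: (current state, event) -> next state.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.REQUESTED, "checkout_started"): State.PAYMENT_PENDING,
    (State.PAYMENT_PENDING, "payment_succeeded"): State.PAID,
    (State.PAYMENT_PENDING, "payment_failed"): State.CANCELLED,
    (State.PAID, "delivered"): State.FULFILLED,
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def advance(state: State, event: str) -> State:
    """Return the next state, or raise if the event is not permitted."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} not allowed in {state.name}") from None


state = State.REQUESTED
for event in ("checkout_started", "payment_succeeded", "delivered"):
    state = advance(state, event)
print(state.name)  # prints "FULFILLED"
```

With roughly ten states per workflow, a table like this stays small enough to review in one sitting, which is the main argument for not adopting a full workflow engine.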
5.2 Services Already Using SaaS
The platform already delegates significant functionality to SaaS:
- Payments: Stripe (Checkout, Elements, Billing Portal)
- Video: Mux (transcoding, delivery, signed URLs)
- Video conferencing: Zoom (webinars, registrations)
- Chat: Stream Chat (messaging, channels)
- SMS: Twilio (messaging, verification)
- Email: Mandrill (transactional email)
- Analytics: Airbyte + Snowflake (CDC + warehouse)
- Session replay: LogRocket (frontend monitoring)
- Cost optimization: CastAI (GKE node management)
Finding: The platform already follows a
SaaS-delegation pattern extensively. Gen 3 should continue this pattern.
The only custom infrastructure at risk is the BPM engine and logging
pipeline.
6. Security Gaps
| Gap | Current State | Risk Level | Remediation |
| Security scanning non-enforcing | Trivy + Qwiet scan but don’t fail builds | High | Enforce scan gates in CI pipeline |
| No Binary Authorization | Any container can deploy to GKE | High | Add Binary Auth with signing |
| No NetworkPolicies | All pods can communicate freely | Medium | Add namespace-level NetworkPolicies |
| CORS allows all origins | Observed in service configs | Medium | Restrict to known tenant domains |
| Public GCS buckets | Some storage buckets are public | Medium | Audit and restrict bucket ACLs |
| No WAF | Istio provides routing only | Medium | Add Cloud Armor or WAF rules |
| CIB Seven EOL | Camunda 7 CE support ended Oct 2025 | High | Replace BPM engine (security patches stopping) |
| Elasticsearch 7.x | End of life | Medium | Upgrade to 8.x or replace with Cloud Logging |
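An enforcing scan gate can be as small as a script that reads the scanner's report and returns a non-zero exit code on blocking findings. The sketch below assumes a report shaped like Trivy's JSON output (a top-level `Results` list whose entries carry `Target` and `Vulnerabilities`); the sample report and its vulnerability IDs are made-up placeholders, and the severity threshold is an assumption rather than an agreed policy. Trivy can also enforce this directly through its `--exit-code` and `--severity` options, which may be simpler than a custom script.

```python
import json

# Assumed policy: these severities fail the build.
BLOCKING = frozenset({"HIGH", "CRITICAL"})


def blocking_findings(report: dict, blocking: frozenset[str] = BLOCKING) -> list[str]:
    """Return 'target: vulnerability-id' for every finding at a blocking severity."""
    findings = []
    for result in report.get("Results", []):
        # Vulnerabilities may be absent or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(f'{result.get("Target")}: {vuln.get("VulnerabilityID")}')
    return findings


# Placeholder report for illustration; IDs are not real CVEs.
sample = json.loads("""
{
  "Results": [
    {"Target": "app.jar", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"}
    ]},
    {"Target": "os", "Vulnerabilities": null}
  ]
}
""")

failures = blocking_findings(sample)
exit_code = 1 if failures else 0  # a CI step would pass this to sys.exit
print(exit_code, failures)
```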
7. Compliance Gaps
| Area | Current State | Risk | H9 Status |
| PCI scope | Stripe handles all card data (likely SAQ-A) | Low | Need Stripe dashboard confirmation |
| PII handling | PII in 5 locations (Keycloak, checkout recipients, profiles, shoutout recipients) | Medium | No data classification policy documented |
| Data retention | No retention policies observed | Medium | Soft deletes exist but no automated purge |
| Audit logging | Created_on/updated_on timestamps exist; no audit trail service | Medium | Cannot trace who changed what |
| GDPR/right to delete | Soft delete flags exist but no data subject request workflow | Medium | Manual process likely required |
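The missing automated purge could take the form of a scheduled job that hard-deletes rows that were soft-deleted longer ago than the retention period. The sketch below uses an in-memory SQLite database so it runs on its own; the production databases are PostgreSQL, and the table name, the `deleted` flag column, and the 365-day retention period are all assumptions, since no retention policy is documented.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 365  # hypothetical policy value


def purge_soft_deleted(conn: sqlite3.Connection, now: datetime,
                       retention_days: int = RETENTION_DAYS) -> int:
    """Hard-delete rows soft-deleted before the retention cutoff; return the count."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM profiles WHERE deleted = 1 AND updated_on < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (id TEXT PRIMARY KEY, deleted INTEGER, updated_on TEXT)"
)

now = datetime(2026, 1, 30, tzinfo=timezone.utc)
rows = [
    ("a", 1, (now - timedelta(days=400)).isoformat()),  # soft-deleted, past retention
    ("b", 1, (now - timedelta(days=10)).isoformat()),   # soft-deleted, still retained
    ("c", 0, (now - timedelta(days=400)).isoformat()),  # active row, never purged
]
conn.executemany("INSERT INTO profiles VALUES (?, ?, ?)", rows)

purged = purge_soft_deleted(conn, now)
print(purged)  # prints "1"
```

The timestamps are stored as ISO 8601 strings in a single timezone so that string comparison matches chronological order; with PostgreSQL timestamp columns the comparison would be done on native types instead.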
8. Hypothesis Summary for Gap Analysis
| # | Hypothesis | Assurance | Gap Implication |
| H1 | Broadcast not in production | L2 | No gap: feature not needed |
| H2 | Dwolla inactive | L2 | No gap: archive repos |
| H3 | Gen 1 fully replaced by Gen 2 | L1 | Only infra Gen 1 services remain (retire) |
| H5 | >50% repos archivable | L1 | ~110 repos archivable (~58%) |
| H6 | No shared DB backdoors | L1 | Clean service boundaries enable independent migration |
| H7 | >60% test coverage | L0 Falsified | Major gap: very low test coverage increases migration regression risk |
| H8 | Data volumes manageable | L0 Partial | Gap: cannot plan migration windows without actual data |
| H9 | No compliance blockers | L0 | Gap: PCI scope unconfirmed, no data classification |
Last updated: 2026-01-30, Session 10
Review by: 2026-04-30
Staleness risk: Medium; gaps may be addressed as modernization progresses