Tech Debt Inventory

Last updated: 2026-02-01 | Modernization

Key Takeaways

  1. P0 debt (must fix for migration) includes: CIB Seven EOL (2 BPM services + Keycloak plugin), the deprecated Mandrill library (lutung 0.0.8, unmaintained), zonal GKE/Cloud SQL (no HA), security scanning non-enforcement, and near-zero test coverage across all services.
  2. P1 debt (should fix during migration) includes: cluster-per-tenant cost scaling, the frontend CSS framework split (Tailwind vs Bootstrap), and missing observability (alerting, tracing, SLOs).
  3. Over-decomposed services identified: 6 low-complexity services (roughly 3 to 6 endpoints each): wallet (3 tables), transaction (1 table), onsite-event (2 tables), SSE (2 tables), chat (thin Stream wrapper), message-board (5 tables). Five are consolidation candidates; SSE should stay separate because it serves as platform infrastructure.
  4. Gen 1/Gen 2 overlap: ~35 Gen 1 repos still exist alongside Gen 2 replacements. 12+ confirmed dead services still have repos. Archive effort needed before migration to reduce confusion.
  5. Total tech debt items: 32 items across 9 domains, with 8 P0 (blocking), 14 P1 (important), and 10 P2 (improve when convenient).

Migration Decision Question

What technical debt blocks or complicates the modernization effort?


Priority Definitions

| Priority | Meaning | Action |
|---|---|---|
| P0 | Blocks migration: must resolve before or during migration | Fix immediately or as prerequisite |
| P1 | Complicates migration: should resolve during migration | Fix as part of migration work |
| P2 | Quality improvement: fix when touching related code | Fix opportunistically |

Effort Definitions

| Size | Scope |
|---|---|
| S | 1-2 services, <1 week equivalent effort |
| M | 2-4 services, 1-3 weeks equivalent effort |
| L | 4-6 services, 3-6 weeks equivalent effort |
| XL | 6+ services, 6+ weeks equivalent effort |

1. BPM Engine (CIB Seven / Camunda 7) — P0

Item Detail
Debt CIB Seven 2.0 (fork of Camunda 7 CE) — community support ended October 2025
Affected Services purchase-request-bpm, shoutout-bpm, cibseven-keycloak plugin
Risk No security patches, no bug fixes, JDK compatibility concerns
Effort M
Remediation Replace with lightweight state machine (workflows are ~10 states each, not complex enough to justify a full BPM engine). Alternatively, migrate to Temporal or Conductor.
Dependencies Keycloak identity sync plugin must also be replaced
Session 4, 6
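
To illustrate the "lightweight state machine" option, here is a minimal sketch of a table-driven workflow. The state and event names are hypothetical placeholders, not taken from purchase-request-bpm or shoutout-bpm, and Python is used only for brevity; a real replacement would presumably live in the services' own stack.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    FULFILLED = "fulfilled"


# Allowed transitions: (current state, event) -> next state.
# All names are hypothetical placeholders for the real workflow.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.SUBMITTED,
    (State.SUBMITTED, "approve"): State.APPROVED,
    (State.SUBMITTED, "reject"): State.REJECTED,
    (State.APPROVED, "fulfill"): State.FULFILLED,
}


def next_state(current: State, event: str) -> State:
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {current.name}"
        ) from None
```

With roughly 10 states per workflow, the whole transition table fits on one screen, which is the main argument for this option over a full BPM engine.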

2. Test Coverage — P0

Item Detail
Debt Near-zero automated test coverage across all Gen 2 services (2-3 test files per service)
Affected Services All 28+ Gen 2 services
Risk Regression during migration — no safety net for refactoring or upgrading
Effort XL (foundational — ongoing effort)
Remediation Phase 1: Add integration tests for critical paths (payment, auth, inventory). Phase 2: Add unit tests as services are touched during migration. Target 60% coverage for migrated services.
Dependencies CI pipeline needs test enforcement gates
Session 2-7 (H7 falsified)

3. Zonal Infrastructure — P0

Item Detail
Debt GKE clusters and Cloud SQL instances are zonal (us-central1-a), not regional
Affected Services All services + databases (entire platform)
Risk Zone failure = complete tenant outage, no automatic failover
Effort L
Remediation Upgrade GKE to regional clusters, Cloud SQL to regional HA. Requires Terraform module updates + maintenance window per tenant.
Dependencies Terraform modules, CastAI node policies
Session 8

4. Mandrill Library (lutung) — P0

Item Detail
Debt Email service uses lutung 0.0.8, an unmaintained Java Mandrill library
Affected Services email service
Risk No security updates, potential incompatibility with newer JDKs
Effort S
Remediation Replace with official Mandrill REST API calls (HTTP client) or switch to SendGrid/Postmark with maintained SDK
Dependencies None — isolated to email service
Session 7

5. Deprecated Keycloak Email Verification — P0

Item Detail
Debt Email service has GraphQL API for email verification (checkStatus, confirmCode, sendCode) marked as deprecated — migrating to Keycloak native
Affected Services email, users, Keycloak SPIs
Risk Dual verification paths create confusion; deprecated code still callable
Effort S
Remediation Complete migration to Keycloak native verification, remove deprecated GraphQL operations
Dependencies Frontend must stop calling deprecated email verification API
Session 7

6. Security Scanning Non-Enforcement — P0

Item Detail
Debt Trivy and Qwiet (ShiftLeft) scan containers and code but don’t fail CI builds
Affected Services All services (CI pipeline)
Risk Known vulnerabilities ship to production without blocking
Effort S
Remediation Add --exit-code 1 to Trivy scan, configure Qwiet to fail on high/critical findings. Add Binary Authorization for container signing.
Dependencies GitHub Actions reusable workflows
Session 8

7. Frontend Dead Code — P0

Item Detail
Debt ~17% of frontend API gateway code calls non-existent production services (broadcast, conference, stream, dwolla, logging)
Affected Services peeq-mono, frontends
Risk Confuses developers, generates runtime errors, complicates migration analysis
Effort S
Remediation Remove dead gateway services and related components from both frontend repos
Dependencies None — services don’t exist
Session 1

8. No Double-Entry Bookkeeping — P0

Item Detail
Debt Transaction service uses single-table JSON payment log — no debit/credit ledger
Affected Services transaction, wallet
Risk Financial reporting limited, refund/chargeback audit difficult, coin balance discrepancies possible
Effort M
Remediation If Gen 3 needs financial reporting, redesign transaction model with proper double-entry bookkeeping. If current simple log is sufficient, document the limitation and add reconciliation checks.
Dependencies wallet (coin balance), stripe (payment records)
Session 5, 9
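
For comparison with the current single-table JSON log, the following is a minimal sketch of what double-entry posting means: every transaction is a set of entries whose debits equal its credits. The account names and the `Entry`, `post`, and `balance` helpers are hypothetical and do not describe the existing transaction or wallet schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


def post(ledger: list, entries: list) -> None:
    """Append a transaction only if its debits equal its credits."""
    debits = sum((e.debit for e in entries), Decimal("0"))
    credits = sum((e.credit for e in entries), Decimal("0"))
    if debits != credits:
        raise ValueError(
            f"unbalanced transaction: debits={debits}, credits={credits}"
        )
    ledger.extend(entries)


def balance(ledger: list, account: str) -> Decimal:
    """Net balance of an account as debits minus credits."""
    return sum(
        (e.debit - e.credit for e in ledger if e.account == account),
        Decimal("0"),
    )


# Hypothetical coin purchase: cash received, coin liability created.
ledger = []
post(ledger, [
    Entry("cash", debit=Decimal("10.00")),
    Entry("coin_liability", credit=Decimal("10.00")),
])
```

Because every posted transaction balances, the sum of all account balances stays at zero. That invariant is the kind of reconciliation check the remediation mentions.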

9. Cluster-Per-Tenant Cost — P1

Item Detail
Debt Each of 4 production brands gets a dedicated GKE cluster + Cloud SQL + RabbitMQ + Redis + Keycloak
Affected Services All (infrastructure)
Risk Infrastructure cost scales linearly with tenant count; blocks affordable scaling to more brands
Effort XL
Remediation Consolidate to shared cluster with namespace-per-tenant isolation (NetworkPolicies, ResourceQuotas, Istio AuthorizationPolicy). H11 L2 confirms no code-level tenant branching.
Dependencies NetworkPolicies, Istio RBAC, Terraform refactor
Session 8

10. Frontend CSS Framework Split — P1

Item Detail
Debt peeq-mono uses Tailwind/DaisyUI; frontends uses Bootstrap/Angular Material
Affected Services peeq-mono, frontends
Risk Primary blocker for frontend unification — cannot merge repos without component restyling
Effort XL
Remediation Pick one framework (Tailwind recommended — modern, more flexible), migrate component-by-component. Not a big-bang rewrite.
Dependencies Design system decision must come first
Session 1

11. Missing Alerting Configuration — P1

Item Detail
Debt Prometheus deployed with kube-prometheus-stack but no custom alert rules configured
Affected Services All (monitoring)
Risk Zero automated incident detection — outages discovered by users, not monitoring
Effort M
Remediation Define SLIs/SLOs per service, create PrometheusRules for key indicators (error rate, latency, availability), configure PagerDuty/Slack integration
Dependencies SLO definitions (business input needed)
Session 8

12. No Distributed Tracing — P1

Item Detail
Debt Istio Stackdriver tracing configured but not adopted by services; no OpenTelemetry SDK integration
Affected Services All services
Risk Cannot debug cross-service request flows — critical for 28+ microservice architecture
Effort M
Remediation Add OpenTelemetry Java agent to common Helm chart (auto-instrumentation), configure sampling, enable trace visualization in Grafana or Cloud Trace
Dependencies Common Helm chart update, trace backend selection
Session 8

13. No NetworkPolicies — P1

Item Detail
Debt All pods in GKE can communicate freely — no namespace-level network isolation
Affected Services All (infrastructure)
Risk Lateral movement possible if any pod is compromised; blocks multi-tenant consolidation
Effort M
Remediation Define NetworkPolicies per namespace: default-deny ingress, explicit allow for known service-to-service paths
Dependencies Integration patterns doc provides the allow-list of communication paths
Session 8

14. CORS Allows All Origins — P1

Item Detail
Debt Backend services observed with permissive CORS configuration
Affected Services All GraphQL services
Risk Cross-origin attacks possible; violates defense-in-depth
Effort S
Remediation Restrict CORS origins to known tenant domains (4 production + dev/preview) via common configuration
Dependencies Tenant domain registry in values-globals.yaml
Session 8
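
The restriction described above boils down to an exact-match check of the request's Origin header against a known list. A minimal sketch follows; the domains are hypothetical stand-ins for the tenant domain registry, and a real fix would use the services' own CORS configuration rather than hand-written code.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; real values would come from the tenant domain registry.
ALLOWED_ORIGINS = {
    "https://tenant-a.example.com",
    "https://tenant-b.example.com",
}


def is_allowed_origin(origin: str, allowed=ALLOWED_ORIGINS) -> bool:
    """Exact-match the scheme and host[:port] of an Origin header value."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or not parts.netloc:
        return False
    return f"{parts.scheme}://{parts.netloc}" in allowed
```

Exact matching is deliberate: prefix or substring checks would accept look-alike hosts such as `tenant-a.example.com.evil.test`.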

15. Deprecated Arlo LMS Integration — P1

Item Detail
Debt class-catalog contains deprecated Arlo LMS integration code
Affected Services class-catalog
Risk Dead code confusion during migration
Effort S
Remediation Remove Arlo-related code paths, DB columns (if safe), and configuration
Dependencies None — Arlo confirmed deprecated
Session 6

16. Deprecated Celebrity GraphQL Queries — P1

Item Detail
Debt 3 GraphQL queries in celebrity service marked deprecated (celebrity, celebrities, celebritiesPaged)
Affected Services celebrity, frontends
Risk Frontend may still call deprecated queries; dual API surface complicates migration
Effort S
Remediation Verify frontend usage, migrate to non-deprecated equivalents, remove deprecated queries
Dependencies Frontend API call audit
Session 2

17. Inconsistent core-lib Versions — P1

Item Detail
Debt core-lib versions range from 0.0.67 to 0.0.69 and messages lib versions from 0.0.48 to 0.0.73 across services
Affected Services All Gen 2 services
Risk Version drift could cause subtle behavior differences; complicates shared library upgrades
Effort S
Remediation Align all services to latest core-lib and messages versions as part of Spring Boot upgrade
Dependencies Must test each service after version bump
Session 2-7 (H13)

18. APM Disabled — P1

Item Detail
Debt Elastic APM agent is available but disabled by default across all services
Affected Services All services
Risk No application performance visibility; cannot identify slow queries, memory leaks, or performance regressions
Effort S
Remediation Enable APM via Helm chart toggle (already configurable), or switch to OpenTelemetry-based APM
Dependencies Elasticsearch/APM Server capacity planning
Session 8

19. PgBouncer Routing for 41 Databases — P1

Item Detail
Debt PgBouncer routes 41 databases across 3 replicas — some databases may be unused
Affected Services All database-connected services
Risk Connection overhead for unused databases; 750 max connection limit shared across 35 active DBs
Effort S
Remediation Audit PgBouncer routing table against actual database usage; remove unused database entries
Dependencies Production database access needed
Session 8
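
The audit itself is a set comparison between the databases PgBouncer routes and the databases that show real usage. A minimal sketch, assuming both name sets have already been collected (for example from the PgBouncer configuration and from database activity statistics); the function and the database names are hypothetical.

```python
def audit_routes(routed: set, in_use: set) -> dict:
    """Compare PgBouncer-routed databases with databases that show real usage."""
    return {
        "unused_routes": sorted(routed - in_use),   # candidates for removal
        "missing_routes": sorted(in_use - routed),  # used but not routed
    }


# Hypothetical names for illustration only.
routed = {"wallet", "transaction", "legacy_reports"}
in_use = {"wallet", "transaction"}
report = audit_routes(routed, in_use)
```

With 41 routed databases and 35 reported as active, up to 6 entries may turn out to be removal candidates, subject to confirmation against production.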

20. CastAI On-Demand Overrides — P1

Item Detail
Debt 25+ services forced to on-demand instances despite CastAI spot-first policy
Affected Services Infrastructure (GKE nodes)
Risk Reduced cost savings from spot instance optimization
Effort S
Remediation Audit on-demand overrides; most stateless services should tolerate spot eviction with proper PodDisruptionBudgets
Dependencies Service resilience testing
Session 8

21. Gen 1 DB Repos Still Active — P1

Item Detail
Debt ~20 peeq-*-db repos contain Gen 1 Flyway migrations, but Gen 2 services manage their own schemas
Affected Services None (repos are inert)
Risk Developer confusion about which schema source is authoritative
Effort S
Remediation Archive all Gen 1 DB repos with README pointing to Gen 2 service
Dependencies Confirm no CI pipeline references Gen 1 DB repos
Session 2-7

22. Elasticsearch 7.x EOL — P1

Item Detail
Debt Elasticsearch 7.15.2 and Kibana 7.15.2 are end-of-life
Affected Services search, peeq-logging, Kibana dashboards
Risk No security patches; compatibility issues with newer clients
Effort M
Remediation Upgrade to Elasticsearch 8.x, or replace log aggregation with Cloud Logging and search with a managed service
Dependencies Kibana dashboard export/import; search index rebuild
Session 8, 9

23. NFS Storage Coupling — P2

Item Detail
Debt 4 PVCs (50Gi each) for content, media, shoutout, streaming tied to GKE NFS provisioner
Affected Services content, media, shoutout, streaming
Risk NFS not cloud-native; blocks multi-region; tied to specific GKE cluster
Effort M
Remediation Migrate file storage to GCS (Google Cloud Storage) with signed URLs
Dependencies Spring Content filesystem abstraction in content service
Session 3, 9

24. No API Versioning — P2

Item Detail
Debt GraphQL schemas have no versioning strategy; breaking changes affect all consumers
Affected Services All 24+ GraphQL gateways
Risk Cannot evolve APIs without coordinated frontend+backend deployment
Effort M
Remediation Adopt GraphQL schema evolution best practices (additive changes, deprecation annotations, sunset period)
Dependencies Frontend deployment coordination
Session 1, 9

25. No Feature Flags — P2

Item Detail
Debt No feature flag system; only tenant-level config toggles
Affected Services All services
Risk Cannot do gradual rollouts, canary deployments, or A/B testing
Effort M
Remediation Add feature flag service (LaunchDarkly, Unleash, or custom) as part of Gen 3
Dependencies Frontend and backend integration
Session General

26. No Circuit Breakers — P2

Item Detail
Debt No circuit breaker pattern in service-to-service calls
Affected Services All services making synchronous calls
Risk Cascading failures when downstream service degrades
Effort S
Remediation Add Resilience4j circuit breakers to GraphQL client calls; Istio can also provide mesh-level circuit breaking
Dependencies Service dependency map (integration-patterns.md)
Session 9
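
For readers unfamiliar with the pattern, here is a minimal sketch of a circuit breaker that opens after consecutive failures and allows a trial call after a cool-down. It only illustrates the idea and is not the Resilience4j API; the class, its parameters, and the defaults are hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast instead of loading a degraded downstream service.
                raise RuntimeError("circuit open: call rejected")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is what limits cascading failures: callers get an immediate error rather than tying up threads waiting on a degraded dependency.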

27. No Message Idempotency — P2

Item Detail
Debt RabbitMQ consumers don’t implement idempotency checks
Affected Services All RabbitMQ consumers (~28 services)
Risk Duplicate message processing possible during network issues or consumer restarts
Effort M
Remediation Add message deduplication (idempotency key tracking) to core-lib MessageHandler base class
Dependencies core-lib update affecting all services
Session 9
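
Idempotency key tracking can be sketched as a thin wrapper around a message handler. The wrapper class, the `idempotency_key` field name, and the in-memory store are all hypothetical; this does not reflect the actual core-lib MessageHandler interface.

```python
class IdempotentHandler:
    """Wraps a message handler so each idempotency key is processed at most once."""

    def __init__(self, handler, seen=None):
        self.handler = handler
        # In-memory set for illustration; production would need a shared, durable store.
        self.seen = seen if seen is not None else set()

    def handle(self, message: dict) -> bool:
        key = message["idempotency_key"]  # hypothetical field name
        if key in self.seen:
            return False  # duplicate delivery: skip processing
        self.handler(message)
        self.seen.add(key)  # record only after the handler succeeds
        return True
```

Recording the key only after the handler succeeds means a crash between the two steps can still cause one reprocessing. Closing that gap typically requires writing the key in the same database transaction as the handler's side effects.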

28. Public GCS Buckets — P2

Item Detail
Debt Some GCS buckets observed with public access
Affected Services Content, media storage
Risk Unauthorized access to uploaded content; data exposure
Effort S
Remediation Audit all GCS bucket IAM policies; switch to signed URLs for content delivery
Dependencies Frontend URL rewriting
Session 8

29. No Data Retention Policy — P2

Item Detail
Debt No automated data retention or purge policies; soft delete flags exist but no cleanup
Affected Services All services with databases
Risk Unbounded data growth; GDPR/CCPA compliance risk
Effort M
Remediation Define retention policies per data category; implement automated purge jobs; add data subject request workflow
Dependencies Legal/compliance input on retention periods
Session 9

30. No WAF/DDoS Protection — P2

Item Detail
Debt No Web Application Firewall in front of Istio IngressGateway
Affected Services All public-facing services
Risk Application-layer attacks not filtered
Effort S
Remediation Add Google Cloud Armor to GCP Load Balancer with OWASP rules
Dependencies Load balancer configuration
Session 8

31. No Audit Trail Service — P2

Item Detail
Debt created_on/updated_on timestamps exist but no centralized audit trail
Affected Services All services
Risk Cannot trace who changed what; compliance audit difficult
Effort M
Remediation Add event sourcing or audit log service that captures change events
Dependencies RabbitMQ message infrastructure (already exists)
Session General

32. Dual Keycloak Instances — P2

Item Detail
Debt identityx-26 (Keycloak 26.3.2) and identityx-25 (Keycloak 25.x) both deployed to agilenetwork tenant
Affected Services Identity domain
Risk Configuration drift between versions; increased operational burden
Effort S
Remediation Complete migration to Keycloak 26 for all tenants; retire identityx-25
Dependencies Realm config migration, SPI compatibility
Session 2

Summary Matrix

By Priority

| Priority | Count | Effort Range | Key Items |
|---|---|---|---|
| P0 | 8 | S to XL | CIB Seven EOL, test coverage, zonal HA, Mandrill lib, security enforcement, frontend dead code, deprecated email verification, no bookkeeping |
| P1 | 14 | S to XL | Cluster cost, CSS split, alerting, tracing, NetworkPolicies, CORS, Arlo, deprecated APIs, core-lib versions, APM, PgBouncer, CastAI, Gen 1 DB repos, Elasticsearch EOL |
| P2 | 10 | S to M | NFS storage, API versioning, feature flags, circuit breakers, idempotency, public GCS, data retention, WAF, audit trail, dual Keycloak |

By Domain

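| Domain | P0 | P1 | P2 | Total |
|---|---|---|---|---|
| Infrastructure | 2 | 8 | 2 | 12 |
| Payment/BPM | 2 | 0 | 0 | 2 |
| Frontend | 2 | 1 | 0 | 3 |
| Communication | 1 | 0 | 0 | 1 |
| Cross-cutting | 1 | 3 | 5 | 9 |
| Identity | 0 | 1 | 1 | 2 |
| Events | 0 | 1 | 0 | 1 |
| Content | 0 | 0 | 1 | 1 |
| All services | 0 | 0 | 1 | 1 |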

Over-Decomposed Services (Consolidation Candidates)

| Service | Endpoints | Tables/Migrations | Consolidation Target |
|---|---|---|---|
| wallet | ~5 | 3 tables, 3 migrations | → Payment domain service |
| transaction | ~3 | 1 table, 6 migrations | → Payment domain service |
| onsite-event | ~3 | 2 tables, 2 migrations | → Events domain service |
| chat | ~5 | 2 tables (Stream metadata) | → Communication service |
| message-board | ~6 | 5 tables, 5 migrations | → Communication service |
| SSE | ~3 | 2 tables, 3 migrations | Keep separate (infrastructure) |

Note: SSE is listed as low-complexity but serves as platform infrastructure (8 inbound handlers, 7+ publishers). It should remain a separate service despite its small schema.


Last updated: 2026-01-30 (Session 10)
Review by: 2026-04-30
Staleness risk: Medium (debt items may be resolved as modernization progresses)