ADR-015: Testing Strategy
Status
Proposed — Pending engineering team review
Context
H7 falsified: test coverage across all Gen 2 services is near-zero. Most services have 2-3 test files with minimal assertions. There are no CI test gates — builds pass regardless of test results. This is the single biggest risk to any migration or consolidation effort.
Current State
| Metric | Value |
|---|---|
| Average test files per service | 2-3 |
| Test coverage | Near-zero (no coverage tooling configured) |
| CI test gates | None — tests don’t fail builds |
| Integration test framework | None |
| E2E test framework | None |
| Test database strategy | None standardized |
| BDD/Gherkin scenarios | None |
Impact
- Every service touched during consolidation (ADR-001) is a regression gamble
- Cannot validate database migration (ADR-005) without integration tests
- Cannot verify BPM migration (ADR-013) without workflow tests
- Cannot validate API compatibility after service merges
- Agentic coding amplifies this: agents can generate code fast, but without tests, they generate regressions equally fast
Decision
Adopt a tiered testing strategy prioritized by blast radius, using Testcontainers for integration tests and BDD/Gherkin for acceptance criteria. Leverage agentic coding to generate test suites from existing code and API contracts.
Testing Pyramid
| Layer | Framework | Scope | Target Coverage |
|---|---|---|---|
| Unit | JUnit 5 + Mockito | Individual classes, business logic | 80% line coverage for new/modified code |
| Integration | Spring Boot Test + Testcontainers | Service + database + RabbitMQ | All repository methods, message handlers |
| Contract | Spring Cloud Contract or Pact | GraphQL schema compatibility | All inter-service API contracts |
| E2E/BDD | Cucumber + REST Assured (backend), Playwright (frontend) | Full user flows | Critical paths: payment, auth, shoutout lifecycle |
Priority Order (by blast radius)
| Priority | Service(s) | Why First | Test Focus |
|---|---|---|---|
| P0 | payment-service (stripe, subscriptions, wallet, transaction) | Financial transactions. Errors = money loss. | Payment flows, refund logic, wallet balance, idempotency |
| P0 | purchase-workflow | BPM migration (ADR-013). Most complex state transitions. | All state transitions, timer events, error recovery |
| P1 | identity-service (celebrity, fan, users) | 28+ services depend on identity data. | Profile CRUD, Keycloak token validation, role mappings |
| P1 | shoutout-service + shoutout-bpm | Revenue-generating workflow with external integrations (Mux, FFmpeg). | Full shoutout lifecycle, video processing callbacks |
| P1 | inventory-service | Cross-cutting hub — 5 domain dependencies. | Product CRUD, availability checks, domain event publishing |
| P2 | notification-service (email, sms, notifications) | Delivery pipeline. | Message routing, template rendering, delivery status |
| P2 | content-service, webinar-service | Content management, Mux/Zoom integrations. | CRUD, external API interactions |
| P3 | All remaining services | Lower blast radius. | Basic CRUD, message handlers |
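Idempotency is listed as a P0 test focus for payment-service. As a rough illustration of the property such a test should pin down, the plain-Java sketch below uses a hypothetical in-memory `IdempotentCharger`; the class, its method names, and the idea of a caller-supplied idempotency key are assumptions for illustration, not the actual payment-service API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a charge operation; not the payment-service API. */
final class IdempotentCharger {
    // Maps each idempotency key to the amount charged for it.
    private final Map<String, Long> processed = new HashMap<>();
    private long totalChargedCents = 0;

    /** Charges once per idempotency key; a replay returns false without charging again. */
    boolean charge(String idempotencyKey, long amountCents) {
        if (processed.containsKey(idempotencyKey)) {
            return false; // duplicate delivery, e.g. a retried request or redelivered message
        }
        processed.put(idempotencyKey, amountCents);
        totalChargedCents += amountCents;
        return true;
    }

    long totalChargedCents() {
        return totalChargedCents;
    }
}
```

A test for this property submits the same key twice and asserts that the total charged equals a single charge, rather than only asserting that neither call throws.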
Testcontainers Strategy
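```java
import org.springframework.boot.test.context.SpringBootTest;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.containers.RabbitMQContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

// Shared test infrastructure: PostgreSQL + RabbitMQ + Redis
@Testcontainers
@SpringBootTest
class PaymentServiceIntegrationTest {
    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16");
    @Container
    static RabbitMQContainer rabbit = new RabbitMQContainer("rabbitmq:3.12-management");
    // GenericContainer does not know the Redis port, so expose it explicitly
    @Container
    static GenericContainer<?> redis = new GenericContainer<>("redis:7-alpine").withExposedPorts(6379);
}
```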
Each service’s integration tests spin up real PostgreSQL, RabbitMQ, and Redis containers. Flyway migrations run against the test database. No mocking of infrastructure — tests validate actual behavior.
Agentic Test Generation Strategy
Agentic coding changes the test generation calculus:
- Agent reads existing service code — GraphQL schemas, entity models, message handlers, business logic
- Agent generates test scaffolding — unit tests for all public methods, integration tests for all repository/message handler methods
- Agent generates BDD scenarios — from GraphQL schema + business rules → Given/When/Then
- Human reviews — validates test assertions match actual business requirements
- Agent iterates — fixes failing tests, adds edge cases, improves coverage
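This is expected to be the highest-ROI use of agentic coding: generating comprehensive test suites from existing code is largely mechanical work that suits agents well, although its effectiveness on this codebase is still unproven (assurance L0).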
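The human review step matters because a generated test can pass without validating anything, which is the "false-positive" case named in the falsifiability criteria. The plain-Java sketch below contrasts a vacuous check with a behavioral one. `Wallet` and `WalletChecks` are hypothetical and not taken from this codebase; in the real suite these would be JUnit 5 tests.

```java
import java.math.BigDecimal;

/** Hypothetical wallet used only for illustration; not taken from payment-service. */
final class Wallet {
    private BigDecimal balance;

    Wallet(BigDecimal openingBalance) {
        this.balance = openingBalance;
    }

    /** Debits the wallet and rejects overdrafts. */
    void debit(BigDecimal amount) {
        if (amount.compareTo(balance) > 0) {
            throw new IllegalStateException("insufficient funds");
        }
        balance = balance.subtract(amount);
    }

    BigDecimal balance() {
        return balance;
    }
}

final class WalletChecks {
    /** Vacuous: calls debit() but would still pass if debit() did nothing. */
    static boolean vacuousCheck() {
        Wallet wallet = new Wallet(new BigDecimal("100.00"));
        wallet.debit(new BigDecimal("30.00"));
        return wallet.balance() != null;
    }

    /** Behavioral: pins the resulting balance and the overdraft rule. */
    static boolean behavioralCheck() {
        Wallet wallet = new Wallet(new BigDecimal("100.00"));
        wallet.debit(new BigDecimal("30.00"));
        boolean balanceCorrect = wallet.balance().compareTo(new BigDecimal("70.00")) == 0;

        boolean overdraftRejected = false;
        try {
            wallet.debit(new BigDecimal("500.00"));
        } catch (IllegalStateException expected) {
            overdraftRejected = true;
        }
        return balanceCorrect && overdraftRejected;
    }
}
```

Both checks pass against this `Wallet`, but only the second would fail if `debit()` were broken. Reviewers should look for the first shape when sampling generated tests.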
CI Gate Enforcement
| Gate | Threshold | When |
|---|---|---|
| Unit test pass | 100% | Every PR |
| Integration test pass | 100% | Every PR |
| Line coverage (new code) | 80% | Every PR |
| Line coverage (overall) | 60% initial → 80% target | Gradual enforcement |
| BDD scenario pass | 100% | Every PR touching covered features |
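The gate table can be read as a single pass/fail rule per PR. The sketch below expresses that rule in plain Java, assuming the pipeline has already collected the results. `CiGates`, its parameter names, and the `overallMin` ramp parameter are illustrative only; in practice this logic would sit in the coverage tool and CI workflow configuration rather than in hand-written code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative gate evaluation; class and parameter names are hypothetical. */
final class CiGates {
    /** Fixed gate from the table: 80% line coverage on new code. */
    static final double NEW_CODE_MIN = 0.80;

    /**
     * Returns the violated gates for one PR; an empty list means the PR may merge.
     *
     * @param bddScenariosPassed pass true when the PR touches no BDD-covered feature
     * @param overallMin         ramped threshold: 0.60 initially, raised toward 0.80
     */
    static List<String> violations(boolean unitTestsPassed,
                                   boolean integrationTestsPassed,
                                   boolean bddScenariosPassed,
                                   double newCodeCoverage,
                                   double overallCoverage,
                                   double overallMin) {
        List<String> failed = new ArrayList<>();
        if (!unitTestsPassed) failed.add("unit tests");
        if (!integrationTestsPassed) failed.add("integration tests");
        if (!bddScenariosPassed) failed.add("BDD scenarios");
        if (newCodeCoverage < NEW_CODE_MIN) failed.add("new-code line coverage");
        if (overallCoverage < overallMin) failed.add("overall line coverage");
        return failed;
    }
}
```

Keeping `overallMin` as a parameter reflects the gradual enforcement row: the same rule applies throughout, and only the threshold moves.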
Coverage Tooling
- JaCoCo for Java code coverage (already in Maven ecosystem)
- SonarQube or Codecov for coverage reporting and PR checks
- GitHub Actions reusable workflow for test execution and coverage enforcement
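JaCoCo's Maven plugin can enforce the overall line-coverage gate at build time through its `check` goal. The fragment below is a sketch of that configuration for a service's `pom.xml`; the plugin version and the `0.60` minimum (taken from the "60% initial" row of the gate table) are assumptions to adjust per service.

```xml
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.11</version> <!-- assumed version -->
  <executions>
    <!-- Attach the JaCoCo agent so test runs record coverage -->
    <execution>
      <id>prepare-agent</id>
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <!-- Generate the coverage report -->
    <execution>
      <id>report</id>
      <phase>verify</phase>
      <goals><goal>report</goal></goals>
    </execution>
    <!-- Fail the build when overall line coverage is below the minimum -->
    <execution>
      <id>check</id>
      <phase>verify</phase>
      <goals><goal>check</goal></goals>
      <configuration>
        <rules>
          <rule>
            <element>BUNDLE</element>
            <limits>
              <limit>
                <counter>LINE</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.60</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This fragment covers only the overall gate. JaCoCo's `check` goal evaluates the whole module rather than the lines changed in a PR, so the 80% new-code gate would come from the SonarQube or Codecov PR check.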
Hypothesis Background
Primary: A tiered testing strategy prioritized by blast radius, combined with agentic test generation, can achieve 80% coverage on critical services before migration begins.
- Evidence: Near-zero test coverage confirmed across all services (L2 — H7 falsified)
- Evidence: All services use consistent patterns (core-lib, GraphQL, RabbitMQ) making test generation systematic (L1 — H13)
- Evidence: Testcontainers is the de facto standard for Spring Boot integration testing (L1)
- Evidence: Agentic coding can generate test scaffolding from existing code (L1 — demonstrated in other projects, not tested on this codebase)
Alternative 1: Write tests only during migration (test as you touch).
- Partially accepted: this is the long-tail strategy for P2/P3 services. But P0 services (payment, purchase-workflow) need tests BEFORE migration.
Alternative 2: E2E tests only (skip unit/integration).
- Rejected: E2E tests are slow, flaky, and don’t pinpoint failures. The pyramid exists for a reason.
Falsifiability Criteria
- If agentic test generation produces >30% false-positive tests (tests that pass but don’t actually validate behavior) → manual test writing required for critical paths
- If Testcontainers startup time exceeds 60s per test class → evaluate a shared (singleton) container strategy or JUnit 5 parallel execution
- If 80% coverage target on payment services takes >2 sprints → reduce target to 60% and focus on critical path coverage only
- If CI gate enforcement blocks >50% of PRs in the first month → start with warning-only mode and gradually enforce
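The four criteria above are simple threshold checks, so they can be evaluated mechanically from the monitoring data. The sketch below is one way to do that, assuming the measurements are already available. `FalsifiabilityCheck` and its parameter names are hypothetical, and the returned strings only paraphrase the fallback actions listed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative evaluation of the falsifiability criteria; names are hypothetical. */
final class FalsifiabilityCheck {
    /** Returns the fallback actions triggered by the measurements; empty means none. */
    static List<String> triggeredFallbacks(double falsePositiveTestRate,
                                           long containerStartupSeconds,
                                           int sprintsToReachPaymentTarget,
                                           double blockedPrRate) {
        List<String> fallbacks = new ArrayList<>();
        if (falsePositiveTestRate > 0.30) {
            fallbacks.add("write critical-path tests manually");
        }
        if (containerStartupSeconds > 60) {
            fallbacks.add("evaluate shared containers or parallel execution");
        }
        if (sprintsToReachPaymentTarget > 2) {
            fallbacks.add("reduce payment coverage target to 60%, critical paths only");
        }
        if (blockedPrRate > 0.50) {
            fallbacks.add("start CI gates in warning-only mode");
        }
        return fallbacks;
    }
}
```

All comparisons are strict, matching the ">" and "exceeds" wording of the criteria, so a measurement exactly at a threshold does not trigger its fallback.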
Evidence Quality
| Evidence | Assurance |
|---|---|
| Near-zero test coverage | L2 (H7 falsified — verified across multiple services) |
| Consistent service patterns | L1 (H13 — core-lib, GraphQL, RabbitMQ) |
| Testcontainers works with Spring Boot 3.x | L1 (documented, widely adopted) |
| JaCoCo integrates with Maven | L2 (standard Java tooling) |
| Agentic test generation effectiveness | L0 (unproven on this codebase) |
Overall: L0 (weakest link, WLNK: capped by the L0 rating for agentic test generation effectiveness)
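The overall rating follows from the table if WLNK is read as a weakest-link rule, where the overall assurance is the minimum of the individual levels. The sketch below works through that arithmetic under that assumption; `AssuranceRollup` and the integer encoding of the levels (L0 = 0, L1 = 1, L2 = 2) are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative weakest-link rollup; the integer encoding of levels is an assumption. */
final class AssuranceRollup {
    /** Evidence levels copied from the Evidence Quality table (L0 = 0, L1 = 1, L2 = 2). */
    static Map<String, Integer> evidence() {
        Map<String, Integer> levels = new LinkedHashMap<>();
        levels.put("Near-zero test coverage", 2);
        levels.put("Consistent service patterns", 1);
        levels.put("Testcontainers works with Spring Boot 3.x", 1);
        levels.put("JaCoCo integrates with Maven", 2);
        levels.put("Agentic test generation effectiveness", 0);
        return levels;
    }

    /** Weakest-link rule: the overall level is the minimum of the individual levels. */
    static int overall(Map<String, Integer> levels) {
        return levels.values().stream()
                .mapToInt(Integer::intValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("no evidence supplied"));
    }
}
```

Under this rule the overall rating stays at L0 until agentic test generation has been validated on this codebase. Raising that single entry to L1 would lift the overall rating to L1.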
Bounded Validity
- Scope: All Gen 2 backend services. Frontend testing strategy is separate (Playwright for E2E, Jest/Vitest for unit).
- Expiry: Re-evaluate coverage targets after 6 months of data on actual defect rates.
- Review trigger: agentic test generation proves ineffective, or Testcontainers adds unacceptable CI time.
- Monitoring: Track coverage % per service, CI pipeline duration, defect escape rate (bugs found in staging/production).
Consequences
Positive:
- Safety net for all migration and consolidation work
- CI gates prevent regression from shipping
- BDD scenarios create living documentation of business rules
- Agentic test generation leverages AI for highest-ROI mechanical work
- Testcontainers validates real infrastructure behavior (not mocks)
Negative:
- Significant upfront investment before migration work begins
- CI pipeline time increases (Testcontainers startup)
- Coverage targets may feel blocking initially
- Test maintenance overhead as services evolve
Decision date: 2026-02-01
Review by: 2026-08-01