ADR-024: Feature Flag Strategy
Status
Proposed — Pending engineering team review
Context
The platform has no feature flag system. The only runtime
configuration mechanism is tenant-level toggles in
values-globals.yaml (Helm values), which require a full
deployment to change. This blocks gradual rollouts, canary deployments,
A/B testing, and safe migration rollbacks.
Current State
| Component | Current | Gap |
|---|---|---|
| Feature flags | None | Cannot toggle features at runtime |
| Tenant config | values-globals.yaml per tenant | Requires Helm deployment to change any value |
| Gradual rollout | Not possible | All-or-nothing deployments |
| A/B testing | Not possible | No user segmentation for features |
| Migration safety | No kill switch | Cannot disable new code paths without rollback |
| Dark launching | Not possible | Cannot deploy code without exposing to users |
Why Now
Service consolidation (ADR-001) and BPM migration (ADR-013) introduce significant risk. Feature flags enable:
- Migration rollback without deployment: if Operaton BPM has issues, flag back to the old path
- Gradual consolidation: merge services but flag individual features on/off
- Tenant-specific rollout: test changes on one brand before all four
- Safe deployment: deploy code behind a flag, enable when validated
Decision
Adopt Unleash (self-hosted) as the feature flag platform, integrated into both backend (Spring Boot) and frontend (Angular) applications. Use tenant-aware evaluation for multi-brand rollouts.
Why Unleash (Not Alternatives)
| Option | Assessment |
|---|---|
| Unleash (Recommended) | Open source, self-hosted (data stays in our infrastructure). Server-side + client-side SDKs. Supports custom strategies including tenant-based targeting. Free for self-hosted. |
| LaunchDarkly | Best-in-class SaaS but per-seat + per-MAU pricing. At 4 tenants with unknown user counts, cost is unpredictable. Evaluate if Unleash proves insufficient. |
| Flagsmith | Open source alternative. Fewer strategy types than Unleash. Smaller community. |
| Custom (Redis/DB-backed) | No SDK support, no audit trail, no gradual rollout strategies. Reinventing existing tools. |
| GCP Feature Flags (Firebase) | Firebase Remote Config is mobile-focused. Not designed for backend flag evaluation. |
Architecture
Unleash Server (platform namespace)
│
├── PostgreSQL (flag definitions, audit log)
│
├── Backend SDK (Unleash Java Client)
│   └── Spring Boot services evaluate flags server-side
│
└── Frontend SDK (Unleash Edge + JavaScript client)
    └── Unleash Edge proxy for client-side evaluation
        └── Angular apps evaluate flags client-side
Flag Taxonomy
| Flag Type | Purpose | Lifecycle | Example |
|---|---|---|---|
| Release | Gate new features | Removed after full rollout | operaton-bpm-enabled |
| Migration | Toggle between old/new code paths | Removed after migration complete | use-consolidated-payment-service |
| Experiment | A/B test variants | Removed after experiment concludes | new-checkout-flow |
| Ops | Runtime operational control | Permanent | maintenance-mode |
| Tenant | Per-brand feature availability | Permanent | nil-game-plan-exclusive-feature |
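The Lifecycle column above can be encoded so that temporary flags are checked for age while permanent ones are exempt. The following is a minimal sketch under stated assumptions: `FlagType`, `StaleFlagCheck`, and the `maxAgeDays` threshold are hypothetical names introduced for illustration, and this ADR does not fix a maximum flag age.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Flag types from the taxonomy table; 'permanent' mirrors the Lifecycle column. */
enum FlagType {
    RELEASE(false), MIGRATION(false), EXPERIMENT(false), OPS(true), TENANT(true);

    final boolean permanent;

    FlagType(boolean permanent) {
        this.permanent = permanent;
    }
}

/** Hypothetical helper: flags temporary flags that have outlived an agreed age. */
final class StaleFlagCheck {
    private StaleFlagCheck() {}

    /**
     * Returns true when a temporary flag is older than maxAgeDays.
     * Permanent flag types (Ops, Tenant) are never reported as stale.
     * maxAgeDays is an assumption; the team would need to agree on a value.
     */
    static boolean isStale(FlagType type, LocalDate createdOn, LocalDate today, long maxAgeDays) {
        if (type.permanent) {
            return false;
        }
        return ChronoUnit.DAYS.between(createdOn, today) > maxAgeDays;
    }
}
```

A check like this could run on a schedule against the flag list and feed the "flag age" metric, so cleanup candidates surface before flag debt accumulates.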
Evaluation Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Tenant-based | Flag enabled for specific tenants | Roll out to VT NIL first, then others |
| Percentage rollout | Gradual % of users | Enable new checkout for 10% → 25% → 50% → 100% |
| User ID | Specific users | Beta testing with known users |
| Environment | Per environment | Enabled in staging, disabled in production |
| Date-based | Scheduled activation | Enable feature at product launch date |
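Percentage rollout only works as a gradual strategy if each user lands in the same bucket on every evaluation, so that widening from 10% to 25% keeps the first cohort enabled. The sketch below illustrates that idea with a CRC32 hash. It is not the Unleash SDK's internal algorithm, and `RolloutBucketing` is a hypothetical name used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative only: stable percentage bucketing, not the Unleash implementation. */
final class RolloutBucketing {
    private RolloutBucketing() {}

    /** Maps (flagName, userId) to a stable bucket in the range 1..100. */
    static int bucket(String flagName, String userId) {
        CRC32 crc = new CRC32();
        // Including the flag name gives each flag its own independent cohort.
        crc.update((flagName + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100) + 1;
    }

    /** True when the user's bucket falls inside the rollout percentage (0..100). */
    static boolean isInRollout(String flagName, String userId, int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        return bucket(flagName, userId) <= percentage;
    }
}
```

Because the bucket depends only on the flag name and user ID, raising the percentage can only add users; nobody who already sees the feature loses it mid-rollout.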
Tenant-Aware Flag Context
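// Spring Boot flag evaluation with tenant context
import io.getunleash.Unleash;
import io.getunleash.UnleashContext;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class FeatureFlagService {
    private final Unleash unleash;
    private final String activeProfile; // dev, staging, prod

    // Assumes a single active profile is set via spring.profiles.active
    public FeatureFlagService(Unleash unleash,
                              @Value("${spring.profiles.active:dev}") String activeProfile) {
        this.unleash = unleash;
        this.activeProfile = activeProfile;
    }

    public boolean isEnabled(String flagName, String tenantId, String userId) {
        UnleashContext context = UnleashContext.builder()
                .appName("payment-service")
                .environment(activeProfile)
                .userId(userId)
                .addProperty("tenantId", tenantId) // custom context field read by tenant targeting
                .build();
        return unleash.isEnabled(flagName, context);
    }
}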
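The `tenantId` property only affects the result if something on the Unleash side reads it, such as a custom strategy or a constraint on that context field. The sketch below shows the matching logic such a tenant strategy might apply: the flag carries a comma-separated list of enabled tenants, and the context's tenant is checked against it. `TenantMatcher`, the comma-separated parameter format, and the tenant IDs in the usage note are assumptions for illustration. The sketch is deliberately not wired to the SDK's strategy interface, whose signature should be checked against the client version in use.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical matching logic for a tenant-based strategy. */
final class TenantMatcher {
    private TenantMatcher() {}

    /** Parses a comma-separated tenant list such as "vt-nil, brand-b". */
    static Set<String> parseTenants(String csv) {
        if (csv == null || csv.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
    }

    /** A missing tenant never matches, so the flag stays off by default. */
    static boolean isEnabledForTenant(String enabledTenantsCsv, String tenantId) {
        if (tenantId == null) {
            return false;
        }
        return parseTenants(enabledTenantsCsv).contains(tenantId.trim());
    }
}
```

For example, `TenantMatcher.isEnabledForTenant("vt-nil, brand-b", "vt-nil")` matches, while a request with no tenant in its context falls through to the disabled state.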
Migration Flag Pattern
For service consolidation (ADR-001) and BPM migration (ADR-013):
// Example: payment service consolidation
if (featureFlags.isEnabled("use-consolidated-payment-service", tenantId, userId)) {
    // New consolidated payment logic
    return consolidatedPaymentService.processPayment(request);
} else {
    // Legacy separate service logic
    return legacyStripeService.processPayment(request);
}
Flag lifecycle:
1. Deploy code behind flag (default: OFF)
2. Enable in staging environment
3. Canary: enable for 5% of production users
4. Gradual rollout: 25% → 50% → 100%
5. Remove flag: clean up branching code after full rollout is confirmed
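One way to keep the final cleanup step cheap is to confine the old/new branch to a single small helper, so removing the flag means deleting one object instead of hunting for scattered `if` statements. The sketch below is one possible shape, not an established pattern in this codebase. `MigrationRouter` is a hypothetical name, and falling back to the legacy path when flag evaluation throws is an assumption chosen to match the "default: OFF" rule.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical helper that routes a request to the new or legacy code path. */
final class MigrationRouter<I, O> {
    private final Predicate<I> useNewPath;
    private final Function<I, O> newPath;
    private final Function<I, O> legacyPath;

    MigrationRouter(Predicate<I> useNewPath, Function<I, O> newPath, Function<I, O> legacyPath) {
        this.useNewPath = useNewPath;
        this.newPath = newPath;
        this.legacyPath = legacyPath;
    }

    O route(I request) {
        boolean enabled;
        try {
            enabled = useNewPath.test(request);
        } catch (RuntimeException e) {
            // Assumption: if the flag cannot be evaluated, stay on the legacy path.
            enabled = false;
        }
        return enabled ? newPath.apply(request) : legacyPath.apply(request);
    }
}
```

In the payment example, the predicate would wrap the `featureFlags.isEnabled(...)` call and the two functions would delegate to the consolidated and legacy services.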
Unleash Deployment
| Component | Deployment | Resource |
|---|---|---|
| Unleash Server | Kubernetes (platform namespace) | 1 pod, 256Mi RAM |
| Unleash PostgreSQL | Sidecar or shared Cloud SQL | Minimal (flag definitions are small) |
| Unleash Edge | Kubernetes (platform namespace) | 1 pod, 128Mi RAM (for frontend proxying) |
Integration Points
| Layer | SDK | Evaluation |
|---|---|---|
| Spring Boot services | unleash-client-java | Server-side, cached (10s refresh) |
| Angular frontend | unleash-proxy-client (framework-agnostic JavaScript SDK, wrapped in an Angular service) | Client-side via Unleash Edge |
| BPM (Operaton) | Custom service task delegates | Server-side via FeatureFlagService |
| CI/CD pipeline | Unleash API | Pre-deployment checks (“is flag ready?”) |
Hypothesis Background
Primary: Self-hosted Unleash provides the feature flag capabilities needed for safe migration and gradual rollout without SaaS cost uncertainty.
- Evidence: No feature flag system exists (L2, verified from code and config)
- Evidence: Tenant config requires Helm deployment to change (L2, values-globals.yaml mechanism)
- Evidence: Service consolidation and BPM migration need safe rollback (L2, ADR-001 and ADR-013)
- Evidence: Unleash supports custom strategies including tenant-based (L1, documented)
Alternative 1: LaunchDarkly (managed SaaS).
- Not rejected permanently: if Unleash operational overhead is burdensome, LaunchDarkly is the best SaaS option. Evaluate after 6 months of Unleash operation.
Alternative 2: No feature flags; use canary deployments only.
- Rejected: Canary deployments only control traffic splitting, not feature branching within code. They cannot toggle specific features independently of deployment.
Falsifiability Criteria
- If Unleash server becomes a single point of failure (flag evaluation fails) → add client-side caching fallback with stale-while-revalidate
- If flag count exceeds 100 active flags → flag management discipline is failing; add lifecycle policy enforcement
- If Unleash adds >5ms latency per flag evaluation → SDK caching is not working; investigate
- If self-hosted Unleash ops overhead exceeds 4 hours/month → evaluate LaunchDarkly migration
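The first criterion calls for a caching fallback when flag evaluation fails. The sketch below shows a serve-stale-on-error wrapper, a simpler relative of stale-while-revalidate: each successful lookup is remembered, and a failed lookup returns the last known value or a caller-supplied default. It is independent of whatever caching the SDK itself provides. `LastKnownGoodFlags` and the `remoteLookup` function are hypothetical stand-ins for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical wrapper: remembers the last successful answer per flag. */
final class LastKnownGoodFlags {
    private final Predicate<String> remoteLookup;
    private final Map<String, Boolean> lastKnown = new ConcurrentHashMap<>();

    /** remoteLookup stands in for any flag lookup that may throw when unreachable. */
    LastKnownGoodFlags(Predicate<String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    boolean isEnabled(String flagName, boolean defaultValue) {
        try {
            boolean fresh = remoteLookup.test(flagName);
            lastKnown.put(flagName, fresh); // refresh the cached value on success
            return fresh;
        } catch (RuntimeException e) {
            // Lookup failed: serve the stale value, or the default if never seen.
            return lastKnown.getOrDefault(flagName, defaultValue);
        }
    }
}
```

The default matters most for kill switches such as `maintenance-mode`, where the safe value during an outage should be chosen deliberately per flag.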
Evidence Quality
| Evidence | Assurance |
|---|---|
| No feature flags in platform | L2 (verified from code analysis) |
| Tenant config via Helm values | L2 (verified from values-globals.yaml) |
| Unleash supports tenant-based strategies | L1 (documented, not tested) |
| Unleash Java SDK works with Spring Boot | L1 (documented, widely used) |
| Unleash operational overhead | L0 (unknown — no experience running it) |
Overall: L1 (WLNK capped by unknown Unleash operational overhead)
Bounded Validity
- Scope: All backend services and frontend applications. Infrastructure (Terraform) uses separate mechanisms.
- Expiry: Re-evaluate tool choice after 6 months of operation.
- Review trigger: Unleash operational overhead exceeds 4 hours/month (see Falsifiability Criteria), flag evaluation reliability drops below 99.9%, or LaunchDarkly pricing becomes predictable and affordable.
- Monitoring: Flag evaluation count, latency, error rate. Active flag count. Flag age (stale flags needing cleanup).
Consequences
Positive:
- Safe migration rollback without deployment
- Gradual rollout reduces blast radius of changes
- Tenant-specific feature control
- A/B testing capability for product decisions
- Operational kill switches for incident response
- Self-hosted: no per-user SaaS cost

Negative:
- New infrastructure to operate (Unleash server)
- Flag branching code adds complexity (must clean up after rollout)
- Risk of "flag debt": old flags never removed
- Frontend SDK adds a small amount of bundle size
- Flag evaluation adds per-request overhead (mitigated by SDK caching)
Decision date: 2026-02-01
Review by: 2026-08-01