ADR-016: CI/CD Pipeline & Security Gates
Status
Proposed — Pending engineering team review
Context
Security scanning tools are deployed (Trivy for containers, Qwiet/ShiftLeft for code), but their findings do not fail CI builds. Any container image can deploy to GKE without verification, so known vulnerabilities reach production unblocked. This is P0 tech debt.
Current State
| Component | Current | Gap |
|---|---|---|
| Container scanning | Trivy runs in CI | Does not fail builds on findings |
| Code scanning | Qwiet (ShiftLeft) runs | Does not fail builds on high/critical |
| Binary Authorization | Not configured | Any container image can deploy to GKE |
| Dependency scanning | Not observed | No automated CVE checking for Maven dependencies |
| Secret scanning | Not observed | No detection of leaked credentials in commits |
| CI workflows | 28 GitHub Actions workflows | No reusable workflow pattern — each service has its own |
| Container signing | None | No attestation that images were built by trusted CI |
Decision
Enforce security gates in the CI/CD pipeline with a fail-fast approach: block builds on high/critical findings, require Binary Authorization for container deployment, and standardize on reusable GitHub Actions workflows.
Security Gate Pipeline
Code Push
│
├── Secret Scanning (gitleaks) ──── BLOCK if secrets found
├── Dependency Audit (Maven/npm) ── BLOCK if critical CVE
├── Code Scanning (Qwiet) ──────── BLOCK if high/critical
│
├── Build + Test (ADR-015) ─────── BLOCK if tests fail
│
├── Container Build (Jib)
├── Container Scan (Trivy) ─────── BLOCK if critical CVE
├── Sign Image (cosign) ────────── After successful build + scan
│
├── Push to Registry ───────────── Signed image only
│
└── Deploy (ArgoCD) ───────────── Binary Authorization validates signature
Enforcement Thresholds
| Gate | Block On | Allow With Review |
|---|---|---|
| Secret scanning | Any detected secret | Never — always block |
| Dependency CVE | Critical severity | High severity (with documented exception) |
| Code scan (SAST) | Critical + High severity | Medium severity |
| Container CVE | Critical severity | High severity (with documented exception) |
| Test coverage | Below threshold (ADR-015) | N/A |
| Binary Authorization | Unsigned container | Never — always block |
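As an illustration of how the blocking thresholds above might translate into workflow steps, here is a minimal sketch. It rests on several assumptions not stated in this ADR: the action versions, the `IMAGE_REF` variable, the choice of OWASP dependency-check as the Maven audit tool, and a Trivy CLI already installed on the runner. The gitleaks action may also need a license key for organization-owned repositories.

```yaml
# Sketch only: steps inside a job of the CI workflow.
# Assumed (not from this ADR): action versions, IMAGE_REF, OWASP dependency-check.
steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0   # full history so the secret scan can inspect every pushed commit

  # Secret scanning: any detected secret fails the job (no review path).
  - name: Secret scan
    uses: gitleaks/gitleaks-action@v2
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  # Dependency audit: fail the build at CVSS >= 9.0, i.e. critical severity.
  - name: Dependency audit (Maven)
    run: mvn -B org.owasp:dependency-check-maven:check -DfailBuildOnCVSS=9

  # Container scan: exit non-zero only when CRITICAL findings exist.
  - name: Container scan (blocking)
    run: trivy image --severity CRITICAL --exit-code 1 "$IMAGE_REF"

  # HIGH findings are reported but do not fail the build; they go to review.
  - name: Container scan (report high severity)
    run: trivy image --severity HIGH --exit-code 0 "$IMAGE_REF"
```

Splitting the Trivy run in two keeps the "block" and "allow with review" columns visibly separate in the build log. The documented-exception process for high-severity findings still has to live outside the workflow.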
Reusable Workflow Architecture
Standardize all 28 service CI workflows into a single reusable workflow:
# .github/workflows/service-ci.yml (reusable)
# Called by each service with parameters:
#   service-name, java-version, test-coverage-threshold
jobs:
  security-scan:  # gitleaks, dependency audit
  build-test:     # Maven build, JUnit, JaCoCo
  container:      # Jib build, Trivy scan, cosign sign
  deploy:         # ArgoCD sync (with Binary Auth)
Benefits: single place to update security policies, consistent gates across all services, reduced workflow duplication.
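To make the caller/callee split concrete, the following sketch shows one way the reusable workflow could declare its inputs and how a service might call it. The input names come from this ADR. The default values, the service name `orders-service`, the caller file name, and the placeholder steps are hypothetical, and the sketch assumes all services live in the same repository as the reusable workflow.

```yaml
# .github/workflows/service-ci.yml (reusable): sketch, not the final workflow
on:
  workflow_call:
    inputs:
      service-name:
        required: true
        type: string
      java-version:
        required: false
        type: string
        default: "21"        # placeholder default
      test-coverage-threshold:
        required: false
        type: number
        default: 80          # placeholder default; real value comes from ADR-015

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "gitleaks + dependency audit for ${{ inputs.service-name }}"
  build-test:
    needs: security-scan     # fail fast: no build if the scan job fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Maven build with Java ${{ inputs.java-version }}"
---
# .github/workflows/orders-service.yml (hypothetical caller)
on: [push]
jobs:
  ci:
    uses: ./.github/workflows/service-ci.yml
    with:
      service-name: orders-service
      java-version: "21"
      test-coverage-threshold: 80
    secrets: inherit
```

Chaining the jobs with `needs:` is what gives the fail-fast behavior: a failed gate stops every job downstream of it. If services live in separate repositories, the `uses:` line would instead reference the workflow as `owner/repo/.github/workflows/service-ci.yml@ref`.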
Binary Authorization
- Attestor: CI pipeline signs container images with cosign after successful build + scan
- Policy: GKE Binary Authorization policy requires attestation from CI attestor
- Effect: Only containers built by trusted CI pipeline can deploy to production clusters
- Bypass: Break-glass procedure for emergency deployments (logged and alerted)
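A policy enforcing the attestation requirement described above could look roughly like the sketch below, in the YAML format accepted by `gcloud container binauthz policy import`. `PROJECT_ID` and the attestor name `ci-attestor` are placeholders, not values from this ADR. How the cosign signature is registered as an attestation for that attestor is not shown here and depends on the integration chosen.

```yaml
# policy.yaml: sketch of a project-level Binary Authorization policy.
# PROJECT_ID and ci-attestor are placeholders.
name: projects/PROJECT_ID/policy
globalPolicyEvaluationMode: ENABLE          # exempt Google-maintained system images
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION       # deny images lacking a valid attestation
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/PROJECT_ID/attestors/ci-attestor
```

Starting with `enforcementMode: DRYRUN_AUDIT_LOG_ONLY` during rollout would log violations without blocking deployments, which may help measure impact before the gate becomes blocking.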
Hypothesis Background
Primary: Enforcing security gates in CI/CD prevents known vulnerabilities from reaching production without significantly impacting developer velocity.
- Evidence: Trivy and Qwiet are already running — they just don’t block (L2)
- Evidence: No Binary Authorization means any image can deploy (L2)
- Evidence: GitHub Actions reusable workflows are a proven pattern for standardization (L1)
Alternative: Keep scanning advisory-only, fix findings manually.
- Rejected: Advisory-only has been the approach for years with no improvement. The current state proves this doesn’t work.
Falsifiability Criteria
- If security gates block >20% of PRs in the first month → review thresholds, may need gradual rollout
- If Binary Authorization causes deployment delays >5 minutes → optimize signing pipeline
- If false positives from Trivy/Qwiet exceed 30% of blocked builds → tune scanner configuration
Evidence Quality
| Evidence | Assurance |
|---|---|
| Trivy/Qwiet run but don’t block | L2 (verified in CI configurations) |
| No Binary Authorization | L2 (verified in GKE config) |
| Reusable workflows work for multi-service repos | L1 (GitHub docs, common pattern) |
| cosign/Binary Auth integration with GKE | L1 (GCP documentation) |
Overall: L1 (weakest link, WLNK: capped by the untested workflow migration and unknown false positive rates)
Bounded Validity
- Scope: All backend services and frontend builds. Infrastructure (Terraform) has separate policy.
- Expiry: Re-evaluate thresholds after 3 months of enforcement data.
- Review trigger: Developer velocity measurably decreases, or the false positive rate becomes unacceptable (see Falsifiability Criteria).
Consequences
Positive:
- Known vulnerabilities blocked before production
- Container provenance verified via Binary Authorization
- Standardized CI across all services (single reusable workflow)
- Audit trail for all security findings and exceptions
Negative:
- Initial friction as teams adapt to blocking gates
- False positives may temporarily slow development
- Binary Authorization adds complexity to deployment pipeline
- Break-glass procedure needed for emergency deployments
Decision date: 2026-02-01
Review by: 2026-08-01