ADR-023: File Storage Strategy
Status
Proposed — Pending engineering team review
Context
Four services use NFS PersistentVolumeClaims (50Gi each) for file storage: content, media, shoutout, and streaming. NFS is tied to the GKE cluster’s storage provisioner, blocks multi-region deployment, and does not scale cloud-natively. Additionally, some GCS buckets have public access (ADR-020 addresses the security aspect).
Current State
| Component | Current | Gap |
|---|---|---|
| NFS PVCs | 4 × 50Gi (content, media, shoutout, streaming) | Tied to GKE cluster; blocks multi-region |
| GCS usage | Media and celebrity services use GCS for some storage | Inconsistent — some NFS, some GCS |
| Signed URLs | Partial (media service uses GCS signed URLs) | Not consistent across services |
| CDN | None configured | No edge caching for content delivery |
| Upload flow | Direct to service → NFS or GCS | No resumable upload support |
| Content framework | Spring Content (content service) | Filesystem abstraction — can switch backends |
Impact
- NFS PVCs are cluster-bound — cannot migrate between clusters or regions without downtime
- Shared cluster (ADR-002) requires shared storage accessible from all tenant namespaces
- No CDN means all content served from origin (GKE), adding latency for geographically distributed fans
- Mixed NFS/GCS makes the storage model confusing and hard to maintain
Decision
Migrate all file storage from NFS PVCs to Google Cloud Storage (GCS) with signed URLs for access control and Cloud CDN for delivery. Standardize on GCS as the single file storage backend.
Why GCS (Not Alternatives)
| Option | Assessment |
|---|---|
| GCS + Cloud CDN (Recommended) | Already partially used. Cloud-native. Signed URLs for security. CDN for performance. Spring Content supports GCS backend. |
| AWS S3 | Would introduce multi-cloud complexity. No advantage over GCS since platform is GCP-native. |
| MinIO (self-managed S3) | Adds operational overhead for object storage — a non-differentiating capability. |
| Keep NFS + add NFS-to-GCS sync | Band-aid. NFS remains the bottleneck. |
| Filestore (Google managed NFS) | Managed but still NFS semantics. Doesn’t solve CDN or signed URL needs. |
Target Architecture
Upload Flow:
  Client → Service → GCS (direct or resumable upload)
    └── Private bucket per content type
Download Flow:
  Client → Cloud CDN → GCS (signed URL)
    ├── Edge cache hit (fast)
    └── Cache miss → origin (GCS signed URL)
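For the upload leg, the Google Cloud Storage Java client already provides resumable behavior through its write channel, so services would not need bespoke chunking logic. A minimal sketch under that assumption; the class and method names are illustrative, not existing service code:

```java
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class GcsUploader {

    private final Storage storage = StorageOptions.getDefaultInstance().getService();

    // Streams an upload into a private bucket through the client library's
    // resumable write channel, so large files tolerate transient failures.
    public void upload(String bucket, String objectPath, InputStream in) throws IOException {
        BlobInfo blobInfo = BlobInfo.newBuilder(bucket, objectPath).build();
        try (WriteChannel writer = storage.writer(blobInfo)) {
            byte[] buffer = new byte[1024 * 1024]; // 1 MiB chunks
            int read;
            while ((read = in.read(buffer)) >= 0) {
                writer.write(ByteBuffer.wrap(buffer, 0, read));
            }
        }
    }
}
```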
Storage Buckets
| Bucket | Content | Access | TTL |
|---|---|---|---|
| {tenant}-content-assets | Articles, images, documents | Signed URL (1 hour) | Indefinite |
| {tenant}-media-uploads | Raw video uploads (pre-Mux) | Service account only | 30 days (processed by Mux) |
| {tenant}-shoutout-videos | Shoutout recordings | Signed URL (1 hour) | Indefinite |
| {tenant}-profile-images | Celebrity/fan profile images | Signed URL (24 hours) or public | Indefinite |
All buckets:
- Private by default (no public access — ADR-020)
- Uniform bucket-level access (no object-level ACLs)
- Versioning enabled for content-assets and shoutout-videos
- Lifecycle rules for media-uploads (delete after 30 days)
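Whether buckets end up provisioned via Terraform or in application code is not decided here; as one illustration of the policies above, the Java client can express the media-uploads rules directly. A sketch only, with an assumed class name:

```java
import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.BucketInfo.IamConfiguration;
import com.google.cloud.storage.BucketInfo.LifecycleRule;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.util.List;

public class BucketProvisioner {

    private final Storage storage = StorageOptions.getDefaultInstance().getService();

    // Creates {tenant}-media-uploads: uniform bucket-level access (no object
    // ACLs) plus a 30-day delete rule. The content-assets and shoutout-videos
    // buckets would instead add .setVersioningEnabled(true).
    public void createMediaUploadsBucket(String tenant) {
        BucketInfo bucketInfo = BucketInfo.newBuilder(tenant + "-media-uploads")
                .setIamConfiguration(IamConfiguration.newBuilder()
                        .setIsUniformBucketLevelAccessEnabled(true)
                        .build())
                .setLifecycleRules(List.of(new LifecycleRule(
                        LifecycleRule.LifecycleAction.newDeleteAction(),
                        LifecycleRule.LifecycleCondition.newBuilder().setAge(30).build())))
                .build();
        storage.create(bucketInfo);
    }
}
```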
Signed URL Strategy
| Parameter | Value |
|---|---|
| Default TTL | 1 hour |
| Profile images | 24 hours (frequently accessed, low sensitivity) |
| Signing method | V4 signing with service account key |
| CDN integration | Cloud CDN caches response (honors Cache-Control) |
// Signed URL generation in the service layer.
// Requires com.google.cloud.storage.{BlobInfo, Storage} and java.util.concurrent.TimeUnit;
// `storage` is the injected GCS client.
public String generateSignedUrl(String bucketName, String objectPath) {
    BlobInfo blobInfo = BlobInfo.newBuilder(bucketName, objectPath).build();
    return storage.signUrl(blobInfo, 1, TimeUnit.HOURS,
            Storage.SignUrlOption.withV4Signature());
}
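For the CDN integration row above, Cloud CDN can only honor Cache-Control if the objects carry it, so uploads would set the header as object metadata. A sketch in the same service-layer style; the exact directive (and whether signed responses are cacheable under the final CDN configuration) still needs validation:

```java
// Requires com.google.cloud.storage.{BlobInfo, Storage}; `storage` is the injected GCS client.
public void uploadCacheableAsset(String bucketName, String objectPath, byte[] content) {
    BlobInfo blobInfo = BlobInfo.newBuilder(bucketName, objectPath)
            // Assumed directive: lets Cloud CDN and browsers cache for up to
            // one hour, matching the default signed URL TTL.
            .setCacheControl("public, max-age=3600")
            .build();
    storage.create(blobInfo, content);
}
```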
Migration Approach
- Content service (Spring Content) — switch backend from filesystem to GCS (Spring Content supports this via spring-content-gcs); see the store interface sketch after this list
- Media service — already partially on GCS; complete migration of any NFS-resident files
- Shoutout service — migrate video storage from NFS to GCS; update FFmpeg pipeline to write to GCS
- Remove NFS PVCs — delete PersistentVolumeClaims after data migration is verified
- Add Cloud CDN — configure CDN in front of GCS for content delivery
- Update frontend — replace any hardcoded NFS-based URLs with signed URL API calls
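The content-service step relies on Spring Content's storage abstraction: the store interface stays the same and only the backing module changes. A minimal sketch, assuming a hypothetical ContentAsset entity; the actual entity name and the spring-content-gcs configuration are not shown:

```java
// Package is org.springframework.content.commons.repository.ContentStore in older
// Spring Content releases and ...commons.store.ContentStore in newer ones.
import org.springframework.content.commons.repository.ContentStore;

// Backend-agnostic store contract: swapping spring-content-fs for
// spring-content-gcs changes where the bytes land, not this interface.
// ContentAsset is a hypothetical entity name used for illustration.
public interface ContentAssetStore extends ContentStore<ContentAsset, String> {
}
```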
Data Migration
| Source | Destination | Strategy |
|---|---|---|
| NFS content PVC | {tenant}-content-assets bucket | gsutil -m rsync from NFS mount |
| NFS media PVC | {tenant}-media-uploads bucket | gsutil -m rsync (active files only) |
| NFS shoutout PVC | {tenant}-shoutout-videos bucket | gsutil -m rsync from NFS mount |
| Existing GCS (scattered) | Consolidated per-tenant buckets | gsutil -m cp with path mapping |
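For the last row, consolidating scattered GCS objects into the per-tenant buckets, a server-side copy avoids pulling bytes through the cluster. A sketch with the Java client; mapPath is a placeholder for the real path-mapping rules:

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;

public class GcsConsolidator {

    // Copies every object from a legacy bucket into the consolidated
    // per-tenant bucket; mapPath(...) stands in for the path-mapping rules.
    public void consolidate(Storage storage, String legacyBucket, String tenantBucket) {
        for (Blob blob : storage.list(legacyBucket).iterateAll()) {
            BlobId target = BlobId.of(tenantBucket, mapPath(blob.getName()));
            storage.copy(Storage.CopyRequest.of(blob.getBlobId(), target)).getResult();
        }
    }

    private String mapPath(String legacyPath) {
        return legacyPath; // placeholder: real mapping depends on the current layout
    }
}
```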
Hypothesis Background
Primary: GCS with signed URLs and Cloud CDN provides better performance, security, and scalability than NFS PVCs while reducing operational complexity.
- Evidence: 4 NFS PVCs tied to GKE cluster storage (L2 — verified in Helm charts)
- Evidence: Media service already uses GCS signed URLs (L1 — code analysis)
- Evidence: Spring Content supports GCS backend (L1 — documented)
- Evidence: NFS blocks multi-region deployment (L2 — architectural constraint)
Alternative: Google Filestore (managed NFS). Not rejected permanently — if services require POSIX filesystem semantics (e.g., FFmpeg local file processing), Filestore could serve as intermediate storage before GCS upload.
Falsifiability Criteria
- If signed URL generation adds >50ms latency per request → batch/cache signed URLs (see the caching sketch after this list)
- If Cloud CDN cache hit rate <60% for content → evaluate content access patterns (may not benefit from CDN)
- If FFmpeg in shoutout service cannot read from GCS directly → use Filestore as processing scratch space
- If GCS storage costs exceed NFS equivalent by >2x → evaluate storage class (Nearline for archival content)
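A rough shape for the caching mitigation in the first criterion: cache signed URLs for less than their signature TTL so a cached URL never outlives its validity. The class name, key scheme, and 50-minute window are assumptions:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

public class SignedUrlCache {

    private record Entry(String url, Instant expiresAt) {}

    // Re-use a signed URL for 50 minutes so it stays well inside the
    // 1-hour signature validity.
    private static final Duration CACHE_TTL = Duration.ofMinutes(50);

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    // signer is the existing generateSignedUrl(bucket, path) call.
    public String get(String bucket, String path, BiFunction<String, String, String> signer) {
        return cache.compute(bucket + "/" + path, (key, existing) ->
                existing != null && existing.expiresAt().isAfter(Instant.now())
                        ? existing
                        : new Entry(signer.apply(bucket, path), Instant.now().plus(CACHE_TTL)))
                .url();
    }
}
```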
Evidence Quality
| Evidence | Assurance |
|---|---|
| 4 NFS PVCs in Helm charts | L2 (verified) |
| Media service uses GCS | L1 (code analysis) |
| Spring Content GCS support | L1 (documented) |
| Content sizes and access patterns | L0 (need production data) |
| FFmpeg GCS compatibility | L0 (needs testing — may need FUSE mount) |
Overall: L0 (weakest-link: capped by the unknown content sizes and untested FFmpeg GCS compatibility)
Bounded Validity
- Scope: All file storage for content, media, shoutout, and streaming services.
- Expiry: Re-evaluate if content delivery patterns change significantly (e.g., live streaming storage needs).
- Review trigger: If CDN costs are prohibitive. If FFmpeg cannot work with GCS (need Filestore intermediate).
- Monitoring: GCS storage costs, signed URL generation latency, CDN cache hit rate, upload success rate.
Consequences
Positive:
- Cloud-native storage — no cluster-bound NFS dependency
- Enables multi-region deployment
- Signed URLs enforce access control at the storage layer
- Cloud CDN reduces content delivery latency
- Simplifies shared cluster migration (no NFS PVC sharing needed)
Negative:
- Data migration effort (rsync from NFS to GCS)
- Signed URL generation adds per-request overhead
- CDN adds cost (per-GB egress + cache fill)
- FFmpeg pipeline may need modification for GCS
Decision date: 2026-02-01
Review by: 2026-08-01