
Scalability Patterns

For: Architects designing large-scale deployments
Level: Advanced
Time to read: 35 minutes
Patterns: 8+ production patterns

This guide covers scaling patterns for multi-tenant systems, distributed workflows, and high-throughput deployments.


Scalability Fundamentals

Scaling Dimensions

┌─────────────────────────────────────┐
│ Vertical Scaling: CPU/Memory        │ (Limited)
├─────────────────────────────────────┤
│ Horizontal Scaling: More pods       │ (Better)
├─────────────────────────────────────┤
│ Multi-region: Geographic            │ (Complex)
├─────────────────────────────────────┤
│ Sharding: Data partitioning         │ (Advanced)
└─────────────────────────────────────┘

Pattern 1: Multi-Tenant Isolation

Database-per-Tenant

# cascade.yaml
tenants:
  - id: acme
    database:
      url: postgres://user:pass@postgres-acme:5432/acme_db
  - id: globex
    database:
      url: postgres://user:pass@postgres-globex:5432/globex_db

Benefits:

  • Complete data isolation
  • Independent scaling
  • Simple backups

Drawbacks:

  • Higher infrastructure cost
  • More databases to manage
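
Tenant routing happens in the application layer. A minimal Go sketch of one way to do it, assuming pools are built from the cascade.yaml tenant list (helper names here are hypothetical, not part of the Cascade API):

package tenancy

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // Postgres driver (assumed)
)

// pools holds one connection pool per tenant, keyed by tenant ID.
var pools = map[string]*sql.DB{}

// RegisterTenant opens a dedicated pool for one tenant's database URL.
func RegisterTenant(id, url string) error {
	db, err := sql.Open("postgres", url)
	if err != nil {
		return fmt.Errorf("open database for tenant %s: %w", id, err)
	}
	pools[id] = db
	return nil
}

// DBForTenant returns the tenant's pool. Unknown tenants are rejected,
// which is what gives this pattern its hard isolation guarantee.
func DBForTenant(id string) (*sql.DB, error) {
	db, ok := pools[id]
	if !ok {
		return nil, fmt.Errorf("unknown tenant %q", id)
	}
	return db, nil
}

Because an unknown tenant has no pool at all, a mis-routed request fails fast instead of touching another tenant's data.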

Schema-per-Tenant

-- Single database, schema per tenant
CREATE SCHEMA acme;
CREATE SCHEMA globex;

CREATE TABLE acme.orders (...);
CREATE TABLE globex.orders (...);

Benefits:

  • Single database to manage
  • Shared infrastructure
  • Cost-efficient

Drawbacks:

  • Schema migrations affect all
  • Less isolation
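
With a shared database, each request must be scoped to the caller's schema. A minimal Go sketch, assuming the lib/pq driver; the helper below is hypothetical, not a Cascade API:

package tenancy

import (
	"context"
	"database/sql"
	"fmt"

	"github.com/lib/pq" // for QuoteIdentifier (assumed driver)
)

// CountTenantOrders queries one tenant's schema by setting search_path
// inside a transaction, so "orders" resolves to <tenant>.orders.
func CountTenantOrders(ctx context.Context, db *sql.DB, tenant string) (int, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback()

	// SET LOCAL is scoped to this transaction, so concurrent requests
	// for other tenants are unaffected.
	setPath := fmt.Sprintf("SET LOCAL search_path TO %s", pq.QuoteIdentifier(tenant))
	if _, err := tx.ExecContext(ctx, setPath); err != nil {
		return 0, err
	}

	var n int
	if err := tx.QueryRowContext(ctx, "SELECT count(*) FROM orders").Scan(&n); err != nil {
		return 0, err
	}
	return n, tx.Commit()
}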

Pattern 2: Distributed Workflows

Multi-Region Workflows

# Workflow spans multiple regions
states:
  - name: ValidateRegional
    type: Parallel
    branches:
      - name: ValidateUS
        type: Task
        resource: urn:cascade:activity:validate@us-east
        timeout: 10s
      - name: ValidateEU
        type: Task
        resource: urn:cascade:activity:validate@eu-west
        timeout: 10s
    completion_strategy: ALL
    next: CombineResults

Workflow Sharding

import (
	"fmt"
	"hash/fnv"
)

// Shard workflows by customer ID
func ShardWorkflow(customerID string) string {
	hash := fnv.New32a()
	hash.Write([]byte(customerID))
	shardID := hash.Sum32() % 10 // 10 shards
	return fmt.Sprintf("shard-%d", shardID)
}

// In CDL:
// - name: ProcessOrder
//   type: Task
//   parameters:
//     shard: "{{ workflow.input.customer_id | shard }}"

Pattern 3: High-Throughput Deployment

Pod Replica Strategy

# cascade.yaml
deployments:
  # API Gateway
  api:
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi

  # Activity Workers
  workers:
    replicas: 10
    resources:
      requests:
        cpu: 1000m
        memory: 1Gi

  # Workflow Engine
  engine:
    replicas: 5
    resources:
      requests:
        cpu: 500m
        memory: 512Mi

Auto-scaling Rules

autoscaling:
  enabled: true
  api:
    min_replicas: 3
    max_replicas: 20
    cpu_threshold: 70%
    memory_threshold: 80%
  workers:
    min_replicas: 10
    max_replicas: 50
    activity_queue_depth: 100
    target_queue_depth: 50

Pattern 4: Event-Driven Scaling

Message Queue Scaling

# Kafka/NATS scaling
messaging:
  broker: kafka
  partitions: 50      # Match shard count
  replication: 3      # High availability
  topics:
    - name: orders
      partitions: 20  # Scale with throughput
      retention: 24h

Scaling Algorithm:

Queue Depth = Messages Published - Messages Consumed

If Queue Depth > upper threshold: scale workers up (add N replicas)
If Queue Depth < lower threshold: scale workers down
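
A minimal Go sketch of that control loop, using the same thresholds as the scaling alerts later in this guide (a real controller would read consumer lag from Kafka/NATS and call the platform's scaling API; the function here is illustrative only):

package autoscale

// DecideScale returns the replica delta to apply. It scales up when the
// backlog exceeds the upper threshold, scales down when it falls below the
// lower one, and holds steady in between to avoid flapping.
func DecideScale(published, consumed int64) int {
	queueDepth := published - consumed

	switch {
	case queueDepth > 100: // upper threshold: backlog is growing
		return +5 // add workers
	case queueDepth < 10: // lower threshold: workers are idle
		return -2 // remove workers
	default:
		return 0 // within the target band, do nothing
	}
}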

Pattern 5: Caching for Scale

Distributed Cache

# Redis cluster for horizontal caching
cache:
  provider: redis-cluster
  replicas: 3
  nodes: 6
  partitioning: hash-slot
  eviction_policy: allkeys-lru
  max_memory: 100GB

Cache Topology:

┌────────────┐    ┌──────────────┐    ┌───────────────┐
│   Node 1   │    │    Node 2    │    │    Node 3     │
│   Slots    │    │    Slots     │    │    Slots      │
│   0-5460   │    │  5461-10921  │    │  10922-16383  │
└────────────┘    └──────────────┘    └───────────────┘
       ↓                  ↓                   ↓
          Single logical cache (16384 slots)
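
To the application the cluster is one cache: the client library maps each key to its hash slot and routes the command to the owning node. An illustrative snippet assuming the github.com/redis/go-redis/v9 client (an assumption, not a Cascade requirement):

package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

func Example(ctx context.Context) error {
	// One logical client; key-to-node routing is handled by the library.
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"cache-0:6379", "cache-1:6379", "cache-2:6379"},
	})
	defer rdb.Close()

	// Reads and writes look exactly like single-node Redis.
	if err := rdb.Set(ctx, "customer:42", "cached-profile", 10*time.Minute).Err(); err != nil {
		return err
	}
	_, err := rdb.Get(ctx, "customer:42").Result()
	return err
}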

Pattern 6: Data Partitioning

Temporal Sharding

-- Partition orders by date
CREATE TABLE orders_2024_q1 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
  FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

Benefits:

  • Faster queries (smaller tables)
  • Easier archiving (drop old partitions)
  • Parallel query execution
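
Partition maintenance is usually automated so the next quarter's table exists before rows arrive. A hedged Go sketch of such a job, following the orders_YYYY_qN naming above (hypothetical helper, not a Cascade API):

package partitions

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// EnsureQuarterPartition creates the orders partition covering the quarter
// that contains t, if it does not already exist.
func EnsureQuarterPartition(ctx context.Context, db *sql.DB, t time.Time) error {
	q := (int(t.Month())-1)/3 + 1 // 1..4
	start := time.Date(t.Year(), time.Month((q-1)*3+1), 1, 0, 0, 0, 0, time.UTC)
	end := start.AddDate(0, 3, 0) // first day of the next quarter

	ddl := fmt.Sprintf(
		`CREATE TABLE IF NOT EXISTS orders_%d_q%d PARTITION OF orders
		   FOR VALUES FROM ('%s') TO ('%s')`,
		t.Year(), q, start.Format("2006-01-02"), end.Format("2006-01-02"),
	)
	_, err := db.ExecContext(ctx, ddl)
	return err
}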

Range Sharding

-- Partition by customer ID range
CREATE TABLE orders_shard_0 PARTITION OF orders
  FOR VALUES FROM (0) TO (100000);

CREATE TABLE orders_shard_1 PARTITION OF orders
  FOR VALUES FROM (100000) TO (200000);

Pattern 7: Read Replicas

Primary-Replica Setup

database:
  primary:
    host: db-primary.prod.svc.cluster.local
    pool_size: 10
  replicas:
    - host: db-replica-1.prod.svc.cluster.local
    - host: db-replica-2.prod.svc.cluster.local
    - host: db-replica-3.prod.svc.cluster.local

Query Routing:

// Route reads to replicas
func GetCustomer(ctx context.Context, id string) (*Customer, error) {
	db := selectReplicaRoundRobin() // Load balance across read replicas
	return db.GetCustomer(ctx, id)
}

// Route writes to primary
func UpdateCustomer(ctx context.Context, customer *Customer) error {
	return primary.Update(customer)
}

Pattern 8: Async Processing

Decouple with Events

# ❌ Synchronous workflow
states:
  - name: ProcessOrder
    type: Task
    resource: urn:cascade:activity:heavy_processing
    timeout: 300s  # 5 minutes!
    end: true

# ✅ Asynchronous workflow
states:
  - name: QueueOrder
    type: Task
    resource: urn:cascade:activity:queue_processing
    timeout: 5s  # Fast
    next: ReturnResult
  - name: ReturnResult
    type: Task
    end: true

# Separate async workflow
# consumes from the queue and processes in the background
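
The queued work is then drained by separate background consumers. As an illustration only, the sketch below uses a plain Go channel in place of the real Kafka/NATS consumer (an assumption) to show how the heavy processing stays off the request path:

package worker

import (
	"context"
	"log"
	"sync"
)

type Order struct{ ID string }

// StartWorkers launches n background consumers that drain the queue.
// The workflow's QueueOrder step only enqueues and returns quickly;
// the expensive work happens here.
func StartWorkers(ctx context.Context, queue <-chan Order, n int,
	process func(context.Context, Order) error) *sync.WaitGroup {

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case o, ok := <-queue:
					if !ok {
						return
					}
					if err := process(ctx, o); err != nil {
						log.Printf("order %s failed: %v", o.ID, err)
					}
				}
			}
		}()
	}
	return &wg
}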

Monitoring Scalability

Key Metrics

Metric                 Target    Alert
CPU utilization        60-70%    >80%
Memory usage           70-80%    >90%
Queue depth            <50       >100
Request latency p99    <500ms    >1s
Error rate             <0.1%     >0.5%
Throughput             100+/s    Decreasing

Scaling Alerts

alerts:
  - name: HighQueueDepth
    condition: "queue_depth > 100"
    action: scale_up
    target: "+5 workers"

  - name: LowQueueDepth
    condition: "queue_depth < 10 for 5m"
    action: scale_down
    target: "-2 workers"

Best Practices

DO:

  • Design for horizontal scaling
  • Shard strategically
  • Cache aggressively
  • Monitor metrics
  • Test at scale
  • Use async processing
  • Replicate data

DON’T:

  • Rely on vertical scaling alone
  • Over-shard
  • Cache without invalidation
  • Ignore latency
  • Deploy single-pod
  • Block on external APIs

Updated: October 29, 2025
Version: 1.0
