Scalability Patterns
For: Architects designing large-scale deployments
Level: Advanced
Time to read: 35 minutes
Patterns: 8+ production patterns
This guide covers scaling patterns for multi-tenant systems, distributed workflows, and high-throughput deployments.
Scalability Fundamentals
Scaling Dimensions
┌───────────────────────────────────────┐
│ Vertical Scaling:   CPU/Memory        │ (Limited)
├───────────────────────────────────────┤
│ Horizontal Scaling: More pods         │ (Better)
├───────────────────────────────────────┤
│ Multi-region:       Geographic        │ (Complex)
├───────────────────────────────────────┤
│ Sharding:           Data partitioning │ (Advanced)
└───────────────────────────────────────┘
Pattern 1: Multi-Tenant Isolation
Database-per-Tenant
# cascade.yaml
tenants:
  - id: acme
    database:
      url: postgres://user:pass@postgres-acme:5432/acme_db
  - id: globex
    database:
      url: postgres://user:pass@postgres-globex:5432/globex_db
Benefits:
- Complete data isolation
- Independent scaling
- Simple backups
Drawbacks:
- Higher infrastructure cost
- More databases to manage
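At the application layer, database-per-tenant usually means keeping one connection pool per tenant and resolving it from the tenant ID on every request. A minimal sketch, assuming a registry populated from the tenants block above (the tenantDBs type and forTenant helper are illustrative, not Cascade APIs):
import (
    "database/sql"
    "fmt"
)

// tenantDBs maps tenant IDs from cascade.yaml to their own connection pools.
type tenantDBs map[string]*sql.DB

// forTenant resolves the pool for a tenant, so queries cannot cross
// tenant boundaries by construction.
func (t tenantDBs) forTenant(tenantID string) (*sql.DB, error) {
    db, ok := t[tenantID]
    if !ok {
        return nil, fmt.Errorf("unknown tenant %q", tenantID)
    }
    return db, nil
}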
Schema-per-Tenant
-- Single database, schema per tenant
CREATE SCHEMA acme;
CREATE SCHEMA globex;
CREATE TABLE acme.orders (...);
CREATE TABLE globex.orders (...);
Benefits:
- Single database to manage
- Shared infrastructure
- Cost-efficient
Drawbacks:
- Schema migrations affect all
- Less isolation
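With a shared database, tenant scoping happens per request by switching the active schema. A sketch using PostgreSQL's search_path inside a transaction; SET LOCAL reverts on commit or rollback, so a pooled connection is never left pointing at another tenant's schema. The withTenantSchema helper is illustrative, not a Cascade API, and schema names must come from a trusted allow-list because they are interpolated, not bound:
import (
    "context"
    "database/sql"
    "fmt"
)

// withTenantSchema runs fn inside a transaction whose search_path is scoped
// to the tenant's schema.
func withTenantSchema(ctx context.Context, db *sql.DB, schema string, fn func(*sql.Tx) error) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op after a successful Commit

    // SET LOCAL applies only for the lifetime of this transaction.
    if _, err := tx.ExecContext(ctx, fmt.Sprintf("SET LOCAL search_path TO %s", schema)); err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit()
}
A caller would invoke it as withTenantSchema(ctx, db, "acme", func(tx *sql.Tx) error { ... }), with the schema name taken from the tenant registry rather than user input.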
Pattern 2: Distributed Workflows
Multi-Region Workflows
# Workflow spans multiple regions
states:
  - name: ValidateRegional
    type: Parallel
    branches:
      - name: ValidateUS
        type: Task
        resource: urn:cascade:activity:validate@us-east
        timeout: 10s
      - name: ValidateEU
        type: Task
        resource: urn:cascade:activity:validate@eu-west
        timeout: 10s
    completion_strategy: ALL
    next: CombineResults
Workflow Sharding
// Shard workflows by customer ID
import (
    "fmt"
    "hash/fnv"
)

func ShardWorkflow(customerID string) string {
    h := fnv.New32a()
    h.Write([]byte(customerID)) // FNV hashing never returns an error
    shardID := h.Sum32() % 10   // 10 shards
    return fmt.Sprintf("shard-%d", shardID)
}

// In CDL:
// - name: ProcessOrder
//   type: Task
//   parameters:
//     shard: "{{ workflow.input.customer_id | shard }}"
Pattern 3: High-Throughput Deployment
Pod Replica Strategy
# cascade.yaml
deployments:
  # API Gateway
  api:
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
  # Activity Workers
  workers:
    replicas: 10
    resources:
      requests:
        cpu: 1000m
        memory: 1Gi
  # Workflow Engine
  engine:
    replicas: 5
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
Auto-scaling Rules
autoscaling:
  enabled: true
  api:
    min_replicas: 3
    max_replicas: 20
    cpu_threshold: 70%
    memory_threshold: 80%
  workers:
    min_replicas: 10
    max_replicas: 50
    activity_queue_depth: 100
    target_queue_depth: 50
Pattern 4: Event-Driven Scaling
Message Queue Scaling
# Kafka/NATS scaling
messaging:
  broker: kafka
  partitions: 50   # Match shard count
  replication: 3   # High availability
  topics:
    - name: orders
      partitions: 20 # Scale with throughput
      retention: 24h
Scaling Algorithm:
Queue Depth = Messages Published - Messages Consumed
If Queue Depth > upper threshold:
    scale workers up (add N replicas)
If Queue Depth < lower threshold for a sustained period:
    scale workers down (remove N replicas)
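A sketch of that decision as code. The thresholds and step sizes mirror the alert rules in the Monitoring section below; how queue depth is measured (Kafka consumer lag, NATS pending count) is deployment-specific, and the function only computes the target replica count:
// desiredWorkers applies the scale-up/scale-down rules to the current
// replica count, clamped to the configured bounds.
func desiredWorkers(current, minReplicas, maxReplicas, queueDepth int) int {
    desired := current
    switch {
    case queueDepth > 100: // backlog growing: add workers
        desired = current + 5
    case queueDepth < 10: // queue drained: shed workers
        desired = current - 2
    }
    if desired > maxReplicas {
        desired = maxReplicas
    }
    if desired < minReplicas {
        desired = minReplicas
    }
    return desired
}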
Pattern 5: Caching for Scale
Distributed Cache
# Redis cluster for horizontal caching
cache:
  provider: redis-cluster
  replicas: 3
  nodes: 6
  partitioning: hash-slot
  eviction_policy: allkeys-lru
  max_memory: 100GB
Cache Topology:
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Node 1    │  │   Node 2    │  │   Node 3    │
│    Slot     │  │    Slot     │  │    Slot     │
│   0-5460    │  │ 5461-10921  │  │ 10922-16383 │
└─────────────┘  └─────────────┘  └─────────────┘
       ↓                ↓                ↓
      Single logical cache (16384 slots)
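Key placement in the hash-slot topology above is CRC16(key) mod 16384. Cluster-aware clients (for example go-redis) compute this automatically, but a Go sketch of the mapping makes the partitioning concrete:
// crc16 implements the XMODEM variant (polynomial 0x1021) that Redis
// Cluster uses for hash-slot assignment.
func crc16(data []byte) uint16 {
    var crc uint16
    for _, b := range data {
        crc ^= uint16(b) << 8
        for i := 0; i < 8; i++ {
            if crc&0x8000 != 0 {
                crc = crc<<1 ^ 0x1021
            } else {
                crc <<= 1
            }
        }
    }
    return crc
}

// hashSlot maps a cache key to one of the 16384 cluster slots.
func hashSlot(key string) uint16 {
    return crc16([]byte(key)) % 16384
}
Keys that share a {hash tag} are slotted by the tag alone, which is how multi-key operations stay on a single node.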
Pattern 6: Data Partitioning
Temporal Sharding
-- Parent table declared as range-partitioned (partition key column assumed)
CREATE TABLE orders (...) PARTITION BY RANGE (created_at);

-- Partition orders by date
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
Benefits:
- Faster queries (smaller tables)
- Easier archiving (drop old partitions)
- Parallel query execution
Range Sharding
-- Partition by customer ID range (parent declared PARTITION BY RANGE (customer_id))
CREATE TABLE orders_shard_0 PARTITION OF orders
    FOR VALUES FROM (0) TO (100000);
CREATE TABLE orders_shard_1 PARTITION OF orders
    FOR VALUES FROM (100000) TO (200000);
Pattern 7: Read Replicas
Primary-Replica Setup
database:
  primary:
    host: db-primary.prod.svc.cluster.local
    pool_size: 10
  replicas:
    - host: db-replica-1.prod.svc.cluster.local
    - host: db-replica-2.prod.svc.cluster.local
    - host: db-replica-3.prod.svc.cluster.local
Query Routing:
// Route reads to replicas; selectReplicaRoundRobin and primary are wired
// from the database block above.
func GetCustomer(ctx context.Context, id string) (*Customer, error) {
    db := selectReplicaRoundRobin() // Load balance across read replicas
    return db.GetCustomer(ctx, id)
}

// Route writes to the primary
func UpdateCustomer(ctx context.Context, customer *Customer) error {
    return primary.Update(customer)
}
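The selectReplicaRoundRobin call above is a stand-in. A minimal implementation over the configured replicas might look like this; the replicaPool type is illustrative, not a Cascade API:
import (
    "database/sql"
    "sync/atomic"
)

// replicaPool rotates reads across the replica connections from the
// database block above.
type replicaPool struct {
    replicas []*sql.DB
    next     atomic.Uint64
}

// pick returns the next replica in round-robin order.
func (p *replicaPool) pick() *sql.DB {
    n := p.next.Add(1)
    return p.replicas[n%uint64(len(p.replicas))]
}
Keep in mind that replicas lag the primary, so reads that must observe a just-committed write still belong on the primary.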
Pattern 8: Async Processing
Decouple with Events
# ❌ Synchronous workflow
states:
  - name: ProcessOrder
    type: Task
    resource: urn:cascade:activity:heavy_processing
    timeout: 300s # 5 minutes!
    end: true

# ✅ Asynchronous workflow
states:
  - name: QueueOrder
    type: Task
    resource: urn:cascade:activity:queue_processing
    timeout: 5s # Fast
    next: ReturnResult
  - name: ReturnResult
    type: Task
    end: true

# Separate async workflow
# Consumes from queue, processes in background
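The background side of the asynchronous split is just a consumer loop. A sketch using an in-memory channel as a stand-in for the orders topic; Order and processOrder are hypothetical placeholders for the queued payload and the heavy_processing activity:
import "context"

// Order is a placeholder for the queued payload.
type Order struct {
    ID string
}

// processOrder stands in for the heavy_processing activity.
func processOrder(o Order) { /* long-running work */ }

// consumeOrders drains queued orders off the request path, so the
// synchronous workflow only pays the fast enqueue step.
func consumeOrders(ctx context.Context, jobs <-chan Order) {
    for {
        select {
        case <-ctx.Done():
            return
        case order := <-jobs:
            processOrder(order)
        }
    }
}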
Monitoring Scalability
Key Metrics
| Metric | Target | Alert |
|---|---|---|
| CPU utilization | 60-70% | >80% |
| Memory usage | 70-80% | >90% |
| Queue depth | <50 | >100 |
| Request latency p99 | <500ms | >1s |
| Error rate | <0.1% | >0.5% |
| Throughput | 100+/s | Decreasing |
Scaling Alerts
alerts:
  - name: HighQueueDepth
    condition: "queue_depth > 100"
    action: scale_up
    target: "+5 workers"
  - name: LowQueueDepth
    condition: "queue_depth < 10 for 5m"
    action: scale_down
    target: "-2 workers"
Best Practices
✅ DO:
- Design for horizontal scaling
- Shard strategically
- Cache aggressively
- Monitor metrics
- Test at scale
- Use async processing
- Replicate data
❌ DON’T:
- Assume vertical scaling
- Over-shard
- Cache without invalidation
- Ignore latency
- Deploy single-pod
- Block on external APIs
Updated: October 29, 2025
Version: 1.0
Patterns: 8+ production patterns