Scalability Patterns
For: Architects designing large-scale deployments
Level: Advanced
Time to read: 35 minutes
Patterns: 8+ production patterns
This guide covers scaling patterns for multi-tenant systems, distributed workflows, and high-throughput deployments.
Scalability Fundamentals
Scaling Dimensions
┌───────────────────────────────────────┐
│ Vertical Scaling:   CPU/Memory        │ (Limited)
├───────────────────────────────────────┤
│ Horizontal Scaling: More pods         │ (Better)
├───────────────────────────────────────┤
│ Multi-region:       Geographic        │ (Complex)
├───────────────────────────────────────┤
│ Sharding:           Data partitioning │ (Advanced)
└───────────────────────────────────────┘
Pattern 1: Multi-Tenant Isolation
Database-per-Tenant
# cascade.yaml
tenants:
  - id: acme
    database:
      url: postgres://user:pass@postgres-acme:5432/acme_db
  - id: globex
    database:
      url: postgres://user:pass@postgres-globex:5432/globex_db
Benefits:
- Complete data isolation
- Independent scaling
- Simple backups
Drawbacks:
- Higher infrastructure cost
- More databases to manage
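At the application layer, database-per-tenant usually means keeping one connection pool per tenant and resolving it from the tenant ID on every request. A minimal sketch, assuming a registry populated from the tenants block above (the tenantDBs type and forTenant helper are illustrative, not Cascade APIs):
import (
    "database/sql"
    "fmt"
)

// tenantDBs maps tenant IDs from cascade.yaml to their own connection pools.
type tenantDBs map[string]*sql.DB

// forTenant resolves the pool for a tenant, so queries cannot cross
// tenant boundaries by construction.
func (t tenantDBs) forTenant(tenantID string) (*sql.DB, error) {
    db, ok := t[tenantID]
    if !ok {
        return nil, fmt.Errorf("unknown tenant %q", tenantID)
    }
    return db, nil
}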
Schema-per-Tenant
-- Single database, schema per tenant
CREATE SCHEMA acme;
CREATE SCHEMA globex;
CREATE TABLE acme.orders (...);
CREATE TABLE globex.orders (...);
Benefits:
- Single database to manage
- Shared infrastructure
- Cost-efficient
Drawbacks:
- Schema migrations affect all
- Less isolation
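With a shared database, tenant scoping happens per request by switching the active schema. A sketch using PostgreSQL's search_path inside a transaction; SET LOCAL reverts on commit or rollback, so a pooled connection is never left pointing at another tenant's schema. The withTenantSchema helper is illustrative, not a Cascade API, and schema names must come from a trusted allow-list because they are interpolated, not bound:
import (
    "context"
    "database/sql"
    "fmt"
)

// withTenantSchema runs fn inside a transaction whose search_path is scoped
// to the tenant's schema.
func withTenantSchema(ctx context.Context, db *sql.DB, schema string, fn func(*sql.Tx) error) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op after a successful Commit

    // SET LOCAL applies only for the lifetime of this transaction.
    if _, err := tx.ExecContext(ctx, fmt.Sprintf("SET LOCAL search_path TO %s", schema)); err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit()
}
A caller would invoke it as withTenantSchema(ctx, db, "acme", func(tx *sql.Tx) error { ... }), with the schema name taken from the tenant registry rather than user input.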
Pattern 2: Distributed Workflows
Multi-Region Workflows
# Workflow spans multiple regions
states:
  - name: ValidateRegional
    type: Parallel
    branches:
      - name: ValidateUS
        type: Task
        resource: urn:cascade:activity:validate@us-east
        timeout: 10s
      - name: ValidateEU
        type: Task
        resource: urn:cascade:activity:validate@eu-west
        timeout: 10s
    completion_strategy: ALL
    next: CombineResults
Workflow Sharding
// Shard workflows by customer ID
import (
    "fmt"
    "hash/fnv"
)

func ShardWorkflow(customerID string) string {
    h := fnv.New32a()
    h.Write([]byte(customerID)) // FNV hashing never returns an error
    shardID := h.Sum32() % 10   // 10 shards
    return fmt.Sprintf("shard-%d", shardID)
}

// In CDL:
// - name: ProcessOrder
//   type: Task
//   parameters:
//     shard: "{{ workflow.input.customer_id | shard }}"
Pattern 3: High-Throughput Deployment
Pod Replica Strategy
# cascade.yaml
deployments:
  # API Gateway
  api:
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
  # Activity Workers
  workers:
    replicas: 10
    resources:
      requests:
        cpu: 1000m
        memory: 1Gi
  # Workflow Engine
  engine:
    replicas: 5
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
Auto-scaling Rules
autoscaling:
  enabled: true
  api:
    min_replicas: 3
    max_replicas: 20
    cpu_threshold: 70%
    memory_threshold: 80%
  workers:
    min_replicas: 10
    max_replicas: 50
    activity_queue_depth: 100
    target_queue_depth: 50
Pattern 4: Event-Driven Scaling
Message Queue Scaling
# Kafka/NATS scaling
messaging:
  broker: kafka
  partitions: 50   # Match shard count
  replication: 3   # High availability
  topics:
    - name: orders
      partitions: 20 # Scale with throughput
      retention: 24h
Scaling Algorithm:
Queue Depth = Messages Published - Messages Consumed
If Queue Depth > upper threshold:
    scale workers up (add N replicas)
If Queue Depth < lower threshold for a sustained period:
    scale workers down (remove N replicas)
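A sketch of that decision as code. The thresholds and step sizes mirror the alert rules in the Monitoring section below; how queue depth is measured (Kafka consumer lag, NATS pending count) is deployment-specific, and the function only computes the target replica count:
// desiredWorkers applies the scale-up/scale-down rules to the current
// replica count, clamped to the configured bounds.
func desiredWorkers(current, minReplicas, maxReplicas, queueDepth int) int {
    desired := current
    switch {
    case queueDepth > 100: // backlog growing: add workers
        desired = current + 5
    case queueDepth < 10: // queue drained: shed workers
        desired = current - 2
    }
    if desired > maxReplicas {
        desired = maxReplicas
    }
    if desired < minReplicas {
        desired = minReplicas
    }
    return desired
}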
Pattern 5: Caching for Scale
Distributed Cache
# Redis cluster for horizontal caching
cache:
  provider: redis-cluster
  replicas: 3
  nodes: 6
  partitioning: hash-slot
  eviction_policy: allkeys-lru
  max_memory: 100GB
Cache Topology:
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Node 1    │  │   Node 2    │  │   Node 3    │
│    Slot     │  │    Slot     │  │    Slot     │
│   0-5460    │  │ 5461-10921  │  │ 10922-16383 │
└─────────────┘  └─────────────┘  └─────────────┘
       ↓                ↓                ↓
      Single logical cache (16384 slots)
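Key placement in the hash-slot topology above is CRC16(key) mod 16384. Cluster-aware clients (for example go-redis) compute this automatically, but a Go sketch of the mapping makes the partitioning concrete:
// crc16 implements the XMODEM variant (polynomial 0x1021) that Redis
// Cluster uses for hash-slot assignment.
func crc16(data []byte) uint16 {
    var crc uint16
    for _, b := range data {
        crc ^= uint16(b) << 8
        for i := 0; i < 8; i++ {
            if crc&0x8000 != 0 {
                crc = crc<<1 ^ 0x1021
            } else {
                crc <<= 1
            }
        }
    }
    return crc
}

// hashSlot maps a cache key to one of the 16384 cluster slots.
func hashSlot(key string) uint16 {
    return crc16([]byte(key)) % 16384
}
Keys that share a {hash tag} are slotted by the tag alone, which is how multi-key operations stay on a single node.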
Pattern 6: Data Partitioning
Temporal Sharding
-- Parent table declared as range-partitioned (partition key column assumed)
CREATE TABLE orders (...) PARTITION BY RANGE (created_at);

-- Partition orders by date
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
Benefits:
- Faster queries (smaller tables)
- Easier archiving (drop old partitions)
- Parallel query execution
Range Sharding
-- Partition by customer ID range (parent declared PARTITION BY RANGE (customer_id))
CREATE TABLE orders_shard_0 PARTITION OF orders
    FOR VALUES FROM (0) TO (100000);
CREATE TABLE orders_shard_1 PARTITION OF orders
    FOR VALUES FROM (100000) TO (200000);
Pattern 7: Read Replicas
Primary-Replica Setup
database:
  primary:
    host: db-primary.prod.svc.cluster.local
    pool_size: 10
  replicas:
    - host: db-replica-1.prod.svc.cluster.local
    - host: db-replica-2.prod.svc.cluster.local
    - host: db-replica-3.prod.svc.cluster.local
Query Routing:
// Route reads to replicas; selectReplicaRoundRobin and primary are wired
// from the database block above.
func GetCustomer(ctx context.Context, id string) (*Customer, error) {
    db := selectReplicaRoundRobin() // Load balance across read replicas
    return db.GetCustomer(ctx, id)
}

// Route writes to the primary
func UpdateCustomer(ctx context.Context, customer *Customer) error {
    return primary.Update(customer)
}
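The selectReplicaRoundRobin call above is a stand-in. A minimal implementation over the configured replicas might look like this; the replicaPool type is illustrative, not a Cascade API:
import (
    "database/sql"
    "sync/atomic"
)

// replicaPool rotates reads across the replica connections from the
// database block above.
type replicaPool struct {
    replicas []*sql.DB
    next     atomic.Uint64
}

// pick returns the next replica in round-robin order.
func (p *replicaPool) pick() *sql.DB {
    n := p.next.Add(1)
    return p.replicas[n%uint64(len(p.replicas))]
}
Keep in mind that replicas lag the primary, so reads that must observe a just-committed write still belong on the primary.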
Pattern 8: Async Processing
Decouple with Events
# ❌ Synchronous workflow
states:
  - name: ProcessOrder
    type: Task
    resource: urn:cascade:activity:heavy_processing
    timeout: 300s # 5 minutes!
    end: true

# ✅ Asynchronous workflow
states:
  - name: QueueOrder
    type: Task
    resource: urn:cascade:activity:queue_processing
    timeout: 5s # Fast
    next: ReturnResult
  - name: ReturnResult
    type: Task
    end: true

# Separate async workflow
# Consumes from queue, processes in background
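The background side of the asynchronous split is just a consumer loop. A sketch using an in-memory channel as a stand-in for the orders topic; Order and processOrder are hypothetical placeholders for the queued payload and the heavy_processing activity:
import "context"

// Order is a placeholder for the queued payload.
type Order struct {
    ID string
}

// processOrder stands in for the heavy_processing activity.
func processOrder(o Order) { /* long-running work */ }

// consumeOrders drains queued orders off the request path, so the
// synchronous workflow only pays the fast enqueue step.
func consumeOrders(ctx context.Context, jobs <-chan Order) {
    for {
        select {
        case <-ctx.Done():
            return
        case order := <-jobs:
            processOrder(order)
        }
    }
}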
Monitoring Scalability
Key Metrics
| Metric | Target | Alert |
|---|---|---|
| CPU utilization | 60-70% | >80% |
| Memory usage | 70-80% | >90% |
| Queue depth | <50 | >100 |
| Request latency p99 | <500ms | >1s |
| Error rate | <0.1% | >0.5% |
| Throughput | 100+/s | Decreasing |
Scaling Alerts
alerts:
  - name: HighQueueDepth
    condition: "queue_depth > 100"
    action: scale_up
    target: "+5 workers"
  - name: LowQueueDepth
    condition: "queue_depth < 10 for 5m"
    action: scale_down
    target: "-2 workers"
Best Practices
✅ DO:
- Design for horizontal scaling
- Shard strategically
- Cache aggressively
- Monitor metrics
- Test at scale
- Use async processing
- Replicate data
❌ DON’T:
- Assume vertical scaling
- Over-shard
- Cache without invalidation
- Ignore latency
- Deploy single-pod
- Block on external APIs
Updated: October 29, 2025
Version: 1.0
Patterns: 8+ production patterns