Platform Capabilities: Infrastructure You Don’t Build
Cascade provides a complete platform stack out-of-the-box. This page shows exactly what infrastructure you avoid building by using CDL instead of imperative code.
Key insight: When you define workflows in CDL, the platform provides durable execution, state management, retry logic, database access, and observability automatically. You write business logic. Platform handles infrastructure.
Overview: What Cascade Eliminates
When you write orchestration in CDL, you get these systems automatically:
Core Services:
- Durable execution (Temporal orchestration engine)
- State persistence (PostgreSQL with tenant isolation)
- Event streaming (NATS JetStream)
- Human task management (pause workflows for days or months)
- Error handling (automatic retry with exponential backoff)
- Policy engines (OPA and DMN decision support)
Infrastructure Services:
- Multi-tenant isolation (automatic tenant_id injection)
- Secret management (Vault integration)
- Observability (OpenTelemetry traces and metrics)
- Schema migrations (Atlas declarative migrations)
- API gateway (unified REST and gRPC)
Result: 96-98% code reduction compared to equivalent imperative implementation.
Capability Comparison Matrix
| Capability | Manual Implementation | Cascade Platform |
|---|---|---|
| State machine | 1,000+ lines | Automatic (Temporal) |
| Retry logic | 40 lines per scenario | Config (1 line) |
| Database access | 800-1,200 lines | Platform SDK |
| Schema management | 5,000-30,000 LOC/year | 50-100 lines HCL |
| Event system | 500+ lines (Kafka) | NATS built-in |
| Webhook handling | 150 lines | Platform handles |
| Conditional logic | 50 lines | 8 lines (Choice) |
| Human tasks | 200 lines | 10 lines config |
| Tenant isolation | Manual (error-prone) | Automatic (secure) |
| API gateway | 100+ endpoints | 10-20 domain-driven |
Overall reduction: 11,340+ lines → 165-500 lines (96-98% less code)
Deep Dive: Core Capabilities
1. Durable State Management
What you avoid building: State machine infrastructure with crash recovery, distributed locking, event sourcing, and transaction management.
Imperative Approach: Build Your Own State Machine (~1,000 lines)
// Traditional Node.js orchestration
class WorkflowStateMachine {
constructor() {
this.db = new PostgreSQL();
this.redis = new Redis();
this.eventStore = new EventStore();
this.kafka = new KafkaClient();
}
async execute(orderId) {
// Step 1: Persist state before any operation
const workflowState = {
orderId,
workflowId: uuidv4(),
currentStep: "waiting_for_approval",
data: {},
version: 1,
timestamp: Date.now(),
status: "RUNNING"
};
await this.db.saveWorkflowState(workflowState);
await this.redis.set(`workflow:${workflowState.workflowId}`, JSON.stringify(workflowState));
// Step 2: Create approval task
const taskId = await this.db.createTask({
orderId,
assignedTo: "manager",
timeout: Date.now() + (24 * 60 * 60 * 1000), // 24 hours
type: "APPROVAL_REQUIRED"
});
// ⚠️ PROBLEM: How do you wait for hours/days/weeks?
// Option A: Polling (BAD - wastes resources)
while (true) {
const task = await this.db.getTask(taskId);
if (task.status === "completed") {
break;
}
await sleep(60000); // Check every minute
// ❌ Server must stay running continuously
// ❌ Database hit every minute
// ❌ Doesn't scale
}
// Option B: Event queue (COMPLEX - 200+ lines)
await this.kafka.publish("task.created", {
orderId,
taskId,
workflowId: workflowState.workflowId
});
// ⚠️ Separate consumer process needed (different codebase!)
// ⚠️ Must reconstruct exact execution context
// ⚠️ Need distributed locking to prevent concurrent updates
// ⚠️ Complex error scenarios (message lost, duplicate delivery)
// Option C: Database triggers (LIMITED - vendor lock-in)
// ❌ Can't handle complex business logic
// ❌ Hard to test and debug
// ❌ PostgreSQL vs MySQL differences
// You need to manually implement:
// 1. State serialization/deserialization (100+ lines)
// - JSON encoding, versioning, schema evolution
// 2. Event sourcing (200+ lines)
// - Event log, event replay, snapshots
// 3. Crash recovery (150+ lines)
// - Detect crashed workflows, resume from last checkpoint
// 4. Distributed locking (80+ lines)
// - Redis locks, deadlock detection, lease renewal
// 5. Transaction management (120+ lines)
// - ACID guarantees, rollback, compensation
// 6. Timeout handling (60+ lines)
// - TTL tracking, timeout callbacks, escalation
// 7. Retry mechanisms (80+ lines)
// - Exponential backoff, error classification
// 8. Webhook correlation (200+ lines)
// - UUID mapping, signature validation
//
// = ~990 lines of infrastructure code
}
// Separate webhook handler (DIFFERENT PROCESS/CODEBASE!)
async handleTaskComplete(taskId, decision) {
// ⚠️ How do we resume from the EXACT point?
// 1. Load workflow state from database
const state = await this.db.getWorkflowState({ taskId });
if (!state) {
throw new Error("Workflow state not found - data loss!");
}
// 2. Reconstruct execution context
// ⚠️ Need to rebuild variables, local state, call stack
const context = {
orderId: state.orderId,
currentStep: state.currentStep,
data: state.data
};
// 3. Handle concurrent updates
// ⚠️ Need optimistic locking (version checking)
const lock = await this.redis.acquireLock(`workflow:${state.workflowId}`, 30000);
if (!lock) {
throw new Error("Failed to acquire lock");
}
try {
// 4. Continue execution
if (state.currentStep === "waiting_for_approval") {
if (decision === "APPROVED") {
await this.processOrder(context);
} else {
await this.rejectOrder(context);
}
}
// 5. Update state
await this.db.updateWorkflowState(state.workflowId, {
currentStep: "completed",
status: "SUCCESS",
completedAt: Date.now()
});
} finally {
await this.redis.releaseLock(lock);
}
// Total for webhook handler: ~200 lines
}
}
// Total infrastructure code: ~1,190 lines
// And this doesn't even handle:
// - Process crashes during execution
// - Deployment rollouts (loses in-memory state)
// - Database connection failures
// - Redis unavailability
// - Kafka rebalancing
Production Reality: Most teams give up on durable execution and just use polling loops or database triggers, losing the ability to pause workflows for days/weeks.
Cascade Approach: Zero Infrastructure Code
workflows:
- name: process-order
start: WaitForApproval
states:
- name: WaitForApproval
type: Task
resource: "urn:cascade:waitFor:human"
parameters:
schema:
type: object
properties:
decision: {type: string, enum: ["APPROVED", "REJECTED"]}
notes: {type: string}
assignment:
role: "Manager"
# Automatically routes to managers in tenant
timeout: "7d" # ← Workflow pauses for 7 DAYS!
result_path: "$.approval"
next: CheckDecision
- name: CheckDecision
type: Choice
choices:
- variable: "$.approval.decision"
string_equals: "APPROVED"
next: ProcessPayment
default: NotifyRejection
- name: ProcessPayment
type: Task
resource: "urn:cascade:action:process-payment"
next: Complete
What happens automatically:
1. State Persistence (Temporal + PostgreSQL):
   - Every state transition persisted atomically
   - Event sourcing with full audit trail
   - Workflow survives process crashes
   - Workflow survives Kubernetes pod restarts
   - Workflow survives deployments
2. Pause & Resume:
   - Workflow pauses at WaitForApproval
   - State saved to database (0 memory consumption)
   - Manager can submit approval hours, days, or weeks later
   - Workflow resumes from the exact point
   - No polling, no wasted resources
3. Distributed Locking:
   - Temporal ensures only one execution per workflow instance
   - No concurrent updates possible
   - No deadlocks
4. Timeout Handling:
   - After 7 days, a timeout event fires automatically
   - Can route to an escalation flow (see the sketch below)
   - Declarative configuration
5. Error Recovery:
   - Platform retries transient failures
   - Permanent failures trigger compensation
   - Full observability via OpenTelemetry
Platform provides automatically:
- Durable execution (Temporal engine)
- Event sourcing (PostgreSQL with event log)
- Automatic recovery after crashes
- Resume from exact point when event arrives
- No polling (event-driven)
- Distributed locking (Temporal handles)
- Transaction management (ACID guarantees)
- Timeout handling (declarative)
- OpenTelemetry tracing (automatic)
- Audit trail (compliance-ready)
Infrastructure code required: 0 lines
Performance: Sub-1ms orchestration overhead
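As an illustration of the declarative timeout handling listed above, the WaitForApproval state can route an expired approval to an escalation path using the same catch syntax shown in the webhook example later on this page. A minimal sketch, assuming a hypothetical EscalateToDirector state added to the workflow above:
- name: WaitForApproval
  type: Task
  resource: "urn:cascade:waitFor:human"
  parameters:
    assignment:
      role: "Manager"
    timeout: "7d"
  result_path: "$.approval"
  catch:
    - error_equals: ["TimeoutError"]
      next: EscalateToDirector  # hypothetical escalation state
  next: CheckDecision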
Visual Comparison
2. Retry & Resilience
What you avoid building: Exponential backoff logic, error classification, retry attempt tracking, circuit breakers, and fallback strategies.
Imperative approach (40+ lines per scenario):
async function callPaymentAPI(payload) {
let attempt = 0;
const maxAttempts = 3;
let delay = 1000;
while (attempt < maxAttempts) {
try {
return await stripe.charge(payload);
} catch (error) {
attempt++;
// Classify error manually
const isTransient =
error.code === "NetworkError" ||
error.code === "ServiceUnavailable";
if (!isTransient || attempt >= maxAttempts) {
throw error;
}
// Exponential backoff with jitter
await sleep(delay);
delay = Math.min(delay * 2, 10000);
delay = delay * (1 + Math.random() * 0.2);
}
}
}
Cascade approach (12 lines):
- name: ChargePayment
type: Task
resource: "urn:cascade:action:stripe.charge"
parameters:
amount.$: "$.order.total"
customer.$: "$.customer.id"
retry:
- error_equals: ["NetworkError", "ServiceUnavailable", "TimeoutError"]
max_attempts: 3
interval_seconds: 1
backoff_rate: 2.0
max_interval_seconds: 10
jitter_strategy: "FULL"
catch:
- error_equals: ["CardDeclined", "ValidationError"]
result_path: "$.payment_error"
next: NotifyPaymentFailed
result_path: "$.payment"
next: ReserveInventory
Platform handles automatically:
- Exponential backoff (with jitter)
- Error classification (permanent vs transient)
- Retry attempt tracking
- Error logging and tracing
- Metrics (retry count, success rate)
- Circuit breaker (optional)
Code reduction: 70% fewer lines
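To make the configuration above concrete: with interval_seconds: 1, backoff_rate: 2.0, and max_attempts: 3, a request that keeps failing with a transient error waits approximately 1 s and then 2 s between the three attempts (each delay randomized by the FULL jitter strategy and capped at max_interval_seconds: 10) before the state fails with the original error.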
3. Database Operations & Tenant Isolation
What you avoid building: Manual query construction, connection pooling, tenant isolation checks, error handling, and query logging.
Imperative Approach: Manual Everything (800-1,200 lines per app)
// Traditional Node.js with PostgreSQL
class InventoryService {
constructor() {
// Manual connection pool setup (~300 lines)
this.pool = new PgPool({
host: process.env.DB_HOST,
database: process.env.DB_NAME,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
}
async checkStock(productIds, warehouseId, tenantId) {
// ⚠️ SECURITY RISK: Easy to forget tenant_id
const query = `
SELECT
product_id,
available_quantity,
reserved_quantity
FROM inventory
WHERE product_id = ANY($1)
AND warehouse_id = $2
AND tenant_id = $3 -- ← MUST NEVER FORGET THIS!
`;
let client;
try {
// Manual connection management
client = await this.pool.connect();
const result = await client.query(query, [
productIds,
warehouseId,
tenantId // ← Must pass everywhere, error-prone
]);
return result.rows;
} catch (error) {
// Manual error handling, retry logic, logging...
if (error.code === 'CONNECTION_LOST') {
await sleep(1000);
return this.checkStock(productIds, warehouseId, tenantId);
}
if (error.code === 'DEADLOCK') {
throw new RetryableError("Database deadlock");
}
// Log error manually
logger.error('Query failed', { error, query, params });
throw error;
} finally {
// Manual connection cleanup
if (client) client.release();
}
}
async reserveInventory(productId, quantity, warehouseId, tenantId) {
// Another 80+ lines for transaction management...
const client = await this.pool.connect();
try {
await client.query('BEGIN');
// Lock row
await client.query(`
SELECT available_quantity
FROM inventory
WHERE product_id = $1 AND warehouse_id = $2 AND tenant_id = $3
FOR UPDATE
`, [productId, warehouseId, tenantId]);
// Update quantity
await client.query(`
UPDATE inventory
SET reserved_quantity = reserved_quantity + $1
WHERE product_id = $2 AND warehouse_id = $3 AND tenant_id = $4
`, [quantity, productId, warehouseId, tenantId]);
await client.query('COMMIT');
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
}
}
// Every service needs:
// - Connection pooling (~300 lines)
// - Error classification (~200 lines)
// - Retry logic (~150 lines)
// - Logging integration (~100 lines)
// - Transaction management (~200 lines)
// - Deadlock recovery (~80 lines)
// = 1,030+ lines BEFORE business logic
Security vulnerability: Forgetting tenant_id in ONE query exposes all customer data. This has caused major data breaches in production systems.
Cascade Approach 1: Pure CDL (Zero Custom Code) ⭐
For simple queries, use declarative query definitions:
# service.yaml - Define queries declaratively
spec:
components:
queries:
# Read-only query (type-safe)
- name: check-inventory
type: sql
operation: select
source: |
SELECT
product_id,
available_quantity,
reserved_quantity
FROM inventory
WHERE product_id = ANY(:productIds)
AND warehouse_id = :warehouseId
-- tenant_id is AUTOMATIC (platform injects)
parameters:
productIds:
type: array
items: { type: uuid }
required: true
warehouseId:
type: uuid
required: true
returns:
type: array
items:
type: object
properties:
product_id: { type: uuid }
available_quantity: { type: integer }
reserved_quantity: { type: integer }
Use in workflow (no custom code needed):
workflows:
- name: process-order
states:
- name: CheckInventory
type: Task
resource: "urn:cascade:query:check-inventory"
parameters:
productIds.$: "$.order.items[*].product_id"
warehouseId.$: "$.order.warehouse_id"
result_path: "$.inventory"
next: ValidateStock
- name: ValidateStock
type: Choice
choices:
- variable: "$.inventory[?(@.available_quantity < @.reserved_quantity)]"
is_present: true
next: OutOfStock
default: ReserveInventory
What happens automatically:
- ✓ Tenant isolation (platform injects tenant_id)
- ✓ Connection pooling (20-100 connections)
- ✓ Query validation (compile-time checks)
- ✓ Type safety (parameters validated)
- ✓ Error categorization (retryable vs permanent)
- ✓ OpenTelemetry tracing (distributed tracing)
- ✓ Query logging (structured logs)
- ✓ Performance metrics (automatic)
Code required: 0 lines of custom code
Cascade Approach 2: SDK with Custom Logic (Escape Hatch)
When you need custom logic, use the Platform SDK:
# Workflow calls custom action
- name: CheckInventory
type: Task
resource: "urn:cascade:action:check-inventory-with-logic"
parameters:
product_ids.$: "$.order.items[*].product_id"
warehouse.$: "$.order.warehouse"
result_path: "$.inventory"
next: ValidateStock
Custom action (Go with Platform SDK):
package actions
import (
"context"
cascade "github.com/cascade-platform/sdk-go"
)
func CheckInventoryWithLogic(ctx context.Context, input Input) (*Output, error) {
sdk := cascade.FromContext(ctx)
// ✅ Tenant isolation is AUTOMATIC
rows, err := sdk.DatabaseQuery(ctx, `
SELECT
product_id,
available_quantity,
reserved_quantity,
warehouse_id
FROM inventory
WHERE product_id = ANY($1)
AND warehouse_id = $2
-- tenant_id is AUTOMATIC (platform injects)
`, map[string]interface{}{
"product_ids": input.ProductIDs,
"warehouse": input.Warehouse,
// NO need to pass tenant_id - SDK adds it automatically
})
if err != nil {
return nil, err // Platform handles retry/logging
}
// Custom business logic
available := make([]Item, 0)
for _, row := range rows {
if row["available_quantity"].(int) > row["reserved_quantity"].(int) {
available = append(available, Item{
ProductID: row["product_id"].(string),
Available: row["available_quantity"].(int),
Reserved: row["reserved_quantity"].(int),
})
}
}
return &Output{
Items: available,
AllAvailable: len(available) == len(input.ProductIDs),
}, nil
}
Platform SDK provides automatically:
- Automatic tenant_id injection (impossible to forget)
- Row-level security (RLS) enforcement
- Connection pooling (20-100 connections)
- Query logging (structured logs)
- OpenTelemetry tracing (automatic)
- Error categorization (retryable vs permanent)
- Query performance metrics
- Connection lifecycle management
Code reduction: roughly 1,030 lines of infrastructure code shrinks to about 35 lines of business logic, with security by default
Cascade Approach 3: WASM Runtime (Maximum Performance) 🚀
For ultra-fast execution, compile to WASM:
# Workflow uses WASM action
- name: CheckInventory
type: Task
resource: "urn:cascade:action:check-inventory-wasm"
runtime: wasm # ← Compiled to WebAssembly
parameters:
product_ids.$: "$.order.items[*].product_id"
warehouse.$: "$.order.warehouse"
result_path: "$.inventory"
WASM action (Rust compiled to WASM):
// actions/check_inventory.rs
use cascade_wasm_sdk::*;
#[cascade_action]
pub fn check_inventory(input: Input) -> Result<Output, Error> {
// Access platform capabilities via WASM host functions
let rows = database_query(
"SELECT product_id, available_quantity, reserved_quantity
FROM inventory
WHERE product_id = ANY($1) AND warehouse_id = $2",
&[&input.product_ids, &input.warehouse]
)?;
// Custom logic (runs in WASM sandbox)
let available: Vec<Item> = rows.iter()
.filter(|r| r.available > r.reserved)
.map(|r| Item {
product_id: r.product_id.clone(),
available: r.available,
reserved: r.reserved,
})
.collect();
    // Compute the flag before moving `available` into the output struct
    let all_available = available.len() == input.product_ids.len();
    Ok(Output {
        items: available,
        all_available,
    })
}
Build and deploy:
# Compile Rust to WASM
cargo build --target wasm32-wasi --release
# Platform automatically loads and executes
# Cold start: <1ms
# Hot path: <0.1ms
WASM Benefits:
- Sub-1ms cold start (vs 200ms for containers)
- Near-native performance (no JIT warmup)
- Memory isolation (sandboxed execution)
- 1000+ concurrent instances per node
- No Docker overhead (runs in-process)
Performance Comparison
| Approach | Cold Start | Hot Path | Memory | Security | Use Case |
|---|---|---|---|---|---|
| Pure CDL | 0ms (config) | Sub-0.1ms | 0 MB | Maximum | Simple queries, 80% of cases |
| SDK (Go) | 100-200ms | 1-5ms | 10-50 MB | High | Custom logic needed |
| WASM (Rust) | Sub-1ms | Sub-0.1ms | 1-5 MB | Maximum | Performance-critical paths |
| Container | 3-10s | 5-50ms | 100-500 MB | Medium | Legacy code, complex deps |
Visual Comparison
See comprehensive database operations guide →
4. Conditional Logic & Decision Making
Decision priority for conditional logic:
- CDL Choice States (1-3 conditions) - Default choice, sub-0.1ms
- OPA Policies (5-20 rules) - Complex logic, versioned
- DMN Tables (10-100+ rules) - Business analyst authoring
CDL Choice States (Simple Conditions)
For 1-3 simple conditions, use CDL Choice states (fastest):
- name: RouteByAmount
type: Choice
choices:
- variable: "$.expense.amount"
numeric_less_than: 500
next: AutoApprove
- variable: "$.expense.amount"
numeric_less_than: 5000
next: ManagerApproval
- variable: "$.expense.amount"
numeric_less_than: 20000
next: DirectorApproval
default: CFOApproval
Performance: Sub-0.1ms (in-process evaluation)
Supported operators:
- Numeric comparisons (Equals, LessThan, GreaterThan, etc.)
- String comparisons
- Boolean equals
- Timestamp comparisons
- Logical operators (And, Or, Not); see the sketch after this list
- Type checks (IsPresent, IsNull, IsNumeric, etc.)
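A sketch of a compound condition, assuming the logical operators listed above nest child conditions in the same style as the comparison operators in the RouteByAmount example (the exact and block layout is an assumption, not something this page documents):
- name: CheckHighValueTravel
  type: Choice
  choices:
    - and:
        - variable: "$.expense.category"
          string_equals: "travel"
        - variable: "$.expense.amount"
          numeric_greater_than: 10000
      next: CFOApproval
  default: ManagerApproval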
OPA Policy (Complex Rules)
For 5-20 rules with complex logic:
package cascade.expense
default approver = "manager"
default requires_approval = true
# Rules are chained with `else` so at most one approver value is produced:
# executive privilege first, then the travel override, then amount-based routing.
approver = "auto" {
    input.employee.level == "executive"
} else = "cfo" {
    input.category == "travel"
    input.amount > 10000
} else = "auto" {
    input.amount < 500
} else = "manager" {
    input.amount >= 500
    input.amount < 5000
} else = "director" {
    input.amount >= 5000
    input.amount < 20000
} else = "cfo" {
    input.amount >= 20000
}
Performance: 1-5ms (Redis cached)
When to use:
- Complex nested conditions
- Business rules requiring versioning
- Rules that change frequently
- Cross-cutting concerns
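This page does not show how a workflow references an OPA policy directly; the sketch below assumes a urn:cascade:policy resource scheme by analogy with the urn:cascade:dmn invocation in the next subsection, with a hypothetical EvaluateExpensePolicy state storing the decision on the state path:
- name: EvaluateExpensePolicy
  type: Task
  resource: "urn:cascade:policy:cascade.expense"  # assumed resource scheme
  parameters:
    amount.$: "$.expense.amount"
    category.$: "$.expense.category"
    employee.$: "$.employee"
  result_path: "$.policy"
  next: RouteByApprover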
DMN Table (Business Analyst Authoring)
For 10-100+ rules in decision table format:
- name: EvaluateExpenseRules
type: Task
resource: "urn:cascade:dmn:expense-routing-rules"
parameters:
expense_amount.$: "$.expense.amount"
employee_level.$: "$.employee.level"
category.$: "$.expense.category"
result_path: "$.routing_decision"
Decision table (visual editor):
| Amount | Employee Level | Category | → Approver | Requires Approval |
|---|---|---|---|---|
| < 500 | ANY | ANY | auto | false |
| < 5000 | executive | ANY | auto | false |
| < 5000 | ANY | ANY | manager | true |
| < 20000 | ANY | travel | cfo | true |
| < 20000 | ANY | ANY | director | true |
| >= 20000 | ANY | ANY | cfo | true |
When to use:
- 10-100+ rules
- Business analysts need to edit
- Visual decision table format
- Regulatory compliance requirements
5. Event System (NATS Built-in)
What you avoid building: Kafka cluster management, consumer groups, schema registry, DLQ handling, and monitoring infrastructure.
Imperative approach (500+ lines):
- Kafka cluster setup and configuration
- Topic management
- Consumer group coordination
- Schema registry
- Error handling and dead letter queues
- Monitoring (lag, throughput)
Cascade approach (20 lines):
Publish event:
- name: PublishOrderCreated
type: Task
resource: "urn:cascade:event:publish"
parameters:
type: "com.acme.order.created"
source: "urn:cascade:workflow:order-processing"
data:
order_id.$: "$.order.id"
customer_id.$: "$.order.customer_id"
total.$: "$.order.total"
next: Complete
Declarative routing (no consumer code):
apiVersion: cascade.io/v1
kind: EventRouter
metadata:
name: order-events
spec:
routing_rules:
- name: route-high-value
pattern: "com.acme.order.created"
filters:
- "$.data.total > 1000"
actions:
- start_process: "fraud-detection"
parameters:
order_id: "$.data.order_id"
- name: route-inventory
pattern: "com.acme.order.*"
actions:
- signal_process: "inventory-management"
signal_name: "order_placed"
NATS provides automatically:
- Pub/sub messaging (no Kafka setup)
- CloudEvents v1.0 compliance
- JSONPath filtering (declarative)
- Dead letter queue (automatic)
- At-least-once delivery
- Distributed tracing (OpenTelemetry)
- Hot-reload routing rules (no restart)
Code reduction: 95% fewer lines, no Kafka cluster
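For reference, the PublishOrderCreated state above emits a CloudEvents v1.0 envelope; the EventRouter's JSONPath filters (such as $.data.total > 1000) are evaluated against this envelope. Shown here in YAML form with illustrative id, time, and data values:
specversion: "1.0"
type: "com.acme.order.created"
source: "urn:cascade:workflow:order-processing"
id: "9f4a6c1e-2d3b-4f5a-8e7d-1c2b3a4d5e6f"  # illustrative
time: "2025-01-15T10:30:00Z"                # illustrative
datacontenttype: "application/json"
data:
  order_id: "ord_123"
  customer_id: "cust_456"
  total: 1250.00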
6. Webhook Integration
What you avoid building: Webhook URL generation, correlation tracking, state persistence, timeout handling, and signature validation.
Imperative approach challenges:
- Separate code paths (initial call and webhook handler)
- Manual state management and correlation
- Timeout handling missing
- Orchestration logic duplicated
Cascade approach:
- name: InitiatePayment
type: Task
resource: "urn:cascade:action:stripe.charge"
parameters:
amount.$: "$.order.total"
result_path: "$.payment"
next: WaitForCallback
- name: WaitForCallback
type: Task
resource: "urn:cascade:waitFor:webhook"
parameters:
schema:
type: object
properties:
transaction_id: {type: string}
status: {type: string, enum: ["success", "failure"]}
timeout: "30m"
result_path: "$.callback"
catch:
- error_equals: ["TimeoutError"]
next: RetryPayment
next: CheckStatus
- name: CheckStatus
type: Choice
choices:
- variable: "$.callback.status"
string_equals: "success"
next: ReserveInventory
default: RefundPayment
Platform provides:
- Webhook URL generation (automatic)
- Correlation (workflow instance mapping)
- State persistence (Temporal)
- Resume from exact point
- Timeout handling (declarative)
- Signature validation (configurable)
- Retry on timeout
Code reduction: 90% fewer lines
7. Schema Management (Atlas Declarative Migrations)
What you avoid building: Manual SQL migrations with rollback procedures, schema drift detection, and breaking change management.
Imperative approach (5,000-30,000 LOC annually):
-- Migration 001_create_customers.sql (100 lines)
CREATE TABLE app.customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) NOT NULL UNIQUE,
...
);
-- Migration 002_create_orders.sql (80 lines)
-- Migration 003_add_status_column.sql (50 lines)
-- ... 47 more migrations for a 50-table system
Challenges:
- 50-100 migrations per year for medium company
- 100-300 LOC per migration
- Manual testing required
- Rollback procedures complex
- Schema drift detection manual
Cascade approach (50-100 lines total):
schema "app" {}
table "customers" {
schema = schema.app
column "id" {
type = "uuid"
default = sql("gen_random_uuid()")
}
column "email" {
type = "varchar(255)"
null = false
}
column "status" {
type = "varchar(50)"
default = "active"
}
primary_key {
columns = [column.id]
}
unique "unique_email" {
columns = [column.email]
}
}
table "orders" {
schema = schema.app
column "id" {
type = "uuid"
default = sql("gen_random_uuid()")
}
column "customer_id" {
type = "uuid"
null = false
}
foreign_key "fk_customer" {
columns = [column.customer_id]
ref_columns = [table.customers.column.id]
on_delete = "RESTRICT"
}
}
Platform provides automatically:
- Automatic migration generation (SQL created for you)
- Breaking change detection (prevents data loss)
- Schema drift detection (continuous validation)
- Zero-downtime migrations
- Query validation at build time
- Type generation for all queries
- Version tracking and rollback
- CI/CD integration
Code reduction: 99% (5,000-30,000 LOC annually → 50-100 lines HCL)
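As a concrete illustration of automatic migration generation: if the status column were added to the customers definition above, Atlas would diff the desired schema against the live database and emit a migration along these lines (illustrative; the exact SQL depends on the database's current state):
-- generated migration (illustrative)
ALTER TABLE "app"."customers" ADD COLUMN "status" character varying(50) DEFAULT 'active';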
See comprehensive schema management guide →
8. REST API & gRPC: Unified Gateway Architecture
Problem with traditional approach: REST frameworks auto-generate every table as an endpoint, resulting in 100+ endpoints that are hard to document, secure, and evolve.
Cascade approach: Unified API Gateway with flexible endpoint options—auto-generated CRUD for simple cases, domain-driven custom endpoints for complex workflows.
Gateway handles automatically:
- JWT validation (via Ory Hydra)
- Rate limiting (Redis plus Token Bucket)
- Authorization enforcement
- Tenant extraction from JWT claims
- Idempotency checks
- Structured error responses
- Request/response logging
- OpenTelemetry tracing
Option 1: Auto-Generated CRUD & Search (Simple Resources)
For standard resource operations, declare in dspec:
# resources/orders.dspec.yaml
spec:
resources:
- name: Order
description: Customer order
type: aggregate
entity: orders
fields:
- name: id
type: uuid
description: Order ID
- name: customer_id
type: uuid
required: true
- name: status
type: string
enum: ["PENDING", "PROCESSING", "COMPLETED", "CANCELLED"]
default: "PENDING"
- name: total
type: decimal
minimum: 0
- name: created_at
type: timestamp
readonly: true
# Auto-generate CRUD endpoints
endpoints:
create: true
read: true
update: true
delete: true
# Auto-generate search/filtering
search:
- field: status
type: exact
- field: customer_id
type: exact
- field: created_at
type: range
- field: total
type: range
# Auto-generate pagination
pagination:
default_limit: 50
max_limit: 500
Cascade auto-generates these endpoints:
GET /orders # List with search & pagination
POST /orders # Create
GET /orders/{id} # Read
PUT /orders/{id} # Update
DELETE /orders/{id} # Delete
Automatically handled:
- Input validation (from dspec schema)
- Row-level security (tenant isolation)
- Optimistic concurrency (versioning)
- Audit logging (all changes tracked)
- OpenTelemetry tracing
Result: 0 lines of API code for standard CRUD resources
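For example, the generated list endpoint accepts the declared search filters and pagination parameters directly. A sketch of such a request (the exact query-parameter names, especially for range filters, are an assumption; the page only declares which fields are filterable):
GET /orders?status=PENDING&customer_id=cust_456&limit=50   # filtered, paginated list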
Option 2: Refine & Customize Auto-Generated Endpoints
When you need custom logic, use the refine mechanism:
# resources/orders.dspec.yaml
spec:
resources:
- name: Order
entity: orders
endpoints:
create:
enabled: true
refine: "urn:cascade:action:validate-order-creation" # ← Custom logic
read: true # Standard CRUD
update:
enabled: true
refine: "urn:cascade:action:validate-order-update"
allowed_fields: ["status", "notes"] # Only these can be updated
delete:
enabled: false # Not allowed
# Custom endpoints beyond CRUD
custom_endpoints:
- name: "cancel-order"
method: POST
path: "/orders/{id}/cancel"
action: "urn:cascade:action:cancel-order"
description: "Cancel an order (only if status is PENDING or PROCESSING)"
- name: "ship-order"
method: POST
path: "/orders/{id}/ship"
action: "urn:cascade:action:ship-order"
description: "Mark order as shipped"
Custom refine action (Go with Platform SDK):
package actions

import (
    "context"
    cascade "github.com/cascade-platform/sdk-go"
)

func ValidateOrderCreation(ctx context.Context, order Order) (*Order, error) {
    sdk := cascade.FromContext(ctx)
    // Collect the product IDs referenced by the order's line items
    productIDs := make([]string, 0, len(order.Items))
    for _, item := range order.Items {
        productIDs = append(productIDs, item.ProductID)
    }
    // Custom validation: check inventory (tenant_id injected automatically)
    rows, err := sdk.DatabaseQuery(ctx, `
        SELECT product_id, available_quantity FROM inventory
        WHERE product_id = ANY($1)
    `, map[string]interface{}{"product_ids": productIDs})
    if err != nil {
        return nil, err
    }
    // Index available quantity by product ID
    available := make(map[string]int, len(rows))
    for _, row := range rows {
        available[row["product_id"].(string)] = row["available_quantity"].(int)
    }
    // Custom logic: verify sufficient inventory for every line item
    for _, item := range order.Items {
        if item.Quantity > available[item.ProductID] {
            return nil, &ValidationError{
                Field:   "items",
                Message: "Insufficient inventory",
            }
        }
    }
    return &order, nil
}
Result: Auto-generated CRUD + custom business logic where needed
Option 3: MVP Domain-Driven Endpoints (Complex Workflows)
For complex processes requiring orchestration:
# REST API: Domain-driven endpoints only
spec:
api_endpoints:
- name: GetAssignedTasks
method: GET
path: /tasks
description: List tasks assigned to current user
- name: CompleteUserTask
method: POST
path: /tasks/{id}/complete
description: Submit human task completion
parameters:
- name: id
type: uuid
required: true
- name: body
type: object
schema:
type: object
properties:
decision: {type: string}
notes: {type: string}
- name: QueryProcessInstance
method: GET
path: /processes/{id}
description: Get process execution state and history
When to use each approach:
| Use Case | Approach | Reason |
|---|---|---|
| Standard CRUD (80% of APIs) | Auto-generated | No code needed, instant |
| CRUD + validation (15%) | Refine mechanism | Custom logic, auto security |
| Complex workflows (5%) | Domain-driven | Orchestration required |
Rate limiting (applied automatically):
anonymous:
limit: 100 requests / 15 min
burst: 20
authenticated_user:
limit: 1,000 requests / 15 min
burst: 100
service_account:
limit: 50,000 requests / 15 min
burst: 2,000
gRPC for internal communication:
- Service-to-service communication only
- Not exposed to external clients
- High throughput (1000+ req/s)
- Protocol efficiency critical
The Hidden Cost: Complexity Growth Over Time
Research-Backed Reality
Large-scale production orchestration systems don’t stay manageable. Real-world data shows exponential growth in imperative codebases:
Production system complexity:
- Windows 10: ~50M lines of code
- Google Chrome: ~6.7M lines of code
- Linux Kernel: ~27.8M lines of code
- Typical Kubernetes Operator: 5,000-15,000 lines
- Enterprise BPM Implementation: 30,000-150,000 lines
When you build orchestration with imperative code, your codebase follows an exponential growth curve: each new feature, edge case, and failure mode requires new code, and that code creates new interdependencies that in turn spawn still more code.
Cascade separates business logic from infrastructure. Complexity grows linearly (or stays flat) while the platform absorbs the exponential burden.
The Exponential Growth Crisis
Research on microservices and complex systems shows imperative codebases follow a 2.5-3x multiplier per year, while declarative systems stay linear.
The compounding gap: by Year 5, the imperative codebase is more than 200x larger than Cascade’s declarative approach (390,000 vs 1,800 lines).
Year-by-Year Breakdown
| Year | Imperative LOC | Cascade LOC | Key Milestone |
|---|---|---|---|
| 1 | 10,000 | 1,000 | MVP viable, both productive |
| 2 | 25,000 (+150%) | 1,200 (+20%) | Divergence begins |
| 3 | 65,000 (+160%) | 1,400 (+17%) | Crisis zone entry |
| 4 | 156,000 (+140%) | 1,600 (+14%) | Imperative unsustainable |
| 5 | 390,000 (+150%) | 1,800 (+13%) | Business impact severe |
Where the 390,000 Lines Go
By Year 5, most of that 390,000-line imperative codebase is infrastructure, glue, and test code rather than business logic (the comparison below puts the test burden alone at 185,000 lines).
Compare to Cascade’s 1,800 lines:
- 500 lines: CDL workflows
- 300 lines: OPA policies
- 200 lines: Template definitions
- 150 lines: Test specifications
- 150 lines: Integration connectors
- 500 lines: Schema and validation
Platform handles the other 388,200 lines automatically.
Year 5 Business Impact Comparison
Imperative (390,000 LOC) - UNMAINTAINABLE:
- Deployment: 8-12 hours (high risk)
- Team size: 25+ engineers
- On-call: 8-person rotation
- Incident rate: ~1 critical per week
- Mean time to fix: 3-5 days
- Feature velocity: 80% slowed
- Test burden: 185,000 lines
- Technical debt: $2-3M to refactor
Cascade (1,800 LOC) - STILL MAINTAINABLE:
- Deployment: 5-10 seconds (zero risk)
- Team size: 1-2 engineers
- On-call: 1-person rotation
- Incident rate: ~1 major per year
- Mean time to fix: 30 minutes
- Feature velocity: No slowdown
- Test burden: 2,000 lines
- Technical debt: None
Net savings: $2.2-4.2M over 5 years, 5-8x productivity lift
The Bifurcation Point
Key insight: The bifurcation happens at Year 2-3. That’s when the compounding cost becomes unavoidable and feature velocity on imperative teams collapses.
Why Cascade Stays Flat
- Declarative model: Describe what you want, not how to achieve it
- Platform absorption: Infrastructure concerns handled by platform
- No framework bloat: No need for custom frameworks
- Automatic updates: Platform improvements benefit all workflows
- Configuration over code: Thresholds and policies change in YAML
Total Cost of Ownership
Summary: Complete Capability Stack
| Capability | Imperative | CDL | Platform Provides |
|---|---|---|---|
| State machine | 1,000+ lines | 0 lines | Temporal |
| Retry logic | 40 lines | 12 lines | CDL interpreter |
| Database | 800-1,200 lines | 0 lines | Capability SDK |
| Schema | 5,000-30,000/yr | 50-100 lines | Atlas |
| Event system | 500+ lines | 20 lines | NATS |
| Webhooks | 150 lines | 15 lines | Platform |
| Conditional | 50 lines | 8 lines | Choice states |
| Human tasks | 200 lines | 10 lines | WaitForInput |
| TOTAL | ~11,340+ lines | ~165-500 lines | Platform |
Overall reduction: 96-98% less code
Plus automatic security, observability, disaster recovery, multi-tenancy, and compliance built-in.