
Process Orchestration: Durable Workflows at Scale

Status: MVP (v1.0.0) | Maturity: Production-Ready | Tests: 5
Throughput: 1000+ workflows/sec | Latency: P95 <10s | SLO: 99.5%

Cascade Platform orchestrates business processes using Temporal, enabling durable, fault-tolerant workflows that survive infrastructure failures.


What is Process Orchestration?

Process orchestration is the ability to coordinate complex, multi-step workflows that may span minutes, hours, or days. Unlike workflow engines that keep state only in process memory, Cascade's orchestration provides:

  • Durable Execution: State persists to PostgreSQL, so workflows survive server crashes
  • Automatic Retries: Failed activities retry with exponential backoff
  • Event Sourcing: Complete audit trail of all state transitions
  • Human Tasks: Pause for manual approval, then resume automatically
  • Distributed Execution: Activities run on any worker; scale independently
  • Timeout Handling: Configure timeouts per activity or workflow

Example: A loan approval workflow that:

  1. Collects application (HumanTask)
  2. Runs credit check (Activity)
  3. Evaluates policy (OPA decision)
  4. Routes to manager if >$100K (Choice)
  5. Waits for approval (HumanTask)
  6. Disburses funds (Activity)

Architecture: CSL → Temporal → State

The Execution Pipeline

```text
┌─────────────────────────────────────────────────────────┐
│ 1. CDL Application (YAML)                               │
│    ├── workflows: [definition]                          │
│    ├── states: [task, choice, humantask, etc]           │
│    └── activities: [Go functions]                       │
└──────────────┬──────────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────────┐
│ 2. CSL Interpreter (Validation + Compilation)           │
│    ├── Parse CDL                                        │
│    ├── Resolve URNs (policies, schemas, activities)     │
│    ├── Validate state transitions                       │
│    └── Compile to Temporal Workflow                     │
└──────────────┬──────────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Temporal Server (Orchestration)                      │
│    ├── Schedule workflow execution                      │
│    ├── Maintain execution history                       │
│    ├── Route activities to workers                      │
│    └── Handle failures & retries                        │
└──────────────┬──────────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────────┐
│ 4. PostgreSQL (State Store)                             │
│    ├── Persist workflow state                           │
│    ├── Store activity results                           │
│    ├── Maintain execution timeline                      │
│    └── Enable audit trail queries                       │
└──────────────┬──────────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────────┐
│ 5. Your Application                                     │
│    ├── Reads completed workflows                        │
│    ├── Shows status to users                            │
│    └── Triggers downstream actions                      │
└─────────────────────────────────────────────────────────┘
```

State Types & Execution

Cascade supports 7 state types for different orchestration patterns:

1. Task State (Activity Execution)

Purpose: Execute a Go function (activity)

When to use: Calling external systems, business logic, data processing

CDL Syntax:

```yaml
- name: CheckCredit
  type: Task
  resource: urn:cascade:activity:credit_check
  parameters:
    applicant_id: "{{ workflow.input.applicant_id }}"
    threshold: 750
  result: $.credit_score
  retries:
    max_attempts: 3
    backoff:
      initial_interval: 1s
      max_interval: 60s
      multiplier: 2
  timeout: 30s
  next: EvaluateEligibility
```

Execution:

```text
┌──────────────┐
│  Enter Task  │
└──────┬───────┘
       ▼
┌──────────────┐
│ Call Activity│  (may fail)
└──────┬───────┘
       ├─ Success → Store result in context
       ├─ Failure → Retry (up to max_attempts)
       └─ Max retries exceeded → Error path or end
```

Performance:

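  • Activity execution: 1-1000ms (depends on your code)
  • Retry delays: 1s, then 2s (with max_attempts: 3 there are at most two retries; with more attempts the interval keeps doubling, 4s → 8s → ..., up to max_interval: 60s)
  • Total timeout: 30 seconds (the task fails if it has not completed by then)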

Example Activity (Go):

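```go
package activities

import (
	"context"
	"fmt"
)

// CreditCheckInput is the activity input.
type CreditCheckInput struct {
	ApplicantID string `json:"applicant_id"`
	Threshold   int    `json:"threshold"`
}

// CreditCheckOutput is the activity result.
type CreditCheckOutput struct {
	Score    int    `json:"score"`
	Eligible bool   `json:"eligible"`
	Message  string `json:"message"`
}

// creditService is the subset of the credit bureau client this activity uses.
type creditService interface {
	Query(ctx context.Context, applicantID string) (int, error)
}

// getCreditServiceClient returns the configured credit bureau client.
// Set it during worker startup; its implementation is application-specific.
var getCreditServiceClient func() creditService

// CheckCredit calls the credit bureau API.
func CheckCredit(ctx context.Context, input *CreditCheckInput) (*CreditCheckOutput, error) {
	creditAPI := getCreditServiceClient()
	score, err := creditAPI.Query(ctx, input.ApplicantID)
	if err != nil {
		return nil, err // Temporal retries according to the task's retry policy
	}
	return &CreditCheckOutput{
		Score:    score,
		Eligible: score >= input.Threshold,
		Message:  fmt.Sprintf("Credit score: %d", score),
	}, nil
}
```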

2. Choice State (Conditional Branching)

Purpose: Route execution based on conditions

When to use: Different paths based on data (approvals, rejections, escalations)

CDL Syntax:

```yaml
- name: EvaluateEligibility
  type: Choice
  choices:
    - condition: "{{ $.credit_score >= 750 }}"
      next: FastTrackApproval
      description: "Excellent credit"
    - condition: "{{ $.credit_score >= 650 }}"
      next: ManagerReview
      description: "Good credit, needs review"
    - condition: "{{ $.amount <= 10000 }}"
      next: AutoApproval
      description: "Small amount, auto-approve"
  default: Rejection
```

Performance:

  • Evaluation: <0.1ms (usually 1-5 microseconds)
  • Can evaluate 100K+ choices/second

Routing Matrix:

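Assuming choices are evaluated top to bottom and the first matching condition wins, the choices above route as follows:

```text
Score >= 750                     → FastTrackApproval
650 <= Score < 750               → ManagerReview
Score < 650 and Amount <= 10000  → AutoApproval
Score < 650 and Amount > 10000   → Rejection (default)
```

Order matters: the amount check is only reached when the score is below 650.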
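The following minimal Go sketch makes ordered choice evaluation concrete. It assumes first-match semantics. Go closures stand in for the CDL `{{ ... }}` expressions, and `Application`, `Choice`, `route`, and `eligibilityChoices` are hypothetical names for illustration, not part of Cascade's API. Cascade's interpreter may evaluate choices differently.

```go
package main

// Application holds the fields the EvaluateEligibility choices read.
type Application struct {
	CreditScore int
	Amount      int
}

// Choice pairs a condition with the state to enter when it holds.
type Choice struct {
	Condition func(Application) bool
	Next      string
}

// route returns the Next of the first choice whose condition holds,
// or def when none match.
func route(app Application, choices []Choice, def string) string {
	for _, c := range choices {
		if c.Condition(app) {
			return c.Next
		}
	}
	return def
}

// eligibilityChoices mirrors EvaluateEligibility, in the same order.
var eligibilityChoices = []Choice{
	{Condition: func(a Application) bool { return a.CreditScore >= 750 }, Next: "FastTrackApproval"},
	{Condition: func(a Application) bool { return a.CreditScore >= 650 }, Next: "ManagerReview"},
	{Condition: func(a Application) bool { return a.Amount <= 10000 }, Next: "AutoApproval"},
}
```

For example, `route(Application{CreditScore: 700, Amount: 5000}, eligibilityChoices, "Rejection")` returns `"ManagerReview"`, because the score condition is listed before the amount condition.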

3. HumanTask State (Pause for Input)

Purpose: Wait for human approval/input

When to use: Manual approvals, user confirmation, information collection

CDL Syntax:

```yaml
- name: ManagerReview
  type: HumanTask
  description: "Loan requires manager approval"
  ui:
    schema: urn:cascade:schema:loan_approval_form
    target: appsmith   # or rjsf, echarts, tanstack
  assignee:
    role: loan_manager
    tags:
      - department: "{{ $.department }}"
  timeout: 24h
  timeoutAction: ESCALATE_TO_DIRECTOR
  next: DocumentSignature
```

Execution:

```text
┌────────────────────────────────────────┐
│ 1. Create Task                         │
│    - Generate form (JSON Schema)       │
│    - Assign to user/role               │
│    - Send notification                 │
└──────────────┬─────────────────────────┘
               ▼
┌────────────────────────────────────────┐
│ 2. Wait (workflow paused)              │
│    - State persisted in PostgreSQL     │
│    - Temporal keeps the event history  │
│    - No worker slot held while waiting │
└──────────────┬─────────────────────────┘
               ▼
┌────────────────────────────────────────┐
│ 3. Human Action                        │
│    - User approves/rejects in UI       │
│    - Form submission → Temporal signal │
└──────────────┬─────────────────────────┘
               ▼
┌────────────────────────────────────────┐
│ 4. Resume Workflow                     │
│    - Load state from PostgreSQL        │
│    - Proceed to next state             │
│    - Continue execution                │
└────────────────────────────────────────┘
```

Handling the Response:

```text
// The user submits the form in the UI; the backend receives the approval:
POST /workflows/{workflowID}/signal
{
  "action": "approve",
  "form_data": {
    "approval_notes": "Looks good",
    "approved_by": "manager@company.com"
  }
}

// Temporal resumes the workflow with this data,
// available in context as $.human_task_result.form_data
```

Timeout Handling:

```yaml
timeout: 24h                          # Wait at most 24 hours
timeoutAction: ESCALATE_TO_DIRECTOR   # Escalate automatically on timeout
```

4. Parallel State (Multi-Branch Execution)

Purpose: Execute multiple activities simultaneously

When to use: Independent operations, reducing total workflow time

CDL Syntax:

```yaml
- name: ProcessApplicationDocuments
  type: Parallel
  branches:
    - name: VerifyIdentity
      type: Task
      resource: urn:cascade:activity:verify_identity
      parameters:
        document_id: "{{ $.applicant.id_number }}"
      result: $.identity_check
    - name: CheckCriminalRecord
      type: Task
      resource: urn:cascade:activity:criminal_check
      parameters:
        name: "{{ $.applicant.name }}"
      result: $.criminal_check
    - name: VerifyIncome
      type: Task
      resource: urn:cascade:activity:income_verification
      parameters:
        applicant_id: "{{ $.applicant_id }}"
      result: $.income_check
  completion_strategy: ALL   # Wait for all branches
  next: CombineResults
```

Execution Timeline:

```text
Sequential (3 tasks × 10s each = 30s):
├─ VerifyIdentity:      [==========] 10s
├─ CheckCriminalRecord: [==========] 10s
└─ VerifyIncome:        [==========] 10s
▶ 30s total

Parallel (3 tasks simultaneously):
├─ VerifyIdentity:      [==========]
├─ CheckCriminalRecord: [==========]
└─ VerifyIncome:        [==========]
▶ 10s total (3× faster)
```

Combining Results:

```yaml
# After the parallel branches complete, their results are in context:
#   $.identity_check   result from branch 1
#   $.criminal_check   result from branch 2
#   $.income_check     result from branch 3
# A following state can merge them:
- name: CombineResults
  type: Task
  resource: urn:cascade:activity:combine_verification_results
  parameters:
    identity: "{{ $.identity_check }}"
    criminal: "{{ $.criminal_check }}"
    income: "{{ $.income_check }}"
```

Completion Strategies:

```yaml
completion_strategy: ALL      # Wait for all branches (default)
completion_strategy: ANY      # Proceed when any branch completes
completion_strategy: N_OF_M   # Wait for a specific number of branches
n: 2                          # Proceed when 2 branches complete
```

5. Wait State (Timed Pause)

Purpose: Delay workflow execution

When to use: Rate limiting, scheduled operations, cooldown periods

CDL Syntax:

```yaml
- name: CooldownBeforeRetry
  type: Wait
  duration: 1h
  next: RetryApproval

# Alternative: wait until a specific time
- name: WaitUntilNextDay
  type: Wait
  until: "{{ now | addDate 1 day | startOfDay }}"
  next: CheckForUpdates
```

Performance:

  • No polling: Temporal tracks the wait as a durable timer, so no worker resources are held while waiting
  • Survives server restarts
  • Can wait microseconds to years

6. Receive State (Event Waiting)

Purpose: Wait for external events (webhooks, messages)

When to use: Integrations, payment confirmations, third-party APIs

CDL Syntax:

```yaml
- name: AwaitPaymentConfirmation
  type: Receive
  event_name: payment_confirmed
  timeout: 5m
  timeoutAction: RETRY_PAYMENT
  next: SendConfirmation
```

Event Flow:

```text
Workflow waits:
┌─────────────────────────────┐
│ AwaitPaymentConfirmation    │
│ (listening for event)       │
└─────────────────────────────┘
              │
              │ (External system: payment processor)
              ├─ POST /webhooks/events
              │    event: payment_confirmed
              │    payment_id: pay-123
              ▼
┌─────────────────────────────┐
│ Event matched! Resume with  │
│ event data in context       │
└─────────────────────────────┘
```

7. EvaluatePolicy State (Business Rules)

Purpose: Evaluate a policy (OPA, DMN, etc.)

When to use: Complex business rules, compliance checks

CDL Syntax:

```yaml
- name: CheckCompliancePolicy
  type: EvaluatePolicy
  resource: urn:cascade:policy:loan_approval_rules
  parameters:
    amount: "{{ $.amount }}"
    credit_score: "{{ $.credit_score }}"   # read by the allow/deny rules
    customer_tier: "{{ $.customer.tier }}"
    previous_defaults: "{{ $.customer.default_count }}"
  result: $.policy_decision
  next: ApplyPolicyOutcome
```

OPA Policy Example:

```rego
package loan_policies

# Approve if credit score is excellent and amount is reasonable
allow {
    input.credit_score >= 750
    input.amount <= 100000
}

# Require escalation for large amounts
require_escalation {
    input.amount > 100000
    input.credit_score >= 650
}

# Deny if high risk
deny {
    input.credit_score < 600
}

deny {
    input.previous_defaults > 2
}
```

Advanced Features

1. Retries & Exponential Backoff

Automatic retries with exponential backoff:

```yaml
- name: CallUnreliableAPI
  type: Task
  resource: urn:cascade:activity:external_api_call
  retries:
    max_attempts: 5          # Up to 5 attempts (1 initial call + 4 retries)
    backoff:
      initial_interval: 1s   # Start with 1 second
      max_interval: 60s      # Cap at 60 seconds
      multiplier: 2          # Double each time
  timeout: 30s
  next: ProcessResponse
```

Retry Timeline:

```text
Attempt 1:  0s  [Call API] ❌ Failed   → wait 1s
Attempt 2:  1s  [Call API] ❌ Failed   → wait 2s
Attempt 3:  3s  [Call API] ❌ Failed   → wait 4s
Attempt 4:  7s  [Call API] ❌ Failed   → wait 8s
Attempt 5: 15s  [Call API] ✅ Success  → continue workflow...
```

2. Context & State Management

Workflow context is updated at each state:

```jsonc
// Input context
{
  "workflow_id": "wf-abc123",
  "input": { "applicant_id": "alice", "amount": 50000 }
}

// After the CheckCredit task:
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720                    // ← added by task
}

// After the EvaluateEligibility choice:
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720,
  "routing_decision": "ManagerReview"    // ← added by choice
}

// After the ManagerReview HumanTask:
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720,
  "routing_decision": "ManagerReview",
  "approval_notes": "Looks good",
  "approved_by": "manager@company.com"  // ← added by human
}
```
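The Go sketch below models this accumulation. It assumes a shallow merge: a `result: $.<key>` path writes a single top-level key, and earlier keys are preserved. `Context` and `withResult` are hypothetical names. Nested result paths are deliberately rejected to keep the sketch short.

```go
package main

import (
	"fmt"
	"strings"
)

// Context is the workflow context as a JSON-like map (assumed representation).
type Context map[string]interface{}

// withResult returns a copy of ctx with value stored under resultPath.
// Only top-level paths of the form "$.<key>" are supported in this sketch.
func withResult(ctx Context, resultPath string, value interface{}) (Context, error) {
	key := strings.TrimPrefix(resultPath, "$.")
	if key == resultPath || key == "" || strings.Contains(key, ".") {
		return nil, fmt.Errorf("unsupported result path %q", resultPath)
	}
	// Shallow copy rather than mutate, so each state's input context is unchanged.
	next := make(Context, len(ctx)+1)
	for k, v := range ctx {
		next[k] = v
	}
	next[key] = value
	return next, nil
}
```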

3. Compensation & Error Handling

Roll back completed steps when a later step fails:

```yaml
- name: ChargeCard
  type: Task
  resource: urn:cascade:activity:charge_credit_card
  parameters:
    amount: "{{ $.total }}"
  result: $.charge_id   # stores the charge ID that RefundCard needs
  compensation:
    - name: RefundCard
      type: Task
      resource: urn:cascade:activity:refund_credit_card
      parameters:
        charge_id: "{{ $.charge_id }}"
  next: SendShipping
```

If SendShipping fails:

  1. SendShipping fails.
  2. The workflow runs the compensation chain in reverse order of completion.
  3. RefundCard executes, undoing ChargeCard.
  4. The workflow ends or moves to its error handler.

Real-World Example: Insurance Claims Workflow

Complete CDL for a multi-week insurance claim workflow:

```yaml
apiVersion: cascade.io/v1
kind: Application
metadata:
  name: insurance-claims
  namespace: finance
spec:
  workflows:
    - name: ProcessClaim
      description: "End-to-end insurance claim processing"
      start: InitializeClaim
      states:
        # 1. Validate claim
        - name: InitializeClaim
          type: Task
          resource: urn:cascade:activity:validate_claim
          parameters:
            claim_number: "{{ workflow.input.claim_id }}"
            policy_id: "{{ workflow.input.policy_id }}"
          result: $.claim_data
          timeout: 10s
          next: RequestDocumentation

        # 2. Request documents from claimant
        - name: RequestDocumentation
          type: HumanTask
          description: "Claimant uploads supporting documents"
          ui:
            schema: urn:cascade:schema:claim_documentation_form
            target: appsmith
          timeout: 7d
          timeoutAction: CANCEL_CLAIM
          next: VerifyDocuments

        # 3. Verify all documents in parallel
        - name: VerifyDocuments
          type: Parallel
          branches:
            - name: AuthenticateDocument1
              type: Task
              resource: urn:cascade:activity:verify_document_authenticity
            - name: CheckMedicalRecords
              type: Task
              resource: urn:cascade:activity:check_medical_records
            - name: VerifyDamagePhotos
              type: Task
              resource: urn:cascade:activity:verify_damage_photos
          completion_strategy: ALL
          next: EvaluatePolicy

        # 4. Evaluate underwriting policy
        - name: EvaluatePolicy
          type: EvaluatePolicy
          resource: urn:cascade:policy:claim_approval_criteria
          parameters:
            claim_amount: "{{ $.claim_data.amount }}"
            damage_severity: "{{ $.verification.damage_level }}"
            policy_type: "{{ $.claim_data.policy_type }}"
          result: $.policy_evaluation
          next: RouteClaim

        # 5. Route based on amount
        - name: RouteClaim
          type: Choice
          choices:
            - condition: "{{ $.policy_evaluation.auto_approve }}"
              next: ProcessApproval
              description: "Auto-approval by policy"
            - condition: "{{ $.claim_data.amount > 50000 }}"
              next: ManagerReview
              description: "Large claims need manager"
            - condition: "{{ $.policy_evaluation.investigation_required }}"
              next: InitiateInvestigation
              description: "Suspicious claim"
          default: ProcessRejection

        # 6a. Auto-approval path
        - name: ProcessApproval
          type: Task
          resource: urn:cascade:activity:approve_claim
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            amount: "{{ $.claim_data.amount }}"
          result: $.approval_reference
          next: SendPayment

        # 6b. Manager review path
        - name: ManagerReview
          type: HumanTask
          description: "Manager reviews large claim"
          ui:
            schema: urn:cascade:schema:claim_review_form
            target: appsmith
          assignee:
            role: claims_manager
          timeout: 3d
          next: ReviewDecision

        - name: ReviewDecision
          type: Choice
          choices:
            - condition: "{{ $.manager_decision == 'approve' }}"
              next: ProcessApproval
            - condition: "{{ $.manager_decision == 'request_more_info' }}"
              next: RequestAdditionalInfo
          default: ProcessRejection

        - name: RequestAdditionalInfo
          type: HumanTask
          ui:
            schema: urn:cascade:schema:additional_info_request
            target: appsmith
          timeout: 7d
          next: ProcessApproval

        # 6c. Investigation path
        - name: InitiateInvestigation
          type: Task
          resource: urn:cascade:activity:create_investigation
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            investigation_type: "{{ $.policy_evaluation.investigation_type }}"
          result: $.investigation_reference
          next: WaitForInvestigation

        - name: WaitForInvestigation
          type: Wait
          duration: 14d
          next: ProcessInvestigationResults

        - name: ProcessInvestigationResults
          type: Task
          resource: urn:cascade:activity:get_investigation_results
          parameters:
            investigation_id: "{{ $.investigation_reference }}"
          result: $.investigation_result
          next: ProcessApprovalOrRejection

        - name: ProcessApprovalOrRejection
          type: Choice
          choices:
            - condition: "{{ $.investigation_result.fraud_detected }}"
              next: ProcessRejection
            - condition: "{{ $.investigation_result.approved }}"
              next: ProcessApproval
          default: ProcessRejection

        # 7. Payment processing
        - name: SendPayment
          type: Task
          resource: urn:cascade:activity:process_payment
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            amount: "{{ $.claim_data.amount }}"
            payee: "{{ $.claim_data.payee }}"
          result: $.payment_confirmation
          retries:
            max_attempts: 3
            backoff:
              initial_interval: 5s
              max_interval: 60s
              multiplier: 2
          timeout: 2m
          next: NotifyClaimant

        # 8. Rejection
        - name: ProcessRejection
          type: Task
          resource: urn:cascade:activity:reject_claim
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            reason: "{{ $.rejection_reason }}"
          result: $.rejection_reference
          next: NotifyClaimantRejection

        # 9. Final notification
        - name: NotifyClaimant
          type: Task
          resource: urn:cascade:activity:send_approval_email
          parameters:
            email: "{{ $.claim_data.claimant_email }}"
            amount: "{{ $.claim_data.amount }}"
            payment_date: "{{ $.payment_confirmation.date }}"
          end: true

        - name: NotifyClaimantRejection
          type: Task
          resource: urn:cascade:activity:send_rejection_email
          parameters:
            email: "{{ $.claim_data.claimant_email }}"
            reason: "{{ $.rejection_reason }}"
          end: true

  # Activities (Go functions)
  activities:
    - name: validate_claim
      description: "Validate claim format and policy"
      urn: urn:cascade:activity:validate_claim
    - name: verify_document_authenticity
      description: "Verify document is authentic"
      urn: urn:cascade:activity:verify_document_authenticity
    # ... more activities

  # Policies
  policies:
    - name: claim_approval_criteria
      description: "Automatic claim approval criteria"
      engine: opa
      spec:
        auto_approve: true  # if conditions met
```

Performance Characteristics

Throughput

| Operation | Throughput | Notes |
|---|---|---|
| Workflow starts | 1000+ workflows/sec | Limited by PostgreSQL |
| Task transitions | 10K+ ops/sec | In-memory |
| Choice evaluations | 100K+ ops/sec | Fast comparison |
| State persistence | 500+ ops/sec | Database I/O |

Latency (Cascading Effects)

| Component | Latency | Impact |
|---|---|---|
| Task execution | 1-1000ms | Your code's execution time |
| Choice evaluation | <1ms | Negligible |
| State persistence | 5-50ms | Database round-trip |
| HumanTask wait | Minutes to days | Human response time |
| Temporal overhead | <10ms | Scheduling |
| Total (simple workflow) | P95 <10s | Dominated by activities |

Scalability

  • Horizontal: Activities scale independently (add more workers)
  • Vertical: Single workflow throughput limited by Temporal server
  • Storage: 1 year of workflows ≈ 50GB (PostgreSQL)

Monitoring & Observability

Metrics

```text
# Prometheus metrics (exported automatically)
cascade_workflow_duration_seconds  # How long workflows take
cascade_workflow_failures_total    # Failed workflows
cascade_task_duration_seconds      # Activity execution time
cascade_task_retries_total         # Retry count
cascade_state_transition_duration  # State transition latency
```

Distributed Tracing

```text
GET /workflows/{id}/trace

Returns the complete trace:
workflow_start: 0ms
├─ enter_state[CreditCheck]: 2ms
│  └─ call_activity[check_credit]: 150ms
│     ├─ external_call: 120ms
│     └─ data_parse: 30ms
├─ enter_state[Choice]: 0.5ms
├─ enter_state[ManagerReview]: 1ms
│  └─ human_waiting: 3h 45m
├─ resume_from_human: 5ms
├─ enter_state[SendPayment]: 1.5ms
│  └─ call_activity[charge_card]: 200ms
└─ workflow_end: 0.5ms
Total: 3h 45m 20s
```

Troubleshooting

Workflow Stuck in State

Symptom: Workflow hasn’t progressed in hours

Diagnosis:

```shell
cascade process inspect {workflow_id}   # Check last state and timestamp
```

Solutions:

  • HumanTask timeout not configured? Add: timeout: 24h
  • Activity error? Check logs: cascade logs {workflow_id}
  • Send signal to resume: cascade process signal {workflow_id} --action continue

Activity Timeout

Symptom: Activity timeout exceeded

Solution:

```yaml
- name: SlowAPI
  type: Task
  timeout: 5m   # Increase if your API needs more time
  retries:
    max_attempts: 2
```

State Explosion

Symptom: PostgreSQL fills up quickly

Cause: Too many workflow instances

Solutions:

  • Archive completed workflows: cascade admin archive --older-than 30d
  • Shorten the retention period: check the retention setting in config.yaml

Best Practices

DO:

  • Use Parallel for independent operations (faster)
  • Set timeouts on all tasks
  • Use Choice for simple routing
  • Monitor via OTEL metrics
  • Archive old workflows

DON’T:

  • Create infinite loops (add safeguards)
  • Store large data in context (use ID references)
  • Poll for events (use Receive state)
  • Retry everything (set reasonable limits)
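One simple safeguard against infinite loops is a per-run budget on state transitions. With it, a Choice that keeps routing back to an earlier state fails loudly instead of looping forever. Below is a minimal Go sketch of such a guard. `transitionGuard` and its limit are hypothetical names, not a documented Cascade setting.

```go
package main

import "fmt"

// transitionGuard caps the number of state transitions in one workflow run.
type transitionGuard struct {
	limit  int
	total  int
	visits map[string]int
}

func newTransitionGuard(limit int) *transitionGuard {
	return &transitionGuard{limit: limit, visits: make(map[string]int)}
}

// Enter records a transition into state and returns an error once the
// budget is exhausted, so the workflow can route to an error handler.
func (g *transitionGuard) Enter(state string) error {
	g.total++
	g.visits[state]++
	if g.total > g.limit {
		return fmt.Errorf("transition limit %d exceeded at state %s (entered %d times)", g.limit, state, g.visits[state])
	}
	return nil
}
```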

Next Steps

Ready to build? → First Workflow Tutorial

Need to understand policies? → Policy Evaluation Capability

Want to visualize workflows? → Debugging Guide

Production deployment? → Operations Guide


Updated: October 29, 2025
Version: 1.0
Temporal Version: v1.20+
Production-Ready: Yes
