Process Orchestration: Durable Workflows at Scale
Status: MVP (v1.0.0) | Maturity: Production-Ready | Tests: 5
Throughput: 1000+ workflows/sec | Latency: P95 <10s | SLO: 99.5%
Cascade Platform orchestrates business processes using Temporal, enabling durable, fault-tolerant workflows that survive infrastructure failures.
What is Process Orchestration?
Process orchestration is the ability to coordinate complex, multi-step workflows that may span minutes, hours, or days. Unlike in-memory workflow engines, which lose their state when the process stops, Cascade’s orchestration provides:
- Durable Execution: State persists to PostgreSQL; workflow survives server crashes
- Automatic Retries: Failed activities retry with exponential backoff
- Event Sourcing: Complete audit trail of all state transitions
- Human Tasks: Pause for manual approval, then resume automatically
- Distributed Execution: Activities run on any worker; scale independently
- Timeout Handling: Configure timeouts per activity or workflow
Example: A loan approval workflow that:
- Collects application (HumanTask)
- Runs credit check (Activity)
- Evaluates policy (OPA decision)
- Routes to manager if >$100K (Choice)
- Waits for approval (HumanTask)
- Disburses funds (Activity)
Architecture: CSL → Temporal → State
The Execution Pipeline
┌─────────────────────────────────────────────────────────┐
│ 1. CDL Application (YAML)                               │
│ ├── workflows: [definition]                             │
│ ├── states: [task, choice, humantask, etc]              │
│ └── activities: [Go functions]                          │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 2. CSL Interpreter (Validation + Compilation)           │
│ ├── Parse CDL                                           │
│ ├── Resolve URNs (policies, schemas, activities)        │
│ ├── Validate state transitions                          │
│ └── Compile to Temporal Workflow                        │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Temporal Server (Orchestration)                      │
│ ├── Schedule workflow execution                         │
│ ├── Maintain execution history                          │
│ ├── Route activities to workers                         │
│ └── Handle failures & retries                           │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 4. PostgreSQL (State Store)                             │
│ ├── Persist workflow state                              │
│ ├── Store activity results                              │
│ ├── Maintain execution timeline                         │
│ └── Enable audit trail queries                          │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 5. Your Application                                     │
│ ├── Reads completed workflows                           │
│ ├── Shows status to users                               │
│ └── Triggers downstream actions                         │
└─────────────────────────────────────────────────────────┘
State Types & Execution
Cascade supports 7 state types for different orchestration patterns:
1. Task State (Activity Execution)
Purpose: Execute a Go function (activity)
When to use: Calling external systems, business logic, data processing
CDL Syntax:
- name: CheckCredit
  type: Task
  resource: urn:cascade:activity:credit_check
  parameters:
    applicant_id: "{{ workflow.input.applicant_id }}"
    threshold: 750
  result: $.credit_score
  retries:
    max_attempts: 3
    backoff:
      initial_interval: 1s
      max_interval: 60s
      multiplier: 2
  timeout: 30s
  next: EvaluateEligibility
Execution:
┌──────────────┐
│ Enter Task   │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Call Activity│ (may fail)
└──────┬───────┘
       │
       ├─ Success → Store result in context
       │
       ├─ Failure → Retry (up to max_attempts)
       │
       └─ Max retries exceeded → Error path or end
Performance:
- Activity execution: 1-1000ms (depends on your code)
- Retry delays: 1s → 2s (exponential; max_attempts: 3 allows two retries, each wait capped at 60s)
- Timeout: 30s (the activity fails if it does not complete in time)
Example Activity (Go):
package activities
import (
	"context"
	"fmt"
)
// CreditCheckInput is the activity input
type CreditCheckInput struct {
	ApplicantID string `json:"applicant_id"`
	Threshold   int    `json:"threshold"`
}
// CreditCheckOutput is the result
type CreditCheckOutput struct {
	Score    int    `json:"score"`
	Eligible bool   `json:"eligible"`
	Message  string `json:"message"`
}
// CheckCredit calls the credit bureau API
func CheckCredit(ctx context.Context, input *CreditCheckInput) (*CreditCheckOutput, error) {
	// Call external credit service (getCreditServiceClient is defined elsewhere in this package)
	creditAPI := getCreditServiceClient()
	score, err := creditAPI.Query(ctx, input.ApplicantID)
	if err != nil {
		return nil, err // returning an error lets Temporal retry per the retry policy
	}
	return &CreditCheckOutput{
		Score:    score,
		Eligible: score >= input.Threshold,
		Message:  fmt.Sprintf("Credit score: %d", score),
	}, nil
}
2. Choice State (Conditional Branching)
Purpose: Route execution based on conditions
When to use: Different paths based on data (approvals, rejections, escalations)
CDL Syntax:
- name: EvaluateEligibility
  type: Choice
  choices:
    - condition: "{{ $.credit_score >= 750 }}"
      next: FastTrackApproval
      description: "Excellent credit"
    - condition: "{{ $.credit_score >= 650 }}"
      next: ManagerReview
      description: "Good credit, needs review"
    - condition: "{{ $.amount <= 10000 }}"
      next: AutoApproval
      description: "Small amount, auto-approve"
  default: Rejection
Performance:
- Evaluation: <0.1ms (usually 1-5 microseconds)
- Throughput: 100K+ choice evaluations/second
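The CDL lists choices in order and ends with a `default`, which suggests that the first matching condition wins; that ordering rule is an assumption, since the semantics are not spelled out above. A minimal Go sketch of first-match routing, with the `EvaluateEligibility` conditions hand-compiled into closures; `Choice`, `Route` and `eligibility` are hypothetical names, and the context is simplified to integer fields.

```go
package main

// Choice is one ordered branch of a Choice state. Cond stands in for a
// compiled template condition such as "{{ $.credit_score >= 750 }}".
type Choice struct {
	Cond func(ctx map[string]int) bool
	Next string
}

// Route returns the Next of the first choice whose condition holds,
// or def (the state's `default`) when none match.
func Route(choices []Choice, def string, ctx map[string]int) string {
	for _, c := range choices {
		if c.Cond(ctx) {
			return c.Next
		}
	}
	return def
}

// eligibility hand-compiles the EvaluateEligibility choices, in order.
var eligibility = []Choice{
	{Cond: func(c map[string]int) bool { return c["credit_score"] >= 750 }, Next: "FastTrackApproval"},
	{Cond: func(c map[string]int) bool { return c["credit_score"] >= 650 }, Next: "ManagerReview"},
	{Cond: func(c map[string]int) bool { return c["amount"] <= 10000 }, Next: "AutoApproval"},
}
```

Under first-match evaluation an applicant scoring 780 who asks for $5,000 is fast-tracked rather than auto-approved, so the most specific conditions belong at the top of the list.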
Routing Matrix (assuming choices are checked top to bottom and the first match wins):
Score >= 750 (any amount) → FastTrackApproval
650 <= Score < 750 (any amount) → ManagerReview
Score < 650 and Amount <= 10000 → AutoApproval
Score < 650 and Amount > 10000 → Rejection (default)
3. HumanTask State (Pause for Input)
Purpose: Wait for human approval/input
When to use: Manual approvals, user confirmation, information collection
CDL Syntax:
- name: ManagerReview
  type: HumanTask
  description: "Loan requires manager approval"
  ui:
    schema: urn:cascade:schema:loan_approval_form
    target: appsmith  # or rjsf, echarts, tanstack
  assignee:
    role: loan_manager
    tags:
      - department: "{{ $.department }}"
  timeout: 24h
  timeoutAction: ESCALATE_TO_DIRECTOR
  next: DocumentSignature
Execution:
┌────────────────────────────────────────┐
│ 1. Create Task                         │
│    - Generate form (JSON Schema)       │
│    - Assign to user/role               │
│    - Send notification                 │
└──────────────┬─────────────────────────┘
               │
               ▼
┌────────────────────────────────────────┐
│ 2. Wait (workflow paused)              │
│    - State persisted in PostgreSQL     │
│    - Blocks on signal or timeout       │
│    - No CPU used while waiting         │
└──────────────┬─────────────────────────┘
               │
               ▼
┌────────────────────────────────────────┐
│ 3. Human Action                        │
│    - User approves/rejects in UI       │
│    - Form submission → Temporal signal │
└──────────────┬─────────────────────────┘
               │
               ▼
┌────────────────────────────────────────┐
│ 4. Resume Workflow                     │
│    - Load state from PostgreSQL        │
│    - Proceed to next state             │
│    - Continue execution                │
└────────────────────────────────────────┘
Handling the Response:
// User submits form in UI
// Backend receives approval:
POST /workflows/{workflowID}/signal
{
"action": "approve",
"form_data": {
"approval_notes": "Looks good",
"approved_by": "manager@company.com"
}
}
// Temporal resumes workflow with this data
// Available in context: $.human_task_result.form_data
Timeout Handling:
timeout: 24h # Wait max 24 hours
timeoutAction: ESCALATE_TO_DIRECTOR # Auto-escalate on timeout
4. Parallel State (Multi-Branch Execution)
Purpose: Execute multiple activities simultaneously
When to use: Independent operations, reducing total workflow time
CDL Syntax:
- name: ProcessApplicationDocuments
  type: Parallel
  branches:
    - name: VerifyIdentity
      type: Task
      resource: urn:cascade:activity:verify_identity
      parameters:
        document_id: "{{ $.applicant.id_number }}"
      result: $.identity_check
    - name: CheckCriminalRecord
      type: Task
      resource: urn:cascade:activity:criminal_check
      parameters:
        name: "{{ $.applicant.name }}"
      result: $.criminal_check
    - name: VerifyIncome
      type: Task
      resource: urn:cascade:activity:income_verification
      parameters:
        applicant_id: "{{ $.applicant_id }}"
      result: $.income_check
  completion_strategy: ALL  # Wait for all branches
  next: CombineResults
Execution Timeline:
Sequential (3 tasks × 10s each = 30s):
├─ VerifyIdentity:      [==========] 10s
├─ CheckCriminalRecord:             [==========] 10s
└─ VerifyIncome:                                [==========] 10s
▶ 30s total
Parallel (3 tasks simultaneous):
├─ VerifyIdentity:      [==========]
├─ CheckCriminalRecord: [==========]
└─ VerifyIncome:        [==========]
▶ 10s total (3x faster!)
Combining Results:
# After parallel tasks complete, results are available:
#   $.identity_check   (result from branch 1)
#   $.criminal_check   (result from branch 2)
#   $.income_check     (result from branch 3)
# All are in context, so they can now be merged:
- name: CombineResults
  type: Task
  resource: urn:cascade:activity:combine_verification_results
  parameters:
    identity: "{{ $.identity_check }}"
    criminal: "{{ $.criminal_check }}"
    income: "{{ $.income_check }}"
Completion Strategies:
completion_strategy: ALL # Wait for all (default)
completion_strategy: ANY # Proceed when any completes
completion_strategy: N_OF_M # Wait for specific count
n: 2 # Proceed when 2 complete
5. Wait State (Timed Pause)
Purpose: Delay workflow execution
When to use: Rate limiting, scheduled operations, cooldown periods
CDL Syntax:
- name: CooldownBeforeRetry
  type: Wait
  duration: 1h
  next: RetryApproval
# Alternative: Wait until specific time
- name: WaitUntilNextDay
  type: Wait
  until: "{{ now | addDate 1 day | startOfDay }}"
  next: CheckForUpdates
Performance:
- No polling: Temporal tracks the wait with a durable server-side timer
- Survives server restarts
- Suitable for waits from seconds to months or years
6. Receive State (Event Waiting)
Purpose: Wait for external events (webhooks, messages)
When to use: Integrations, payment confirmations, third-party APIs
CDL Syntax:
- name: AwaitPaymentConfirmation
  type: Receive
  event_name: payment_confirmed
  timeout: 5m
  timeoutAction: RETRY_PAYMENT
  next: SendConfirmation
Event Flow:
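Delivering an external event means finding the workflow that is waiting for it. The CDL only names the event, so this sketch assumes the webhook payload also carries a correlation key (such as the `payment_id` in the flow that follows) and keeps an in-memory registry of waiting workflows; `Router`, `Await` and `Deliver` are hypothetical names.

```go
package main

import "sync"

// eventKey identifies what a Receive state is waiting for: the event name
// plus a correlation key (assumed here to be a payment ID).
type eventKey struct{ event, key string }

// Router maps pending Receive states to the workflows that own them.
type Router struct {
	mu      sync.Mutex
	waiting map[eventKey]string // -> workflow ID
}

func NewRouter() *Router { return &Router{waiting: make(map[eventKey]string)} }

// Await records that workflowID is blocked in a Receive state for (event, key).
func (r *Router) Await(workflowID, event, key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.waiting[eventKey{event, key}] = workflowID
}

// Deliver looks up and removes the workflow waiting for an incoming event;
// ok is false when no workflow is waiting for it.
func (r *Router) Deliver(event, key string) (workflowID string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := eventKey{event, key}
	workflowID, ok = r.waiting[k]
	if ok {
		delete(r.waiting, k)
	}
	return workflowID, ok
}
```

Once `Deliver` returns a workflow ID, the webhook handler would signal that workflow, for instance with the Temporal Go client's `SignalWorkflow(ctx, workflowID, runID, signalName, arg)`. Events that arrive before a workflow registers are not matched here; signalling a known workflow ID directly avoids that race, because Temporal records signals in the workflow's history until the workflow reads them.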
Workflow waits:
┌─────────────────────────────┐
│ AwaitPaymentConfirmation    │
│ (listening for event)       │
└──────────────┬──────────────┘
               │
               │   (External system: payment processor)
               │
               ├─ POST /webhooks/events
               │    event: payment_confirmed
               │    payment_id: pay-123
               │
               ▼
┌─────────────────────────────┐
│ Event matched! Resume with  │
│ event data in context       │
└─────────────────────────────┘
7. EvaluatePolicy State (Business Rules)
Purpose: Execute policy evaluation (OPA, DMN, etc)
When to use: Complex business rules, compliance checks
CDL Syntax:
- name: CheckCompliancePolicy
  type: EvaluatePolicy
  resource: urn:cascade:policy:loan_approval_rules
  parameters:
    amount: "{{ $.amount }}"
    credit_score: "{{ $.credit_score }}"
    customer_tier: "{{ $.customer.tier }}"
    previous_defaults: "{{ $.customer.default_count }}"
  result: $.policy_decision
  next: ApplyPolicyOutcome
OPA Policy Example:
package loan_policies
import rego.v1
# Approve if credit score excellent and amount reasonable
allow if {
	input.credit_score >= 750
	input.amount <= 100000
}
# Require escalation for large amounts
require_escalation if {
	input.amount > 100000
	input.credit_score >= 650
}
# Deny if high risk (multiple definitions of deny are OR'ed)
deny if {
	input.credit_score < 600
}
deny if {
	input.previous_defaults > 2
}
Advanced Features
1. Retries & Exponential Backoff
Failed activities are retried automatically with exponential backoff:
- name: CallUnreliableAPI
  type: Task
  resource: urn:cascade:activity:external_api_call
  retries:
    max_attempts: 5          # Up to 5 attempts in total
    backoff:
      initial_interval: 1s   # Start with 1 second
      max_interval: 60s      # Cap at 60 seconds
      multiplier: 2          # Double each time
  timeout: 30s
  next: ProcessResponse
Retry Timeline:
Attempt 1:  0s [Call API] ❌ Failed
            Wait 1s
Attempt 2:  1s [Call API] ❌ Failed
            Wait 2s
Attempt 3:  3s [Call API] ❌ Failed
            Wait 4s
Attempt 4:  7s [Call API] ❌ Failed
            Wait 8s
Attempt 5: 15s [Call API] ✅ Success
Continue workflow...
2. Context & State Management
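Each state writes its output to the path named in its `result` field. A minimal Go sketch of that merge, assuming a map-based context and only the simple `$.key` and `$.a.b` forms used on this page (the full path syntax Cascade accepts is not specified here); `SetResult` is a hypothetical helper.

```go
package main

import (
	"errors"
	"strings"
)

// SetResult writes value into ctx at a result path such as "$.credit_score"
// or "$.claim_data.amount", creating intermediate objects as needed.
// A non-object value found along the path is replaced by a new object.
func SetResult(ctx map[string]any, path string, value any) error {
	if !strings.HasPrefix(path, "$.") {
		return errors.New("result path must start with $.")
	}
	keys := strings.Split(strings.TrimPrefix(path, "$."), ".")
	node := ctx
	for _, k := range keys[:len(keys)-1] {
		child, ok := node[k].(map[string]any)
		if !ok {
			child = map[string]any{}
			node[k] = child
		}
		node = child
	}
	node[keys[len(keys)-1]] = value
	return nil
}
```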
Workflow context is updated at each state:
// Input context
{
  "workflow_id": "wf-abc123",
  "input": {
    "applicant_id": "alice",
    "amount": 50000
  }
}
// After CheckCredit task:
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720  // ← Added by task
}
// After EvaluateEligibility choice (720 is at least 650 but below 750):
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720,
  "routing_decision": "ManagerReview"  // ← Added by choice
}
// After ManagerReview HumanTask:
{
  "workflow_id": "wf-abc123",
  "input": { ... },
  "credit_score": 720,
  "routing_decision": "ManagerReview",
  "approval_notes": "Looks good",
  "approved_by": "manager@company.com"  // ← Added by human
}
3. Compensation & Error Handling
Undo completed steps when a later step fails:
- name: ChargeCard
  type: Task
  resource: urn:cascade:activity:charge_credit_card
  parameters:
    amount: "{{ $.total }}"
  result: $.charge_id
  compensation:
    - name: RefundCard
      type: Task
      resource: urn:cascade:activity:refund_credit_card
      parameters:
        charge_id: "{{ $.charge_id }}"
  next: SendShipping
If SendShipping fails after ChargeCard has succeeded:
1. The workflow runs the registered compensations in reverse order of completion
2. RefundCard executes, undoing ChargeCard
3. The workflow ends or goes to its error handler
Real-World Example: Insurance Claims Workflow
Complete CDL for multi-week insurance claim:
apiVersion: cascade.io/v1
kind: Application
metadata:
  name: insurance-claims
  namespace: finance
spec:
  workflows:
    - name: ProcessClaim
      description: "End-to-end insurance claim processing"
      start: InitializeClaim
      states:
        # 1. Validate claim
        - name: InitializeClaim
          type: Task
          resource: urn:cascade:activity:validate_claim
          parameters:
            claim_number: "{{ workflow.input.claim_id }}"
            policy_id: "{{ workflow.input.policy_id }}"
          result: $.claim_data
          timeout: 10s
          next: RequestDocumentation
        # 2. Request documents from claimant
        - name: RequestDocumentation
          type: HumanTask
          description: "Claimant uploads supporting documents"
          ui:
            schema: urn:cascade:schema:claim_documentation_form
            target: appsmith
          timeout: 7d
          timeoutAction: CANCEL_CLAIM
          next: VerifyDocuments
        # 3. Verify all documents in parallel
        - name: VerifyDocuments
          type: Parallel
          branches:
            - name: AuthenticateDocument1
              type: Task
              resource: urn:cascade:activity:verify_document_authenticity
            - name: CheckMedicalRecords
              type: Task
              resource: urn:cascade:activity:check_medical_records
            - name: VerifyDamagePhotos
              type: Task
              resource: urn:cascade:activity:verify_damage_photos
          completion_strategy: ALL
          next: EvaluatePolicy
        # 4. Evaluate underwriting policy
        - name: EvaluatePolicy
          type: EvaluatePolicy
          resource: urn:cascade:policy:claim_approval_criteria
          parameters:
            claim_amount: "{{ $.claim_data.amount }}"
            damage_severity: "{{ $.verification.damage_level }}"
            policy_type: "{{ $.claim_data.policy_type }}"
          result: $.policy_evaluation
          next: RouteClaim
        # 5. Route based on amount
        - name: RouteClaim
          type: Choice
          choices:
            - condition: "{{ $.policy_evaluation.auto_approve }}"
              next: ProcessApproval
              description: "Auto-approval by policy"
            - condition: "{{ $.claim_data.amount > 50000 }}"
              next: ManagerReview
              description: "Large claims need manager"
            - condition: "{{ $.policy_evaluation.investigation_required }}"
              next: InitiateInvestigation
              description: "Suspicious claim"
          default: ProcessRejection
        # 6a. Auto-approval path
        - name: ProcessApproval
          type: Task
          resource: urn:cascade:activity:approve_claim
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            amount: "{{ $.claim_data.amount }}"
          result: $.approval_reference
          next: SendPayment
        # 6b. Manager review path
        - name: ManagerReview
          type: HumanTask
          description: "Manager reviews large claim"
          ui:
            schema: urn:cascade:schema:claim_review_form
            target: appsmith
          assignee:
            role: claims_manager
          timeout: 3d
          next: ReviewDecision
        - name: ReviewDecision
          type: Choice
          choices:
            - condition: "{{ $.manager_decision == 'approve' }}"
              next: ProcessApproval
            - condition: "{{ $.manager_decision == 'request_more_info' }}"
              next: RequestAdditionalInfo
          default: ProcessRejection
        - name: RequestAdditionalInfo
          type: HumanTask
          ui:
            schema: urn:cascade:schema:additional_info_request
            target: appsmith
          timeout: 7d
          next: ProcessApproval
        # 6c. Investigation path
        - name: InitiateInvestigation
          type: Task
          resource: urn:cascade:activity:create_investigation
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            investigation_type: "{{ $.policy_evaluation.investigation_type }}"
          result: $.investigation_reference
          next: WaitForInvestigation
        - name: WaitForInvestigation
          type: Wait
          duration: 14d
          next: ProcessInvestigationResults
        - name: ProcessInvestigationResults
          type: Task
          resource: urn:cascade:activity:get_investigation_results
          parameters:
            investigation_id: "{{ $.investigation_reference }}"
          result: $.investigation_result
          next: ProcessApprovalOrRejection
        - name: ProcessApprovalOrRejection
          type: Choice
          choices:
            - condition: "{{ $.investigation_result.fraud_detected }}"
              next: ProcessRejection
            - condition: "{{ $.investigation_result.approved }}"
              next: ProcessApproval
          default: ProcessRejection
        # 7. Payment processing
        - name: SendPayment
          type: Task
          resource: urn:cascade:activity:process_payment
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            amount: "{{ $.claim_data.amount }}"
            payee: "{{ $.claim_data.payee }}"
          result: $.payment_confirmation
          retries:
            max_attempts: 3
            backoff:
              initial_interval: 5s
              max_interval: 60s
              multiplier: 2
          timeout: 2m
          next: NotifyClaimant
        # 8. Rejection
        - name: ProcessRejection
          type: Task
          resource: urn:cascade:activity:reject_claim
          parameters:
            claim_id: "{{ workflow.input.claim_id }}"
            reason: "{{ $.rejection_reason }}"
          result: $.rejection_reference
          next: NotifyClaimantRejection
        # 9. Final notification
        - name: NotifyClaimant
          type: Task
          resource: urn:cascade:activity:send_approval_email
          parameters:
            email: "{{ $.claim_data.claimant_email }}"
            amount: "{{ $.claim_data.amount }}"
            payment_date: "{{ $.payment_confirmation.date }}"
          end: true
        - name: NotifyClaimantRejection
          type: Task
          resource: urn:cascade:activity:send_rejection_email
          parameters:
            email: "{{ $.claim_data.claimant_email }}"
            reason: "{{ $.rejection_reason }}"
          end: true
  # Activities (Go functions)
  activities:
    - name: validate_claim
      description: "Validate claim format and policy"
      urn: urn:cascade:activity:validate_claim
    - name: verify_document_authenticity
      description: "Verify document is authentic"
      urn: urn:cascade:activity:verify_document_authenticity
    # ... more activities
  # Policies
  policies:
    - name: claim_approval_criteria
      description: "Automatic claim approval criteria"
      engine: opa
      spec:
        auto_approve: true  # if conditions met
Performance Characteristics
Throughput
| Operation | Throughput | Notes |
|---|---|---|
| Workflow starts | 1000+ wf/sec | Limited by PostgreSQL |
| Task transitions | 10K+ ops/sec | In-memory |
| Choice evaluations | 100K+ ops/sec | Fast comparison |
| State persistence | 500+ ops/sec | Database I/O |
Latency (Cascading Effects)
| Component | Latency | Impact |
|---|---|---|
| Task execution | 1-1000ms | Your code time |
| Choice evaluation | <1ms | Negligible |
| State persistence | 5-50ms | Database round-trip |
| HumanTask wait | Minutes to days | Human time |
| Temporal overhead | <10ms | Scheduling |
| Total (simple workflow) | P95 <10s | Dominated by activities |
Scalability
- Horizontal: Activities scale independently (add more workers)
- Vertical: Single workflow throughput limited by Temporal server
- Storage: ≈ 50GB of PostgreSQL storage per year of workflows (scales with workflow volume and history size)
Monitoring & Observability
Metrics
# Prometheus metrics automatically exported
cascade_workflow_duration_seconds # How long workflows take
cascade_workflow_failures_total # Failed workflows
cascade_task_duration_seconds # Activity execution time
cascade_task_retries_total # Retry count
cascade_state_transition_duration # State latency
Distributed Tracing
GET /workflows/{id}/trace
Returns complete trace:
workflow_start: 0ms
├─ enter_state[CreditCheck]: 2ms
│  └─ call_activity[check_credit]: 150ms
│     ├─ external_call: 120ms
│     └─ data_parse: 30ms
├─ enter_state[Choice]: 0.5ms
├─ enter_state[ManagerReview]: 1ms
│  └─ human_waiting: 3h 45m
├─ resume_from_human: 5ms
├─ enter_state[SendPayment]: 1.5ms
│  └─ call_activity[charge_card]: 200ms
└─ workflow_end: 0.5ms
Total: 3h 45m 20s
Troubleshooting
Workflow Stuck in State
Symptom: Workflow hasn’t progressed in hours
Diagnosis:
cascade process inspect {workflow_id}
# Check last state and timestamp
Solutions:
- HumanTask timeout not configured? Add:
  timeout: 24h
- Activity error? Check logs:
  cascade logs {workflow_id}
- Send signal to resume:
  cascade process signal {workflow_id} --action continue
Activity Timeout
Symptom: Activity timeout exceeded
Solution:
- name: SlowAPI
  type: Task
  timeout: 5m  # Increase if your API needs time
  retries:
    max_attempts: 2
State Explosion
Symptom: PostgreSQL fills up quickly
Cause: Too many workflow instances
Solutions:
- Archive completed workflows:
  cascade admin archive --older-than 30d
- Shorten retention: adjust the retention settings in config.yaml
Best Practices
✅ DO:
- Use Parallel for independent operations (faster)
- Set timeouts on all tasks
- Use Choice for simple routing
- Monitor via OTEL metrics
- Archive old workflows
❌ DON’T:
- Create infinite loops (add safeguards)
- Store large data in context (use ID references)
- Poll for events (use Receive state)
- Retry everything (set reasonable limits)
Next Steps
Ready to build? → First Workflow Tutorial
Need to understand policies? → Policy Evaluation Capability
Want to visualize workflows? → Debugging Guide
Production deployment? → Operations Guide
Updated: October 29, 2025
Version: 1.0
Temporal Version: v1.20+
Production-Ready: Yes