Troubleshooting
Common issues and how to solve them.
Installation Issues
Error: “Docker is not running”
Message: Cannot connect to the Docker daemon
Cause: Docker service is not started
Solution:
# macOS
open -a Docker
# Linux
sudo systemctl start docker
# Windows
# Use Docker Desktop GUI or:
# Start-Service com.docker.service (PowerShell)
Error: “Port already in use”
Message: Error response from daemon: bind: address already in use
Cause: Another process on the host is already bound to a port that PostgreSQL, Redis, or another service needs
Solution:
Option 1 - Change the host port in docker-compose.yml:
services:
  postgres:
    ports:
      - "5433:5432" # Host port changed from 5432 to 5433
Option 2 - Stop the conflicting process:
# Find what's using port 5432
lsof -i :5432
# Kill the process
kill -9 <PID>
Error: “Services won’t start”
Message: service did not converge or containers keep restarting
Solution: Check the logs:
# View all logs
docker-compose logs -f
# View specific service
docker-compose logs -f postgres
docker-compose logs -f temporal
# View the last 50 log lines for a service
docker-compose logs --tail 50 postgres
Error: “Out of memory”
Message: Cannot allocate memory or Docker crashes
Cause: Insufficient system resources
Solution:
Docker Desktop (macOS/Windows):
- Open Docker Desktop preferences
- Go to Resources
- Increase Memory to 6-8GB
- Click Apply & Restart
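To confirm that the new limit took effect, you can ask the engine how much memory it sees. The sketch below assumes the `docker` CLI is on your PATH; `docker info --format '{{.MemTotal}}'` reports the value in bytes, and the helper names `bytes_to_gib` and `docker_memory_gib` are illustrative only.

```shell
# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / (1024 * 1024 * 1024) }'
}

# Print the memory visible to the Docker engine in GiB.
# Returns non-zero if Docker is not reachable.
docker_memory_gib() {
  mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null) || return 1
  [ -n "$mem_bytes" ] || return 1
  bytes_to_gib "$mem_bytes"
}

# Usage (after Docker Desktop has restarted):
#   docker_memory_gib
```

The reported figure is usually slightly below the value set in the Resources panel, so compare it loosely rather than exactly.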
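Linux:
Docker Engine on Linux uses host memory directly. /etc/docker/daemon.json has no "memory" or "memory-swap" option, and unknown keys in that file prevent the daemon from starting, so do not add them there. Check how much memory is free on the host:
free -h
Free up memory on the host (or cap individual services, for example with mem_limit in docker-compose.yml), then restart Docker:
sudo systemctl restart docker
Error: “Health check fails”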
Message: make health shows ✗ for some services
Solution:
Wait for services to start (they take time):
# Wait 30 seconds
sleep 30
# Try again
make health
Check individual service logs:
docker-compose logs redis
docker-compose logs temporal
Deployment Issues
Error: “Application already exists”
Message: Application 'my-app' already exists
Cause: You’re trying to create an app with a name that’s already in use
Solution:
# Option 1: Use a different name
cascade app apply --file app.yaml --name my-app-v2
# Option 2: Delete the existing app first
cascade app delete my-app
cascade app apply --file app.yaml
Error: “CDL validation failed”
Message: Invalid state: unknown state type 'MyState'
Cause: YAML syntax error or invalid state type
Solution:
Check the error message for the specific problem:
# Example issues:
# - State type typo (Taask vs Task)
# - Missing 'next' field
# - Invalid resource URN format
# - Missing required fields
Fix common issues:
# ❌ WRONG
- name: MyState
  type: Taask # Typo!
  next: NextState
# ✅ CORRECT
- name: MyState
  type: Task
  resource: urn:cascade:activity:my_activity
  next: NextState
Error: “Schema not found”
Message: urn:cascade:schema:leave_request_form not found
Cause: The schema URN doesn’t exist in your application
Solution:
Define the schema in your application:
spec:
  schemas:
    - urn: urn:cascade:schema:leave_request_form
      type: object
      properties:
        employee_name:
          type: string
        leave_dates:
          type: array
Or reference an existing schema from the platform.
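Before redeploying, a quick text search can confirm whether the URN from the error message is declared in your application file. This is a minimal sketch, assuming schemas are declared on a line of the form `urn: <URN>` (optionally preceded by a list dash) as in the example above; the function name `schema_defined` is hypothetical, and the check is a plain text match rather than a real YAML parse.

```shell
# schema_defined FILE URN
# Succeeds if FILE contains a line that declares URN as "urn: <URN>".
schema_defined() {
  awk -v urn="$2" '
    {
      line = $0
      sub(/^[ \t]*(-[ \t]+)?/, "", line)   # strip indentation and list dash
      sub(/[ \t]+#.*$/, "", line)          # strip trailing comment
      sub(/[ \t]+$/, "", line)             # strip trailing whitespace
      if (line == ("urn: " urn)) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Usage:
#   schema_defined app.yaml urn:cascade:schema:leave_request_form \
#     || echo "schema is not declared in app.yaml"
```

A line that only references the URN (for example under another key) does not count as a declaration, which is the case the error message describes.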
Workflow Issues
Workflow stuck in “RUNNING” state
Message: Workflow hasn’t completed after several hours
Cause: Workflow is waiting for something (HumanTask, external event)
Solution:
Check what state it’s in:
cascade process inspect <instance-id> --app=<app-name>
# Look for states with status "waiting" or "pending"
Options:
- If waiting for HumanTask: Complete the human task in the UI
- If waiting for external event: Send the event
- If stuck for no reason: Check logs for errors
cascade logs --app=<app-name> --instance=<instance-id>
Error: “Activity execution failed”
Message: Activity resource:urn:cascade:activity:my_activity failed after 3 retries
Cause: Your Go activity function returned an error
Solution:
- Check the error details in logs:
cascade logs --app=<app-name> --instance=<instance-id>
- Common issues:
  - Database connection failed → Check database is running
  - API call failed → Check network connectivity
  - Invalid input parameters → Check parameter mapping
- Fix and retry:
# The workflow will retry automatically
# Or manually retry the entire workflow
cascade process cancel <instance-id> --app=<app-name>
# Then start a new instance
Error: “Policy evaluation failed”
Message: Policy urn:cascade:policy:my_policy evaluation failed
Cause: OPA policy has a syntax error or runtime issue
Solution:
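- Check the policy file for syntax errors and operator misuse:
# ❌ AVOID - "=" is unification, not assignment; it parses but is easy to misuse
package my_policy
result = "value"
# ✅ PREFERRED - ":=" assigns the value explicitly
package my_policy
result := "value"
- Test the policy locally: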
# Using OPA CLI (if installed): start the interactive REPL with the policy loaded
# (the -s flag would start OPA as a server instead of a REPL)
opa run policy.rego
# Then test interactively:
# data.my_policy.result with input as {...}
- Check logs for exact error:
cascade logs --instance=<instance-id> | grep -i policy
Error: “State not found”
Message: State 'NextState' not found in workflow
Cause: Typo in state name or missing state definition
Solution:
Check your YAML for typos:
# ❌ WRONG - typo in next state
- name: MyState
  type: Task
  next: NextSate # Typo!
# ✅ CORRECT
- name: MyState
  type: Task
  next: NextState
Ensure all states referenced in next are defined.
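For larger workflows, checking every next reference by eye is error-prone. The sketch below lists state names that are referenced in a `next:` field but never declared with `- name:`. It assumes the flat, unquoted layout shown in the example above and treats the file as text, so it is a rough lint rather than a validator; the function name `undefined_next_states` is hypothetical.

```shell
# undefined_next_states FILE
# Prints each state name used in "next:" that has no "- name:" entry.
# Prints nothing when every reference resolves.
undefined_next_states() {
  awk '
    function field_value(line) {
      sub(/^[^:]*:[ \t]*/, "", line)   # drop everything up to the first colon
      sub(/[ \t]+$/, "", line)         # drop trailing whitespace
      return line
    }
    { sub(/[ \t]*#.*$/, "") }          # drop comments
    /^[ \t]*-[ \t]+name:/ { defined[field_value($0)] = 1 }
    /^[ \t]*next:/ {
      target = field_value($0)
      if (target != "") referenced[target] = 1
    }
    END {
      for (target in referenced)
        if (!(target in defined)) print target
    }
  ' "$1" | sort
}

# Usage:
#   undefined_next_states app.yaml
```

If the file contains several workflows that reuse state names, a reference in one workflow can be masked by a definition in another, so treat an empty result as a hint rather than proof.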
Performance Issues
UI Rendering is slow (>200ms)
Cause: Schema is complex or cache is not working
Solution:
- Check if Redis is running and healthy:
make health # Look for Redis ✓
- Simplify your schema (fewer fields, fewer nested structures)
- Check cache hit rate:
cascade metrics --instance=<instance-id> | grep -i cache
Workflow execution is slow
Cause: Activities are taking too long, or Temporal is misconfigured
Solution:
- Check activity execution times:
cascade trace --instance=<instance-id>
# Look for activities with long duration
- Common issues:
  - Database queries too slow → Add indexes
  - API calls slow → Increase timeout
  - External system issues → Check connectivity
- Check Temporal server health:
docker-compose logs temporal
Policy evaluation is slow (>10ms)
Cause: OPA policy is complex or cache is not configured
Solution:
- Use the right policy engine:
  - Choice for simple logic (<0.1ms)
  - OPA for medium (expected <5ms)
  - DMN for complex (expected 10-50ms)
- Check if caching is enabled:
- name: MyPolicy
  type: EvaluatePolicy
  policy: urn:cascade:policy:my_policy
  cacheResult: true # Enable caching
- Simplify your OPA policy:
  - Fewer rules
  - Avoid heavy computations
  - Use built-in functions efficiently
Debug Commands
View Application Details
cascade app inspect my-app
Shows: workflows, policies, schemas, deployment status
View Workflow Instance
cascade process inspect <instance-id> --app=my-app
Shows: current state, status, elapsed time, outputs
View Execution Trace
cascade trace <instance-id> --format=json
Shows: all state transitions, timestamps, durations
View Logs
# Last 50 lines
cascade logs --app=my-app --instance=<instance-id> --tail 50
# Search for errors
cascade logs --app=my-app --instance=<instance-id> | grep ERROR
# Follow logs in real-time
cascade logs --app=my-app --instance=<instance-id> --follow
View Metrics
cascade metrics --instance=<instance-id>
Shows: execution time, cache hits, policy evaluation times
List All Instances
cascade process list --app=my-app
# Filter by status
cascade process list --app=my-app --status=running
cascade process list --app=my-app --status=completed
cascade process list --app=my-app --status=failed
Connection Issues
Cannot connect to PostgreSQL
Error: connection refused at localhost:5432
Cause: Database is not running or not accessible
Solution:
# Check if PostgreSQL container is running
docker-compose ps | grep postgres
# If not running, start it
docker-compose up -d postgres
# Wait a moment and test connection
docker-compose exec postgres psql -U postgres -d cascade
Cannot connect to Redis
Error: connection refused at localhost:6379
Cause: Redis is not running
Solution:
# Check status
docker-compose logs redis
# Restart
docker-compose restart redis
# Verify
redis-cli ping # Should return PONG
Cannot connect to Temporal
Error: connection refused at localhost:7233
Cause: Temporal server is not running
Solution:
# Check status
docker-compose logs temporal
# Temporal takes time to start, wait and retry
sleep 15
cascade process list # Try a command that uses Temporal
Getting Help
If none of these solutions work:
- Check logs first: docker-compose logs -f
- Check GitHub issues:
  - Search existing issues: https://github.com/cascade-platform/cascade/issues
  - Check if your issue is already known
- Ask in community:
  - Slack: Community Slack
  - Discord: Community Discord
- Create a GitHub issue:
  - Include: error message, steps to reproduce, logs
  - Example output from make health
  - Your OS and Docker version
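Gathering these details by hand is easy to get wrong, so it can help to script it. The following is a minimal sketch that writes the OS, Docker versions, `make health` output, and recent logs to one file you can attach to the issue. The function names `run_section` and `collect_diagnostics` and the output file name are assumptions for illustration; the commands themselves are the ones used elsewhere in this guide.

```shell
# run_section TITLE COMMAND [ARGS...]
# Prints a titled section with the command output. A missing tool or a
# failing command is recorded in the output instead of aborting.
run_section() {
  title=$1
  shift
  echo "== $title =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(command exited with status $?)"
  else
    echo "($1 not found)"
  fi
  echo
}

# collect_diagnostics OUTPUT_FILE
# Run from the project root so that make and docker-compose find their files.
collect_diagnostics() {
  {
    run_section "OS" uname -a
    run_section "Docker version" docker --version
    run_section "Docker Compose version" docker-compose --version
    run_section "Service health" make health
    run_section "Recent logs" docker-compose logs --tail 50
  } > "$1"
}

# Usage:
#   collect_diagnostics cascade-diagnostics.txt
```

Review the file before posting it, since logs can contain credentials or internal hostnames.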
Tip: Most issues are resolved by:
- Checking make health (are all services running?)
- Checking logs (what’s the actual error?)
- Restarting Docker (docker-compose down && make setup)
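After a restart, services need some time before make health passes. Instead of the fixed sleep 30 and sleep 15 waits used earlier in this guide, you can poll until the check succeeds. This is a small sketch of that idea; the `retry` helper is hypothetical and not part of the cascade CLI.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, 1 otherwise.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed: $*" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage: poll the health check every 5 seconds for up to one minute
#   retry 12 5 make health
```

Polling returns as soon as the services are ready and fails with a clear exit status when they never come up, which a fixed sleep cannot tell you.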
Happy troubleshooting! 🔧