Server
The Durable Workflow server is a standalone, language-neutral workflow orchestration service. It exposes the same durable execution engine as the PHP package over HTTP, letting you write workflows in Python, PHP, or any language that speaks HTTP.
If you are deciding between the standalone server and package embedding, start with Deployment Modes. This page covers the service-mode distribution.
Use the standalone server when you need:
- Polyglot workflows — Python workers executing PHP-authored workflows, or vice versa
- Microservice orchestration — orchestrate services written in different languages
- Centralized workflow runtime — multiple applications sharing one workflow engine
- Non-Laravel environments — use Durable Workflow outside Laravel
If you already run v2 embedded in a Laravel app, use the embedded-to-server migration guide to prepare type keys, deploy the server beside embedded execution, connect workers, and route only new workflow starts to the server. Keep Deployment Modes nearby during that cutover so IDs, command outcomes, task semantics, and runtime ownership rules stay explicit.
Use Worker Compatibility and Routing when you roll worker build cohorts, drain old cohorts, or need to keep long-running runs pinned to compatible executors during rollback.
Use Server Role Topology when you
need the live role vocabulary, process classes, authority boundaries, failure
domains, or migration path that GET /api/cluster/info publishes.
Quick Start
Published Image + SQLite
The fastest source-free way to run the server is the published Docker image.
This quickstart uses SQLite, database queues, and file cache inside the
container. Mount /app/database so bootstrap and the API server share the same
SQLite file:
server_image=durableworkflow/server:0.2.261
export DW_AUTH_TOKEN=dev-token
docker volume create durable-workflow-server-quickstart
docker run --rm \
-v durable-workflow-server-quickstart:/app/database \
-e DW_AUTH_DRIVER=token \
-e DW_AUTH_TOKEN="$DW_AUTH_TOKEN" \
"$server_image" server-bootstrap
docker rm -f durable-workflow-server >/dev/null 2>&1 || true
docker run -d --name durable-workflow-server \
-p 8080:8080 \
-v durable-workflow-server-quickstart:/app/database \
-e DW_AUTH_DRIVER=token \
-e DW_AUTH_TOKEN="$DW_AUTH_TOKEN" \
"$server_image"
until curl -sf http://localhost:8080/api/ready >/dev/null; do sleep 1; done
curl http://localhost:8080/api/health
curl -H "Authorization: Bearer $DW_AUTH_TOKEN" \
http://localhost:8080/api/cluster/info \
| jq '.topology | {current_shape, current_roles, execution_mode}'
This starts one API server container and creates the default namespace. It is enough for local Python SDK workers and CLI checks. Use the published Compose path below when you want MySQL, Redis, separate worker and scheduler containers, or a closer production rehearsal.
Published Image + Compose
Use the published Compose artifact when you want the source-free multi-container stack backed by MySQL and Redis:
curl -fsSLO https://raw.githubusercontent.com/durable-workflow/server/main/docker-compose.published.yml
server_image=durableworkflow/server:0.2.261
export DW_AUTH_TOKEN=dev-token
env DW_SERVER_IMAGE="$server_image" docker compose \
-f docker-compose.published.yml up -d --wait
curl -H "Authorization: Bearer $DW_AUTH_TOKEN" \
http://localhost:8080/api/cluster/info \
| jq '.topology | {current_shape, current_roles, execution_mode}'
Ports
| Service | Port | Purpose |
|---|---|---|
| Server API | 8080 | Control-plane and worker-protocol endpoints |
| MySQL | 3306 | Database (exposed for development convenience) |
| Redis | 6379 | Cache and queue (exposed for development convenience) |
Configuration
The server uses environment variables for configuration. Key settings are
summarized below; the full operator-facing DW_* contract is documented in
the server config reference.
Database
DB_CONNECTION=mysql
DB_HOST=mysql
DB_PORT=3306
DB_DATABASE=workflow
DB_USERNAME=workflow
DB_PASSWORD=secret
Supported: MySQL 8.0+, PostgreSQL 13+, SQLite 3.35+.
Cache and Queue
CACHE_STORE=redis
QUEUE_CONNECTION=redis
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=null
REDIS_DB=0
Cache must support atomic locks. Queue drivers: Redis, Amazon SQS, Beanstalkd, database.
Atomic cache locks are required for server-side task queue admission caps and query-task backpressure. Use Redis for multi-node deployments that need workflow, activity, or query admission to hold across every server process.
Authentication
The server supports three auth modes:
Token-based (default):
DW_AUTH_DRIVER=token
DW_AUTH_TOKEN=your-secret-token-here
All requests must send Authorization: Bearer your-secret-token-here.
For least-privilege deployments, configure role-scoped tokens instead of one shared token:
DW_AUTH_DRIVER=token
DW_WORKER_TOKEN=worker-secret
DW_OPERATOR_TOKEN=operator-secret
DW_ADMIN_TOKEN=admin-secret
Worker tokens can register workers, poll tasks, heartbeat, and complete work. Operator tokens can start, list, signal, query, update, repair, cancel, terminate, archive, and observe workflows. Admin tokens can use administrative endpoints such as namespace and retention management.
HMAC signature:
DW_AUTH_DRIVER=signature
DW_SIGNATURE_KEY=your-signature-secret
Requests must include X-Signature, calculated as
hash_hmac('sha256', request_body, DW_SIGNATURE_KEY). The server
also accepts role-scoped signature keys:
DW_AUTH_DRIVER=signature
DW_WORKER_SIGNATURE_KEY=worker-signature-secret
DW_OPERATOR_SIGNATURE_KEY=operator-signature-secret
DW_ADMIN_SIGNATURE_KEY=admin-signature-secret
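For clients that are not written in PHP, the X-Signature calculation described above can be sketched in Python. This is a minimal sketch, assuming the server compares against the lowercase hex digest that PHP's hash_hmac returns by default and that the signature covers the exact request body bytes. The helper names (sign_body, signed_headers) and the Content-Type header are illustrative and not part of the server contract.

```python
import hashlib
import hmac


def sign_body(body: bytes, key: str) -> str:
    """Return the lowercase hex HMAC-SHA256 digest of the raw request body.

    Mirrors hash_hmac('sha256', request_body, DW_SIGNATURE_KEY) in PHP.
    """
    return hmac.new(key.encode("utf-8"), body, hashlib.sha256).hexdigest()


def signed_headers(body: bytes, key: str) -> dict:
    """Build the headers for a signed JSON request (hypothetical helper)."""
    return {
        "X-Signature": sign_body(body, key),
        # Assumption: JSON bodies, as in the curl examples on this page.
        "Content-Type": "application/json",
    }
```

Sign the same bytes you send. Re-serializing the JSON after signing can change whitespace or key order and invalidate the signature.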
No auth (development only):
DW_AUTH_DRIVER=none
⚠️ Do not use none in production. All endpoints become publicly accessible.
Workflow Package
The Docker image installs the durable-workflow/workflow package. Control which version is installed with these build arguments:
# Build-time arg (set in docker-compose.yml or pass to docker build)
WORKFLOW_PACKAGE_REF=v2 # branch, tag, or commit
WORKFLOW_PACKAGE_SOURCE= # custom Git remote (optional)
Retention
Configure how long completed workflows remain queryable:
DW_HISTORY_RETENTION_DAYS=30
After retention expires, workflows are pruned. Configure per-namespace retention via the API.
Namespaces
The server-bootstrap command runs migrations and seeds the default
namespace. Use DW_DEFAULT_NAMESPACE to change the namespace used when a
request omits the namespace header:
DW_DEFAULT_NAMESPACE=default
Create namespaces via the API:
curl -X POST http://localhost:8080/api/namespaces \
-H "Authorization: Bearer $TOKEN" \
-H "X-Durable-Workflow-Control-Plane-Version: 2" \
-H "Content-Type: application/json" \
-d '{
"name": "production",
"description": "Production workflows",
"retention_days": 90
}'
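The same namespace request can be built from Python with only the standard library. This is a sketch under the assumption of token auth; the function name namespace_request is hypothetical, and the payload fields are copied from the curl example above.

```python
import json
import urllib.request


def namespace_request(base_url: str, token: str, name: str,
                      description: str, retention_days: int) -> urllib.request.Request:
    """Build (but do not send) the POST /api/namespaces request."""
    payload = {
        "name": name,
        "description": description,
        "retention_days": retention_days,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/namespaces",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "X-Durable-Workflow-Control-Plane-Version": "2",
            "Content-Type": "application/json",
        },
    )


# To send it against a running server:
#   with urllib.request.urlopen(namespace_request(...)) as response:
#       print(response.status)
```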
Health Checks
API Health
curl http://localhost:8080/api/health
Returns 200 OK with:
{
"status": "serving",
"timestamp": "2026-04-15T12:00:00Z",
"checks": {
"database": "ok"
},
"topology": {
"schema": "durable-workflow.v2.role-topology",
"version": 4,
"current_shape": "standalone_server",
"current_process_class": "server_http_node",
"current_roles": ["api_ingress", "control_plane", "matching", "history_projection"],
"execution_mode": "remote_worker_protocol",
"matching_role": {
"queue_wake_enabled": true,
"shape": "in_worker",
"wake_owner": "worker_loop",
"task_dispatch_mode": "poll",
"partition_primitives": ["connection", "queue", "compatibility", "namespace"],
"backpressure_model": "lease_ownership",
"discovery_limits": {
"poll_batch_cap": 100,
"availability_ceiling_seconds": 1,
"wake_signal_ttl_seconds": 60,
"workflow_task_lease_seconds": 300,
"activity_task_lease_seconds": 300
}
}
}
}
Public Topology Summary
Unauthenticated GET /api/health and GET /api/ready both publish the
responding node's topology summary. That public block is intentionally
smaller than /api/cluster/info, but it still exposes the fields needed to
identify split-role nodes before control-plane auth or namespace resolution:
- topology.schema
- topology.version
- topology.current_shape
- topology.current_process_class
- topology.current_roles
- topology.execution_mode
- topology.matching_role.queue_wake_enabled
- topology.matching_role.shape
- topology.matching_role.wake_owner
- topology.matching_role.task_dispatch_mode
- topology.matching_role.partition_primitives
- topology.matching_role.backpressure_model
- topology.matching_role.discovery_limits.poll_batch_cap
- topology.matching_role.discovery_limits.availability_ceiling_seconds
- topology.matching_role.discovery_limits.wake_signal_ttl_seconds
- topology.matching_role.discovery_limits.workflow_task_lease_seconds
- topology.matching_role.discovery_limits.activity_task_lease_seconds
topology.matching_role.discovery_limits is the frozen numeric matching-role
contract that compiles into the workflow package: poll_batch_cap is the
maximum batch of ready-task rows returned per poll,
availability_ceiling_seconds is the cross-backend tolerance applied to
available_at so freshly-available tasks survive sub-second timestamp drift,
wake_signal_ttl_seconds is the default CacheLongPollWakeStore signal TTL,
and workflow_task_lease_seconds / activity_task_lease_seconds are the
default workflow and activity task lease durations. Tightening any of these
values is a protocol-level change because workers and downstream tooling read
them as the authoritative matching-role contract.
The same summary appears on /api/ready even when the deployment is not ready,
so probes can still distinguish server_http_node, scheduler_node,
matching_node, and execution_node responses while bootstrap blockers are
active.
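A probe can reduce the public topology block to the few values it needs to tell node classes apart. The sketch below assumes the JSON body of /api/health or /api/ready has already been parsed into a dict; probe_summary is a hypothetical helper, and it tolerates missing keys so it also works on partial responses.

```python
def probe_summary(payload: dict) -> dict:
    """Extract the node identity fields from a health or ready payload."""
    topology = payload.get("topology") or {}
    matching = topology.get("matching_role") or {}
    limits = matching.get("discovery_limits") or {}
    return {
        "shape": topology.get("current_shape"),
        "process_class": topology.get("current_process_class"),
        "roles": list(topology.get("current_roles") or []),
        "execution_mode": topology.get("execution_mode"),
        "workflow_task_lease_seconds": limits.get("workflow_task_lease_seconds"),
        "activity_task_lease_seconds": limits.get("activity_task_lease_seconds"),
    }
```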
Readiness
curl http://localhost:8080/api/ready
/api/ready is the deployment gate. It returns 200 OK only when bootstrap
prerequisites and rollout-safety health are in a ready or warning state.
Treat the machine-readable fields as follows:
- checks.migrations.repository_exists and checks.migrations.pending_migrations tell you whether the migration repository exists and which migration records are still pending.
- checks.migrations.adoptable_migrations lists create-table migrations that only need adoption into migration history. This is a warning, not a fail-closed outage, so the server can stay ready while operators schedule the adoption.
- checks.migrations.blocking_migrations lists rollout-safety migration records that must land before the server should admit traffic. When this array is not empty, readiness fails closed with checks.migrations.status = "pending".
- checks.migrations.missing_tables reports durable tables that are still absent, and checks.migrations.operator_surface tells you whether the v2 operator surface is available enough to explain rollout safety once the server boots.
- checks.migrations.readiness_contract.version pins the boot and migration adoption contract revision that scripts should parse.
- checks.workflow_v2 mirrors the all-namespaces rollout-safety verdict. When rollout-safety cannot be evaluated yet it returns status: "blocked" plus blocked_by, message, and remediation so operators can fix the upstream readiness gate instead of chasing queue symptoms.
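A deployment script might fold the readiness fields described above into one verdict. This is a sketch, assuming the caller passes the HTTP status and the parsed JSON body, and that the migration fields are arrays as described; readiness_verdict is a hypothetical helper, not part of any SDK.

```python
def readiness_verdict(http_status: int, payload: dict) -> dict:
    """Summarize /api/ready into ready / warnings / blockers."""
    checks = payload.get("checks") or {}
    migrations = checks.get("migrations") or {}
    workflow_v2 = checks.get("workflow_v2") or {}

    blocking = list(migrations.get("blocking_migrations") or [])
    blocked = workflow_v2.get("status") == "blocked"

    return {
        # 200 is the deployment gate; blockers are re-checked defensively.
        "ready": http_status == 200 and not blocking and not blocked,
        # Adoptable migrations are a warning, not an outage.
        "warnings": list(migrations.get("adoptable_migrations") or []),
        "blocking_migrations": blocking,
        "missing_tables": list(migrations.get("missing_tables") or []),
        "blocked_by": list(workflow_v2.get("blocked_by") or []),
        "remediation": workflow_v2.get("remediation"),
    }
```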
Workflow Bootstrap Gate
checks.workflow_v2.status: "blocked" is also a route-level gate, not just a
readiness signal. While workflow v2 bootstrap is blocked, the server fails
closed on workflow start/mutation, schedule mutation, bridge-adapter, and
worker-protocol routes with HTTP 503 and a machine-readable
reason: "workflow_v2_blocked" payload. The gate runs after role and
protocol-version validation but before namespace resolution, so blocked
requests never observe namespace existence.
The bootstrap-gate response always carries:
- reason: "workflow_v2_blocked" so callers branch on a machine-readable name instead of a prose message.
- blocked_by: the ordered list of upstream readiness blockers (for example migrations).
- remediation: the operator-facing instruction for clearing the listed blockers, mirrored from /api/ready checks.workflow_v2.remediation.
Bootstrap-gated route families:
- Workflow start and mutation: /api/workflows start, command, and run-targeted command routes.
- Schedule mutation: POST /api/schedules, PUT /api/schedules/{scheduleId}, DELETE /api/schedules/{scheduleId}, POST /api/schedules/{scheduleId}/pause, POST /api/schedules/{scheduleId}/resume, POST /api/schedules/{scheduleId}/trigger, and POST /api/schedules/{scheduleId}/backfill.
- Bridge adapters: POST /api/bridge-adapters/webhook/{adapter}.
- Worker protocol: every /api/worker and /api/worker/* route, including registration, heartbeat, workflow-task, query-task, and activity-task verbs. Worker-protocol routes return the bootstrap-gate payload in the worker-protocol envelope and keep the X-Durable-Workflow-Protocol-Version header so worker SDKs can branch on the same reason: "workflow_v2_blocked" field they parse from the control plane.
Schedule reads are intentionally exempted so operators can inspect
schedule state during recovery: GET /api/schedules,
GET /api/schedules/{scheduleId}, and
GET /api/schedules/{scheduleId}/history continue to serve while the
bootstrap gate is blocking other routes.
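Client code can branch on the bootstrap gate by checking the status code and the reason field together. The sketch below assumes the parsed response body exposes reason, blocked_by, and remediation at the top level; the exact worker-protocol envelope layout is not shown on this page, so treat that as an assumption. WorkflowV2Blocked and raise_if_bootstrap_blocked are hypothetical names.

```python
class WorkflowV2Blocked(Exception):
    """Raised when the server fails closed behind the workflow v2 bootstrap gate."""

    def __init__(self, blocked_by, remediation):
        super().__init__("workflow v2 bootstrap is blocked by: " + ", ".join(blocked_by))
        self.blocked_by = blocked_by
        self.remediation = remediation


def raise_if_bootstrap_blocked(status: int, body: dict) -> None:
    """Raise WorkflowV2Blocked for a 503 carrying reason workflow_v2_blocked."""
    if status == 503 and body.get("reason") == "workflow_v2_blocked":
        raise WorkflowV2Blocked(
            blocked_by=list(body.get("blocked_by") or []),
            remediation=body.get("remediation"),
        )
```

A worker loop would typically catch this exception, log the remediation text, and back off before polling again instead of treating the response as a task failure.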
Server Capabilities
curl http://localhost:8080/api/cluster/info \
-H "Authorization: Bearer $TOKEN"
Returns the server build version, supported SDK versions, engine capabilities, the client compatibility policy, and the independently-versioned control-plane and worker-protocol manifests:
{
"server_id": "server-1",
"version": "2.0.0",
"default_namespace": "default",
"supported_sdk_versions": {
"php": ">=1.0",
"python": ">=0.2,<1.0",
"cli": ">=0.1,<1.0"
},
"client_compatibility": {
"schema": "durable-workflow.v2.client-compatibility",
"version": 1,
"authority": "protocol_manifests",
"top_level_version_role": "informational",
"fail_closed": true
},
"capabilities": {
"workflow_tasks": true,
"activity_tasks": true,
"signals": true,
"queries": true,
"updates": true,
"schedules": true,
"child_workflow_retry_policy": true,
"child_workflow_timeouts": true,
"payload_codecs": ["avro"],
"response_compression": ["gzip", "deflate"]
},
"control_plane": {
"version": "2",
"header": "X-Durable-Workflow-Control-Plane-Version",
"request_contract": { "schema": "durable-workflow.v2.control-plane-request.contract", "version": 1, "...": "..." },
"response_contract": { "schema": "durable-workflow.v2.control-plane-response.contract", "version": 1, "...": "..." }
},
"worker_protocol": {
"version": "1.0",
"server_capabilities": {
"long_poll_timeout": 30,
"supported_workflow_task_commands": [
"complete_workflow",
"fail_workflow",
"continue_as_new",
"schedule_activity",
"start_timer",
"start_child_workflow"
],
"workflow_task_poll_request_idempotency": true,
"poll_status": true,
"history_page_size_default": 500,
"history_page_size_max": 1000,
"activity_retry_policy": true,
"activity_timeouts": true,
"child_workflow_retry_policy": true,
"child_workflow_timeouts": true,
"parent_close_policy": true,
"non_retryable_failures": true,
"response_compression": ["gzip", "deflate"],
"history_compression": {
"supported_encodings": ["gzip"],
"compression_threshold": 8192
}
}
}
}
Treat client_compatibility.authority: "protocol_manifests" as the rule for
client checks. The top-level version is build identity; CLI and SDK clients
should fail closed when control_plane.version,
control_plane.request_contract, or worker_protocol.version is missing or
unsupported.
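The fail-closed rule above can be sketched as a small check that runs once after fetching /api/cluster/info. The supported-version sets below are assumptions chosen to match the example payload; a real client would pin whatever versions it was built against. IncompatibleServer and assert_compatible are hypothetical names.

```python
# Assumption: versions this hypothetical client was built against.
SUPPORTED_CONTROL_PLANE_VERSIONS = {"2"}
SUPPORTED_WORKER_PROTOCOL_VERSIONS = {"1.0"}


class IncompatibleServer(Exception):
    """Raised when a protocol manifest is missing or unsupported."""


def assert_compatible(info: dict) -> None:
    """Fail closed on the protocol manifests; ignore the top-level build version."""
    control = info.get("control_plane") or {}
    worker = info.get("worker_protocol") or {}

    if control.get("version") not in SUPPORTED_CONTROL_PLANE_VERSIONS:
        raise IncompatibleServer("control_plane.version missing or unsupported")

    contract = control.get("request_contract")
    if not isinstance(contract, dict) or "schema" not in contract or "version" not in contract:
        raise IncompatibleServer("control_plane.request_contract missing or malformed")

    if worker.get("version") not in SUPPORTED_WORKER_PROTOCOL_VERSIONS:
        raise IncompatibleServer("worker_protocol.version missing or unsupported")
```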
Role Topology and Deployment Shape
The field-by-field reference for this manifest lives on
Server Role Topology. Keep this
section for the inline cluster/info example and use the dedicated page when
you need the supported shapes, authority boundaries, failure domains, scaling
boundaries, or migration-path contract in one place.
GET /api/cluster/info also publishes a topology manifest. It is the
machine-readable role map for the node that answered the request, so operators
and automation can read one contract instead of inferring node duties from
container names or rollout runbooks.
{
"topology": {
"schema": "durable-workflow.v2.role-topology",
"version": 2,
"supported_shapes": [
"embedded",
"standalone_server",
"split_control_execution"
],
"role_vocabulary": [
"api_ingress",
"control_plane",
"matching",
"history_projection",
"scheduler",
"execution_plane"
],
"current_shape": "standalone_server",
"current_process_class": "server_http_node",
"current_roles": [
"api_ingress",
"control_plane",
"matching",
"history_projection"
],
"execution_mode": "remote_worker_protocol",
"matching_role": {
"queue_wake_enabled": true,
"shape": "in_worker",
"wake_owner": "worker_loop",
"task_dispatch_mode": "poll",
"partition_primitives": [
"connection",
"queue",
"compatibility",
"namespace"
],
"backpressure_model": "lease_ownership",
"discovery_limits": {
"poll_batch_cap": 100,
"availability_ceiling_seconds": 1,
"wake_signal_ttl_seconds": 60,
"workflow_task_lease_seconds": 300,
"activity_task_lease_seconds": 300
}
},
"shape_assignments": {
"embedded": {
"process_classes": [
{
"name": "application_process",
"roles": [
"control_plane",
"matching",
"history_projection",
"scheduler",
"execution_plane"
]
}
]
},
"standalone_server": {
"process_classes": [
{
"name": "server_http_node",
"roles": [
"api_ingress",
"control_plane",
"matching",
"history_projection"
]
},
{
"name": "scheduler_node",
"roles": ["scheduler"]
},
{
"name": "worker_node",
"roles": ["execution_plane"]
}
]
},
"split_control_execution": {
"process_classes": [
{
"name": "ingress_node",
"roles": ["api_ingress"]
},
{
"name": "control_plane_node",
"roles": ["control_plane", "history_projection"]
},
{
"name": "scheduler_node",
"roles": ["scheduler"]
},
{
"name": "matching_node",
"roles": ["matching"]
},
{
"name": "execution_node",
"roles": ["execution_plane"]
}
]
}
},
"authority_boundaries": {
"control_plane": {
"writes": [
"workflow_instances",
"workflow_runs.status",
"workflow_tasks.lifecycle"
]
},
"execution_plane": {
"writes": [
"workflow_tasks.outcomes",
"activity_attempts",
"worker_compatibility_heartbeats"
]
},
"matching": {
"writes": [
"workflow_tasks.leases",
"activity_tasks.leases"
]
},
"history_projection": {
"writes": [
"history_events",
"workflow_run_summaries",
"workflow_history_exports"
]
},
"scheduler": {
"writes": [
"workflow_schedules.fire_state",
"workflow_starts.scheduled"
]
},
"api_ingress": {
"writes": ["worker_registrations"]
}
},
"failure_domains": {
"control_plane_down": {
"effect": "workers_continue_claimed_tasks_only_until_lease_expiry",
"operator_signal": "operator_commands_fail_fast"
},
"execution_plane_down": {
"effect": "ready_tasks_accumulate_without_loss",
"operator_signal": "operators_see_ready_depth_growth"
},
"matching_down": {
"effect": "claim_falls_back_to_direct_ready_task_discovery",
"operator_signal": "ready_depth_rises_while_claim_rate_falls"
},
"history_projection_down": {
"effect": "projection_reads_may_stale_while_durable_writes_continue",
"operator_signal": "projection_lag_seconds_may_increase"
},
"scheduler_down": {
"effect": "scheduled_workflows_stop_firing_and_record_missed_runs",
"operator_signal": "operators_see_missed_schedule_state"
},
"api_ingress_down": {
"effect": "external_http_traffic_stops_at_the_edge",
"operator_signal": "embedded_in_process_calls_may_continue"
}
},
"scaling_boundaries": {
"api_ingress": "incoming_http_request_rate",
"control_plane": "operator_commands_and_run_lifecycle_transitions",
"matching": "ready_task_rate_and_poller_count",
"history_projection": "durable_event_rate",
"scheduler": "active_schedule_count",
"execution_plane": "workflow_and_activity_task_rate"
},
"migration_path": [
{
"step": "audit_role_boundaries",
"result": "tooling flags cross-role writes before runtime shape changes",
"reversible": true
},
{
"step": "expose_role_bindings",
"result": "container seams allow out-of-process adapters without patching the package",
"reversible": true
},
{
"step": "introduce_dedicated_matching_shape",
"result": "matching can run as its own process class without changing the claim contract",
"reversible": true
},
{
"step": "split_history_projection",
"result": "history and projections can move out of process without introducing a second writer",
"reversible": true
},
{
"step": "split_scheduler",
"result": "schedule firing can move behind leader election while single-replica deployments stay legal",
"reversible": true
},
{
"step": "optional_execution_partitioning",
"result": "workers can partition by namespace, connection, queue, and compatibility",
"reversible": true
}
],
"kernel_invariants": [
{
"id": "single_persistence_engine",
"summary": "one workflow database backs every topology shape; role split does not introduce a second persistence engine",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
},
{
"id": "single_worker_protocol",
"summary": "one HTTP worker protocol carries claim, complete, fail, and heartbeat traffic across every topology; role split does not fork the worker contract",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
},
{
"id": "single_history_writer",
"summary": "history_events has exactly one durable writer per logical event regardless of where the history/projection role runs",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
},
{
"id": "single_control_authority_per_run",
"summary": "every mutation of a given workflow run routes through one control-plane authority; per-run row locks serialise transitions across replicas",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
},
{
"id": "embedded_topology_remains_supported",
"summary": "the embedded shape where one process fills every role MUST stay legal; existing embedded hosts are never forced to migrate",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
},
{
"id": "role_split_is_topology_only",
"summary": "splitting roles is a topology change, not a product fork; collapsing the roles back onto a single process is always a legal topology",
"applies_to": ["embedded", "standalone_server", "split_control_execution"]
}
]
},
"coordination_health": {
"schema": "durable-workflow.v2.coordination-health.contract",
"version": 2,
"namespace_scope": "all_namespaces",
"status": "ok",
"http_status": 200,
"warning_checks": [],
"error_checks": [],
"categories": {
"correctness": "ok"
},
"checks": [
{
"name": "worker_compatibility",
"status": "ok",
"category": "correctness",
"message": null
},
{
"name": "activity_path",
"status": "ok",
"category": "correctness",
"message": null
}
],
"routing_drains": {
"queues_with_drains": 0,
"draining_build_id_count": 0,
"active_worker_count": 0,
"draining_worker_count": 0,
"stale_worker_count": 0,
"queues": []
}
}
}
Treat topology.version as the role-manifest schema version, not as a synonym
for the top-level server build version. Automation should check that field
before assuming fields added by a newer topology manifest revision. The
current public contract includes supported_shapes, role_vocabulary,
current_shape, current_process_class, current_roles, execution_mode,
matching_role, role_catalog, shape_assignments,
authority_boundaries, authority_surfaces, failure_domains,
supported_topologies, scaling_boundaries, migration_path, and
kernel_invariants.
Read the fields as follows:
- supported_shapes names the legal product topologies.
- role_vocabulary is the fixed list of v2 role names. Treat it as the canonical vocabulary for automation and diagnostics.
- current_shape, current_process_class, and current_roles describe the node you queried right now. Use current_process_class as the node's declared identity, then compare the current role bundle against shape_assignments for the current shape when you need to validate that declaration.
- execution_mode distinguishes embedded local queue execution (local_queue_worker) from standalone server worker-protocol execution (remote_worker_protocol).
- matching_role.queue_wake_enabled, matching_role.shape, matching_role.wake_owner, matching_role.task_dispatch_mode, matching_role.partition_primitives, and matching_role.backpressure_model tell you whether the node still runs the in-worker wake path or expects a dedicated repair or matching loop to own that sweep, which routing axes remain stable, and which durable admission boundary the matching layer currently enforces.
- matching_role.discovery_limits freezes the numeric matching-role contract values the workflow package compiles in: poll_batch_cap (the maximum batch of ready-task rows returned per poll), availability_ceiling_seconds (the cross-backend tolerance applied to available_at so freshly-available tasks survive sub-second timestamp drift), wake_signal_ttl_seconds (the default long-poll wake-signal TTL), workflow_task_lease_seconds (the default workflow task lease), and activity_task_lease_seconds (the default activity task lease). Operators read these to verify the deployment matches the documented matching-role contract without grepping the package source; tightening any value is a protocol-level change.
- role_catalog and authority_surfaces tell you which interfaces and durable mutation paths each role owns on the current manifest revision.
- shape_assignments maps each supported shape to the process classes and role bundles that shape is allowed to run.
- supported_topologies summarizes the deployment families the product supports and the node classes each family expects.
- authority_boundaries names which durable write surfaces each role is expected to mutate, so operators can catch cross-role drift before they split a deployment.
- failure_domains describes the first operator-visible degradation signal when a role goes down, instead of leaving that expectation implicit in a runbook.
- scaling_boundaries names the main load dimension for each role when the topology is split.
- Authenticated hosted routes fail closed when the responding node does not host the HTTP control surface. In that case the server returns 503 with reason: "topology_role_unavailable" plus current_shape, current_process_class, current_roles, required_roles, and missing_roles so callers can reroute to a node that actually exposes the requested surface.
- coordination_health is the fleet-wide rollout-safety summary published from the same discovery call. It uses all_namespaces scope, summarizes the current status and HTTP posture, lists the normalized warning/error check names that also feed readiness health, and adds blocked_by, message, plus remediation when rollout-safety evaluation is blocked by upstream readiness problems.
- coordination_health.checks[] always includes the frozen check activity_path next to worker_compatibility, task_transport, routing_health, durable_resume_paths, and the projection/scheduler checks. activity_path is the activity-side counterpart of task_transport: it surfaces activity executions whose schedule-to-start, start-to-close, schedule-to-close, or heartbeat deadline has passed without enforcement (timeout_overdue, oldest_timeout_overdue_at, max_timeout_overdue_age_ms) and the sustained retry backlog (retrying, oldest_retrying_started_at, max_retrying_age_ms). Renaming the check is a protocol-level change.
- coordination_health.routing_drains summarizes draining build-id cohorts across queues and namespaces. queues_with_drains greater than zero means the fleet is intentionally holding traffic away from at least one draining cohort.
- migration_path lists the ordered rollout steps from today's standalone distribution toward more isolated role boundaries without introducing a second engine. Each entry's reversible: true flag declares that collapsing back to a less-isolated shape stays a legal topology.
- kernel_invariants enumerates the durable-kernel guarantees the role split must preserve regardless of which supported shape is running: single_persistence_engine, single_worker_protocol, single_history_writer, single_control_authority_per_run, embedded_topology_remains_supported, and role_split_is_topology_only. Each entry's applies_to lists the supported shapes the invariant covers; rollout automation MAY use the field to assert that a candidate topology change preserves the kernel before applying the shape change.
This keeps the role split as a topology change, not a second engine or a
separate control-plane API. When a deployment evolves from a narrow
standalone_server fleet toward a more explicit split_control_execution
shape, operators still read the same discovery surface. The values under
current_shape, current_roles, execution_mode, matching_role,
shape_assignments, authority_boundaries, failure_domains,
scaling_boundaries, and migration_path are
versioned as one manifest so rollout tooling can reason about the same
topology surface the server ships.
The same constraint also surfaces machine-readably through
topology.kernel_invariants so rollout automation can verify that no
candidate topology change introduces a second persistence engine, a
forked worker protocol, a second history writer, or a non-reversible
migration before applying the change.
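Rollout tooling could run two checks against the topology manifest before changing shape: that the node's declared process class matches its role bundle, and that every kernel invariant covers the candidate shape. The sketch below assumes the manifest layout shown in the example above and takes the already-parsed topology object; both function names are hypothetical.

```python
def declared_roles_match(topology: dict) -> bool:
    """Check current_roles against shape_assignments for the declared process class."""
    shape = topology.get("current_shape")
    process_class = topology.get("current_process_class")
    assignments = topology.get("shape_assignments") or {}
    classes = (assignments.get(shape) or {}).get("process_classes") or []
    for entry in classes:
        if entry.get("name") == process_class:
            expected = set(entry.get("roles") or [])
            return expected == set(topology.get("current_roles") or [])
    # Unknown shape or process class: treat the declaration as invalid.
    return False


def invariants_not_covering(topology: dict, candidate_shape: str) -> list:
    """Return ids of kernel invariants whose applies_to omits the candidate shape."""
    return [
        invariant.get("id")
        for invariant in topology.get("kernel_invariants") or []
        if candidate_shape not in (invariant.get("applies_to") or [])
    ]
```

An empty list from invariants_not_covering means every published invariant claims to hold for the candidate shape. It does not prove the deployment itself is configured correctly.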
The hosted-route gate applies only to authenticated API and worker endpoints.
GET /api/health, GET /api/ready, and authenticated GET /api/cluster/info
stay available for discovery, liveness, and topology inspection even on
scheduler_node, matching_node, or execution_node processes that do not
host the current HTTP control surface.
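When a hosted route does fail closed, a client can use the 503 payload to decide whether to retry against another node. This is a sketch that assumes the parsed body carries reason, required_roles, and missing_roles at the top level, as the field list in this section describes; reroute_hint is a hypothetical helper.

```python
def reroute_hint(status: int, body: dict):
    """Return the roles the next node must host, or None if no reroute applies."""
    if status != 503 or body.get("reason") != "topology_role_unavailable":
        return None
    return {
        "answered_by": body.get("current_process_class"),
        "required_roles": list(body.get("required_roles") or []),
        "missing_roles": list(body.get("missing_roles") or []),
    }
```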
For carrier-neutral external handlers, the same endpoint publishes
worker_protocol.external_execution_surface_contract. That manifest names the
activity-grade external execution surface,
links the external task input/result envelope contracts, and keeps workflow
replay, ContinueAsNew, signal/update/query ordering, and event-history
interpretation inside real runtimes.
Key field notes for client code:
- The app version is version, not server_version.
- Workflow-task command capabilities live under worker_protocol.server_capabilities.supported_workflow_task_commands, not at the top of worker_protocol. The same nested object is echoed on every worker-plane response via the server_capabilities field.
- worker_protocol.server_capabilities.poll_status means poll responses keep a machine-readable poll_status field even when no task is leased, so workers can distinguish empty, throttled, unavailable, and draining outcomes without scraping prose error messages.
- Worker command-option capabilities, including retry policies, timeout fields, parent-close policy, and non-retryable failures, are also echoed in server_capabilities so workers can negotiate behavior without a separate cluster-info request.
- Universal payload codecs live under capabilities.payload_codecs; final v2 advertises avro there. When the server advertises engine-specific codecs that only a PHP worker can honor, those appear under capabilities.payload_codecs_engine_specific.<engine>. Language-neutral SDKs should ignore that object unless they opt into that engine.
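The field notes above translate into a small negotiation step. The sketch assumes the cluster-info layout shown earlier and additionally assumes that each payload_codecs_engine_specific entry is a list of codec names, which this page does not spell out. negotiate is a hypothetical helper.

```python
def negotiate(info: dict, wanted_commands, engine=None) -> dict:
    """Work out usable codecs and unsupported workflow-task commands."""
    capabilities = info.get("capabilities") or {}
    server_caps = (info.get("worker_protocol") or {}).get("server_capabilities") or {}

    codecs = list(capabilities.get("payload_codecs") or [])
    if engine is not None:
        # Only opt into engine-specific codecs when the caller names an engine.
        specific = capabilities.get("payload_codecs_engine_specific") or {}
        codecs += list(specific.get(engine) or [])

    supported = set(server_caps.get("supported_workflow_task_commands") or [])
    return {
        "server_version": info.get("version"),  # not server_version
        "codecs": codecs,
        "unsupported_commands": sorted(set(wanted_commands) - supported),
        "poll_status": bool(server_caps.get("poll_status")),
    }
```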
Connecting Workers
Workers poll the server for tasks and execute workflow code or activities. See the Worker Protocol reference for the full API contract. For the route role matrix, namespace lookup rules, and exact worker registration payload, see Namespace, Auth, And Worker Registration.
PHP Workers
PHP workers use the durable-workflow/workflow package in standalone server mode:
composer require durable-workflow/workflow:2.0.0-alpha.193@alpha
The pin includes the Composer prerelease stability suffix for the current
public artifact tuple while 2.0 is pre-stable on Packagist. Switch to
durable-workflow/workflow:^2.0 only after 2.0.0 is tagged stable and the
documented 2.0 cutover is authorized.
Configure the worker to connect to the server:
// config/workflow.php
return [
'mode' => 'server',
'server' => [
'url' => env('DURABLE_WORKFLOW_SERVER_URL', 'http://localhost:8080'),
'token' => env('DURABLE_WORKFLOW_AUTH_TOKEN'),
'namespace' => env('DURABLE_WORKFLOW_NAMESPACE', 'default'),
],
];
Run the worker process from the application that registers your PHP workflow types. The sample app publishes the current PHP worker reference as an application command:
php artisan app:polyglot-worker \
--mode=workflow \
--server-url=http://localhost:8080 \
--token="$DURABLE_WORKFLOW_AUTH_TOKEN" \
--namespace=default \
--task-queue=polyglot-php
The workflow package does not publish a package-level artisan worker command in the current 2.0 prerelease. Embedded Laravel workers that do not use the standalone worker protocol continue to run package-local workflow tasks through Laravel's queue worker, as shown in the 2.0 quickstart.
Python Workers
Python workers use the durable-workflow SDK:
pip install durable-workflow==0.4.84
See the Python SDK guide for worker setup.
Custom Language Workers
Any language can implement a worker by:
- Registering with POST /api/worker/register
- Long-polling for tasks with POST /api/worker/workflow-tasks/poll, POST /api/worker/activity-tasks/poll, or POST /api/worker/query-tasks/poll
- Completing tasks with POST /api/worker/workflow-tasks/{id}/complete, POST /api/worker/activity-tasks/{id}/complete, or POST /api/worker/query-tasks/{id}/complete
All requests require:
- Authorization: Bearer $TOKEN
- X-Namespace: your-namespace
- X-Durable-Workflow-Protocol-Version: 1.0
The server validates that the namespace exists. Register it via
POST /api/namespaces before directing workers or clients at it, or the
server returns 404 with reason: "namespace_not_found".
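The register, poll, and complete cycle described above can be sketched in Python with only the standard library. This is a minimal sketch, not the SDK's implementation: the body and response field names (`task_queue`, `worker_id`, `task`, `id`, `result`) and the `handle_task` callback are hypothetical placeholders, and the Worker Protocol reference remains the source for the real payloads. The transport is injected as `send` so the loop can be exercised without a running server.

```python
import json
import urllib.request

PROTOCOL_VERSION = "1.0"


def worker_headers(token, namespace):
    """Headers this page lists as required on every worker request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Namespace": namespace,
        "X-Durable-Workflow-Protocol-Version": PROTOCOL_VERSION,
        "Content-Type": "application/json",
    }


def build_request(base_url, path, token, namespace, body):
    """Build (but do not send) a JSON POST to a worker-protocol route."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers=worker_headers(token, namespace),
        method="POST",
    )


def http_send(request):
    """Default transport: send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read() or b"{}")


def run_workflow_worker(base_url, token, namespace, task_queue, handle_task,
                        send=http_send, max_polls=1):
    """Register once, then poll for and complete workflow tasks.

    Every body field used below is an ASSUMED name for illustration only.
    Returns the number of tasks completed.
    """
    def post(path, body):
        return send(build_request(base_url, path, token, namespace, body))

    registration = post("/api/worker/register", {"task_queue": task_queue})
    worker_id = registration.get("worker_id")  # hypothetical field

    completed = 0
    for _ in range(max_polls):
        polled = post("/api/worker/workflow-tasks/poll",
                      {"worker_id": worker_id, "task_queue": task_queue})
        task = polled.get("task")  # hypothetical field; absent on an empty poll
        if task is None:
            continue
        result = handle_task(task)
        post(f"/api/worker/workflow-tasks/{task['id']}/complete",
             {"worker_id": worker_id, "result": result})
        completed += 1
    return completed
```

Activity and query workers would follow the same shape against their own poll and complete routes. A real worker would also loop indefinitely and handle the 404 `namespace_not_found` response rather than stopping after `max_polls`.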
See the server README for a curl-based walkthrough.
See Task Queue Admission to tune worker registration slots, server-side active lease caps, per-minute dispatch budgets, and query-task backpressure.
CLI
The Durable Workflow CLI provides a shell interface to the server:
# Install — Linux and macOS
curl -fsSL https://durable-workflow.com/install.sh | sh
# Install — macOS (Homebrew alternative)
brew install durable-workflow/tap/dw
# Install — Windows (PowerShell)
# irm https://durable-workflow.com/install.ps1 | iex
# Configure
export DURABLE_WORKFLOW_SERVER_URL=http://localhost:8080
export DURABLE_WORKFLOW_AUTH_TOKEN=your-token
export DURABLE_WORKFLOW_NAMESPACE=default
# Use
dw server:health
dw workflow:list
dw workflow:start --type=my-workflow --input='["value"]'
dw workflow:start --type=my-workflow --input-file=input.json
See the CLI install page for a platform-detecting installer and direct binary downloads.
Task queue commands include admission status for workflow tasks, activity tasks, and query tasks. Use them to distinguish missing workers, saturated worker slots, server-side active lease or dispatch-rate throttling, and query-task overflow.
Deployment
Use the self-hosting deployment guide to choose a supported topology before deploying production traffic. It separates local development, single-node production, small clustered deployments, raw Kubernetes manifests, and support-led topologies.
The self-serve small-cluster contract is deliberately narrow: 2-3 stateless API nodes behind a load balancer, one shared external MySQL or PostgreSQL database, shared Redis, independently scaled workers, and exactly one scheduler or maintenance runner. Choose stop-the-world upgrades or rolling upgrades per release; the rolling-upgrade contract names the version-skew, schema, drain, readiness, and rollback guarantees that must hold. SQLite clustering, Redis-less multi-node mode, duplicate schedulers, active/active multi-region, Helm, and provider-specific failover are outside that contract until separately validated. Active/passive multi-region with operator-driven regional failover is its own self-serve contract in the self-hosting guide; each region still runs the validated single-region or small-cluster shape.
For self-hosted server deployments, start from published images rather than source-tree builds:
- Docker Hub: durableworkflow/server:0.2.261
- GitHub Container Registry: ghcr.io/durable-workflow/server:0.2.261
- Published-image Compose: docker-compose.published.yml
- Raw Kubernetes manifests: k8s/
Production deployments should pin a version tag or image digest, use
role-scoped credentials, run bootstrap/migrations before serving traffic, and
prove readiness with /api/ready, /api/cluster/info, and worker
registration. Do not shift production traffic based on /api/health alone.
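A traffic-shift gate along these lines could look like the sketch below. It is illustrative only: the function names are hypothetical, it treats HTTP 200 as the sole pass condition because the response bodies are not described here, and it takes the registered-worker count as an argument instead of parsing `GET /api/workers`. Whether `/api/cluster/info` needs a token is not stated on this page, so the token is optional.

```python
import urllib.error
import urllib.request

# /api/health is deliberately absent: liveness alone is not a readiness signal.
REQUIRED_PROBES = ("/api/ready", "/api/cluster/info")


def http_status(url, token=None, timeout=5.0):
    """Return the HTTP status for a GET, or 0 if the server is unreachable."""
    request = urllib.request.Request(url, method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return 0


def failing_probes(base_url, get_status=http_status):
    """List the required probes that did not answer 200."""
    base = base_url.rstrip("/")
    return [path for path in REQUIRED_PROBES if get_status(base + path) != 200]


def safe_to_shift_traffic(base_url, registered_workers, get_status=http_status):
    """Gate on both probes plus at least one registered worker."""
    return not failing_probes(base_url, get_status) and registered_workers > 0
```

Passing `get_status` as a parameter keeps the gate easy to test in CI with canned status codes before it is pointed at a live server.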
API Reference
For a complete endpoint-by-endpoint reference, including required headers, roles, worker-protocol routes, external payload storage routes, and named error reasons, see the Server API Reference.
The server exposes three API surfaces:
Control Plane
Start, describe, signal, query, update, cancel, and terminate workflows; manage namespaces, task queues, schedules, search attributes, and workers. Every control-plane request requires X-Durable-Workflow-Control-Plane-Version: 2. Requests without it are rejected with missing_control_plane_version.
Key endpoints:
- POST /api/workflows — Start a workflow
- GET /api/workflows/{id} — Describe a workflow
- POST /api/workflows/{id}/signal/{name} — Send a signal
- POST /api/workflows/{id}/query/{name} — Execute a query
- POST /api/workflows/{id}/update/{name} — Execute an update
- POST /api/workflows/{id}/cancel — Request cancellation
- POST /api/workflows/{id}/terminate — Terminate immediately
- GET /api/workflows/{id}/runs/{runId}/history — List run history events
- GET /api/workflows/{id}/runs/{runId}/history/export — Export a replay bundle
- GET /api/namespaces, POST /api/namespaces, GET|PUT /api/namespaces/{namespace} — Namespace management
- GET /api/workers, GET|DELETE /api/workers/{id} — Worker fleet management
- GET /api/task-queues, GET /api/task-queues/{taskQueue} — Task queue backlog, pollers, leases, and admission visibility
- GET|POST /api/schedules, GET|PUT|DELETE /api/schedules/{id}, POST /api/schedules/{id}/{pause|resume|trigger|backfill} — Schedule management
- GET|POST|DELETE /api/search-attributes — Search attribute management
- GET|POST|PUT|DELETE /api/service-endpoints... — Admin-only service catalog endpoints, nested services, operation bindings, and durable service-call snapshots
- POST /api/system/repair/pass, POST /api/system/activity-timeouts/pass, POST /api/system/retention/pass — Operator passes
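As an illustration of the required version header, the sketch below builds a `POST /api/workflows` request without sending it. The header set mirrors the curl call in the troubleshooting section; the body field names (`workflow_type`, `input`) are assumptions made for the example and may not match the documented schema in the Server API Reference.

```python
import json
import urllib.request

CONTROL_PLANE_VERSION = "2"


def start_workflow_request(base_url, token, namespace, workflow_type, workflow_input):
    """Build (but do not send) a POST /api/workflows request."""
    # ASSUMPTION: these body field names are illustrative, not the documented schema.
    body = {"workflow_type": workflow_type, "input": workflow_input}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/workflows",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "X-Namespace": namespace,
            # Without this header the server rejects the request with
            # missing_control_plane_version.
            "X-Durable-Workflow-Control-Plane-Version": CONTROL_PLANE_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = start_workflow_request(
    "http://localhost:8080", "dev-token", "default", "my-workflow", ["value"]
)
print(request.get_method(), request.full_url)
# prints: POST http://localhost:8080/api/workflows
```

Sending it is a matter of passing `request` to `urllib.request.urlopen`. Building the request separately makes it straightforward to assert on headers in tests.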
Workflow control-plane responses, including run-history listing responses,
include the nested control_plane contract metadata that identifies the
operation and response contract version. History export is intentionally not
wrapped in that envelope; it returns the replay bundle unchanged so the bundle
integrity checksum and optional signature cover the exact artifact received by
the client.
Validation failures return HTTP 422 with reason: validation_failed plus
errors and validation_errors. Workflow operation routes also project that
reason and validation detail into control_plane.reason and
control_plane.validation_errors. Current run-targeted command routes echo
the run_id from the URL in the response and in control_plane.run_id, so clients
can distinguish instance-level commands from commands aimed at an explicitly selected run.
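A client might surface these validation failures as in the sketch below. It assumes that `validation_errors` (or `errors`) maps field names to lists of messages, which is a common shape but is not confirmed by this page; the sample body is hypothetical and exists only to exercise the function.

```python
def validation_messages(status, body):
    """Flatten a 422 validation_failed response into 'field: message' lines.

    Returns an empty list for any other response.
    """
    if status != 422 or body.get("reason") != "validation_failed":
        return []
    # ASSUMPTION: the error object maps field names to lists of messages.
    errors = body.get("validation_errors") or body.get("errors") or {}
    lines = []
    for field, messages in sorted(errors.items()):
        if isinstance(messages, str):
            messages = [messages]
        lines.extend(f"{field}: {message}" for message in messages)
    return lines


# Hypothetical response body, shaped only for illustration.
sample = {
    "reason": "validation_failed",
    "errors": {"workflow_type": ["The workflow type field is required."]},
    "validation_errors": {"workflow_type": ["The workflow type field is required."]},
}
print(validation_messages(422, sample))
# prints: ['workflow_type: The workflow type field is required.']
```

For workflow operation routes, the same detail could be read from `control_plane.validation_errors` instead of the top-level keys.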
Task queue visibility is the operator surface for deciding whether a queue is
falling behind because durable backlog is growing, workers have no available
slots, or the server is enforcing admission limits. GET /api/task-queues
returns one summary entry per queue; GET /api/task-queues/{taskQueue}
expands one queue with pollers and current_leases. Both routes expose
stats.approximate_backlog_count, stats.approximate_backlog_age_seconds,
and the per-kind stats.workflow_tasks.* / stats.activity_tasks.* readiness
and lease counters. The detailed route also includes the admission object so
automation can separate worker-capacity pressure from server-side queue or
query-task throttling. Fleet-level durable inflow versus dispatch rates live on
the operator-metrics surfaces (operator_metrics.backlog.tasks_added_last_minute
and operator_metrics.backlog.tasks_dispatched_last_minute), not on the
per-queue task-queue routes.
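To show how automation might read the detailed route, the sketch below classifies a `GET /api/task-queues/{taskQueue}` response using only the fields named above (`stats.approximate_backlog_count`, `stats.approximate_backlog_age_seconds`, and `pollers`). The labels, the 60-second threshold, and the sample payload are all hypothetical. The `admission` object is left uninspected because its fields are not described on this page.

```python
def summarize_queue(detail, max_backlog_age_seconds=60):
    """Return a coarse, illustrative label for one task queue detail response."""
    stats = detail.get("stats", {})
    backlog = stats.get("approximate_backlog_count") or 0
    age = stats.get("approximate_backlog_age_seconds") or 0
    pollers = detail.get("pollers") or []

    if backlog == 0:
        return "idle"
    if not pollers:
        # Work is waiting but no worker is polling this queue.
        return "no_pollers"
    if age > max_backlog_age_seconds:
        # Workers exist, yet the oldest task has waited past the threshold.
        return "falling_behind"
    return "draining"


# Hypothetical payload; real responses carry more fields than shown here.
sample = {
    "stats": {"approximate_backlog_count": 12, "approximate_backlog_age_seconds": 95},
    "pollers": [{"id": "worker-1"}],
    "current_leases": [],
}
print(summarize_queue(sample))
# prints: falling_behind
```

A `falling_behind` result would be the point at which to look at the `admission` object and decide whether the pressure comes from worker capacity or from server-side throttling.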
Worker Protocol
Workers register, poll for tasks, heartbeat, and complete tasks. Requires X-Durable-Workflow-Protocol-Version: 1.0.
Key endpoints:
- POST /api/worker/register — Register a worker
- POST /api/worker/workflow-tasks/poll — Long-poll for workflow tasks
- POST /api/worker/workflow-tasks/{id}/complete — Complete workflow task
- POST /api/worker/query-tasks/poll — Long-poll for server-routed workflow query tasks
- POST /api/worker/query-tasks/{id}/complete — Complete workflow query task
- POST /api/worker/query-tasks/{id}/fail — Fail or reject workflow query task
- POST /api/worker/activity-tasks/poll — Long-poll for activity tasks
- POST /api/worker/activity-tasks/{id}/complete — Complete activity task
See the Worker Protocol reference for details.
Discovery (unversioned)
The only endpoints that do not require X-Durable-Workflow-Control-Plane-Version are discovery and health probes:
- GET /api/health — Liveness probe plus the public topology summary (no auth required)
- GET /api/ready — Readiness probe plus the same topology summary (no auth required)
- GET /api/cluster/info — Server capabilities, protocol versions, payload codecs. Clients should hit this first to discover which control-plane and worker-protocol versions the server supports.
Troubleshooting
Workers not receiving tasks
Check:
- Workers registered?
  curl http://localhost:8080/api/workers -H "Authorization: Bearer $TOKEN" -H "X-Durable-Workflow-Control-Plane-Version: 2" -H "X-Namespace: default"
- Workers polling correct task queue?
- Workflow started with matching task queue?
- Cache backend shared across server instances?
Long-poll connections timing out immediately
Check:
- Cache driver supports atomic locks? Test with php artisan workflow:v2:doctor --strict
- Redis reachable from server?
- Load balancer timeout set higher than long-poll timeout (default: 60s)?
Database connection errors
Check:
- Database host and port correct?
- Credentials valid?
- Database exists?
- Migrations run?
  php artisan migrate:status
Auth failures
Check:
- DW_AUTH_DRIVER matches client auth method?
- Token/HMAC secret matches between server and client?
- Auth headers present? Authorization: Bearer $TOKEN or HMAC signature headers?
Learn More
- Worker Protocol Reference — Full API contract for workers
- Embedded to Server Migration — Adopt the server from a Laravel embedded v2 app
- Python SDK — Build Python workers
- CLI — Command-line interface
- Server Repository — Source code, issues, releases