Version: 2.0

How It Works

Durable Workflow uses Laravel's queued jobs and event-sourced persistence to create durable coroutines. Workflows suspend through Fiber-backed helper calls, which gives every run a durable replay contract.

Runtime

The runtime is built on the same broad Laravel primitives, but its model is more explicit:

  • WorkflowStub::make() reserves a durable workflow instance id
  • caller-supplied public instance ids are validated up front as non-empty URL-safe strings up to 191 characters, so blank, overlong, or unsupported-character ids fail before the runtime tries to reserve or reuse anything in storage
  • accepting a start creates a distinct run id, a durable start command record, and the first workflow task in one transaction
  • signalWithStart() and the matching webhook route link one durable start command plus one durable signal command under a shared intake-group marker, and the runtime records the accepted signal before the first workflow task can run user code
  • starts can attach a searchable business_key, exact-match string visibility_labels, and returned-only memo metadata; the runtime stores business_key and visibility_labels on the workflow instance, run, and run-summary projection, records them on typed start history, carries them into continueAsNew() runs, and exposes them through selected-run detail, history export, Waterline list filters, and Waterline saved operational views. memo is also recorded on the workflow instance, run, typed start history, selected-run detail, history export, and later continueAsNew() runs, but it intentionally stays out of run-summary filters and saved-view matching. That same run-summary visibility contract now also carries repair_blocked_reason, the durable boolean repair_attention, and the durable task_problem flag, so fleet filters and saved views can isolate badge-visible repair blockers such as unsupported_history or waiting_for_compatible_worker, keep repair_not_needed out of those views, and still isolate broader replay, missing-task, or workflow-task transport problems without opening every selected run first
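As a sketch of that start-time metadata, a caller might attach all three fields when starting a run. The workflow class name and the exact `start()` signature below are illustrative assumptions, not the package's documented API; only the field names come from the description above.

```php
use Workflow\WorkflowStub;

// Reserve a durable workflow instance id, then start the first run
// with searchable and returned-only metadata attached.
$workflow = WorkflowStub::make(OrderWorkflow::class); // OrderWorkflow is hypothetical

$workflow->start(
    orderId: 4221,
    businessKey: 'order-4221',                       // searchable exact-match key
    visibilityLabels: ['tier' => 'gold'],            // exact-match strings, filterable in Waterline
    memo: ['notes' => 'imported from legacy system'] // returned-only; never used in filters
);
```

The split matters operationally: business_key and visibility_labels flow into run-summary filters and saved views, while memo travels with the run and its exports but deliberately stays out of fleet-level matching.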
  • each accepted or rejected run-scoped command also gets a durable command_sequence inside that run, so command history and signal application do not depend on created_at ties
  • older runs that predate command_sequence are backfilled into that same per-run order during migration and again on later command intake, so new signals, updates, and operator commands cannot leapfrog legacy rows that were recorded before the sequence column existed
  • durable command records now also capture ingress metadata such as command source, caller label, auth outcome, request route, request fingerprint, and the accepted payload itself; compound start-time intake such as signalWithStart() also stamps context.intake.mode = signal_with_start plus a shared context.intake.group_id onto the linked start and signal commands so history export and Waterline can correlate the pair without scraping raw request bodies, and workflow-originated child or continue-as-new starts now record their parent run and workflow step in command context, with child_call_id attached when that run belongs to a parent-issued child invocation
  • stable workflow and activity type keys can come from #[Type(...)] attributes or workflows.v2.types config registration; the service provider validates at boot that no class is registered under multiple type keys and that config keys agree with any #[Type] attribute on the mapped class, so duplicate or conflicting type identities fail fast instead of silently producing ambiguous durable records
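The two registration styles above might look like this. The config array shape under `workflows.v2.types`, the attribute namespace, and all class and key names are assumptions for illustration; the boot-time validation behavior is the part described above.

```php
// config/workflows.php — illustrative sketch of v2 type registration
'v2' => [
    'types' => [
        'workflows' => [
            // stable type key => class; the service provider fails fast at
            // boot if a class is registered under two keys, or if a config
            // key disagrees with a #[Type(...)] attribute on that class
            'billing.order' => App\Workflows\OrderWorkflow::class,
        ],
        'activities' => [
            'billing.charge-card' => App\Activities\ChargeCardActivity::class,
        ],
    ],
],

// Equivalent attribute form on the class itself (assumed namespace):
// #[\Workflow\Attributes\Type('billing.order')]
// class OrderWorkflow extends Workflow { ... }
```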
  • stable external signal names are declared explicitly with repeatable #[Signal('...')] workflow attributes, so signal ingress can reject typos before they become durable accepted commands
  • accepted and rejected signals also mint one first-class durable workflow_signal_records lifecycle row linked back to the signal command; selected-run detail exposes those rows through signals[*] while commands[*].signal_id, commands[*].signal_status, and commands[*].signal_wait_id remain the command-list compatibility bridge. Final v2 writes those lifecycle rows on the command path.
  • those #[Signal(...)] declarations may also include an ordered parameter contract, and the runtime snapshots workflow_definition_fingerprint, declared_queries, declared_query_contracts, declared_signals, declared_signal_contracts, declared_updates, and declared_update_contracts onto typed WorkflowStarted history so later webhook, PHP, and Waterline intake can validate named or positional arguments, declared scalar or object type, and allows_null rules from durable run metadata instead of only from a live class; selected-run detail also exposes normalized declared_query_targets, declared_signal_targets, and declared_update_targets arrays so operator clients can keep every declared target visible while still attaching parameter metadata when a durable contract exists, and those normalized arrays stay present even when the contract source is unavailable
  • final v2 treats the WorkflowStarted command-contract snapshot as the only authoritative source for declared query, signal, update, and entrypoint metadata. Selected-run detail and history export report declared_contract_source = durable_history when that complete snapshot is present, and declared_contract_source = unavailable with empty normalized target arrays when it is missing or incomplete. The clean-slate engine does not reflect a live class to rebuild missing command contracts and does not expose command-contract normalization pressure in fleet metrics or health checks
  • the current workflow task replays the selected run and applies one bounded unit of work
  • named workflows are straight-line only and suspend through Fiber-backed helpers such as activity(), await(), timer(), sideEffect(), getVersion(), and all([...]) without writing yield in the workflow body; await('signal-name') is the workflow-code helper for one named signal value, closures such as fn () => activity(...) and fn () => child(...) feed barrier topology into all([...]), and async(...) callbacks use that same straight-line-only helper contract
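A straight-line workflow body using those helpers might read as follows. The helper names come from the list above; the class names, signal name, attribute namespaces, and `timer()` signature are illustrative assumptions.

```php
use Workflow\Attributes\Signal; // assumed attribute namespace

#[Signal('approval-received')]
class ApprovalWorkflow extends Workflow
{
    public function execute(int $requestId)
    {
        // Durable activity call: suspends the Fiber, replays from history.
        $draft = activity(PrepareDraftActivity::class, $requestId);

        // Wait for one named external signal value; no `yield` needed.
        $approval = await('approval-received');

        // Closures feed barrier topology into all([...]).
        [$legal, $archive] = all([
            fn () => activity(NotifyLegalActivity::class, $draft),
            fn () => child(ArchiveDraftWorkflow::class, $draft),
        ]);

        timer(60); // durable timer step, not sleep()

        return activity(PublishActivity::class, $draft, $approval);
    }
}
```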
  • query methods marked with #[QueryMethod] replay committed history for the current or selected run without applying pending signals implicitly, can declare a stable public target name through #[QueryMethod('public-name')], and now snapshot their ordered parameter contract for selected-run detail plus Waterline's read-only query operator; selected-run detail also exposes can_query and query_blocked_reason so operator clients can tell when durable query targets exist but the workflow definition is not currently replayable
  • getVersion() records one typed VersionMarkerRecorded history event per workflow step, and workflow or query replay reuses that committed version marker instead of branching from live code alone; when a run reaches a newly introduced branch point with no marker yet, replay now checks the start-time workflow_definition_fingerprint from WorkflowStarted before it falls back to the compatibility marker, so same-compatibility runs that started before the branch was deployed can stay on WorkflowStub::DEFAULT_VERSION without synthesizing a new marker or consuming a new workflow step
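Inside a workflow body, a versioned branch point might be sketched like this. The parameter shape follows Temporal-style `getVersion` conventions as an assumption; `WorkflowStub::DEFAULT_VERSION` and the one-marker-per-step replay behavior are the parts described above.

```php
// Records one typed VersionMarkerRecorded event for this workflow step
// on first execution; replay reuses the committed marker instead of
// branching from live code alone.
$version = getVersion('tax-calculation-change', WorkflowStub::DEFAULT_VERSION, 2);

if ($version === WorkflowStub::DEFAULT_VERSION) {
    // Runs that started before the branch was deployed stay here.
    $tax = activity(LegacyTaxActivity::class, $order);
} else {
    $tax = activity(NewTaxActivity::class, $order);
}
```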
  • selected-run replay-safety diagnostics are fingerprint-scoped: when the current loadable workflow class no longer matches the run's snapped workflow_definition_fingerprint, Waterline and run detail report workflow_determinism_source = definition_drift instead of pretending that today's source scan is authoritative for that older run
  • update methods marked with #[UpdateMethod] replay committed history, record typed UpdateAccepted when the command is accepted, apply under the run lock on the workflow worker, append typed UpdateApplied and UpdateCompleted entries when they run, and append typed UpdateRejected history when a targeted run rejects the command before application; callers can wait for completion with attemptUpdate* or submit accepted-only work with submitUpdate* / webhook or Waterline wait_for = accepted, and both paths record the accepted lifecycle first and use the durable workflow task path for application instead of executing the update body directly inside WorkflowStub, the webhook handler, or Waterline's controller. Write-side webhook and Waterline update requests accept only wait_for = accepted or wait_for = completed (or omit the field to keep the completed default); lookup responses reserve wait_for = status for inspectUpdate() and the update-status endpoints. Completion-waiting callers wait only up to the configured workflows.v2.update_wait.completion_timeout_seconds budget or an explicit per-call override, then fall back to the still-open accepted lifecycle with wait_for, wait_timed_out, and wait_timeout_seconds response metadata instead of blocking indefinitely; the engine also mints one first-class durable workflow_updates row per update lifecycle, gives it its own update_id, and exposes that row through webhook responses, selected-run detail, Waterline's dedicated Updates table, and the selected-run wait surface instead of making operators infer everything back out of generic command rows alone. While that lifecycle stays accepted, selected-run summaries project wait_kind = update, open_wait_id = update:{update_id}, and resume_source_kind = workflow_update; if the backing workflow task disappears, repair now keeps pointing at the accepted update instead of falling back to the older underlying signal, child, or timer wait. 
When you declare #[UpdateMethod('public-name')], that durable alias becomes the canonical update target in command history, webhook routes, and Waterline instead of the PHP method name, and the engine also snapshots each declared parameter contract so named or positional update intake can reject rejected_invalid_arguments with durable validation_errors for missing arguments, unknown arguments, type mismatches, or nullability violations before the update body runs, even when the current worker can only recover that contract from WorkflowStarted history rather than from a loadable live class; when a selected run still durably declares the target but the workflow definition cannot be replayed, the update rejects as rejected_workflow_definition_unavailable with rejection_reason = workflow_definition_unavailable; Waterline drives selected-run Signal and Update operator forms from the normalized declared_signal_targets and declared_update_targets detail arrays, with the older declared_signals, declared_signal_contracts, declared_updates, and declared_update_contracts fields retained as compatibility metadata
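A minimal update method and the two caller styles might be sketched as follows. The attribute namespace, concrete method names in the `attemptUpdate*` / `submitUpdate*` families, and parameter names are assumptions; the alias, snapshot, and wait semantics are described above.

```php
use Workflow\Attributes\UpdateMethod; // assumed namespace

class ApprovalWorkflow extends Workflow
{
    // 'set-priority' becomes the canonical durable update target; the
    // declared parameter contract is snapshotted onto start history so
    // intake can reject bad arguments before the body runs.
    #[UpdateMethod('set-priority')]
    public function setPriority(int $priority): int
    {
        $this->priority = $priority;
        return $this->priority;
    }
}

// Caller side (hypothetical concrete method names from the families above):
// wait for completion, bounded by the configured completion-timeout budget...
$result = $workflow->attemptUpdate('set-priority', priority: 3);

// ...or submit accepted-only work and let the durable workflow task apply it.
$accepted = $workflow->submitUpdate('set-priority', priority: 3);
```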
  • await() projects typed condition waits and named signal waits. For condition waits, replay advances only from committed ConditionWaitSatisfied or ConditionWaitTimedOut history instead of re-evaluating the predicate speculatively during queries; await() accepts an optional stable condition key, persists it in typed condition and timeout-timer history, exposes it to Waterline as condition_key, and validates the recorded key during worker and query replay before treating the current condition wait as the same durable step. For named signals, await('name') waits for a durable signal payload and await('name', timeout: ...) returns null when the timeout wins. Adding a key to an already-recorded unkeyed condition wait is also treated as replay drift, because the old history did not durably name that predicate. Replay also validates that the same workflow sequence did not already record a different typed step shape before appending or consuming step history. The shape guard covers activity, child-workflow, pure timer, signal-wait, side-effect, version-marker, continue-as-new, and all([...]) leaf sequences, so a current build cannot schedule an activity over committed timer or child history at the same sequence. For all([...]), worker and query replay compare recorded parallel group topology against the current barrier, including arity and nested group path. Typed activity or child leaf history from an all([...]) step must carry that group metadata; older leaf events that lack it block replay as history_shape_mismatch instead of being guessed into the current barrier shape. Worker-side condition-key, predicate-fingerprint, history-shape, or parallel-topology mismatch blocks replay with liveness_state = workflow_replay_blocked and tasks[*].transport_state = replay_blocked instead of committing WorkflowFailed; after a compatible build is deployed, an operator can repair the run to retry the task.
  • timeout-backed condition waits keep the wait itself as wait_kind = condition while using a normal durable timer row and timer task as the timeout transport; when a durable update flips the predicate before the timer fires, the runtime now republishes an existing ready workflow task or creates one if none is open so the worker can re-evaluate the wait, then cancels the stale timeout timer
  • selected-run summaries and workflow_run_waits projection rows also rebuild timeout-backed condition-wait deadline_at, resume_source_kind, and resume_source_id from typed ConditionWait* plus TimerScheduled timeout transport history, so Waterline keeps the original blocked-on deadline and timeout identity even if the live workflow_timers row later drifts or disappears; an unrelated open workflow task row no longer replaces that typed condition wait as the selected wait. Once timeout TimerScheduled history exists, the worker also requires matching TimerFired history before applying ConditionWaitTimedOut, so a drifted mutable timer row cannot make the timeout win by itself. After the timeout transport records TimerFired, the run waits for a workflow task to apply ConditionWaitTimedOut, and repair recreates that workflow task from typed history if the timer row or resume task disappears first
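A keyed, timeout-backed condition wait from the description above might look like this. The predicate, key, and timeout argument shape are illustrative; the null-on-timeout convention is assumed to match the named-signal form of `await()`.

```php
// Blocks until a durable update flips the predicate or the timeout
// timer fires. The stable key durably names this predicate so worker
// and query replay can validate it as the same condition wait.
$satisfied = await(
    fn () => $this->approvals >= 2,
    timeout: now()->addHour(),
    key: 'two-approvals'
);

if ($satisfied === null) {
    // ConditionWaitTimedOut was committed via the timer transport;
    // take the timeout branch.
}
```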
  • pure timers stay blocked from typed TimerScheduled history until the run commits a matching TimerFired, and selected-run summaries, waits, tasks, timer lists, and history exports rebuild timer identity and deadline metadata from that typed timer history before falling back to mutable side tables for non-terminal older rows. On those timer rows, status is always the authoritative selected-run timer state, source_status is the status value reported by that authority, and row_status is only the current mutable workflow_timers.status diagnostic when a timer row still exists. That means typed history never yields to a drifted mutable timer row: if the durable history still says pending while the row later says fired, the selected-run timer stays status = pending and source_status = pending, and only row_status = fired changes. A fired or otherwise terminal workflow_timers row with no typed timer history is treated as replay drift instead of a timer result; Waterline-facing detail marks that fallback as status = unsupported, history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_timer_row_without_typed_history, keeps the mutable terminal state only as row_status, and omits resume_source_kind / resume_source_id because that row is diagnostic-only rather than a durable resume path.
  • completed activity outcomes replay from typed ActivityCompleted and ActivityFailed history, while activity cancellation is recorded as typed ActivityCancelled history when cancel or terminate closes an in-flight execution or an activity worker observes that stop through the bridge. ActivityCancelled is a terminal activity fact for worker and query replay, so it wins over earlier open activity history for that workflow step instead of leaving replay parked on the stale ActivityScheduled or ActivityStarted event. When a step already has typed open activity history such as ActivityScheduled, ActivityStarted, ActivityHeartbeatRecorded, or ActivityRetryScheduled, workflow replay and query replay stay blocked until a matching terminal activity event is committed instead of accepting a drifted terminal activity_executions row. A completed, failed, or cancelled activity_executions row with no typed activity history is explicitly unsupported for replay and blocks as history_shape_mismatch with recorded events no typed history; selected-run activity and wait projections mark that fallback as status = unsupported, history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_activity_row_without_typed_history, keep the mutable status only as row_status, and omit resume_source_kind / resume_source_id because that row is diagnostic-only rather than a durable resume path. Non-terminal older rows may still be used only to keep an open wait visible.
  • selected-run activity detail, compatibility logs, activity-backed chartData, open activity waits, task labels, and run-summary liveness rebuild from typed ActivityScheduled, ActivityStarted, ActivityHeartbeatRecorded, ActivityRetryScheduled, ActivityCompleted, ActivityFailed, and ActivityCancelled snapshots first, so Waterline keeps the original activity class, arguments, running-vs-completed-or-cancelled status, retry-policy snapshot, idempotency key, attempt count, latest attempt id, per-attempt task id, worker, heartbeat, lease, close timestamps, bounded heartbeat progress, and heartbeat or cancellation timeline points even if the mutable activity_executions, activity_attempts, or task rows later drift or disappear. ActivityHeartbeatRecorded accepts one normalized operator-facing progress object with optional message, current, total, unit, and flat details, and selected-run detail plus history export expose that same snapshot back as last_heartbeat_progress on the activity and attempt views. Grouped activity barrier identity also comes from typed history: final v2 records parallel_group_path on the typed activity events and does not infer missing grouped metadata from mutable activity rows. If grouped typed history lacks that metadata, replay, query, export, and Waterline projection report history_shape_mismatch rather than manufacturing a barrier identity. For event-backed activities, mutable execution results and activity close timestamps are not used unless the typed history has a terminal activity event; unsupported row-only terminal fallbacks also suppress the mutable result and close timestamp instead of presenting them as durable output, and run-summary liveness reports workflow_replay_blocked when that unsupported fallback is the selected run's only apparent progress source. 
The activity rows themselves mark older mutable fallback evidence with diagnostic_only = true, and the legacy logs[*] / chartData[*] compatibility arrays echo that same history_authority, history_unsupported_reason, and diagnostic_only metadata instead of flattening older-row evidence into an apparently authoritative activity result. When a pending activity loses both its mutable execution row and activity task before it starts, repair restores the execution row from ActivityScheduled history before recreating the durable activity task
  • selected-run timeline entries carry stable event identity fields such as entry_kind, source_kind, and source_id, and the entry's primary command, task, activity, timer, child, and failure state comes from the recorded event snapshot first instead of inheriting whatever those mutable side rows say later
  • recorded version markers appear in that same selected-run timeline as typed VersionMarkerRecorded points with the durable change_id, selected version, and supported range, so Waterline can explain which branch a long-lived run is following without inventing compatibility-era markers that were never durably committed
  • selected-run exception_count, compatibility exceptions[*], update lifecycle failure fields, timeline failure metadata, and history-export failures[*] also rebuild from typed ActivityFailed, parent-side ChildRunFailed, WorkflowFailed, failed UpdateCompleted, and FailureHandled history first, so Waterline and exported bundles keep failure ids plus durable exception type aliases, message, file, line, trace, declared custom-property detail, update or child source metadata, handled disposition, and stable multi-failure ordering even if mutable update, command, or failure rows later drift or disappear. When a selected run can only recover a failure from an older mutable failure row, the detail/export payload now marks that row as history_authority = failure_row_fallback with diagnostic_only = true instead of presenting it as indistinguishable typed failure history. Failed update detail is keyed by the durable update_id in typed history; command ids are preserved when available, but they are not required to show the update failure lifecycle.
  • when typed failure payloads are present, replay restores the original activity exception class and custom properties instead of flattening handled failures into a generic RuntimeException; if the payload carries type and that alias is registered under workflows.v2.types.exceptions, replay resolves the alias before falling back to the recorded PHP class. Imported v1 failures with no durable type can be bridged through workflows.v2.types.exception_class_aliases; new v2 failures should use stable aliases before throwable classes move. Operator views report the resolved class plus whether resolution came from exception_type, class_alias, recorded_class, unresolved, misconfigured, or unrestorable. Unresolved mappings, invalid configured aliases, and loadable classes that cannot be safely restored now block replay with UnresolvedWorkflowFailureException, exception_replay_blocked = true, and a replay_blocked workflow task instead of being delivered to broad workflow catch blocks as a generic runtime exception
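A hedged sketch of the two exception-identity registries named above. The array shapes, mapping direction of `exception_class_aliases`, and all class and alias names are assumptions; the config paths `workflows.v2.types.exceptions` and `workflows.v2.types.exception_class_aliases` come from the text.

```php
// config/workflows.php — illustrative exception identity registration
'v2' => [
    'types' => [
        'exceptions' => [
            // stable alias => throwable class; replay resolves the alias
            // before falling back to the recorded PHP class name, so the
            // class can move without breaking older durable failures
            'billing.card-declined' => App\Exceptions\CardDeclinedException::class,
        ],
        'exception_class_aliases' => [
            // bridge imported v1 failures that recorded only a class name
            // (mapping direction here is an assumption)
            'Legacy\\Payments\\DeclinedError' => 'billing.card-declined',
        ],
    ],
],
```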
  • if an earlier accepted signal is still waiting to be applied on the current run, later updates reject as rejected_pending_signal with rejection_reason = earlier_signal_pending instead of running the workflow task inline on the caller path; let the queued workflow task apply the signal first, then retry the update against the advanced state
  • when a signal payload violates the declared durable signal contract, the runtime now rejects it as rejected_invalid_arguments with rejection_reason = invalid_signal_arguments and durable validation_errors before the signal command is accepted, including missing arguments, unknown arguments, type mismatches, and nullability violations
  • unknown signal and update targets now reject through typed durable command outcomes instead of being accepted implicitly or failing as bare adapter errors
  • the runtime exposes straight-line workflow-code helpers such as activity(), await(), child(), all(), sideEffect(), timer(), and continueAsNew(), plus explicit external signal(), update(), repair(), cancel(), terminate(), and archive() commands; signal, update, repair, cancel, and terminate target the current instance run, while archive can target a closed selected run in the instance
  • sideEffect() records one typed SideEffectRecorded history event per workflow step, and replay or query paths reuse that committed value instead of re-running the closure
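In workflow code, that contract keeps non-deterministic work replay-safe, for example:

```php
// The closure runs once; replay and query paths reuse the committed
// SideEffectRecorded value instead of generating a new token.
$token = sideEffect(fn () => bin2hex(random_bytes(16)));

$receipt = activity(SendReceiptActivity::class, $token); // illustrative activity
```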
  • activity completion, activity failure, activity cancellation, handled failure continuation, child workflow scheduling and closure, side-effect recording, timer fire, signal receipt/application, update acceptance/application/completion or rejection, continue-as-new lineage, accepted repair commands, cancellation, termination, and archival append typed history before the run summary is updated
  • when multiple accepted signal commands are pending for the same run, the current slice applies them in durable command_sequence order
  • those repeated same-name signals now also keep one durable signal_wait_id end-to-end in typed history, even when a later wait opens only after the signal command was already accepted
  • signal waits and typed timeline entries now keep the accepted command's snapped sequence, target, payload preview, source, and transport-adjacent task metadata in the event payload itself, so Waterline can still explain repeated same-name signals and event-era task state even if the mutable command or task rows drift later
  • activity claim advances the durable execution attempt count, mints a fresh current-attempt id, opens one durable activity_attempts row for that try, and only lets that currently claimed attempt write completion or failure history, so late results from an expired lease cannot overwrite a newer reclaimed attempt
  • adapter-style activity workers can use Workflow\V2\ActivityTaskBridge as the first worker boundary: claim a ready durable activity task by task id, receive the codec-tagged argument payload plus activity type, heartbeat by activity_attempt_id, and complete or fail that attempt without loading the PHP activity class. The same bridge is exposed over authenticated HTTP/JSON through the webhook routes for activity-tasks/{taskId}/claim and activity-attempts/{attemptId} status, heartbeat, completion, and failure. Bridge claims and the default PHP activity job share the same backend or compatibility checks, lease creation, durable attempt row, and typed ActivityStarted history path, while the same recorder writes ActivityCompleted, ActivityFailed, ActivityRetryScheduled, or ActivityCancelled history and dispatches the next durable workflow or retry task after commit. Adapter workers can call heartbeatStatus($activityAttemptId) or the matching heartbeat webhook for a structured stop contract with can_continue, cancel_requested, reason, attempt/task/run status, and lease timestamps; when a cancel or terminate command has closed the run, that heartbeat response closes the attempt lease, records the cancellation observation if it is missing, and late completion or failure is ignored as stale. This is still a first bridge for known durable ids, not yet a long-poll discovery or hosted worker-service contract
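An adapter-style worker loop against the bridge might be sketched as follows. `heartbeatStatus()` and the response fields are named above; the `claim()` and `complete()` method shapes, the response array keys on claim, and the `stillWorking()` helper are assumptions for illustration.

```php
use Workflow\V2\ActivityTaskBridge;

$bridge = app(ActivityTaskBridge::class);

// Claim a known ready durable activity task by id; receive the
// codec-tagged argument payload plus activity type without loading
// the PHP activity class.
$claim = $bridge->claim($taskId);
$attemptId = $claim['activity_attempt_id'];

while (stillWorking()) {
    // Structured stop contract: can_continue, cancel_requested, reason,
    // attempt/task/run status, and lease timestamps.
    $status = $bridge->heartbeatStatus($attemptId);

    if (! $status['can_continue']) {
        // A cancel or terminate closed the run; the lease is closed for
        // us and any late completion would be ignored as stale.
        return;
    }
}

$bridge->complete($attemptId, $result);
```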
  • retryable activity failures close the failed attempt, record ActivityRetryScheduled, and create the next durable activity task from the activity execution's snapped retry_policy while the workflow stays parked on the same activity execution until a later attempt succeeds or the snapped retry budget is exhausted; the ActivityScheduled history snapshot, Waterline activity detail, and history export carry that policy, and activityId() remains the execution-level idempotency key for external side effects
  • the upgrade path also normalizes older already-started activity executions that predate activity_attempts into one latest-known durable attempt row plus current_attempt_id, so heartbeat renewal, repair, and Waterline attempt detail keep working across mixed-era data even though earlier releases never stored every historical attempt durably
  • queue jobs carry task ids, while the durable task row remains the source of truth for whether work is ready, leased, or completed
  • ordinary queue workers also run a light recovery sweep on Looping, which records a database-backed compatibility heartbeat snapshot for that worker, optional compatibility namespace, and queue scope, re-dispatches overdue ready tasks, reclaims expired workflow, activity, and timer task leases, and recreates missing workflow, child-resolution workflow, accepted-update, accepted-signal, pending-activity, or timer tasks for runs already projected as repair_needed with no open task row, without duplicating in-flight running activities; when the pending activity's mutable execution row is also gone, recovery restores it from typed ActivityScheduled history before creating the replacement activity task. Selected-run task detail now exposes those lost transport expectations before repair as synthetic transport_state = missing rows with task_missing = true, carrying the activity, timer, condition-wait, child, update, signal, command, retry, expected_task_id, or generic selected-run workflow-task identity that typed history, wait state, or the no-resume-source invariant can still prove. Child-resolution, accepted-update, and accepted-signal workflow tasks carry workflow_wait_kind, open wait id, resume source, and the durable child, update, signal, or command id metadata both when they are first scheduled and when repair recreates them, so Waterline can tie command-application or child-result transport back to the durable source before and after transport loss. The redispatch threshold, loop throttle, scan limit, and repeated-failure backoff cap are configured under workflows.v2.task_repair and echoed in Waterline operator metrics as repair_policy. The first dispatch failure can be repaired immediately, while repeated dispatch or claim failures set durable repair_available_at backoff on the task and keep Waterline detail in transport_state = repair_backoff until the next repair window. 
Candidate selection is scope-fair across connection, queue, and compatibility, so one hot repair scope cannot consume every existing-task or missing-run slot in a worker-loop pass while other scopes have candidates. Task claim still respects compatibility markers and the backend capability matrix before leasing, but transport-level recovery no longer depends on the scanning worker being able to execute that task itself. During a rolling upgrade, the fleet view also falls back to the older cache heartbeat format until those workers have restarted onto the database-backed snapshot path.
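The recovery-sweep knobs above live under `workflows.v2.task_repair` and are echoed back in Waterline metrics as `repair_policy`. The key names and values in this fragment are invented placeholders; only the four tunables themselves come from the text.

```php
// config/workflows.php — illustrative recovery-sweep tuning
// (key names under workflows.v2.task_repair are assumptions)
'v2' => [
    'task_repair' => [
        'redispatch_after_seconds'    => 60,  // overdue-ready redispatch threshold
        'loop_throttle_seconds'       => 15,  // minimum gap between sweeps
        'scan_limit'                  => 100, // candidates scanned per pass
        'failure_backoff_cap_seconds' => 900, // repeated-failure backoff cap
    ],
],
```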
  • selected-run detail exposes that fleet view as compatibility_namespace, compatibility_fleet_reason, and compatibility_fleet, with one in-scope worker snapshot carrying worker_id, namespace, host, process_id, connection, queue, supported, supports_required, recorded_at, expires_at, and source, so Waterline can show which active workers are actually advertising the selected marker instead of only a boolean summary, isolate fleets when several apps share one workflow database, and label legacy cache snapshots during mixed-fleet upgrades; when a compatibility namespace is configured, database heartbeat rows must match it, while older cache snapshots remain visible as rollout fallback with namespace = null until those workers restart onto the new path
  • the Waterline dashboard stats endpoint is served through OperatorObservabilityRepository::dashboardSummary(), so totals, recent-run counts, extrema from the run-summary projection, and deeper operator metrics share the same replaceable operator-observability boundary as selected-run detail and history export. The metrics come from durable run summaries, workflow tasks, activity executions, activity attempts, worker compatibility heartbeats, projection rows, history events, and backend capability diagnostics, including archived run counts, runnable task backlog, retrying activity counts, failed activity-attempt counts, delayed and leased task counts, unhealthy transport, claim, or lease counts, repair-needed runs, claim-failed runs, compatibility-blocked runs, selected-run wait projection drift, timeline projection drift, active worker counts, active queue-scope counts, active repair-policy thresholds, scope-fair selected repair candidates, per-queue repair-pressure scopes, the configured database/queue/cache capability snapshot, and how many active workers advertise the current required compatibility marker.
  • the Waterline dashboard stats endpoint also exposes history-budget metrics from durable run summaries, including how many selected runs currently recommend continueAsNew(), the maximum projected event count, the maximum projected history byte size, and the active thresholds
  • the Waterline dashboard stats endpoint also exposes projection health under operator_metrics.projections. run_summaries includes total durable runs, projected summaries, missing summaries, stale summaries whose durable run fields drifted, orphaned summaries, and whether a rebuild is needed. run_waits includes wait row count, projected-run count, canonical wait-run count, projected canonical wait-run count, missing wait-run count, stale projected wait-run count, summaries with current open waits, missing current open-wait rows, orphaned wait rows, and whether a rebuild is needed. run_timeline_entries includes history-event count, timeline row count, projected-run count, canonical history-run count, projected canonical history-run count, missing history-run count, stale projected history-run count, missing history-event rows, orphaned timeline rows, and whether a rebuild is needed. run_timer_entries includes timer row count, projected-run count, canonical timer-run count, projected canonical timer-run count, missing timer-run count, stale projected timer-run count, orphaned timer rows, and whether a rebuild is needed. run_lineage_entries includes lineage row count, projected-run count, canonical lineage-run count, projected canonical lineage-run count, missing lineage-run count, stale projected lineage-run count, orphaned lineage rows, and whether a rebuild is needed. Operators can refresh that bridge with php artisan workflow:v2:rebuild-projections --needs-rebuild --prune-stale, use --missing for only absent summary rows, and use --prune-stale to remove summaries whose run row no longer exists. --needs-rebuild uses the same canonical wait, timeline, timer, and lineage projector comparisons that selected-run detail and history export use, so stale selected-run payload drift is rebuilt even when rows still exist. 
The command rebuilds the selected run's summary, its workflow_run_waits rows, its workflow_run_timeline_entries rows, its workflow_run_timer_entries rows, and its workflow_run_lineage_entries rows in the same pass, and honors configured v2 run, run-summary, run-wait, run-timeline-entry, run-timer-entry, and run-lineage-entry model classes so it repairs the same projection surface Waterline reports
  • selected runs can be exported as a versioned replay/debug bundle through Workflow\V2\Support\HistoryExport, WorkflowStub::historyExport(), Waterline's selected-run history export endpoint, or php artisan workflow:v2:history-export. The bundle keeps the ordered typed history events, selected-run projection metadata under selected_run, selected-run waits and timeline snapshots, command records, signal lifecycles, update lifecycles, task rows, activities, activity attempts, timers, failures, lineage links, run metadata, archive metadata, compatibility marker, payload codec, and raw stored argument/output payloads in one JSON-friendly artifact. It also carries codec_schemas and a payload_manifest so offline consumers can enumerate every encoded payload path, codec, redaction state, Avro framing mode, and writer schema instead of inferring decode rules from section names. Exported activity, activity-attempt, timer, and lineage sections are rebuilt from typed history first, with mutable activity, attempt, timer, and workflow_links rows kept as fallback or enrichment for older data. Exported activity status, unsupported-history diagnostics, synthetic current-attempt visibility, and the diagnostic_only flag come from the same mixed-era activity view as selected-run detail, so row-only terminal or open-row fallback activity evidence stays aligned across both surfaces. Timer exports use the same authority contract as selected-run detail: status is the authoritative timer state, source_status is the status value from that authority, and row_status is only mutable-row diagnostics. Completed, failed, or cancelled activity rows without typed activity history export as unsupported diagnostics instead of durable results. 
Fired or cancelled timer rows without typed timer history also export as unsupported diagnostics with diagnostic_only = true, history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_timer_row_without_typed_history, and the mutable terminal state preserved only as row_status. Lineage export follows that same selected-run snapshot boundary: links.parents[*] and links.children[*] echo the projected lineage payload, including history_authority and diagnostic_only, instead of doing an extra live workflow_links reread on the export path. A configured Workflow\V2\Contracts\HistoryExportRedactor can replace exported payload and diagnostic fields before the artifact leaves the app; the bundle reports redaction.applied, redaction.policy, and the concrete redaction.paths that were passed through that policy. Each exported artifact also carries integrity.canonicalization, a SHA-256 integrity.checksum, and, when workflows.v2.history_export.signing_key is configured, an HMAC-SHA256 integrity.signature plus optional integrity.key_id so warehouse or incident-review tooling can verify the exact redacted bundle it received. Terminal runs set history_complete = true; non-terminal runs can still be exported as point-in-time debugging snapshots but are not archive-complete.
  • the run summary projection carries the operator-facing next-resume view, including business_key, visibility_labels, liveness_state, open_wait_id, resume_source_kind, resume_source_id, next_task_id, next_task_type, next_task_status, sort_timestamp, an opaque sort_key for stable Waterline list ordering, history_event_count, history_size_bytes, continue_as_new_recommended, and is_terminal so list/detail consumers can distinguish closed runs without re-deriving that state from raw status strings; Waterline applies that same sort_timestamp plus run-id tie-breaker contract when it queries list pages, and now uses raw terminal status for dedicated failed, cancelled, and terminated list screens while still preserving status_bucket = failed as the compatibility bridge for the latter two states
  • Waterline list routes echo the active visibility contract under visibility_filters, including the contract version, selected bucket, exact-match field definition, merged applied filters after any saved-view resolution, and the resolved saved_view payload when ?view=... is in effect; the current exact-match filter set covers instance_id, run_id, workflow_type, business_key, compatibility, declared_entry_mode, declared_contract_source, connection, queue, status, status_bucket, closed_reason, wait_kind, liveness_state, repair_blocked_reason, repair_attention, task_problem, is_current_run, continue_as_new_recommended, archived, is_terminal, and exact label[key]=value / labels[key]=value matches. The current contract version is 5, and current builds accept saved views written against versions 1 through 5. The shared definition payload includes field labels, editor input types, bounded-field option catalogs, ordering, label-textarea metadata, and the repair-triage option catalog (description, tone, and badge_visible) so Waterline and other operator clients can render the current filter and repair contract instead of hard-coding one. Waterline also ships built-in system views such as system:running, system:running-task-problems, and system:running-repair-blocked; the repair-blocked view applies repair_attention = true so clients can reuse the durable/searchable badge contract instead of hard-coding reason codes
  • selected-run wait rows are persisted in workflow_run_waits by the run-summary projection pass and distinguish an open backing task from a merely historical task row, so Waterline can tell the difference between healthy resume backing and stale task metadata. Selected-run timeline rows are persisted in workflow_run_timeline_entries, selected-run timer rows in workflow_run_timer_entries, and selected-run lineage rows in workflow_run_lineage_entries, from that same rebuildable projection pass. Selected-run detail and history export both read those surfaces through the same selected-run snapshot contract, so export-level selected_run.waits_projection_source, selected_run.timeline_projection_source, selected_run.timers_projection_source, and selected_run.lineage_projection_source match the same rebuildable wait, timeline, timer, and lineage payloads that detail uses. When rows were already in sync, those sources report workflow_run_waits, workflow_run_timeline_entries, workflow_run_timer_entries, and workflow_run_lineage_entries; when detail or history export had to recreate missing or stale projection rows on read, they report the matching *_rebuilt source instead of falling back to ad hoc live reconstruction. Fleet metrics expose the same missing, stale, and orphaned projection drift under operator_metrics.projections.*, with wait, timeline, timer, and lineage rebuild selection driven by the same canonical projector comparisons used by selected-run detail and export. The rebuilt payloads continue to surface older compatibility bridges only as typed-history-backed diagnostics or enrichment rather than as the primary selected-run contract
  • selected-run detail and history export also report current_run_source, so instance-scoped lookups can say whether the current run came from typed continue-as-new lineage or from the durable run-order fallback when workflow_instances.current_run_id drifted
  • the webhook surface mirrors that command model with explicit start routes plus both instance-targeted and run-targeted command routes under /webhooks/instances/{workflowId} and /webhooks/instances/{workflowId}/runs/{runId}
  • continueAsNew() keeps the public instance id stable, closes the selected run with closed_reason = continued, creates the next run immediately, records an explicit lineage link instead of relying on relationship sentinels, and gives that new run its own accepted start command sourced from the prior run
  • when that handoff happens after workflow-class drift, the new run now stores the resolved class from the durable type map and snapshots its declared signal or update contract from that resolved definition instead of carrying the stale missing FQCN forward forever
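
The run-summary list-ordering contract described above (sort_timestamp plus a run-id tie-breaker) can be sketched as a comparator. This is illustrative only: the array shape, the helper name, and the descending direction are assumptions, not the package's internal representation.

```php
<?php

// Illustrative comparator for the documented list-ordering contract:
// sort_timestamp first, run id as the stable tie-breaker. Newest-first
// is assumed here; swap the first two operands for ascending order.
function compareRunSummaries(array $a, array $b): int
{
    $byTimestamp = $b['sort_timestamp'] <=> $a['sort_timestamp'];

    if ($byTimestamp !== 0) {
        return $byTimestamp;
    }

    // Exact-timestamp ties fall back to the run id so page ordering
    // stays stable across repeated queries.
    return strcmp($a['run_id'], $b['run_id']);
}

$summaries = [
    ['run_id' => 'run-b', 'sort_timestamp' => 100],
    ['run_id' => 'run-a', 'sort_timestamp' => 200],
    ['run_id' => 'run-c', 'sort_timestamp' => 100],
];

usort($summaries, 'compareRunSummaries');
// run-a (newest) first, then run-b / run-c ordered by run id.
```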

When a worker can no longer load a stored PHP workflow or activity class name directly, the engine can fall back to the configured durable type map. That lets you keep the durable workflow_type or activity_type stable across class renames, as long as the new code still registers the old durable key. Starts from reserved instances and later continueAsNew() generations also normalize newly written runs onto that resolved workflow class so command-contract snapshots and future replay do not depend on the dead class name lingering in storage.
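
The fallback described above can be sketched as a resolution step. `resolveWorkflowClass` and the plain-array map shape are hypothetical; the real runtime resolves types through its configured durable type map, not this helper.

```php
<?php

// Illustrative sketch of the durable type-map fallback: the stored FQCN
// is tried first, then the durable key is looked up in the map that the
// renamed code re-registered.
function resolveWorkflowClass(string $storedType, array $typeMap): string
{
    // Happy path: the stored class name still loads directly.
    if (class_exists($storedType)) {
        return $storedType;
    }

    // Fallback: the durable key survives a class rename because the
    // new code kept the old key registered.
    if (isset($typeMap[$storedType])) {
        return $typeMap[$storedType];
    }

    throw new RuntimeException("No class registered for durable type {$storedType}");
}

// A renamed class keeps its old durable key registered (stdClass stands
// in for the new implementation class):
$typeMap = ['App\\Workflows\\LegacyOrderWorkflow' => \stdClass::class];

$resolved = resolveWorkflowClass('App\\Workflows\\LegacyOrderWorkflow', $typeMap);
```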

The current task types are:

  • workflow task
  • activity task
  • timer task

Child waiting is modeled through the durable parent/child link and child run state rather than through a separate child task type. When a child closes, the parent resumes through a normal workflow task whose payload names the child-resolution source.

Queries, updates, and child workflows are implemented. The child-workflow surface is intentionally narrow in the current release.

In the current child-workflow slice, calling child() creates a durable child instance and child run plus a durable parent/child lineage link. That child run also gets its own accepted start command with source = workflow, so selected-run command history can explain which parent run, workflow step, and stable child_call_id created it. The parent run waits with wait_kind = child while that child run is active. When the child closes, the runtime first records the parent's own typed child-resolution event, then creates a parent workflow task whose payload carries workflow_wait_kind = child, child_call_id, child_workflow_run_id, open_wait_id, and the child_workflow_run resume source. A completed child resumes the parent with the child output, while a failed child resumes the parent by throwing an exception derived from parent-side child failure history. Parent replay and query now require the parent's typed ChildRun* history as the durable child-outcome authority. Child terminal history and legacy mutable child rows remain available for lineage, diagnostics, and payload enrichment after the parent-side resolution event exists, but a terminal child row without parent typed child history is not enough to resume or query the parent. In particular, once the parent has durably entered a child wait through ChildWorkflowScheduled or ChildRunStarted, query replay keeps that step blocked until the parent commits its own ChildRunCompleted, ChildRunFailed, ChildRunCancelled, or ChildRunTerminated history instead of treating a drifted terminal child row as if the parent had already observed the outcome. 
If only the child terminal row or link survives and the parent typed child step history is missing, worker and query replay block with history_shape_mismatch and recorded events no typed history; selected-run child waits surface status = unsupported, history_authority = unsupported_terminal_without_history, and history_unsupported_reason = terminal_child_link_without_typed_parent_history, and run-summary liveness reports workflow_replay_blocked when that unsupported child fallback is the selected run's only apparent progress source. If the parent workflow task row is lost after that resolution event, selected-run detail stays anchored to the typed child-resolution history and repair recreates the task with the same child payload.
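
The child-outcome authority rule above can be sketched as a replay decision. The event names come from this page; the function shape and the returned string codes are invented for illustration.

```php
<?php

// Illustrative decision sketch: parent-side typed resolution history is
// the only durable authority for a child outcome. A drifted terminal
// child row without that history blocks replay instead of resuming.
function childStepReplayDecision(array $parentTypedEvents, bool $childRowIsTerminal): string
{
    $resolutionEvents = [
        'ChildRunCompleted', 'ChildRunFailed',
        'ChildRunCancelled', 'ChildRunTerminated',
    ];

    foreach ($parentTypedEvents as $event) {
        if (in_array($event, $resolutionEvents, true)) {
            return 'resume';
        }
    }

    // Terminal child row but no parent typed history: replay blocks
    // rather than guessing the outcome from a mutable side row.
    if ($childRowIsTerminal) {
        return 'history_shape_mismatch';
    }

    // Child still open: the parent simply keeps waiting.
    return 'still_waiting';
}
```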

Explicitly unsupported older activity, timer, and child fallbacks remain visible in selected-run waits as diagnostics, but selected-run task detail does not synthesize missing transport rows for those unsupported waits.

The runtime also supports all([...]) fan-in barriers for activities, for child workflows, and for mixed activity-plus-child groups, including nested all([...]) groups inside the same workflow step. Build those barriers with closures such as fn () => activity(...) and fn () => child(...), or nested all([...]) groups. The parent schedules every durable leaf member, waits until the whole enclosing barrier tree can make progress, returns results in the original nested array shape once every member completes successfully, wakes immediately on the first failed activity or the first failed/cancelled/terminated child, and otherwise suppresses the parent wake-up task until the last successful member in every enclosing group closes. When several failed, cancelled, or terminated members are already closed before the parent replays, the thrown failure is selected by earliest recorded close time, with the lower barrier leaf index as the exact-timestamp tie break; the same rule is used for query replay. Waterline detail exposes those grouped waits with open_wait_count, innermost parallel_group_* metadata, and parallel_group_path when one open wait belongs to more than one barrier. Homogeneous activity barriers use parallel_group_kind = activity, homogeneous child barriers use parallel_group_kind = child, and mixed barriers use parallel_group_kind = mixed with parallel-calls:* group ids so multiple open waits stay visible as one coherent barrier instead of one activity or child pretending to be the only active wait. Replay compatibility comes from committed typed history: if all typed leaf events for a grouped activity or child are missing parallel_group_path, replay blocks as history_shape_mismatch instead of guessing from mutable side rows.
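
The failure-selection rule above (earliest recorded close time, lower leaf index as the exact-timestamp tie-break) can be sketched directly. The array shape is an assumption, not the package's internal model.

```php
<?php

// Illustrative selection of which already-closed barrier failure the
// parent throws: earliest close time wins, and an exact-timestamp tie
// goes to the lower leaf index.
function selectBarrierFailure(array $closedFailures): ?array
{
    $selected = null;

    foreach ($closedFailures as $failure) {
        if ($selected === null
            || $failure['closed_at'] < $selected['closed_at']
            || ($failure['closed_at'] === $selected['closed_at']
                && $failure['leaf_index'] < $selected['leaf_index'])) {
            $selected = $failure;
        }
    }

    return $selected;
}

$failures = [
    ['leaf_index' => 2, 'closed_at' => 50, 'error' => 'late'],
    ['leaf_index' => 1, 'closed_at' => 40, 'error' => 'tied-low'],
    ['leaf_index' => 0, 'closed_at' => 40, 'error' => 'tied-lower'],
];

$thrown = selectBarrierFailure($failures);
// Earliest close time is 40; leaf index 0 wins the tie, so 'tied-lower'
// is the failure the parent observes.
```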

async($callback) is implemented as a package-owned child workflow with workflow type durable-workflow.async. That means async callbacks get real child run ids, parent-side child_call_id lineage, typed child start and close history, and Waterline child-wait visibility instead of a side channel. Async callbacks use the same straight-line-only helper contract as named workflows, and the public helper rejects generator-style yield callbacks. The callback is serialized with Laravel's serializable closure support, so named child(...) workflows remain the better contract for cross-service or long-lived public workflow types.

The runtime includes history-backed child handles through $this->child() and $this->children(), plus parent-side child signaling helpers on those handles. Separate launch handles for async(...) and higher-level bounded-concurrency helpers are still not part of the current surface.

Operator-command behavior is intentionally engine-level:

  • repair() targets the current selected run and restores durable progress when liveness_state = repair_needed
  • accepted repair commands currently return repair_dispatched when the runtime re-dispatches an overdue ready task, reclaims an expired lease, or recreates a missing workflow, child-resolution workflow, accepted-update, accepted-signal, pending-activity, timer, or condition-timeout workflow task; they return repair_not_needed when the run already has a healthy durable resume path, when the selected run is already inside an in-flight activity with authoritative typed activity history but no open task row, or when a caller forces repair on older diagnostic-only mutable activity, timer, or child rows that do not name a durable repair candidate
  • accepted repair commands also append typed RepairRequested history entries, which point at the repaired task when one was needed
  • healthy signal waits before receipt stay read-only from a repair perspective because wait_kind = signal already names the durable satisfier; after SignalReceived, a missing application workflow task is repairable and stays identified as a signal application wait with workflow_wait_kind = signal
  • healthy child waits stay read-only while the child is still open; after parent-side ChildRunCompleted, ChildRunFailed, ChildRunCancelled, or ChildRunTerminated history is committed, a missing parent resume workflow task is repairable and stays identified as a child-resolution wait with workflow_wait_kind = child
  • a running activity without an open activity task is surfaced as liveness_state = activity_running_without_task only when typed activity history is still authoritative for that in-flight execution; Waterline leaves the run observable but hides Repair so operators do not duplicate in-flight user code
  • older open activity, timer, and child waits that only survive as mutable rows or links without typed history remain visible as diagnostics with history_authority = mutable_open_fallback and diagnostic_only = true, but they no longer populate the selected-run durable wait_kind, open_wait_id, or resume_source_* contract; instead the run projects liveness_state = workflow_replay_blocked, hides Repair with repair_blocked_reason = unsupported_history, and treats those rows as observability-only evidence rather than a durable resume or repair source
  • selected-run detail exposes per-action operator availability fields for the implemented surface: can_query / query_blocked_reason, can_signal / signal_blocked_reason, can_update / update_blocked_reason, can_repair / repair_blocked_reason, the durable/searchable repair_attention bridge, repair_blocked, can_archive / archive_blocked_reason, and can_cancel / cancel_blocked_reason plus can_terminate / terminate_blocked_reason; repair_blocked is the stable metadata companion for the reason code and carries the operator-facing label, description, tone, and whether Waterline should badge it in list views. The older can_issue_terminal_commands flag remains the coarse compatibility bridge for terminal controls
  • cancel() closes the current run as cancelled
  • terminate() closes the current run as terminated
  • accepted terminal commands record durable command rows and typed history such as CancelRequested / WorkflowCancelled or TerminateRequested / WorkflowTerminated
  • archive() marks a closed selected run as archived while preserving its durable history, command audit trail, and history export; it records ArchiveRequested plus WorkflowArchived, accepts already archived runs as archive_not_needed, and rejects open runs as rejected_run_not_closed
  • rejected repair, cancel, terminate, and archive commands still record durable rejected command rows with outcomes such as rejected_not_started, rejected_not_current, rejected_not_active, or rejected_run_not_closed
  • open workflow tasks, pending activity executions, and pending timers are marked cancelled
  • open timer waits are superseded durably, and late timer jobs no-op instead of reopening the run
  • workflow-level cancellation does not magically hard-stop arbitrary user code already executing inside an activity process
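
The repair outcome contract above can be sketched as a decision function. This is a heavy simplification: the real runtime inspects durable task rows, leases, and typed history, and every field name below is invented for illustration.

```php
<?php

// Highly simplified sketch of which condition maps to which reported
// repair outcome. Only the outcome strings come from the docs.
function repairOutcome(array $run): string
{
    if ($run['liveness_state'] !== 'repair_needed') {
        // A healthy durable resume path already exists.
        return 'repair_not_needed';
    }

    if ($run['missing_task'] || $run['lease_expired'] || $run['task_overdue']) {
        // The runtime recreates, reclaims, or re-dispatches the task.
        return 'repair_dispatched';
    }

    return 'repair_not_needed';
}
```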

Queues

Queued jobs are background processes that are scheduled to run at a later time. Laravel supports running queues via Amazon SQS, Redis, or even a relational database. Workflows and activities are both queued jobs, but each behaves a little differently. A workflow will be dispatched multiple times during normal operation: it runs, dispatches one or more activities, and then exits until those activities complete. An activity will only execute once during normal operation; it is retried only in the case of an error.
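
The dispatch rhythm above can be simulated with plain counters: the workflow job is dispatched once per resume, while each activity executes exactly once. This toy loop is not the package API, only an illustration of the job counts involved.

```php
<?php

// Toy simulation: three sequential activities mean three activity
// executions but four workflow dispatches (the initial dispatch plus
// one resume after each activity completes).
$activities = ['a', 'b', 'c'];
$workflowDispatches = 0;
$activityExecutions = 0;
$completed = [];

do {
    $workflowDispatches++;                  // workflow job wakes up
    $next = $activities[count($completed)] ?? null;

    if ($next !== null) {
        $activityExecutions++;              // activity job runs once
        $completed[] = $next;               // result saved; workflow re-dispatched
    }
} while ($next !== null);
```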

Event Sourcing

Event sourcing is a way to build up the current state from a sequence of saved events rather than saving the state directly. This has several benefits, such as providing a complete history of the execution events which can be used to resume a workflow if the server it is running on crashes.
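
A minimal sketch of that idea: current state is rebuilt by folding over the saved event stream instead of being read from a stored state row. The event names here are invented for illustration and are not this package's typed history schema.

```php
<?php

// Minimal event-sourcing fold: replaying the same events always
// rebuilds the same state, which is what makes crash recovery possible.
function rebuildState(array $events): array
{
    $state = ['status' => 'created', 'results' => []];

    foreach ($events as $event) {
        switch ($event['type']) {
            case 'WorkflowStarted':
                $state['status'] = 'running';
                break;
            case 'ActivityCompleted':
                $state['results'][] = $event['result'];
                break;
            case 'WorkflowCompleted':
                $state['status'] = 'completed';
                break;
        }
    }

    return $state;
}

$state = rebuildState([
    ['type' => 'WorkflowStarted'],
    ['type' => 'ActivityCompleted', 'result' => 42],
    ['type' => 'WorkflowCompleted'],
]);
```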

Coroutines

Coroutines are functions that allow execution to be suspended and resumed by returning control to the calling function. Durable suspension points are expressed as straight-line Fiber-backed helper calls such as activity(), await(), timer(), and sideEffect().

User workflow code lives in handle(), which is an ordinary method that calls those suspension helpers directly. Older workflows and activities that still implement execute() continue to load through a compatibility path, but the runtime rejects mixed handle()/execute() inheritance so an entry method never silently changes across one class hierarchy. The runtime first checks whether the step already completed durably. If so, the cached result is replayed from history instead of running the step a second time. Otherwise, the runtime queues the next activity, timer, or child work and suspends until that durable step completes or fails.
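
The replay-or-run rule above can be sketched with a cached-step helper. The storage shape and function name are assumptions; the point is only that a durably completed step is replayed, never rerun.

```php
<?php

// Illustrative replay-or-run sketch: if the step already completed
// durably, its cached result is returned; otherwise the step runs once
// and the result is recorded for future replays.
function runStep(int $stepIndex, array &$history, callable $step)
{
    // Replay path: never rerun a durably completed step.
    if (array_key_exists($stepIndex, $history)) {
        return $history[$stepIndex];
    }

    // Live path: execute once, then persist the result.
    $result = $step();
    $history[$stepIndex] = $result;

    return $result;
}

$history = [];
$calls = 0;
$step = function () use (&$calls) { $calls++; return 'done'; };

runStep(0, $history, $step);   // executes the step
runStep(0, $history, $step);   // replays the cached result
// $calls stays at 1: replay never reran the user code.
```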

Activities

By calling multiple activities, a workflow can orchestrate the results between each of the activities. The execution of the workflow and the durable steps it schedules are interleaved: the workflow reaches an activity call, suspends until that activity completes, and then continues execution from where it left off.

If a workflow fails, the events leading up to the failure are replayed to rebuild the current state. This allows the workflow to pick up where it left off, with the same inputs and outputs as before, ensuring determinism.

Promises

Promises are used to represent the result of an asynchronous operation, such as an activity. The helper call itself suspends through the Fiber-backed runtime while keeping deterministic wait behavior.

Example

Straight-line Fiber-backed helpers are the authoring model.

use Workflow\V2\Workflow;
use function Workflow\V2\{activity, all};

class MyWorkflow extends Workflow
{
    public function handle(): array
    {
        return [
            activity(TestActivity::class),
            activity(TestOtherActivity::class),
            all([
                fn () => activity(TestParallelActivity::class),
                fn () => activity(TestParallelOtherActivity::class),
            ]),
        ];
    }
}

Sequence Diagram

This sequence diagram shows how a workflow progresses through a series of activities, both serial and parallel.

Workflow Sequence Diagram
  1. The workflow starts by getting dispatched as a queued job.
  2. The first activity, TestActivity, is then dispatched as a queued job. The workflow job then exits. Once TestActivity has completed, it saves the result to the database and returns control to the workflow by dispatching it again.
  3. At this point, the workflow enters the event sourcing replay loop. This is where it goes back to the database and looks at the event stream to rebuild the current state. This is necessary because the workflow is not a long running process. The workflow exits while any activities are running and then is dispatched again after completion.
  4. Once the event stream has been replayed, the workflow continues to the next activity, TestOtherActivity, and starts it by dispatching it as a queued job. Again, once TestOtherActivity has completed, it saves the result to the database and returns control to the workflow by dispatching it as a queued job.
  5. The workflow then enters the event sourcing replay loop again, rebuilding the current state from the event stream.
  6. Next, the workflow starts two parallel activities, TestParallelActivity and TestParallelOtherActivity. Both activities are dispatched. Once they have completed, they save the results to the database and return control to the workflow.
  7. Finally, the workflow enters the event sourcing replay loop one last time to rebuild the current state from the event stream. This completes the execution of the workflow.

Summary

The sequence diagram illustrates TestActivity and then TestOtherActivity executing in series. After both activities complete, the workflow replays the events in order to rebuild the current state. This process is necessary to ensure that the workflow can be resumed after a crash or other interruption.

The need for determinism comes into play when the events are replayed. For the workflow to rebuild the correct state, the workflow code must produce the same result when run multiple times with the same inputs. This means workflow code should avoid things like random numbers or the current date and time (unless routed through a recorded side effect such as sideEffect()), as these produce different results on each run.
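
The side-effect escape hatch works by recording the value on first execution and replaying the recording afterward. This sketch mimics the intent of sideEffect(), not its actual implementation; the helper name and store shape are assumptions.

```php
<?php

// Illustrative record-then-replay sketch: the first execution actually
// produces the non-deterministic value, every replay reuses it.
function recordedSideEffect(string $key, array &$recordings, callable $producer)
{
    if (!array_key_exists($key, $recordings)) {
        // First execution: produce and record the value.
        $recordings[$key] = $producer();
    }

    // Replays return the recording, keeping the workflow deterministic.
    return $recordings[$key];
}

$recordings = [];
$first  = recordedSideEffect('token', $recordings, fn () => random_int(1, PHP_INT_MAX));
$replay = recordedSideEffect('token', $recordings, fn () => random_int(1, PHP_INT_MAX));
// $first === $replay even though random_int() would differ per call.
```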

The need for idempotency comes into play when an API fails to return a response even though it has actually completed successfully. For example, if an activity charges a customer and is not idempotent, rerunning it after a failed response could result in the customer being charged twice. To avoid this, activities should be designed to be idempotent.
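
One common way to get that property is an idempotency key, so a retried charge becomes a no-op instead of a double charge. The in-memory array below stands in for a real database or payment-provider API; nothing here is this package's surface.

```php
<?php

// Illustrative idempotency sketch: the same key makes a retry return
// the original charge instead of charging again.
function chargeOnce(string $idempotencyKey, int $amountCents, array &$charges): int
{
    if (isset($charges[$idempotencyKey])) {
        // Retry after a lost response: return the original charge.
        return $charges[$idempotencyKey];
    }

    $charges[$idempotencyKey] = $amountCents;

    return $amountCents;
}

$charges = [];
chargeOnce('order-123', 5000, $charges);   // real charge
chargeOnce('order-123', 5000, $charges);   // safe retry, no second charge
// The customer was charged exactly once.
```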