Monitoring
Waterline
Waterline is a separate UI that works nicely alongside Horizon. Think of Waterline as being to workflows what Horizon is to queues.
The Waterline bridge supports both the legacy workflow tables and the v2 run-summary bridge.
Use the default auto-detection mode when you want Waterline to switch onto the v2 bridge as soon as the workflow package's full v2 operator surface is installed:
WATERLINE_ENGINE_SOURCE=auto
Pin the legacy engine during a mixed-fleet rollout or while you intentionally keep Waterline on the v1 tables:
WATERLINE_ENGINE_SOURCE=v1
Pin the v2 bridge when you want Waterline to require the v2 operator surface and fail clearly if the required tables or configured models are missing or unreadable:
WATERLINE_ENGINE_SOURCE=v2
/waterline/api/stats now includes an engine_source object that reports the configured mode, the resolved mode, whether Waterline is actively using v2, any surfaced readiness issues, and the required-table inspection results. It also carries readiness_contract.version = 1, the frozen v2 readiness matrix for boot/install, dispatch, claim, stats, and health. That matrix names the code authority for each surface: WaterlineEngineSource::status decides whether the v2 operator tables are installed, BackendCapabilities::snapshot decides whether database/queue/cache/codec dispatch is supported, TaskBackendCapabilities::recordClaimFailureIfUnsupported records per-task claim failures, OperatorMetrics::snapshot owns v2 stats, and HealthCheck::snapshot owns the deeper v2 health checks. When engine_source=v2 is pinned but the v2 operator surface is incomplete, Waterline list/detail/export/saved-view/stats routes return HTTP 503 with that same engine_source payload instead of silently falling back to v1. Instance-scoped /waterline/api/instances/... routes remain v2-only; when Waterline is pinned to v1 or auto falls back to v1, those routes return 404 because the legacy bridge does not expose the public instance-id contract.
The final engine_source mode behavior is:
- auto resolves to v2 and sets uses_v2 = true only when every configured v2 operator model resolves to an available table. If that surface is incomplete, auto resolves to v1, keeps legacy stats available with readiness diagnostics, and makes /waterline/api/v2/health return 503.
- v1 is an explicit legacy pin. It resolves to v1, sets uses_v2 = false, leaves legacy stats available with readiness diagnostics, makes /waterline/api/v2/health return 503, and leaves instance-scoped v2 routes unavailable.
- v2 is a strict pin. It resolves to v2; when the operator surface is complete it enables v2 stats, health, and instance routes, and when the surface is incomplete it sets uses_v2 = false and returns 503 for v2 Waterline surfaces with the readiness payload.
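The mode behavior above can be restated as a small decision table. The following is a minimal client-side sketch, not Waterline's implementation: resolve_engine_source, EngineSourceOutcome, and the v2_surface_complete flag are hypothetical names introduced here, and the sketch only encodes the outcomes stated above.

```python
from dataclasses import dataclass

VALID_MODES = ("auto", "v1", "v2")


@dataclass(frozen=True)
class EngineSourceOutcome:
    configured: str              # value of WATERLINE_ENGINE_SOURCE
    resolved: str                # mode Waterline ends up in
    uses_v2: bool                # True only when the v2 surface is actually in use
    serves_legacy_stats: bool    # legacy stats stay available with readiness diagnostics
    v2_surfaces_available: bool  # v2 stats, health, and instance routes


def resolve_engine_source(configured: str, v2_surface_complete: bool) -> EngineSourceOutcome:
    """Restate the documented mode table (illustrative helper, not a Waterline API)."""
    if configured not in VALID_MODES:
        raise ValueError(f"unknown engine source mode: {configured!r}")
    if configured == "auto":
        # auto switches to v2 only when the whole v2 operator surface is installed.
        resolved = "v2" if v2_surface_complete else "v1"
    else:
        # v1 and v2 are explicit pins and never resolve to the other mode.
        resolved = configured
    uses_v2 = resolved == "v2" and v2_surface_complete
    return EngineSourceOutcome(
        configured=configured,
        resolved=resolved,
        uses_v2=uses_v2,
        serves_legacy_stats=(resolved == "v1"),
        v2_surfaces_available=uses_v2,
    )


if __name__ == "__main__":
    for mode in VALID_MODES:
        for complete in (True, False):
            print(resolve_engine_source(mode, complete))
```

Note the one case where resolved and uses_v2 disagree: a v2 pin with an incomplete surface still resolves to v2 but reports uses_v2 = false, which is how a strict pin surfaces as 503 responses rather than a silent fallback.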
Waterline reads v2 selected-run detail, list-item projections, history-export payloads, dashboard stats, and operator metrics through Workflow\V2\Contracts\OperatorObservabilityRepository. The workflow package binds a default implementation that returns the built-in v2 operator contract, and applications can replace that binding when they need to front Waterline with an app-owned repository, tenancy scope, authorization policy, or cached projection layer.
All v2 operator detail payloads return typed JSON values for workflow arguments, output, activity arguments, activity results, command payloads, signal arguments, update arguments, update results, query results, and exception payloads. The browser does not need to unserialize engine-internal encoding to render durable workflow truth — every value in the JSON response is already structured data that JSON clients can use directly.
List-item contract
With v2 enabled, the Waterline list routes (/waterline/api/flows/{bucket}) project each paginator row through the typed list-item contract (RunListItemView) instead of returning raw summary-model arrays. The contract defines exactly which fields a fleet-list consumer can rely on:
- id
- workflow_instance_id
- instance_id
- selected_run_id
- run_id
- run_number
- is_current_run
- engine_source
- class
- workflow_type
- namespace
- business_key
- compatibility
- status
- status_bucket
- is_terminal
- closed_reason
- started_at
- closed_at
- created_at
- updated_at
- sort_timestamp
- sort_key
- duration_ms
- archived_at
- archive_reason
- wait_kind
- wait_reason
- liveness_state
- visibility_labels
- search_attributes
- repair_attention
- repair_blocked_reason
- repair_blocked
- task_problem
- task_problem_badge
- declared_entry_mode
- declared_contract_source
- exception_count
- history_event_count
- history_size_bytes
- continue_as_new_recommended
- connection
- queue
repair_blocked and task_problem_badge are computed badge metadata (code, label, description, tone, badge_visible) generated from the stored reason and state columns — the list contract applies the same badge logic that selected-run detail uses.
Fields that appear in selected-run detail but not in the list-item contract — such as open_wait_id, resume_source_kind, resume_source_id, next_task_id, projection_schema_version, run_navigation, command/signal/update/wait/task/timeline scopes, and per-run collections — are intentionally excluded from the fleet-list surface. Consumers that need those fields should fetch the selected-run detail route.
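A fleet-list consumer may want to check rows against the list-item contract before relying on them. The sketch below is an illustration under stated assumptions: missing_list_item_fields and visible_badges are hypothetical helpers, rows are assumed to be already-decoded JSON objects, and badge values are assumed to be either an object with the badge keys named above or absent.

```python
# Field names copied from the list-item contract above.
LIST_ITEM_FIELDS = frozenset("""
id workflow_instance_id instance_id selected_run_id run_id run_number
is_current_run engine_source class workflow_type namespace business_key
compatibility status status_bucket is_terminal closed_reason started_at
closed_at created_at updated_at sort_timestamp sort_key duration_ms
archived_at archive_reason wait_kind wait_reason liveness_state
visibility_labels search_attributes repair_attention repair_blocked_reason
repair_blocked task_problem task_problem_badge declared_entry_mode
declared_contract_source exception_count history_event_count
history_size_bytes continue_as_new_recommended connection queue
""".split())

# The two computed badge fields described above.
BADGE_FIELDS = ("repair_blocked", "task_problem_badge")


def missing_list_item_fields(row):
    """Return the contract fields that are absent from a list row, sorted by name."""
    return sorted(LIST_ITEM_FIELDS.difference(row))


def visible_badges(row):
    """Return badge objects whose badge_visible flag is truthy (assumed shape)."""
    badges = []
    for field in BADGE_FIELDS:
        badge = row.get(field)
        if isinstance(badge, dict) and badge.get("badge_visible"):
            badges.append(badge)
    return badges
```

Fields outside this set, such as open_wait_id or run_navigation, are deliberately not checked here because they belong to the selected-run detail route.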
Selected-run detail fields
The selected-run detail route returns a richer payload that includes everything in the list-item contract plus:
- current_run_id
- current_run_source
- current_run_status
- current_run_status_bucket
- declared_entry_method
- declared_entry_declaring_class
- compatibility_supported
- compatibility_reason
- open_wait_id
- open_wait_count
- resume_source_kind
- resume_source_id
- next_task_at
- next_task_id
- next_task_type
- next_task_status
- next_task_lease_expires_at
- liveness_reason
- exception_count
- exceptions_count
- can_issue_terminal_commands
- can_archive
- archive_blocked_reason
- can_repair
- read_only_reason
- run_navigation
- activities_scope
- commands_scope
- signals_scope
- updates_scope
- waits_scope
- tasks_scope
- timeline_scope
- lineage_scope
- waits
- tasks
- signals
- updates
- parents
- continuedWorkflows
- timeline
- timeline[*].entry_kind
- timeline[*].source_kind
- timeline[*].source_id
- timeline[*].workflow_sequence
- timeline[*].signal_id
- timeline[*].signal_wait_id
- timeline[*].child_call_id
- timeline[*].version_change_id
- timeline[*].version
- timeline[*].version_min_supported
- timeline[*].version_max_supported
- command-level source metadata such as
commands[*].source, commands[*].caller_label, commands[*].auth_status, commands[*].request_path, commands[*].request_fingerprint, the accepted payload preview under commands[*].payload, its codec under commands[*].payload_codec, the caller-requested run under commands[*].requested_run_id, the engine-resolved run under commands[*].resolved_run_id, workflow-originated context under commands[*].context.workflow, and compound-intake linkage under commands[*].context.intake with mode plus group_id; request-ingress metadata stays flattened onto the command row rather than leaking the full raw request blob through commands[*].context
- signal lifecycle detail such as signals[*].id, signals[*].command_id, signals[*].command_sequence, signals[*].workflow_sequence, signals[*].name, signals[*].signal_wait_id, signals[*].status, signals[*].outcome, signals[*].rejection_reason, signals[*].validation_errors, signals[*].received_at, signals[*].closed_at, and the compatibility bridge fields commands[*].signal_id, commands[*].signal_status, and commands[*].signal_wait_id
- update lifecycle detail such as updates[*].id, updates[*].command_id, updates[*].command_sequence, updates[*].workflow_sequence, updates[*].name, updates[*].status, updates[*].outcome, updates[*].rejection_reason, updates[*].failure_id, updates[*].failure_message, updates[*].exception_type, updates[*].exception_class, updates[*].exception_resolved_class, updates[*].exception_resolution_source, updates[*].exception_resolution_error, updates[*].exception_replay_blocked, updates[*].accepted_at, updates[*].closed_at, and the compatibility bridge fields commands[*].update_id, commands[*].update_status, commands[*].failure_id, plus commands[*].failure_message; accepted-only submitted updates keep updates[*].status = accepted, updates[*].outcome = null, and updates[*].workflow_sequence = null until the workflow worker applies or fails the update, and failed update rows prefer typed UpdateCompleted failure history keyed by durable update_id over mutable update, command, or failure rows. When old or repaired data no longer has command provenance, Waterline can still show the updates[*] lifecycle row, with updates[*].command_id = null if no command id is recoverable.
- final v2 writes signal and update lifecycle rows on the command path, and Waterline reads those durable rows for selected-run detail, export, and operator actions
- task-level compatibility metadata such as tasks[*].compatibility, tasks[*].compatibility_supported, and tasks[*].compatibility_reason, with older preview tasks inheriting the selected run's marker when their own task row predates task-level compatibility storage
- task-level transport metadata such as tasks[*].transport_state, tasks[*].task_missing, tasks[*].synthetic, tasks[*].expected_task_id, tasks[*].dispatch_failed, tasks[*].dispatch_overdue, tasks[*].claim_failed, tasks[*].last_dispatch_attempt_at, tasks[*].last_dispatched_at, tasks[*].last_dispatch_error, tasks[*].last_claim_failed_at, and tasks[*].last_claim_error
- child-resolution workflow task metadata such as tasks[*].workflow_wait_kind = child, tasks[*].workflow_open_wait_id, tasks[*].workflow_resume_source_kind, tasks[*].workflow_resume_source_id, tasks[*].workflow_sequence, tasks[*].child_call_id, and tasks[*].child_workflow_run_id
Activity detail rows expose activity_type, idempotency_key, attempt_count, attempt_id, retry_policy, started_at, last_heartbeat_at, closed_at, and an attempts list with one row per durable activity try. That attempt list rebuilds from typed ActivityStarted, ActivityHeartbeatRecorded, ActivityRetryScheduled, ActivityCompleted, ActivityFailed, and ActivityCancelled history first, so task ids, worker lease owners, heartbeat timestamps, lease expiry, cancellation, and close status remain visible if mutable activity_attempts or task rows drift or disappear. The retry policy is snapped when the activity is scheduled, so Waterline can explain the retry budget that applied to the selected execution even if the PHP activity class later changes. Timeline entries include related activity, timer, child, command, task, failure, and version-marker metadata when available. The entry identity fields (entry_kind, source_kind, and source_id) name the durable source row for that history point, while the entry's primary command, task, activity, timer, child, and failure state is assembled from the recorded event payload first so earlier ActivityScheduled, ActivityStarted, ActivityHeartbeatRecorded, ActivityCancelled, TimerScheduled, UpdateAccepted, UpdateApplied, ActivityFailed, WorkflowFailed, signal-application points, and VersionMarkerRecorded entries do not silently inherit later mutable row state.
Calling Workflow\V2\Activity::heartbeat() writes that last_heartbeat_at value onto the currently claimed durable activity-attempt row, mirrors it onto the live activity execution, renews the leased activity task plus the selected run's next_task_lease_expires_at, and appends an ActivityHeartbeatRecorded history point. Waterline uses that history point as the display authority for historical attempt heartbeat detail. Late heartbeats from a reclaimed older attempt are ignored before they can mutate the newer current attempt.
The upgrade path also backfills older already-started activity executions that predate durable activity_attempts into one latest-known attempt row plus current_attempt_id. That keeps Waterline's top-level attempt_id, attempt_count, last_heartbeat_at, and attempts[*] fields stable across mixed-era preview data, while being explicit that earlier previews did not durably record every older closed attempt separately.
The v2 bridge reads run summaries, typed workflow history, timer waits, failure projections, and command history; it reads activity executions only as a compatibility fallback when older preview runs predate the richer activity payloads. The implemented slices currently cover:
- start with distinct instance and run ids
- durable start command ids with started_new, returned_existing_active, and rejected_duplicate outcomes
- activity scheduling, completion, failure, cancellation, and handled failure continuation
- activity heartbeats that persist last_heartbeat_at for the current attempt row, renew the leased activity task, and append typed ActivityHeartbeatRecorded history
- timer scheduling, firing, and timer-backed wait visibility in Waterline
- child workflow scheduling, durable parent/child linkage, parent waits while a child run is active, parent-side typed child-resolution history, parent resume on child completion or failure, child-resolution workflow-task repair, and child wait/timeline visibility in Waterline
- named signal waits, accepted signal commands, and signal-applied history visibility in Waterline
- accepted-only submitted updates with worker-applied lifecycle visibility in Waterline
- typed side-effect history visibility in Waterline timelines
- typed version-marker history visibility in Waterline timelines
- continue-as-new lineage with stable instance ids and distinct run ids
- liveness-driven repair() commands that recreate the missing durable workflow, child-resolution workflow, accepted-update, accepted-signal, activity, or timer task for runs marked repair_needed
- explicit cancel and terminate commands, including cancelled and terminated terminal run states
- explicit archive commands for closed selected runs, including archived metadata, typed archive history, and archive_not_needed handling for already archived runs
- durable next-task and liveness projection data for open activity, timer, workflow-task, and signal-wait states
- typed event-history visibility sourced from durable workflow_history_events
- run-summary history budget fields for history_event_count, history_size_bytes, and continue_as_new_recommended
- history-first timeline snapshots for command, task, activity, timer, child, and failure detail, including activity started, heartbeat, and closed timestamps plus timer delay, deadline, fired, and cancelled timestamps, with live row enrichment retained only for remaining compatibility fields
- operator metrics derived from durable run summaries, workflow tasks, activity executions, activity attempts, start commands, and worker compatibility heartbeats for archive counts, backlog, activity retry pressure, start latency, repair, compatibility-blocked, history-budget, and active-worker dashboard signals
- versioned selected-run history exports for replay debugging and offline inspection
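The run-summary history budget fields in the list above lend themselves to a simple client-side check. The sketch below is an illustration, not Waterline code: history_budget_usage is a hypothetical helper, the payload is assumed to be an already-decoded selected-run detail object, and the threshold fields history_event_threshold and history_size_bytes_threshold (exposed by the detail routes) are assumed to be positive numbers when present.

```python
def history_budget_usage(detail):
    """Summarize how much of the continue-as-new budget a run has used.

    Returns fractions of the configured thresholds, or None for a dimension
    whose count or threshold is missing from the payload.
    """

    def fraction(count_key, threshold_key):
        count = detail.get(count_key)
        threshold = detail.get(threshold_key)
        # A missing or zero threshold cannot be turned into a fraction.
        if count is None or not threshold:
            return None
        return count / threshold

    return {
        "events": fraction("history_event_count", "history_event_threshold"),
        "bytes": fraction("history_size_bytes", "history_size_bytes_threshold"),
        # The server-computed flag stays the authority; the fractions are
        # only a display aid.
        "recommended": bool(detail.get("continue_as_new_recommended")),
    }
```

Because the thresholds travel in the payload, a dashboard built this way does not need to replay the workflow or scan history to show how close a run is to its budget.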
In the v2 bridge:
- list routes are still run-centric, with each row keyed by run id while also exposing instance_id
- list routes now sort by the durable run-summary contract (sort_timestamp descending, then run id) instead of inferring recency from raw ids or Waterline's legacy workflow_sort_column setting
- list rows expose both sort_timestamp and an opaque sort_key; that key encodes the same sort_timestamp plus run-id tie-breaker contract that Waterline applies server-side, so polling and page-1 refresh can detect newer rows without assuming numeric or lexicographically ordered ids
- list and detail payloads expose the stable workflow_type alongside the stored class name
- detail routes expose run metadata, closed_reason, archive metadata (archived_at, archive_command_id, and archive_reason), selected-run waits and tasks collections, activity and timer compatibility logs, dedicated activities, signals, and updates collections, exceptions compatibility rows, chart data, command history, repair_blocked_reason, repair_attention, repair_blocked, task_problem, task_problem_badge, can_issue_terminal_commands, can_cancel, cancel_blocked_reason, can_terminate, terminate_blocked_reason, can_archive, archive_blocked_reason, can_query, query_blocked_reason, can_signal, signal_blocked_reason, can_update, update_blocked_reason, can_repair, read_only_reason, and the workflow-definition drift fields workflow_definition_fingerprint, workflow_definition_current_fingerprint, and workflow_definition_matches_current
- list and detail payloads also expose the durable repair_attention flag plus repair_blocked metadata, which turns badge-visible repair blockers such as unsupported_history and waiting_for_compatible_worker into a searchable bridge without forcing Waterline or saved views to hard-code specific reason codes
- list and detail payloads also expose the durable task_problem flag plus task_problem_badge, which summarizes replay-blocked workflow tasks, missing workflow-task resume transport, and repeated workflow-task dispatch or claim trouble into a searchable operator-facing badge without promoting older diagnostic-only waits into a repairable resume source
- selected-run commands, signals, and updates now also expose task-linkage fields current_task_id, current_task_status, task_transport_state, task_ids, and task_missing, so Waterline can show both the currently open backing workflow task and any historically proven durable task ids for the same accepted command lifecycle
- detail routes expose history_event_count, history_size_bytes, history_event_threshold, history_size_bytes_threshold, and continue_as_new_recommended, so Waterline can warn about runs approaching the configured continue-as-new budget without replaying the workflow or scanning history in the browser
- selected-run
activities, compatibility logs, activity-backed chartData, open activity waits, task labels, and run-summary activity liveness now prefer typed ActivityScheduled, ActivityStarted, ActivityHeartbeatRecorded, ActivityRetryScheduled, ActivityCompleted, ActivityFailed, and ActivityCancelled snapshots recorded in history, with live activity_executions and activity_attempts rows kept only as fallback or enrichment for older preview runs; that means Waterline keeps the latest durable attempt count, attempt id, execution-level idempotency key, snapped retry policy, per-attempt task id, worker lease owner, lease expiry, heartbeat state, bounded heartbeat progress, retry-scheduled state, cancellation state, and heartbeat or cancellation timeline points aligned with the currently claimed activity try instead of inheriting stale attempt data from drifted execution, attempt, or task rows. ActivityHeartbeatRecorded progress is intentionally compact: selected-run detail and history export expose it as last_heartbeat_progress on the activity plus the latest attempt, while the timeline heartbeat entry keeps the same normalized payload for incident breadcrumbs. History export now reuses that same mixed-era activity view for status, unsupported-history diagnostics, synthetic current-attempt visibility, and diagnostic_only, so selected-run detail and exported bundles no longer disagree about row-only terminal or open-row fallback activity evidence. For grouped activities, parallel_group_path is replay-authoritative only when typed activity history carries it; mutable activity rows are diagnostic only and are not used to infer missing barrier identity.
Any terminal mutable activity row with no typed activity history is surfaced as unsupported instead of completed, failed, or cancelled: activity and wait rows expose history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_activity_row_without_typed_history, and the mutable status as row_status, while mutable result and close timestamp stay hidden from the durable detail contract. The legacy logs[*] and chartData[*] compatibility arrays now echo that same history_authority, history_unsupported_reason, and diagnostic_only metadata instead of flattening older-row evidence into an apparently normal activity result.
- detail routes now also expose
open_wait_id plus resume_source_kind/resume_source_id, so Waterline can name the exact current selected-run wait row or workflow task that the run is blocked on without inferring it from freeform wait_reason text
- condition wait rows now include condition_definition_fingerprint when the predicate source was available at record time, and replay-blocked workflow tasks include both recorded and current predicate fingerprints when a same-key predicate drift blocks replay
- detail routes now also expose open_wait_count, so Waterline can render multi-wait barriers honestly when more than one selected-run wait is open and no single open_wait_id should be treated as the whole story
- detail routes also expose declared_signals, declared_signal_contracts, declared_updates, declared_update_contracts, and declared_contract_source, so Waterline can show the selected run's declared command contract alongside current mutability flags and explain whether it came from durable start history or is unavailable. Final v2 reads that contract only from a complete WorkflowStarted snapshot; incomplete preview-era snapshots no longer auto-backfill from live PHP definitions and instead report declared_contract_source = unavailable with empty normalized target arrays
- detail routes also expose the selected run's durably snapped workflow_definition_fingerprint, the currently loadable class fingerprint under workflow_definition_current_fingerprint, and workflow_definition_matches_current, so Waterline can explain when a long-lived run started on a different definition than the one the current build can load before a new VersionMarkerRecorded event exists
- detail routes also expose replay-safety diagnostics as workflow_determinism_status, workflow_determinism_source, and workflow_determinism_findings; when the selected run's snapped workflow_definition_fingerprint still matches the current loadable class, Waterline renders live-definition findings for obvious workflow-code calls to live database, cache, request/auth context, HTTP, wall-clock, or random sources, and when the fingerprint has drifted it instead returns workflow_determinism_source = definition_drift with one warning explaining that current-source findings are no longer authoritative for that run
- detail routes also expose the selected run's compatibility marker plus read-time local compatibility_supported/compatibility_reason fields, the configured compatibility_namespace, and fleet compatibility_supported_in_fleet/compatibility_fleet_reason fields so mixed-fleet operators can distinguish "this build cannot claim it" from "no active worker heartbeat currently advertises that marker"
- selected-run detail now also exposes
compatibility_fleet, a durable-first list of the active worker heartbeat snapshots in scope. Current workers contribute database-backed snapshots; mixed-fleet reads can also surface the older cache heartbeat format until those workers restart onto the new path. Each row carries worker_id, namespace, host, process_id, connection, queue, supported, supports_required, recorded_at, expires_at, and source, which is what the current Waterline detail UI uses to show who is actually advertising the marker and whether that row came from the durable table or the legacy cache bridge. When compatibility_namespace is non-null, database-backed rows must match that namespace; older cache snapshots still remain visible as rollout fallback, but they surface with namespace = null until the older workers restart, so full namespace isolation is only strict once the mixed fleet has moved onto the durable heartbeat path.
- detail routes also expose parents and continuedWorkflows lineage arrays plus lineage_projection_source for parent/child and continue-as-new navigation without relying on legacy relationship pivots
- child-workflow lineage entries in those arrays also expose child_call_id, so one parent-issued child invocation stays identifiable even if the child later continues as new
- those lineage arrays now prefer typed child and continue-as-new history, using workflow_links only as a compatibility fallback for older preview rows or missing history
- selected-run detail reports lineage_projection_source = workflow_run_lineage_entries when Waterline is reading an already-synced lineage projection row set, and workflow_run_lineage_entries_rebuilt when the detail read had to recreate missing or stale lineage rows on the fly
- child-workflow lineage entries are deduped by stable child_call_id, so one child invocation stays one logical Waterline row even when that child uses continueAsNew() and spans several child runs
- continuedWorkflows is the child-side lineage array in the current payload shape, so parent runs surface child workflow links there and child runs surface their inverse links in parents
- detail routes also expose a timeline collection with ordered typed history entries, including event type, kind, entry_kind, source_kind, source_id, summary, recorded_at, the workflow step workflow_sequence, and related command, activity, timer, child, task, failure, and version metadata when available; side-effect snapshots appear there as typed SideEffectRecorded points, and versioning branch points appear there as typed VersionMarkerRecorded points with version_change_id, version, version_min_supported, and version_max_supported when that marker was durably committed for the selected run
- Waterline also exposes the selected run as a versioned history-export bundle at
GET /waterline/api/instances/{instanceId}/runs/{runId}/history-export; the legacy run-key bridge remains available at GET /waterline/api/flows/{runId}/history-export. The detail screen links to that selected-run export as "Export History", and the app can generate the same artifact from the CLI with php artisan workflow:v2:history-export {instanceId} --run-id={runId} --output=storage/app/workflow-history/order-123.json --pretty or php artisan workflow:v2:history-export {runId} --run --pretty. The response uses schema = durable-workflow.v2.history-export and schema_version = 1, includes a per-run dedupe_key, and carries ordered history_events, export-level selected-run projection metadata under selected_run, selected-run scoped waits and timeline, commands, signals, updates, tasks, activities[*].idempotency_key, activities[*].retry_policy, activities[*].attempts, activities[*].history_authority, activities[*].history_event_types, activities[*].history_unsupported_reason, activities[*].row_status, timers[*].history_authority, timers[*].diagnostic_only, timers[*].history_event_types, timers[*].history_unsupported_reason, timers[*].row_status, failures, lineage links, and archive metadata when the run has been marked archived. The lineage links block also exposes projection_source, and selected_run exposes waits_projection_source, timeline_projection_source, timers_projection_source, and lineage_projection_source, so offline replay or debug tooling can tell whether the export came from already-synced selected-run projection rows or from a same-contract on-read rebuild. Activity, activity-attempt, and timer sections use the same history-first snapshots as selected-run detail, so exports retain durable activity identity, retry policy, latest attempt id, per-attempt task id and worker metadata, raw stored activity payloads, timer ids, deadlines, fired timestamps, and cancelled timestamps even if the mutable activity, attempt, task, or timer rows later drift or disappear.
Timer snapshots follow one explicit field contract across detail and export: status is the authoritative selected-run timer state, source_status is the status value reported by that authority, and row_status is only the current mutable workflow_timers.status diagnostic when a row still exists. A completed, failed, or cancelled activity row without typed activity history exports as status = unsupported with history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_activity_row_without_typed_history, and no mutable result or close timestamp. A fired or cancelled timer row without typed timer history exports as status = unsupported, diagnostic_only = true, history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_timer_row_without_typed_history, no mutable fired timestamp, and the mutable terminal state only as row_status. The export also includes payloads.codec plus the stored argument and output payloads so offline replay/debug tools can decode them with the same codec boundary the run used. codec_schemas.avro documents the base64 Avro framing rules for 0x00 generic-wrapper payloads and 0x01 typed-schema payloads; typed payloads carry a length-prefixed writer schema in the blob, and the export's payload_manifest.entries[*] lists every encoded payload path with codec, availability, redaction state, Avro framing, prefix, writer schema, writer-schema fingerprint, and any diagnostic such as payload_unavailable or payload_redacted. Consumers should treat unknown fields as additive, require the declared schema and schema_version, and use payload_manifest rather than guessing which sections contain decodable bytes. When workflows.v2.history_export.redactor points at a Workflow\V2\Contracts\HistoryExportRedactor or callable, Waterline and CLI exports run that policy over payload and failure-diagnostic slots before returning the bundle and expose redaction.applied, redaction.policy, and redaction.paths in the same response.
Each bundle carries an integrity block computed after redaction and after codec metadata is attached: canonicalization = json-recursive-ksort-v1, checksum_algorithm = sha256, and a SHA-256 checksum; configure workflows.v2.history_export.signing_key and optionally signing_key_id to add signature_algorithm = hmac-sha256, signature, and key_id for downstream artifact verification. Terminal runs set history_complete = true; open runs are point-in-time snapshots and should not be treated as final archive artifacts.
- For schema_version = 1, required top-level export fields are schema, schema_version, exported_at, dedupe_key, history_complete, workflow, payloads, summary, selected_run, history_events, waits, timeline, linked_intakes_scope, linked_intakes, commands, signals, updates, tasks, activities, timers, failures, links, redaction, codec_schemas, payload_manifest, and integrity; nested values may still be null or empty when the selected run has no corresponding data.
- selected-run
exception_count, compatibility exceptions[*], timeline failure, and history-export failures[*] now prefer typed failure history, including parent-side ChildRunFailed events and FailureHandled events recorded after a workflow actually catches a thrown activity or child failure. Waterline-compatible detail and export views keep the durable exception_type alias, exception class, resolved replay class, resolution source, message, code, file, line, trace frames, declared custom properties, child-failure source_kind = child_workflow_run metadata, and handled disposition even if the mutable failure row later drifts or disappears. When selected-run detail or history export can only recover a failure from an imported workflow_failures row, that exception/failure payload now carries history_authority = failure_row_fallback and diagnostic_only = true, so operators can see that the replay metadata is compatibility-only rather than typed failure history. New v2 failure history should use durable exception aliases before class moves are deployed. Multiple failure snapshots are ordered by committed history sequence when present, then by failure timestamp and id, and that order is preserved in selected-run detail and history exports. Resolution source is one of exception_type, class_alias, recorded_class, unresolved, misconfigured, or unrestorable, with exception_resolution_error populated for invalid mappings or unrestorable throwable classes. When the source is unresolved, misconfigured, or unrestorable, exception_replay_blocked = true; query replay raises UnresolvedWorkflowFailureException, and worker replay leaves the run open with a failed task until the mapping is corrected and the run is repaired.
- each supported wait row describes one selected-run resume source with
kind, status, source_status, resume_source_kind, and either task metadata or external_only = true; current kinds include activity, timer, child, signal, condition, and update. For timer waits specifically, status is the coarse operator wait state derived from the authoritative timer snapshot (open, resolved, cancelled, or unsupported), source_status keeps the authoritative timer status value (pending, fired, or cancelled), and row_status stays reserved for mutable timer-row diagnostics. Unsupported older terminal activity, timer, and child diagnostics instead set diagnostic_only = true, keep history_authority, history_unsupported_reason, and mutable row_status, and omit resume_source_kind / resume_source_id because they are not durable resume paths
- accepted update lifecycles now also appear as kind = update waits with update_id, open_wait_id = update:{update_id}, and resume_source_kind = workflow_update; when the worker still owes that application, Waterline shows the accepted update as the current wait even if the underlying workflow is otherwise parked on a longer-lived signal, child, or timer wait
- write-side update commands accept only wait_for = accepted or wait_for = completed; omitting wait_for keeps the completed default, while wait_for = status is reserved for update lookup responses such as GET .../updates/{updateId} and WorkflowStub::inspectUpdate()
- child wait rows also expose child_call_id, which is the stable parent-issued child invocation id rather than the mutable current child run id; when the parent has already recorded child-resolution history and an open parent workflow task is applying that result, the wait row is task-backed by that workflow task
- when the selected run currently has one open wait, open_wait_id names that exact wait row; when the selected run is instead sitting on an open workflow task, open_wait_id becomes workflow-task:{taskId} and resume_source_kind = workflow_task
- signal wait rows also expose
signal_wait_id and, once a signal is received or applied on new v2 rows, signal_id, so repeated waits for the same target_name remain distinguishable without inferring identity from FIFO name matching alone
- condition waits expose condition_wait_id; when the workflow supplied an optional condition key, they also expose condition_key and mirror it into target_name as the stable operator label for that predicate
- timeout-backed condition waits also expose timeout_seconds while keeping the wait itself as kind = condition instead of splitting it into a separate synthetic timer wait; their timer task and timer detail carry the same condition_key when one was recorded
- timeout-backed condition waits rebuild their deadline_at, resume_source_kind, and resume_source_id from typed ConditionWait* plus condition-timeout TimerScheduled history first, and cancelled timeout timers rebuild from typed TimerCancelled history, so selected-run detail survives missing or drifted live timer rows without losing the timeout transport identity or final cancelled status. After a timeout is scheduled in typed history, Waterline keeps the wait pending until matching TimerFired history exists; a live timer row that drifted to fired does not change the wait by itself.
- after a timeout-backed condition wait records TimerFired, selected-run detail keeps the wait open with source_status = timeout_fired until a workflow task applies ConditionWaitTimedOut; if that workflow task is missing, the synthetic task row uses type = workflow, workflow_wait_kind = condition, the original condition_wait_id, timer_id, and the timer resume source
- resolved signal waits also carry the snapped
command_sequence, command_status, and command_outcome from typed history, so Waterline can still explain which accepted signal satisfied that wait even if the mutable command row later drifts or disappears
- task_backed = true now means the wait still has an open durable backing task in ready or leased state
- for accepted update waits, task_backed = true also requires that open workflow task to carry matching workflow_update_id, workflow_command_id, open_wait_id, or workflow_update resume-source provenance; unrelated signal, child, condition, or other workflow tasks do not satisfy the update wait
- task_backed = false on an open timer wait or on an open pending activity wait means the run still has the durable wait source row but has lost the matching open task, which is the operator-facing condition behind liveness_state = repair_needed; older open mutable rows with history_authority = mutable_open_fallback are different, because those waits are diagnostic-only and do not count as durable repair candidates
- when task_backed = false but task_id, task_type, and task_status are still populated, Waterline is showing historical or stale task metadata for that wait rather than a healthy current backing task
- when an open repairable wait has lost its open task row, selected-run tasks now also includes a synthetic diagnostic row with status = missing, transport_state = missing, task_missing = true, and the same activity, timer, update, signal, child-resolution, retry, condition-wait, or command identity that repair will use to recreate transport; if typed retry history named the lost retry task, that value appears as expected_task_id
- when a non-terminal selected run has no open semantic wait and no open workflow task row, selected-run
tasks includes a generic synthetic workflow row with id = missing:workflow:{run_id}, status = missing, and transport_state = missing, making the no-durable-resume-source invariant visible in the same task table as other transport loss
- timeout-backed condition waits are intentionally both external_only = true and task-backed, because either an external durable input can satisfy the predicate first or the timeout timer task can resume the run later
- a running activity wait with task_backed = false is surfaced instead as liveness_state = activity_running_without_task only when typed activity history still authoritatively says that activity is in flight, which keeps the run observable but intentionally suppresses Repair to avoid duplicating in-flight work
- active child waits are intentionally not task-backed because the durable resume source is the child run chain itself, not a separate child task row on the parent; once the parent records ChildRunCompleted, ChildRunFailed, ChildRunCancelled, or ChildRunTerminated, the parent resume workflow task becomes the task-backed transport for applying that child result
- wait_kind = child with liveness_state = waiting_for_child is a healthy durable wait while the parent is blocked on an active child run
- wait_kind = child with liveness_state = repair_needed means the parent has already committed child-resolution history, but the workflow task that should apply that result is missing; selected-run detail keeps open_wait_id = child:{child_call_id}, resume_source_kind = child_workflow_run, and a synthetic missing task row with workflow_wait_kind = child, child_call_id, and child_workflow_run_id
- waits opened by one all([...]) barrier expose parallel_group_kind, parallel_group_id, parallel_group_base_sequence, parallel_group_size, and parallel_group_index; Waterline uses that metadata plus open_wait_count to show several open waits as one fan-in barrier, with parallel_group_kind = mixed and parallel-calls:* ids when the same barrier combines child workflows and activities
- when one open wait belongs to nested all([...]) barriers, those top-level parallel_group_* fields describe the innermost enclosing group and parallel_group_path preserves the full outer-to-inner barrier path for operator displays and compatibility bridges
- for those grouped waits, any failed activity or failed/cancelled/terminated child still wakes the parent immediately, while successful member closures do not create a parent workflow task until the last successful member in the group closes
- if that child uses continueAsNew(), the selected-run wait and lineage surfaces follow the newest parent-recorded ChildRunStarted for that child_call_id instead of pinning the parent to the original child run id or trusting only the child instance's mutable current-run pointer
- while a parent run is still waiting on a child, selected-run detail now keeps that child wait open from the parent's own ChildWorkflowScheduled / ChildRunStarted history even if the mutable child run row drifts to a terminal state before the parent commits its corresponding ChildRun* resolution event
- child lineage links may still carry parallel_group_path diagnostics for grouped child waits, but Waterline and child-closure transport treat that barrier identity as authoritative only after matching typed ChildWorkflowScheduled, ChildRunStarted, or ChildRun* resolution history carries the path
- once a parent run records ChildRunCompleted, ChildRunFailed, ChildRunCancelled, or ChildRunTerminated, selected-run detail keeps that child wait resolved from parent history even if the mutable child run row later drifts; child terminal history and legacy link rows remain diagnostic or lineage enrichment, but they do not replace missing parent typed child step history for replay
- if a yielded child step only has a terminal child row or link and no parent typed child history, worker and query replay block with history_shape_mismatch and recorded events "no typed history"; Waterline reports liveness_state = workflow_replay_blocked, marks the selected child wait status = unsupported, exposes history_authority = unsupported_terminal_without_history and history_unsupported_reason = terminal_child_link_without_typed_parent_history, preserves child identity through target_name, child_call_id, and child_workflow_run_id when available, and omits resume_source_kind / resume_source_id because that wait is diagnostic-only
- if an open activity, timer, or child wait only survives as older mutable state with no typed history, Waterline still lists that wait with
status = open, history_authority = mutable_open_fallback, and diagnostic_only = true, but the selected run clears top-level wait_kind, open_wait_id, and resume_source_*, projects liveness_state = workflow_replay_blocked, and sets repair_blocked_reason = unsupported_history plus repair_attention = true because that mutable row or link is observability-only rather than a durable resume source
- wait_kind = signal with liveness_state = waiting_for_signal is a healthy external wait, not a repair condition
- wait_kind = update means the selected run has already accepted a durable update lifecycle that the workflow worker still needs to apply; if the matching backing workflow task drifts away, selected-run detail keeps the update wait open and flips liveness_state = repair_needed so operators repair the missing transport instead of mistaking the run for a healthy signal, child, or timer wait. New, reused, and repaired update workflow tasks expose workflow_update_id, workflow_command_id, workflow_wait_kind = update, the open wait id, and the workflow_update resume source in the selected-run task detail.
- wait_kind = signal with liveness_state = repair_needed means a signal has already been durably received but the workflow task that should apply it is missing; selected-run detail keeps open_wait_id = signal-application:{signal_id} when the lifecycle row exists, resume_source_kind = workflow_signal, and new or repaired signal workflow tasks expose workflow_signal_id, workflow_command_id, workflow_wait_kind = signal, the open wait id, and the signal resume source.
- if an imported accepted signal or update has command plus typed-history evidence but no first-class lifecycle row, selected-run detail may show the command-based fallback identity. Final v2 writes lifecycle rows directly for new commands.
- wait_kind = condition with liveness_state = waiting_for_condition is a healthy predicate wait; if it also has resume_source_kind = timer, Waterline is showing a timeout-backed condition wait whose timer task is only the timeout transport, and that timeout identity now comes from typed condition-wait plus timer-schedule history rather than from the live timer row alone. If the timeout already fired but the workflow has not applied it, the same condition wait moves to liveness_state = repair_needed when the resume workflow task is missing.
- liveness_state = workflow_replay_blocked means either a workflow task reached a deterministic replay guard before committing new history or the selected run only has unsupported diagnostic state instead of a durable typed resume path. For keyed condition waits, selected-run task detail exposes transport_state = replay_blocked, replay_blocked_reason = condition_wait_definition_mismatch, the workflow sequence, the recorded condition key, and the key yielded by the current build. If replay finds a different typed step already recorded at that sequence, the task uses replay_blocked_reason = history_shape_mismatch plus replay_blocked_expected_history_shape and replay_blocked_recorded_event_types; this applies to activity, child-workflow, pure timer, signal-wait, side-effect, version-marker, continue-as-new, and all([...]) leaf sequences, not only condition waits. When there is no replay-blocked task and the selected waits instead expose history_authority = mutable_open_fallback or unsupported_terminal_without_history, the run is blocked on unsupported history and Waterline hides Repair with repair_blocked_reason = unsupported_history plus repair_attention = true instead of synthesizing a new durable task from those mutable rows.
- after a
SignalReceived history event, that external wait is resolved; the selected run should then surface either the backing workflow task state with signal-application payload metadata or the accepted-signal application repair state until worker recovery or manual repair restores the missing workflow task
- accepted and rejected signal commands also surface in the dedicated selected-run signals table, backed by workflow_signal_records on final v2 runs and by command/history fallback only for older preview rows that were not normalized before the clean-slate upgrade
- when several same-name signals are accepted before a later wait opens, command sequence remains authoritative for ordering; Waterline shows the per-step sequence and the matching wait row's snapped command_sequence, and the matching SignalReceived, SignalWaitOpened, and SignalApplied entries now share the same durable signal_wait_id for that accepted command
- when an earlier accepted signal has already been received for the selected run but is not yet applied, Waterline keeps can_signal = true and flips can_update = false with update_blocked_reason = earlier_signal_pending; the runtime no longer drains that workflow task inline on the update caller path
- when the selected run already has an open ready task or an expired leased task but neither the current build nor any active worker heartbeat snapshot advertises that task's effective compatibility marker, the run surfaces *_task_waiting_for_compatible_worker instead of repair_needed, and can_repair stays false
- in that compatibility-blocked case, selected-run detail now also sets repair_blocked_reason = waiting_for_compatible_worker plus repair_attention = true instead of leaving operators to infer it from liveness_state alone
- liveness_state = repair_needed means the selected current run either lost its durable next-resume task or still has one whose last actionable transport state is unhealthy, such as dispatch_failed, dispatch_overdue, or an expired lease, so Waterline can surface a Repair action for it
- command history exposes the durable start outcome alongside command status and rejection reason
- command history also exposes a stable per-run sequence for each accepted or rejected command
- older preview runs that still had command_sequence = null are backfilled into that same per-run order before later commands are recorded, so Waterline keeps one durable command timeline even while a preview deployment is being upgraded
- command history also exposes target_name for named commands such as v2 signals and aliased v2 updates, while accepted repair commands surface repair_dispatched or repair_not_needed
- command history now also exposes payload_available, payload_codec, and payload, so Waterline can inspect the durable accepted input for start, signal, update, and repair commands without scraping raw engine rows
- command history also exposes validation_errors for rejected signal or update commands, so Waterline can show contract mismatches such as missing required arguments, unknown named arguments, type mismatches, or nullability violations without replaying the workflow class
- selected-run update detail now also exposes one row per durable update lifecycle, so Waterline no longer has to reconstruct update state entirely from command plus timeline joins
- when a workflow declares #[UpdateMethod('public-name')], declared_updates, declared_update_contracts, command target_name, and timeline update_name all use that durable alias instead of the PHP method name
- rejected signal and update commands keep their typed rejection_reason, so Waterline can distinguish unknown_signal, unknown_update, earlier_signal_pending, workflow_definition_unavailable, and run-state rejection without replay
- rejected updates on an existing selected run also append typed UpdateRejected timeline entries, so Waterline's timeline and command table agree on the rejected target, sequence, outcome, and rejection reason, including rejected_workflow_definition_unavailable when the target is durably declared but the workflow definition cannot be replayed
- command history now also exposes durable command-ingress metadata so Waterline can tell whether a command came from PHP, a public webhook, the Waterline operator UI, or another workflow run; workflow-originated start commands carry the parent instance id, parent run id, workflow step, and any inherited child_call_id in commands[*].context.workflow
- cancel and terminate commands now carry an optional commands[*].reason field with the caller- or operator-provided reason string; the same reason is persisted in both the CancelRequested / TerminateRequested and WorkflowCancelled / WorkflowTerminated typed history events so audit trails, history exports, and offline analysis can distinguish user-driven, policy-driven, and operator-driven interruption without replaying the command payload
- the detail lookup can resolve either a selected run id or the public instance id of the current run
- that instance-scoped detail and operator lookup resolves the current run from typed continue-as-new lineage first and falls back to durable run ordering only when no lineage evidence exists, instead of trusting only workflow_instances.current_run_id
- detail and history-export payloads expose that resolution path as current_run_source, currently continue_as_new_lineage or run_order_fallback
- historical-run detail payloads also expose the current active run pointer so the UI can navigate back to the active execution quickly
- Waterline operator commands now use canonical instance-scoped routes for current-run actions while still recording accepted and rejected outcomes as durable v2 command records; the current detail screen uses normalized declared_signal_targets and declared_update_targets arrays plus declared_contract_source to drive operator-facing Signal and Update forms instead of inventing target names from live PHP reflection, while the older declared_signals, declared_signal_contracts, declared_updates, and declared_update_contracts fields remain available as compatibility metadata; selected-run detail also exposes declared_entry_method, declared_entry_mode, and declared_entry_declaring_class so operators can tell whether the run was started from the canonical handle() contract or the legacy execute() compatibility path. Those normalized target arrays stay present even when the selected run reports declared_contract_source = unavailable, so partial legacy snapshots can remain observable without being treated as authoritative
- mutable detail views are limited to the current selected run while it is still open; historical runs and closed current runs are read-only
- dashboard stats are served by OperatorObservabilityRepository::dashboardSummary() and derived from the run-summary projection, including total runs, recent run starts, max wait, max duration, and max exceptions. This keeps the dashboard endpoint on the same replaceable operator-observability boundary as selected-run detail, history export, and the deeper operator_metrics payload.
- dashboard stats now also include
engine_source plus operator_metrics. engine_source reports configured, resolved, uses_v2, v2_operator_surface_available, a stable readiness status, an operator-facing message, any surfaced issues, the inspected required_tables list, and the versioned readiness_contract with effective states for boot/install, stats, health, and instance routes. operator_metrics remains the v2-only object with generated_at, runs, tasks, activities, backlog, repair, starts, history, projections, workers, backend, update_wait, and repair_policy groups:
  - runs counts total, current, running, completed, failed, cancelled, terminated, archived, repair_needed, claim_failed, and compatibility_blocked selected-run summaries.
  - tasks counts open, ready, due-ready, delayed, leased, dispatch-failed, claim-failed, dispatch-overdue, lease-expired, and unhealthy durable workflow tasks.
  - activities counts open, pending, running, retrying, failed attempts, and max attempt count from durable activity executions and attempts.
  - backlog mirrors the operator-actionable counts as runnable_tasks, delayed_tasks, leased_tasks, retrying_activities, unhealthy_tasks, repair_needed_runs, claim_failed_runs, and compatibility_blocked_runs.
  - repair exposes the worker-loop candidate pressure as existing_task_candidates, missing_task_candidates, total_candidates, scan_limit, scan_strategy = scope_fair_round_robin, selected existing-task and missing-run counts for the next pass, per-phase scan-limit flags, scan_pressure, oldest-candidate timestamps, max candidate ages, and per-scope rows grouped by connection, queue, and compatibility so operators can tell whether repair sweeps are keeping up, which queue scope is consuming the repair budget, and whether each scope was selected or limited by the fair scan.
    Tasks with repeated dispatch or claim failures keep their failure counts visible, expose repair_available_at in selected-run task detail, and are omitted from repair candidate counts until that timestamp arrives.
  - starts exposes pending start runs, accepted pending start commands, due first workflow tasks, the oldest pending start timestamp, and max_pending_ms so dashboards can track workflow-start latency without reading queue internals.
  - history exposes continue_as_new_recommended_runs, max_event_count, max_size_bytes, event_threshold, and size_bytes_threshold from run summaries.
  - projections.run_summaries exposes durable run count, summary count, missing summary count, stale summary count, orphaned summary count, rebuild-needed count, and oldest/newest summary update timestamps so operators can tell when list and dashboard views need a rebuild.
  - projections.run_waits exposes wait projection row count, projected-run count, canonical wait-run count, projected canonical wait-run count, missing wait-run count, stale projected wait-run count, summaries with current open waits, missing current open-wait rows, rebuild-needed count, and oldest/newest wait update timestamps.
  - projections.run_timeline_entries exposes history-event count, timeline row count, projected-run count, canonical history-run count, projected canonical history-run count, missing history-run count, stale projected history-run count, missing history-event rows, orphaned timeline rows, rebuild-needed count, and oldest/newest timeline update timestamps.
  - projections.run_timer_entries exposes timer row count, projected-run count, canonical timer-run count, projected canonical timer-run count, missing timer-run count, stale projected timer-run count, schema_version_mismatch_runs, schema_version_mismatch_rows, orphaned timer rows, rebuild-needed count, and oldest/newest timer update timestamps, so dashboards can tell whether timer rebuild pressure is ordinary drift or a stored schema-version mismatch.
  - projections.run_lineage_entries exposes lineage row count, projected-run count, canonical lineage-run count, projected canonical lineage-run count, missing lineage-run count, stale projected lineage-run count, orphaned lineage rows, rebuild-needed count, and oldest/newest lineage update timestamps. --needs-rebuild uses the same canonical wait, timeline, timer, and lineage projector comparisons that selected-run detail and history export use, so stale selected-run payload drift is repaired even when the row set is still present.
  - workers exposes compatibility_namespace, required_compatibility, active_workers, active_worker_scopes, and active_workers_supporting_required from the database-backed compatibility heartbeat snapshots plus the mixed-fleet cache fallback.
  - backend exposes the same database, queue, and cache capability snapshot returned by php artisan workflow:v2:doctor --json, including the frozen backend side of the readiness contract and blocking issues such as queue_sync_unsupported; Waterline renders this snapshot and any claim-failed task/run counts on the v2 operator dashboard so backend problems are visible before opening a selected run.
  - update_wait exposes the active completion_timeout_seconds and poll_interval_milliseconds values used by completion-waiting update calls before they fall back to an accepted lifecycle.
  - repair_policy exposes the active redispatch_after_seconds, loop_throttle_seconds, scan_limit, scan_strategy, failure_backoff_max_seconds, and failure_backoff_strategy values used by worker-loop repair and dispatch-overdue metrics.
For on-demand operations, php artisan workflow:v2:repair-pass runs one immediate sweep with that same repair policy and emits the selected-candidate and repaired-task counts directly. The default command path bypasses the loop throttle so operators can force a repair pass after a deploy or during incident response; add --respect-throttle if the command should skip work when the background loop already owns the throttle window, or --json when dashboards and scripts want the raw report. The command exits non-zero when any selected existing-task repair or missing-task reconstruction fails, so alerting and deployment tooling can treat those operator-visible failures as actionable instead of parsing stderr heuristics.
- Waterline exposes GET /waterline/api/v2/health as the v2 health-check endpoint behind the same Waterline route middleware and authorization gate as the dashboard API. It now includes engine_source and the same readiness_contract at the top level, and prepends an engine_source check ahead of the deeper v2 checks. If Waterline is not actively using v2, because engine_source=auto fell back to v1, engine_source=v1 is pinned, or engine_source=v2 is pinned but incomplete, the endpoint returns HTTP 503 with the readiness payload instead of pretending the v2 bridge is healthy. Once v2 is active, it returns the same operator metrics plus a checks array for backend_capabilities, run_summary_projection, selected_run_projections, task_transport, durable_resume_paths, and worker_compatibility; hard backend capability errors return HTTP 503, while projection, task, durable-resume-path, and worker-compatibility issues are returned as warnings so a web health check does not fail just because repairable work exists.
- failed, cancelled, and terminated runs expose the same closed-at and duration semantics as completed runs
- v2 list/detail payloads now also expose is_terminal, so Waterline and other operator clients can tell at a glance whether a selected run is closed without inferring that only from the raw status
- v2 list/detail payloads also expose business_key and visibility_labels, copied from the start metadata onto the durable run-summary projection and selected-run detail
- cancelled and terminated still map into status_bucket = failed as the compatibility bridge, but current Waterline builds now use the raw status to expose dedicated failed, cancelled, and terminated list views instead of collapsing every non-completed terminal state into one screen
- the activity table stays activity-only, but it now exposes the execution-level idempotency key, snapped retry policy, attempt count, separate started, heartbeat, and closed timestamps, and one attempts[*] row per try rebuilt from typed activity history first. It prefers typed activity history snapshots over live activity_executions or activity_attempts rows for status and identity, so timer waits stay visible in the timeline without also showing up as fake activities, and completed, cancelled, and currently running activity detail survives mutable-row drift. If typed history only proves that an activity was scheduled, started, heartbeated, or retry-scheduled, Waterline keeps that activity open even when the mutable execution row says it later closed. Those older open-row fallbacks now carry history_authority = mutable_open_fallback plus diagnostic_only = true, so operators can keep the breadcrumb without mistaking it for a durable resume source. Terminal activity result and close time require typed ActivityCompleted, ActivityFailed, or ActivityCancelled history. If only a completed, failed, or cancelled terminal mutable activity row exists, Waterline shows status = unsupported with the unsupported history reason, sets diagnostic_only = true, and keeps the mutable status as row_status for diagnostics. Each attempt row also exposes can_continue, cancel_requested, and stop_reason, matching the ActivityTaskBridge::heartbeatStatus() contract external activity workers use to observe cancellation or stale-attempt stop conditions
- timeout tasks created by await() with a timeout: parameter also surface condition_wait_id on the selected-run tasks collection, so Waterline can label them as condition-timeout transport instead of as generic unrelated timers
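As a rough illustration of how an operator client might act on the liveness_state values described in the list above, here is a minimal Python sketch. The liveness_state strings come from this page; the triage labels it returns (healthy_wait, offer_repair, and so on) and the function name are hypothetical and are not part of the Waterline payload.

```python
# Healthy waits named on this page; they are not repair conditions.
HEALTHY_WAITS = {
    "waiting_for_child",
    "waiting_for_signal",
    "waiting_for_condition",
}

# Suffix shared by the *_task_waiting_for_compatible_worker states.
COMPAT_SUFFIX = "_task_waiting_for_compatible_worker"


def triage(selected_run: dict) -> str:
    """Map a selected-run payload to a hypothetical operator triage label."""
    state = selected_run.get("liveness_state") or ""
    if state in HEALTHY_WAITS:
        return "healthy_wait"
    if state == "repair_needed":
        # Waterline can surface a Repair action for this state.
        return "offer_repair"
    if state == "activity_running_without_task":
        # Repair is intentionally suppressed to avoid duplicating in-flight work.
        return "observe_only"
    if state.endswith(COMPAT_SUFFIX):
        # can_repair stays false until a compatible worker appears.
        return "needs_compatible_worker"
    if state == "workflow_replay_blocked":
        return "replay_blocked"
    return "unclassified"
```

A dashboard script could call triage() on each selected-run payload and only page an operator for offer_repair or replay_blocked, treating the remaining labels as informational.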
Route examples:
GET /waterline/api/instances/order-123
GET /waterline/api/instances/order-123/history-export
GET /waterline/api/instances/order-123/runs/01J10000000000000000000021
GET /waterline/api/instances/order-123/runs/01J10000000000000000000021/history-export
POST /waterline/api/instances/order-123/signals/name-provided
POST /waterline/api/instances/order-123/updates/mark-approved
POST /waterline/api/instances/order-123/cancel
POST /waterline/api/instances/order-123/repair
POST /waterline/api/instances/order-123/terminate
POST /waterline/api/instances/order-123/archive
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/signals/name-provided
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/updates/mark-approved
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/cancel
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/repair
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/terminate
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/archive
POST /waterline/api/instances/order-123/runs/01J10000000000000000000021/queries/current-stage
GET /waterline/flows/instances/order-123/runs/01J10000000000000000000021
GET /waterline/api/flows/01J10000000000000000000021
GET /waterline/api/flows/01J10000000000000000000021/history-export
GET /waterline/api/flows/order-123
GET /waterline/api/v2/health
POST /waterline/api/flows/order-123/queries/current-stage
POST /waterline/api/flows/order-123/signals/name-provided
POST /waterline/api/flows/order-123/updates/mark-approved
POST /waterline/api/flows/order-123/archive
The canonical Waterline detail route is now instance-scoped with an explicit selected run. GET /waterline/api/instances/{instanceId} resolves the instance's current run, while GET /waterline/api/instances/{instanceId}/runs/{runId} pins one concrete selected run inside that instance. GET /waterline/api/instances/{instanceId}/history-export now exports the same current run that the instance detail route resolves, so current-run detail and current-run export stay aligned under continue-as-new lineage and current-run-pointer drift. Historical exports remain explicitly run-scoped under /waterline/api/instances/{instanceId}/runs/{runId}/history-export. Waterline now exposes both instance-targeted current-run operator routes and explicit selected-run operator routes under the same /instances/{instanceId} prefix. The legacy /waterline/api/flows/{id} lookup and /waterline/api/flows/{id}/{command} operator routes still work as compatibility bridges for either a run id or the public instance id of the current run, and the legacy bucket preview routes hydrate through the same selected-run payload before redirecting to the canonical detail route.
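The route shapes above can be sketched as a small path builder. This is a minimal Python illustration of the instance-scoped layout only; the helper name detail_path and its parameters are hypothetical, and it builds paths rather than calling any Waterline client.

```python
from typing import Optional
from urllib.parse import quote

API_PREFIX = "/waterline/api/instances"


def detail_path(
    instance_id: str,
    run_id: Optional[str] = None,
    history_export: bool = False,
) -> str:
    """Build a canonical instance-scoped detail or history-export path.

    Without run_id the path targets the instance's current run;
    with run_id it pins one concrete selected run inside that instance.
    """
    path = f"{API_PREFIX}/{quote(instance_id, safe='')}"
    if run_id is not None:
        path += f"/runs/{quote(run_id, safe='')}"
    if history_export:
        path += "/history-export"
    return path
```

For example, detail_path("order-123") targets the current run, while passing run_id and history_export=True produces the explicitly run-scoped export path used for historical runs.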
For list screens, the current Waterline build now exposes:
- /waterline/api/flows/running for open runs in the running bucket
- /waterline/api/flows/completed for raw status = completed
- /waterline/api/flows/failed for raw status = failed
- /waterline/api/flows/cancelled for raw status = cancelled
- /waterline/api/flows/terminated for raw status = terminated
The last three are all terminal views. cancelled and terminated still carry status_bucket = failed in each row for compatibility, but the list routing no longer forces operator-driven closures to share the exact same screen as actual failures.
Those list routes also accept exact-match filters for instance_id, run_id, namespace, workflow_type, business_key, compatibility, declared_entry_mode, declared_contract_source, connection, queue, status, status_bucket, closed_reason, wait_kind, liveness_state, repair_blocked_reason, repair_attention, task_problem, is_current_run, continue_as_new_recommended, archived, and is_terminal. Visibility labels can be filtered with either label[key]=value or labels[key]=value, and search attributes with search_attribute[key]=value or search_attributes[key]=value, for example:
GET /waterline/api/flows/running?declared_entry_mode=compatibility&declared_contract_source=unavailable&repair_blocked_reason=unsupported_history&workflow_type=billing.invoice-sync&instance_id=order-123&label[tenant]=acme&search_attribute[priority]=high
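A request like the one above can be assembled programmatically. The sketch below is one possible approach using Python's standard library; the function name `list_url` is hypothetical, and keeping the square brackets unescaped is a readability choice here rather than a documented requirement.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def list_url(
    bucket: str,
    filters: Dict[str, str],
    labels: Optional[Dict[str, str]] = None,
    search_attributes: Optional[Dict[str, str]] = None,
) -> str:
    """Build a list-route URL with exact-match, label, and search-attribute filters."""
    params = list(filters.items())
    for key, value in (labels or {}).items():
        params.append((f"label[{key}]", value))
    for key, value in (search_attributes or {}).items():
        params.append((f"search_attribute[{key}]", value))
    # safe="[]" leaves the bracket syntax readable instead of percent-encoding it.
    return f"/waterline/api/flows/{bucket}?" + urlencode(params, safe="[]")


url = list_url(
    "running",
    {"workflow_type": "billing.invoice-sync", "instance_id": "order-123"},
    labels={"tenant": "acme"},
    search_attributes={"priority": "high"},
)
```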
Selected-run detail and history export also return memo, but memo is intentionally not part of the list-filter contract, run-summary projection, or saved-view matching. Use business_key, visibility_labels, and search_attributes for searchable fleet metadata, and use memo for returned-only per-run context.
That searchable-versus-returned-only boundary is now machine-readable too: visibility_filters.definition.indexed_metadata describes the exact-match searchable metadata that Waterline can persist in saved views today (including business_key, labels, and search_attributes), while visibility_filters.definition.detail_metadata calls out returned-only metadata such as memo that stays visible on selected-run detail and history export but never participates in list filtering or saved-view matching.
Each list response now also echoes the resolved filter contract under visibility_filters:
- version: the current workflow visibility filter contract version. Current builds emit 5 and still support saved views written against versions 1 through 5. Versions 1 and 2 are deprecated but remain loadable; updating a deprecated saved view rewrites it onto the current version
- minimum_supported_version: the oldest filter version the current build will accept
- deprecated_versions: filter versions that are still loadable but should be migrated to the current version
- reserved_view_id_prefix: the ID prefix reserved for system views (currently system:); custom views must not use this prefix
- bucket: the list bucket that was queried
- definition: the exact-match field and label contract the current build understands, including field labels, editor input types, bounded-field option catalogs, query-parameter names, ordering, field help text, label-editor metadata such as the accepted key pattern and placeholder, plus indexed_metadata and detail_metadata entries that distinguish searchable saved-view-compatible operator metadata from returned-only detail metadata
- applied: the merged filters after Waterline resolves any saved view and overlays the current query string
- saved_view: the resolved saved-view payload when the request used ?view=...
The definition also includes projection_schema_version, which tracks the current derived-field schema that the run-summary projector writes. Summaries projected by an older package version may have NULL for fields added in later schema versions; exact-match filters will not match those rows until they are re-projected. The workflow:v2:rebuild-projections --needs-rebuild command detects schema-outdated summaries (where projection_schema_version is NULL or lower than the current build) and re-projects them from durable runtime state.
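The schema-outdated rule described above reduces to a small predicate. This is only an illustration of the stated rule (NULL or lower than the current build), not the package's actual implementation; the function name is hypothetical.

```python
from typing import Optional


def is_schema_outdated(row_version: Optional[int], current_version: int) -> bool:
    """Return True when a run summary should be re-projected.

    A summary is schema-outdated when its projection_schema_version is
    missing (NULL) or lower than the version the current build writes.
    """
    return row_version is None or row_version < current_version
```

Rows for which this returns True are the ones whose newer derived fields may still be NULL, which is why exact-match filters on those fields skip them until a rebuild runs.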
The saved-views index response also includes mixed_fleet_policy, which describes how visibility filters behave when workers run different package versions concurrently:
- Filter normalization is idempotent regardless of the worker package version that wrote the summary projection
- Saved views remain readable across filter version bumps; updating a deprecated saved view rewrites it onto the current version
- Mixed-fleet operation is safe during a rollout window — older workers continue projecting with their schema version, and the rebuild command brings all rows to the current schema after rollout completes
- The health check surfaces schema_outdated alongside missing, orphaned, and stale in the run_summary_projection check, with the current projection_schema_version reported for operator reference
Waterline v2 saved views persist those same exact-match filters server-side:
GET /waterline/api/saved-views?bucket=running
POST /waterline/api/saved-views
PUT /waterline/api/saved-views/{viewId}
DELETE /waterline/api/saved-views/{viewId}
GET /waterline/api/flows/running?view=01J20000000000000000000000
A saved view records name, bucket, scope, shared, filter_version, and normalized filters using the workflow v2 visibility filter contract. GET /waterline/api/saved-views?bucket=... now also returns filter_definition, supported_filter_versions, and version_evolution so operator clients can render the same current field contract they save against, inspect deprecation status, and understand upgrade policy, even if no custom views exist yet. Each saved-view payload echoes filter_version_supported, filter_version_deprecated, filter_version_status, filter_version_message, current_filter_version, minimum_supported_filter_version, and supported_filter_versions, so mixed-era rows are inspectable before an operator applies or updates them. When filter_version_deprecated is true, the saved view is still loadable but should be updated to the current version; the filter_version_message explains the recommended action. When a selected custom view's filter_version is not supported by the current build, Waterline still returns the saved-view payload but marks visibility_filters.saved_view_applied = false and echoes the warning under visibility_filters.saved_view_warning; the list falls back to any direct query-string filters instead of silently pretending the outdated saved filters still applied. Updating that saved view rewrites it onto the current filter contract version. Current Waterline builds consume the shared definition plus the list route's echoed visibility_filters.definition to render the list-screen filter editor instead of keeping a separate hard-coded field schema in the browser, including select controls for bounded fields such as status, status_bucket, closed_reason, wait_kind, repair_blocked_reason, declared_entry_mode, and declared_contract_source plus booleans such as repair_attention, task_problem, is_current_run, and continue_as_new_recommended. 
The same echoed definition now also exposes searchable business_key, labels, and search_attributes metadata separately from returned-only memo, so Waterline can tell operators exactly which visibility metadata is indexed and saved-view-compatible before they try to save or share a view. The repair_blocked_reason option catalog now also carries the operator-facing description, severity tone, and badge_visible hint that Waterline uses for repair-triage badges, and repair_attention turns that same badge-visible subset into one durable/searchable saved-view filter, so the browser no longer has to keep its own reason map or hard-code a list of actionable reason codes. Current list rows use the same command-contract fields for fast triage badges such as compatibility-entry, workflow-task-problem, and repair-blocked states before an operator opens selected-run detail. WATERLINE_SAVED_VIEW_SCOPE partitions saved views by app, environment, tenant, or operator namespace when several installs share one database. Waterline also returns system defaults such as system:running, system:running-task-problems, and system:running-repair-blocked; the repair-blocked view applies repair_attention = true, while the task-problems view applies task_problem = true, so operators can keep stable fleet views for badge-visible repair blockers and for selected runs with replay-blocked, missing, or repeatedly unhealthy workflow-task transport. Custom views are stored in the waterline_saved_views table, and the list screen can now save, update, and delete those custom views while showing the currently applied filter badges. Saving or updating a custom view now persists the effective applied filter set after any selected saved-view overlay, so operators do not accidentally drop the saved portion of the current view when refining or renaming it. 
Because repair_attention, repair_blocked_reason, task_problem, and declared_entry_mode are part of that same contract, operators can save repeatable views for repair-blocked drift, unsupported_history, workflow-task problem triage, or compatibility-entry runs without rebuilding those filters by hand each time. Extra query filters on a saved-view URL refine the saved view for that request.
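A client that requests a list with ?view=... may want to check whether the saved view was actually applied. The following sketch assumes the list response has been decoded into a Python dict; the helper name and the fallback message are hypothetical, while the `saved_view_applied` and `saved_view_warning` keys come from the description above.

```python
from typing import Optional


def saved_view_warning(list_response: dict) -> Optional[str]:
    """Return a warning when the selected saved view was not applied.

    Returns None when the view applied normally or no view was requested.
    """
    filters = list_response.get("visibility_filters") or {}
    if filters.get("saved_view_applied") is False:
        # Hypothetical fallback text for the case where no warning was echoed.
        return filters.get("saved_view_warning") or "Saved view was not applied."
    return None
```

When this returns a warning, the rows in the response reflect only the direct query-string filters, so a client should avoid labelling the list with the saved view's name.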
Query, Signal, and Update now sit on that same operator surface. Waterline shows Query only when can_query = true and the selected run exposes at least one declared_query_targets[*].name; unlike the mutating controls, that read-only query action can still stay available on historical or already-closed selected runs too. When durable query targets exist but selected-run detail reports can_query = false, Waterline keeps the declared targets visible but hides query execution, with query_blocked_reason explaining why. It shows Signal only when can_signal = true and the selected run exposes at least one declared_signal_targets[*].name. It shows Update only when can_update = true and the selected run exposes at least one declared_update_targets[*].name. The UI also shows declared_contract_source so operators can tell whether those targets came from durable WorkflowStarted history or are unavailable. Final v2 no longer falls back to a live definition or auto-backfills missing command contracts; incomplete preview-era snapshots report declared_contract_source = unavailable with empty normalized target arrays. Each normalized target row carries the durable public target name, a parameters array when the run snapped a contract for that target, and has_contract so operator clients can distinguish contract-backed targets from bare declared names without hiding the latter. When a contract exists, Waterline now seeds the operator JSON editor from declared defaults plus type-aware primitive placeholders instead of filling every required scalar slot with null; rejected commands still surface durable validation_errors.
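The visibility rules for the three controls can be summarized in a few lines. This is a sketch of the rule as described here, assuming the selected-run detail payload is a decoded dict; it is not the Waterline client's own code, and the function name is hypothetical.

```python
def visible_command_controls(detail: dict) -> dict:
    """Decide which of Query, Signal, and Update an operator UI should show."""

    def has_named_target(key: str) -> bool:
        # A control needs at least one declared target with a non-empty name.
        return any(target.get("name") for target in detail.get(key) or [])

    return {
        "query": detail.get("can_query") is True
        and has_named_target("declared_query_targets"),
        "signal": detail.get("can_signal") is True
        and has_named_target("declared_signal_targets"),
        "update": detail.get("can_update") is True
        and has_named_target("declared_update_targets"),
    }
```

Note that a run with declared update targets but `can_update = false` still keeps those targets in the payload; only the execution control is hidden.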
Query, signal, and update requests use an explicit JSON arguments field on the Waterline operator routes:
- query routes accept either a JSON object of named arguments or a JSON array of positional arguments, then return the query result as a typed JSON payload instead of recording a durable command row; when durable query targets exist but the workflow definition cannot be replayed, those POSTs now return HTTP 409 Conflict with blocked_reason = workflow_definition_unavailable
- signal routes accept either a JSON array of positional arguments or any other single JSON value, which Waterline forwards as one durable signal payload
- signal routes also accept a JSON object of named arguments when the selected run exposes a matching declared signal contract; rejected signal commands surface machine-readable validation_errors, including declared type mismatches and nullability violations
- update routes accept either a JSON object of named arguments or a JSON array of positional arguments, and rejected requests surface machine-readable validation_errors, including declared type mismatches and nullability violations
- update routes also accept wait_for = accepted and wait_timeout_seconds; Waterline exposes that as a "Return after" control plus an optional completion-wait timeout so operators can either queue an update immediately or wait briefly for the worker before the UI falls back to the accepted lifecycle
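The request shapes in the list above might be built like this. The sketch assumes that `wait_for` and `wait_timeout_seconds` sit beside `arguments` at the top level of the JSON body, which this page does not state explicitly; the argument names (`include_history`, `approved`) and the helper name are made up for illustration.

```python
import json
from typing import Any, Optional


def command_body(
    arguments: Any,
    wait_for: Optional[str] = None,
    wait_timeout_seconds: Optional[int] = None,
) -> str:
    """Serialize an operator command body with an explicit arguments field."""
    body = {"arguments": arguments}
    # Assumption: the update wait options are top-level siblings of arguments.
    if wait_for is not None:
        body["wait_for"] = wait_for
    if wait_timeout_seconds is not None:
        body["wait_timeout_seconds"] = wait_timeout_seconds
    return json.dumps(body)


# Named arguments for a query (hypothetical parameter name).
query_body = command_body({"include_history": True})
# Positional arguments for a signal.
signal_body = command_body(["Taylor"])
# An update that returns once accepted, with a short wait timeout.
update_body = command_body(
    {"approved": True}, wait_for="accepted", wait_timeout_seconds=5
)
```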
Cancel and terminate still target the current run only. If you address a historical run directly through the canonical selected-run route or a legacy compatibility route for one of those current-run-only actions, Waterline rejects the command instead of forwarding it to that older run. Archive is intentionally selected-run scoped: it can mark any closed selected run in the instance as archived, including a closed historical run, while open runs reject with rejected_run_not_closed and already archived runs accept as archive_not_needed. Waterline shows the Repair control only when can_repair is true for the currently selected run. It shows Cancel only when can_cancel is true, Terminate only when can_terminate is true, and Archive only when can_archive is true; can_issue_terminal_commands remains the coarse compatibility bridge when older clients still expect one shared terminal-action flag.
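An operator client can pair each hidden control with the reason the payload gives for it. The sketch below assumes the `can_*` and `*_blocked_reason` field pairs that appear in selected-run detail; the function name is hypothetical.

```python
CONTROLS = ("repair", "cancel", "terminate", "archive")


def blocked_controls(detail: dict) -> dict:
    """Map each control that should be hidden to its reported blocked reason."""
    blocked = {}
    for control in CONTROLS:
        if detail.get(f"can_{control}") is not True:
            blocked[control] = detail.get(f"{control}_blocked_reason")
    return blocked
```

For an open run that needs no repair, this would typically report `repair` and `archive` as blocked while leaving `cancel` and `terminate` available.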
Current-run detail payload example:
{
  "id": "01J10000000000000000000021",
  "instance_id": "order-123",
  "business_key": "order-123",
  "visibility_labels": {
    "tenant": "acme",
    "region": "us-east"
  },
  "memo": {
    "customer": {
      "id": 42,
      "name": "Taylor"
    },
    "source": "checkout"
  },
  "selected_run_id": "01J10000000000000000000021",
  "run_id": "01J10000000000000000000021",
  "is_current_run": true,
  "current_run_id": "01J10000000000000000000021",
  "current_run_source": "run_order_fallback",
  "current_run_status": "waiting",
  "current_run_status_bucket": "running",
  "run_navigation": [
    {
      "instance_id": "order-123",
      "run_id": "01J10000000000000000000021",
      "run_number": 1,
      "is_current_run": true,
      "is_selected_run": true,
      "status": "waiting",
      "status_bucket": "running"
    }
  ],
  "status": "waiting",
  "status_bucket": "running",
  "closed_reason": null,
  "archived_at": null,
  "archive_command_id": null,
  "archive_reason": null,
  "can_issue_terminal_commands": true,
  "can_cancel": true,
  "cancel_blocked_reason": null,
  "can_terminate": true,
  "terminate_blocked_reason": null,
  "can_archive": false,
  "archive_blocked_reason": "run_not_closed",
  "can_query": true,
  "query_blocked_reason": null,
  "can_signal": true,
  "signal_blocked_reason": null,
  "can_update": true,
  "update_blocked_reason": null,
  "can_repair": false,
  "repair_blocked_reason": "repair_not_needed",
  "read_only_reason": null,
  "open_wait_id": "timer:01J10000000000000000000031",
  "resume_source_kind": "timer",
  "resume_source_id": "01J10000000000000000000031",
  "waits_scope": "selected_run",
  "tasks_scope": "selected_run",
  "waits": [
    {
      "id": "timer:01J10000000000000000000031",
      "kind": "timer",
      "status": "open",
      "source_status": "pending",
      "summary": "Waiting for timer.",
      "task_backed": true,
      "external_only": false,
      "resume_source_kind": "timer",
      "resume_source_id": "01J10000000000000000000031",
      "task_id": "01J10000000000000000000041",
      "task_type": "timer",
      "task_status": "ready"
    }
  ],
  "tasks": [
    {
      "id": "01J10000000000000000000041",
      "type": "timer",
      "status": "ready",
      "summary": "Timer for 60 seconds task ready.",
      "is_open": true,
      "timer_id": "01J10000000000000000000031",
      "timer_sequence": 1
    }
  ],
  "timeline": [
    {
      "sequence": 1,
      "type": "StartAccepted",
      "kind": "command",
      "entry_kind": "point",
      "source_kind": "workflow_command",
      "source_id": "01J10000000000000000000011",
      "summary": "Start accepted as started_new.",
      "command_sequence": 1
    },
    {
      "sequence": 2,
      "type": "WorkflowStarted",
      "kind": "workflow",
      "entry_kind": "point",
      "source_kind": "workflow_run",
      "source_id": "01J10000000000000000000021",
      "summary": "Workflow run started."
    },
    {
      "sequence": 3,
      "type": "TimerScheduled",
      "kind": "timer",
      "entry_kind": "point",
      "source_kind": "timer",
      "source_id": "01J10000000000000000000031",
      "summary": "Scheduled timer for 60 seconds.",
      "timer": {
        "id": "01J10000000000000000000031",
        "status": "pending"
      }
    }
  ]
}
run_navigation is ordered by run number for the selected instance. Waterline uses it to render stable continue-as-new navigation without inferring routes from legacy bucket names or from one-hop lineage arrays alone.
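One way to turn run_navigation into navigation links is sketched below. It assumes the detail payload is a decoded dict and targets the canonical selected-run route described earlier on this page; the function name and the shape of the returned link dicts are hypothetical.

```python
from typing import Dict, List


def run_links(detail: dict) -> List[Dict[str, object]]:
    """Build one selected-run link per run_navigation entry."""
    # The payload is already ordered by run number; sorting is only defensive.
    entries = sorted(detail.get("run_navigation") or [], key=lambda e: e["run_number"])
    return [
        {
            "run_number": entry["run_number"],
            "path": f"/waterline/api/instances/{entry['instance_id']}/runs/{entry['run_id']}",
            "current": entry.get("is_current_run", False),
            "selected": entry.get("is_selected_run", False),
        }
        for entry in entries
    ]
```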
Waterline still returns compatibility logs and chartData fields for the current Vue client, but those no longer need to be the only observability surface for v2 runs. They now echo activity-level history_authority, history_unsupported_reason, and diagnostic_only metadata when an older mutable row is the only surviving evidence, so compatibility consumers can keep the breadcrumb without treating it as typed durable history. The timeline collection is ordered by durable history sequence and is the better source when you need to explain exactly what the engine accepted, scheduled, completed, cancelled, terminated, or archived for one selected run.
The timeline is now projected into workflow_run_timeline_entries during the same projection pass that updates run summaries and waits. Selected-run detail reports timeline_projection_source = workflow_run_timeline_entries when Waterline is reading an already-synced projection row set, and workflow_run_timeline_entries_rebuilt when the detail read had to recreate missing or stale timeline rows before returning them. Fleet metrics report missing history-run coverage, stale projected timeline payloads, and missing history-event rows under operator_metrics.projections.run_timeline_entries.*.
The timeline's primary identity and status fields are history-first snapshots, not a live join against today's mutable side tables. For example, an ActivityScheduled entry stays pending, ActivityStarted and ActivityHeartbeatRecorded entries stay running, a TimerScheduled entry stays pending, UpdateAccepted and UpdateApplied do not inherit the later completion outcome, VersionMarkerRecorded keeps the originally committed branch choice and supported range, typed failure entries keep the failure payload that was recorded at that point in history even if later mutable rows drift, and FailureHandled marks the later point where workflow code caught the failure and continued. If a mixed-era replay stayed on WorkflowStub::DEFAULT_VERSION because the selected run predates the marker and no typed marker was ever committed, Waterline does not invent a synthetic timeline entry for that fallback; use the run's compatibility marker and deployment wave as the operator context instead.
When several run-scoped commands exist, the dedicated commands list is ordered by durable command sequence instead of timestamp ties, and Waterline now shows that per-run sequence directly in the commands table alongside a payload viewer backed by the durable command row. Command-related timeline entries surface the same command_sequence, and their nested command snapshot keeps the event-era payload preview, public workflow command context, plus requested_run_id and resolved_run_id, so operators can correlate accepted signals with the order they were later applied and still see which current run superseded a rejected historical-run command even if the mutable command row later drifts.
Compound start-time intake is now grouped separately under linked_intakes_scope = selected_run and linked_intakes[*]. Each grouped row is keyed by the durable workflow_commands.context.intake.group_id, names the mode, reports source = workflow_commands.context.intake, carries start_command_*, primary_command_*, and the ordered nested commands[*] snapshots, and marks complete = false plus missing_expected_command_types when a recognized mode is only partially present. The current release recognizes signal_with_start as a start plus signal compound intake; future modes can still publish the same grouped shape with their own mode and ordered command list. Older preview rows that only preserved commands[*].context.intake.mode without the durable group_id are omitted from linked_intakes, so the grouped contract never invents a linked identity that the command rows did not actually store.
Those same selected-run commands, signals, and updates rows now carry a small task-link bridge for operator triage. current_task_id and current_task_status name the currently open durable workflow task, if one still exists, that transports or applies that accepted command lifecycle. task_ids keeps the set of known durable task ids proven either by current selected-run task payloads or by typed history, so a row can still point back to the task that handled it even after that task has closed and disappeared from the open-task surface. task_transport_state mirrors the current task's transport state when there is one and falls back to missing when the lifecycle is repairable but the backing task row is gone. task_missing = true means the command, signal, or update is durably accepted yet currently has no open backing task; for preview-era accepted signals or updates that still only have command-plus-history evidence, the same rows may temporarily point at the command-linked fallback task identity.
The dedicated waits and tasks arrays are also selected-run scoped. waits tells you what the run is waiting on now or what resume source resolved earlier in the same run. Current rebuilds persist those rows in workflow_run_waits; selected-run detail reports waits_projection_source = workflow_run_waits when Waterline is reading an already-synced wait projection row set, and workflow_run_waits_rebuilt when the detail read had to recreate missing or stale wait rows before returning them. Fleet metrics report canonical wait-run coverage, stale projected wait payloads, and current open waits missing from that projection under operator_metrics.projections.run_waits.*. That rebuilt path still derives waits from typed history even if the current run summary has no open wait, so resolved child, activity, timer, signal, update, and condition waits do not disappear behind a stale summary. Unsupported older terminal activity, timer, and child fallbacks remain visible there as diagnostics, but they now set diagnostic_only = true and omit resume_source_kind / resume_source_id so Waterline does not imply a durable resume path that typed history never proved. tasks tells you which durable worker tasks exist for that run and also includes synthetic transport_state = missing rows when typed history, wait state, or the no-resume-source invariant proves that repairable task transport disappeared. Together they let operators tell the difference between a healthy external signal wait, a signal that was already received but lost its workflow task, a healthy task-backed timer or activity wait, a repair-needed wait whose task row has disappeared, a pending activity whose execution row and task can be restored from typed history, a non-terminal run with no durable resume source, and an open wait that only has stale historical task rows left behind. For repeated same-name signals, the wait row sequence and signal_wait_id tell you exactly which opened wait you are looking at.
Selected-run timers are projected separately into workflow_run_timer_entries during that same rebuild pass. Detail reports timers_projection_source = workflow_run_timer_entries when Waterline is reading an already-synced timer projection row set, and workflow_run_timer_entries_rebuilt when the detail read had to recreate missing or stale timer rows before returning them. When that rebuild happens, detail and history export also surface timers_projection_rebuild_reasons, currently missing_projection, stale_projection, and schema_version_mismatch, so operators can tell whether the read repaired an absent row set, a drifted payload, or a row whose stored schema version does not match the current contract. Fleet metrics report timer coverage and drift under operator_metrics.projections.run_timer_entries.*, including runs_with_timers, missing_runs_with_timers, stale_projected_runs, schema_version_mismatch_runs, schema_version_mismatch_rows, and orphaned. Timer projection rows carry an explicit stored schema_version: 1 is the current row contract. Waterline treats rows with any other stored schema version as rebuild-required instead of silently trusting them forever, so selected-run detail, history export, or php artisan workflow:v2:rebuild-projections --needs-rebuild rewrites them onto the current schema before the timer projection is reported as aligned. The timer rows keep timer-specific diagnostics such as timer_kind, timeout-backed condition_wait_id / condition_key, history_authority, and history_unsupported_reason on a rebuildable selected-run surface instead of forcing operators back to mutable timer rows or compatibility logs.
The selected-run lineage arrays are projected into workflow_run_lineage_entries during that same rebuild pass. Selected-run detail reports lineage_projection_source = workflow_run_lineage_entries when Waterline is reading an already-synced lineage projection row set, and workflow_run_lineage_entries_rebuilt when the detail read had to recreate missing or stale lineage rows before returning them. Fleet metrics report canonical lineage-run coverage plus stale projected lineage payloads under operator_metrics.projections.run_lineage_entries.*. That rebuilt path derives parent, child, and continue-as-new relationships from typed history first and only uses workflow_links as a compatibility bridge for older preview rows. When a lineage entry only survives through that mutable compatibility bridge, the row now carries history_authority = mutable_open_fallback and diagnostic_only = true, so Waterline keeps the relation visible without implying that the identity or timestamp came from durable typed lineage history.
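Because the timeline, wait, timer, and lineage projections all follow the same naming pattern for their source fields, a client can detect on-read rebuilds generically. This sketch assumes a decoded selected-run detail dict and relies only on the `_rebuilt` suffix described above; the function name is hypothetical.

```python
from typing import List

PROJECTION_SOURCE_FIELDS = (
    "timeline_projection_source",
    "waits_projection_source",
    "timers_projection_source",
    "lineage_projection_source",
)


def rebuilt_projections(detail: dict) -> List[str]:
    """List the projection source fields whose rows were rebuilt during this read."""
    return [
        field
        for field in PROJECTION_SOURCE_FIELDS
        if str(detail.get(field) or "").endswith("_rebuilt")
    ]
```

A non-empty result suggests the projection had drifted before the read, which is the same condition the fleet metrics under operator_metrics.projections.* are meant to surface in aggregate.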
When replay blocks because committed history no longer matches the current workflow definition, task rows use transport_state = replay_blocked. A parallel all([...]) arity or nesting mismatch uses replay_blocked_reason = history_shape_mismatch with replay_blocked_expected_history_shape = parallel all barrier matching current topology, so Waterline can distinguish a barrier-shape rollout problem from ordinary queue transport loss. The same replay-blocked shape is used when typed activity or child leaf events exist for an all([...]) step but lack parallel_group_path metadata, because the engine cannot safely infer which current barrier those events belonged to. Activity execution and child link rows may carry paths for diagnostics, but runtime reads, history export, and Waterline projections do not infer grouped activity or child barrier identity from those mutable rows.
For pure timers, those same selected-run wait and task rows now rebuild timer_id, timer_sequence, and deadline_at from typed TimerScheduled and TimerFired history first, so a drifted or deleted workflow_timers row does not erase the operator-facing wait identity. Timeout-backed condition waits use that fired timer history as the resume source for applying the condition timeout; a mutable timer row marked fired without the matching typed TimerFired event remains a pending timeout transport. A pure timer row marked fired with no typed timer history blocks replay as history_shape_mismatch and Waterline reports recorded events as no typed history instead of treating the row as a durable timer result. Selected-run timer and wait detail mark that row-only terminal fallback with status = unsupported, diagnostic_only = true, history_authority = unsupported_terminal_without_history, history_unsupported_reason = terminal_timer_row_without_typed_history, row_status set to the mutable row state, and no resume_source_kind / resume_source_id. Once TimerFired exists, repair restores the missing workflow task that records ConditionWaitTimedOut instead of recreating a timer task that would fire the same deadline again.
Task rows also expose transport-health fields such as transport_state, task_missing, synthetic, expected_task_id, dispatch_failed, dispatch_overdue, claim_failed, last_dispatch_attempt_at, last_dispatched_at, last_dispatch_error, last_claim_failed_at, last_claim_error, and lease_expired, so Waterline can tell the difference between a missing task, a ready task whose last publish failed, a ready task an unsupported worker could not claim, a stale ready task that needs re-dispatch, and a leased task whose worker lease expired.
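The distinctions listed above can be collapsed into a single triage label per task row. This is a rough sketch: the precedence order among the flags is an assumption made for illustration, not documented behavior, and the function name is hypothetical.

```python
from typing import Optional

# Assumed precedence: checked in this order, first match wins.
TRANSPORT_FLAGS = ("dispatch_failed", "claim_failed", "dispatch_overdue", "lease_expired")


def transport_problem(task: dict) -> Optional[str]:
    """Return a short label for the task's transport problem, or None if healthy."""
    if task.get("task_missing") or task.get("transport_state") == "missing":
        return "missing"
    for flag in TRANSPORT_FLAGS:
        if task.get(flag):
            return flag
    return None
```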
last_dispatched_at now means the most recent confirmed queue handoff for that durable task. If the engine tried to publish the task but the queue handoff failed, Waterline leaves last_dispatched_at unchanged and instead records the failed attempt in last_dispatch_attempt_at plus last_dispatch_error.
When the worker-claim backend capability gate rejects a task, Waterline leaves the task ready and shows transport_state = claim_failed with the claim failure timestamp and reason. No workflow replay, activity execution, activity-attempt row, timer-fire history, or lease is written for that rejected claim.
When an accepted update satisfies a condition wait while a ready unscoped workflow task already exists, the runtime republishes that existing durable task rather than creating a duplicate task row and annotates it with the update wait provenance. Seeing the same workflow task id with refreshed dispatch metadata after such an update is normal. If the only open workflow task belongs to another resume source, Waterline keeps the update wait repair-needed instead of borrowing that unrelated task.
Task rows now also expose compatibility, compatibility_supported, compatibility_reason, compatibility_supported_in_fleet, and compatibility_fleet_reason. When a task row predates task-level compatibility storage, Waterline reads the selected run's marker as the effective task marker until the migration or runtime claim path backfills the task row itself. The local pair tells you whether the current build can claim that marker. The fleet pair tells you whether any active worker heartbeat snapshot in the selected run's configured namespace and queue scope currently advertises it. When a task is ready on the current run, or its old lease has expired, but neither view advertises a compatible claimer, Waterline leaves the durable task state alone and reports that the task is waiting for a compatible worker instead of pretending the task vanished or inviting an unsafe Repair.
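The "waiting for a compatible worker" condition can be expressed as a predicate over one task row. The sketch below follows the rule as worded above and treats a missing compatibility flag as unknown rather than unsupported, which is an assumption; the function name is hypothetical.

```python
def waiting_for_compatible_worker(task: dict) -> bool:
    """True when a claimable task has no compatible claimer locally or in the fleet."""
    claimable = task.get("status") == "ready" or task.get("lease_expired") is True
    no_local_claimer = task.get("compatibility_supported") is False
    no_fleet_claimer = task.get("compatibility_supported_in_fleet") is False
    return claimable and no_local_claimer and no_fleet_claimer
```

When this holds, the durable task state is intact, so the appropriate operator action is to bring up a worker that advertises the marker rather than to attempt a repair.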
Child workflows use that same detail surface. On the parent run, Waterline shows the active child in waits with kind = child, the stable child_call_id, the current child run in resume_source_id, the durable lineage link in continuedWorkflows, and typed child events in timeline. If the child continues as new, those parent-facing surfaces keep the same child_call_id while following the child instance's newest durable run even if copied link rows or the child instance's mutable current_run_id drift. Once the parent records a typed child-resolution event, the resolved child wait stays resolved from that parent history even if the mutable child row later drifts, and the parent resume workflow task carries workflow_wait_kind = child, child_call_id, child_workflow_run_id, and the child_workflow_run resume source. If that workflow task row is lost before the parent applies the result, the selected run remains repair_needed from typed parent history and tasks includes a synthetic missing child-resolution row with the same identity fields. On the child run, the inverse parent link is visible in parents. Compatibility-only lineage rows now say so explicitly through history_authority and diagnostic_only, and the selected-run history export uses that same projected lineage payload for links.parents and links.children instead of doing separate live-link enrichment on export.
Child-wait detail example:
{
"wait_kind": "child",
"wait_reason": "Waiting for child workflow billing-child",
"liveness_state": "waiting_for_child",
"waits": [
{
"kind": "child",
"status": "open",
"child_call_id": "01JCHILDCALL0000000000001",
"target_name": "child-instance-123",
"target_type": "billing-child",
"task_backed": false,
"external_only": false,
"resume_source_kind": "child_workflow_run",
"resume_source_id": "01JCHILDRUN0000000000001"
}
],
"continuedWorkflows": [
{
"link_type": "child_workflow",
"child_call_id": "01JCHILDCALL0000000000001",
"child_workflow_id": "child-instance-123",
"child_workflow_run_id": "01JCHILDRUN0000000000001"
}
],
"timeline": [
{
"type": "ChildWorkflowScheduled",
"kind": "child",
"child_call_id": "01JCHILDCALL0000000000001",
"summary": "Scheduled child workflow billing-child."
},
{
"type": "ChildRunStarted",
"kind": "child",
"child_call_id": "01JCHILDCALL0000000000001",
"summary": "Child workflow billing-child started."
}
]
}
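Because child_call_id is the stable identity across waits, continuedWorkflows, and timeline, a client can join those surfaces on it. The sketch below assumes the detail payload has already been decoded into a Python dict shaped like the example above; the function name and the keys of the returned rows are hypothetical.

```python
def open_child_waits(detail: dict) -> list:
    """Pair each open child wait with its lineage link via child_call_id."""
    # Index child_workflow lineage links by their stable child_call_id.
    links = {
        link.get("child_call_id"): link
        for link in detail.get("continuedWorkflows", [])
        if link.get("link_type") == "child_workflow"
    }

    rows = []
    for wait in detail.get("waits", []):
        if wait.get("kind") != "child" or wait.get("status") != "open":
            continue
        link = links.get(wait.get("child_call_id"), {})
        wait_run_id = wait.get("resume_source_id")
        link_run_id = link.get("child_workflow_run_id")
        rows.append({
            "child_call_id": wait.get("child_call_id"),
            "wait_run_id": wait_run_id,
            "link_run_id": link_run_id,
            # True when both surfaces point at the same child run.
            "consistent": wait_run_id is not None and wait_run_id == link_run_id,
        })
    return rows
```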
Missing child-resolution task example:
{
"wait_kind": "child",
"wait_reason": "Waiting to apply child workflow billing-child result",
"liveness_state": "repair_needed",
"open_wait_id": "child:01JCHILDCALL0000000000001",
"resume_source_kind": "child_workflow_run",
"resume_source_id": "01JCHILDRUN0000000000001",
"tasks": [
{
"id": "missing:workflow:child:01JCHILDCALL0000000000001",
"type": "workflow",
"status": "missing",
"transport_state": "missing",
"task_missing": true,
"synthetic": true,
"workflow_wait_kind": "child",
"workflow_open_wait_id": "child:01JCHILDCALL0000000000001",
"workflow_resume_source_kind": "child_workflow_run",
"workflow_resume_source_id": "01JCHILDRUN0000000000001",
"child_call_id": "01JCHILDCALL0000000000001",
"child_workflow_run_id": "01JCHILDRUN0000000000001"
}
]
}
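A client that wants to flag lost task rows can filter tasks on the synthetic and task_missing flags shown above. This is a minimal sketch under the assumption that both flags are JSON booleans; the function name is hypothetical.

```python
def synthetic_missing_tasks(detail: dict) -> list:
    """Return (task id, open wait id, resume source id) for each synthetic missing row."""
    found = []
    for task in detail.get("tasks", []):
        # Only rows synthesised from typed history carry both flags.
        if task.get("synthetic") is True and task.get("task_missing") is True:
            found.append((
                task.get("id"),
                task.get("workflow_open_wait_id"),
                task.get("workflow_resume_source_id"),
            ))
    return found
```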
Signal-wait detail example:
{
"wait_kind": "signal",
"wait_reason": "Waiting for signal approved-by",
"liveness_state": "waiting_for_signal",
"waits": [
{
"kind": "signal",
"status": "open",
"target_name": "approved-by",
"task_backed": false,
"external_only": true,
"resume_source_kind": "signal"
}
],
"timeline": [
{
"type": "SignalWaitOpened",
"kind": "signal",
"signal_name": "approved-by",
"summary": "Waiting for signal approved-by."
}
]
}
Once Waterline records SignalReceived, the external signal wait is resolved. The selected run should either show the backing workflow task if one still exists, or keep wait_kind = signal with open_wait_id = signal-application:{signal_id}, resume_source_kind = workflow_signal, and liveness_state = repair_needed if the signal was accepted but the workflow task row is gone before SignalApplied.
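The accepted-but-unapplied signal case can be recognised from the fields named above. The following sketch assumes those fields sit at the top level of the decoded run detail, as in the earlier examples; the function name is hypothetical.

```python
from typing import Optional

SIGNAL_APPLICATION_PREFIX = "signal-application:"


def pending_signal_application(detail: dict) -> Optional[str]:
    """Return the signal id whose apply task is missing, or None.

    Matches only the repair_needed shape described in the text:
    wait_kind = signal, resume_source_kind = workflow_signal, and an
    open_wait_id of the form signal-application:{signal_id}.
    """
    open_wait_id = detail.get("open_wait_id") or ""
    if (
        detail.get("wait_kind") == "signal"
        and detail.get("liveness_state") == "repair_needed"
        and detail.get("resume_source_kind") == "workflow_signal"
        and open_wait_id.startswith(SIGNAL_APPLICATION_PREFIX)
    ):
        return open_wait_id[len(SIGNAL_APPLICATION_PREFIX):]
    return None
```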
Historical-run detail payload example:
{
"id": "01J10000000000000000000020",
"instance_id": "order-123",
"selected_run_id": "01J10000000000000000000020",
"run_id": "01J10000000000000000000020",
"is_current_run": false,
"current_run_id": "01J10000000000000000000021",
"current_run_source": "continue_as_new_lineage",
"current_run_status": "waiting",
"current_run_status_bucket": "running",
"status": "completed",
"status_bucket": "completed",
"closed_reason": "continued",
"can_issue_terminal_commands": false,
"can_repair": false,
"read_only_reason": "Selected run is historical. Issue commands against the current active run.",
"continuedWorkflows": [
{
"link_type": "continue_as_new",
"child_workflow_run_id": "01J10000000000000000000021",
"status": "waiting",
"status_bucket": "running"
}
]
}
Closed current-run detail payload example:
{
"id": "01J10000000000000000000021",
"instance_id": "order-123",
"run_id": "01J10000000000000000000021",
"is_current_run": true,
"status": "cancelled",
"status_bucket": "failed",
"is_terminal": true,
"can_issue_terminal_commands": false,
"can_repair": false,
"read_only_reason": "Run is closed."
}
Repair-needed detail payload example:
{
"id": "01J10000000000000000000021",
"instance_id": "order-123",
"status": "waiting",
"wait_kind": "timer",
"liveness_state": "repair_needed",
"liveness_reason": "Timer 01J10000000000000000000031 is pending without an open timer task.",
"waits": [
{
"kind": "timer",
"status": "open",
"task_backed": false,
"task_id": null,
"task_type": null,
"task_status": null
}
],
"can_issue_terminal_commands": true,
"can_repair": true,
"read_only_reason": null
}
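The three detail examples above differ mainly in can_issue_terminal_commands, can_repair, and read_only_reason. A client could derive which operator actions to offer from those fields alone. This is a sketch, not Waterline's own UI logic; the function name and the action labels are hypothetical.

```python
def operator_actions(detail: dict) -> dict:
    """Derive which operator actions to offer for one run detail payload."""
    actions = []
    if detail.get("can_repair"):
        actions.append("repair")
    if detail.get("can_issue_terminal_commands"):
        actions.extend(["cancel", "terminate"])
    return {
        "actions": actions,
        # Populated for historical or closed runs; None otherwise.
        "read_only_reason": detail.get("read_only_reason"),
    }
```

With the repair-needed payload this yields repair, cancel, and terminate; with the historical or closed payloads it yields no actions and carries the read_only_reason through for display.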
Accepted repair command response:
{
"outcome": "repair_dispatched",
"workflow_id": "order-123",
"run_id": "01J10000000000000000000021",
"command_id": "01J40000000000000000000022",
"workflow_type": "workflow.timer",
"command_status": "accepted",
"rejection_reason": null
}
Ordinary queue workers also run the same recovery rules automatically. If a run has a ready task whose dispatch is overdue, a workflow, activity, or timer task whose lease has expired, or a repair_needed run summary with no open workflow, child-resolution workflow, accepted-update, accepted-signal, condition-timeout workflow, pending-activity, or timer task row, the worker loop reuses or recreates the durable task, increments repair_count, and re-dispatches it without duplicating in-flight running activities. Each pass selects repair candidates scope-fair across connection, queue, and compatibility, then caps existing-task and missing-run work separately at scan_limit. When the missing task is a pending activity whose mutable execution row also disappeared, repair restores that execution from typed ActivityScheduled history before creating the replacement task. When the missing task is a delayed retry, the replacement is rebuilt from the latest typed ActivityRetryScheduled history so Waterline keeps showing the original retry deadline and retry metadata.
Accepted update and accepted signal application waits follow that same automatic path: if the apply-task row disappears after UpdateAccepted or SignalReceived, the worker loop recreates a ready workflow task with the same workflow_wait_kind, workflow_update_id or workflow_signal_id, workflow_command_id, open-wait id, and resume-source metadata that manual repair() would restore.
After repair or automatic worker recovery restores that durable workflow, child-resolution workflow, accepted-update, accepted-signal, condition-timeout workflow, activity, or timer task, run detail should move from repair_needed back to the matching healthy task-backed liveness state (workflow_task_ready, activity_task_ready, or timer_scheduled).
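A script that polls run detail after a repair could check for that transition as follows. This sketch assumes the caller fetches and decodes the detail payload itself; the function name and the returned labels are hypothetical.

```python
# Healthy task-backed liveness states named in the text above.
HEALTHY_TASK_BACKED_STATES = frozenset({
    "workflow_task_ready",
    "activity_task_ready",
    "timer_scheduled",
})


def recovery_status(detail: dict) -> str:
    """Classify a run detail payload observed after a repair attempt."""
    state = detail.get("liveness_state")
    if state == "repair_needed":
        return "still_needs_repair"
    if state in HEALTHY_TASK_BACKED_STATES:
        return "recovered"
    # Any other liveness state (for example a plain signal wait) is left
    # to the caller to interpret.
    return "other"
```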
If a run is already waiting on a named signal, already has a healthy durable ready or leased task, is already inside a typed-history-backed running activity with no task row, or only has older diagnostic-only mutable activity, timer, or child state, repair() is accepted as repair_not_needed instead of inventing a new task.
Accepted repair commands also appear in the run timeline as RepairRequested entries. When repair restored a task, the timeline entry includes that task id and type so operators can see which durable resume source was repaired.
Accepted terminal command response:
{
"outcome": "cancelled",
"workflow_id": "order-123",
"run_id": "01J10000000000000000000021",
"command_id": "01J40000000000000000000021",
"workflow_type": "workflow.timer",
"command_status": "accepted",
"rejection_reason": null
}
Historical-run rejection response:
{
"outcome": "rejected_not_current",
"workflow_id": "order-123",
"run_id": "01J10000000000000000000020",
"requested_run_id": "01J10000000000000000000020",
"resolved_run_id": "01J10000000000000000000021",
"command_id": "01J40000000000000000000020",
"target_scope": "run",
"workflow_type": "workflow.timer",
"command_status": "rejected",
"rejection_reason": "selected_run_not_current"
}
For current-run-only commands, that historical-run response is now a durable engine command outcome created through either Waterline's canonical selected-run operator route or the legacy compatibility operator route. run_id and requested_run_id preserve the historical selected run that the operator addressed, while resolved_run_id points at the current run that callers should use next. The public webhook routes expose the same target_scope = run rejection payload when you address a historical run directly for those actions. Archive is the exception: it is selected-run scoped and may accept a historical run once that run is closed.
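A caller can read the run to address next straight from the rejection payload. The sketch below assumes the response body has been decoded into a dict shaped like the historical-run rejection example; the function name is hypothetical. It only reports the run id and does not re-issue the command, since whether a retry is appropriate is an operator decision.

```python
from typing import Optional


def run_to_address_next(response: dict) -> Optional[str]:
    """Return resolved_run_id when a command was rejected as not current."""
    if (
        response.get("command_status") == "rejected"
        and response.get("rejection_reason") == "selected_run_not_current"
    ):
        # resolved_run_id points at the current run of the same instance.
        return response.get("resolved_run_id")
    return None
```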
For the canonical Waterline operator routes:
- POST /waterline/api/instances/{instanceId}/signals/{signal} returns 200 when the current run accepts that signal command
- POST /waterline/api/instances/{instanceId}/updates/{update} returns 200 when the current run accepts and the workflow worker completes that update command before the response returns, or 202 when the request body uses wait_for = accepted or the configured/per-request completion wait budget expires first; timed-out completion waits return update_status = accepted, wait_for = completed, wait_timed_out = true, and wait_timeout_seconds, and write requests reject any wait_for other than accepted or completed with 422
- POST /waterline/api/instances/{instanceId}/repair returns 200 with repair_dispatched or repair_not_needed when the current run accepts the repair command
- POST /waterline/api/instances/{instanceId}/cancel returns 200 when the current run is closed as cancelled
- POST /waterline/api/instances/{instanceId}/terminate returns 200 when the current run is closed as terminated
- POST /waterline/api/instances/{instanceId}/archive returns 200 with archived or archive_not_needed when the instance's current run is closed and accepts the archive command
- all six endpoints return 404 when {instanceId} resolves to no instance or to an instance without a current run
- all six endpoints return 409 when the underlying v2 command is rejected
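The status codes listed for the update route can be mapped to client-side outcomes. This is a minimal sketch that assumes the caller already has the HTTP status code and the decoded JSON body; the function name and the returned labels are hypothetical.

```python
def interpret_update_response(status: int, body: dict) -> str:
    """Map an update-route response to a coarse client-side outcome."""
    if status == 200:
        # The worker completed the update before the response returned.
        return "completed"
    if status == 202:
        # Either wait_for = accepted was requested, or the completion
        # wait budget expired first (wait_timed_out = true).
        return "accepted_wait_timed_out" if body.get("wait_timed_out") else "accepted"
    if status == 404:
        return "not_found"
    if status == 409:
        return "rejected"
    if status == 422:
        return "invalid_wait_for"
    return "unexpected"
```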
For the canonical selected-run operator routes:
- POST /waterline/api/instances/{instanceId}/runs/{runId}/signals/{signal} returns 200 when that selected run is still current and accepts the signal command
- POST /waterline/api/instances/{instanceId}/runs/{runId}/updates/{update} returns 200 when that selected run is still current and the workflow worker completes the accepted update before the response returns, or 202 when the request body uses wait_for = accepted or the completion wait budget expires first; write requests reject any wait_for other than accepted or completed with 422
- POST /waterline/api/instances/{instanceId}/runs/{runId}/repair returns 200 with repair_dispatched or repair_not_needed when that selected run is still current and accepts the repair command
- POST /waterline/api/instances/{instanceId}/runs/{runId}/cancel returns 200 when that selected run is still current and closes as cancelled
- POST /waterline/api/instances/{instanceId}/runs/{runId}/terminate returns 200 when that selected run is still current and closes as terminated
- POST /waterline/api/instances/{instanceId}/runs/{runId}/archive returns 200 with archived or archive_not_needed when that selected run is closed, including closed historical runs in the same instance
- all six endpoints return 404 when {instanceId} or {runId} does not resolve to that instance-selection pair
- the current-run-only endpoints return 409 with target_scope = run when the selected run is historical or otherwise rejected; archive returns 409 when the selected run is still open or the underlying archive command is otherwise rejected
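The selected-run routes share one path shape, so a client can build them from a single helper. The sketch below is an illustration only: the function name is hypothetical, and percent-encoding each path segment is an assumption about how ids with reserved characters should be sent.

```python
from typing import Optional
from urllib.parse import quote

NAMED_ACTIONS = {"signals", "updates"}  # take a trailing {signal} or {update}
PLAIN_ACTIONS = {"repair", "cancel", "terminate", "archive"}


def selected_run_route(instance_id: str, run_id: str, action: str,
                       name: Optional[str] = None) -> str:
    """Build the path of a selected-run operator route."""
    base = (
        f"/waterline/api/instances/{quote(instance_id, safe='')}"
        f"/runs/{quote(run_id, safe='')}"
    )
    if action in NAMED_ACTIONS:
        if not name:
            raise ValueError(f"{action} requires a signal or update name")
        return f"{base}/{action}/{quote(name, safe='')}"
    if action in PLAIN_ACTIONS:
        if name is not None:
            raise ValueError(f"{action} does not take a name")
        return f"{base}/{action}"
    raise ValueError(f"unknown action: {action}")
```

For example, selected_run_route("order-123", "01J10000000000000000000021", "signals", "approved-by") produces the signal route for that run.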
For the current Waterline compatibility routes:
- POST /waterline/api/flows/{id}/signals/{signal} returns 200 when the selected current run accepts the signal command
- POST /waterline/api/flows/{id}/updates/{update} returns 200 when the selected current run accepts and the workflow worker completes the update before the response returns, or 202 when the request body uses wait_for = accepted or the completion wait budget expires first; write requests reject any wait_for other than accepted or completed with 422
- POST /waterline/api/flows/{id}/repair returns 200 with repair_dispatched or repair_not_needed when the selected current run accepts the repair command
- POST /waterline/api/flows/{id}/cancel returns 200 when the current run is closed as cancelled
- POST /waterline/api/flows/{id}/terminate returns 200 when the current run is closed as terminated
- POST /waterline/api/flows/{id}/archive returns 200 with archived or archive_not_needed when {id} resolves to a closed selected run or to an instance whose current run is closed
- all six endpoints return 404 when {id} resolves to neither a run nor an instance with a current run
- the current-run-only endpoints return 409 when the selected run is historical or when the underlying v2 command is rejected; archive returns 409 when the selected run is still open or the underlying archive command is otherwise rejected
- /waterline/api/flows/failed, /waterline/api/flows/cancelled, and /waterline/api/flows/terminated now split terminal list screens by raw run status; cancelled and terminated rows still carry status_bucket = failed as the compatibility bridge
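Since the three terminal list routes split on the raw run status rather than on status_bucket, a client that groups rows should key on status. The following sketch assumes each row is a decoded dict carrying status as in the detail examples; the function name is hypothetical.

```python
from typing import Optional

# Terminal list routes named in the text, keyed by raw run status.
TERMINAL_LIST_ROUTES = {
    "failed": "/waterline/api/flows/failed",
    "cancelled": "/waterline/api/flows/cancelled",
    "terminated": "/waterline/api/flows/terminated",
}


def terminal_list_route(row: dict) -> Optional[str]:
    """Return the terminal list route a row belongs to, or None."""
    # status_bucket is deliberately ignored: cancelled and terminated
    # rows still report status_bucket = failed.
    return TERMINAL_LIST_ROUTES.get(row.get("status"))
```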
Dashboard View
Workflow View
Refer to https://github.com/durable-workflow/waterline for installation and configuration instructions.