Version: 2.0 prerelease

Monitoring

Waterline is a separate UI that works alongside Horizon. Think of Waterline as being to workflows what Horizon is to queues.

Waterline ships only with the embedded Laravel host that installs the durable-workflow/workflow package; it reads that app's durable state in process. The standalone server distribution does not run Waterline. Operators who run the standalone server read the equivalent durable-state facts through GET /api/system/health, GET /api/system/operator-metrics, and the workflow control-plane routes documented in the Server API Reference. The Operator Operating Envelope maps the Waterline routes below onto their server-side counterparts.
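For standalone-server operators, a minimal polling sketch might look like the following. Only the two paths come from this page. The base URL, the JSON response format, and the absence of authentication headers are assumptions, and the names `system_url` and `fetch_system_facts` are hypothetical helpers, not part of any shipped client.

```python
import json
from urllib.request import Request, urlopen

# The two paths are the ones named in the text above.
SYSTEM_PATHS = {
    "health": "/api/system/health",
    "operator_metrics": "/api/system/operator-metrics",
}


def system_url(base_url: str, name: str) -> str:
    """Join a server base URL with one of the documented system paths."""
    return base_url.rstrip("/") + SYSTEM_PATHS[name]


def fetch_system_facts(base_url: str, timeout: float = 5.0) -> dict:
    """GET both endpoints and return the decoded bodies keyed by name.

    Assumption: the server answers with JSON and this deployment needs
    no auth headers. Add whatever your server requires.
    """
    facts = {}
    for name in SYSTEM_PATHS:
        request = Request(
            system_url(base_url, name),
            headers={"Accept": "application/json"},
        )
        with urlopen(request, timeout=timeout) as response:
            facts[name] = json.loads(response.read().decode("utf-8"))
    return facts
```

Calling `fetch_system_facts("http://localhost:8080")` (a placeholder address) would return a dictionary with `health` and `operator_metrics` keys. The field names inside each body are defined by the Server API Reference, not by this sketch.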

Durable Workflow has two observability planes:

| Plane | Source of truth | Typical questions |
| --- | --- | --- |
| Durable state | Workflow database, Waterline projections, and history export | Did the workflow start? Which run is current? Which signal, update, timer, activity, retry, or failure was committed? Which operator action is safe now? |
| Worker/runtime telemetry | Queue worker logs, SDK metrics recorders, Prometheus/OpenMetrics endpoints, and application traces | Are workers polling? How long do tasks take? Is an exporter configured? Did custom application metrics leave the worker process? |

Waterline intentionally does not replace worker metrics. If a custom metric was recorded in activity or worker code, scrape the worker's telemetry endpoint. Use Waterline to correlate that runtime signal with the durable workflow history and current run state.

When worker telemetry shows repeated claims, late completion races, or stuck leases, read Execution Guarantees and Idempotency alongside this guide. That contract separates at-least-once transport uncertainty from duplicate durable outcomes so duplicate-looking evidence does not turn into the wrong operational conclusion.

Dashboard View

[Screenshot: Waterline dashboard]

The dashboard shows running totals, recent-run counters, and fleet-wide metrics so you can tell at a glance whether work is flowing, stalling, or failing.

Use the Operator Operating Envelope when you need the rollout and runbook contract for those facts: which diagnostics block traffic, which are advisory, how queue-health facts split between Waterline and worker telemetry, and how to verify rebuild, export, and archive paths.

Workflow View

[Screenshot: Waterline workflow detail]

The workflow detail view shows the durable timeline for a single run: the activities, signals, timers, and child workflows that happened in order, each with its inputs, outputs, and timing.

Installing Waterline

Install Waterline into your Laravel application alongside the workflow package and run its migrations. See durable-workflow/waterline for the full installation and configuration guide.

List and detail API

Waterline's list views (/waterline/api/flows/{bucket}) and selected-run detail endpoint (/waterline/api/flows/{id}) return typed JSON contracts that you can consume directly from your own dashboards or scripts. The Waterline Operator API Reference documents the endpoint list, selected-run field families, history export, actionability, schedules, saved views, preferences, and operator-action contract.
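As a rough illustration of consuming these endpoints from a script, the sketch below builds the two URLs and decodes the JSON. The route template comes from the text above. The host, the bucket name `running`, and the lack of session or token handling are assumptions, and `flows_url` and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def flows_url(base_url: str, segment: str) -> str:
    """Build /waterline/api/flows/{segment}.

    `segment` is either a bucket name (list view) or a run id (detail).
    The segment is percent-encoded so ids with reserved characters stay
    inside a single path segment.
    """
    return f"{base_url.rstrip('/')}/waterline/api/flows/{quote(segment, safe='')}"


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body.

    Assumption: no authentication is needed. Real installations usually
    sit behind the host application's auth middleware.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A script might call `get_json(flows_url(base, "running"))` for a list and `get_json(flows_url(base, run_id))` for one run, where `"running"` stands in for whichever bucket names the Waterline Operator API Reference defines.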

Actionability Contract

Waterline annotates list rows, selected-run detail responses, and history exports with a versioned actionability contract. Consumers should treat actionability_contract.schema = waterline.actionability and actionability_contract.version = 1 as the contract identifier for the fields below.
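A consumer can check the contract identifier before reading any actionability fields. The sketch below assumes the `actionability_contract` object sits at the top level of the decoded payload; adjust the lookup if your response nests it elsewhere. The function name is hypothetical.

```python
EXPECTED_SCHEMA = "waterline.actionability"
EXPECTED_VERSION = 1


def has_supported_contract(payload: dict) -> bool:
    """Return True only when the payload declares the contract this code understands."""
    contract = payload.get("actionability_contract")
    if not isinstance(contract, dict):
        return False
    return (
        contract.get("schema") == EXPECTED_SCHEMA
        and contract.get("version") == EXPECTED_VERSION
    )
```

Failing closed on a missing or newer contract keeps automation from guessing at fields whose meaning may have changed.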

Run-level actionability answers whether the selected run can be repaired:

| Field | Meaning |
| --- | --- |
| repair_state | One of repairable, blocked, not_needed, or unknown. |
| repairable | Boolean shorthand for repair_state = repairable. |
| blocked_reason | Stable reason code when repair_state = blocked. |
| status_bucket | The Waterline bucket that shaped the run-level decision. |
| closed_reason | Durable close reason when the run is closed. |
| task_problem | Whether Waterline saw a task-level problem on the run. |
| diagnostic_only_evidence | True when at least one child evidence row is informative but not a resume source. |
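One way to read the run-level block defensively is sketched below. It covers only three of the fields in the table, treats any unrecognized `repair_state` as `unknown`, and rejects a payload whose `repairable` flag disagrees with its state. Those two policies, along with the `RunActionability` type and the reason code `example_reason`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# The four documented values of repair_state.
REPAIR_STATES = ("repairable", "blocked", "not_needed", "unknown")


@dataclass(frozen=True)
class RunActionability:
    repair_state: str
    repairable: bool
    blocked_reason: Optional[str]


def parse_run_actionability(block: dict) -> RunActionability:
    """Normalize a run-level actionability block."""
    state = block.get("repair_state")
    if state not in REPAIR_STATES:
        # Assumption: values this code does not know are treated as "unknown".
        state = "unknown"
    # repairable is documented as shorthand for repair_state = repairable,
    # so derive it from the state and only cross-check the flag.
    repairable = state == "repairable"
    if "repairable" in block and bool(block["repairable"]) != repairable:
        raise ValueError("repairable flag disagrees with repair_state")
    # blocked_reason is only meaningful when the run is blocked.
    reason = block.get("blocked_reason") if state == "blocked" else None
    return RunActionability(state, repairable, reason)


# Usage with a hand-written sample block (not real Waterline output).
sample = parse_run_actionability(
    {"repair_state": "blocked", "repairable": False, "blocked_reason": "example_reason"}
)
```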

Evidence rows under activities, waits, timers, exceptions, logs, and timeline/export entries can also include their own actionability block:

| Field | Meaning |
| --- | --- |
| state | actionable when the row is a valid repair source, otherwise diagnostic_only. |
| repair_source | True only for rows backed by a repairable source authority. |
| diagnostic_only | True when the row must not be used as a resume source. |
| history_authority | Source authority, such as typed_history, mutable_open_fallback, failure_row_fallback, or unsupported_terminal_without_history. |
| history_unsupported_reason | Stable reason code for unsupported fallback history. |

Automation should gate repair, resume, and replay affordances on actionability.repair_state, actionability.repairable, and row-level actionability.repair_source. A row with diagnostic_only = true is never a durable resume source, even when it contains useful failure or fallback metadata. Rows with history_authority = unsupported_terminal_without_history are diagnostic evidence only: they explain why a run is blocked, but they do not provide enough typed history to rebuild progress safely.
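The gating rule above can be sketched as a single filter. The sketch assumes each evidence row is a dictionary carrying its block under an `actionability` key and that rows without such a block should be skipped. The function name and the `id` keys in the sample are hypothetical.

```python
def resume_sources(run_actionability: dict, rows: list) -> list:
    """Return the evidence rows that automation may treat as resume sources."""
    # Run-level gate: both the state and its boolean shorthand must agree.
    if (
        run_actionability.get("repair_state") != "repairable"
        or run_actionability.get("repairable") is not True
    ):
        return []

    selected = []
    for row in rows:
        block = row.get("actionability") or {}
        # diagnostic_only rows are never a durable resume source.
        if block.get("diagnostic_only") is True:
            continue
        # Unsupported terminal rows only explain why a run is blocked.
        if block.get("history_authority") == "unsupported_terminal_without_history":
            continue
        if block.get("repair_source") is True:
            selected.append(row)
    return selected


# Usage with hand-written sample rows (not real Waterline output).
run = {"repair_state": "repairable", "repairable": True}
rows = [
    {"id": "a", "actionability": {"state": "actionable", "repair_source": True,
                                  "diagnostic_only": False, "history_authority": "typed_history"}},
    {"id": "b", "actionability": {"state": "diagnostic_only", "repair_source": False,
                                  "diagnostic_only": True, "history_authority": "failure_row_fallback"}},
    {"id": "c", "actionability": {"repair_source": True, "diagnostic_only": False,
                                  "history_authority": "unsupported_terminal_without_history"}},
    {"id": "d"},
]
usable = resume_sources(run, rows)
```

Here only row `a` survives: `b` is diagnostic only, `c` has unsupported history, and `d` carries no actionability block at all.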

Control-plane actions from Waterline

Operators can cancel, terminate, repair, and archive workflows directly from the detail view. Each action maps to a POST on the same run id and returns either 200 with the resulting state or 409 when the action is not valid for the run's current state.
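A script that drives these actions needs to tell an applied action from a rejected one. The sketch below handles only the two status codes named above and deliberately leaves out the request itself, because the action route is documented in the Waterline Operator API Reference rather than here. `ActionResult` and `interpret_action_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ActionResult:
    applied: bool  # True for 200, False for 409
    body: Any      # decoded response body, passed through unchanged


def interpret_action_response(status: int, body: Any) -> ActionResult:
    """Map the documented status codes of an operator action to a result."""
    if status == 200:
        # The action was accepted; body carries the resulting state.
        return ActionResult(applied=True, body=body)
    if status == 409:
        # The action is not valid for the run's current state.
        return ActionResult(applied=False, body=body)
    # Anything else (auth failures, server errors) is outside this contract.
    raise RuntimeError(f"unexpected status {status} from operator action")
```

With `urllib.request`, a 409 surfaces as an `HTTPError` whose `code` attribute holds the status, so the caller would catch that exception and pass `error.code` into this function rather than treating it as a transport failure.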

Related guides

  • Execution Guarantees and Idempotency explains the replay, retry, lease-expiry, and durable-outcome contract that shapes operator evidence.
  • Operator Operating Envelope ties health, queue state, rebuild, export, archive, and topology expectations into one operator contract.
  • Failures and Recovery explains retry exhaustion, non-retryable failures, timeouts, and repair behavior behind the dashboard facts.
  • AI-Assisted Development names the Waterline, CLI, MCP, and LLM-readable contracts that agents should use when diagnosing workflow state.