# YAML Reference

Complete field-by-field reference for daggle DAG definitions. A JSON Schema is available for editor autocomplete.
## DAG-level fields

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique DAG identifier |
| `version` | string | no | Schema version (default: `"1"`) |
| `steps` | array | yes | List of execution steps |
| `trigger` | object | no | Automated execution triggers |
| `workdir` | string | no | Working directory for all steps |
| `max_parallel` | integer | no | Cap on concurrent step execution (0 = unbounded) |
| `env` | object | no | Environment variables for all steps |
| `params` | array | no | Parameters with optional defaults |
| `r_version` | string | no | R version constraint (e.g. `">=4.1.0"`) |
| `r_version_strict` | boolean | no | Fail if the R version doesn't match (default: warn) |
| `owner` | string | no | DAG owner (e.g. `"alice"`). Filterable via `daggle list --owner` |
| `team` | string | no | Owning team (e.g. `"data"`). Filterable via `daggle list --team` |
| `description` | string | no | Free-form one-line description |
| `tags` | string[] | no | Tags for filtering (e.g. `[etl, daily, critical]`). Filterable via `daggle list --tag` |
| `exposures` | array | no | Downstream consumers of this DAG (see below) |
| `on_success` | hook | no | Hook to run when the DAG completes |
| `on_failure` | hook | no | Hook to run when the DAG fails |
| `on_exit` | hook | no | Hook to run on DAG exit (success or failure) |
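Tying a few of these fields together, a minimal DAG might look like the sketch below. All names and values here are illustrative, not defaults; `r_expr` is one of the step types shown in the full example further down.

```yaml
name: nightly-etl          # unique DAG identifier (illustrative)
owner: alice
team: data
tags: [etl, daily]
max_parallel: 2            # at most two steps run concurrently
steps:
  - id: hello
    r_expr: 'cat("hello\n")'
```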
## Exposures

`exposures` lists downstream consumers (dashboards, reports, apps) so that `daggle impact` and the API can report who is affected when the DAG changes. Exposures are purely informational: daggle does not deploy or monitor them.
```yaml
exposures:
  - name: ops-dashboard
    type: dashboard
    url: https://dash.example.com/ops
    description: Main operations dashboard
  - name: weekly-email
    type: report
```

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique exposure identifier within the DAG |
| `type` | enum | yes | One of `shiny`, `quarto`, `dashboard`, `report`, `other` |
| `url` | string | no | Link to the exposure |
| `description` | string | no | Free-form description |
## Step fields

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Unique step identifier |
| *step type* | varies | yes | Exactly one step type field (see Step Types) |
| `args` | string[] | no | Command-line arguments |
| `depends` | string[] | no | IDs of upstream steps |
| `timeout` | duration | no | Maximum duration (e.g. `"30s"`, `"5m"`, `"1h"`) |
| `retry` | object | no | Retry configuration |
| `env` | object | no | Step-level environment variables |
| `workdir` | string | no | Step-level working directory |
| `when` | object | no | Conditional execution |
| `preconditions` | array | no | Health checks before running |
| `error_on` | enum | no | Error sensitivity: `"error"`, `"warning"`, `"message"` |
| `matrix` | object | no | Parameter grid for expansion |
| `max_parallel` | integer | no | Max parallel matrix instances |
| `on_success` | hook | no | Step-level success hook |
| `on_failure` | hook | no | Step-level failure hook |
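As a sketch of the step-level options, the `retry` shape below matches the full example further down; the `matrix` sub-keys are hypothetical placeholders for a parameter grid, since its exact schema is defined in its own section.

```yaml
- id: fit
  script: models/fit.R       # illustrative step type and path
  depends: [extract]
  timeout: 30m
  retry:
    limit: 2
    backoff: exponential
    max_delay: 60s
  matrix:                    # hypothetical sub-keys; see the matrix section
    region: [north, south]
  max_parallel: 1            # run one matrix instance at a time
```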
## Full example

```yaml
name: my-pipeline
version: "1"
r_version: ">=4.1.0"
trigger:
  schedule: "30 6 * * MON-FRI"
workdir: /opt/projects/etl
env:
  DB_HOST: "localhost"
  REPORT_DATE: "{{ .Today }}"
params:
  - name: department
    default: "sales"
on_success:
  r_expr: 'logger::log_info("Pipeline complete")'
on_failure:
  command: echo "Pipeline failed!" | mail -s "Alert" team@example.com
steps:
  - id: extract
    script: etl/extract.R
    args: ["--dept", "{{ .Params.department }}"]
    timeout: 10m
    retry:
      limit: 3
      backoff: exponential
      max_delay: 60s
    env:
      BATCH_SIZE: "1000"
  - id: validate
    r_expr: |
      data <- readRDS("data/raw.rds")
      stopifnot(nrow(data) > 0)
      cat("::daggle-output name=row_count::", nrow(data), "\n")
    depends: [extract]
    error_on: warning
  - id: report
    script: etl/report.R
    args: ["--rows", "$DAGGLE_OUTPUT_VALIDATE_ROW_COUNT"]
    depends: [validate]
```

## Editor support
Add this comment to the top of your YAML files for VS Code autocomplete (requires the YAML extension):

```yaml
# yaml-language-server: $schema=https://github.com/cynkra/daggle/raw/main/docs/daggle-schema.json
```