YAML Reference

Complete field-by-field reference for daggle DAG definitions. A JSON Schema is available for editor autocomplete.

DAG-level fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique DAG identifier |
| version | string | no | Schema version (default: "1") |
| steps | array | yes | List of execution steps |
| trigger | object | no | Automated execution triggers |
| workdir | string | no | Working directory for all steps |
| env | object | no | Environment variables for all steps |
| params | array | no | Parameters with optional defaults |
| r_version | string | no | R version constraint (e.g. ">=4.1.0") |
| r_version_strict | boolean | no | Fail if the R version constraint is not met (default: warn only) |
| on_success | hook | no | Hook to run when the DAG completes successfully |
| on_failure | hook | no | Hook to run when the DAG fails |
| on_exit | hook | no | Hook to run on DAG exit (success or failure) |
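
The three lifecycle hooks can be combined at the DAG level. A minimal sketch using only fields from the table above (names and hook bodies are illustrative):

```yaml
name: nightly-etl
r_version: ">=4.1.0"
r_version_strict: true   # fail instead of warn on a version mismatch

steps:
  - id: run
    script: etl/run.R

on_success:
  r_expr: 'message("done")'
on_failure:
  command: echo "failed" >> pipeline.log
on_exit:
  command: rm -f .lock   # runs after success or failure
```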

Step fields

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique step identifier |
| step type (e.g. script, r_expr, command) | varies | yes | Exactly one step type field per step (see Step Types) |
| args | string[] | no | Command-line arguments |
| depends | string[] | no | IDs of upstream steps |
| timeout | duration | no | Maximum duration (e.g. "30s", "5m", "1h") |
| retry | object | no | Retry configuration |
| env | object | no | Step-level environment variables |
| workdir | string | no | Step-level working directory |
| when | object | no | Conditional execution |
| preconditions | array | no | Health checks run before the step |
| error_on | enum | no | Error sensitivity: "error", "warning", or "message" |
| matrix | object | no | Parameter grid for step expansion |
| max_parallel | integer | no | Maximum parallel matrix instances |
| on_success | hook | no | Step-level success hook |
| on_failure | hook | no | Step-level failure hook |
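
matrix and max_parallel expand a single step definition into several instances. A hedged sketch, assuming matrix keys are exposed to templates as `{{ .Matrix.<key> }}` (the exact expansion syntax is not specified in this reference):

```yaml
steps:
  - id: fit
    script: models/fit.R
    args: ["--region", "{{ .Matrix.region }}"]   # assumed template form
    matrix:
      region: ["north", "south", "east"]
    max_parallel: 2   # at most two instances run concurrently
```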

Full example

```yaml
name: my-pipeline
version: "1"
r_version: ">=4.1.0"

trigger:
  schedule: "30 6 * * MON-FRI"

workdir: /opt/projects/etl
env:
  DB_HOST: "localhost"
  REPORT_DATE: "{{ .Today }}"

params:
  - name: department
    default: "sales"

on_success:
  r_expr: 'logger::log_info("Pipeline complete")'
on_failure:
  command: echo "Pipeline failed!" | mail -s "Alert" team@example.com

steps:
  - id: extract
    script: etl/extract.R
    args: ["--dept", "{{ .Params.department }}"]
    timeout: 10m
    retry:
      limit: 3
      backoff: exponential
      max_delay: 60s
    env:
      BATCH_SIZE: "1000"

  - id: validate
    r_expr: |
      data <- readRDS("data/raw.rds")
      stopifnot(nrow(data) > 0)
      cat("::daggle-output name=row_count::", nrow(data), "\n")
    depends: [extract]
    error_on: warning

  - id: report
    script: etl/report.R
    args: ["--rows", "$DAGGLE_OUTPUT_VALIDATE_ROW_COUNT"]
    depends: [validate]
```
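
Duration fields such as timeout and max_delay use number-plus-unit notation. A small parser sketch in Python, for illustration only (this is not daggle's implementation, and whether combined forms like "1h30m" are accepted is an assumption here):

```python
import re

# Seconds per unit for the suffixes shown in this reference.
UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_duration(text: str) -> int:
    """Parse strings like "30s", "5m", "1h" (or "1h30m") into seconds."""
    parts = re.findall(r"(\d+)([smh])", text)
    # Reject anything with leftover or malformed characters.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * UNITS[u] for n, u in parts)

print(parse_duration("30s"))   # 30
print(parse_duration("5m"))    # 300
print(parse_duration("1h"))    # 3600
```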

Editor support

Add this comment to the top of your YAML files for VS Code autocomplete (requires the YAML extension):

```yaml
# yaml-language-server: $schema=https://github.com/cynkra/daggle/raw/main/docs/daggle-schema.json
```