YAML Schema Reference

Field-by-field reference for daggle DAG definitions. The machine-readable JSON Schema can be used for editor validation.

Editor autocomplete

Add this comment as the first line of any DAG YAML file to enable autocomplete and validation in VS Code (requires the YAML extension by Red Hat):

```yaml
# yaml-language-server: $schema=https://github.com/cynkra/daggle/raw/main/docs/daggle-schema.json
name: my-dag
steps:
  - id: hello
    command: echo "hello"
```

DAG-level fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | Unique DAG identifier |
| version | string | no | "1" | Schema version |
| steps | array | yes | | List of step objects |
| trigger | object | no | (none) | Automated execution triggers. See Trigger fields |
| workdir | string | no | (DAG file directory) | Working directory for all steps |
| env | object | no | {} | Environment variables for all steps. Values are strings or {value, secret} objects |
| params | array | no | [] | Parameters. Each entry has name (required) and default (optional) |
| r_version | string | no | (none) | R version constraint (e.g. ">=4.1.0") |
| r_version_strict | boolean | no | false | If true, fail when the R version does not match r_version; if false, only warn |
| on_success | hook | no | (none) | Hook run after all steps succeed |
| on_failure | hook | no | (none) | Hook run when the DAG fails |
| on_exit | hook | no | (none) | Hook run when the DAG exits, regardless of outcome |
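A minimal sketch of a DAG that combines several DAG-level fields. Only the field names come from the table; the DAG name, directory, script, parameter, and environment values are hypothetical placeholders.

```yaml
name: daily-report
workdir: /srv/reports            # hypothetical; steps run here instead of the DAG file's directory
env:
  DB_HOST: localhost             # plain string value, visible to every step
params:
  - name: region                 # name is required
    default: emea                # default is optional
r_version: ">=4.1.0"             # quoted: a plain YAML value cannot start with ">"
r_version_strict: true           # fail instead of warning on a version mismatch
steps:
  - id: build
    script: build.R              # hypothetical script
    args: ["--region", "{{ .Params.region }}"]
on_failure:
  command: echo "daily-report failed"
```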

Trigger fields

The trigger object controls when the scheduler automatically starts a DAG. Multiple triggers can coexist; any one of them firing starts a run.

| Field | Type | Description |
|---|---|---|
| schedule | string | Cron expression (e.g. "30 6 * * MON-FRI", "@every 5m") |
| overlap | enum | What to do if the DAG is already running: "skip" (default) or "cancel" |
| deadline | string | Time-of-day deadline in HH:MM format. Used with the on_deadline hook |
| on_deadline | hook | Hook executed if the DAG is still running when the deadline is reached |
| watch | object | File watcher trigger |
| webhook | object | HTTP POST trigger |
| on_dag | object | Trigger on another DAG's completion |
| condition | object | Polling trigger (R expression or shell command) |
| git | object | Trigger on new git commits |
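For example, a weekday schedule that skips overlapping runs and fires a hook if the run is still going at 08:00 might look like the sketch below. The DAG name, times, script, and hook command are illustrative only.

```yaml
name: morning-load
trigger:
  schedule: "30 6 * * MON-FRI"   # minute 30, hour 6, Monday to Friday
  overlap: skip                  # do not start a second run while one is active
  deadline: "08:00"              # HH:MM time of day
  on_deadline:
    command: echo "morning-load still running at 08:00"
steps:
  - id: load
    script: load.R               # hypothetical script
```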

watch fields

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | yes | Directory to watch |
| pattern | string | no | Glob pattern (e.g. "*.csv") |
| debounce | duration | no | Wait for writes to settle (e.g. "5s") |
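A sketch of a file-watcher trigger that starts the DAG when CSV files land in a directory. The directory is a hypothetical example; note that the glob must be quoted, because a plain YAML value cannot start with "*".

```yaml
trigger:
  watch:
    path: /srv/data/incoming   # hypothetical directory to watch
    pattern: "*.csv"           # only react to CSV files
    debounce: "5s"             # wait until writes have settled for 5 seconds
```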

webhook fields

| Field | Type | Required | Description |
|---|---|---|---|
| secret | string | no | HMAC-SHA256 secret for request validation |

on_dag fields

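| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Name of the upstream DAG |
| status | enum | no | Upstream outcome that triggers a run: "completed" (default), "failed", or "any" |
| pass_outputs | boolean | no | Pass upstream outputs as environment variables |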
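A sketch of a downstream DAG that runs whenever a hypothetical upstream DAG named ingest completes. How the passed outputs are named as environment variables is not specified in this table, so the step below does not rely on any particular variable name.

```yaml
name: report
trigger:
  on_dag:
    name: ingest          # upstream DAG (hypothetical name)
    status: completed     # the default; "failed" or "any" are also accepted
    pass_outputs: true    # expose upstream outputs as environment variables
steps:
  - id: render
    quarto: report.qmd    # hypothetical document
```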

condition fields

| Field | Type | Required | Description |
|---|---|---|---|
| r_expr | string | one of | R expression to evaluate |
| command | string | one of | Shell command to evaluate |
| poll_interval | duration | no | How often to check (default: "5m") |

Set exactly one of r_expr or command.
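A sketch of a polling trigger. The table does not say how the result is interpreted; this assumes a TRUE result (or, for command, a zero exit status) starts a run. The flag file path is hypothetical.

```yaml
trigger:
  condition:
    r_expr: file.exists("/srv/data/ready.flag")   # assumed: TRUE starts a run
    poll_interval: "10m"                          # check every 10 minutes instead of the 5m default
```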

git fields

| Field | Type | Required | Description |
|---|---|---|---|
| branch | string | no | Branch to monitor |
| events | array | no | Events to react to: "push", "tag" |
| poll_interval | duration | no | How often to poll (default: "30s") |
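A sketch of a git trigger that reacts to both event types on one branch. The table does not say which repository is polled; presumably it is the repository containing the DAG's working directory, but that is an assumption.

```yaml
trigger:
  git:
    branch: main          # branch to monitor
    events: [push, tag]   # react to both listed event types
    poll_interval: "1m"   # poll less often than the 30s default
```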

Step fields

Each step requires an id and exactly one step type field.

Common fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | yes | | Unique step identifier |
| args | string[] | no | [] | Command-line arguments |
| depends | string[] | no | [] | IDs of upstream steps |
| timeout | duration | no | (none) | Maximum duration (e.g. "30s", "5m") |
| retry | object | no | (none) | See Retry config |
| env | object | no | {} | Step-level environment variables |
| workdir | string | no | (DAG workdir) | Step-level working directory |
| when | object | no | (none) | Skip the step if the condition fails. Fields: r_expr or command |
| preconditions | array | no | [] | Fail the step if any check fails. Each entry has r_expr or command |
| error_on | enum | no | "error" | "error", "warning", or "message" |
| matrix | object | no | (none) | Parameter grid. Keys are parameter names, values are string arrays |
| max_parallel | integer | no | (all) | Maximum number of matrix instances run concurrently |
| artifacts | array | no | [] | Output file declarations. Each entry has name, path, and optionally format and versioned. See Artifacts config |
| cache | boolean | no | false | Enable step-level caching. When true, the step is skipped if its inputs have not changed since the last successful run |
| freshness | array | no | [] | Freshness checks on input files. Each entry has path, max_age, and optionally on_stale. See Freshness config |
| output_dir | string | no | (none) | Output directory template for rendered reports (Quarto/Rmd). Supports template variables |
| output_name | string | no | (none) | Output filename template for rendered reports. Supports template variables |
| on_success | hook | no | (none) | Hook run when the step succeeds |
| on_failure | hook | no | (none) | Hook run when the step fails |
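A sketch of two dependent steps that use several common fields. The scripts, file paths, and the RUN_MODEL variable are hypothetical, and treating a TRUE result of the when expression as "run the step" is an assumption.

```yaml
steps:
  - id: extract
    command: ./extract.sh          # hypothetical shell script
    timeout: "10m"                 # maximum duration for this step
  - id: model
    script: model.R                # hypothetical R script
    depends: [extract]             # runs only after extract
    env:
      MODEL_SEED: "42"             # step-level variable (quoted to stay a string)
    when:
      r_expr: Sys.getenv("RUN_MODEL") == "true"   # otherwise the step is skipped
    preconditions:
      - command: test -s data/extract.csv         # fail if the file is missing or empty
    cache: true                    # skip if inputs are unchanged since the last success
```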

Step types

Exactly one step type field must be set per step.

Core:

| Field | Type | Description |
|---|---|---|
| script | string | Path to an R script. Runs via Rscript --no-save --no-restore |
| r_expr | string | Inline R expression. Written to a temporary file and run via Rscript |
| command | string | Shell command. Runs via sh -c |
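One step of each core type, chained with depends. The file names are placeholders, and the r_expr step assumes that clean.R writes output/clean.rds.

```yaml
steps:
  - id: prepare
    command: mkdir -p output            # runs via sh -c
  - id: clean
    script: clean.R                     # runs via Rscript --no-save --no-restore
    depends: [prepare]
  - id: summary
    r_expr: print(summary(readRDS("output/clean.rds")))   # written to a temp file, run via Rscript
    depends: [clean]
```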

Documents:

| Field | Type | Description |
|---|---|---|
| quarto | string | Quarto document or project path. Runs quarto render |
| rmd | string | R Markdown file path. Runs rmarkdown::render() |
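A sketch of a Quarto report whose output location depends on the run date and a parameter, using the output_dir and output_name fields from the common fields table and the Go text/template syntax described under Template variables. The file names and the region parameter are hypothetical, and whether output_name should include the file extension is an assumption.

```yaml
params:
  - name: region
    default: emea
steps:
  - id: report
    quarto: reports/sales.qmd
    output_dir: "reports/{{ .Today }}"              # one directory per run date
    output_name: "sales-{{ .Params.region }}.html"  # extension handling is an assumption
```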

R package development:

| Field | Type | Description |
|---|---|---|
| test | string | Package path. Runs devtools::test() |
| check | string | Package path. Runs rcmdcheck::rcmdcheck() |
| document | string | Package path. Runs roxygen2::roxygenize() |
| lint | string | Package path. Runs lintr::lint_package() |
| style | string | Package path. Runs styler::style_pkg() |
| coverage | string | Package path. Runs covr::package_coverage() |
| pkgdown | string | Package path. Runs pkgdown::build_site() |
| benchmark | string | Directory path. Runs bench scripts |
| revdepcheck | string | Package path. Runs revdepcheck::revdep_check() |
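A sketch of a package CI pipeline that regenerates documentation, lints, runs the tests, and then runs R CMD check, assuming the package lives in the DAG's working directory ("."):

```yaml
name: pkg-ci
steps:
  - id: document
    document: .          # roxygen2::roxygenize()
  - id: lint
    lint: .              # lintr::lint_package()
  - id: test
    test: .              # devtools::test()
    depends: [document]
  - id: check
    check: .             # rcmdcheck::rcmdcheck()
    depends: [test, lint]
```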

Environment:

| Field | Type | Description |
|---|---|---|
| renv_restore | string | Project path. Runs renv::restore() |
| install | string | Package names. Installs via pak or install.packages() |
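A sketch that restores the project library before running a script. The table does not specify how several package names are written in the single install string, so this example installs one package; the script name is hypothetical.

```yaml
steps:
  - id: restore
    renv_restore: .          # renv::restore() in the project directory
  - id: extra
    install: data.table      # installed via pak or install.packages()
    depends: [restore]
  - id: run
    script: analysis.R       # hypothetical script
    depends: [extra]
```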

Deployment:

| Field | Type | Description |
|---|---|---|
| connect | object | Deploy to Posit Connect. Fields: type (shiny/quarto/plumber), path, name, force_update |
| pin | object | Publish via pins. Fields: board, name, object, type, versioned |
| vetiver | object | MLOps deploy. Fields: action (pin/deploy), model, board, name |
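A sketch of deploying a Shiny app to Posit Connect with the connect fields listed above. The app path and content name are placeholders; how the server URL and credentials are supplied is not covered by this table.

```yaml
steps:
  - id: deploy
    connect:
      type: shiny              # one of shiny, quarto, plumber
      path: app/               # hypothetical directory containing the app
      name: sales-dashboard    # hypothetical content name on Connect
      force_update: true
```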

Workflow:

| Field | Type | Description |
|---|---|---|
| approve | object | Approval gate. Fields: message, timeout, notify (hook) |
| call | object | Sub-DAG. Fields: dag (name), params (map) |
| targets | string | Project path. Runs targets::tar_make() |
| shinytest | string | App directory. Runs shinytest2::test_app() |
| validate | string | Script path. Runs validation via Rscript |
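A sketch of an approval gate followed by a sub-DAG call. The DAG name publish-report, its quarter parameter, and the notify command are hypothetical; what happens when the approval times out is not specified here.

```yaml
steps:
  - id: render
    quarto: report.qmd
  - id: sign-off
    approve:
      message: "Publish this week's report?"
      timeout: "24h"
      notify:                                # a hook: exactly one of r_expr or command
        command: echo "report awaiting approval"
    depends: [render]
  - id: publish
    call:
      dag: publish-report                    # hypothetical sub-DAG
      params:
        quarter: Q2
    depends: [sign-off]
```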

Data sources:

| Field | Type | Description |
|---|---|---|
| database | object | SQL query via R DBI. Fields: driver, params (map), query or query_file, output (format inferred from extension) |
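A sketch of a query step against an SQLite file. The format of the driver string and the mapping of params onto the DBI connection arguments are assumptions made for illustration, as are the database file and table names; the CSV output format follows from the file extension, as described above.

```yaml
steps:
  - id: pull-orders
    database:
      driver: "RSQLite::SQLite"      # assumed: DBI driver written as pkg::function
      params:
        dbname: data/shop.sqlite     # assumed: passed to the driver's connection
      query: SELECT * FROM orders
      output: data/orders.csv        # CSV, inferred from the extension
```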

Notifications:

| Field | Type | Description |
|---|---|---|
| email | object | Send email via SMTP. Fields: channel, subject, body/body_file, to, cc, bcc, from, attach |

Isolation:

| Field | Type | Description |
|---|---|---|
| docker | object | Run inside a Docker container. Fields: image, command/entrypoint, volumes, env, workdir, network, user, pull |
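A sketch of running an R script inside a Rocker container. The table lists only field names, so the volume entry syntax (host:container, as with docker run -v) and passing command as a single string are assumptions; the script path is hypothetical.

```yaml
steps:
  - id: containerized
    docker:
      image: rocker/r-ver:4.4.1
      command: Rscript /work/analysis.R   # assumed: a single command string
      volumes:
        - "./:/work"                      # assumed host:container syntax
      workdir: /work
```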

Retry config

| Field | Type | Default | Description |
|---|---|---|---|
| limit | integer | 0 | Number of retries (total attempts = limit + 1) |
| backoff | enum | "linear" | "linear" or "exponential" |
| max_delay | duration | (none) | Cap on delay between retries |
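With limit: 3, a step runs at most four times (limit + 1). The sketch below backs off exponentially but caps each wait at two minutes; since the table lists no base-delay field, the actual wait times are not shown. The URL and output path are placeholders.

```yaml
steps:
  - id: fetch
    command: curl -fsS -o data/raw.json https://example.com/api/export
    retry:
      limit: 3                # up to 4 attempts in total
      backoff: exponential    # delays grow between attempts
      max_delay: "2m"         # no single wait exceeds 2 minutes
```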

Hook fields

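Hooks appear in the DAG-level on_success, on_failure, and on_exit fields, the step-level on_success and on_failure fields, the trigger's on_deadline field, and the notify field of approve steps. A hook must contain exactly one of: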

| Field | Type | Description |
|---|---|---|
| r_expr | string | Inline R expression |
| command | string | Shell command |
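A sketch of hooks at DAG and step level, mixing both hook forms. The DAG name, script, and commands are placeholders.

```yaml
name: nightly
steps:
  - id: load
    script: load.R
    on_failure:
      r_expr: message("load step failed")   # step-level hook
on_success:
  command: echo "nightly finished"
on_failure:
  command: echo "nightly failed"
on_exit:
  command: rm -rf tmp/                      # runs regardless of outcome
```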

Artifacts config

Each entry in the artifacts array declares an output file produced by the step.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | Unique name within the step |
| path | string | yes | | Path relative to workdir |
| format | string | no | (none) | File format hint (e.g. parquet, rds, csv, png) |
| versioned | boolean | no | false | If true, append an epoch timestamp to the filename |

Example:

```yaml
steps:
  - id: train
    script: train.R
    artifacts:
      - name: model
        path: output/model.rds
        format: rds
        versioned: true
      - name: metrics
        path: output/metrics.csv
        format: csv
```

Artifacts are tracked by the engine. After a run, their hashes and sizes are recorded and available via GET /api/v1/dags/{name}/runs/{run_id}/artifacts.

Freshness config

Each entry in the freshness array declares a freshness expectation on an input file.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | yes | | Path to the input file |
| max_age | duration | yes | | Maximum acceptable age (e.g. "24h", "1h30m") |
| on_stale | enum | no | "fail" | Action when the file is older than max_age: "fail" or "warn" |

Example:

```yaml
steps:
  - id: transform
    script: transform.R
    freshness:
      - path: data/raw.csv
        max_age: "24h"
        on_stale: fail
      - path: data/lookup.csv
        max_age: "7d"
        on_stale: warn
```

Template variables

String fields (args, env values, workdir), as well as output_dir and output_name, support Go text/template syntax:

| Variable | Example | Description |
|---|---|---|
| {{ .Today }} | 2026-04-06 | Current date (YYYY-MM-DD) |
| {{ .Now }} | 2026-04-06T08:30:00Z | Current time (RFC 3339) |
| {{ .Params.name }} | sales | Value of the parameter name |
| {{ .Env.KEY }} | localhost | Value of the DAG-level environment variable KEY |
| {{ .Matrix.key }} | lm | Matrix parameter value (in expanded steps) |
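A sketch combining several template variables with a matrix. Templated values are quoted because a plain YAML value starting with "{" would be parsed as a flow mapping. The DAG name, parameter, matrix key, script, and environment variable are hypothetical, and the matrix presumably expands into one step instance per model value.

```yaml
name: fit-models
env:
  DB_HOST: localhost
params:
  - name: region
    default: sales
workdir: "/srv/runs/{{ .Today }}"        # one working directory per run date
steps:
  - id: fit
    script: fit.R
    matrix:
      model: [lm, glm]                   # grid with a single key
    max_parallel: 2
    args:
      - "--model={{ .Matrix.model }}"
      - "--region={{ .Params.region }}"
      - "--host={{ .Env.DB_HOST }}"
```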