Tutorial: Multi-DAG Workflows

This tutorial shows two approaches to chaining DAGs: event-driven triggers for loosely coupled workflows, and sub-DAG calls for tightly composed pipelines.

Approach 1: on_dag trigger

Independent DAGs, loose coupling. The upstream DAG does not know about the downstream DAG.

ETL pipeline (runs on schedule)

name: etl-pipeline
trigger:
  schedule: "0 3 * * *"    # 3 AM daily

steps:
  - id: extract
    r_expr: |
      data <- read.csv("/data/incoming/daily_export.csv")
      saveRDS(data, "data/raw.rds")
      cat("::daggle-output name=n_rows::", nrow(data), "\n", sep = "")

  - id: clean
    r_expr: |
      raw <- readRDS("data/raw.rds")
      clean <- raw[complete.cases(raw), ]
      saveRDS(clean, "data/clean.rds")
      cat("::daggle-output name=n_clean::", nrow(clean), "\n", sep = "")
    depends: [extract]

  - id: load
    r_expr: |
      clean <- readRDS("data/clean.rds")
      con <- DBI::dbConnect(RSQLite::SQLite(), "warehouse.db")
      DBI::dbWriteTable(con, "daily_data", clean, append = TRUE)
      DBI::dbDisconnect(con)
      cat("::daggle-output name=loaded::true\n")
    depends: [clean]

Report pipeline (triggers on ETL completion)

name: report-pipeline
trigger:
  on_dag:
    name: etl-pipeline
    status: completed
    pass_outputs: true

steps:
  - id: generate
    r_expr: |
      n_rows <- Sys.getenv("DAGGLE_OUTPUT_EXTRACT_N_ROWS")
      cat(sprintf("ETL completed with %s rows, generating report\n", n_rows))
      # ... build report from warehouse data ...
      cat("::daggle-output name=report_path::output/daily_report.html\n")

  - id: distribute
    command: |
      cp output/daily_report.html /shared/reports/
      echo "Report distributed"
    depends: [generate]

When etl-pipeline completes successfully, daggle automatically starts report-pipeline. The pass_outputs: true flag makes the upstream DAG’s outputs available as environment variables in the downstream DAG.
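
The variable naming appears to follow the pattern DAGGLE_OUTPUT_&lt;STEP&gt;_&lt;NAME&gt;, with the step id and output name uppercased (inferred from the DAGGLE_OUTPUT_LOAD_LOADED example below; verify against your daggle version). Under that assumption, a downstream step could read all three upstream outputs:

```yaml
# Hypothetical downstream step. The DAGGLE_OUTPUT_<STEP>_<NAME>
# naming is an assumption inferred from the example in this tutorial.
- id: summarize
  r_expr: |
    n_rows  <- Sys.getenv("DAGGLE_OUTPUT_EXTRACT_N_ROWS")
    n_clean <- Sys.getenv("DAGGLE_OUTPUT_CLEAN_N_CLEAN")
    loaded  <- Sys.getenv("DAGGLE_OUTPUT_LOAD_LOADED")
    cat(sprintf("Extracted %s rows, kept %s after cleaning (loaded: %s)\n",
                n_rows, n_clean, loaded))
```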

Running

Start the scheduler to activate both triggers:

daggle serve

The ETL pipeline runs at 3 AM. When it finishes, the report pipeline starts automatically. If the ETL pipeline fails, the report pipeline does not trigger (because status: completed requires success).

You can also set status: failed to build alert workflows, or status: any to trigger regardless of outcome.
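
For example, an alert pipeline might reuse the trigger schema above with status: failed. This is a sketch: the DAG name and notification command are illustrative, and only the trigger block is taken from the example in this tutorial.

```yaml
# Hypothetical alert DAG: fires only when etl-pipeline fails.
name: etl-alert
trigger:
  on_dag:
    name: etl-pipeline
    status: failed

steps:
  - id: notify
    command: |
      echo "etl-pipeline failed at $(date)" >> /var/log/etl-alerts.log
```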

Approach 2: call step

Parent controls child. The sub-DAG runs inline as part of the parent.

Full pipeline (composes the ETL DAG)

name: full-pipeline
steps:
  - id: run-etl
    call:
      dag: etl-pipeline

  - id: report
    r_expr: |
      cat("Generating report from warehouse data\n")
      # ... build report ...
      cat("::daggle-output name=report_path::output/daily_report.html\n")
    depends: [run-etl]

  - id: distribute
    command: |
      cp output/daily_report.html /shared/reports/
      echo "Report distributed"
    depends: [report]

The call: step runs etl-pipeline to completion. If it fails, the run-etl step fails and blocks downstream steps. The parent DAG has full control over the execution flow.

You can pass parameters to the sub-DAG:

- id: run-etl
  call:
    dag: etl-pipeline
    params:
      source: "api"
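
How the child reads a parameter depends on daggle's parameter-passing convention, which this tutorial does not specify. If params are exposed as environment variables (an assumption, mirroring the DAGGLE_OUTPUT_* pattern used for step outputs), the sub-DAG's extract step might look like this:

```yaml
# Inside etl-pipeline. DAGGLE_PARAM_SOURCE is an assumed naming
# convention, not documented behavior; check your daggle version.
- id: extract
  r_expr: |
    source <- Sys.getenv("DAGGLE_PARAM_SOURCE", unset = "csv")
    cat("Extracting from source:", source, "\n")
```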

When to use each approach

             on_dag trigger                   call step
Coupling     Loose – DAGs are independent     Tight – parent owns the child
Failure      Does not affect upstream DAG     Fails the parent step
Execution    Async, separate run              Inline, blocks parent
Scheduling   Each DAG has its own triggers    Sub-DAG runs when parent runs
Visibility   Two separate runs in history     One run with nested steps

Use on_dag triggers when:

  • The DAGs are maintained by different teams
  • You want the upstream DAG to remain unaware of downstream consumers
  • You need multiple DAGs to react to the same upstream event
  • Failure in the downstream DAG should not affect the upstream
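
The fan-out case is straightforward: several DAGs can declare the same on_dag trigger. For instance, an archival pipeline could run alongside report-pipeline. This sketch reuses the trigger schema from the example above; the DAG name and backup command are illustrative.

```yaml
# A second consumer of the same upstream event.
name: archive-pipeline
trigger:
  on_dag:
    name: etl-pipeline
    status: completed

steps:
  - id: archive
    command: |
      cp warehouse.db "/backups/warehouse-$(date +%F).db"
```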

Use call steps when:

  • The sub-DAG is a logical component of a larger workflow
  • You need the parent to fail if the sub-DAG fails
  • You want a single run ID tracking the entire pipeline
  • You want to pass parameters from parent to child