File Layout

daggle keeps configuration, DAG definitions, and run data in a fixed directory structure. The paths shown below are the defaults; see Overriding directories for how to change them.

Directory structure

~/.config/daggle/
  config.yaml                    # Global configuration (tool paths, cleanup, etc.)
  dags/                          # Global DAG definitions
    my-pipeline.yml
    nightly-etl.yml
  projects.yaml                  # Registered project directories

~/.local/share/daggle/
  runs/                          # Run data, organized by DAG and date
    <dag>/
      <YYYY-MM-DD>/
        run_<id>/
          meta.json              # Run metadata (params, start time, status)
          events.jsonl           # Append-only event log (see Event Schema)
          dag.yaml               # Snapshot of the DAG YAML at run start
          dag_diff.patch         # Unified diff vs. the prior run (only if dag_hash changed)
          <step>.stdout.log      # Step stdout (includes output markers)
          <step>.stderr.log      # Step stderr
          <step>.inline.R        # Rendered inline R code (for r_expr steps)
          <step>.sessioninfo.json # R sessionInfo() — written only on R step failure
  proc/
    scheduler.pid                # Scheduler daemon PID file

.daggle/                         # Project-local DAG definitions (repo root)
  my-dag.yml
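
Because the run data follows a predictable runs/<dag>/<YYYY-MM-DD>/run_<id>/ pattern, it can be enumerated with ordinary file tools. The following is a minimal Python sketch, not part of daggle itself; the helper name list_runs is hypothetical, and the ordering of runs within a single day is only by directory name, since the format of <id> is not specified here.

```python
from pathlib import Path


def list_runs(data_dir: Path, dag: str) -> list[Path]:
    """Return the run directories of one DAG, sorted by date, then by name."""
    dag_root = data_dir / "runs" / dag
    if not dag_root.is_dir():
        return []
    # Layout from the tree above: runs/<dag>/<YYYY-MM-DD>/run_<id>/
    runs = [p for p in dag_root.glob("*/run_*") if p.is_dir()]
    # ISO dates sort lexicographically, so this orders runs by day.
    return sorted(runs, key=lambda p: (p.parent.name, p.name))
```

For example, list_runs(Path.home() / ".local/share/daggle", "nightly-etl") would return the run directories of the nightly-etl DAG under the default data directory.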

DAG discovery order

When resolving a DAG by name, daggle searches in this order:

  1. --dags-dir flag (if provided)
  2. DAGGLE_DAGS_DIR environment variable (if set)
  3. .daggle/ in the current working directory (project-local)
  4. ~/.config/daggle/dags/ (global)

The first match wins.
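
The search order above can be sketched in Python as follows. This is an illustration under stated assumptions, not daggle's actual code: the function name find_dag is hypothetical, a DAG name is assumed to map to a file called <name>.yml or <name>.yaml, the search is assumed to fall through to the next location when a directory does not contain the DAG, and the global directory is taken at its default location.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_dag(
    name: str,
    dags_dir_flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first DAG file matching `name`, or None if nothing matches."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    search_dirs = []
    if dags_dir_flag:                               # 1. --dags-dir flag
        search_dirs.append(Path(dags_dir_flag))
    if env.get("DAGGLE_DAGS_DIR"):                  # 2. environment variable
        search_dirs.append(Path(env["DAGGLE_DAGS_DIR"]))
    search_dirs.append(cwd / ".daggle")             # 3. project-local
    search_dirs.append(home / ".config" / "daggle" / "dags")  # 4. global

    for directory in search_dirs:
        for ext in (".yml", ".yaml"):               # assumed file extensions
            candidate = directory / f"{name}{ext}"
            if candidate.is_file():
                return candidate                    # the first match wins
    return None
```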

Run directory contents

Each run gets its own directory, runs/<dag>/<YYYY-MM-DD>/run_<id>/, inside the data directory. It contains the following files:

File                     Description
meta.json                Run metadata: DAG name, parameters, start/end timestamps, final status.
events.jsonl             Append-only event log. See Event Schema.
dag.yaml                 Copy of the DAG YAML taken at run start. Used by dag_diff.patch and by daggle archive to make runs self-describing and reproducible.
dag_diff.patch           Unified diff of this run’s dag.yaml against the previous run’s. Written only when the DAG hash changed between runs. Gives a self-contained “what changed?” record without requiring git.
<step>.stdout.log        Captured stdout for each step. Includes raw output markers.
<step>.stderr.log        Captured stderr for each step.
<step>.inline.R          Rendered R source for r_expr steps (useful for debugging).
<step>.sessioninfo.json  sessionInfo() snapshot written only when an R step fails. Contains r_version, platform, error_message, session_info (full text), and timestamp. Useful for compliance and post-mortem debugging: it proves which package versions were active at the moment of failure without re-running R.
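
Since <step>.sessioninfo.json exists only for failed R steps, scanning a run directory for these files gives a quick post-mortem summary. The sketch below is a hedged illustration: the helper name failed_r_steps is hypothetical, and it assumes each file holds a single JSON object whose top-level keys are the fields listed above.

```python
import json
from pathlib import Path

SUFFIX = ".sessioninfo.json"


def failed_r_steps(run_dir: Path) -> dict[str, dict]:
    """Map step name -> parsed session info for each failed R step in a run."""
    failures = {}
    for path in sorted(run_dir.glob(f"*{SUFFIX}")):
        step = path.name[: -len(SUFFIX)]  # strip the suffix to recover <step>
        with path.open(encoding="utf-8") as fh:
            failures[step] = json.load(fh)
    return failures
```

A caller could then print info.get("r_version") and info.get("error_message") for each entry; using .get keeps the loop tolerant of files that lack a field.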

Overriding directories

Mechanism              Config dir               Data dir               DAGs dir
CLI flags              (none)                   --data-dir             --dags-dir
Environment variables  DAGGLE_CONFIG_DIR        DAGGLE_DATA_DIR        DAGGLE_DAGS_DIR
XDG fallback           $XDG_CONFIG_HOME/daggle  $XDG_DATA_HOME/daggle  (none)
Default                ~/.config/daggle         ~/.local/share/daggle  (discovery order)

Priority runs top to bottom: CLI flags override environment variables, which override the XDG fallback, which overrides the defaults.
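
For the data directory, this precedence can be sketched in Python as shown below. It is a minimal sketch rather than daggle's implementation; the function name resolve_data_dir and its parameters are hypothetical, and unset or empty values are assumed to be skipped. The config directory would follow the same pattern with DAGGLE_CONFIG_DIR and XDG_CONFIG_HOME, minus the CLI flag step.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Pick the data directory: CLI flag, then env var, then XDG, then default."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if flag:                                    # --data-dir
        return Path(flag)
    if env.get("DAGGLE_DATA_DIR"):              # environment variable
        return Path(env["DAGGLE_DATA_DIR"])
    if env.get("XDG_DATA_HOME"):                # XDG fallback
        return Path(env["XDG_DATA_HOME"]) / "daggle"
    return home / ".local" / "share" / "daggle"  # default
```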

Global configuration (config.yaml)

The file ~/.config/daggle/config.yaml holds global settings. All fields are optional.

# Override tool paths (useful when the scheduler can't find binaries)
tools:
  rscript: /usr/local/bin/Rscript
  quarto: /opt/homebrew/bin/quarto
  git: /usr/bin/git

# Automatic cleanup of old run data
cleanup:
  older_than: "30d"
  interval: "6h"

# Named notification channels, referenced from hooks by name.
# See the Hooks page for usage.
notifications:
  team-slack:
    type: slack
    webhook_url: https://hooks.slack.com/services/...
  ops-email:
    type: smtp
    smtp_host: mail.example.com
    smtp_port: 587
    smtp_from: daggle@example.com
    smtp_to: [ops@example.com]
    smtp_user: daggle
    smtp_password: ${SMTP_PASSWORD}

Tool path resolution

At startup, daggle resolves each tool path using this precedence:

  1. tools: in config.yaml — explicit absolute path (highest priority)
  2. exec.LookPath — searches the current PATH
  3. Bare binary name — fallback (may fail if not on PATH)
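
The three steps above can be approximated in Python, with shutil.which standing in for Go's exec.LookPath. This is a sketch under assumptions: the function name resolve_tool, the BINARIES mapping from config keys to binary names, and the returned source labels are all hypothetical, and the step that writes a discovered path back to config.yaml is left out.

```python
import shutil
from typing import Mapping, Optional

# Assumed mapping from config keys under `tools:` to binary names.
BINARIES = {"rscript": "Rscript", "quarto": "quarto", "git": "git"}


def resolve_tool(
    tool: str,
    configured: Mapping[str, str],
    path: Optional[str] = None,
) -> tuple[str, str]:
    """Return (command, source), where source is 'config', 'path', or 'bare'."""
    binary = BINARIES.get(tool, tool)
    if configured.get(tool):                # 1. explicit path from config.yaml
        return configured[tool], "config"
    found = shutil.which(binary, path=path)  # 2. PATH lookup (path=None uses PATH)
    if found:
        return found, "path"
    return binary, "bare"                   # 3. bare name, may fail at exec time
```

Returning the source alongside the command makes it easy to decide when a path was found via PATH and is therefore a candidate for saving.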

Auto-detection: When daggle finds a tool through PATH lookup (step 2), it saves the resolved absolute path to config.yaml. Running any daggle command once from an interactive shell (e.g., daggle doctor or daggle run) is therefore enough to persist the tool paths. Later scheduler runs can then find the tools through the saved paths, even when the scheduler runs as a system service with a minimal PATH.

Run daggle doctor to see which paths daggle resolved.