File Layout

daggle uses a standard directory layout for configuration, DAG definitions, and run data. The default locations are shown below. They can be changed as described in Overriding directories.

Directory structure

~/.config/daggle/
  config.yaml                    # Global configuration (tool paths, cleanup, etc.)
  dags/                          # Global DAG definitions
    my-pipeline.yml
    nightly-etl.yml
  projects.yaml                  # Registered project directories

~/.local/share/daggle/
  runs/                          # Run data, organized by DAG and date
    <dag>/
      <YYYY-MM-DD>/
        run_<id>/
          meta.json              # Run metadata (params, start time, status)
          events.jsonl           # Append-only event log (see Event Schema)
          dag.yaml               # Snapshot of the DAG YAML at run start
          dag_diff.patch         # Unified diff vs. the prior run (only if dag_hash changed)
          <step>.stdout.log      # Step stdout (includes output markers)
          <step>.stderr.log      # Step stderr
          <step>.inline.R        # Rendered inline R code (for r_expr steps)
          <step>.sessioninfo.json # R sessionInfo() — written only on R step failure
  proc/
    scheduler.pid                # Scheduler daemon PID file

.daggle/                         # Project-local DAG definitions (repo root)
  my-dag.yml

DAG discovery order

When resolving a DAG by name, daggle searches in this order:

  1. --dags-dir flag (if provided)
  2. DAGGLE_DAGS_DIR environment variable (if set)
  3. .daggle/ in the current working directory (project-local)
  4. ~/.config/daggle/dags/ (global)

The first match wins.
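The lookup order can be sketched in Go as follows. This is a minimal illustration, not daggle's source. The function name findDAG is hypothetical. So is the assumption that a DAG named X is stored as X.yml or X.yaml. The getenv and exists callbacks are passed in so the sketch runs without touching the real environment or filesystem. In practice they would wrap os.Getenv and os.Stat.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// findDAG returns the first DAG file matching name, searching directories in
// the documented discovery order. Hypothetical helper: the <name>.yml /
// <name>.yaml naming rule is an assumption, and getenv/exists are injected so
// the sketch runs without touching the real environment or filesystem.
func findDAG(name, dagsDirFlag, cwd, configDir string,
	getenv func(string) string, exists func(string) bool) (string, bool) {
	var dirs []string
	if dagsDirFlag != "" {
		dirs = append(dirs, dagsDirFlag) // 1. --dags-dir flag
	}
	if env := getenv("DAGGLE_DAGS_DIR"); env != "" {
		dirs = append(dirs, env) // 2. DAGGLE_DAGS_DIR
	}
	dirs = append(dirs,
		filepath.Join(cwd, ".daggle"),    // 3. project-local
		filepath.Join(configDir, "dags"), // 4. global
	)
	for _, dir := range dirs {
		for _, ext := range []string{".yml", ".yaml"} {
			p := filepath.Join(dir, name+ext)
			if exists(p) {
				return p, true // first match wins
			}
		}
	}
	return "", false
}

func main() {
	// Simulated files: the same DAG exists project-locally and globally.
	files := map[string]bool{
		"/repo/.daggle/nightly-etl.yml":               true,
		"/home/u/.config/daggle/dags/nightly-etl.yml": true,
	}
	exists := func(p string) bool { return files[p] }
	noEnv := func(string) string { return "" }

	fmt.Println(findDAG("nightly-etl", "", "/repo", "/home/u/.config/daggle", noEnv, exists))
	// /repo/.daggle/nightly-etl.yml true
}
```

The project-local copy shadows the global one because it sits earlier in the search list.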

Run directory contents

Each run gets its own directory under runs/<dag>/<date>/run_<id>/.

File                       Description
meta.json                  Run metadata: DAG name, parameters, start/end timestamps, final status.
events.jsonl               Append-only event log. See Event Schema.
dag.yaml                   Copy of the DAG YAML taken at run start. Used to produce dag_diff.patch and by daggle archive, so that each run is self-describing and reproducible.
dag_diff.patch             Unified diff of this run’s dag.yaml against the previous run’s. Written only when the DAG hash changed between runs. Gives a self-contained “what changed?” record without requiring git.
<step>.stdout.log          Captured stdout for each step, including raw output markers.
<step>.stderr.log          Captured stderr for each step.
<step>.inline.R            Rendered R source for r_expr steps (useful for debugging).
<step>.sessioninfo.json    sessionInfo() snapshot, written only when an R step fails. Contains r_version, platform, error_message, session_info (full text), and timestamp. Useful for compliance and post-mortem debugging: it records which package versions were active at the moment of failure, without re-running R.

Overriding directories

Mechanism               Config dir                Data dir                DAGs dir
CLI flags               (none)                    --data-dir              --dags-dir
Environment variables   DAGGLE_CONFIG_DIR         DAGGLE_DATA_DIR         DAGGLE_DAGS_DIR
XDG fallback            $XDG_CONFIG_HOME/daggle   $XDG_DATA_HOME/daggle   (none)
Default                 ~/.config/daggle          ~/.local/share/daggle   (discovery order)

Priority runs from top to bottom. CLI flags override environment variables. Environment variables override the XDG locations, and the XDG locations override the built-in defaults.
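For the data directory, that precedence amounts to a chain of fallbacks, sketched in Go below. resolveDataDir is a hypothetical name, and this is not daggle's implementation. getenv and the home directory are passed in for testability. In real use they would come from os.Getenv and os.UserHomeDir. The config directory would follow the same pattern with DAGGLE_CONFIG_DIR and XDG_CONFIG_HOME, minus the flag step.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveDataDir applies the documented precedence for the data directory:
// --data-dir flag, then DAGGLE_DATA_DIR, then $XDG_DATA_HOME/daggle,
// then ~/.local/share/daggle.
func resolveDataDir(flag string, getenv func(string) string, home string) string {
	if flag != "" {
		return flag
	}
	if v := getenv("DAGGLE_DATA_DIR"); v != "" {
		return v
	}
	if xdg := getenv("XDG_DATA_HOME"); xdg != "" {
		return filepath.Join(xdg, "daggle")
	}
	return filepath.Join(home, ".local", "share", "daggle")
}

func main() {
	env := map[string]string{"XDG_DATA_HOME": "/srv/xdg"}
	getenv := func(k string) string { return env[k] }
	fmt.Println(resolveDataDir("", getenv, "/home/u"))       // /srv/xdg/daggle
	fmt.Println(resolveDataDir("/tmp/d", getenv, "/home/u")) // /tmp/d
}
```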

Global configuration (config.yaml)

The file ~/.config/daggle/config.yaml holds global settings. All fields are optional.

# Override tool paths (useful when the scheduler can't find binaries)
tools:
  rscript: /usr/local/bin/Rscript
  quarto: /opt/homebrew/bin/quarto
  git: /usr/bin/git

# Scheduler behaviour (all optional)
scheduler:
  poll_interval: "30s"      # how often to scan DAG files AND fire due cron
                            # entries. Lower this if you have sub-30s cron
                            # schedules (e.g. "500ms" for @every 1s).
  max_concurrent: 4         # cap on concurrent DAG runs across the daemon
  watch_debounce: "500ms"   # quiet window for file-watcher triggers
  max_catchup_runs: 100     # cap on `catchup: all` fires per startup

# Automatic cleanup of old run data
cleanup:
  older_than: "30d"
  interval: "6h"

# Named notification channels, referenced from hooks by name.
# See the Hooks page for usage.
notifications:
  team-slack:
    type: slack
    webhook_url: https://hooks.slack.com/services/...
  ops-email:
    type: smtp
    smtp_host: mail.example.com
    smtp_port: 587
    smtp_from: daggle@example.com
    smtp_to: [ops@example.com]
    smtp_user: daggle
    smtp_password: ${SMTP_PASSWORD}

Tool path resolution

At startup, daggle resolves each tool path using this precedence:

  1. tools: in config.yaml — explicit absolute path (highest priority)
  2. exec.LookPath — searches the current PATH
  3. Bare binary name — fallback (may fail if not on PATH)

Auto-detection: when daggle finds a tool via PATH lookup (step 2), it saves the resolved absolute path to config.yaml. As a result, running any daggle command once from an interactive shell (e.g., daggle doctor or daggle run) persists the tool paths. Later scheduler runs then find the saved paths in config, even when running as a system service with a minimal PATH.

Nested tool lookups (PATH injection): resolved tool paths only cover the binaries that daggle invokes directly. Some of those tools spawn further subprocesses through their own PATH lookup. Most notably, Quarto searches PATH for Rscript to render R chunks. To keep such lookups working under a minimal daemon PATH, daggle prepends the directories of all resolved tools to the PATH of every step, hook, and deadline-hook subprocess. So once Rscript is resolved (in config or via auto-detection), a Quarto step finds it even when the scheduler was launched by launchd, systemd, or cron. See Scheduler troubleshooting.

Run daggle doctor to see which paths daggle resolved.