File Layout
daggle uses a fixed directory structure for configuration, DAG definitions, and run data.
Directory structure
~/.config/daggle/
  config.yaml                      # Global configuration (tool paths, cleanup, etc.)
  dags/                            # Global DAG definitions
    my-pipeline.yml
    nightly-etl.yml
  projects.yaml                    # Registered project directories

~/.local/share/daggle/
  runs/                            # Run data, organized by DAG and date
    <dag>/
      <YYYY-MM-DD>/
        run_<id>/
          meta.json                # Run metadata (params, start time, status)
          events.jsonl             # Append-only event log (see Event Schema)
          dag.yaml                 # Snapshot of the DAG YAML at run start
          dag_diff.patch           # Unified diff vs. the prior run (only if dag_hash changed)
          <step>.stdout.log        # Step stdout (includes output markers)
          <step>.stderr.log        # Step stderr
          <step>.inline.R          # Rendered inline R code (for r_expr steps)
          <step>.sessioninfo.json  # R sessionInfo(), written only on R step failure
  proc/
    scheduler.pid                  # Scheduler daemon PID file

.daggle/                           # Project-local DAG definitions (repo root)
  my-dag.yml
DAG discovery order
When resolving a DAG by name, daggle searches in this order:
1. --dags-dir flag (if provided)
2. DAGGLE_DAGS_DIR environment variable (if set)
3. .daggle/ in the current working directory (project-local)
4. ~/.config/daggle/dags/ (global)
The first match wins.
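The lookup above can be sketched in a few lines of Python. This is an illustration of the documented order, not daggle's actual implementation: the function names `candidate_dirs` and `find_dag` are hypothetical, and the assumption that a DAG named `x` is stored as `x.yml` or `x.yaml` is inferred from the example file names on this page.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_dirs(dags_dir_flag: Optional[str], env: Mapping[str, str],
                   cwd: Path, home: Path) -> List[Path]:
    """Return the directories to search, in the documented order."""
    dirs: List[Path] = []
    if dags_dir_flag:                                   # 1. --dags-dir flag
        dirs.append(Path(dags_dir_flag))
    if env.get("DAGGLE_DAGS_DIR"):                      # 2. environment variable
        dirs.append(Path(env["DAGGLE_DAGS_DIR"]))
    dirs.append(cwd / ".daggle")                        # 3. project-local
    dirs.append(home / ".config" / "daggle" / "dags")   # 4. global
    return dirs


def find_dag(name: str, dirs: List[Path]) -> Optional[Path]:
    """Return the first matching DAG file, or None if no directory has one."""
    for directory in dirs:
        # Assumption: DAG files use a .yml or .yaml extension.
        for ext in (".yml", ".yaml"):
            path = directory / f"{name}{ext}"
            if path.is_file():
                return path  # the first match wins
    return None
```

With this ordering, a DAG in .daggle/ shadows a global DAG of the same name unless the flag or the environment variable points elsewhere.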
Run directory contents
Each run gets its own directory under runs/<dag>/<date>/run_<id>/.
| File | Description |
|---|---|
| meta.json | Run metadata: DAG name, parameters, start/end timestamps, final status. |
| events.jsonl | Append-only event log. See Event Schema. |
| dag.yaml | Copy of the DAG YAML taken at run start. Used by dag_diff.patch and by daggle archive to make runs self-describing and reproducible. |
| dag_diff.patch | Unified diff of this run’s dag.yaml against the previous run’s. Written only when the DAG hash changed between runs. Gives a self-contained “what changed?” record without requiring git. |
| <step>.stdout.log | Captured stdout for each step. Includes raw output markers. |
| <step>.stderr.log | Captured stderr for each step. |
| <step>.inline.R | Rendered R source for r_expr steps (useful for debugging). |
| <step>.sessioninfo.json | sessionInfo() snapshot written only when an R step fails. Contains r_version, platform, error_message, session_info (full text), and timestamp. Useful for compliance and post-mortem debugging: it proves which package versions were active at the moment of failure without re-running R. |
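Because every run directory is self-contained, it can be inspected with ordinary file tools. The sketch below is one possible way to read a run from Python. The helper names `load_run` and `step_names` are hypothetical, and the code deliberately avoids assuming any particular keys inside meta.json or events.jsonl, since their schemas are documented elsewhere.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_run(run_dir: Path) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Load meta.json and every event from events.jsonl in a run directory."""
    meta = json.loads((run_dir / "meta.json").read_text())
    events: List[Dict[str, Any]] = []
    events_path = run_dir / "events.jsonl"
    if events_path.exists():
        with events_path.open() as fh:
            for line in fh:
                line = line.strip()
                if line:  # JSON Lines: one JSON object per non-empty line
                    events.append(json.loads(line))
    return meta, events


def step_names(run_dir: Path) -> List[str]:
    """Derive step names from the <step>.stdout.log files in the run."""
    suffix = ".stdout.log"
    return sorted(p.name[: -len(suffix)] for p in run_dir.glob("*" + suffix))
```

Deriving step names from the log files is only a convenience for ad hoc inspection; the event log remains the authoritative record of what ran.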
Overriding directories
| Mechanism | Config dir | Data dir | DAGs dir |
|---|---|---|---|
| CLI flags | – | --data-dir | --dags-dir |
| Environment variables | DAGGLE_CONFIG_DIR | DAGGLE_DATA_DIR | DAGGLE_DAGS_DIR |
| XDG fallback | $XDG_CONFIG_HOME/daggle | $XDG_DATA_HOME/daggle | – |
| Default | ~/.config/daggle | ~/.local/share/daggle | (discovery order) |
Priority runs top to bottom: CLI flags override environment variables, which override the XDG fallback, which overrides the defaults.
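As a rough sketch of that precedence, the following Python function resolves one directory from the four sources in the table. The function name `resolve_dir` and its parameters are hypothetical; only the flag, variable, and path names come from the table above.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_dir(flag: Optional[str], env: Mapping[str, str], env_var: str,
                xdg_var: Optional[str], default: Path) -> Path:
    """Pick a directory: CLI flag, then env var, then XDG, then default."""
    if flag:                              # CLI flag (not available for every dir)
        return Path(flag)
    if env.get(env_var):                  # DAGGLE_* environment variable
        return Path(env[env_var])
    if xdg_var and env.get(xdg_var):      # XDG base directory plus "daggle"
        return Path(env[xdg_var]) / "daggle"
    return default                        # built-in default
```

For the data directory this would be called as `resolve_dir(data_dir_flag, os.environ, "DAGGLE_DATA_DIR", "XDG_DATA_HOME", Path.home() / ".local/share/daggle")`. The config directory has no CLI flag, so `flag` would be `None` there.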
Global configuration (config.yaml)
The file ~/.config/daggle/config.yaml holds global settings. All fields are optional.
# Override tool paths (useful when the scheduler can't find binaries)
tools:
  rscript: /usr/local/bin/Rscript
  quarto: /opt/homebrew/bin/quarto
  git: /usr/bin/git

# Scheduler behaviour (all optional)
scheduler:
  poll_interval: "30s"      # how often to scan DAG files AND fire due cron
                            # entries. Lower this if you have sub-30s cron
                            # schedules (e.g. "500ms" for @every 1s).
  max_concurrent: 4         # cap on concurrent DAG runs across the daemon
  watch_debounce: "500ms"   # quiet window for file-watcher triggers
  max_catchup_runs: 100     # cap on `catchup: all` fires per startup

# Automatic cleanup of old run data
cleanup:
  older_than: "30d"
  interval: "6h"

# Named notification channels, referenced from hooks by name.
# See the Hooks page for usage.
notifications:
  team-slack:
    type: slack
    webhook_url: https://hooks.slack.com/services/...
  ops-email:
    type: smtp
    smtp_host: mail.example.com
    smtp_port: 587
    smtp_from: daggle@example.com
    smtp_to: [ops@example.com]
    smtp_user: daggle
    smtp_password: ${SMTP_PASSWORD}

Tool path resolution
At startup, daggle resolves each tool path using this precedence:
1. tools: in config.yaml: explicit absolute path (highest priority)
2. exec.LookPath: searches the current PATH
3. Bare binary name: fallback (may fail if not on PATH)
Auto-detection: When daggle discovers a tool via PATH lookup (step 2), it automatically saves the resolved absolute path to config.yaml. This means running any daggle command once from an interactive shell (e.g., daggle doctor or daggle run) will persist the tool paths. Future scheduler runs — even as a system service with a minimal PATH — will find the saved paths in config.
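The three-step precedence can be illustrated with a short Python sketch. It is an approximation only: `shutil.which` stands in for Go's `exec.LookPath`, the function name `resolve_tool` and the returned source labels are hypothetical, and the step that saves a discovered path back to config.yaml is left out.

```python
import shutil
from typing import Mapping, Optional, Tuple


def resolve_tool(name: str, configured: Mapping[str, str],
                 path: Optional[str] = None) -> Tuple[str, str]:
    """Resolve a tool and report which step of the precedence supplied it."""
    explicit = configured.get(name)
    if explicit:
        return explicit, "config"        # 1. tools: entry in config.yaml
    found = shutil.which(name, path=path)
    if found:
        return found, "path"             # 2. PATH lookup (candidate for saving)
    return name, "fallback"              # 3. bare name; may fail when executed
```

A result labelled `"path"` corresponds to the case where daggle would write the absolute path back to config.yaml, so that a later run under a minimal PATH takes the first branch instead.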
Nested tool lookups (PATH injection): Resolved tool paths only cover the binaries daggle invokes directly. Some of those tools spawn further subprocesses via their own PATH lookup — most notably Quarto, which searches PATH for Rscript to render R chunks. To keep these working under a minimal daemon PATH, daggle prepends the directories of all resolved tools onto the PATH of every step, hook, and deadline-hook subprocess. So once Rscript is resolved (in config or via auto-detection), a Quarto step finds it even when the scheduler was launched by launchd/systemd/cron. See Scheduler troubleshooting.
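A minimal sketch of the PATH injection idea follows, assuming POSIX-style paths. The function name `inject_tool_dirs` is hypothetical, and dropping duplicate entries is a choice made for this example rather than documented daggle behaviour.

```python
import os
from typing import List, Mapping


def inject_tool_dirs(resolved_tools: Mapping[str, str], base_path: str) -> str:
    """Prepend the directory of each resolved tool to a PATH string."""
    dirs: List[str] = []
    for tool_path in resolved_tools.values():
        directory = os.path.dirname(tool_path)
        # Bare binary names have no directory component; skip them.
        if directory and directory not in dirs:
            dirs.append(directory)
    existing = [p for p in base_path.split(os.pathsep) if p]
    # Assumption: keep each directory once, with tool directories first.
    return os.pathsep.join(dirs + [p for p in existing if p not in dirs])
```

With the tools from the config example above and a daemon PATH of `/usr/bin:/bin`, this yields `/usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin`, so a Quarto subprocess would find Rscript in `/usr/local/bin` even though the daemon's own PATH never included it.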
Run daggle doctor to see which paths daggle resolved.