Blockyard provides four observability mechanisms: structured logging, Prometheus metrics, OpenTelemetry tracing, and an append-only audit log.

Logging

Blockyard writes structured JSON logs to stderr. Control verbosity with the log_level setting:

[server]
log_level = "info"   # trace, debug, info, warn, error

Or via environment variable:

BLOCKYARD_SERVER_LOG_LEVEL=debug

Log levels

| Level | Use case |
| --- | --- |
| trace | Fine-grained diagnostics (WebSocket frames, load-balancer decisions). Noisy. |
| debug | Subsystem internals (health checks, session lifecycle, container operations) |
| info | Normal operations (startup, requests, deploys). Default. |
| warn | Degraded conditions (failed health checks, capacity limits) |
| error | Failures requiring attention (container crashes, build failures) |
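
As a rough illustration of how the log_level threshold is usually read, the sketch below assumes conventional severity filtering: a configured level emits itself and every more severe level. The source lists the levels but does not spell out this rule, and the function name is hypothetical.

```python
# Severity order taken from the log level list, least to most severe.
LEVELS = ["trace", "debug", "info", "warn", "error"]


def is_emitted(configured_level: str, message_level: str) -> bool:
    """Return True if a message passes the configured threshold.

    Assumption: standard threshold semantics, where a message is emitted
    when it is at least as severe as the configured level.
    """
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


if __name__ == "__main__":
    # With the default level, debug output is suppressed and warnings pass.
    print(is_emitted("info", "debug"))
    print(is_emitted("info", "warn"))
```

Under this reading, lowering the level to debug or trace only adds output; it never hides warn or error entries.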

HTTP request logging

All HTTP requests are logged automatically:

  • Health probes (/healthz, /readyz) are logged at debug level to avoid noise in production
  • Other requests are logged at info (2xx/3xx), warn (4xx), or error (5xx) based on the response status code

Each log entry includes method, path, status, and duration_ms.
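
The level selection described above can be sketched as a small function. This is an illustration of the stated rules, not Blockyard's code. The function name is hypothetical, and sending statuses below 200 to info is an assumption, since the source only covers 2xx through 5xx.

```python
# Paths the source identifies as health probes.
HEALTH_PROBE_PATHS = {"/healthz", "/readyz"}


def request_log_level(path: str, status: int) -> str:
    """Pick the log level for one HTTP request from its path and status."""
    if path in HEALTH_PROBE_PATHS:
        # Health probes are logged at debug regardless of status.
        return "debug"
    if status >= 500:
        return "error"
    if status >= 400:
        return "warn"
    # 2xx/3xx per the source; anything lower also lands here (assumption).
    return "info"


if __name__ == "__main__":
    for path, status in [("/healthz", 200), ("/app/demo/", 200),
                         ("/api/v1/apps", 404), ("/app/demo/", 502)]:
        print(path, status, request_log_level(path, status))
```

One practical consequence: at the default info level, health probe entries do not appear in the log, so a missing /healthz line is not by itself a sign that probes are failing.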

Management listener

By default, operational endpoints (/healthz, /readyz, /metrics) are served on the main listener alongside the application proxy and API. This means containers running untrusted Shiny app code can reach these endpoints.

For production deployments, configure a separate management listener bound to a loopback address:

[server]
bind            = "0.0.0.0:3838"
management_bind = "127.0.0.1:9100"

When management_bind is set:

  • /healthz, /readyz, and /metrics move to the management listener and are removed from the main listener
  • The management listener requires no authentication; access is controlled by the network binding (a loopback address is reachable only from the host)
  • /readyz always returns full per-component check details
  • Prometheus can scrape /metrics without a bearer token
  • Containers on bridge networks cannot reach the host's 127.0.0.1, so untrusted workloads cannot access operational data

When AppRole auth is used (openbao.role_id), /readyz also reports a vault_token check that reflects whether the token renewal goroutine is healthy. A stale or expired token degrades readiness, signaling the operator to re-bootstrap with a fresh secret_id.

Point your health checks and Prometheus scraper at the management port:

# prometheus.yml
scrape_configs:
  - job_name: blockyard
    static_configs:
      - targets: ['localhost:9100']

On shutdown, the management listener stops first (health probes fail, signaling load balancers to drain traffic), then the main listener is shut down gracefully.

Prometheus metrics

Enable the /metrics endpoint in your configuration:

[telemetry]
metrics_enabled = true

When served on the main listener (no management_bind), the endpoint requires authentication (bearer token or session cookie). When served on the management listener, no authentication is required.
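
The listener and authentication rules for /metrics can be summarized in a short sketch. It takes the [server] table as an already-parsed dictionary and assumes metrics_enabled = true. The MetricsAccess type and the function name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricsAccess:
    address: str         # listener that serves /metrics
    auth_required: bool  # whether a bearer token or session cookie is needed


def metrics_access(server_cfg: dict) -> MetricsAccess:
    """Work out where /metrics is served and whether it needs credentials."""
    management_bind = server_cfg.get("management_bind")
    if management_bind:
        # Operational endpoints move to the management listener, which
        # relies on the network binding instead of authentication.
        return MetricsAccess(address=management_bind, auth_required=False)
    # Without management_bind the endpoint stays on the main listener.
    return MetricsAccess(address=server_cfg["bind"], auth_required=True)


if __name__ == "__main__":
    print(metrics_access({"bind": "0.0.0.0:3838"}))
    print(metrics_access({"bind": "0.0.0.0:3838",
                          "management_bind": "127.0.0.1:9100"}))
```

A scrape job therefore needs credentials only in the first configuration.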

Available metrics

Gauges:

| Metric | Description |
| --- | --- |
| blockyard_workers_active | Number of currently running worker containers |
| blockyard_sessions_active | Number of active proxy sessions |

Counters:

| Metric | Description |
| --- | --- |
| blockyard_workers_spawned_total | Total workers spawned |
| blockyard_workers_stopped_total | Total workers stopped |
| blockyard_bundles_uploaded_total | Total bundles uploaded |
| blockyard_bundle_restores_succeeded_total | Successful dependency restores |
| blockyard_bundle_restores_failed_total | Failed dependency restores |
| blockyard_proxy_requests_total | Total requests through the reverse proxy |
| blockyard_health_checks_failed_total | Total failed worker health checks |
| blockyard_audit_entries_dropped_total | Audit log entries dropped due to buffer overflow |

Histograms:

| Metric | Description |
| --- | --- |
| blockyard_cold_start_seconds | Time from worker spawn to healthy (buckets: 0.5s–64s) |
| blockyard_proxy_request_seconds | Proxy request duration, excluding cold start |
| blockyard_build_seconds | Bundle dependency restore duration (buckets: 5s–640s) |
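
The source gives only the first and last bucket bounds for two of the histograms. Both ranges happen to fit eight bounds that double at each step, so the sketch below uses that layout to show how an observation maps to a Prometheus bucket. The doubling layout is an assumption, and both function names are hypothetical.

```python
import bisect


def doubling_buckets(start: float, count: int) -> list[float]:
    """Upper bounds that double at each step (assumed layout)."""
    return [start * 2**i for i in range(count)]


def bucket_upper_bound(value: float, bounds: list[float]) -> float:
    """Smallest bound with value <= bound, or +Inf past the last bound."""
    index = bisect.bisect_left(bounds, value)
    return bounds[index] if index < len(bounds) else float("inf")


# Assumed bounds: 0.5, 1, 2, ..., 64 and 5, 10, 20, ..., 640.
COLD_START_BUCKETS = doubling_buckets(0.5, 8)
BUILD_BUCKETS = doubling_buckets(5.0, 8)

if __name__ == "__main__":
    print(COLD_START_BUCKETS)
    print(bucket_upper_bound(3.2, COLD_START_BUCKETS))
```

Whatever the real layout, observations above the last bound are counted only in the implicit +Inf bucket, so a cold start slower than 64 seconds cannot be resolved further from the histogram alone.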

OpenTelemetry tracing

Send distributed traces to an OpenTelemetry collector:

[telemetry]
otlp_endpoint = "http://otel-collector:4317"

The service name is blockyard. Spans include http.method, http.route, and http.status_code attributes. Endpoints using http://, localhost, or 127.0.0.1 connect without TLS; all others use TLS.
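
The TLS rule for otlp_endpoint can be read as the following sketch. It is one interpretation of the sentence above rather than Blockyard's parsing logic. The function name is hypothetical, and IPv6 literals are not handled.

```python
# Hosts the source names as connecting without TLS.
PLAINTEXT_HOSTS = {"localhost", "127.0.0.1"}


def uses_tls(endpoint: str) -> bool:
    """Return True if the endpoint would be contacted over TLS."""
    if endpoint.startswith("http://"):
        return False
    # Drop an optional scheme, then any path, then an optional port.
    without_scheme = endpoint.split("://", 1)[-1]
    authority = without_scheme.split("/", 1)[0]
    host = authority.rsplit(":", 1)[0] if ":" in authority else authority
    return host not in PLAINTEXT_HOSTS


if __name__ == "__main__":
    for endpoint in ["http://otel-collector:4317", "localhost:4317",
                     "https://otel.example.com:4317"]:
        print(endpoint, uses_tls(endpoint))
```

Under this reading, a remote collector given without an http:// prefix must present a certificate the server trusts, while the example configuration above connects in plaintext.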

Security headers

All HTTP responses include the following security headers:

| Header | Value |
| --- | --- |
| X-Content-Type-Options | nosniff |
| X-Frame-Options | DENY |
| Referrer-Policy | strict-origin-when-cross-origin |
| Strict-Transport-Security | max-age=63072000; includeSubDomains (HTTPS only) |

API endpoints additionally set Content-Security-Policy: default-src 'none'; frame-ancestors 'none' and Cache-Control: no-store.
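
When verifying a deployment behind a reverse proxy, it can help to check that these headers survive end to end. The sketch below compares a captured header dictionary with the values listed for all responses. The function name is hypothetical, and the API-only headers are left out for brevity.

```python
# Header values listed in the source for all responses.
EXPECTED_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}
# Sent only over HTTPS according to the source.
HSTS_NAME = "Strict-Transport-Security"
HSTS_VALUE = "max-age=63072000; includeSubDomains"


def missing_security_headers(headers: dict, https: bool) -> list:
    """Return expected header names that are absent or have another value."""
    expected = dict(EXPECTED_HEADERS)
    if https:
        expected[HSTS_NAME] = HSTS_VALUE
    # HTTP header names are case-insensitive, so compare in lowercase.
    actual = {name.lower(): value for name, value in headers.items()}
    return sorted(name for name, value in expected.items()
                  if actual.get(name.lower()) != value)


if __name__ == "__main__":
    captured = {"x-content-type-options": "nosniff",
                "referrer-policy": "strict-origin-when-cross-origin"}
    print(missing_security_headers(captured, https=False))
```

An empty list means every expected header arrived with the documented value.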

Audit logging

Enable append-only audit logging to a JSONL file:

[audit]
path = "/data/audit/blockyard.jsonl"

Each line is a JSON object with the following fields:

| Field | Description |
| --- | --- |
| ts | Timestamp (RFC 3339 with nanoseconds) |
| action | Event type (see below) |
| actor | OIDC sub of the user who triggered the action |
| target | Resource ID (app ID, user sub, etc.), if applicable |
| detail | Additional context (map of key-value pairs), if applicable |
| source_ip | Caller’s IP address, if applicable |
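
Because each line is a standalone JSON object, the file can be processed one line at a time. The sketch below counts entries per action. The sample lines are invented for illustration, and treating "if applicable" fields as simply absent is an assumption, since they could also be written as null.

```python
import json
from collections import Counter

# Hypothetical sample entries; the values are made up for illustration.
SAMPLE = """\
{"ts": "2024-01-01T12:00:00.123456789Z", "action": "app.create", "actor": "user-1", "target": "app-1"}
{"ts": "2024-01-01T12:00:05.000000001Z", "action": "user.login", "actor": "user-2", "source_ip": "203.0.113.7"}
{"ts": "2024-01-01T12:01:00.500000000Z", "action": "app.create", "actor": "user-1", "target": "app-2"}
"""


def count_actions(lines) -> Counter:
    """Count audit entries per action, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry["action"]] += 1
    return counts


if __name__ == "__main__":
    # With a real file: count_actions(open("/data/audit/blockyard.jsonl"))
    print(count_actions(SAMPLE.splitlines()))
```

Optional fields such as target, detail, and source_ip are best read with entry.get(...) so that entries without them do not raise an error.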

Audit actions

| Action | Trigger |
| --- | --- |
| app.create | App created |
| app.update | App settings changed |
| app.delete | App deleted |
| app.start | Worker started for an app |
| app.stop | App workers stopped |
| app.rollback | App rolled back to a previous bundle |
| app.restore | Soft-deleted app restored |
| app.rename | App renamed (old name becomes an alias) |
| bundle.upload | Bundle uploaded |
| bundle.restore.success | Dependency restore completed |
| bundle.restore.fail | Dependency restore failed |
| access.grant | Per-app access granted to a user |
| access.revoke | Per-app access revoked |
| credential.enroll | User enrolled a credential in OpenBao |
| user.login | User logged in via OIDC |
| user.logout | User logged out |
| user.update | User role or active status changed |
| token.create | Personal Access Token created |
| token.revoke | Single PAT revoked |
| token.revoke_all | All PATs revoked for a user |

Buffering

Audit entries are buffered in memory (up to 1000 entries) and flushed to disk by a background writer. If the buffer is full, new entries wait up to 500 ms for space before being dropped. Dropped entries increment the blockyard_audit_entries_dropped_total metric. Under normal load, entries are written within milliseconds.
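
The buffering behaviour can be modelled with a bounded queue and a timed enqueue. The Python sketch below is an analogy for the described semantics, not Blockyard's implementation. The class and method names are hypothetical, and the defaults mirror the 1000-entry capacity and 500 ms wait stated above.

```python
import queue


class AuditBuffer:
    """Bounded in-memory buffer that drops entries after a timed wait."""

    def __init__(self, capacity: int = 1000, wait_seconds: float = 0.5):
        self._queue = queue.Queue(maxsize=capacity)
        self._wait_seconds = wait_seconds
        # Stands in for blockyard_audit_entries_dropped_total. A plain int
        # is enough for a sketch; concurrent producers would need a lock.
        self.dropped_total = 0

    def submit(self, entry: dict) -> bool:
        """Enqueue an entry, waiting briefly for space; False if dropped."""
        try:
            self._queue.put(entry, timeout=self._wait_seconds)
            return True
        except queue.Full:
            self.dropped_total += 1
            return False

    def drain(self) -> list:
        """Remove and return everything queued, as a background writer would."""
        entries = []
        while True:
            try:
                entries.append(self._queue.get_nowait())
            except queue.Empty:
                return entries


if __name__ == "__main__":
    buf = AuditBuffer(capacity=2, wait_seconds=0.01)
    results = [buf.submit({"action": "user.login"}) for _ in range(3)]
    print(results, buf.dropped_total, len(buf.drain()))
```

The design trades completeness for availability: a stalled disk delays each request by at most the wait time instead of blocking it indefinitely, which is why the dropped-entries counter is worth alerting on.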