
OpenTelemetry Architecture

Overview

Gas Town uses OpenTelemetry (OTel) for structured observability of all agent operations. Telemetry (metrics and logs) is emitted via standard OTLP over HTTP to any compatible backend.

Backend-agnostic design: The system emits standard OpenTelemetry Protocol (OTLP) — any OTLP v1.x+ compatible backend can consume it. You are not obligated to use VictoriaMetrics/VictoriaLogs; these are simply development defaults.

Best-effort design: Telemetry initialization errors are returned but do not affect normal GT operation. The system remains functional even when telemetry is unavailable.


Quick Setup

Set at least one endpoint variable to activate telemetry — both endpoints unset means telemetry is completely disabled (no instrumentation code runs):

# Full local setup (recommended)
export GT_OTEL_METRICS_URL=http://localhost:8428/opentelemetry/api/v1/push
export GT_OTEL_LOGS_URL=http://localhost:9428/insert/opentelemetry/v1/logs

# Opt-in features
export GT_LOG_BD_OUTPUT=true # Include bd stdout/stderr in bd.call records
export GT_LOG_AGENT_OUTPUT=true # Stream Claude conversation turns to logs (PR #2199)

Local backends (Docker):

docker run -d -p 8428:8428 victoriametrics/victoria-metrics
docker run -d -p 9428:9428 victoriametrics/victoria-logs

Verify: gt prime should emit a prime event visible at http://localhost:9428/select/vmui.
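
The activation rule in this section (at least one endpoint variable set) can be sketched in Go. This is a minimal illustration, not the code in internal/telemetry: the helper name telemetryEnabled and the injected getenv parameter are assumptions made so the rule can be exercised in isolation. Only the two variable names come from this page.

```go
package main

import (
	"fmt"
	"os"
)

// telemetryEnabled reports whether at least one OTLP endpoint variable is set.
// Hypothetical helper: it mirrors the documented rule, not the real Init code.
func telemetryEnabled(getenv func(string) string) bool {
	return getenv("GT_OTEL_METRICS_URL") != "" || getenv("GT_OTEL_LOGS_URL") != ""
}

func main() {
	if telemetryEnabled(os.Getenv) {
		fmt.Println("telemetry: enabled")
	} else {
		fmt.Println("telemetry: disabled (both endpoints unset)")
	}
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the check easy to test without mutating the process environment.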


Implementation Status

Core Telemetry (Main ✅)

| Feature | Status | Notes |
| --- | --- | --- |
| Core OTel initialization | ✅ Main | telemetry.Init(), providers setup |
| Metrics export (counters) | ✅ Main | 18 metric instruments |
| Metrics export (histograms) | ✅ Main | gastown.bd.duration_ms histogram |
| Logs export (any OTLP backend) | ✅ Main | OTLP logs exporter |
| Subprocess correlation | ✅ Main | OTEL_RESOURCE_ATTRIBUTES via SetProcessOTELAttrs() |

Session Lifecycle (Main ✅)

| Feature | Status | Notes |
| --- | --- | --- |
| Session lifecycle | ✅ Main | session.start/session.stop events (tmux lifecycle) |
| Agent instantiation | ❌ Roadmap | agent.instantiate event — no RecordAgentInstantiate function exists |

Workflow & Work Events (Main ✅)

| Feature | Status | Notes |
| --- | --- | --- |
| Prompt/nudge telemetry | ✅ Main | prompt.send and nudge events |
| BD operation telemetry | ✅ Main | bd.call events (stdout/stderr opt-in via GT_LOG_BD_OUTPUT=true) |
| Mail telemetry | ✅ Main | mail operations (operation + status only; no message payload) |
| Sling/done telemetry | ✅ Main | sling and done events |
| GT prime telemetry | ✅ Main | prime + prime.context events |
| Work context in prime | 🔲 PR #2199 | work_rig, work_bead, work_mol on prime events |

Agent Lifecycle (Main ✅)

| Feature | Status | Notes |
| --- | --- | --- |
| Polecat lifecycle telemetry | ✅ Main | polecat.spawn/polecat.remove |
| Agent state telemetry | ✅ Main | agent.state_change events |
| Daemon restart telemetry | ✅ Main | daemon.restart events |
| Polecat spawn metric | ✅ Main | gastown.polecat.spawns.total |

Molecule Lifecycle

| Feature | Status | Notes |
| --- | --- | --- |
| Molecule lifecycle telemetry | ❌ Roadmap | mol.cook/mol.wisp/mol.squash/mol.burn — no RecordMol* functions exist |
| Bead creation telemetry | ❌ Roadmap | bead.create — no RecordBeadCreate function exists |
| Formula instantiation telemetry | ✅ Main | formula.instantiate |
| Convoy telemetry | ✅ Main | convoy.create events |

Agent Events (PR #2199)

| Feature | Status | Notes |
| --- | --- | --- |
| Agent conversation events | 🔲 PR #2199 | agent.event per conversation turn (text/tool_use/tool_result/thinking) |
| Token usage tracking | 🔲 PR #2199 | agent.usage per assistant turn (input/output/cache_read/cache_creation) |
| Cloud session correlation | 🔲 PR #2199 | native_session_id linking Claude to GT telemetry |
| Agent logging daemon | 🔲 PR #2199 | gt agent-log detached process for JSONL streaming |
| run.id on all events | 🔲 PR #2199 | WithRunID/RunIDFromCtx; addRunID() injects run.id into every log record |

Activation (PR #2199): Requires GT_LOG_AGENT_OUTPUT=true AND GT_OTEL_LOGS_URL set.


Roadmap

P0 — Critical (blocking accurate attribution)

Work context injection at gt prime — implemented in PR #2199

Polecats are generic agents — they have no fixed rig. GT_RIG at session start reflects an allocation rig or is empty, which is meaningless for attributing work. The actual work context (which rig, bead, and molecule a polecat is processing) is only known at each gt prime invocation.

A single polecat session goes through multiple gt prime cycles, each on a potentially different rig and bead:

session start → rig="" (generic, no work yet)
gt prime #1 → work_rig="gastown", work_bead="sg-05iq", work_mol="mol-polecat-work"
bd.call, mail, sling, done ← carry work context from prime #1
gt prime #2 → work_rig="sfgastown", work_bead="sg-g8vs", work_mol="mol-polecat-work"
bd.call, mail, sling, done ← carry work context from prime #2

Fix: at each gt prime, inject GT_WORK_RIG, GT_WORK_BEAD, GT_WORK_MOL into the tmux session environment (via SetEnvironment), not just emit them as log attributes. This ensures all subsequent subprocesses (bd, mail, agent logging) inherit the current work context automatically until the next prime overwrites it.

New attributes emitted on the prime event and carried by all events until the next prime:

| Attribute | Type | Description |
| --- | --- | --- |
| work_rig | string | rig whose bead is on the hook |
| work_bead | string | bead ID currently hooked |
| work_mol | string | molecule ID if the bead is a molecule step; empty otherwise |
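
As a rough sketch of the injection step, the three variables could be written with tmux's set-environment command, one invocation per variable. The helper name workContextCommands is hypothetical and the argv layout is an assumption for illustration; the actual code path described above goes through SetEnvironment.

```go
package main

import (
	"fmt"
	"strings"
)

// workContextCommands returns one tmux argv per work-context variable.
// Hypothetical helper; the real implementation uses SetEnvironment.
func workContextCommands(session, rig, bead, mol string) [][]string {
	vars := []struct{ name, value string }{
		{"GT_WORK_RIG", rig},
		{"GT_WORK_BEAD", bead},
		{"GT_WORK_MOL", mol},
	}
	cmds := make([][]string, 0, len(vars))
	for _, v := range vars {
		// An empty value is still written so a stale value from the
		// previous prime cannot leak into the next work assignment.
		cmds = append(cmds, []string{"tmux", "set-environment", "-t", session, v.name, v.value})
	}
	return cmds
}

func main() {
	for _, argv := range workContextCommands("gt-gastown-Toast", "gastown", "sg-05iq", "mol-polecat-work") {
		fmt.Println(strings.Join(argv, " "))
	}
}
```

Writing all three variables on every prime, including empty ones, is a design choice of this sketch: it guarantees that the next prime fully overwrites the previous work context.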

P1 — High value

Token cost metric (gastown.token.cost_usd)

Compute dollar cost per run from token counts using Claude model pricing. Emit as a Gauge metric at session end. Enables per-rig and per-bead cost dashboards.

New metric: gastown.token.cost_usd{rig, role, agent_type} — accumulated cost per session. New event attribute on agent.usage: cost_usd — cost of the current turn.
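
The per-turn calculation might look like the sketch below. The Usage and Pricing types and the costUSD function are hypothetical names, and the prices in main are placeholders rather than real Claude pricing; only the four token categories come from the agent.usage description on this page.

```go
package main

import "fmt"

// Usage holds the token counts reported on an agent.usage event.
type Usage struct {
	Input, Output, CacheRead, CacheCreation int64
}

// Pricing is expressed in USD per million tokens.
// The values used in main are placeholders, not real prices.
type Pricing struct {
	Input, Output, CacheRead, CacheCreation float64
}

// costUSD returns the dollar cost of one turn.
func costUSD(u Usage, p Pricing) float64 {
	const perMillion = 1_000_000.0
	return (float64(u.Input)*p.Input +
		float64(u.Output)*p.Output +
		float64(u.CacheRead)*p.CacheRead +
		float64(u.CacheCreation)*p.CacheCreation) / perMillion
}

func main() {
	placeholder := Pricing{Input: 3, Output: 15, CacheRead: 0.3, CacheCreation: 3.75}
	turn := Usage{Input: 1000, Output: 500}
	fmt.Printf("cost_usd=%.4f\n", costUSD(turn, placeholder))
}
```

Summing this value over all turns of a session would give the accumulated figure proposed for gastown.token.cost_usd.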


Go runtime/process metrics (low effort)

The OTel Go SDK has a contrib package (go.opentelemetry.io/contrib/instrumentation/runtime) that auto-emits goroutine count, GC pause duration, heap usage, and memory allocations. Activation is ~5 lines of code in telemetry.Init().

New metrics: process.runtime.go.goroutines, process.runtime.go.gc.pause_ns, process.runtime.go.mem.heap_alloc_bytes


Refinery queue telemetry

The Refinery's merge queue is a central health indicator but currently completely dark to observability. Expose:

| Metric | Type | Description |
| --- | --- | --- |
| gastown.refinery.queue_depth | Gauge | pending items in merge queue |
| gastown.refinery.item_age_ms | Histogram | age of oldest item in queue |
| gastown.refinery.dispatch_latency_ms | Histogram | time between enqueue and dispatch |

New log event: refinery.dispatch with bead_id, queue_depth, wait_ms, status.


Distributed Traces (OTel Traces SDK)

Currently the waterfall relies on run.id as a manual correlation key across flat log records. Replacing this with proper OTel Traces would enable:

  • Visual waterfall in Jaeger / Grafana Tempo
  • Automatic parent → child span attribution (no manual run.id joins)
  • P95/P99 latency per operation derived from spans, not histograms

Architecture: each polecat session spawn creates a root span (gt.session). Child spans are created for bd.call, mail, sling, done. The run.id becomes the trace ID. GT_RUN propagation becomes W3C traceparent header injection.

This is a significant effort (requires go.opentelemetry.io/otel/trace + tracer provider + exporter) but would be the single highest-impact observability improvement.
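
One way the run.id could map onto a trace ID: a UUID and a W3C trace ID are both 16 bytes, so the UUID with its dashes removed is already a valid 32-character hex trace ID. The sketch below formats a traceparent value on that basis. The function name traceparentFromRunID is hypothetical, and the caller is assumed to supply a non-zero span ID.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceparentFromRunID formats a W3C traceparent value
// (version-traceid-spanid-flags) using the run UUID as the trace ID.
func traceparentFromRunID(runID string, spanID [8]byte, sampled bool) (string, error) {
	traceID := strings.ToLower(strings.ReplaceAll(runID, "-", ""))
	raw, err := hex.DecodeString(traceID)
	if err != nil || len(raw) != 16 {
		return "", errors.New("run ID is not a 16-byte hex UUID")
	}
	if traceID == strings.Repeat("0", 32) {
		return "", errors.New("all-zero trace ID is invalid")
	}
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, hex.EncodeToString(spanID[:]), flags), nil
}

func main() {
	spanID := [8]byte{0x00, 0xf0, 0x67, 0xaa, 0x0b, 0xa9, 0x02, 0xb7}
	tp, err := traceparentFromRunID("123e4567-e89b-12d3-a456-426614174000", spanID, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("traceparent: " + tp)
}
```

In a real migration the OTel SDK's propagators would normally produce this value; the sketch only shows that existing run IDs would not need to be regenerated.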


P2 — Medium value

Scheduler dispatch telemetry

Expose the capacity-controlled dispatch cycle:

New metrics: scheduler.dispatch_cycle (dispatched/failed/skipped counts), scheduler.queue_depth (histogram), scheduler.capacity_usage (gauge).


done event enrichment

Currently done carries only exit_type and status. Adding work context enables per-rig completion analysis:

New attributes: rig, bead_id, time_to_complete_ms (wall time from session start to done).


Witness patrol cycle telemetry

Each witness patrol cycle should emit: duration, stale sessions detected, restarts triggered. Enables trend analysis on witness health.

New event: witness.patrol with duration_ms, stale_count, restart_count, status.


Dolt health metrics

Dolt issues are detected only at spawn time today. Expose health metrics continuously instead:

New metrics: gastown.dolt.connections, gastown.dolt.query_duration_ms (histogram), gastown.dolt.replication_lag_ms.


P3 — Nice to have

| Item | Description |
| --- | --- |
| Deacon watchdog telemetry | State machine transitions in the deacon watchdog chain |
| Crew session tracking | Crew session cycle events: start, push, done, idle |
| Git operation telemetry | Track clone, checkout, fetch duration per polecat session |
| OTel W3C Baggage | Replace GT_RUN env var propagation with W3C Baggage for standard cross-process context |
| Retry pattern detection | Alert when a polecat's error rate exceeds threshold across runs |

Components

1. Initialization (internal/telemetry/telemetry.go)

The telemetry.Init() function sets up OTel providers on process startup:

provider, err := telemetry.Init(ctx, "gastown", version)
if err != nil {
	// Log and continue — telemetry is best-effort
}
defer provider.Shutdown(ctx)

Exact signature: func Init(ctx context.Context, serviceName, serviceVersion string) (*Provider, error)

Providers:

  • Metrics: Any OTLP-compatible metrics backend via otlpmetrichttp exporter
  • Logs: Any OTLP-compatible logs backend via otlploghttp exporter

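Default endpoints (the DefaultMetricsURL and DefaultLogsURL constants). Note that when both GT_OTEL_* variables are unset, telemetry is disabled and these defaults are not used: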

  • Metrics: http://localhost:8428/opentelemetry/api/v1/push
  • Logs: http://localhost:9428/insert/opentelemetry/v1/logs

Note: These defaults target VictoriaMetrics/VictoriaLogs for local development convenience. Gas Town uses standard OTLP — you can override endpoints to use any OTLP v1.x+ compatible backend (Prometheus, Grafana Mimir, Datadog, New Relic, Grafana Cloud, Loki, OpenTelemetry Collector, etc.).

OTLP Compatibility:

  • Uses standard OpenTelemetry Protocol (OTLP) over HTTP
  • Protobuf encoding (VictoriaMetrics, Prometheus, and others accept this)
  • Compatible with any backend that supports OTLP v1.x+

Resource attributes (set at init time by the OTel SDK):

| OTel attribute | Source |
| --- | --- |
| service.name | "gastown" (hardcoded at call site) |
| service.version | GT binary version |
| host.name, host.arch | resource.WithHost() — OTel SDK reads system hostname |
| os.type, os.version, os.description | resource.WithOS() — OTel SDK reads OS info |

Custom resource attributes (via OTEL_RESOURCE_ATTRIBUTES env var, set by SetProcessOTELAttrs()):

| Attribute | Source env var | Notes |
| --- | --- | --- |
| gt.role | GT_ROLE | Agent role (e.g. gastown/polecats/Toast) |
| gt.rig | GT_RIG | Rig name |
| gt.actor | BD_ACTOR | BD actor/identity |
| gt.agent | GT_POLECAT or GT_CREW | Agent name |
| gt.session | GT_SESSION | Tmux session name — PR #2199 |
| gt.run_id | GT_RUN | Run UUID — PR #2199 |
| gt.work_rig | GT_WORK_RIG | Work rig at last prime — PR #2199 |
| gt.work_bead | GT_WORK_BEAD | Hooked bead at last prime — PR #2199 |
| gt.work_mol | GT_WORK_MOL | Molecule step at last prime — PR #2199 |
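
The mapping for the attributes available on main could be assembled roughly as follows. This is a sketch under stated assumptions, not the body of buildGTResourceAttrs: the helper name resourceAttrs is hypothetical, unset variables are assumed to be skipped, and values are assumed to contain no commas, equals signs, or spaces (so no percent-encoding is applied).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resourceAttrs builds the comma-separated key=value list that the
// OTEL_RESOURCE_ATTRIBUTES variable expects. Unset variables are skipped.
// Sketch only: values are not percent-encoded.
func resourceAttrs(getenv func(string) string) string {
	var pairs []string
	add := func(key, value string) {
		if value != "" {
			pairs = append(pairs, key+"="+value)
		}
	}
	add("gt.role", getenv("GT_ROLE"))
	add("gt.rig", getenv("GT_RIG"))
	add("gt.actor", getenv("BD_ACTOR"))
	agent := getenv("GT_POLECAT")
	if agent == "" {
		agent = getenv("GT_CREW") // crew name is the fallback agent identity
	}
	add("gt.agent", agent)
	return strings.Join(pairs, ",")
}

func main() {
	fmt.Println("OTEL_RESOURCE_ATTRIBUTES=" + resourceAttrs(os.Getenv))
}
```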

2. Recording Layer (internal/telemetry/recorder.go)

The recorder provides type-safe functions for emitting all GT telemetry events. Each function emits:

  1. OTel metric counter (→ VictoriaMetrics, aggregated)
  2. OTel log record (→ VictoriaLogs, full detail)

run.id on log records: On main, log records do not carry run.id. After PR #2199 merges, addRunID(ctx, &r) will be called on every log record, injecting run.id from context (set via WithRunID) or from the GT_RUN env var.

Recording Pattern

func RecordSomething(ctx context.Context, args ..., err error) {
	initInstruments()        // Lazy-load OTel instruments
	status := statusStr(err) // "ok" or "error"
	inst.somethingTotal.Add(ctx, 1, metric.WithAttributes(
		attribute.String("status", status),
		attribute.String("label", value),
	))
	emit(ctx, "something", severity(err),
		otellog.String("key1", value1),
		otellog.String("key2", value2),
		otellog.String("status", status),
		errKV(err), // Empty string or error message
	)
}

Instrument Types

| Type | Description | Example |
| --- | --- | --- |
| Counters | Total counts per attribute combination | gastown.polecat.spawns.total{status="ok"} |
| Histograms | Distribution of measurements (latency, duration) | gastown.bd.duration_ms |
| Log records | Structured events with full payload | prime, mail, agent.event (PR #2199) |

3. Context Propagation

Subprocess Integration (internal/telemetry/subprocess.go)

Two mechanisms ensure subprocess telemetry is correlated:

1. Process-level inheritance (SetProcessOTELAttrs):

  • Called once at GT startup
  • Sets OTEL_RESOURCE_ATTRIBUTES in process environment
  • All exec.Command() subprocesses inherit these env vars automatically

2. Manual injection (OTELEnvForSubprocess):

  • For callers building cmd.Env explicitly (overriding os.Environ)
  • Returns pre-built env slice with:
    • OTEL_RESOURCE_ATTRIBUTES (GT context attributes)
    • BD_OTEL_METRICS_URL (mirrors GT_OTEL_METRICS_URL)
    • BD_OTEL_LOGS_URL (mirrors GT_OTEL_LOGS_URL)
    • GT_RUN (run ID for correlation — PR #2199)

Run ID Correlation (PR #2199)

On main, there is no run-level correlation key in log records. PR #2199 adds:

  • GT_RUN env var — UUID generated at polecat spawn
  • gt.run_id in OTEL_RESOURCE_ATTRIBUTES — carried by all subprocesses
  • WithRunID(ctx, runID) / RunIDFromCtx(ctx) — Go context carrier
  • addRunID(ctx, &record) — called in every emit, injects run.id into log record

Query example (after PR #2199): Retrieve all events for a single session run

run.id:uuid-1234

4. Agent Logging (PR #2199)

Status: PR #2199 (otel-p0-work-context) — not on main. The files below are added in PR #2199 and do not exist at origin/main.

Opt-in feature: GT_LOG_AGENT_OUTPUT=true streams native AI agent JSONL to VictoriaLogs.

How it works:

  1. ActivateAgentLogging() (internal/session/agent_logging_unix.go) spawns detached gt agent-log process
  2. Uses Setsid so it survives parent process exit
  3. PID file at /tmp/gt-agentlog-<session>.pid ensures single instance
  4. --since=now-60s filters to only this session's Claude instance
  5. gt agent-log (internal/cmd/agent_log.go) tails JSONL files and emits RecordAgentEvent for each
  6. internal/agentlog/ package — adapters for claudecode and opencode JSONL formats

Events emitted:

  • agent.event: One record per conversation turn (text, tool_use, tool_result, thinking)
  • agent.usage: Token usage per assistant turn (input, output, cache stats)

Session name in telemetry:

  • session: Tmux session name (e.g., gt-gastown-Toast)
  • native_session_id: Claude Code JSONL filename UUID

Environment Variables

GT-Level Variables

| Variable | Set by | Description |
| --- | --- | --- |
| GT_OTEL_METRICS_URL | Operator | OTLP metrics endpoint (default: localhost:8428) |
| GT_OTEL_LOGS_URL | Operator | OTLP logs endpoint (default: localhost:9428) |
| GT_LOG_BD_OUTPUT | Operator | Opt-in: Include bd stdout/stderr in bd.call records |
| GT_LOG_AGENT_OUTPUT | Operator | Opt-in (PR #2199): Stream Claude conversation events |

Session Context Variables (Set by session.StartSession)

| Variable | Values / Format | Description |
| --- | --- | --- |
| GT_ROLE | <rig>/polecats/<name> · mayor · beads/witness | Agent role for identity parsing |
| GT_RIG | gastown, beads | Rig name (empty for town-level agents) |
| GT_POLECAT | Toast, Shadow, Furiosa | Polecat name (rig-specific) |
| GT_CREW | max, jane | Crew member name |
| GT_SESSION | gt-gastown-Toast, hq-mayor | Tmux session name |
| GT_AGENT | claudecode, codex | Agent override (if specified) |
| GT_RUN | UUID v4 | PR #2199 — Run identifier, primary waterfall correlation key |
| GT_ROOT | /Users/pa/gt | Town root path |
| CLAUDE_CONFIG_DIR | ~/gt/.claude | Runtime config directory (for agent overrides) |
| BD_ACTOR | <rig>/polecats/<name> | BD actor identity (git author) |
| GIT_AUTHOR_NAME | Agent name | Git author name |
| GIT_CEILING_DIRECTORIES | Town root | Git ceiling (prevents repo traversal) |

Event Types

See OTel Data Model for the complete event schema, attribute tables, and metric reference.


Monitoring Gaps

Currently Monitored ✅

| Area | Coverage |
| --- | --- |
| Agent session lifecycle | Full (start, stop, respawn) |
| Tmux prompts/nudges | Full (content length, debouncing — content not logged) |
| BD operations | Full (all BD CLI calls) |
| Mail operations | Full (operation + status; message payload not recorded) |
| Polecat lifecycle | Full (spawn, remove, state changes) |
| Formula instantiation | Full (formula name, bead ID) |
| Convoy tracking | Full (auto-convoy creation) |
| Daemon restarts | Full (witness/deacon-initiated) |
| GT prime operations | Full (with formula context) |
| Agent conversation events | 🔲 PR #2199 — requires GT_LOG_AGENT_OUTPUT=true |
| Token usage | 🔲 PR #2199 — requires GT_LOG_AGENT_OUTPUT=true |

Not Currently Monitored ❌

| Area | Notes | Operational Impact |
| --- | --- | --- |
| Generic polecat work context | Critical gap — see Generic Polecat Work Context below | No work attribution on any event between two gt prime calls; token costs unattributable |
| Agent instantiation | No agent.instantiate event (roadmap) | Cannot anchor a run to a specific agent spawn |
| Molecule lifecycle | No mol.cook/wisp/squash/burn events (roadmap) | Cannot observe formula-to-wisp pipeline |
| Bead creation | No bead.create event (roadmap) | Cannot trace child bead graph during molecule instantiation |
| Dolt server health | Handled by pre-spawn health checks, but not exposed to telemetry | Database issues only detected at spawn time; no real-time health monitoring |
| Refinery merge queue | Internal operation, not surfaced via telemetry | Cannot monitor merge backlog or detect bottlenecks |
| Scheduler dispatch logs | Capacity-controlled dispatch cycles not exposed to telemetry | Cannot track dispatch efficiency, queue depth, or capacity utilization |
| Crew worktree operations | No explicit tracking of crew session cycles | Cannot track crew efficiency or session patterns |
| Git operations (clone, checkout, etc.) | Git author/name is set, but individual operations not tracked | Cannot diagnose git-related failures or track repository operations |
| Resource usage (CPU, memory, disk) | Not instrumented — consider OTel process metrics | Cannot detect resource exhaustion or capacity planning needs |
| Network activity | Not instrumented (Claude API calls logged by agent, but external traffic not) | Cannot diagnose network issues or detect unusual external connections |
| Cross-rig worktree operations | Worktrees are created/managed but operations not tracked | Cannot correlate worktree lifecycle with work items |
| Witness monitoring loops | Health checks happen but not exposed to observability | Cannot monitor witness health trends or detect degraded performance |
| Deacon watchdog chain | Internal state machine, not currently exposed to observability | Cannot track deacon health or detect daemon failures |

Generic Polecat Work Context ⚠️

Critical gap: Polecats are generic agents with no fixed rig. gt.rig in resource attributes reflects the allocation rig (or is empty), which has no bearing on the actual work being done. Work context is only determined at each gt prime invocation — and changes with every new work assignment.

This means all events emitted between two gt prime calls (bd.call, mail, sling, done) have no work attribution today. You cannot answer "which bead did this bd.call serve?" from current telemetry.

Impact:

  • gt.rig resource attribute is the allocation rig, not the work rig — misleading for multi-rig polecats
  • Token usage (agent.usage, PR #2199) cannot be attributed to a specific bead, rig, or molecule
  • bd.call, mail, done events carry no indication of which work item triggered them

Proposed solution (see Roadmap P0):

  • At each gt prime, write GT_WORK_RIG, GT_WORK_BEAD, GT_WORK_MOL into the tmux session via SetEnvironment — all subprocesses inherit automatically
  • Emit work_rig, work_bead, work_mol on the prime event
  • All events emitted after a prime (until the next one) carry the current work context via the inherited env vars

Data Model

See OTel Data Model for complete schema of all events.

The data model is independent of backend — any OTLP-compatible consumer can parse and query these events.


Queries

Metrics (Any OTLP-compatible backend)

These examples use PromQL/MetricsQL syntax, as supported by VictoriaMetrics, Prometheus, Grafana Mimir, etc.

Event rates by status:

sum(rate(gastown_polecat_spawns_total[5m])) by (status)
sum(rate(gastown_bd_calls_total[5m])) by (subcommand, status)

Naming note: OTel SDK metric names use dot notation (e.g. gastown.bd.calls.total). Prometheus-compatible backends export these with underscores (e.g. gastown_bd_calls_total). Use underscore form in PromQL queries.
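
The basic translation is a dot-to-underscore replacement, sketched below with a hypothetical promName helper. This covers only the naming rule stated in the note; a given backend may also append unit or type suffixes, and histograms additionally expose _bucket, _sum, and _count series.

```go
package main

import (
	"fmt"
	"strings"
)

// promName converts an OTel dotted metric name to the underscore form
// used in PromQL queries. Hypothetical helper; suffix handling is omitted.
func promName(otelName string) string {
	return strings.ReplaceAll(otelName, ".", "_")
}

func main() {
	for _, name := range []string{
		"gastown.bd.calls.total",
		"gastown.polecat.spawns.total",
		"gastown.bd.duration_ms",
	} {
		fmt.Printf("%s -> %s\n", name, promName(name))
	}
}
```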

Latency distributions:

histogram_quantile(0.5, sum(rate(gastown_bd_duration_ms_bucket[5m])) by (le, subcommand))
histogram_quantile(0.95, sum(rate(gastown_bd_duration_ms_bucket[5m])) by (le, subcommand))
histogram_quantile(0.99, sum(rate(gastown_bd_duration_ms_bucket[5m])) by (le, subcommand))

Session activity:

sum(increase(gastown_session_starts_total[1h])) by (role)
sum(increase(gastown_done_total[1h])) by (exit_type)

VictoriaLogs (Structured Logs)

Find all events from a polecat:

gt.agent:Toast

Error analysis:

status:error
_msg:bd.call AND status:error
_msg:session.stop AND status:error

Polecat lifecycle:

_msg:polecat.spawn
_msg:polecat.remove
_msg:agent.state_change AND new_state:working

Debugging Examples

Track a polecat working across multiple rigs:

gt.agent:Toast

Shows all events from polecat Toast, regardless of rig assignment.

List failed bd.call events, the starting point for identifying sessions with high error rates:

_msg:bd.call AND status:error

Find all events for a run (after PR #2199):

run.id:uuid-1234

Backends Compatible with OTLP

| Backend | Notes |
| --- | --- |
| VictoriaMetrics | Default for metrics (localhost:8428); open source. Override with GT_OTEL_METRICS_URL to use any OTLP-compatible backend. |
| VictoriaLogs | Default for logs (localhost:9428); open source. Override with GT_OTEL_LOGS_URL to use any OTLP-compatible backend. |
| Prometheus | Accepts OTLP metrics through its native OTLP receiver endpoint (/api/v1/otlp/v1/metrics), which must be enabled explicitly; open source |
| Grafana Mimir | Supports OTLP via its OTLP write endpoint; open source |
| Loki | Loki 3.0 and later ingest OTLP logs natively; older versions require an OTLP bridge because Loki uses a different format; open source |
| OpenTelemetry Collector | Universal forwarder to any backend (recommended for production); open source |

Production Recommendation: For production deployments, consider using the OpenTelemetry Collector as a sidecar. The Collector provides:

  • Single agent for all telemetry
  • Advanced processing and batching
  • Support for multiple backends simultaneously
  • Better resource efficiency than per-process exporters

Appendix: Source Reference Audit

Audited against origin/main @ 2d8d71ee35fafda3bbdf353683692bfcc9165476

Initialization (internal/telemetry/telemetry.go)

| Claim | Source |
| --- | --- |
| func Init(ctx context.Context, serviceName, serviceVersion string) (*Provider, error) | telemetry.go:104 |
| func (p *Provider) Shutdown(ctx context.Context) error | telemetry.go:68 |
| EnvMetricsURL = "GT_OTEL_METRICS_URL" | telemetry.go:36 |
| EnvLogsURL = "GT_OTEL_LOGS_URL" | telemetry.go:39 |
| DefaultMetricsURL = "http://localhost:8428/opentelemetry/api/v1/push" | telemetry.go:42 |
| DefaultLogsURL = "http://localhost:9428/insert/opentelemetry/v1/logs" | telemetry.go:45 |
| semconv.ServiceName(serviceName) — resource attr | telemetry.go:129 |
| semconv.ServiceVersion(serviceVersion) — resource attr | telemetry.go:130 |
| resource.WithHost() — produces host.name, host.arch | telemetry.go:132 |
| resource.WithOS() — produces os.type, os.version, os.description | telemetry.go:133 |
| Both endpoints unset → telemetry disabled (no providers created) | telemetry.go:115–120 |

Subprocess correlation (internal/telemetry/subprocess.go)

| Claim | Source |
| --- | --- |
| func buildGTResourceAttrs() string | subprocess.go:11 |
| GT_ROLE → gt.role | subprocess.go:13 |
| GT_RIG → gt.rig | subprocess.go:16 |
| BD_ACTOR → gt.actor | subprocess.go:19 |
| GT_POLECAT → gt.agent | subprocess.go:23 |
| GT_CREW → gt.agent (fallback) | subprocess.go:25 |
| func SetProcessOTELAttrs() | subprocess.go:42 |
| Sets OTEL_RESOURCE_ATTRIBUTES | subprocess.go:48 |
| Sets BD_OTEL_METRICS_URL | subprocess.go:52 |
| Sets BD_OTEL_LOGS_URL | subprocess.go:54 |
| func OTELEnvForSubprocess() []string | subprocess.go:66 |

Recording (internal/telemetry/recorder.go)

| Claim | Source |
| --- | --- |
| func emit(ctx, body, sev, attrs...) | recorder.go:133 |
| initInstruments() — all instruments initialized here | recorder.go:59 |
| GT_LOG_BD_OUTPUT gates stdout/stderr logging | recorder.go:208 |
| RecordBDCall / bd.call event | recorder.go:187 |
| RecordSessionStart / session.start event | recorder.go:218 |
| RecordSessionStop / session.stop event | recorder.go:236 |
| RecordPromptSend / prompt.send event — keys_len only, content not logged | recorder.go:250 |
| RecordPaneRead / pane.read event | recorder.go:266 |
| RecordPrime / prime event | recorder.go:282 |
| RecordPrimeContext / prime.context event | recorder.go:305 |
| RecordAgentStateChange / agent.state_change — has_hook_bead bool | recorder.go:318 |
| RecordPolecatSpawn / polecat.spawn event | recorder.go:338 |
| RecordPolecatRemove / polecat.remove event | recorder.go:352 |
| RecordSling / sling event | recorder.go:366 |
| RecordMail / mail event — operation, status, error only | recorder.go:381 |
| RecordNudge / nudge event | recorder.go:398 |
| RecordDone / done event | recorder.go:413 |
| RecordDaemonRestart / daemon.restart event | recorder.go:431 |
| RecordFormulaInstantiate / formula.instantiate event | recorder.go:442 |
| RecordConvoyCreate / convoy.create event | recorder.go:460 |
| RecordPaneOutput / pane.output event | recorder.go:477 |

Absent functions and features (confirmed by grep on origin/main)

| Claim | Verification |
| --- | --- |
| RecordAgentInstantiate / agent.instantiate — does not exist | grep -r "RecordAgentInstantiate\|agent\.instantiate" internal/ → zero matches |
| RecordMolCook / mol.cook etc. — do not exist | grep -r "RecordMol\|mol\.cook\|mol\.wisp\|mol\.squash\|mol\.burn" internal/ → zero matches |
| RecordBeadCreate / bead.create — does not exist | grep -r "RecordBeadCreate\|bead\.create" internal/ → zero matches |
| WithRunID / RunIDFromCtx — do not exist on main | grep -r "WithRunID\|RunIDFromCtx" internal/telemetry/ → zero matches |
| GT_RUN — does not exist on main | grep -r "GT_RUN" internal/ → zero matches |
| GT_LOG_AGENT_OUTPUT — does not exist on main | grep -r "GT_LOG_AGENT_OUTPUT" . → zero matches |
| gt.session / gt.run_id in resource attrs — not in subprocess.go on main | confirmed: subprocess.go has only gt.role, gt.rig, gt.actor, gt.agent |
| agent_logging_unix.go — does not exist on main | find internal/session/ -name "agent_logging*" → zero results |
| agent_log.go — does not exist on main | find internal/cmd/ -name "agent_log*" → zero results |
| telemetry.IsActive() — does not exist on main | grep -r "IsActive" internal/telemetry/ → zero matches |

PromQL naming convention

OTel SDK uses dot notation; Prometheus-compatible backends export with underscores.

| SDK name | PromQL / MetricsQL name |
| --- | --- |
| gastown.bd.calls.total | gastown_bd_calls_total |
| gastown.bd.duration_ms | gastown_bd_duration_ms_bucket / _sum / _count |
| gastown.polecat.spawns.total | gastown_polecat_spawns_total |
| gastown.session.starts.total | gastown_session_starts_total |
| gastown.done.total | gastown_done_total |

PR #2199 additions (commit 8b88de15, not yet on main)

| Claim | Source |
| --- | --- |
| RecordAgentEvent / agent.event | added in 8b88de15 |
| RecordAgentTokenUsage / agent.usage | added in 8b88de15 |
| gastown.agent.events.total Counter | added in 8b88de15 |
| WithRunID / RunIDFromCtx / addRunID | added in 8b88de15 |
| gt.session, gt.run_id, gt.work_* in resource attrs | subprocess.go updated in 8b88de15 |
| GT_RUN propagation to subprocesses | subprocess.go updated in 8b88de15 |
| injectWorkContext / setTmuxWorkContext in prime.go | added in 8b88de15 |
| internal/agentlog/ package | new in 8b88de15 |
| internal/cmd/agent_log.go | new in 8b88de15 |
| internal/session/agent_logging_unix.go | new in 8b88de15 |
| GT_LOG_AGENT_OUTPUT env var | new in 8b88de15 |
| telemetry.IsActive() | added in 8b88de15 |