
Agent API Touch-Point Inventory

Complete catalog of all GT↔agent integration points, mapped to source code and the proposed Factory Worker API endpoints.

Ref: gt-5zs8 | Companion: factory-worker-api.md


How to Read This Document

Each touch point lists:

  • What: What GT does through this touch point
  • Code: Source files and key functions (line numbers approximate after edits)
  • Flow: What information moves and in which direction (GT→Agent or Agent→GT)
  • Fragility: What breaks and why
  • API mapping: Which Factory Worker API endpoint replaces it

1. Prompt Delivery (tmux send-keys)

What: GT sends text to agent sessions via tmux terminal injection.

Code:

  • internal/tmux/tmux.go: NudgeSession() (line ~1300): 8-step protocol (serialize → find pane → exit copy mode → sanitize → chunk at 512 bytes → debounce 500ms → ESC + 600ms readline dance → Enter with retries → SIGWINCH wake)
  • internal/tmux/tmux.go: sendMessageToTarget() (line ~1210): splits at 512 bytes, 10ms inter-chunk delay
  • internal/tmux/tmux.go: sendKeysLiteralWithRetry() (line ~1253): exponential backoff (500ms→2s cap) for startup race
  • internal/tmux/tmux.go: sanitizeNudgeMessage() (line ~1179): strips ESC, CR, BS, DEL; replaces TAB with space
  • internal/tmux/tmux.go: SendKeys(), SendKeysDebounced(), SendKeysRaw(), SendKeysReplace(), SendKeysDelayed(): variant entry points
  • internal/cmd/nudge.go: runNudge() (line ~196), deliverNudge() (line ~129): CLI entry point, routes by mode (immediate/queue/wait-idle)
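A minimal Go sketch of the sanitize and chunk steps follows. The names sanitizeNudge and chunkMessage are hypothetical stand-ins for sanitizeNudgeMessage() and sendMessageToTarget(), and one behavior is an assumption: the sketch backs up to a UTF-8 rune boundary before splitting, whereas the source only says messages are split at 512 bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxChunkBytes mirrors the empirical 512-byte chunk size described above.
const maxChunkBytes = 512

// sanitizeNudge drops ESC, CR, BS, and DEL and turns TAB into a space: the
// character classes sanitizeNudgeMessage() is documented to handle.
func sanitizeNudge(msg string) string {
	var b strings.Builder
	for _, r := range msg {
		switch r {
		case '\x1b', '\r', '\b', '\x7f':
			// tmux/readline would treat these as keystrokes; drop them.
		case '\t':
			b.WriteRune(' ')
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// chunkMessage splits msg into pieces of at most maxBytes bytes. Unlike a
// plain byte split, it backs up to a rune boundary so no chunk ends
// in the middle of a multi-byte character (an assumption of this sketch).
func chunkMessage(msg string, maxBytes int) []string {
	var chunks []string
	for len(msg) > maxBytes {
		cut := maxBytes
		for cut > 0 && !utf8.RuneStart(msg[cut]) {
			cut--
		}
		if cut == 0 {
			// maxBytes is smaller than one rune: emit that rune whole.
			_, cut = utf8.DecodeRuneInString(msg)
		}
		chunks = append(chunks, msg[:cut])
		msg = msg[cut:]
	}
	if msg != "" {
		chunks = append(chunks, msg)
	}
	return chunks
}

func main() {
	fmt.Printf("%q\n", sanitizeNudge("check\tmail\x1b[A\r")) // prints "check mail[A"
	chunks := chunkMessage(strings.Repeat("é", 300), maxChunkBytes)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // prints 2 512 88
}
```

The leftover "[A" shows why stripping ESC alone is not enough: the rest of an arrow-key escape sequence survives as literal text.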

Flow: GT→Agent. Text string in, no structured response.

Fragility:

  • 600ms ESC delay must exceed bash readline's 500ms keyseq-timeout; otherwise readline reads ESC+Enter as M-Enter (meta-return) and the message is not submitted
  • 512-byte chunk size is empirical; tmux send-keys has undocumented limits
  • Sanitization strips control chars but cannot handle all edge cases
  • No delivery confirmation — GT has no way to know the agent received the message
  • Per-session channel semaphore (30s timeout) serializes concurrent nudges

API mapping: POST /prompt — structured JSON delivery with accepted/queued response


2. Three Delivery Modes (immediate, wait-idle, queue)

What: GT routes prompt delivery through three modes depending on urgency.

Code:

  • internal/cmd/nudge.go: mode constants NudgeModeImmediate, NudgeModeQueue, NudgeModeWaitIdle (lines ~38-44)
  • internal/nudge/queue.go: Enqueue() (line ~86): writes JSON file to .runtime/nudge_queue/<session>/, atomic naming with nanosecond timestamp
  • internal/nudge/queue.go: Drain() (line ~143): atomic claim via rename to .claimed, orphan recovery for abandoned claims >5min, expiry filtering
  • internal/nudge/queue.go: FormatForInjection() (line ~277): formats queued nudges as <system-reminder> blocks for Claude Code hook injection
  • internal/cmd/mail_check.go: runMailCheck() (line ~16): UserPromptSubmit hook drains queue + checks mail, outputs injection block
  • internal/mail/router.go: NotifyRecipient() (line ~1568): wait-idle-first strategy with 3s timeout, queue fallback
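The claim-by-rename pattern behind Enqueue() and Drain() can be sketched as below. The queuedNudge schema, the zero-padded file naming, and the helper names are assumptions; only the TTLs, the depth cap, the nanosecond-timestamp naming, and the rename-to-.claimed claim come from the description above. Orphan recovery for stale .claimed files is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"time"
)

// queuedNudge is a hypothetical record shape, not GT's actual schema.
type queuedNudge struct {
	Message   string    `json:"message"`
	Priority  string    `json:"priority"` // "normal" or "urgent"
	ExpiresAt time.Time `json:"expires_at"`
}

// TTLs and depth cap from the hardcoded values listed under Fragility.
var ttl = map[string]time.Duration{"normal": 30 * time.Minute, "urgent": 2 * time.Hour}

const maxDepth = 50

var errQueueFull = errors.New("nudge queue full")

// enqueue writes one JSON file per nudge, named by a zero-padded nanosecond
// timestamp so lexical order matches arrival order. It writes a temp file and
// renames it, so a drainer never sees a half-written nudge.
func enqueue(dir, msg, priority string, now time.Time) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	pending, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return err
	}
	if len(pending) >= maxDepth {
		return errQueueFull
	}
	data, err := json.Marshal(queuedNudge{Message: msg, Priority: priority, ExpiresAt: now.Add(ttl[priority])})
	if err != nil {
		return err
	}
	name := filepath.Join(dir, fmt.Sprintf("%020d.json", now.UnixNano()))
	if err := os.WriteFile(name+".tmp", data, 0o644); err != nil {
		return err
	}
	return os.Rename(name+".tmp", name)
}

// drain claims each pending file by renaming it to .claimed; the rename is
// atomic, so concurrent drainers cannot both claim the same nudge. Expired
// nudges are discarded.
func drain(dir string, now time.Time) ([]queuedNudge, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return nil, err
	}
	sort.Strings(files)
	var out []queuedNudge
	for _, f := range files {
		claimed := f + ".claimed"
		if err := os.Rename(f, claimed); err != nil {
			continue // another drainer won the race
		}
		data, err := os.ReadFile(claimed)
		os.Remove(claimed)
		if err != nil {
			continue
		}
		var n queuedNudge
		if json.Unmarshal(data, &n) == nil && now.Before(n.ExpiresAt) {
			out = append(out, n)
		}
	}
	return out, nil
}

// demo enqueues a normal and an urgent nudge, then drains twice 45 minutes later.
func demo() (first, second int, err error) {
	dir, err := os.MkdirTemp("", "nudge_queue")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	t0 := time.Now()
	if err := enqueue(dir, "check your mail", "normal", t0); err != nil {
		return 0, 0, err
	}
	if err := enqueue(dir, "rebase before merging", "urgent", t0.Add(time.Nanosecond)); err != nil {
		return 0, 0, err
	}
	later := t0.Add(45 * time.Minute) // past the 30min normal TTL, within the 2h urgent TTL
	got, err := drain(dir, later)
	if err != nil {
		return 0, 0, err
	}
	again, err := drain(dir, later)
	return len(got), len(again), err
}

func main() {
	fmt.Println(demo()) // prints 1 0 <nil>
}
```

The demo also reproduces the expiry problem noted under Fragility: a normal-priority nudge that nobody drains within 30 minutes is dropped without ever being seen.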

Flow: GT→Agent. Immediate: terminal injection. Queue: file→hook→injection.

Fragility:

  • Queue drain depends on UserPromptSubmit hook — non-Claude agents never drain
  • TTLs hardcoded (normal: 30min, urgent: 2hr, max depth: 50)
  • Idle agents never call Drain(), so queued nudges can expire unseen
  • Witness nudges to Refinery use immediate-only (line ~639 in handlers.go)

API mapping: POST /prompt with priority field (system/urgent/normal)


3. Idle Detection (prompt prefix + status bar)

What: GT determines if an agent is idle (waiting for input) or busy.

Code:

  • internal/tmux/tmux.go: matchesPromptPrefix() (line ~2261): NBSP normalization (U+00A0→space), matches DefaultReadyPromptPrefix = "❯ " (U+276F)
  • internal/tmux/tmux.go: IsIdle() (line ~2386): status bar parsing for ⏵⏵ (U+23F5); busy = "esc to interrupt" present
  • internal/tmux/tmux.go: WaitForIdle() (line ~2321): polls at 200ms intervals, captures 5 pane lines, returns ErrIdleTimeout
  • internal/tmux/tmux.go: IsAtPrompt() (line ~2359): non-blocking point-in-time check
  • internal/tmux/tmux.go: promptSuffixes (line ~1478): [">", "$", "%", "#", "❯"] for dialog detection
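The prompt-prefix check and the polling loop can be sketched as follows. This is a simplified illustration: isIdle here combines the prompt glyph with the "esc to interrupt" busy marker, whereas GT's IsIdle() parses the ⏵⏵ status bar. The function names are hypothetical, and the capture function is injected so the loop runs without tmux.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// defaultReadyPromptPrefix is the prompt glyph "❯" (U+276F) followed by a space.
const defaultReadyPromptPrefix = "\u276f "

var errIdleTimeout = errors.New("timed out waiting for idle")

// matchesPromptPrefix normalizes NBSP (U+00A0) to a plain space before
// matching, the fix described for issues/1387.
func matchesPromptPrefix(line string) bool {
	line = strings.ReplaceAll(line, "\u00a0", " ")
	return strings.HasPrefix(strings.TrimLeft(line, " "), defaultReadyPromptPrefix)
}

// isIdle reports idle when some captured line shows the prompt and none
// shows the busy marker.
func isIdle(lines []string) bool {
	sawPrompt := false
	for _, l := range lines {
		if strings.Contains(l, "esc to interrupt") {
			return false
		}
		if matchesPromptPrefix(l) {
			sawPrompt = true
		}
	}
	return sawPrompt
}

// waitForIdle polls capture() every interval until isIdle is true or the
// timeout elapses. A true result is only a point-in-time observation.
func waitForIdle(capture func() []string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if isIdle(capture()) {
			return nil
		}
		if time.Now().After(deadline) {
			return errIdleTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	pane := []string{"\u276f\u00a0"} // glyph followed by NBSP
	fmt.Println(matchesPromptPrefix(pane[0]), isIdle(pane)) // prints true true
	busy := func() []string { return []string{"Working... (esc to interrupt)"} }
	fmt.Println(waitForIdle(busy, 200*time.Millisecond, time.Second))
}
```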

Flow: Agent→GT (inferred). GT scrapes terminal output; agent doesn't know.

Fragility:

  • Prompt prefix is a Claude Code UI string — any change breaks detection
  • Status bar ⏵⏵ and "esc to interrupt" are undocumented Claude Code internals
  • NBSP normalization was a bug fix (issues/1387) for a Claude Code rendering change
  • Different agents have different prompts — no universal detection
  • Point-in-time: race between check and state change

API mapping: POST /lifecycle with event: "idle" | "busy"


4. Rate Limit Detection (pane content scanning)

What: GT scans terminal output to detect rate-limited sessions for account rotation.

Code:

  • internal/quota/scan.go: Scanner struct, ScanAll() (line ~77), scanSession() (line ~99): captures 30 lines of pane content, checks bottom 20 against rate-limit regex patterns
  • internal/constants/constants.go: DefaultRateLimitPatterns: regex patterns for rate limit messages
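The bottom-of-pane scan can be sketched like this. The two regular expressions are placeholders invented for illustration, not the contents of DefaultRateLimitPatterns; the 30/20 line split follows scanSession() as described above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for illustration only.
var rateLimitPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)rate limit`),
	regexp.MustCompile(`(?i)usage limit reached`),
}

const (
	captureLines = 30 // lines captured from the pane
	scanLines    = 20 // only the bottom lines are checked
)

// isRateLimited keeps the last captureLines lines of pane output, then checks
// only the last scanLines of those against the patterns.
func isRateLimited(pane string) bool {
	lines := strings.Split(strings.TrimRight(pane, "\n"), "\n")
	if len(lines) > captureLines {
		lines = lines[len(lines)-captureLines:]
	}
	if len(lines) > scanLines {
		lines = lines[len(lines)-scanLines:]
	}
	for _, l := range lines {
		for _, re := range rateLimitPatterns {
			if re.MatchString(l) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isRateLimited("Running tests...\nAPI Error: Rate limit reached\n")) // prints true
}
```

A rate-limit message pushed above the last 20 lines by later output is ignored, which is the recency limitation listed under Fragility.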

Flow: Agent→GT (inferred). GT reads pane; agent doesn't participate.

Fragility:

  • Regex patterns must match exact rate limit error messages
  • Messages can change across Claude Code versions
  • Captures only bottom 20 of 30 lines — rate limit message must be recent
  • No structured signal from agent that it's rate-limited

API mapping: POST /lifecycle with event: "degraded" + rate limit metadata, or POST /telemetry with rate limit event


5. Account/Quota Management (keychain token swapping)

What: GT rotates API credentials across sessions when accounts hit rate limits.

Code:

  • internal/quota/keychain.go (Darwin-only, 289 lines):
    • KeychainServiceName() (line ~35): SHA-256 hash of config dir
    • SwapKeychainCredential() (line ~78): backup target → read source → write target
    • SwapOAuthAccount() (line ~121): swaps the .claude.json oauthAccount field
    • ValidateKeychainToken() (line ~203): checks expiry (JSON, JWT, opaque)
  • internal/quota/scan.go: ScanAll() (line ~77): scan for rate-limited sessions
  • internal/quota/rotate.go: PlanRotation() (line ~42): 4-stage pipeline (scan → state manager → planner → executor)
  • internal/quota/executor.go: Rotator.Execute() (line ~81): atomic execution with flock, concurrent on independent sessions

Flow: GT→Agent. GT swaps credentials; agent is restarted with new token.

Fragility:

  • macOS-only — entire keychain subsystem is darwin-only, no Linux/Windows
  • Credential swap requires session restart (kill processes → respawn pane)
  • OAuth account field location in .claude.json is undocumented
  • SHA-256 keying assumes Claude Code's keychain service naming convention
  • No agent-side credential refresh — always a full restart

API mapping: POST /identity with credentials field — runtime applies without restart


6. Session Lifecycle (creation, restart, teardown)

What: GT creates, restarts, and tears down agent tmux sessions.

Code:

  • internal/session/lifecycle.go: StartSession() (line ~121): 13-step unified lifecycle (resolve config → settings → command → session → env → theme → wait → dialogs → delay → verify → respawn → PID track)
  • internal/polecat/session_manager.go: Start() (line ~186): polecat-specific session with zombie kill, worktree, beacon, env injection, pane-died hook
  • internal/witness/manager.go: Start() (line ~107): witness session with zombie grace period, role config, theme, pane-died hook
  • internal/dog/session_manager.go: Start() (line ~85): dog session via unified session.StartSession()
  • internal/tmux/tmux.go: NewSessionWithCommand(): single-command session creation; SetAutoRespawnHook() (line ~3126): pane-died auto-respawn with 3s debounce
  • internal/tmux/tmux.go: KillSessionWithProcesses() (line ~499): 8-step teardown (process group → tree walk → SIGTERM → 2s grace → SIGKILL → pane → session)
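The SIGTERM → grace → SIGKILL core of the teardown sequence can be sketched for a single process as below. This is Unix-only and hypothetical: GT additionally signals the process group and walks the process tree, which this sketch omits. The demo runs two throwaway shells, one that exits on SIGTERM and one that ignores it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// terminateWithGrace sends SIGTERM, waits up to grace for the process to
// exit, and sends SIGKILL only if it is still running. It reports whether
// that escalation was needed.
func terminateWithGrace(cmd *exec.Cmd, grace time.Duration) (escalated bool, err error) {
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return false, err
	}
	select {
	case <-done:
		return false, nil
	case <-time.After(grace):
		if err := cmd.Process.Kill(); err != nil && !errors.Is(err, os.ErrProcessDone) {
			return true, err
		}
		<-done
		return true, nil
	}
}

// demo starts script under sh, gives the shell time to install any traps,
// then terminates it.
func demo(script string, grace time.Duration) (bool, error) {
	cmd := exec.Command("sh", "-c", script)
	if err := cmd.Start(); err != nil {
		return false, err
	}
	time.Sleep(300 * time.Millisecond)
	return terminateWithGrace(cmd, grace)
}

func main() {
	fmt.Println(demo("exec sleep 30", 2*time.Second))
	// An ignored SIGTERM survives exec, so sleep must be SIGKILLed.
	fmt.Println(demo("trap '' TERM; exec sleep 5", 500*time.Millisecond))
}
```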

Flow: GT→Agent. GT controls entire lifecycle; agent is passive.

Fragility:

  • 13-step creation has many failure points (tmux, dialogs, readiness)
  • Auto-respawn via pane-died hook depends on tmux's hook mechanism
  • Kill sequence must handle reparented processes (PPID=1 check)
  • Zombie cleanup has TOCTOU gap (re-verified before kill)
  • Session creation takes 5-60s depending on agent startup time

API mapping: POST /lifecycle (agent reports transitions), POST /identity (GT assigns identity at creation)


7. Spawn Admission Control

What: GT gates polecat creation with health checks and capacity limits.

Code:

  • internal/cmd/polecat_spawn.go: SpawnPolecatForSling() (line ~62): Dolt health check, connection capacity, polecat count cap (25), per-bead respawn circuit breaker, per-rig directory cap (30), idle polecat reuse
  • internal/polecat/manager.go: CheckDoltHealth() (line ~223): retry with exponential backoff + jitter; CheckDoltServerCapacity() (line ~276): connection count admission gate
  • internal/witness/spawn_count.go: ShouldBlockRespawn() (line ~74): circuit breaker after 3 respawns per bead; RecordBeadRespawn() (line ~104): flock'd cross-process counter
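A file-backed version of the per-bead respawn circuit breaker might look like the sketch below. The JSON layout, the bead ID gt-abc, and the method names are hypothetical, and unlike RecordBeadRespawn() this version takes no flock, so it is only safe within a single process.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// maxRespawnsPerBead matches the "3 respawns per bead" threshold above.
const maxRespawnsPerBead = 3

// respawnCounts persists per-bead counters as JSON, in the spirit of
// bead-respawn-counts.json. No locking: not safe across processes.
type respawnCounts struct {
	path   string
	Counts map[string]int `json:"counts"`
}

// loadCounts reads the counter file, treating a missing file as empty.
func loadCounts(path string) (*respawnCounts, error) {
	rc := &respawnCounts{path: path}
	data, err := os.ReadFile(path)
	if err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	if err == nil {
		if err := json.Unmarshal(data, rc); err != nil {
			return nil, err
		}
	}
	if rc.Counts == nil {
		rc.Counts = map[string]int{}
	}
	return rc, nil
}

// recordRespawn increments the bead's counter and rewrites the file.
func (rc *respawnCounts) recordRespawn(bead string) error {
	rc.Counts[bead]++
	data, err := json.Marshal(rc)
	if err != nil {
		return err
	}
	return os.WriteFile(rc.path, data, 0o644)
}

// shouldBlockRespawn trips once a bead has hit the respawn threshold.
func (rc *respawnCounts) shouldBlockRespawn(bead string) bool {
	return rc.Counts[bead] >= maxRespawnsPerBead
}

// demo records two respawns, checks the breaker, records a third, then
// reloads from disk as another reader would.
func demo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "respawn")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "bead-respawn-counts.json")
	rc, err := loadCounts(path)
	if err != nil {
		return false, false, err
	}
	for i := 0; i < 2; i++ {
		if err := rc.recordRespawn("gt-abc"); err != nil {
			return false, false, err
		}
	}
	before = rc.shouldBlockRespawn("gt-abc")
	if err := rc.recordRespawn("gt-abc"); err != nil {
		return false, false, err
	}
	reloaded, err := loadCounts(path)
	if err != nil {
		return false, false, err
	}
	return before, reloaded.shouldBlockRespawn("gt-abc"), nil
}

func main() {
	fmt.Println(demo()) // prints false true <nil>
}
```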

Flow: GT internal. Admission decisions don't involve the agent.

Fragility:

  • Polecat cap (25) and dir cap (30) are hardcoded
  • Circuit breaker state in JSON file (bead-respawn-counts.json)
  • Dolt health check adds latency to every spawn

API mapping: Internal to GT orchestration — not part of agent-facing API


8. Agent Identity (env vars + preset registry)

What: GT assigns identity to agents via environment variables and a preset registry.

Code:

  • internal/config/env.go: AgentEnv() (line ~65): generates 30+ env vars including GT_ROLE, GT_RIG, GT_POLECAT, GT_CREW, BD_ACTOR, GIT_AUTHOR_NAME, GT_ROOT, GT_AGENT, GT_SESSION, plus OTEL and credential passthrough
  • internal/config/agents.go: builtinPresets (line ~164): 10 agent presets (Claude, Gemini, Codex, Cursor, Auggie, AMP, OpenCode, Copilot, Pi, OMP) with 21 fields each (Command, Args, ProcessNames, SessionIDEnv, etc.)
  • internal/session/identity.go: ParseSessionName() (line ~84), ParseAddress() (line ~30), SessionName() (line ~163): identity parsing and formatting
  • internal/constants/constants.go: role constants (lines ~196-215): RoleMayor, RoleDeacon, RoleWitness, RoleRefinery, RolePolecat, RoleCrew

Flow: GT→Agent. GT sets env vars; agent reads them.

Fragility:

  • 30+ env vars must be kept in sync across tmux SetEnvironment and exec-env
  • Three propagation mechanisms (tmux SetEnvironment, PrependEnv inline, cmd.Env) can diverge
  • Agent preset discovery relies on GT_AGENT or GT_PROCESS_NAMES env vars
  • Role detection hierarchy (env → CWD → fallback) can produce mismatches

API mapping: POST /identity — structured identity assignment with all fields


9. Priming (context injection)

What: GT injects role context, work assignments, and system state at session start.

Code:

  • internal/cmd/prime.go: runPrime() (line ~101): full prime or compact/resume path
  • internal/cmd/prime_output.go: outputPrimeContext() (line ~22): role-specific context rendering; role functions: outputMayorContext(), outputWitnessContext(), outputRefineryContext(), outputPolecatContext(), outputCrewContext(), etc.
  • internal/cmd/prime_session.go: handlePrimeHookMode() (line ~266): SessionStart hook integration, reads session ID from stdin JSON, persists to disk
  • internal/cmd/prime_session.go: detectSessionState() (line ~202): returns "normal" | "post-handoff" | "crash-recovery" | "autonomous"
  • internal/cmd/prime.go: checkSlungWork() (line ~421): detects hooked work; outputAutonomousDirective() (line ~542): "AUTONOMOUS WORK MODE" output
  • internal/cmd/prime_molecule.go: outputMoleculeContext() (line ~182): molecule progress and step display

Flow: GT→Agent. 10-section output: beacon, handoff warning, role context, CONTEXT.md, handoff content, attachment status, autonomous directive, molecule context, checkpoint, startup directive.

Fragility:

  • Non-Claude agents without hooks lose automatic priming entirely
  • Compact/resume path must be lighter to prevent re-initialization loops
  • Session state detection depends on handoff marker files
  • Role template rendering uses Go text/template, and template errors are silent

API mapping: POST /context with sections array and mode (full/compact/resume)


10. Hooks (settings.json installation)

What: GT installs hook configurations into agent runtime settings files.

Code:

  • internal/hooks/config.go: HooksConfig (line ~28): 8 event types (PreToolUse, PostToolUse, SessionStart, Stop, PreCompact, UserPromptSubmit, WorktreeCreate, WorktreeRemove)
  • internal/hooks/config.go: DefaultBase() (line ~711): base hooks including PR-workflow guard, dangerous-command guard, SessionStart → gt prime --hook, UserPromptSubmit → gt mail check --inject, Stop → gt costs record
  • internal/hooks/config.go: DefaultOverrides() (line ~199): role-specific overrides (crew PreCompact → handoff cycle, witness/deacon/refinery patrol guards)
  • internal/hooks/merge.go: MergeHooks() (line ~24): applies overrides in specificity order
  • internal/cmd/hooks_install.go: runHooksInstall() (line ~48): installs hooks from registry to worktrees; installHookTo() (line ~245): loads, merges, writes settings.json
  • internal/hooks/config.go: DiscoverTargets() (line ~382): finds all settings files (mayor, deacon, crew, polecats, witness, refinery per rig)
  • internal/runtime/runtime.go: hook installer registration for 6 providers: claude, gemini, opencode, copilot, omp, pi
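Layered hook merging can be sketched as a fold over configs from least to most specific (base → role → rig+role). The replace-per-event semantics here is an assumption, since GT's MergeHooks() may append or merge differently, and the two angle-bracket strings are placeholders for override commands not named in this inventory. The three base commands are the ones listed above.

```go
package main

import (
	"fmt"
	"sort"
)

// hooksConfig maps an event name (e.g. "SessionStart") to the commands run for it.
type hooksConfig map[string][]string

// mergeHooks applies layers in order; assumption: a later (more specific)
// layer replaces the whole command list for any event it mentions.
func mergeHooks(layers ...hooksConfig) hooksConfig {
	merged := hooksConfig{}
	for _, layer := range layers {
		for event, cmds := range layer {
			merged[event] = append([]string(nil), cmds...) // copy, don't alias
		}
	}
	return merged
}

func main() {
	base := hooksConfig{
		"SessionStart":     {"gt prime --hook"},
		"UserPromptSubmit": {"gt mail check --inject"},
		"Stop":             {"gt costs record"},
	}
	crew := hooksConfig{"PreCompact": {"<crew handoff-cycle command>"}}
	rigCrew := hooksConfig{"Stop": {"<rig-specific stop command>"}}

	merged := mergeHooks(base, crew, rigCrew)
	events := make([]string, 0, len(merged))
	for e := range merged {
		events = append(events, e)
	}
	sort.Strings(events)
	for _, e := range events {
		fmt.Printf("%-17s %v\n", e, merged[e])
	}
}
```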

Flow: GT→Agent (at install time). Agent reads settings.json; GT wrote it.

Fragility:

  • Each agent vendor has different hook formats (settings.json, plugins, extensions)
  • 6 different hook providers, each with different file locations
  • Non-hook agents (no framework) get no hooks at all
  • Hook merging logic (base → role → rig+role) is complex

API mapping: POST /authorize (replaces PreToolUse guards), POST /context (replaces SessionStart/PreCompact priming), POST /telemetry (replaces Stop cost recording)


11. Guard Scripts (command blocking)

What: GT blocks dangerous or policy-violating commands via PreToolUse hooks.

Code:

  • internal/cmd/tap_guard.go: runTapGuardPRWorkflow() (line ~34): blocks gh pr create, git checkout -b, git switch -c in Gas Town agent contexts; isGasTownAgentContext() (line ~103) checks GT_* env vars and CWD paths
  • internal/cmd/tap_guard_dangerous.go: runTapGuardDangerous() (line ~66): blocks 5 patterns: rm -rf /, git push --force, git push -f, git reset --hard, git clean -f; extractCommand() (line ~104) parses Claude Code JSON hook input
  • Exit code convention: 2 = BLOCK
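The dangerous-command guard reduces to: parse the hook JSON from stdin, substring-match the command, and exit 2 to block. In the sketch below, the hookInput struct assumes Claude Code's PreToolUse payload carries the Bash command at tool_input.command; the pattern list is the five patterns listed above, and evaluate() is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Exit codes for a PreToolUse hook: 0 lets the tool call proceed, 2 blocks it.
const (
	exitAllow = 0
	exitBlock = 2
)

// dangerousPatterns are the five substrings listed for tap_guard_dangerous.go.
var dangerousPatterns = []string{
	"rm -rf /",
	"git push --force",
	"git push -f",
	"git reset --hard",
	"git clean -f",
}

// hookInput captures only the fields this sketch needs (assumed layout).
type hookInput struct {
	ToolName  string `json:"tool_name"`
	ToolInput struct {
		Command string `json:"command"`
	} `json:"tool_input"`
}

// evaluate returns the exit code and a reason. Like the guards described
// here, it fails open: input it cannot parse is allowed.
func evaluate(raw []byte) (int, string) {
	var in hookInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return exitAllow, "unparseable hook input (fail-open)"
	}
	for _, p := range dangerousPatterns {
		if strings.Contains(in.ToolInput.Command, p) {
			return exitBlock, "blocked dangerous command: " + p
		}
	}
	return exitAllow, ""
}

func main() {
	raw, _ := io.ReadAll(os.Stdin)
	code, reason := evaluate(raw)
	if reason != "" {
		fmt.Fprintln(os.Stderr, reason) // stderr explains the decision
	}
	os.Exit(code)
}
```

The fail-open branch, and the fact that git push origin main --force contains neither forced-push substring, reproduce two of the fragilities listed for this touch point.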

Flow: Agent→GT→Agent. Agent calls hook → GT evaluates → exit 0 (allow) or 2 (block).

Fragility:

  • Guards read hook input from stdin in Claude Code's JSON format — format change breaks
  • Pattern matching is substring-based — can miss variations
  • Guards fail-open on stdin errors (can't parse = allow)
  • Only 3 guard scripts; coverage is incomplete

API mapping: POST /authorize — GT evaluates tool calls with full context, returns allow/deny with reason


12. Conversation Log Access (JSONL scraping)

What: GT reads Claude Code's conversation transcripts for cost and session data.

Code:

  • internal/cmd/costs.go:
    • getClaudeProjectDir() (line ~704): maps workdir to ~/.claude/projects/{slug}/
    • findLatestTranscript() (line ~717): finds the most recent .jsonl
    • parseTranscriptUsage() (line ~751): line-by-line JSONL scan summing token usage
  • internal/cmd/seance.go: session discovery from .events.jsonl (line ~61), fallback scan of ~/.claude/projects/ (line ~513), sessions-index.json (line ~674)
  • Data structure: TranscriptMessage with Type, SessionId, Message.Model, Message.Usage.{InputTokens, CacheCreationInputTokens, CacheReadInputTokens, OutputTokens}

Flow: Agent→GT (inferred). Claude Code writes JSONL; GT scrapes filesystem.

Fragility:

  • Path encoding convention (slashes→dashes) is undocumented Claude Code internal
  • JSONL message format, usage field nesting can change without notice
  • Three independent JSONL parsers (agentlog, costs.go, seance) — no shared code
  • sessions-index.json format is Claude Code internal
  • Non-Claude agents don't produce JSONL transcripts

API mapping: POST /telemetry — agent pushes structured usage events


13. Token Usage & Cost Tracking

What: GT computes session costs from transcript token counts and hardcoded pricing.

Code:

  • internal/cmd/costs.go (1516 lines total):
    • calculateCost() (line ~801): token→USD using the modelPricing map
    • extractCostFromWorkDir() (line ~823): extracts usage from the Claude transcript
    • runCostsRecord() (line ~956): Stop hook that appends to ~/.gt/costs.jsonl
    • runCostsDigest() (line ~1155): builds a daily digest bead from costs.jsonl
  • internal/cmd/costs.go: modelPricing (line ~222): hardcoded table (Opus: $15/$75, Sonnet: $3/$15, Haiku: $1/$5 per million input/output tokens; cache reads at a 90% discount, cache creation at a 25% premium)
  • internal/config/cost_tier.go: CostTierRoleAgents() (line ~44): maps roles to models per cost tier (standard/economy/budget)
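The pricing rule is plain arithmetic: cost = (input × rate_in + cache_create × 1.25 × rate_in + cache_read × 0.10 × rate_in + output × rate_out) / 10^6. The sketch below applies it with the rates listed above. It is illustrative only: the table layout and the first-substring-match lookup are assumptions modeled on the "substring matching" fragility.

```go
package main

import (
	"fmt"
	"strings"
)

// Rates in USD per million tokens, from the modelPricing table above.
var modelPricing = []struct {
	family        string
	input, output float64
}{
	{"opus", 15, 75},
	{"sonnet", 3, 15},
	{"haiku", 1, 5},
}

const (
	cacheReadMultiplier   = 0.10 // 90% discount on the input rate
	cacheCreateMultiplier = 1.25 // 25% premium on the input rate
)

type usage struct{ Input, CacheCreate, CacheRead, Output int64 }

// calculateCost picks the first family whose name appears in the model ID;
// unknown models report ok=false instead of a guessed price.
func calculateCost(model string, u usage) (usd float64, ok bool) {
	m := strings.ToLower(model)
	for _, p := range modelPricing {
		if strings.Contains(m, p.family) {
			total := float64(u.Input)*p.input +
				float64(u.CacheCreate)*p.input*cacheCreateMultiplier +
				float64(u.CacheRead)*p.input*cacheReadMultiplier +
				float64(u.Output)*p.output
			return total / 1e6, true
		}
	}
	return 0, false
}

func main() {
	c, _ := calculateCost("claude-sonnet-4-5", usage{Input: 200_000, CacheRead: 1_000_000, Output: 50_000})
	fmt.Printf("$%.2f\n", c) // prints $1.65
}
```

For example, a Sonnet session with 200,000 input tokens, 1,000,000 cache-read tokens, and 50,000 output tokens costs $0.60 + $0.30 + $0.75 = $1.65.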

Flow: Agent→GT (inferred). GT reads transcripts at session end.

Fragility:

  • Pricing table is hardcoded — must be updated when Anthropic changes pricing
  • Cost computed at session end via Stop hook, not real-time
  • No per-bead cost attribution
  • Model ID matching is fragile (substring matching against model names)
  • Non-Claude agents have no cost tracking

API mapping: POST /telemetry with usage.cost_usd — runtime reports cost at source


14. Process Liveness Detection

What: GT checks if an agent process is actually running inside a tmux session.

Code:

  • internal/tmux/tmux.go: IsAgentAlive() (line ~2157): preferred method, delegates to IsRuntimeRunning() (line ~2091) with session process names
  • internal/tmux/tmux.go: resolveSessionProcessNames() (line ~2164): priority GT_PROCESS_NAMES env → GT_AGENT env → config fallback
  • internal/tmux/tmux.go: GetPaneCommand() (line ~1579): #{pane_current_command} via tmux format
  • internal/tmux/tmux.go: hasDescendantWithNames() (line ~1823): recursive pgrep -P <pid> -l tree walk (maxDepth=10)
  • internal/tmux/tmux.go: processMatchesNames() (line ~1800): ps -p <pid> -o comm=
  • internal/tmux/tmux.go: getAllDescendants() (line ~681): deepest-first process tree for safe cleanup
  • internal/tmux/process_group_unix.go: getProcessGroupMembers() (line ~38), getParentPID() (line ~20), getProcessGroupID() (line ~30)
  • internal/config/agents.go: ProcessNames per preset: Claude=["node","claude"], Gemini=["gemini"], etc.
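The recursive descendant walk can be sketched with the child lookup injected, so it can be exercised without real processes. parsePgrepList assumes the "<pid> <name>" output of pgrep -l; pgrepChildren shows one way to call pgrep -P, and maxDepth matches the documented limit of 10. All function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// proc is one line of `pgrep -P <pid> -l` output: "<pid> <name>".
type proc struct {
	PID  int
	Name string
}

// parsePgrepList parses pgrep -l style output, skipping malformed lines.
func parsePgrepList(out string) []proc {
	var ps []proc
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		pid, err := strconv.Atoi(fields[0])
		if err != nil {
			continue
		}
		ps = append(ps, proc{pid, fields[1]})
	}
	return ps
}

// pgrepChildren lists direct children of pid; pgrep exits non-zero when there
// are none, which yields empty output and an empty list here.
func pgrepChildren(pid int) []proc {
	out, _ := exec.Command("pgrep", "-P", strconv.Itoa(pid), "-l").Output()
	return parsePgrepList(string(out))
}

const maxDepth = 10

// hasDescendantWithNames walks depth-first below pid, at most maxDepth
// levels, and reports whether any descendant's name is in names.
func hasDescendantWithNames(pid int, names map[string]bool, children func(int) []proc, depth int) bool {
	if depth >= maxDepth {
		return false
	}
	for _, c := range children(pid) {
		if names[c.Name] || hasDescendantWithNames(c.PID, names, children, depth+1) {
			return true
		}
	}
	return false
}

func main() {
	// Fake tree: pane shell (100) -> wrapper script (200) -> node (300).
	tree := map[int][]proc{100: {{200, "bash"}}, 200: {{300, "node"}}}
	fake := func(pid int) []proc { return tree[pid] }
	claude := map[string]bool{"node": true, "claude": true}
	fmt.Println(hasDescendantWithNames(100, claude, fake, 0)) // prints true
	fmt.Println(hasDescendantWithNames(os.Getpid(), claude, pgrepChildren, 0))
}
```

The fake tree mirrors the wrapper case above: the agent binary is a grandchild of the pane's shell, so checking direct children alone would miss it.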

Flow: Agent→GT (inferred). GT walks process tree; agent doesn't know.

Fragility:

  • Process name detection relies on exact binary names
  • Shell wrappers (e.g., c2claude) require descendant tree walking
  • pgrep and ps output parsing is platform-dependent
  • Process can exit between check and action (TOCTOU)

API mapping: GET /health — agent reports its own liveness status


15. Three-Level Health Check

What: GT performs a 3-level health assessment of agent sessions.

Code:

  • internal/tmux/tmux.go: CheckSessionHealth() (line ~1771):
    • Level 1: HasSession() (tmux session exists?)
    • Level 2: IsAgentAlive() (agent process running?)
    • Level 3: GetSessionActivity() (activity within maxInactivity?)
  • internal/tmux/tmux.go: ZombieStatus (line ~1723): enum with SessionHealthy, SessionDead, AgentDead, AgentHung; IsZombie() returns true for AgentDead or AgentHung
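The three levels map onto an ordered check in which the first failing level decides the status. The sketch below uses lower-case stand-ins for the ZombieStatus values named above; sessionProbe is a hypothetical struct bundling the results of HasSession(), IsAgentAlive(), and GetSessionActivity().

```go
package main

import (
	"fmt"
	"time"
)

type zombieStatus int

const (
	sessionHealthy zombieStatus = iota
	sessionDead                 // level 1 failed: tmux session missing
	agentDead                   // level 2 failed: session exists, agent process gone
	agentHung                   // level 3 failed: no activity within maxInactivity
)

func (s zombieStatus) isZombie() bool { return s == agentDead || s == agentHung }

// hungSessionThreshold is the 30-minute default noted under Fragility.
const hungSessionThreshold = 30 * time.Minute

// sessionProbe holds the three observations gathered per session.
type sessionProbe struct {
	HasSession   bool
	AgentAlive   bool
	LastActivity time.Time
}

// checkSessionHealth returns the status for the first level that fails.
func checkSessionHealth(p sessionProbe, now time.Time, maxInactivity time.Duration) zombieStatus {
	switch {
	case !p.HasSession:
		return sessionDead
	case !p.AgentAlive:
		return agentDead
	case now.Sub(p.LastActivity) > maxInactivity:
		return agentHung
	default:
		return sessionHealthy
	}
}

// statusAfterIdle evaluates a probe whose last activity was idle ago.
func statusAfterIdle(hasSession, agentAlive bool, idle time.Duration) zombieStatus {
	now := time.Now()
	return checkSessionHealth(sessionProbe{hasSession, agentAlive, now.Add(-idle)}, now, hungSessionThreshold)
}

func main() {
	s := statusAfterIdle(true, true, 45*time.Minute)
	fmt.Println(s == agentHung, s.isZombie()) // prints true true
}
```

An agent that spends 45 minutes thinking without printing anything comes back as agentHung and therefore a zombie, which is the false positive described under Fragility.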

Flow: GT→GT (internal health assessment).

Fragility:

  • HungSessionThreshold = 30 minutes (hardcoded default)
  • Activity timestamp from tmux #{session_activity} — measures any terminal activity, not meaningful agent work
  • A sleeping agent with no output looks hung even if healthy

API mapping: GET /health — agent reports status, context_usage, last_activity


16. Heartbeat Files

What: GT uses heartbeat files for liveness detection outside tmux.

Code:

  • internal/polecat/heartbeat.go: TouchSessionHeartbeat() (line ~34): writes JSON to .runtime/heartbeats/<session>.json; IsSessionHeartbeatStale() (line ~74): 3-minute threshold; ReadSessionHeartbeat() (line ~54); RemoveSessionHeartbeat()
  • internal/deacon/heartbeat.go: WriteHeartbeat() (line ~52): deacon heartbeat at deacon/heartbeat.json with cycle count and health stats; IsFresh() (<5min), IsStale() (5-15min), IsVeryStale() (>15min)
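A heartbeat touch and staleness check can be sketched as below. The JSON fields and the session name are hypothetical; the 3-minute session threshold and the deacon bands come from the description above, and how GT treats ages of exactly 5 or 15 minutes is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// heartbeat is a hypothetical shape for .runtime/heartbeats/<session>.json.
type heartbeat struct {
	Session   string    `json:"session"`
	Timestamp time.Time `json:"timestamp"`
}

const sessionStaleAfter = 3 * time.Minute

// touchHeartbeat writes via temp file + rename so readers never see a partial file.
func touchHeartbeat(dir, session string, now time.Time) error {
	data, err := json.Marshal(heartbeat{Session: session, Timestamp: now})
	if err != nil {
		return err
	}
	path := filepath.Join(dir, session+".json")
	if err := os.WriteFile(path+".tmp", data, 0o644); err != nil {
		return err
	}
	return os.Rename(path+".tmp", path)
}

// isHeartbeatStale treats a missing or unreadable heartbeat as stale.
func isHeartbeatStale(dir, session string, now time.Time) bool {
	data, err := os.ReadFile(filepath.Join(dir, session+".json"))
	if err != nil {
		return true
	}
	var hb heartbeat
	if json.Unmarshal(data, &hb) != nil {
		return true
	}
	return now.Sub(hb.Timestamp) > sessionStaleAfter
}

// deaconFreshness classifies deacon heartbeat age into the documented bands.
func deaconFreshness(age time.Duration) string {
	switch {
	case age < 5*time.Minute:
		return "fresh"
	case age <= 15*time.Minute:
		return "stale"
	default:
		return "very stale"
	}
}

// demo touches a heartbeat, then polls it 1 and 4 minutes later.
func demo() (after1, after4 bool, err error) {
	dir, err := os.MkdirTemp("", "heartbeats")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	t0 := time.Now()
	if err := touchHeartbeat(dir, "gt-myrig-p1", t0); err != nil {
		return false, false, err
	}
	return isHeartbeatStale(dir, "gt-myrig-p1", t0.Add(time.Minute)),
		isHeartbeatStale(dir, "gt-myrig-p1", t0.Add(4*time.Minute)), nil
}

func main() {
	fmt.Println(demo())                            // prints false true <nil>
	fmt.Println(deaconFreshness(10 * time.Minute)) // prints stale
}
```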

Flow: Agent→GT (implicit). Agent command writes file; GT reads it.

Fragility:

  • File-based — no notification on write, must poll
  • Stale threshold (3min) chosen empirically
  • Heartbeat touch depends on GT commands being called (not agent-initiated)

API mapping: GET /health — agent reports liveness directly; no files needed


17. Working Directory Detection

What: GT determines an agent's working directory through multiple methods.

Code:

  • internal/tmux/tmux.go: GetPaneWorkDir() (line ~1676): #{pane_current_path} via tmux
  • internal/workspace/find.go: Find() (line ~29): walks up from CWD looking for mayor/town.json marker; handles worktree paths (polecats/, crew/); FindFromCwdWithFallback() (line ~113): GT_TOWN_ROOT env fallback for deleted worktrees
  • internal/config/env.go — GT_ROOT env var set in AgentEnv()
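Marker-file discovery is a walk toward the filesystem root. In this sketch, findTownRoot and findFromCwdWithFallback are hypothetical simplifications of Find() and FindFromCwdWithFallback(): they skip the polecats/ and crew/ worktree handling, and the fallback also triggers when the CWD is simply outside a town, which may differ from GT's behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errNoTown = errors.New("not inside a Gas Town workspace")

// findTownRoot walks up from start until a directory contains mayor/town.json.
func findTownRoot(start string) (string, error) {
	dir, err := filepath.Abs(start)
	if err != nil {
		return "", err
	}
	for {
		if _, err := os.Stat(filepath.Join(dir, "mayor", "town.json")); err == nil {
			return dir, nil
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "", errNoTown // reached the filesystem root
		}
		dir = parent
	}
}

// findFromCwdWithFallback uses GT_TOWN_ROOT when the CWD cannot be resolved
// (for example, its worktree was deleted) or yields no town.
func findFromCwdWithFallback() (string, error) {
	if cwd, err := os.Getwd(); err == nil {
		if root, err := findTownRoot(cwd); err == nil {
			return root, nil
		}
	}
	if env := os.Getenv("GT_TOWN_ROOT"); env != "" {
		return env, nil
	}
	return "", errNoTown
}

// demo builds a throwaway town and searches from a nested polecat directory.
func demo() (bool, error) {
	town, err := os.MkdirTemp("", "town")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(town)
	if err := os.MkdirAll(filepath.Join(town, "mayor"), 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(town, "mayor", "town.json"), []byte("{}"), 0o644); err != nil {
		return false, err
	}
	deep := filepath.Join(town, "polecats", "p1", "src")
	if err := os.MkdirAll(deep, 0o755); err != nil {
		return false, err
	}
	root, err := findTownRoot(deep)
	return root == town, err
}

func main() {
	fmt.Println(demo()) // prints true <nil>
	fmt.Println(findFromCwdWithFallback())
}
```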

Flow: GT→GT (detection) and GT→Agent (env var).

Fragility:

  • 5 detection methods can disagree (tmux CWD, env vars, path parsing, git worktree)
  • Worktree deletion leaves agent with no valid CWD
  • GT_TOWN_ROOT fallback exists specifically because worktree cleanup breaks CWD

API mapping: Part of POST /identity — GT assigns working directory


18. Permission Bypass (YOLO flags)

What: GT starts all agents with vendor-specific permission bypass flags.

Code:

  • internal/config/agents.go — per-preset Args:
    • Claude: --dangerously-skip-permissions
    • Gemini: --approval-mode yolo
    • Codex: --dangerously-bypass-approvals-and-sandbox
    • Cursor: -f
    • Auggie: --allow-indexing
    • AMP: --dangerously-allow-all --no-ide
    • OpenCode: env OPENCODE_PERMISSION={"*":"allow"}
    • Copilot: --yolo
  • internal/tmux/tmux.go: AcceptBypassPermissionsWarning() (line ~1509): polls for "Bypass Permissions mode" dialog, sends Down+Enter; DismissStartupDialogsBlind() (line ~1558): blind key sequence fallback

Flow: GT→Agent (at startup). Always-on, no per-role granularity.

Fragility:

  • 10 different flag names across 10 agents — each is a different string
  • All-or-nothing: no per-role permission granularity
  • Claude's permission warning dialog detection depends on exact text
  • No opt-out — every agent runs with full bypass

API mapping: POST /authorize — per-call authorization with role-based rules


19. Non-Interactive Mode

What: GT runs agents in non-interactive mode for specific tasks.

Code:

  • internal/config/agents.go: NonInteractiveConfig (line ~92): ExecSubcommand (e.g., "exec"), PromptFlag (e.g., "-p"), OutputFormatFlag (e.g., "--output-format json")
  • internal/config/agents.go: PromptMode (line ~98): "arg" or "none"

Flow: GT→Agent. GT constructs CLI invocation with flags.

Fragility:

  • Exec subcommand and flag names differ per agent
  • Output format parsing depends on agent's output structure
  • Not all agents support non-interactive execution

API mapping: POST /prompt with structured I/O replaces CLI flag composition


20. Session Resume/Fork

What: GT resumes prior sessions or forks them for conversation recall.

Code:

  • internal/config/agents.go: ResumeFlag, ContinueFlag, ResumeStyle ("flag" vs "subcommand") per preset; BuildResumeCommand() (line ~534)
  • internal/cmd/seance.go: runSeance() (line ~85): spawns claude --fork-session --resume <id> for predecessor recall
  • internal/session/startup.go: FormatStartupBeacon() (line ~69): [GAS TOWN] recipient <- sender • timestamp • topic format
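Flag-style versus subcommand-style resume can be sketched as below, together with the beacon layout. The resumePreset struct is a trimmed, hypothetical view of the preset fields named above, the argument ordering for each style is an assumption, and the subcommand preset used in the test is invented for illustration. Only the Claude --resume flag and the beacon layout come from the text.

```go
package main

import (
	"fmt"
	"strings"
)

// resumePreset is a hypothetical subset of an agent preset.
type resumePreset struct {
	Command     string
	Args        []string // always-on args, e.g. a permission-bypass flag
	ResumeFlag  string   // "--resume" (flag style) or a subcommand name
	ResumeStyle string   // "flag" or "subcommand"
}

// buildResumeCommand assembles argv. Assumed ordering: a subcommand comes
// right after the binary; a resume flag goes after the always-on args.
func buildResumeCommand(p resumePreset, sessionID string) []string {
	argv := []string{p.Command}
	if p.ResumeStyle == "subcommand" {
		argv = append(argv, p.ResumeFlag, sessionID)
		return append(argv, p.Args...)
	}
	argv = append(argv, p.Args...)
	return append(argv, p.ResumeFlag, sessionID)
}

// formatStartupBeacon renders "[GAS TOWN] recipient <- sender • timestamp • topic".
func formatStartupBeacon(recipient, sender, timestamp, topic string) string {
	return fmt.Sprintf("[GAS TOWN] %s <- %s • %s • %s", recipient, sender, timestamp, topic)
}

// equalArgv compares two argument vectors element by element.
func equalArgv(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	claude := resumePreset{Command: "claude", Args: []string{"--dangerously-skip-permissions"}, ResumeFlag: "--resume", ResumeStyle: "flag"}
	fmt.Println(strings.Join(buildResumeCommand(claude, "abc123"), " "))
	// prints claude --dangerously-skip-permissions --resume abc123
	fmt.Println(formatStartupBeacon("myrig/polecats/p1", "mayor", "2025-01-01T00:00:00Z", "handoff"))
}
```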

Flow: GT→Agent. GT constructs resume command with session ID.

Fragility:

  • Resume semantics differ per agent (flag vs subcommand)
  • --fork-session is Claude Code specific
  • Session ID stored in env vars and files — multiple sources of truth
  • Beacon format parsed by LLMs — format changes affect comprehension

API mapping: POST /context with mode: "resume" and session history


21. Config Directory Isolation

What: GT isolates agent configuration per account to support credential rotation.

Code:

  • internal/config/agents.go: ConfigDirEnv (e.g., "CLAUDE_CONFIG_DIR"), ConfigDir (e.g., ".claude") per preset
  • internal/config/env.go: CLAUDE_CONFIG_DIR set in AgentEnv() (line ~148)
  • internal/quota/keychain.go: KeychainServiceName() (line ~35): SHA-256 hash of config dir for per-account keychain isolation
  • Account directory pattern: ~/.claude-accounts/<handle>/

Flow: GT→Agent. GT sets config dir env; agent uses it for all settings.

Fragility:

  • Config dir layout is Claude Code internal
  • Symlink switching between accounts is fragile
  • SHA-256 keying of keychain service names depends on Claude Code convention

API mapping: POST /identity with credentials — runtime manages its own config


22. Theme/Display (tmux status bar)

What: GT applies role-specific tmux status bar themes.

Code:

  • internal/cmd/theme.go: runTheme(): applies role/rig-specific tmux status line formatting
  • Applied during StartSession() step in internal/session/lifecycle.go

Flow: GT→tmux. Display-only, doesn't affect agent behavior.

Fragility:

  • Purely cosmetic — but theme strings used in idle detection (⏵⏵)
  • Theme depends on tmux being the terminal multiplexer

API mapping: Not part of agent API — display concern stays in GT


23. Agent Output Capture (tmux capture-pane)

What: GT reads agent terminal output for various purposes.

Code:

  • internal/tmux/tmux.go: CapturePaneTrimmed(), CapturePaneLines(): capture N lines from the agent's terminal
  • Used by: idle detection (5 lines), rate limit scanning (30 lines), dialog detection, readiness polling, nudge verification
  • internal/telemetry/recorder.go: RecordPaneRead() (line ~266): OTel event for every capture-pane call

Flow: Agent→GT (inferred). GT reads terminal; agent doesn't know.

Fragility:

  • Terminal content is unstructured text — parsing is always regex/heuristic
  • Capture-pane only gets visible terminal buffer — scrollback limited
  • Multi-pane sessions require FindAgentPane() first

API mapping: Eliminated — POST /lifecycle and POST /telemetry provide structured data; no need to scrape terminal


24. Done/Exit Signaling

What: Agent signals work completion through GT commands and intent files.

Code:

  • internal/cmd/done.go: runDone() (line ~81): persistent polecat model, transitions to IDLE with sandbox preserved; exit constants: ExitCompleted, ExitEscalated, ExitDeferred (line ~65)
  • internal/cmd/signal_stop.go: runSignalStop() (line ~47): Stop hook handler, checks unread mail and hooked work, returns JSON {"decision":"block"|"approve","reason":"..."}
  • internal/witness/handlers.go: HandlePolecatDone() (line ~110): processes POLECAT_DONE messages
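The Stop hook's decision logic can be sketched as follows. The JSON shape is the one documented for runSignalStop(); the decision order, the reason strings, and the alreadyBlocked flag (a stand-in for GT's /tmp stop-state tracking) are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stopDecision is the JSON shape returned to the runtime.
type stopDecision struct {
	Decision string `json:"decision"` // "block" or "approve"
	Reason   string `json:"reason,omitempty"`
}

// decideStop blocks the stop while mail or hooked work is outstanding, but
// approves if this state was already blocked once, so the agent cannot be
// trapped in an endless block loop.
func decideStop(unreadMail int, hookedWork string, alreadyBlocked bool) stopDecision {
	switch {
	case alreadyBlocked:
		return stopDecision{Decision: "approve"}
	case unreadMail > 0:
		return stopDecision{Decision: "block", Reason: fmt.Sprintf("%d unread message(s)", unreadMail)}
	case hookedWork != "":
		return stopDecision{Decision: "block", Reason: "hooked work " + hookedWork + " is still open"}
	default:
		return stopDecision{Decision: "approve"}
	}
}

// encode renders a decision as the single JSON line the hook prints.
func encode(d stopDecision) string {
	b, _ := json.Marshal(d)
	return string(b)
}

func main() {
	fmt.Println(encode(decideStop(2, "", false))) // prints {"decision":"block","reason":"2 unread message(s)"}
	fmt.Println(encode(decideStop(2, "", true)))  // prints {"decision":"approve"}
}
```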

Flow: Agent→GT. Agent calls gt done; GT processes exit type.

Fragility:

  • Done detection relies on agent calling gt done (a GT CLI command)
  • Stop hook must parse Claude Code's expected JSON format
  • 4 exit types but no structured error reporting
  • Stop state is tracked in /tmp to prevent infinite block loops

API mapping: POST /lifecycle with event: "stopping" + exit metadata


25. Environment Variable Injection

What: GT injects 30+ env vars into agent sessions via tmux.

Code:

  • internal/config/env.go: AgentEnv() (line ~65): generates full env map (GT_*, BD_*, GIT_*, CLAUDE_*, OTEL_*, credential passthrough)
  • Three propagation mechanisms:
    1. tmux.SetEnvironment(): session-level via set-environment
    2. config.PrependEnv(): inline export K=V && before command
    3. config.EnvForExecCommand(): cmd.Env append for subprocess
  • Safety guards: NODE_OPTIONS="" (clears VSCode debugger), CLAUDECODE="" (prevents nested session detection)
  • Credential passthrough: 40+ cloud API vars (Anthropic, AWS, Google, proxy, mTLS)
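Mechanisms 2 and 3 can be sketched side by side to show the same map rendered two different ways. prependEnv and envForExecCommand are hypothetical stand-ins for config.PrependEnv() and config.EnvForExecCommand(), and the single-quote escaping is this sketch's own choice. The example env includes the two safety guards listed above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// shellQuote wraps v in single quotes, escaping embedded single quotes.
func shellQuote(v string) string {
	return "'" + strings.ReplaceAll(v, "'", `'\''`) + "'"
}

// prependEnv renders mechanism 2: "export K='V' ... && command", with keys
// sorted for deterministic output. An empty map returns the bare command,
// since a lone "export" would print the environment.
func prependEnv(env map[string]string, command string) string {
	if len(env) == 0 {
		return command
	}
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+shellQuote(env[k]))
	}
	return "export " + strings.Join(parts, " ") + " && " + command
}

// envForExecCommand renders mechanism 3: the parent environment plus the
// overrides, for use as exec.Cmd.Env.
func envForExecCommand(env map[string]string) []string {
	out := os.Environ()
	for k, v := range env {
		out = append(out, k+"="+v)
	}
	return out
}

func main() {
	env := map[string]string{"GT_ROLE": "polecat", "NODE_OPTIONS": "", "CLAUDECODE": ""}

	line := prependEnv(env, `echo "role=$GT_ROLE"`)
	fmt.Println(line)
	out, err := exec.Command("sh", "-c", line).Output()
	fmt.Printf("%q %v\n", out, err)

	cmd := exec.Command("sh", "-c", `echo "role=$GT_ROLE"`)
	cmd.Env = envForExecCommand(env)
	out, err = cmd.Output()
	fmt.Printf("%q %v\n", out, err)
}
```

When Cmd.Env contains duplicate keys, os/exec uses the last value, so appended entries override inherited ones; mechanism 2 instead relies on shell export semantics, which is one way the two paths can diverge.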

Flow: GT→Agent. GT sets env; agent inherits.

Fragility:

  • Three propagation mechanisms can diverge
  • Env vars are visible to any process in the session (a security concern)
  • Credential passthrough list must be manually maintained
  • tmux SetEnvironment only affects new shell invocations, not running processes

API mapping: POST /identity with env map — single structured delivery


26. Telemetry (OTel integration)

What: GT emits OpenTelemetry metrics and logs for all agent operations.

Code:

  • internal/telemetry/telemetry.go: Init() (line ~104): OTel provider setup, VictoriaMetrics/VictoriaLogs endpoints, 30s export interval
  • internal/telemetry/recorder.go: 18 event types: RecordSessionStart(), RecordSessionStop(), RecordPromptSend(), RecordPaneRead(), RecordPrime(), RecordAgentStateChange(), RecordPolecatSpawn(), RecordPolecatRemove(), RecordSling(), RecordMail(), RecordNudge(), RecordDone(), RecordDaemonRestart(), RecordFormulaInstantiate(), RecordConvoyCreate(), RecordPaneOutput(), RecordBDCall(), RecordPrimeContext()
  • 17 OTel Int64Counter metrics (gastown.session.starts.total, etc.)
  • internal/telemetry/subprocess.go: SetProcessOTELAttrs(): propagates OTEL_RESOURCE_ATTRIBUTES to subprocesses

Flow: GT→Metrics backend. Agent operations tracked by GT, not agent.

Fragility:

  • OTel export depends on VictoriaMetrics/Logs being available
  • No correlation ID threads through all events (PR #2068 proposed run.id)
  • Agent has no say in what's recorded or how

API mapping: POST /telemetry — agent pushes its own events with run_id


27. Event Logging (.events.jsonl)

What: GT logs all significant events to a JSONL file.

Code:

  • internal/events/events.go: Log() (line ~85), LogFeed() (line ~98), LogAudit() (line ~103): append to .events.jsonl with flock
  • Event types (lines ~36-77): sling, handoff, done, hook, unhook, spawn, kill, boot, halt, session_start, session_end, session_death, mass_death, patrol_*, merge_*, scheduler_*
  • internal/tui/feed/events.go: GtEventsSource (line ~216): tails .events.jsonl for TUI feed display
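Appending under an exclusive flock can be sketched as below (Unix-only, via syscall.Flock). The event fields are hypothetical; the demo appends from ten goroutines, each opening its own file descriptor, and then checks that every line is intact JSON.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"syscall"
	"time"
)

// event is a hypothetical record shape for .events.jsonl.
type event struct {
	Time  time.Time `json:"ts"`
	Type  string    `json:"type"` // e.g. "sling", "done", "session_start"
	Actor string    `json:"actor,omitempty"`
}

// appendEvent writes one JSON line while holding an exclusive flock, so
// writers in this or other processes never interleave partial lines.
func appendEvent(path string, ev event) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN) // runs before Close
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	_, err = f.Write(append(line, '\n'))
	return err
}

// demo appends ten events concurrently and counts the valid lines.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "events")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".events.jsonl")
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = appendEvent(path, event{Time: time.Now(), Type: "sling", Actor: fmt.Sprintf("polecat-%d", i)})
		}(i)
	}
	wg.Wait()
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	n := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			return n, fmt.Errorf("line %d is not valid JSON: %w", n+1, err)
		}
		n++
	}
	return n, sc.Err()
}

func main() {
	fmt.Println(demo()) // prints 10 <nil>
}
```

Every writer serializes on the same lock, which is the contention point noted under Fragility, and nothing here rotates or caps the file.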

Flow: GT→File. Events from GT operations, not agent-reported.

Fragility:

  • Single JSONL file for all events — no rotation or size management
  • flock serialization can contend under high concurrency
  • No correlation ID linking events to specific agent runs

API mapping: POST /telemetry events supersede GT-side logging for agent-reported data


28. Zombie Detection & Recovery

What: GT detects and recovers from zombie sessions (tmux alive, agent dead).

Code:

  • internal/doctor/zombie_check.go: ZombieSessionCheck.Run() (line ~33): filters known GT sessions, excludes crew, calls IsAgentAlive(); ZombieSessionCheck.Fix() (line ~113): re-verifies before kill (TOCTOU guard), never kills crew sessions
  • internal/daemon/wisp_reaper.go: wisp reaper for stale wisp cleanup
  • internal/witness/handlers.go: witness patrol with restart-first policy (not nuke-first)
  • internal/dog/health.go: HealthChecker.Check() (line ~46): dog-specific health check using CheckSessionHealth()
  • internal/witness/spawn_count.go: spawn storm circuit breaker; ShouldBlockRespawn() (line ~74), escalates to mayor after threshold

Flow: GT→GT. Internal monitoring, agent is passive subject.

Fragility:

  • Zombie detection depends on process tree walking (platform-specific)
  • Grace periods and thresholds are empirical (zombie kill grace, hung threshold)
  • TOCTOU gap between detection and action
  • Circuit breaker state in JSON file

API mapping: GET /health — agent reports status directly; zombie detection becomes trivial (no response = dead)


Cross-Cutting Themes

Correlation Gap

No single ID connects: OTel event ↔ conversation transcript ↔ cost entry ↔ session event ↔ bead. run_id in the Factory Worker API solves this.

Claude Code Coupling

17 of 28 touch points depend on Claude Code internals:

  • Prompt prefix (❯), status bar (⏵⏵), JSONL format, config dir layout, keychain service naming, hook JSON format, permission dialog text, bypass flag name, resume flag semantics, sessions-index.json, transcript message structure, usage field nesting.

Agent Parity Gap

Non-Claude agents lose:

  • Hooks (no automatic priming, guards, or mail injection)
  • Conversation log access (no JSONL transcripts)
  • Cost tracking (no transcript to parse)
  • Resume/fork (different or no mechanism)
  • Permission dialog handling (different UI)

The Factory Worker API eliminates this — one API, all agents.

Push vs Scrape

Current: GT scrapes 6+ sources (tmux pane, JSONL files, process tree, heartbeat files, keychain, config dirs). Proposed: Agent pushes lifecycle events, telemetry, and health — GT never scrapes.


Summary: Touch Points → API Endpoints

| # | Touch Point | Current Mechanism | API Endpoint |
|---|---|---|---|
| 1 | Prompt delivery | tmux send-keys 8-step | POST /prompt |
| 2 | Delivery modes | immediate/queue/wait-idle | POST /prompt (priority field) |
| 3 | Idle detection | prompt prefix + status bar | POST /lifecycle |
| 4 | Rate limit detection | pane content regex | POST /lifecycle |
| 5 | Account rotation | macOS keychain swap | POST /identity |
| 6 | Session lifecycle | 13-step tmux create | POST /lifecycle |
| 7 | Spawn admission | capacity gates | Internal (not agent-facing) |
| 8 | Agent identity | 30+ env vars | POST /identity |
| 9 | Priming | 10-section text output | POST /context |
| 10 | Hooks | settings.json install | Multiple endpoints |
| 11 | Guard scripts | PreToolUse exit code 2 | POST /authorize |
| 12 | JSONL scraping | filesystem transcript read | POST /telemetry |
| 13 | Cost tracking | hardcoded pricing table | POST /telemetry |
| 14 | Process liveness | pgrep tree walk | GET /health |
| 15 | Health check | 3-level tmux check | GET /health |
| 16 | Heartbeat files | JSON file write/poll | GET /health |
| 17 | Working dir detection | 5 methods (tmux, env, path) | POST /identity |
| 18 | Permission bypass | 10 vendor-specific flags | POST /authorize |
| 19 | Non-interactive mode | CLI flag composition | POST /prompt |
| 20 | Session resume/fork | --resume/--fork flags | POST /context |
| 21 | Config dir isolation | CLAUDE_CONFIG_DIR env | POST /identity |
| 22 | Theme/display | tmux status bar | Not agent-facing |
| 23 | Output capture | tmux capture-pane | Eliminated |
| 24 | Done/exit signaling | gt done CLI call | POST /lifecycle |
| 25 | Env var injection | 3 propagation mechanisms | POST /identity |
| 26 | OTel telemetry | GT-side recording | POST /telemetry |
| 27 | Event logging | .events.jsonl append | POST /telemetry |
| 28 | Zombie detection | process tree + thresholds | GET /health |

28 touch points → 7 API endpoints. Every hack replaced by structured communication.