Convoy Lifecycle Design
Making convoys actively converge on completion.
Flow
Three creation paths feed into the same lifecycle. Completion is event-driven
via the daemon's ConvoyManager, which runs two goroutines:
- Event poll (every 5s): Polls all rig beads stores + hq via
  `GetAllEventsSince`, detects close events, and calls
  `convoy.CheckConvoysForIssue`, which both checks completion and feeds the
  next ready issue to a polecat.
- Stranded scan (every 30s): Runs `gt convoy stranded --json` to catch
  convoys missed by the event-driven path (e.g. after crash/restart). Feeds
  ready work or auto-closes empty convoys.
Manual overrides (close --force, land) bypass the check entirely.
History: Witness and Refinery observers were originally planned as redundant observers but were removed (spec S-04, S-05). The daemon's multi-rig event poll + stranded scan provide sufficient coverage.
Auto-convoy creation: what gt sling actually does
gt sling auto-creates a convoy for every bead it dispatches, unless
--no-convoy is passed. The behavior differs significantly between
single-bead and multi-bead (batch) sling.
Single-bead sling
gt sling sh-task-1 gastown
- Checks if `sh-task-1` is already tracked by an open convoy.
- If not tracked: creates one auto-convoy "Work: <issue-title>" tracking
  that single bead.
- Spawns one polecat, hooks the bead, starts working.
Result: 1 bead, 1 convoy, 1 polecat.
Batch sling (3+ args, rig auto-resolved)
gt sling gt-task-1 gt-task-2 gt-task-3
The rig is auto-resolved from the beads' prefixes via routes.jsonl
(resolveRigFromBeadIDs in sling_batch.go). All beads must resolve
to the same rig. An explicit rig arg still works but prints a
deprecation warning:
gt sling gt-task-1 gt-task-2 gt-task-3 gastown
# Deprecation: gt sling now auto-resolves the rig from bead prefixes.
# You no longer need to explicitly specify <gastown>.
Batch sling creates one convoy tracking all beads. Before spawning
any polecats, runBatchSling (sling_batch.go) calls
createBatchConvoy (sling_convoy.go) which creates a single convoy
with title "Batch: N beads to <rig>" and adds tracks deps for all
beads.
Result: 3 beads, 1 convoy, 3 polecats -- all running in parallel, with spawns staggered 2 seconds apart.
The convoy ID and merge strategy are stored on each bead via
beadFieldUpdates, so gt done can find the convoy via the fast path.
There is no upper limit on the number of beads. gt sling <10 beads>
spawns 10 polecats sharing 1 convoy. The only throttle is
--max-concurrent (default 0 = unlimited).
Rig resolution errors
When the rig is explicit, the cross-rig guard checks each bead's prefix
against the target rig. On mismatch, it errors with batch-specific
suggested actions (remove the bead, sling it separately, or --force).
When auto-resolving, resolveRigFromBeadIDs errors if:
- A bead has no valid prefix
- A prefix is not mapped in `routes.jsonl` (including town-level `path="."`)
- Beads resolve to different rigs (lists each bead's rig, suggests slinging
  separately)
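Those resolution rules can be sketched as a single pass over the batch, under the assumption that routes reduce to a prefix-to-rig map. `resolveRig` is an illustrative stand-in for `resolveRigFromBeadIDs`, and an empty rig name stands in for town-level `path="."` routes:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveRig applies the three documented error cases: no valid prefix,
// prefix not mapped to a rig (or mapped to the town level), and beads
// resolving to different rigs.
func resolveRig(beadIDs []string, routes map[string]string) (string, error) {
	rig := ""
	for _, id := range beadIDs {
		i := strings.Index(id, "-")
		if i <= 0 {
			return "", fmt.Errorf("%s: no valid prefix", id)
		}
		prefix := id[:i+1] // e.g. "gt-task-1" -> "gt-"
		r, ok := routes[prefix]
		if !ok || r == "" {
			return "", fmt.Errorf("%s: prefix %q not mapped to a rig", id, prefix)
		}
		if rig == "" {
			rig = r
		} else if r != rig {
			return "", fmt.Errorf(
				"beads resolve to different rigs (%s vs %s); sling them separately",
				rig, r)
		}
	}
	return rig, nil
}

func main() {
	// Illustrative routes: gt- maps to gastown, hq- is town-level ("").
	routes := map[string]string{"gt-": "gastown", "hq-": ""}
	rig, err := resolveRig([]string{"gt-task-1", "gt-task-2"}, routes)
	fmt.Println(rig, err)
}
```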
Already-tracked bead conflict
If any bead in the batch is already tracked by another convoy, batch sling errors before spawning any polecats. It prints:
- Which bead conflicts and which convoy it belongs to
- All beads in the existing convoy with their statuses
- The conflicting bead highlighted
- 4 recommended actions (remove from batch, move bead, close old convoy, add to existing)
Initial dispatch vs daemon feeding
- Initial dispatch is parallel and dep-blind. Polecats spawn one at a time
  with 2-second delays between spawns, but all beads are dispatched in the
  same batch sling call regardless of deps. Even if `gt-task-2` has a
  `blocks` dep on `gt-task-1`, both get slung. The `isIssueBlocked` check
  only applies to daemon-driven convoy feeding (after a close event), not to
  the initial batch sling dispatch.
- Subsequent feeding respects deps. When a task closes, the daemon's
  event-driven feeder checks `IsSlingableType` and `isIssueBlocked` before
  dispatching the next ready issue from the shared convoy.
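The feeding rule can be sketched as a pure function over the convoy's tracked issues. `Issue`, `nextReady`, and the boolean fields here are simplified stand-ins for the real `IsSlingableType` / `isIssueBlocked` checks:

```go
package main

import "fmt"

// Issue is a simplified view of a tracked issue for this sketch.
type Issue struct {
	ID        string
	Closed    bool
	Slingable bool     // stands in for IsSlingableType
	BlockedBy []string // issue IDs this one waits on (stands in for isIssueBlocked)
}

// nextReady returns the first open, slingable issue whose blockers have all
// closed -- the issue the daemon would feed to `gt sling` next.
func nextReady(tracked []Issue) (string, bool) {
	open := map[string]bool{}
	for _, is := range tracked {
		if !is.Closed {
			open[is.ID] = true
		}
	}
	for _, is := range tracked {
		if is.Closed || !is.Slingable {
			continue
		}
		blocked := false
		for _, dep := range is.BlockedBy {
			if open[dep] {
				blocked = true
				break
			}
		}
		if !blocked {
			return is.ID, true
		}
	}
	return "", false
}

func main() {
	convoy := []Issue{
		{ID: "gt-task-1", Closed: true, Slingable: true},
		{ID: "gt-task-2", Slingable: true, BlockedBy: []string{"gt-task-1"}},
		{ID: "gt-task-3", Slingable: true, BlockedBy: []string{"gt-task-2"}},
	}
	// gt-task-2 is ready: its only blocker has closed; gt-task-3 still waits.
	fmt.Println(nextReady(convoy))
}
```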
No-convoy mode
Batch sling with --no-convoy skips convoy creation entirely:
gt sling gt-task-1 gt-task-2 gt-task-3 gastown --no-convoy
Problem Statement
Convoys are passive trackers. They group work but don't drive it. The completion loop has a structural gap:
Create → Assign → Execute → Issues close → ??? → Convoy closes
The ??? is "Deacon patrol runs gt convoy check" - a poll-based single point of
failure. When Deacon is down, convoys don't close. Work completes but the loop
never lands.
Current State
What Works
- Convoy creation and issue tracking
- `gt convoy status` shows progress
- `gt convoy stranded` finds unassigned work
- `gt convoy check` auto-closes completed convoys
What Breaks
- Poll-based completion: Only Deacon runs `gt convoy check`
- No event-driven trigger: Issue close doesn't propagate to convoy
- Manual close is inconsistent across docs: `gt convoy close --force` exists,
  but some docs still describe it as missing
- Single observer: No redundant completion detection
- Weak notification: Convoy owner not always clear
Design: Active Convoy Convergence
Principle: Event-Driven, Centrally Managed
Convoy completion should be:
- Event-driven: Triggered by issue close, not polling
- Centrally managed: Single owner (daemon) avoids scattered side-effect hooks
- Manually overridable: Humans can force-close
Event-Driven Completion
When an issue closes, check if it's tracked by a convoy:
Implementation: The daemon's ConvoyManager event poll detects close events
via SDK GetAllEventsSince across all rig stores + hq. This catches all closes
regardless of source (CLI, witness, refinery, manual).
Observer: Daemon ConvoyManager
The daemon's ConvoyManager is the sole convoy observer, running two
independent goroutines:
| Loop | Trigger | What it does |
|---|---|---|
| Event poll | GetAllEventsSince every 5s (all rig stores + hq) | Detects close events, calls CheckConvoysForIssue |
| Stranded scan | gt convoy stranded --json every 30s | Feeds first ready issue via gt sling, auto-closes empty convoys |
Both loops are context-cancellable. The shared CheckConvoysForIssue function
is idempotent — closing an already-closed convoy is a no-op.
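A minimal sketch of that idempotent check follows; the `Convoy` type and `checkConvoy` are illustrative, not the real `internal/convoy` API:

```go
package main

import "fmt"

// Convoy is a simplified stand-in: the convoy closes only when every
// tracked issue is closed.
type Convoy struct {
	ID      string
	Closed  bool
	Tracked map[string]bool // issue ID -> closed?
}

// checkConvoy reports whether this call transitioned the convoy to closed.
// Re-checking a closed convoy is a no-op, so concurrent or repeated checks
// (event poll vs stranded scan) are safe.
func checkConvoy(c *Convoy) bool {
	if c.Closed {
		return false // already closed: no-op
	}
	for _, closed := range c.Tracked {
		if !closed {
			return false // work remains; leave the convoy open
		}
	}
	c.Closed = true
	return true
}

func main() {
	c := &Convoy{
		ID:      "hq-cv-abc",
		Tracked: map[string]bool{"gt-task-1": true, "gt-task-2": true},
	}
	fmt.Println(checkConvoy(c), checkConvoy(c)) // closes once, then no-op
}
```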
History: The original design called for three redundant observers (Daemon, Witness, Refinery) per the "Redundant Monitoring Is Resilience" principle. Witness observers were removed (spec S-04) because convoy tracking is orthogonal to polecat lifecycle management. Refinery observers were removed (spec S-05) after S-17 found they were silently broken (wrong root path) with no visible impact, confirming single-observer coverage is sufficient.
Issue-to-Rig Resolution
Convoys are rig-agnostic. A convoy like hq-cv-6vjz2 lives in the hq store and
tracks issue IDs like sh-pb6sa — but it doesn't store which rig that issue
belongs to. The rig association is resolved at dispatch time via two lookups:
- Extract prefix: `sh-pb6sa` → `sh-` (string parsing)
- Resolve rig: look up `sh-` in `~/gt/.beads/routes.jsonl` → finds
  `{"prefix":"sh-","path":"gastown/.beads"}` → rig name is `gastown`
This happens in feedFirstReady (stranded scan path) and feedNextReadyIssue
(event poll path) just before calling `gt sling`. Issues with prefixes that
don't appear in `routes.jsonl` (or that map to `path="."` like `hq-*`) are
skipped — see `isSlingableBead()`.
Both paths check isRigParked after resolving the rig name. Issues targeting
parked rigs are logged and skipped rather than dispatched.
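Both filters can be sketched together. The `route` struct, the `dispatchTarget` name, and the `parked` set are illustrative; the real code splits this across `isSlingableBead()` and `isRigParked`:

```go
package main

import (
	"fmt"
	"strings"
)

// route is a simplified view of a routes.jsonl entry.
type route struct {
	Rig  string // rig name, e.g. "gastown"
	Path string // beads path; "." marks a town-level store (e.g. hq-*)
}

// dispatchTarget resolves an issue ID to a rig and applies both skip rules:
// unmapped or town-level prefixes are not slingable, and parked rigs are
// skipped rather than dispatched.
func dispatchTarget(issueID string, routes map[string]route,
	parked map[string]bool) (string, bool) {
	i := strings.Index(issueID, "-")
	if i <= 0 {
		return "", false
	}
	r, ok := routes[issueID[:i+1]]
	if !ok || r.Path == "." {
		return "", false // not slingable
	}
	if parked[r.Rig] {
		return "", false // rig parked: log and skip
	}
	return r.Rig, true
}

func main() {
	routes := map[string]route{
		"sh-": {Rig: "gastown", Path: "gastown/.beads"},
		"hq-": {Rig: "", Path: "."},
	}
	fmt.Println(dispatchTarget("sh-pb6sa", routes, nil))
}
```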
Manual Close Command
gt convoy close is implemented, including --force for abandoned convoys.
# Close a completed convoy
gt convoy close hq-cv-abc
# Force-close an abandoned convoy
gt convoy close hq-cv-xyz --force --reason="work done differently"
# Close with explicit notification
gt convoy close hq-cv-abc --notify mayor/
Use cases:
- Abandoned convoys no longer relevant
- Work completed outside tracked path
- Force-closing stuck convoys
Convoy Owner/Requester
Track who requested the convoy for targeted notifications:
gt convoy create "Feature X" gt-abc --owner mayor/ --notify overseer
| Field | Purpose |
|---|---|
| owner | Who requested (gets completion notification) |
| notify | Additional subscribers |
If `owner` is not specified, it defaults to the creator (from `created_by`).
Convoy States
OPEN ──(all issues close)──► CLOSED
  ▲                             │
  │                             ▼
  │                        (add issues)
  │                             │
  └─────────────────────────────┘
           (auto-reopens)
Adding issues to closed convoy reopens automatically.
New state for abandonment:
OPEN ──► CLOSED (completed)
  │
  └────► ABANDONED (force-closed without completion)
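The combined lifecycle can be sketched as a small state machine. Treating ABANDONED as terminal for `addIssue` is an assumption, not documented behavior, and all names here are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// State is an illustrative stand-in for however convoy status is stored.
type State int

const (
	Open State = iota
	Closed
	Abandoned
)

// addIssue models the auto-reopen rule: adding to a CLOSED convoy reopens
// it. Rejecting adds on ABANDONED convoys is an assumption of this sketch.
func addIssue(s State) (State, error) {
	switch s {
	case Closed:
		return Open, nil // auto-reopen
	case Abandoned:
		return s, errors.New("cannot add issues to an abandoned convoy")
	default:
		return Open, nil
	}
}

// closeConvoy models `gt convoy close`: completed convoys close normally,
// --force closes without completion (ABANDONED), and closing a non-open
// convoy is a no-op.
func closeConvoy(s State, allDone, force bool) (State, error) {
	if s != Open {
		return s, nil // idempotent
	}
	switch {
	case allDone:
		return Closed, nil
	case force:
		return Abandoned, nil
	default:
		return Open, errors.New("tracked issues remain open; use --force")
	}
}

func main() {
	s, _ := closeConvoy(Open, true, false) // all issues closed
	s, _ = addIssue(s)                     // adding an issue reopens
	fmt.Println(s == Open)
}
```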
Timeout/SLA (Future)
Optional due_at field for convoy deadline:
gt convoy create "Sprint work" gt-abc --due="2026-01-15"
Overdue convoys surface in gt convoy stranded --overdue.
Commands
Current: gt convoy close
gt convoy close <convoy-id> [--reason=<reason>] [--notify=<agent>]
- Verifies tracked issues are complete by default
- `--force` closes even when tracked issues remain open
- Sets `close_reason` field
- Sends notification to owner and subscribers
- Idempotent: closing a closed convoy is a no-op
Enhanced: gt convoy check
# Check all convoys (current behavior)
gt convoy check
# Check specific convoy (new)
gt convoy check <convoy-id>
# Dry-run mode
gt convoy check --dry-run
Future: gt convoy reopen
gt convoy reopen <convoy-id>
Explicit reopen for clarity (currently implicit via add).
Implementation Status
Core convoy manager is fully implemented and tested (see spec.md stories S-01 through S-18, all DONE). Remaining future work:
- P2: Owner field - targeted notifications polish
- P3: Timeout/SLA - deadline tracking
Key Files
| Component | File |
|---|---|
| Convoy command | internal/cmd/convoy.go |
| Auto-convoy (sling) | internal/cmd/sling_convoy.go |
| Convoy operations | internal/convoy/operations.go (CheckConvoysForIssue, feedNextReadyIssue) |
| Daemon manager | internal/daemon/convoy_manager.go |
| Formula convoy | internal/cmd/formula.go (executeConvoyFormula) |
Related
- convoy.md - Convoy concept and usage
- dog-infrastructure.md - Daemon/boot/deacon watchdog chain
- mail-protocol.md - Notification delivery