Convoy Manager Specification

Daemon-resident event-driven completion and stranded convoy recovery.

Status: Implementation complete (all stories DONE) Owner: Daemon subsystem Related: convoy-lifecycle.md | roadmap.md

1. Problem Statement

Convoys group work but don't drive it. Completion depends on a single poll-based Deacon patrol cycle running gt convoy check. When Deacon is down or slow, convoys stall. Work finishes but the loop never lands:

Create -> Track -> Execute -> Issue closes -> ??? -> Convoy closes

The gap needs three capabilities:

Event-driven completion -- react to issue closes, not poll for them.
Stranded recovery -- catch convoys missed by event-driven path (crash, restart, stale state).
Redundant observation -- multiple agents detect completion so no single failure blocks the loop.

2. Architecture

2.1 ConvoyManager (daemon-resident)

Two goroutines inside gt daemon:

Goroutine	Trigger	What it does
Event poll	`GetAllEventsSince` every 5s, all rig stores + hq	Detects `EventClosed` / `EventStatusChanged(closed)`, calls `CheckConvoysForIssue`
Stranded scan	`gt convoy stranded --json` every 30s	Feeds first ready issue via `gt sling`, auto-closes empty convoys via `gt convoy check`

Both goroutines are context-cancellable and coordinate shutdown via sync.WaitGroup.

The event poll opens beads stores for all known rigs (via routes.jsonl) plus the town-level hq store. Parked/docked rigs are skipped during polling. Convoy lookups always use the hq store since convoys are hq-* prefixed. Each store has an independent high-water mark for event IDs.

2.2 Shared Observer (`convoy.CheckConvoysForIssue`)

Shared function called by the daemon's event poll:

Observer	When	Entry point
Daemon event poll	Close event detected in any rig store or hq	`convoy.CheckConvoysForIssue` (hq store passed in)

The shared function:

Finds convoys tracking the closed issue (SDK GetDependentsWithMetadata on hq store, filtered by tracks type)
Skips already-closed convoys
Runs gt convoy check <id> for open convoys
If convoy remains open after check, feeds next ready issue via gt sling
Idempotent -- safe to call multiple times for the same event

2.3 Key Design Decisions

Decision	Rationale
SDK polling (not CLI streaming)	Avoids subprocess lifecycle management, simpler restart semantics
High-water mark (atomic int64)	Monotonically advancing, no duplicate event processing
One issue fed per convoy per scan	Prevents batch overflow; next issue fed on next close event
Stranded scan as safety net	Catches convoys missed by event-driven path (crash recovery)
Nil store disables event poll only	Stranded scan still works without beads SDK (degraded mode)
Resolved binary paths (PATCH-006)	ConvoyManager resolves `gt`/`bd` at startup to avoid PATH issues

3. Stories

Legend

Status	Meaning
DONE	Implemented, tested, integrated
DONE-PARTIAL	Implemented but has known gaps
TODO	Not yet implemented

Quality Gates (for all implementation stories)

These commands must pass for every implementation story in this spec:

go test ./...
golangci-lint run

S-01: Event-driven convoy completion detection [DONE]

Description: When an issue closes, the daemon detects the close event via SDK polling and triggers convoy completion checks.

Implementation: ConvoyManager.runEventPoll + pollEvents in convoy_manager.go

Acceptance criteria:

Tests:

TestEventPoll_DetectsCloseEvents -- real beads store, creates+closes issue, verifies log
TestEventPoll_SkipsNonCloseEvents -- create-only, no close detection

Corrective note: "Zero side effects" negative assertions have been added via TestEventPoll_SkipsNonCloseEvents_NegativeAssertion (verifies no subprocess calls, no close detection, and no convoy activity for non-close events). Originally tracked in S-11; now resolved.

S-02: Periodic stranded convoy scan [DONE]

Description: Every 30 seconds, scan for stranded convoys (unassigned work or empty). Feed ready work or auto-close empties.

Implementation: ConvoyManager.runStrandedScan + scan + findStranded + feedFirstReady + closeEmptyConvoy in convoy_manager.go

Acceptance criteria:

Tests:

TestScanStranded_FeedsReadyIssues -- mock gt, verify sling log file
TestScanStranded_ClosesEmptyConvoys -- mock gt, verify check log file
TestScanStranded_NoStrandedConvoys -- empty list: asserts sling log absent, check log absent, no convoy activity in logs
TestScanStranded_DispatchFailure -- first sling fails, scan continues
TestConvoyManager_ScanInterval_Configurable -- 0 -> default, custom preserved
TestStrandedConvoyInfo_JSONParsing -- JSON round-trip

S-03: Shared convoy observer function [DONE]

Description: A shared function for checking convoy completion and feeding the next ready issue, callable from any observer.

Implementation: CheckConvoysForIssue + feedNextReadyIssue in convoy/operations.go

Acceptance criteria:

Tests:

TestGetTrackingConvoys_FiltersByTracksType -- real store, blocks filtered
TestIsConvoyClosed_ReturnsCorrectStatus -- real store, open vs closed
TestExtractIssueID -- all wrapper variants
TestFeedNextReadyIssue_SkipsNonOpenIssues -- filtering logic
TestFeedNextReadyIssue_FindsReadyIssue -- first match
TestCheckConvoysForIssue_NilStore -- returns nil
TestCheckConvoysForIssue_NilLogger -- no panic
TestCheckConvoysForIssueWithAutoStore_NoStore -- non-existent path, nil

S-04: Witness integration [REMOVED]

Description: Witness convoy observer removed. The daemon's multi-rig event poll (watching all rig databases + hq) provides event-driven coverage for close events from any rig. The stranded scan (30s) provides backup. The witness's core job is polecat lifecycle management -- convoy tracking is orthogonal.

History: Originally had 6 CheckConvoysForIssueWithAutoStore call sites in handlers.go (1 post-merge, 5 zombie cleanup paths). All were pure side-effect notification hooks. Removed when daemon gained multi-rig event polling.

S-05: Refinery integration [REMOVED]

Description: Refinery convoy observer removed. The daemon event poll (5s) and witness observer provide sufficient coverage. The refinery observer was silently broken (S-17: wrong root path) for the entire feature lifetime with no visible impact, confirming the other two observers are sufficient. Since beads unavailability disables the entire town (not just convoy checks), the "degraded mode" justification for a third observer does not hold.

History: Originally called CheckConvoysForIssueWithAutoStore after merge. S-17 found it passed rig path instead of town root. S-18 fixed it. Subsequently removed as unnecessary redundancy.

S-06: Daemon lifecycle integration [DONE]

Description: ConvoyManager starts and stops cleanly with the daemon.

Implementation: Integrated in daemon.go Run() and shutdown() methods.

Acceptance criteria:

Opens beads store at daemon startup (nil if unavailable)
Passes resolved gtPath/bdPath to ConvoyManager
Passes logger.Printf for daemon log integration
Starts after feed curator
Stops before beads store is closed (correct shutdown order)
Stop completes within bounded time (no hang)

Tests:

TestDaemon_StartsManagerAndScanner -- start + stop with mock binaries
TestDaemon_StopsManagerAndScanner -- stop completes within 5s

S-07: Convoy fields in MR beads [DONE]

Description: Merge-request beads carry convoy tracking fields for priority scoring and starvation prevention.

Implementation: ConvoyID and ConvoyCreatedAt in MRFields struct in beads/fields.go

Acceptance criteria:

convoy_id field parsed and formatted
convoy_created_at field parsed and formatted
Supports underscore, hyphen, and camelCase key variants
Used by refinery for merge queue priority scoring

S-08: ConvoyManager lifecycle safety [DONE]

Description: Start/Stop are safe under edge conditions.

Acceptance criteria:

Stop() is idempotent (double-call does not deadlock)
Stop() before Start() returns immediately
Start() is guarded against double-call (atomic.Bool with CompareAndSwap at convoy_manager.go:50-51,80-83)

Tests:

TestManagerLifecycle_StartStop -- basic start + stop
TestConvoyManager_DoubleStop_Idempotent -- double stop
TestStart_DoubleCall_Guarded -- second Start() is no-op, warning logged

S-09: Subprocess context cancellation [DONE]

Description: All subprocess calls in ConvoyManager and observer propagate context cancellation so daemon shutdown is not blocked by hanging subprocesses.

Implementation: All exec.Command calls replaced with exec.CommandContext. Process group killing via setProcessGroup + syscall.Kill(-pid, SIGKILL) prevents orphaned child processes.

Acceptance criteria:

All exec.Command calls in ConvoyManager use exec.CommandContext(m.ctx, ...) (convoy_manager.go:200,241,257)
All exec.Command calls in operations.go accept and use a context parameter
Daemon shutdown completes within bounded time even if gt subprocess hangs (convoy_manager_integration_test.go:154-206)
Killed subprocesses do not leave orphaned child processes (convoy_manager.go, operations.go)

S-10: Resolved binary paths in operations.go [DONE]

Description: Observer subprocess calls use resolved binary paths instead of bare "gt" to avoid PATH-dependent behavior drift.

Implementation: CheckConvoysForIssue resolves via exec.LookPath("gt") with fallback to bare "gt". Threads gtPath parameter to runConvoyCheck and dispatchIssue in operations.go.

Acceptance criteria:

runConvoyCheck and dispatchIssue accept a gtPath parameter
CheckConvoysForIssue threads resolved path
All callers updated: daemon (resolved m.gtPath)
Fallback to bare "gt" if resolution fails

S-11: Test gap -- priority 1 (high blast-radius invariants) [DONE]

Description: Filled testing gaps for core invariants identified in the test plan analysis.

Tests added:

Test	What it proves
`TestFeedFirstReady_MultipleReadyIssues_DispatchesOnlyFirst`	3 ready issues -> sling log contains only first issue ID
`TestFeedFirstReady_UnknownPrefix_Skips`	Issue prefix not in routes.jsonl -> sling never called, error logged
`TestFeedFirstReady_UnknownRig_Skips`	Prefix resolves but rig lookup fails -> sling never called
`TestFeedFirstReady_EmptyReadyIssues_NoOp`	`ReadyIssues=[]` despite `ReadyCount>0` -> no crash, no dispatch
`TestEventPoll_SkipsNonCloseEvents_NegativeAssertion`	Asserts zero side effects (no subprocess calls, no convoy activity)

Acceptance criteria:

All 5 tests passing
Each test has explicit assertions (no assertion-free "no panic" tests)

S-12: Test gap -- priority 2 (error paths) [DONE]

Description: Covered error paths that previously had no test coverage.

Tests added:

Test	What it proves
`TestFindStranded_GtFailure_ReturnsError`	`gt convoy stranded` exits non-zero -> error returned
`TestFindStranded_InvalidJSON_ReturnsError`	`gt` returns non-JSON stdout -> parse error returned
`TestScan_FindStrandedError_LogsAndContinues`	`scan()` doesn't panic when `findStranded` fails
`TestPollEvents_GetAllEventsSinceError`	`GetAllEventsSince` returns error -> logged, retried next interval

Acceptance criteria:

All 4 tests passing
Error messages are verified in log assertions

S-13: Test gap -- priority 3 (lifecycle edge cases) [DONE]

Description: Covered lifecycle edge cases identified in the test plan.

Tests added:

Test	What it proves
`TestScan_ContextCancelled_MidIteration`	Large stranded list + cancel mid-loop -> exits cleanly
`TestScanStranded_MixedReadyAndEmpty`	Heterogeneous stranded list routes ready->sling and empty->check correctly
`TestStart_DoubleCall_Guarded`	Second `Start()` is no-op, warning logged

Acceptance criteria:

All 3 tests passing

S-14: Test infrastructure improvements [DONE]

Description: Improved test harness quality and reduced duplication.

Items:

Item	Impact
Extract `mockGtForScanTest(t, opts)` helper	Used by 5+ scan tests (`convoy_manager_test.go:57-117`)
Add side-effect logger to all mock scripts	All mock scripts write call logs for positive/negative assertions
Fix `DispatchFailure` test logger to capture `fmt.Sprintf(format, args...)`	Assertions verify rendered messages with correct IDs
Convert `TestScanStranded_NoStrandedConvoys` to negative test	Asserts sling/check logs absent

Acceptance criteria:

Shared mock builder exists and is used by >= 3 scan tests (5 tests use it)
All mock scripts write to call log files (negative tests can assert empty)
No assertion-free tests remain in convoy_manager_test.go

S-15: Documentation update [DONE]

Description: Update stale documentation to reflect current implementation.

Items:

Document	Issue
`docs/design/daemon/convoy-manager.md`	Mermaid diagram shows `bd activity --follow` but implementation uses SDK `GetAllEventsSince` polling
`docs/design/daemon/convoy-manager.md`	Text says "Restarts with 5s backoff on stream error" -- no stream, no backoff; it's a poll-retry loop
`docs/design/convoy/testing.md`	Row "Stream failure triggers backoff + retry loop" is stale (no stream)
`docs/design/convoy/testing.md`	`TestDoubleStop_Idempotent` listed as gap but now exists
`docs/design/convoy/convoy-lifecycle.md`	Observer table lists Deacon as primary third observer; implementation uses Refinery
`docs/design/convoy/convoy-lifecycle.md`	"No manual close" claim is stale; `gt convoy close --force` exists
`docs/design/convoy/convoy-lifecycle.md`	Relative link to convoy concepts doc is broken (`../concepts/...`)
`docs/design/convoy/spec.md`	File map test counts drifted from current suite

Acceptance criteria:

Mermaid diagram shows SDK polling architecture
Text accurately describes poll-retry semantics
Testing.md reflects current test inventory
Lifecycle observer and manual-close sections match implementation
Broken links in lifecycle doc are fixed
Spec file-map counts and command list match current source

Completion note: Completed in this review pass; remaining ambiguity about refinery root-path semantics is tracked separately in S-17.

S-16: Corrective follow-up for DONE stories [DONE]

Description: Add explicit corrective tasks for inaccuracies discovered in stories marked DONE, without changing the implementation status itself.

Rationale: DONE stories can still contain stale supporting narrative or inventory details after nearby refactors. Corrections are tracked explicitly to avoid silently editing historical delivery claims.

Scope:

S-01: clarify that non-close event "zero side effects" is currently partial until negative subprocess assertions are added (see S-11)
S-04: replace brittle line-number call-site references with symbol/section anchors in handlers.go
S-05: validate/clarify refinery townRoot vs rig-path argument assumptions for CheckConvoysForIssueWithAutoStore

Acceptance criteria:

Corrective notes are added to affected DONE stories without downgrading status
S-04 call-site references no longer depend on fixed line numbers
S-05 includes an explicit note on root-path assumptions and validation status

Status note: All corrective notes updated. S-01 negative assertion test now exists (resolved). S-04 call sites already use semantic descriptions. S-05 note updated to reflect S-17 verification findings (incorrect path, fix in S-18).

S-17: Refinery observer root-path verification [DONE]

Description: Verify whether refinery passing e.rig.Path into CheckConvoysForIssueWithAutoStore is correct for convoy visibility.

Context:

Observer helper opens beads store under <townRoot>/.beads/dolt
Refinery currently passes rig path, not explicitly town root

Findings:

The current behavior is incorrect. e.rig.Path is a rig-level path (<townRoot>/<rigName>), set in rig/manager.go as filepath.Join(m.townRoot, name). OpenStoreForTown constructs <path>/.beads/dolt, so the refinery opens <townRoot>/<rigName>/.beads/dolt instead of <townRoot>/.beads/dolt.

The rig-level .beads/ directory typically contains either a redirect file (pointing to mayor/rig/.beads) or rig-scoped metadata -- not the town-level Dolt database that holds convoy data. As a result, beadsdk.Open either fails (no dolt/ directory) or opens a rig-scoped store that does not contain convoy tracking dependencies. In both cases CheckConvoysForIssueWithAutoStore silently returns nil, effectively disabling convoy checks from the refinery observer.

Other observers handle this correctly:

Witness: resolves town root via workspace.Find(workDir) before calling
Daemon: passes d.config.TownRoot directly

Fix required: Resolve town root from e.rig.Path using workspace.Find before passing to CheckConvoysForIssueWithAutoStore, matching the witness pattern. See S-18 for implementation.

Acceptance criteria:

Behavioral expectation is documented (town root vs rig root)
If current behavior is correct, add code comment/spec note explaining why
If incorrect, create implementation follow-up story and cross-link here -> S-18

S-18: Fix refinery convoy observer town-root path [DONE]

Description: Fixed the refinery's CheckConvoysForIssueWithAutoStore call to pass the town root instead of the rig path, so convoy checks actually open the correct beads store.

Context: Identified by S-17 verification. The refinery was passing e.rig.Path (<townRoot>/<rigName>) but the function expects the town root. This silently disabled convoy observation from the refinery.

Implementation: engineer.go now resolves town root via workspace.Find(e.rig.Path) before calling CheckConvoysForIssueWithAutoStore, matching the witness pattern.

Acceptance criteria:

Refinery resolves town root via workspace.Find(e.rig.Path) before calling CheckConvoysForIssueWithAutoStore
Pattern matches witness implementation (graceful fallback if town root not found)
Import workspace package added to engineer.go
BUG(S-17) comment in engineer.go removed after fix

4. Critical Invariants

#	Invariant	Category	Blast Radius	Story	Tested?
I-1	Issue close triggers `CheckConvoysForIssue`	Data	High	S-01	Yes
I-2	Non-close events produce zero side effects	Safety	Low	S-01	Yes (`TestEventPoll_SkipsNonCloseEvents_NegativeAssertion`)
I-3	High-water mark advances monotonically	Data	High	S-01	Implicit
I-4	Convoy check is idempotent	Data	Low	S-03	Yes
I-5	Stranded convoys with ready work get fed	Liveness	High	S-02	Yes
I-6	Empty stranded convoys get auto-closed	Data	Medium	S-02	Yes
I-7	Scan continues after dispatch failure	Liveness	Medium	S-02	Yes
I-8	Context cancellation stops both goroutines	Liveness	High	S-06	Yes
I-9	One issue fed per convoy per scan	Safety	Medium	S-02	Implicit
I-10	Unknown prefix/rig skips issue (no crash)	Safety	Medium	S-02	Yes (`TestFeedFirstReady_UnknownPrefix_Skips`, `_UnknownRig_Skips`)
I-11	`Stop()` is idempotent	Safety	Low	S-08	Yes
I-12	Subprocess cancellation on shutdown	Liveness	High	S-09	Yes (`TestConvoyManager_ShutdownKillsHangingSubprocess`)

5. Failure Modes

Event Poll

Failure	Likelihood	Recovery	Tested?
`GetAllEventsSince` error	Low	Retry next 5s interval	Yes (`TestPollEvents_GetAllEventsSinceError`)
Beads store nil	Medium	Event poll disabled, stranded scan continues	Yes
Close event with empty `issue_id`	Low	Skipped	No
`CheckConvoysForIssue` panics	Low	Daemon process crash -> restart	No

Stranded Scan

Failure	Likelihood	Recovery	Tested?
`gt convoy stranded` error	Low	Logged, skip cycle	Yes (`TestFindStranded_GtFailure_ReturnsError`)
Invalid JSON from `gt`	Low	Logged, skip cycle	Yes (`TestFindStranded_InvalidJSON_ReturnsError`)
`gt sling` dispatch fails	Medium	Logged, continue to next convoy	Yes
`gt convoy check` fails	Low	Logged, continue to next convoy	No
Unknown prefix for issue	Low	Logged, skip issue	Yes (`TestFeedFirstReady_UnknownPrefix_Skips`)
Unknown rig for prefix	Low	Logged, skip issue	Yes (`TestFeedFirstReady_UnknownRig_Skips`)
`gt` subprocess hangs	Low	Context cancellation kills process group	Yes (`TestConvoyManager_ShutdownKillsHangingSubprocess`)

Lifecycle

Failure	Likelihood	Recovery	Tested?
`Stop()` before `Start()`	Low	`wg.Wait()` returns immediately	No
Double `Stop()`	Low	Idempotent	Yes
Double `Start()`	Low	Guarded (`atomic.Bool`, no-op)	Yes (`TestStart_DoubleCall_Guarded`)
Subprocess blocks shutdown	Low	Context cancellation kills process group	Yes (`TestConvoyManager_ShutdownKillsHangingSubprocess`)

6. File Map

Core Implementation

File	Contents
`internal/daemon/convoy_manager.go`	ConvoyManager: event poll + stranded scan goroutines
`internal/convoy/operations.go`	Shared `CheckConvoysForIssue`, `feedNextReadyIssue`, `getTrackingConvoys`, `IsSlingableType`, `isIssueBlocked`
`internal/beads/routes.go`	`ExtractPrefix`, `GetRigNameForPrefix` (prefix -> rig resolution)
`internal/beads/fields.go`	`MRFields.ConvoyID`, `MRFields.ConvoyCreatedAt` (convoy tracking in MR beads)

Integration Points

File	How it uses convoy
`internal/daemon/daemon.go`	Opens multi-rig beads stores, creates ConvoyManager in `Run()`, stops in `shutdown()`
`internal/witness/handlers.go`	Convoy observer removed (S-04 REMOVED)
`internal/refinery/engineer.go`	Convoy observer removed (S-05 REMOVED)
`internal/cmd/convoy.go`	CLI: `gt convoy create/status/list/add/check/stranded/close/land`
`internal/cmd/sling_convoy.go`	Auto-convoy creation during `gt sling`
`internal/cmd/formula.go`	`executeConvoyFormula` for convoy-type formulas

Tests

File	What it tests
`internal/daemon/convoy_manager_test.go`	ConvoyManager unit tests (22 tests)
`internal/daemon/convoy_manager_integration_test.go`	ConvoyManager integration tests (2 tests, `//go:build integration`)
`internal/convoy/store_test.go`	Observer store helpers (3 tests)
`internal/convoy/operations_test.go`	Operations function edge cases + safety guard tests
`internal/daemon/daemon_test.go`	Daemon-level manager lifecycle (2 convoy tests)

Design Documents

File	Contents
`docs/design/convoy/convoy-lifecycle.md`	Problem statement, design principles, flow diagram
`docs/design/convoy/spec.md`	This document (includes test harness scorecard and remaining gaps)
`docs/design/daemon/convoy-manager.md`	ConvoyManager architecture diagram (SDK polling + stranded scan)

7. Review Findings -> Story Mapping

Finding	Story
Stream-based convoy-manager doc was stale	S-15
Testing doc had stale stream/backoff and duplicate gap entries	S-15
Lifecycle observer/manual-close claims were stale	S-15
Spec file-map command/test counts drifted	S-15
DONE stories needed explicit corrective handling	S-16
Refinery observer root-path ambiguity remains	S-17 (verified)
Refinery root-path fix required	S-18

8. Non-Goals (This Spec)

These are documented in convoy-lifecycle.md as future work but are not in scope for this spec:

Convoy owner/requester field and targeted notifications (P2 in lifecycle doc)
Convoy timeout/SLA (due_at field, overdue surfacing) (P3 in lifecycle doc)
Convoy reopen command (implicit via add, explicit command deferred)
Test clock injection for ConvoyManager (P3 -- useful but not blocking)

Test Harness & Remaining Gaps

Harness Scorecard

Dimension	Score (1-5)	Key Gap
Fixtures & Setup	4	`mockGtForScanTest` shared builder covers scan tests; processLine path has own setup
Isolation	4	Temp dirs + `t.Setenv(PATH)` is solid; Windows correctly skipped; no shared state
Observability	4	All mock scripts emit call logs; negative tests assert log files absent/empty
Speed	4	All convoy-manager tests run quickly; no long-running interval waits in current suite
Determinism	4	No real timing dependencies; ticker tests use long intervals to avoid races

Test Clock Injection (P3)

Problem: ConvoyManager uses time.Ticker with 30s default. Testing "runs at interval" requires waiting or injecting a clock.

Proposal: Add clock field to ConvoyManager (interface with NewTicker(d)) defaulting to real time. Tests inject fake clock with immediate tick.

Compound Value: All periodic daemon components benefit.

Status: Not implemented. Tests use long intervals (10min) to prevent ticker firing during test.

Remaining Test Gaps

Add TestProcessLine_EmptyIssueID (close event with empty issue_id)
Expand integration test coverage for multi-rig event polling

1. Problem Statement​

2. Architecture​

2.1 ConvoyManager (daemon-resident)​

2.2 Shared Observer (convoy.CheckConvoysForIssue)​

2.3 Key Design Decisions​

3. Stories​

Legend​

Quality Gates (for all implementation stories)​

S-01: Event-driven convoy completion detection [DONE]​

S-02: Periodic stranded convoy scan [DONE]​

S-03: Shared convoy observer function [DONE]​

S-04: Witness integration [REMOVED]​

S-05: Refinery integration [REMOVED]​

S-06: Daemon lifecycle integration [DONE]​

S-07: Convoy fields in MR beads [DONE]​

S-08: ConvoyManager lifecycle safety [DONE]​

S-09: Subprocess context cancellation [DONE]​

S-10: Resolved binary paths in operations.go [DONE]​

S-11: Test gap -- priority 1 (high blast-radius invariants) [DONE]​

S-12: Test gap -- priority 2 (error paths) [DONE]​

S-13: Test gap -- priority 3 (lifecycle edge cases) [DONE]​

S-14: Test infrastructure improvements [DONE]​

S-15: Documentation update [DONE]​

S-16: Corrective follow-up for DONE stories [DONE]​

S-17: Refinery observer root-path verification [DONE]​

S-18: Fix refinery convoy observer town-root path [DONE]​

4. Critical Invariants​

5. Failure Modes​

Event Poll​

Stranded Scan​

Lifecycle​

6. File Map​

Core Implementation​

Integration Points​

Tests​

Design Documents​

7. Review Findings -> Story Mapping​

8. Non-Goals (This Spec)​

Test Harness & Remaining Gaps​

Harness Scorecard​

Test Clock Injection (P3)​

Remaining Test Gaps​

1. Problem Statement

2. Architecture

2.1 ConvoyManager (daemon-resident)

2.2 Shared Observer (`convoy.CheckConvoysForIssue`)

2.3 Key Design Decisions

3. Stories

Legend

Quality Gates (for all implementation stories)

S-01: Event-driven convoy completion detection [DONE]

S-02: Periodic stranded convoy scan [DONE]

S-03: Shared convoy observer function [DONE]

S-04: Witness integration [REMOVED]

S-05: Refinery integration [REMOVED]

S-06: Daemon lifecycle integration [DONE]

S-07: Convoy fields in MR beads [DONE]

S-08: ConvoyManager lifecycle safety [DONE]

S-09: Subprocess context cancellation [DONE]

S-10: Resolved binary paths in operations.go [DONE]

S-11: Test gap -- priority 1 (high blast-radius invariants) [DONE]

S-12: Test gap -- priority 2 (error paths) [DONE]

S-13: Test gap -- priority 3 (lifecycle edge cases) [DONE]

S-14: Test infrastructure improvements [DONE]

S-15: Documentation update [DONE]

S-16: Corrective follow-up for DONE stories [DONE]

S-17: Refinery observer root-path verification [DONE]

S-18: Fix refinery convoy observer town-root path [DONE]

4. Critical Invariants

5. Failure Modes

Event Poll

Stranded Scan

Lifecycle

6. File Map

Core Implementation

Integration Points

Tests

Design Documents

7. Review Findings -> Story Mapping

8. Non-Goals (This Spec)

Test Harness & Remaining Gaps

Harness Scorecard

Test Clock Injection (P3)

Remaining Test Gaps