# Scheduler Architecture

Config-driven, capacity-controlled polecat dispatch.
## Quick Start

Enable deferred dispatch and schedule some work:

```bash
# 1. Enable deferred dispatch (config-driven, no per-command flag)
gt config set scheduler.max_polecats 5

# 2. Schedule work via gt sling (auto-defers when max_polecats > 0)
gt sling gt-abc gastown               # Single task bead
gt sling gt-abc gt-def gt-ghi gastown # Batch task beads
gt sling hq-cv-abc                    # Convoy (schedules all tracked issues)
gt sling gt-epic-123                  # Epic (schedules all children)

# 3. Check what's scheduled
gt scheduler status
gt scheduler list

# 4. Dispatch manually (or let the daemon do it)
gt scheduler run
gt scheduler run --dry-run            # Preview first
```
## Dispatch Modes

The `scheduler.max_polecats` config value controls dispatch behavior:

| Value | Mode | Behavior |
|---|---|---|
| `-1` (default) | Direct dispatch | `gt sling` dispatches immediately, near-zero overhead |
| `0` | Direct dispatch | Same as `-1` — `gt sling` dispatches immediately |
| `N > 0` | Deferred dispatch | `gt sling` creates a sling context bead; the daemon dispatches |

No per-invocation flag needed. The same `gt sling` command adapts automatically.
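The decision lives in `shouldDeferDispatch()` in `internal/cmd/sling_schedule.go` (see Code Layout). Here is a minimal sketch of the rule it applies, with a hypothetical `lookup` function standing in for however gt reads config values:

```go
package main

import "fmt"

// shouldDeferDispatch mirrors the documented rule: only a positive
// scheduler.max_polecats enables deferred dispatch; -1 and 0 both
// mean direct dispatch. lookup is a hypothetical config accessor.
func shouldDeferDispatch(lookup func(key string) (int, bool)) bool {
	maxPolecats, ok := lookup("scheduler.max_polecats")
	if !ok {
		return false // unset falls back to the default (-1): direct dispatch
	}
	return maxPolecats > 0
}

func main() {
	cfg := map[string]int{"scheduler.max_polecats": 5}
	lookup := func(key string) (int, bool) { v, ok := cfg[key]; return v, ok }
	fmt.Println(shouldDeferDispatch(lookup)) // true: deferred mode
}
```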
## Common CLI

| Command | Description |
|---|---|
| `gt sling <bead> <rig>` | Sling bead (direct or deferred, per config) |
| `gt sling <bead>... <rig>` | Batch sling/schedule multiple beads |
| `gt sling <convoy-id>` | Sling/schedule all tracked issues in convoy |
| `gt sling <epic-id>` | Sling/schedule all children of epic |
| `gt scheduler status` | Show scheduler state and capacity |
| `gt scheduler list` | List all scheduled beads by rig |
| `gt scheduler run` | Trigger dispatch manually |
| `gt scheduler pause` | Pause all dispatch town-wide |
| `gt scheduler resume` | Resume dispatch |
| `gt scheduler clear` | Remove beads from scheduler |
## Minimal Example

```bash
gt config set scheduler.max_polecats 5
gt sling gt-abc gastown   # Defers: creates sling context bead
gt scheduler status       # "Queued: 1 total, 1 ready"
gt scheduler run          # Dispatches -> spawns polecat -> closes context
```
## Overview

The scheduler provides back-pressure and capacity control for batched polecat dispatch.
Without the scheduler, slinging N beads spawns N polecats simultaneously, exhausting API rate limits, memory, and CPU. The scheduler introduces a governor: beads enter a waiting state and the daemon dispatches them incrementally, respecting a configurable concurrency cap.
The scheduler integrates into the daemon heartbeat as step 14 — after all agent health checks, lifecycle processing, and branch pruning. This ensures the system is healthy before spawning new work.
```text
Daemon heartbeat (every 3 min)
 |
 +- Steps 0-13: Health checks, agent recovery, cleanup
 |
 +- Step 14: gt scheduler run (capacity-controlled dispatch)
     |
     +- flock (exclusive)
     +- Check paused state
     +- Load config (max_polecats, batch_size)
     +- Count active polecats (tmux)
     +- Query sling contexts (bd list --label=gt:sling-context)
     +- Join with bd ready to determine unblocked beads
     +- DispatchCycle.Run() — plan + execute + report
     |    +- PlanDispatch(availableCapacity, batchSize, ready)
     |    +- For each planned bead: Execute → OnSuccess/OnFailure
     +- Wake rig agents (witness, refinery)
     +- Save dispatch state
```
## Sling Context Beads

Scheduling state is stored on separate ephemeral beads called sling contexts. The work bead is never modified by the scheduler.

Each sling context bead:

- Is created via `bd create --ephemeral` with label `gt:sling-context`
- Has a `tracks` dependency pointing to the work bead
- Stores all scheduling parameters as JSON in its description
- Is closed when dispatch succeeds, the bead is cleared, or the circuit breaker trips
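For concreteness, a sketch of that create step as `bd` subprocess calls, in the spirit of the helpers in `internal/beads/beads_sling_context.go`. Only `--ephemeral`, the `gt:sling-context` label, and `--type=tracks` are quoted from this doc; the other flag spellings and the way the new context ID is captured are assumptions:

```go
package main

import (
	"fmt"
	"os/exec"
)

// createSlingContext sketches the schedule-time create: one ephemeral
// bead carrying the JSON params, plus a tracks dependency on the work
// bead. Flag spellings beyond --ephemeral, gt:sling-context, and
// --type=tracks are illustrative, not the real bd CLI surface.
func createSlingContext(workBeadID, paramsJSON string) error {
	create := exec.Command("bd", "create", "--ephemeral",
		"--label", "gt:sling-context",
		"--description", paramsJSON)
	if out, err := create.CombinedOutput(); err != nil {
		return fmt.Errorf("bd create: %v: %s", err, out)
	}
	// The real helper captures the new context bead's ID from bd's
	// output; "<context-id>" stands in for it here.
	dep := exec.Command("bd", "dep", "add", "--type=tracks",
		"<context-id>", workBeadID)
	if out, err := dep.CombinedOutput(); err != nil {
		return fmt.Errorf("bd dep add: %v: %s", err, out)
	}
	return nil
}

func main() {
	err := createSlingContext("gt-abc", `{"version":1,"work_bead_id":"gt-abc"}`)
	fmt.Println(err)
}
```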
### Why Separate Beads?

The previous approach stored scheduling metadata in the work bead's description (as a delimited block) and used labels (`gt:queued`) as state signals. This required:

- Two-step writes with rollback (metadata, then label)
- Description sanitization to avoid delimiter collision
- Three-step dispatch cleanup (strip metadata + swap labels + retry)
- Custom key-value format/parse/strip functions (~250 lines)

Sling context beads eliminate all of this:

- Single atomic create — `bd create --ephemeral` is one operation
- JSON format — `json.Marshal`/`json.Unmarshal` replaces custom parsers
- Work bead pristine — no description mutation, no label manipulation
- Clean lifecycle — open context = scheduled, closed context = done
### Context Fields (JSON)

| Field | Type | Description |
|---|---|---|
| `version` | int | Schema version (currently 1) |
| `work_bead_id` | string | The actual work bead being scheduled |
| `target_rig` | string | Destination rig name |
| `formula` | string | Formula to apply at dispatch (e.g., `mol-polecat-work`) |
| `args` | string | Natural language instructions for executor |
| `vars` | string | Newline-separated formula variables (`key=value`) |
| `enqueued_at` | RFC3339 | Timestamp of schedule |
| `merge` | string | Merge strategy: `direct`, `mr`, `local` |
| `convoy` | string | Convoy bead ID (set after auto-convoy creation) |
| `base_branch` | string | Override base branch for polecat worktree |
| `no_merge` | bool | Skip merge queue on completion |
| `account` | string | Claude Code account handle |
| `agent` | string | Agent/runtime override |
| `hook_raw_bead` | bool | Hook without default formula |
| `owned` | bool | Caller-managed convoy lifecycle |
| `mode` | string | Execution mode: `ralph` (fresh context per step) |
| `dispatch_failures` | int | Consecutive failure count (circuit breaker) |
| `last_failure` | string | Most recent dispatch error message |
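The table maps naturally onto a Go struct with JSON tags. A sketch of the shape, assuming plausible Go types; the real `SlingContextFields` lives in `internal/scheduler/capacity/pipeline.go` and may differ in detail:

```go
package capacity

import "time"

// SlingContextFields is a sketch of the JSON schema tabulated above.
// Field names and JSON keys come from the doc; the Go types are a
// plausible mapping, not the verified originals.
type SlingContextFields struct {
	Version          int       `json:"version"`           // schema version (currently 1)
	WorkBeadID       string    `json:"work_bead_id"`      // the actual work bead being scheduled
	TargetRig        string    `json:"target_rig"`        // destination rig name
	Formula          string    `json:"formula"`           // formula applied at dispatch
	Args             string    `json:"args"`              // natural language executor instructions
	Vars             string    `json:"vars"`              // newline-separated key=value formula vars
	EnqueuedAt       time.Time `json:"enqueued_at"`       // marshals as RFC3339
	Merge            string    `json:"merge"`             // direct, mr, or local
	Convoy           string    `json:"convoy"`            // convoy bead ID, set after auto-convoy
	BaseBranch       string    `json:"base_branch"`       // override base branch for worktree
	NoMerge          bool      `json:"no_merge"`          // skip merge queue on completion
	Account          string    `json:"account"`           // Claude Code account handle
	Agent            string    `json:"agent"`             // agent/runtime override
	HookRawBead      bool      `json:"hook_raw_bead"`     // hook without default formula
	Owned            bool      `json:"owned"`             // caller-managed convoy lifecycle
	Mode             string    `json:"mode"`              // e.g. ralph: fresh context per step
	DispatchFailures int       `json:"dispatch_failures"` // consecutive failures (circuit breaker)
	LastFailure      string    `json:"last_failure"`      // most recent dispatch error
}
```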
## Bead State Machine

A sling context transitions through these states:

```text
          +----------+   dispatch ok   +--------+
 schedule | CONTEXT  | --------------> | CLOSED |
 -------> |   OPEN   |                 | (done) |
          +----------+                 +--------+
             |    ^
             |    +-- dispatch failed (< 3): stays open, retried next cycle
             |
             +-- 3 failures --> CLOSED (circuit-broken)
             |
             +-- gt scheduler clear --> CLOSED (cleared)
```
| State | Representation | Trigger |
|---|---|---|
| SCHEDULED | Open sling context bead | `scheduleBead()` |
| DISPATCHED | Closed sling context (reason: "dispatched") | `dispatchSingleBead()` success |
| CIRCUIT-BROKEN | Closed sling context (reason: "circuit-broken") | `dispatch_failures >= 3` |
| CLEARED | Closed sling context (reason: "cleared") | `gt scheduler clear` |
Key invariant: the work bead is never modified by the scheduler. All state lives on the sling context bead.
## Entry Points

### CLI Entry Points

`gt sling` auto-detects the dispatch mode from config and the ID type:

| Command | Direct Mode (`max_polecats=-1`) | Deferred Mode (`max_polecats>0`) |
|---|---|---|
| `gt sling <bead> <rig>` | Immediate dispatch | Schedule for later dispatch |
| `gt sling <bead>... <rig>` | Batch immediate dispatch | Batch schedule |
| `gt sling <epic-id>` | `runEpicSlingByID()` — dispatch all children | `runEpicScheduleByID()` — schedule all children |
| `gt sling <convoy-id>` | `runConvoySlingByID()` — dispatch all tracked | `runConvoyScheduleByID()` — schedule all tracked |
Detection chain in `runSling`:

1. `shouldDeferDispatch()` — check `scheduler.max_polecats` config
2. Batch (3+ args, last is rig) — `runBatchSchedule()` or `runBatchSling()`
3. `--on` flag set — formula-on-bead mode
4. 2 args + last is rig — `scheduleBead()` or inline dispatch
5. 1 arg — auto-detect type: epic/convoy/task

All schedule paths go through `scheduleBead()` in `internal/cmd/sling_schedule.go`.
All dispatch goes through `dispatchScheduledWork()` in `internal/cmd/capacity_dispatch.go`.
### Daemon Entry Point

The daemon calls `gt scheduler run` as a subprocess on each heartbeat (step 14):

```go
// internal/daemon/daemon.go
func (d *Daemon) dispatchScheduledWork() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	cmd := exec.CommandContext(ctx, "gt", "scheduler", "run")
	cmd.Env = append(os.Environ(), "GT_DAEMON=1", "BD_DOLT_AUTO_COMMIT=off")
	// ...
}
```
| Property | Value |
|---|---|
| Timeout | 5 minutes |
| Environment | `GT_DAEMON=1` (identifies daemon dispatch) |
| Gating | `scheduler.max_polecats > 0` (deferred mode) |
## Schedule Path

`scheduleBead()` performs these steps in order:

1. Validate bead exists, rig exists
2. Cross-rig guard — reject if bead prefix doesn't match target rig (unless `--force`)
3. Idempotency — skip if an open sling context already exists for this work bead
4. Status guard — reject if bead is hooked/in_progress (unless `--force`)
5. Validate formula — verify formula exists (lightweight, no side effects)
6. Cook formula — `bd cook` to catch bad protos before daemon dispatch
7. Build context fields — `SlingContextFields` struct with all sling params
8. Create sling context — `bd create --ephemeral` + `bd dep add --type=tracks` (atomic)
9. Auto-convoy — create convoy if not already tracked, store convoy ID in context fields
10. Log event — feed event for dashboard visibility

The create is a single atomic operation — no two-step write, no rollback needed.
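A sketch of the step-3 idempotency guard, under the assumption that `bd list` supports the `--json` flag this doc shows for `bd ready`, and that each context bead's description holds the `SlingContextFields` JSON:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// contextBead is the slice element we assume bd list --json returns;
// only the fields this check needs are modeled.
type contextBead struct {
	ID          string `json:"id"`
	Description string `json:"description"` // holds the SlingContextFields JSON
}

// isScheduled reports whether an open sling context already tracks
// the given work bead, mirroring the step-3 idempotency guard.
func isScheduled(workBeadID string) (bool, error) {
	out, err := exec.Command("bd", "list",
		"--label=gt:sling-context", "--status=open", "--json").Output()
	if err != nil {
		return false, err
	}
	var contexts []contextBead
	if err := json.Unmarshal(out, &contexts); err != nil {
		return false, err
	}
	for _, c := range contexts {
		var fields struct {
			WorkBeadID string `json:"work_bead_id"`
		}
		if json.Unmarshal([]byte(c.Description), &fields) == nil &&
			fields.WorkBeadID == workBeadID {
			return true, nil // already scheduled: skip the create
		}
	}
	return false, nil
}

func main() {
	ok, err := isScheduled("gt-abc")
	fmt.Println(ok, err)
}
```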
## Dispatch Engine

### DispatchCycle

The dispatch loop is a generic orchestrator with injected callbacks:

```go
type DispatchCycle struct {
	AvailableCapacity func() (int, error)           // Free dispatch slots (0=unlimited)
	QueryPending      func() ([]PendingBead, error) // Work items eligible for dispatch
	Execute           func(PendingBead) error       // Dispatch a single item
	OnSuccess         func(PendingBead) error       // Post-dispatch cleanup
	OnFailure         func(PendingBead, error)      // Failure handling
	BatchSize         int
	SpawnDelay        time.Duration
}
```
`Run()` internally calls `PlanDispatch(availableCapacity, batchSize, ready)` to determine what to dispatch, then executes each planned item with callbacks.
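A minimal sketch of `Run()`, repeating the struct above for self-containment; reporting is elided, and the planning step is inlined where the real code calls `PlanDispatch()`:

```go
package capacity

import "time"

// Minimal stand-ins for the types named in this doc; only the fields
// this sketch touches are shown.
type PendingBead struct {
	ID         string // sling context bead ID
	WorkBeadID string // the tracked work bead
}

type DispatchCycle struct {
	AvailableCapacity func() (int, error)
	QueryPending      func() ([]PendingBead, error)
	Execute           func(PendingBead) error
	OnSuccess         func(PendingBead) error
	OnFailure         func(PendingBead, error)
	BatchSize         int
	SpawnDelay        time.Duration
}

// Run sketches the plan + execute loop described above. Reporting is
// elided, and planning is inlined where the real code calls
// PlanDispatch().
func (c *DispatchCycle) Run() error {
	capacity, err := c.AvailableCapacity()
	if err != nil {
		return err
	}
	ready, err := c.QueryPending()
	if err != nil {
		return err
	}
	n := min(capacity, c.BatchSize, len(ready)) // deferred-mode formula
	if n < 0 {
		n = 0 // negative capacity means no free slots
	}
	for _, b := range ready[:n] {
		if err := c.Execute(b); err != nil {
			c.OnFailure(b, err) // count the failure; maybe circuit-break
		} else if err := c.OnSuccess(b); err != nil {
			return err
		}
		time.Sleep(c.SpawnDelay) // throttle between spawns
	}
	return nil
}
```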
### Dispatch Flow

```text
DispatchCycle.Run()
 |
 +- AvailableCapacity() → capacity = maxPolecats - activePolecats
 |
 +- QueryPending() → getReadySlingContexts():
 |   +- bd list --label=gt:sling-context --status=open (all rig DBs)
 |   +- Parse SlingContextFields from each context bead description
 |   +- bd ready --json --limit=0 (all rig DBs) → readyWorkIDs set
 |   +- Filter: context beads whose WorkBeadID is in readyWorkIDs
 |   +- Skip circuit-broken (dispatch_failures >= threshold)
 |
 +- PlanDispatch(capacity, batchSize, ready)
 |   +- Returns DispatchPlan{ToDispatch, Skipped, Reason}
 |
 +- For each planned bead:
     +- Execute: ReconstructFromContext(fields) → executeSling(params)
     +- OnSuccess: CloseSlingContext(contextID, "dispatched")
     +- OnFailure: increment dispatch_failures, update context, maybe close
     +- sleep(SpawnDelay)
```
### dispatchSingleBead

Dramatically simplified — context fields are already parsed:

1. `ReconstructFromContext(b.Context)` → `DispatchParams` with `BeadID = b.WorkBeadID`
2. Call `executeSling(params)` — that's it

Post-dispatch cleanup is handled by callbacks:

- OnSuccess: `CloseSlingContext(b.ID, "dispatched")`
- OnFailure: increment `dispatch_failures`, update context bead, close if circuit-broken
## Capacity Management

### Configuration

| Key | Type | Default | Description |
|---|---|---|---|
| `scheduler.max_polecats` | *int | -1 | Max concurrent polecats (-1 or 0 = direct dispatch, N > 0 = deferred) |
| `scheduler.batch_size` | *int | 1 | Beads dispatched per heartbeat tick |
| `scheduler.spawn_delay` | string | "0s" | Delay between spawns (eases Dolt lock contention) |
Set via `gt config set`:

```bash
gt config set scheduler.max_polecats 5   # Enable deferred dispatch
gt config set scheduler.max_polecats -1  # Direct dispatch (default)
gt config set scheduler.batch_size 2
gt config set scheduler.spawn_delay 3s
```
### Dispatch Count Formula

```text
toDispatch = min(capacity, batchSize, readyCount)

where:
  capacity   = maxPolecats - activePolecats  (positive = that many slots; 0 or negative = no capacity)
  batchSize  = scheduler.batch_size          (default 1)
  readyCount = sling contexts whose work bead appears in bd ready
```
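Worked example: with `max_polecats=5` and 3 active polecats, capacity is 2; with `batch_size=2` and 7 ready contexts, `toDispatch = min(2, 2, 7) = 2`. A sketch of `PlanDispatch()` under these deferred-mode semantics, reusing the `PendingBead` stand-in from the `Run()` sketch above (the `0 = unlimited` capacity case noted on `AvailableCapacity` is not modeled here):

```go
package capacity

// DispatchPlan follows the field names used in this doc; the exact
// shape is a sketch. PlanDispatch implements the min() formula above.
type DispatchPlan struct {
	ToDispatch []PendingBead // contexts to dispatch this tick
	Skipped    int           // ready contexts left for a later tick
	Reason     string        // why anything was held back
}

func PlanDispatch(capacity, batchSize int, ready []PendingBead) DispatchPlan {
	n := min(capacity, batchSize, len(ready))
	if n <= 0 {
		return DispatchPlan{Skipped: len(ready), Reason: "no capacity"}
	}
	plan := DispatchPlan{ToDispatch: ready[:n], Skipped: len(ready) - n}
	if plan.Skipped > 0 {
		plan.Reason = "capacity/batch limit"
	}
	return plan
}
```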
### Active Polecat Counting

Active polecats are counted by scanning tmux sessions and matching the role via `session.ParseSessionName()`. This counts all polecats (both scheduler-dispatched and directly slung) because API rate limits, memory, and CPU are shared resources.
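A sketch of that probe using `tmux list-sessions`; the session-name matching here is a hypothetical stand-in for `session.ParseSessionName()`:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// countActivePolecats sketches the capacity probe: list tmux sessions
// and count the ones whose parsed role is polecat. The naming
// convention assumed here (role embedded in the session name) is
// illustrative, not the real parser.
func countActivePolecats() (int, error) {
	out, err := exec.Command("tmux", "list-sessions", "-F", "#{session_name}").Output()
	if err != nil {
		return 0, err // a missing tmux server also lands here
	}
	count := 0
	for _, name := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		// Hypothetical stand-in for session.ParseSessionName(name).
		if strings.Contains(name, "polecat") {
			count++
		}
	}
	return count, nil
}

func main() {
	n, err := countActivePolecats()
	fmt.Println(n, err)
}
```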
## Circuit Breaker

The circuit breaker prevents permanently failing beads from causing infinite retry loops.
| Property | Value |
|---|---|
| Threshold | `maxDispatchFailures = 3` |
| Counter | `dispatch_failures` field in sling context JSON |
| Break action | Close sling context (reason: "circuit-broken") |
| Reset | No automatic reset (manual intervention required) |
### Flow

```text
Dispatch attempt fails
 |
 +- Increment dispatch_failures in context bead
 +- Store last_failure error message
 |
 +- dispatch_failures >= 3?
    +- Yes -> CloseSlingContext(contextID, "circuit-broken")
    |         (context bead closed, work bead untouched)
    +- No  -> bead stays scheduled, retried next cycle
```
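A sketch of the failure callback, reusing the `PendingBead` and `SlingContextFields` stand-ins from earlier sketches; `update` and `closeCtx` are hypothetical stand-ins for the update/close helpers in `internal/beads/beads_sling_context.go`:

```go
package capacity

const maxDispatchFailures = 3 // threshold quoted above

// onDispatchFailure sketches the failure callback: bump the counter,
// record the error, and close the context once the breaker trips.
func onDispatchFailure(
	b PendingBead,
	fields *SlingContextFields,
	dispatchErr error,
	update func(contextID string, f *SlingContextFields) error,
	closeCtx func(contextID, reason string) error,
) error {
	fields.DispatchFailures++
	fields.LastFailure = dispatchErr.Error()
	if fields.DispatchFailures >= maxDispatchFailures {
		// Circuit-broken: close the sling context. The work bead is
		// untouched either way.
		return closeCtx(b.ID, "circuit-broken")
	}
	// Below the threshold: persist the counter; the bead stays
	// scheduled and is retried next cycle.
	return update(b.ID, fields)
}
```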
## Scheduler Control

### Pause / Resume

Pausing stops all dispatch town-wide. The state is stored in `.runtime/scheduler-state.json`.

```bash
gt scheduler pause    # Sets paused=true, records actor and timestamp
gt scheduler resume   # Clears paused state
```

The write is atomic (temp file + rename) to prevent corruption from concurrent writers.
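A sketch of that atomic write pattern; the field set of the state file beyond the paused flag, actor, and timestamp is an assumption:

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
	"time"
)

// SchedulerState mirrors the pause state described above; the exact
// schema of scheduler-state.json is an assumption.
type SchedulerState struct {
	Paused   bool      `json:"paused"`
	PausedBy string    `json:"paused_by"`
	PausedAt time.Time `json:"paused_at"`
}

// saveState writes atomically: marshal to a temp file in the same
// directory, then rename over the target. Rename is atomic on POSIX,
// so concurrent readers never observe a half-written file.
func saveState(path string, s SchedulerState) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".scheduler-state-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op once the rename succeeds
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	_ = saveState(".runtime/scheduler-state.json",
		SchedulerState{Paused: true, PausedBy: "ops", PausedAt: time.Now()})
}
```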
### Clear

Closes sling context beads, removing beads from the scheduler:

```bash
gt scheduler clear                 # Close ALL sling contexts
gt scheduler clear --bead gt-abc   # Close context for specific bead
```

### Status / List

```bash
gt scheduler status        # Summary: paused, queued count, active polecats
gt scheduler status --json # JSON output
gt scheduler list          # Beads grouped by target rig, with blocked indicator
gt scheduler list --json   # JSON output
```

`list` reconciles sling contexts (all scheduled) with `bd ready` (unblocked work beads) to mark blocked beads.
## Scheduler and Convoy Integration

Convoys and the scheduler are complementary but distinct mechanisms. Convoys track completion of related beads; the scheduler controls dispatch capacity. Two paths exist for dispatching convoy work:
### Dispatch Paths

| Path | Trigger | Capacity Control | Use Case |
|---|---|---|---|
| Direct dispatch | `gt sling <convoy-id>` (`max_polecats=-1`) | None (fires immediately) | Default mode — all issues dispatch at once |
| Deferred dispatch | `gt sling <convoy-id>` (`max_polecats>0`) | Yes (daemon heartbeat, `max_polecats`, `batch_size`) | Capacity-controlled — batched with back-pressure |
**Direct dispatch (`max_polecats=-1`):** `gt sling <convoy-id>` calls `runConvoySlingByID()`, which dispatches all open tracked issues immediately via `executeSling()`. Each issue's rig is auto-resolved from its bead ID prefix. No capacity control — all issues dispatch at once.

**Deferred dispatch (`max_polecats>0`):** `gt sling <convoy-id>` calls `runConvoyScheduleByID()`, which schedules all open tracked issues (creating sling context beads). The daemon dispatches them incrementally via `gt scheduler run`, respecting `max_polecats` and `batch_size`. Use this for large batches where simultaneous dispatch would exhaust resources.
### When to Use Which

- Small convoys (< 5 issues): direct dispatch (default, `max_polecats=-1`)
- Large batches (5+ issues): set `scheduler.max_polecats` for capacity-controlled dispatch
- Epics: same logic — `gt sling <epic-id>` auto-resolves mode from config
### Rig Resolution

`gt sling <convoy-id>` and `gt sling <epic-id>` auto-resolve the target rig per bead from its ID prefix using `beads.ExtractPrefix()` + `beads.GetRigNameForPrefix()`. Town-root beads (`hq-*`) are skipped with a warning since they are coordination artifacts, not dispatchable work.
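A sketch of per-bead resolution; the prefix rule (text before the first `-`) and the rig table here are illustrative stand-ins for `beads.ExtractPrefix()` and `beads.GetRigNameForPrefix()`:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveRig sketches per-bead rig resolution from the ID prefix.
// The prefix rule and rig table are hypothetical stand-ins for the
// real helpers in internal/beads.
func resolveRig(beadID string, rigForPrefix map[string]string) (string, bool) {
	prefix, _, ok := strings.Cut(beadID, "-")
	if !ok {
		return "", false
	}
	if prefix == "hq" {
		// Town-root beads are coordination artifacts, not dispatchable
		// work: skip with a warning.
		fmt.Printf("warning: skipping town-root bead %s\n", beadID)
		return "", false
	}
	rig, found := rigForPrefix[prefix]
	return rig, found
}

func main() {
	rigs := map[string]string{"gt": "gastown"}
	fmt.Println(resolveRig("gt-abc", rigs))    // gastown true
	fmt.Println(resolveRig("hq-cv-abc", rigs)) // skipped with warning
}
```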
## Safety Properties

| Property | Mechanism |
|---|---|
| Schedule idempotency | Skip if open sling context already exists for work bead |
| Work bead pristine | Scheduler never modifies work bead description or labels |
| Cross-rig guard | Reject if bead prefix doesn't match target rig (unless `--force`) |
| Dispatch serialization | `flock(scheduler-dispatch.lock)` prevents double-dispatch |
| Atomic scheduling | Single `bd create --ephemeral` — no two-step write, no rollback |
| Formula pre-cooking | `bd cook` at schedule time catches bad protos before daemon dispatch loop |
| Fresh state on save | Dispatch re-reads state before saving to avoid clobbering concurrent pause |
## Code Layout

| Path | Purpose |
|---|---|
| `internal/scheduler/capacity/config.go` | `SchedulerConfig` type, defaults, `IsDeferred()` |
| `internal/scheduler/capacity/pipeline.go` | `PendingBead`, `SlingContextFields`, `PlanDispatch()`, `ReconstructFromContext()` |
| `internal/scheduler/capacity/dispatch.go` | `DispatchCycle` type — generic dispatch orchestrator |
| `internal/scheduler/capacity/state.go` | `SchedulerState` persistence |
| `internal/beads/beads_sling_context.go` | Sling context CRUD (create, find, list, close, update) |
| `internal/cmd/sling.go` | CLI entry, config-driven routing |
| `internal/cmd/sling_schedule.go` | `scheduleBead()`, `shouldDeferDispatch()`, `isScheduled()` |
| `internal/cmd/scheduler.go` | `gt scheduler` command tree |
| `internal/cmd/scheduler_epic.go` | Epic schedule/sling handlers |
| `internal/cmd/scheduler_convoy.go` | Convoy schedule/sling handlers |
| `internal/cmd/capacity_dispatch.go` | `dispatchScheduledWork()`, dispatch callback wiring |
| `internal/daemon/daemon.go` | Heartbeat integration (`gt scheduler run`) |
## See Also

- Watchdog Chain — Daemon heartbeat, where scheduler dispatch runs as step 14
- Convoys — Convoy tracking, auto-convoy on schedule
- Property Layers — Labels-as-state pattern formerly used by the scheduler (see Operational State Events section)