Shipyard is a cross-platform CI orchestration layer that coordinates validation for AI agents working across parallel worktrees
I’ve been building cross-platform software where every change needs to be validated across Windows, macOS, Linux, Android, and iOS before it can land.
I rely heavily on AI agents to do the coding work. They operate in parallel across worktrees—writing code, committing, opening PRs, and attempting to validate before auto-merging.
At first, everything ran on local VMs. That worked fine when things were mostly sequential. But as I started running agents in parallel, builds began colliding. Jobs stepped on each other. Results became unreliable.
I added a queue to protect the machines.
That solved the collisions. But once everything flowed through a single queue, something more interesting happened: I could coordinate validation across all my execution environments—local machines, VMs, SSH hosts, and cloud runners (e.g. GitHub Actions runners or Namespace).
Agents could:
- run builds
- read failures
- fix issues
- retry
All without my involvement.
I didn’t want to stand up or maintain traditional CI infrastructure. I wanted something lightweight that worked with the machines I already had.
That abstraction became Shipyard.
Shipyard is a thin coordination layer for builds and tests. It doesn’t replace your build system. It answers one question:
Does this commit pass everywhere it needs to?
If you’re building cross-platform software—especially with agents working in parallel—that turns out to matter.
What Makes Shipyard Different
Exact-SHA validation
Every target validates the specific commit you queued, not whatever happens to be checked out. Code is delivered via git bundles, so targets don’t need credentials. Evidence is bound to the SHA, preventing stale results from satisfying merge gates.
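One way to picture SHA-bound evidence is a result store keyed by the full commit SHA. With that key, a pass recorded for an older commit can never be returned when a newer commit is looked up. This is a minimal sketch under that assumption. EvidenceStore, Evidence, and the status strings are hypothetical names, not Shipyard's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    sha: str      # full commit SHA that was validated
    target: str   # e.g. "macos", "windows" (hypothetical target names)
    status: str   # "pass" or "fail"
    backend: str  # which machine or runner produced the result


@dataclass
class EvidenceStore:
    """Hypothetical store in which every result is keyed by (sha, target)."""
    _records: Dict[Tuple[str, str], Evidence] = field(default_factory=dict)

    def record(self, ev: Evidence) -> None:
        self._records[(ev.sha, ev.target)] = ev

    def lookup(self, sha: str, target: str) -> Optional[Evidence]:
        # Exact-SHA match only. Results for any other commit are invisible here,
        # so a stale pass cannot stand in for the commit being checked.
        return self._records.get((sha, target))
```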
Smart queue for parallel agents
All agents share a machine-global queue. Jobs are prioritized and scheduled FIFO within priority. New commits replace older pending jobs on the same branch, while targeted reruns and different validation modes can run independently.
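These scheduling rules can be sketched with a heap plus a sequence counter: priority first, FIFO within a priority, and newer commits superseding pending ones. This is an illustrative sketch, not Shipyard's implementation. The Job fields, the supersession key (branch, mode, target set), and the convention that a lower number means higher priority are all assumptions.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    branch: str
    sha: str
    mode: str = "full"             # hypothetical validation-mode name
    targets: Tuple[str, ...] = ()  # () = all targets; non-empty = targeted rerun
    priority: int = 0              # assumption: lower number runs first


class ValidationQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Job]] = []
        self._seq = itertools.count()      # increasing counter: FIFO tie-breaker
        self._live: Dict[tuple, int] = {}  # supersession key -> seq of newest pending job

    @staticmethod
    def _key(job: Job) -> tuple:
        # Same branch, mode, and target set: a newer commit replaces the older one.
        # Targeted reruns and other modes get different keys, so they stay independent.
        return (job.branch, job.mode, job.targets)

    def submit(self, job: Job) -> None:
        seq = next(self._seq)
        self._live[self._key(job)] = seq  # any older pending entry with this key is now stale
        heapq.heappush(self._heap, (job.priority, seq, job))

    def pop(self) -> Optional[Job]:
        while self._heap:
            _, seq, job = heapq.heappop(self._heap)
            key = self._key(job)
            if self._live.get(key) == seq:  # skip superseded entries lazily
                del self._live[key]
                return job
        return None
```

Lazy deletion keeps submission cheap. A superseded job stays in the heap but is discarded when it reaches the top.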
Fail-fast across targets
If macOS fails, Shipyard stops immediately. It doesn't waste time running Windows and Linux when you already know you need to fix something. Remaining targets are marked as skipped. When you want to run everything regardless (to see the full picture), use --continue.
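Here is a minimal sketch of the fail-fast rule. It assumes targets run one after another in a fixed order. run_targets and the run callback are hypothetical, and the keep_going flag stands in for --continue.

```python
from typing import Callable, Dict, List


def run_targets(
    targets: List[str],
    run: Callable[[str], bool],  # hypothetical: returns True if the target passed
    keep_going: bool = False,    # analogous to --continue
) -> Dict[str, str]:
    results: Dict[str, str] = {}
    failed = False
    for target in targets:
        if failed and not keep_going:
            # A failure was already seen: don't spend time on the remaining targets.
            results[target] = "skipped"
            continue
        ok = run(target)
        results[target] = "pass" if ok else "fail"
        if not ok:
            failed = True
    return results
```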
Targeted re-runs
If Windows fails but macOS and Linux pass, you re-run only Windows. Previous successful results are preserved and reused.
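A targeted re-run amounts to merging fresh results into the stored ones for the same commit. The sketch below assumes results are stored per SHA as a {target: status} map. targeted_rerun and the run callback are hypothetical. Results recorded for a different SHA are deliberately never reused.

```python
from typing import Callable, Dict, Iterable, Optional


def targeted_rerun(
    sha: str,
    previous: Dict[str, Dict[str, str]],  # sha -> {target: status}
    run: Callable[[str, str], bool],      # hypothetical runner: (sha, target) -> passed?
    only: Optional[Iterable[str]] = None, # explicit targets; default is "everything not passing"
) -> Dict[str, str]:
    # Copy so the stored results are not mutated; other SHAs' results are ignored.
    merged = dict(previous.get(sha, {}))
    if only is not None:
        targets = list(only)
    else:
        targets = [t for t, status in merged.items() if status != "pass"]
    for target in targets:
        merged[target] = "pass" if run(sha, target) else "fail"
    return merged
```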
Stage-aware resume
If your build succeeded but tests failed, you don't need to rebuild from scratch. Use --resume-from test to skip configure and build, running only the test stage. This works because Shipyard runs validation in stages (configure → build → test) and tracks which stage failed — so both you and your agent know exactly what broke and where to pick up.
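The staged pipeline and --resume-from test can be modeled as an ordered tuple of stages plus a starting index. This is a minimal sketch. run_stages and the run_stage callback are hypothetical, and resuming assumes the build output from the earlier run is still present on the target.

```python
from typing import Callable, List, Optional, Tuple

STAGES = ("configure", "build", "test")


def run_stages(
    run_stage: Callable[[str], bool],  # hypothetical: returns True if the stage succeeded
    resume_from: str = "configure",
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order starting at resume_from.

    Returns (completed stages, failed stage or None)."""
    start = STAGES.index(resume_from)  # raises ValueError for an unknown stage name
    completed: List[str] = []
    for stage in STAGES[start:]:
        if not run_stage(stage):
            # Report exactly which stage broke, so the next run knows where to resume.
            return completed, stage
        completed.append(stage)
    return completed, None
```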
Failover with intent
If a machine is unavailable, Shipyard walks a fallback chain (VM → cloud → hosted runner). Real test failures do not trigger fallback. Every result records which backend produced it.
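The key distinction is between "the machine could not run the job" and "the job ran and failed." The sketch below models unavailability as an exception and a test verdict as a return value. The backend names, BackendUnavailable, and validate_with_failover are assumptions, not Shipyard's API.

```python
from typing import Callable, Dict, List


class BackendUnavailable(Exception):
    """Infrastructure problem (host offline, runner not provisioned), not a test result."""


def validate_with_failover(
    chain: List[str],               # e.g. ["vm", "cloud", "hosted"] (hypothetical names)
    run: Callable[[str], bool],     # returns pass/fail, or raises BackendUnavailable
) -> Dict[str, object]:
    unavailable: List[str] = []
    for backend in chain:
        try:
            passed = run(backend)
        except BackendUnavailable:
            unavailable.append(backend)
            continue  # only unavailability moves us down the chain
        # A real verdict is final: a failing test is never retried on another backend.
        return {"passed": passed, "backend": backend, "unavailable": unavailable}
    raise BackendUnavailable(f"no backend available (tried {unavailable})")
```

Returning the backend name with the verdict is what lets every result record which backend produced it.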
Transient failure handling
Common SSH failures are retried with backoff. Permanent errors fail immediately.
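The retry policy can be sketched in three steps: classify the failure, retry only if it looks transient, and double the delay each time. Which SSH errors count as transient is an assumption here. The marker strings are common OpenSSH connection messages, not Shipyard's actual list, and run_with_retry and its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple

# Illustrative only: substrings treated as transient network failures.
TRANSIENT_MARKERS = (
    "Connection timed out",
    "Connection refused",
    "Connection reset by peer",
)


def is_transient(stderr: str) -> bool:
    return any(marker in stderr for marker in TRANSIENT_MARKERS)


def run_with_retry(
    attempt: Callable[[], Tuple[int, str]],  # hypothetical: runs the ssh step, returns (exit code, stderr)
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for n in range(1, max_attempts + 1):
        code, stderr = attempt()
        # Success, a permanent error (e.g. authentication), or out of retries: stop now.
        if code == 0 or not is_transient(stderr) or n == max_attempts:
            return code, stderr
        sleep(base_delay * 2 ** (n - 1))  # exponential backoff: 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Passing sleep in as a parameter keeps the policy testable without real delays.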
Ecosystem detection
shipyard init detects 22 common stacks (CMake, Swift, Xcode, Rust, Go, Node, Python, etc.) and infers build/test commands. Polyglot repos are handled without duplication.
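Stack detection of this kind usually keys off marker files at the repository root. The table below is an illustrative subset with conventional commands for each tool. It is not Shipyard's actual rule set or its list of 22 stacks, and detect_stacks and detect_repo are hypothetical. Collecting results by stack name is one way to avoid duplicate entries when several markers point to the same stack.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Illustrative marker-file table: marker -> (stack, build commands, test commands).
MARKERS: Dict[str, Tuple[str, List[str], List[str]]] = {
    "CMakeLists.txt": ("cmake", ["cmake -S . -B build", "cmake --build build"], ["ctest --test-dir build"]),
    "Package.swift": ("swift", ["swift build"], ["swift test"]),
    "Cargo.toml": ("rust", ["cargo build"], ["cargo test"]),
    "go.mod": ("go", ["go build ./..."], ["go test ./..."]),
    "package.json": ("node", ["npm ci"], ["npm test"]),
    "pyproject.toml": ("python", [], ["python -m pytest"]),
    "setup.py": ("python", [], ["python -m pytest"]),
}


def detect_stacks(filenames: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    stacks: Dict[str, Dict[str, List[str]]] = {}
    for name in sorted(set(filenames)):
        if name in MARKERS:
            stack, build, test = MARKERS[name]
            # setdefault keeps one entry per stack, so pyproject.toml and setup.py
            # (both Python) do not produce duplicate commands.
            stacks.setdefault(stack, {"build": build, "test": test})
    return stacks


def detect_repo(root: str = ".") -> Dict[str, Dict[str, List[str]]]:
    # Only the repository root is scanned in this sketch.
    return detect_stacks(p.name for p in Path(root).iterdir() if p.is_file())
```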
Structured output for agents
Every command supports --json with a versioned schema. Agents consume structured results directly.
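Because the output is versioned, an agent can check the schema version before trusting the fields it reads. Shipyard's actual schema isn't described here, so the field names below (schema_version, results, target, status) are hypothetical placeholders. The point of the sketch is the version check, not the layout.

```python
import json
from typing import List

SUPPORTED_MAJOR = 1  # hypothetical schema major version this agent understands


def failed_targets(raw: str) -> List[str]:
    doc = json.loads(raw)
    # Hypothetical field names; refuse to guess at an incompatible major version.
    version = str(doc["schema_version"])
    major = int(version.split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema version {version}")
    return [r["target"] for r in doc["results"] if r["status"] == "fail"]
```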
Profiles
Switch between environments (local, normal, full) without editing config.
Merge gating
shipyard ship merges only when all required platforms pass for the exact HEAD SHA.
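Merge gating reduces to a check over SHA-bound results: every required platform must have a pass recorded for the exact HEAD SHA. This sketch assumes results are stored as a {(sha, target): status} map. blocking_targets and can_merge are hypothetical helpers, not the implementation behind shipyard ship.

```python
from typing import Dict, Iterable, List, Tuple


def blocking_targets(
    head_sha: str,
    required: Iterable[str],
    results: Dict[Tuple[str, str], str],  # (sha, target) -> status
) -> List[str]:
    # A pass recorded for any other SHA does not count, so stale evidence blocks the merge.
    return [t for t in required if results.get((head_sha, t)) != "pass"]


def can_merge(head_sha: str, required: Iterable[str], results: Dict[Tuple[str, str], str]) -> bool:
    return not blocking_targets(head_sha, required, results)
```

Returning the blocking targets, not just a boolean, gives an agent a direct list of what to re-run.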
Operational cleanup
Logs and artifacts are managed automatically, with safe cleanup controls.
Sound Useful?
Try it: