
AFK Builds with 100% Green PRs (Chunk Sidecars Inside the Agent Loop)
Five RalphCI Snake game builds with Chunk sidecars in the Review Gate: AFK agent loops, pre-push CI parity, and 100% green pull requests end-to-end.
I had not touched the keyboard in twenty minutes. CI pipeline #1205 was green anyway. An agent had built another Snake game from our standard Loop Lab benchmark while I stayed out of the way. My local Review Gate failed three times first, each time on a Chunk sidecar check, not on my local lint-and-test run. All before a single push to GitHub.
If sidecars are new to you, a Chunk sidecar is a lightweight remote microVM that sits in your inner loop. Your workspace syncs there, Chunk runs a microbuild (lint, tests, build, and the repo policy hooks you configured), and you get CI-shaped feedback on Linux before your branch leaves the laptop.
CircleCI CTO Rob Zuber describes the goal as rebalancing inner and outer loop validation so agents are not learning about failures only after push.
Our three early sidecar failures were unrelated. First, a missing LEGAL_DISCLAIMER.md at the repo root (legal policy gate). Second, a missing SNAKE.md marker file in the experiment workspace (documentation policy gate). Third, a TypeScript compile error the Snake workspace's local tests never exercised but the sidecar's full pnpm build caught.
All three times, the lint-and-test run on my Mac localhost said green, but the red Chunk sidecar said "fix this stuff." CI Doctor landed one fix(ci-sidecar): commit per issue. Only then did my local Review Gate allow a push to GitHub.
That is a small lab story. We think it is a bigger CI story.
We humbly believe we are building toward the future of continuous integration in the agent era: validation woven through the whole loop, ending on a 100% green pull request you can merge with confidence.
Local lint and tests first. A Chunk sidecar microbuild that mirrors your pipeline while the agent still has context. Then CircleCI on the branch, doing what CI has always done, checking integration before a human approves merge.
AFK application builds that land as 100% green PRs are what we can demo today. The thesis underneath: CI validation that keeps up with the pace of your local coding agents.
We ran five Snake game builds to pressure-test that future. What follows is the lab report.
Hypothesis
If RalphCI runs Chunk sidecar validation (chunk validate --remote) after local lint:fix and test:run, then:
- A coding agent can run mostly unattended through build and fix cycles (while a CI Doctor agent handles sidecar and pipeline failures locally).
- Commits that pass the local Review Gate and push to GitHub show up as a 100% green PR, with every mirrored CI job passing on the branch.
- CircleCI remains the authority at merge time. Sidecars extend CI into the inner loop; they do not replace it.
Our February 10 study already showed CI-aware RalphCI can get to green pipelines on this Snake benchmark when the agent answers to CircleCI after push. This experiment adds pre-push CI parity via sidecars so the PR itself is green top to bottom, not just on its final commit.
Setup
Benchmark: Five independent runs in ralph-ci/experiments/w_chunk-sidecars/iteration-{1..5}/. Same Snake spec as prior Loop Lab RalphCI work: 7 tasks, Vitest TDD, canvas game.
The stack (from top to bottom):
| Layer | What runs |
|---|---|
| Build Agent | Codes against tasks.json |
| Review Gate (local) | format:fix, lint:fix, test:run (60s cap) |
| Review Gate (Chunk) | chunk validate --remote on Chunk sidecar (Linux microVM) |
| Smart push | Push only when local review gate passes |
| CircleCI | Full ci-workflow on the branch after push |
| CI Doctor | Fixes sidecar or pipeline failures with full logs |
| Human in the loop | Approval gate before merge (config default) |
Chunk mirrors CI: .chunk/config.json runs policy gates (--gate-legal, --gate-snake) plus remote pnpm lint, pnpm test:run, pnpm build, matching CircleCI jobs on the branch.
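The ordering in the stack table can be sketched as a small gate function. This is a minimal illustration, not RalphCI's actual code: the `Runner` type, `reviewGate`, and the result shape are hypothetical names, and how each step is invoked is left to the injected runner. Only the step names, the `chunk validate --remote` command, and the 60s cap come from the table above. Applying the cap to every local step is an assumption.

```typescript
// Hypothetical sketch of the Review Gate ordering; names are illustrative only.
type StepResult = { ok: boolean; log: string };
type Runner = (step: string, timeoutMs?: number) => StepResult;

interface GateResult {
  pushAllowed: boolean;
  failedStep: string | null;
  layer: "local" | "sidecar" | null;
}

// Step names as listed in the stack table.
const LOCAL_STEPS = ["format:fix", "lint:fix", "test:run"];
const SIDECAR_STEP = "chunk validate --remote";
// Assumption: the 60s cap is applied to each local step.
const LOCAL_TIMEOUT_MS = 60_000;

function reviewGate(run: Runner): GateResult {
  // Layer one: cheap local checks, stopping at the first failure.
  for (const step of LOCAL_STEPS) {
    if (!run(step, LOCAL_TIMEOUT_MS).ok) {
      return { pushAllowed: false, failedStep: step, layer: "local" };
    }
  }
  // Layer two: the sidecar runs only after local checks pass.
  if (!run(SIDECAR_STEP).ok) {
    return { pushAllowed: false, failedStep: SIDECAR_STEP, layer: "sidecar" };
  }
  // Both layers green: smart push may proceed, and CircleCI takes over.
  return { pushAllowed: true, failedStep: null, layer: null };
}
```

Injecting the runner keeps the ordering testable without shelling out. A real runner would spawn the commands and map exit codes to `ok`.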
Sample size: 5 runs. Lab notebook scale.
Results
Five runs at a glance
| Run | RalphCI iterations | Sidecar CI Doctor fixes | Tasks completed | Final branch outcome |
|---|---|---|---|---|
| 1 | 11 | 3 | 7/7 | Green PR (all checks) |
| 2 | 13 | 4 | 7/7 | Green PR (all checks) |
| 3 | 11 | 3 | 7/7 | Green PR (all checks) |
| 4 | 11 | 3 | 7/7 | Green PR (all checks) |
| 5 | 11 | 3 | 7/7 | Green PR (all checks) |
Totals: 57 RalphCI iterations, 16 sidecar repair loops, 5/5 runs ended with all tasks complete and green CircleCI on the branch.
Source: iteration-*/metrics.json and activity.md.
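For readers who want to re-derive the totals, the arithmetic can be reproduced from the table rows. The `RunRow` shape and `summarize` helper below are hypothetical and do not reflect the real layout of metrics.json. The numbers are copied from the table above.

```typescript
// Hypothetical row shape; values are copied from the results table.
interface RunRow {
  run: number;
  iterations: number;
  sidecarFixes: number;
  tasksDone: number;
  tasksTotal: number;
}

const runs: RunRow[] = [
  { run: 1, iterations: 11, sidecarFixes: 3, tasksDone: 7, tasksTotal: 7 },
  { run: 2, iterations: 13, sidecarFixes: 4, tasksDone: 7, tasksTotal: 7 },
  { run: 3, iterations: 11, sidecarFixes: 3, tasksDone: 7, tasksTotal: 7 },
  { run: 4, iterations: 11, sidecarFixes: 3, tasksDone: 7, tasksTotal: 7 },
  { run: 5, iterations: 11, sidecarFixes: 3, tasksDone: 7, tasksTotal: 7 },
];

function summarize(rows: RunRow[]) {
  const iterations = rows.reduce((sum, r) => sum + r.iterations, 0);
  const sidecarFixes = rows.reduce((sum, r) => sum + r.sidecarFixes, 0);
  const allTasksComplete = rows.every((r) => r.tasksDone === r.tasksTotal);
  return { iterations, sidecarFixes, allTasksComplete };
}

const totals = summarize(runs);
// totals: { iterations: 57, sidecarFixes: 16, allTasksComplete: true }
```

That works out to roughly 11.4 iterations and 3.2 sidecar repairs per run, with run 2 as the only outlier.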
Local green is not PR green
Every run, early iterations: Review Gate fails on Chunk sidecar, not local lint-and-test run. On all runs, the first three sidecar fixes were: restore LEGAL_DISCLAIMER.md (legal gate), add the SNAKE.md experiment marker file (documentation gate), then fix a tsc error in run-ci.ts surfaced by the monorepo build step.
Same pattern across runs: failures local tests never see. CI Doctor agent fixes them without a human in the loop. Only then does the local Review Gate allow a push to GitHub.
| Layer | Typical signal on early commits |
|---|---|
| Local pnpm test:run | Pass |
| Chunk sidecar microbuild | Fail (fix CI-shaped feedback until sidecar passes) |
| CircleCI on the branch | Green once sidecar gate passes |
After the sidecar gate cleared, pushes stayed green on CircleCI. All CircleCI pipelines passed upon push to remote. No surprise reds on mirrored work.
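The repair pattern described in this section (sidecar fails, CI Doctor lands one fix per issue, the gate re-runs) can be sketched as a bounded loop. This is an illustration under assumptions: `repairUntilGreen`, the `LoopDeps` hooks, and the `maxAttempts` bound are hypothetical, not RalphCI's API. Only the `fix(ci-sidecar):` commit prefix and the three failure types come from the runs.

```typescript
// Hypothetical sketch of a sidecar repair loop; hook names are illustrative.
type Validation = { ok: boolean; failure?: string };

interface LoopDeps {
  // Stands in for running `chunk validate --remote` and parsing the result.
  validate: () => Validation;
  // Stands in for CI Doctor: fixes one issue, returns the commit subject.
  repair: (failure: string) => string;
}

function repairUntilGreen(
  deps: LoopDeps,
  maxAttempts = 5, // assumption: a cap so a stuck agent cannot loop forever
): { green: boolean; commits: string[] } {
  const commits: string[] = [];
  for (let attempt = 0; attempt <= maxAttempts; attempt++) {
    const result = deps.validate();
    if (result.ok) return { green: true, commits };
    if (attempt === maxAttempts) break; // out of repair budget, still red
    // One repair commit per failing validation, mirroring the runs above.
    commits.push(deps.repair(result.failure ?? "unknown failure"));
  }
  return { green: false, commits };
}

// Usage: simulate the three early failures seen in every run.
const pending = [
  "restore LEGAL_DISCLAIMER.md",
  "add SNAKE.md marker",
  "fix tsc error in run-ci.ts",
];
const outcome = repairUntilGreen({
  validate: () =>
    pending.length === 0 ? { ok: true } : { ok: false, failure: pending[0] },
  repair: (failure) => {
    pending.shift();
    return `fix(ci-sidecar): ${failure}`;
  },
});
```

In this simulation the loop ends green after three `fix(ci-sidecar):` commits, which is the shape of four of the five runs. A push would only follow a `green: true` result.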
Compared to our February baseline
On the same Snake benchmark without sidecars in the Review Gate, agents often finished locally while the branch still failed CircleCI.
With sidecars before push plus CI on the branch, five for five ended as 100% green PRs in our runs: build, lint, policy gates, and test jobs all passing on the branch after push to GitHub.
The continuous integration breakthrough is three layers: local, sidecar, pipeline. AFK works when failure shows up in layer two, so layer three confirms instead of contradicts.
Takeaway
Rob Zuber framed inner-loop validation for agents. RalphCI plus Chunk sidecars are one way to wire that into an orchestration loop so AFK does not mean "ship and pray."
What held across five Snake game builds:
- Sidecars catch CI-shaped failures while the agent still has context.
- A 100% green PR needs three layers: local, sidecar, pipeline. Skip one, get surprised downstream.
- Humans review green branches ONLY. They never see red, not even on a single commit.
Green CI is still priceless. Chunk sidecars are how you earn a 100% green PR before you even ask a human to look, and a green PR instills more confidence in the agent's work than a red one.
The artifact: a 100% green PR ready for review
This is the outcome we care about. A pull request an agent built while I was AFK, with every last CircleCI check green on the branch after push to GitHub.

Twelve commits on the branch. Task work, three fix(ci-sidecar): repairs, finalize, metrics. At the bottom of the page: All checks have passed. Five green jobs:
- ci/circleci: build
- ci/circleci: lint
- ci/circleci: require-legal-disclaimer-md
- ci/circleci: require-snake-md
- ci/circleci: test
No conflicts with base. Merge pull request is the honest end state.
That screenshot is what we mean by AFK builds and 100% green PRs: every CircleCI check green before a human opens the branch, because validation ran local, then on a Chunk sidecar, then on the pipeline, and the agent fixed failures in between.
Five Snake game builds proved the workflow. In future runs, we may expand the repo complexity and scope, and the review gates along with it, to see how far we can push this thesis.
But we will keep the overall stack, because we humbly think that this is where continuous integration is going.