/loop Gave Claude a Heartbeat. I Gave It a Microphone.

I wanted to see if Claude Code's /loop command could do something it probably wasn't designed for: run a live play-by-play announcer for AI model eval races, speaking commentary out loud through my laptop speakers every few minutes for the duration of a 45-minute race.

Sixteen commits later, it works. Six AI models racing head-to-head on eval tasks, and Claude is calling the action in real time like a motorsport broadcaster with a Southern drawl and opinions about latency.

Watch the full demo on YouTube.

Hypothesis

I expected Claude Code's /loop command to be stable enough to serve as a cron-style backbone for a recurring, stateful task. Specifically: read live log data every N minutes, generate contextual commentary based on what changed since the last check, synthesize it to speech, and auto-cancel the cron job when the race finishes.

My reasoning: slash commands already work for one-shot jobs. The /loop extension schedules them on an interval. If the execution stays in the main session (where permissions are already granted), it should hold together. But I'd never run anything on /loop for 30+ minutes straight, so I genuinely didn't know.

Setup

The Stack

- Claude Code with /loop slash command support
- MindTrial: an open-source AI model eval framework I forked from a colleague, Petr Malik, on the CircleCI AI Team. It runs tasks against multiple LLM providers (OpenAI, Google, Anthropic, DeepSeek, Mistral, xAI) simultaneously and scores the results.
- Kokoro TTS: open-source text-to-speech running locally. No API calls, no network round-trip. Sounds surprisingly good.
- Custom slash commands I built for the experiment:
  - /run-model-comparison builds MindTrial, launches the eval, and initializes a commentary transcript
  - /simulate-model-comparison generates fake log output in the same format, so I could test end-to-end without burning API tokens
  - /announce-model-comparison is the single announcer iteration, designed to be called by /loop
  - /stop-model-comparison kills the eval, finalizes the transcript, and shows the leaderboard

How It Works

You kick off an eval race. Six model configs start processing tasks in parallel (I patched MindTrial's Go code so runTasks launches each config in its own goroutine instead of running sequentially within a provider). Then you start the announcer:

```bash
/loop 5m /announce-model-comparison
```

Every 5 minutes, Claude wakes up, tails the eval log, parses the current leaderboard, figures out what changed since last time (lead changes, new completions, errors, scoring gaps), writes Ken Squier-style race commentary, speaks it out loud via Kokoro, and appends to a running transcript.

When the race finishes, the announcer detects it from the log, delivers a final wrap-up with standings and notable moments, and auto-cancels its own cron job via CronList + CronDelete. Zero human intervention after you start it.

The Variable

I started by trying to run the announcer as a background Task agent. That's the variable I was really testing at first. But Task agents lose Bash permissions mid-run. I discovered this the hard way when the announcer went silent 12 minutes in. So I pivoted to /loop, which runs in the main Claude Code session where permissions persist.

That pivot turned out to be the more interesting architecture anyway.

Results

| Metric | Outcome |
| --- | --- |
| Total announcer iterations (real eval) | 8 (over ~40 min race) |
| Missed iterations | 0 |
| Auto-cancellation on race end | Worked on every run |
| Background Task agent approach | Failed (Bash permissions lost mid-run) |
| `/loop` approach | Stable across all test runs |
| TTS commentary generated | Yes, contextual and evolving |
| Human intervention required after setup | Zero |

The Background Task Discovery

The first architecture I tried was launching the announcer as a background Task agent. Makes sense in theory: background agent watches the race, main session stays free. In practice, in my runs the Task agent lost Bash permissions after about 10-15 minutes. At that point the agent can still think, but it can't execute shell commands, which means it can't tail the log, can't run TTS, can't do anything useful.

This isn't documented anywhere I could find. I discovered it when the commentary just... stopped. Checked the Task agent output and found it apologizing for being unable to execute commands.

The /loop Fix

/loop runs in the main session. Permissions persist because they were already granted when you started the session. Each iteration is a fresh invocation of the slash command, but the command reads from the same log file and transcript, so state is maintained through the filesystem rather than through agent memory.

This is a better pattern. The announcer doesn't need to hold state in memory between calls. It reads the log, diffs against the transcript, generates commentary, writes it back. Stateless iterations with filesystem-backed state. Feels like a cron job because it is one.
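Here is a minimal sketch of one stateless iteration with filesystem-backed state. It is a simpler stand-in rather than the announcer's actual mechanism: instead of diffing the log against the transcript, it keeps a byte offset in a small JSON state file. `run_iteration` and both paths are hypothetical names.

```python
import json
from pathlib import Path


def run_iteration(log_path: Path, state_path: Path) -> list[str]:
    """Return the complete log lines written since the previous iteration."""
    # All memory lives on disk: the first run starts at offset 0.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"offset": 0}

    with log_path.open("rb") as log:
        log.seek(state["offset"])
        chunk = log.read()

    # Only consume up to the last newline so a half-written line
    # is picked up whole on the next iteration.
    end = chunk.rfind(b"\n") + 1
    state["offset"] += end
    state_path.write_text(json.dumps(state))

    return chunk[:end].decode("utf-8", errors="replace").splitlines()
```

Because nothing is held in process memory, any iteration can crash or be skipped and the next one resumes from whatever offset was last written.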

TTS Quirks

One thing I didn't anticipate: model version numbers. Kokoro TTS reads "4.6" as "four six" instead of "four point six." I had to add a decimal-to-word conversion step in the announcer pipeline. Small thing, but it matters when you're narrating "Claude 3 point 5 Sonnet pulls ahead of GPT 4 point 1" every few minutes.
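One way to sketch that conversion step is a regex that rewrites any dot sitting between two digits. `speakable_versions` is a hypothetical name, not the function in the announcer, and this version keeps the digits as digits, matching the "3 point 5" phrasing above.

```python
import re

# A dot with a digit on both sides: "4.6" matches, a sentence-ending "4." does not.
_DECIMAL_POINT = re.compile(r"(?<=\d)\.(?=\d)")


def speakable_versions(text: str) -> str:
    """Replace decimal points with the word 'point' before sending text to TTS."""
    return _DECIMAL_POINT.sub(" point ", text)


print(speakable_versions("Claude 3.5 Sonnet pulls ahead of GPT 4.1."))
```

The same rule also rewrites scores like 0.85 and anything else shaped like digit-dot-digit, so it is worth applying only to the commentary text and not to file paths or timestamps.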

Also had to tune the dynamic tail sizing. The announcer reads N lines from the end of the log each iteration. If your /loop interval is 1 minute, you need fewer lines than if it's 10 minutes. I settled on a LOG_LINES_PER_MINUTE=50 constant, multiplied by the interval in minutes and then clamped to between 60 and 500 lines. That gets the right amount of context without drowning in log noise.
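The tail sizing works out to a few lines of arithmetic. In this sketch only `LOG_LINES_PER_MINUTE` and the 60 and 500 bounds come from the text; the other names are made up for illustration, and scaling first and clamping second is my reading of the description.

```python
from collections import deque

LOG_LINES_PER_MINUTE = 50
MIN_TAIL_LINES = 60    # hypothetical name for the lower clamp
MAX_TAIL_LINES = 500   # hypothetical name for the upper clamp


def tail_lines_for(interval_minutes: float) -> int:
    """Scale the tail size by the /loop interval, then clamp it."""
    scaled = round(LOG_LINES_PER_MINUTE * interval_minutes)
    return max(MIN_TAIL_LINES, min(MAX_TAIL_LINES, scaled))


def read_tail(path: str, line_count: int) -> list[str]:
    """Read the last line_count lines; deque drops older lines as it streams."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\n") for line in deque(log, maxlen=line_count)]
```

With these numbers a 1-minute interval hits the floor of 60 lines, a 5-minute interval reads 250, and anything from 10 minutes up is capped at 500.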

The Session

16 commits. 1,056 lines added. Started at 1:17 AM, merged at 3:35 PM. The commit history tells the story: initial build, rename and refactor, simulation mode for testing, UX improvements, the Task-agent-to-loop pivot, opt-in announcer with interactive prompts, interval selection, and docs.

Built the whole thing with Claude Code writing the code while I directed architecture decisions. Ate its own dog food: Claude Code building a Claude Code automation.

Takeaway

/loop is the more interesting primitive here, not the voice announcer itself. The announcer is fun (genuinely fun, the Ken Squier voice calling lead changes between Gemini and Claude is something I didn't know I needed). But the pattern underneath is the finding.

Stateless cron iterations with filesystem-backed state make for a reliable architecture for Claude Code automations. You don't need the agent to remember anything between calls. Write state to disk. Read it back next iteration. Let /loop handle the scheduling. Let the slash command handle the logic. The agent stays in the main session where permissions work, and each iteration is a clean execution.

This pattern applies to way more than race announcing. Monitoring dashboards. Periodic code review sweeps. Build status narration. Anything you'd write a cron job for, you can now write as a slash command and schedule with /loop.

The constraint that makes it work is the same constraint that makes cron jobs work: each invocation is independent. Don't rely on agent memory. Rely on the filesystem.

What's Next

I extracted the announcer into a standalone open-source project: claude-livecaster. It's the slash commands, the TTS pipeline, and the simulation harness, decoupled from MindTrial so anyone can wire it into their own long-running processes.

Things I want to test next:

- /loop at scale: What happens when you run it for 4+ hours? Or days? Does it stay stable? Are there memory or context limits I haven't hit yet?
- Multiple /loop jobs simultaneously: Can you run two different slash commands on different intervals in the same session?
- CI pipeline narration: Same architecture, but the announcer watches a CircleCI pipeline instead of an eval race. Every few minutes, Claude narrates what's happening in your build. I have a feeling this is a better status dashboard than any dashboard.
- Other TTS voices: Kokoro is good but limited. Want to try ElevenLabs for higher fidelity and see if the latency trade-off (API call vs. local) matters at 5-minute intervals.

If Background Task agents ever get persistent Bash permissions, I'd revisit that architecture too. Running the announcer in the main session works, but it does block you from using that session for other things while the race is running.

Sixteen commits before lunch. Ship it. 🤖🎙️ 🚀

This experiment was conducted as part of the Loop Lab program at CircleCI. The original implementation lives in MindTrial PR #2. The extracted open-source project is claude-livecaster. Full demo: YouTube.