Building Autonomous Coding Agents with GitHub Codespaces and Claude Code
Stripe recently published how they built Minions — autonomous coding agents that execute end-to-end engineering tasks without human intervention. The key ingredients: isolated environments, a coding agent (they forked Block's Goose), and an orchestration layer that mixes deterministic steps with agentic creativity.
We built something similar using GitHub Codespaces and Claude Code. Here's the architecture — and it's simpler than you'd think.
The Idea
The goal is straightforward: send an API request with a task description, have an AI agent execute it in a full development environment, and get the results back.
No browser UI. No manual setup. Just an HTTP endpoint that turns a prompt into executed code.
sequenceDiagram
participant Client as API Client
participant VPS as blle-api (VPS)
participant GH as GitHub CLI
participant CS as Codespace
participant Claude as Claude Code
Client->>VPS: POST /codespace/execute
VPS->>GH: gh codespace ssh
GH->>CS: SSH tunnel
CS->>Claude: claude -p "task..."
Claude->>Claude: Read, Edit, Bash, etc.
Claude-->>CS: Result
CS-->>VPS: stdout piped back
VPS-->>Client: JSON or SSE stream
Why Codespaces?
Stripe uses internal "devboxes": pre-warmed EC2 instances loaded with their codebase. GitHub Codespaces gives you a comparable environment, and personal accounts include a free monthly quota (120 core-hours, which is 60 hours on a 2-core machine):
- Pre-loaded with your code — it's tied to your repo
- Full Linux environment — install anything, run any command
- Isolated from production — no risk of breaking live systems
- Auto-sleep and auto-wake — stops when idle, wakes on SSH
- Claude Code pre-installed — via the VS Code extension
The codespace is the sandbox. The agent gets full permissions inside it but can't touch anything outside.
The Proxy Architecture
The API server (a Fastify app on a VPS) doesn't run Claude Code itself. It proxies requests to the codespace via gh codespace ssh:
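import { spawn } from "node:child_process";

// GH_BIN (the path to the gh binary) is defined elsewhere in the server.
function spawnClaude(args: string[], cwd: string | undefined, codespaceName: string) {
  // Single-quote each value for the remote shell, escaping embedded single quotes.
  const quote = (s: string) => `'${s.replace(/'/g, "'\\''")}'`;
  const claudeCmd = args.map(quote).join(" ");
  // Prefer the newest binary bundled with the VS Code extension; fall back to PATH.
  const findClaude = 'CLAUDE_BIN=$(ls -t /home/vscode/.vscode-remote/extensions/' +
    'anthropic.claude-code-*/resources/native-binary/claude 2>/dev/null | head -1); ' +
    'if [ -z "$CLAUDE_BIN" ]; then CLAUDE_BIN=claude; fi';
  // cwd is quoted the same way as the arguments so a path containing a quote cannot break out.
  const remoteCmd = cwd
    ? `${findClaude} && cd ${quote(cwd)} && "$CLAUDE_BIN" ${claudeCmd}`
    : `${findClaude} && "$CLAUDE_BIN" ${claudeCmd}`;
  return spawn(GH_BIN, ["codespace", "ssh", "-c", codespaceName, "--", remoteCmd], {
    env: { ...process.env },
    stdio: ["ignore", "pipe", "pipe"],
  });
}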
A few details worth noting:
- Claude binary discovery: Claude Code is installed as a VS Code extension, not globally. The binary path includes the version number, so we glob for the latest one.
- Auto-wake: If the codespace is sleeping, gh codespace ssh wakes it up automatically. No need to check state first.
- --dangerously-skip-permissions: The agent runs with full tool access. No confirmation prompts. This is the "autonomous" part.
The API
A single endpoint does everything:
POST /codespace/execute
Request Body
| Field | Default | Description |
|---|---|---|
| prompt | (required) | The task for Claude to execute |
| stream | false | SSE streaming vs batch JSON |
| maxTurns | 10 | Agent iteration limit |
| resume | "resume" | "resume" (by sessionId), "continue" (most recent), "new" |
| sessionId | (none) | Session ID from a previous response |
| model | (none) | Model override (sonnet, opus, haiku) |
| systemPrompt | (none) | Custom system prompt |
| allowedTools | all | Tool allowlist |
| disallowedTools | none | Tool denylist |
| agents | (none) | Custom subagent definitions |
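The defaults in the table can be applied in one place before the request is turned into a CLI call. The following is a minimal sketch, not the server's actual code: the `ExecuteRequest` type and the `withDefaults` helper are hypothetical names, and the field types are inferred from the table.

```typescript
// Hypothetical request shape mirroring the table above; the real server's
// types may differ.
type ResumeMode = "resume" | "continue" | "new";

interface ExecuteRequest {
  prompt: string;
  stream?: boolean;
  maxTurns?: number;
  resume?: ResumeMode;
  sessionId?: string;
  model?: string;
  systemPrompt?: string;
  allowedTools?: string[];
  disallowedTools?: string[];
  agents?: Record<string, unknown>;
}

type NormalizedRequest = ExecuteRequest &
  Required<Pick<ExecuteRequest, "stream" | "maxTurns" | "resume">>;

// Reject a missing prompt and fill in the documented defaults.
function withDefaults(body: Partial<ExecuteRequest>): NormalizedRequest {
  if (typeof body.prompt !== "string" || body.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  return {
    ...body,
    prompt: body.prompt,
    stream: body.stream ?? false,
    maxTurns: body.maxTurns ?? 10,
    resume: body.resume ?? "resume",
  };
}
```

Normalizing once keeps the rest of the handler free of `undefined` checks for the three fields that always have a value.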
Non-Streaming Response
{
"id": "execution-uuid",
"sessionId": "claude-session-uuid",
"exitCode": 0,
"result": [{
"type": "result",
"result": "Hello world.",
"duration_ms": 2144,
"total_cost_usd": 0.055
}],
"durationMs": 10879
}
The sessionId is the key to multi-turn conversations. Send it back in the next request to resume the same conversation — the agent remembers everything from previous turns.
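On the client side, the batch response can be typed and the agent's final message pulled out of the result array. This is a sketch that assumes the field names in the sample above are stable; `ExecuteResponse`, `ResultEntry`, and `finalText` are illustrative names, not part of the API.

```typescript
// Shapes inferred from the sample response above; fields not shown there
// are deliberately left out.
interface ResultEntry {
  type: string;
  result?: string;
  duration_ms?: number;
  total_cost_usd?: number;
}

interface ExecuteResponse {
  id: string;
  sessionId: string;
  exitCode: number;
  result: ResultEntry[];
  durationMs: number;
}

// Return the agent's final message, or undefined if the run produced none.
function finalText(res: ExecuteResponse): string | undefined {
  const entry = res.result.find((e) => e.type === "result");
  return entry?.result;
}

const sample: ExecuteResponse = {
  id: "execution-uuid",
  sessionId: "claude-session-uuid",
  exitCode: 0,
  result: [
    { type: "result", result: "Hello world.", duration_ms: 2144, total_cost_usd: 0.055 },
  ],
  durationMs: 10879,
};

console.log(finalText(sample)); // "Hello world."
console.log(sample.sessionId); // send this back as sessionId to resume
```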
Streaming
For long-running tasks, SSE streaming lets you see what the agent is doing in real-time:
curl -N -X POST https://api.example.com/codespace/execute \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{"prompt": "Fix the failing tests", "stream": true, "resume": "new"}'
Events arrive in order:
- execution_start: Execution ID assigned
- system (init): Session info, available tools, connected MCP servers
- assistant: Claude's responses and tool calls as they happen
- result: Final summary with cost, duration, token usage
- execution_complete: Exit code and total duration
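A client can consume the stream by splitting it into SSE frames. The parser below is a minimal sketch under stated assumptions: each event arrives as standard `event:` and `data:` lines terminated by a blank line, line endings are `\n`, and `data` carries JSON. The endpoint's exact framing is not shown in this article, so `parseSse` and `SseEvent` are illustrative only.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parse complete SSE frames out of a buffer. Returns the parsed events and
// the trailing partial frame, which should be prepended to the next chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length === 0) continue; // comment or keep-alive frame
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return { events, rest };
}
```

Returning the unparsed remainder matters because network chunks rarely align with frame boundaries; the caller keeps `rest` and concatenates the next chunk onto it.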
Two important implementation details for streaming:
- Claude CLI requires --verbose when using --output-format stream-json
- The CLAUDECODE environment variable must be cleared to allow spawning Claude from within another Claude session
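Both details can be handled where the child process is configured. This sketch uses two hypothetical helpers, `streamingFlags` and `withoutClaudeCode`; the flags are the ones named above, while using `--output-format json` for the batch case is an assumption about how the non-streaming path works.

```typescript
// Output flags for the Claude CLI. --verbose is required alongside
// --output-format stream-json; the json branch is an assumption.
function streamingFlags(stream: boolean): string[] {
  return stream
    ? ["--output-format", "stream-json", "--verbose"]
    : ["--output-format", "json"];
}

// Copy an environment without CLAUDECODE so a nested Claude process can start.
// The input object is left untouched.
function withoutClaudeCode(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const copy: Record<string, string | undefined> = { ...env };
  delete copy["CLAUDECODE"];
  return copy;
}
```

The result of `withoutClaudeCode(process.env)` can be passed as the `env` option when spawning the child process.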
Session Resume
This is where it gets powerful. Every response includes a sessionId. Send it back to continue the conversation:
# First call: start a new session
curl -X POST .../codespace/execute \
-d '{"prompt": "Remember the number 42. Just say OK.", "resume": "new"}'
# Response includes: sessionId = "93394d3e-..."
# Second call: resume that session
curl -X POST .../codespace/execute \
-d '{"prompt": "What number did I ask you to remember?", "sessionId": "93394d3e-..."}'
# Response: "42."
Three resume modes:
- resume (default): Resume a specific session by ID
- continue: Resume the most recent session in the working directory
- new: Start fresh
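The three modes line up with Claude Code's `--resume <session-id>` and `--continue` flags. One way the translation could look is sketched below; `resumeFlags` is a hypothetical helper, and falling back to a fresh session when no sessionId is supplied is an assumption rather than documented behavior.

```typescript
type ResumeMode = "resume" | "continue" | "new";

// Translate the API's resume mode into Claude Code CLI flags.
function resumeFlags(mode: ResumeMode, sessionId?: string): string[] {
  if (mode === "continue") return ["--continue"];
  // Assumption: "resume" without a sessionId has nothing to resume,
  // so it behaves like "new" instead of failing.
  if (mode === "resume" && sessionId) return ["--resume", sessionId];
  return [];
}

console.log(resumeFlags("resume", "93394d3e")); // [ '--resume', '93394d3e' ]
console.log(resumeFlags("continue")); // [ '--continue' ]
console.log(resumeFlags("new")); // []
```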
Custom Subagents
Claude Code has native support for subagents — specialized AI workers that the main agent can delegate to. We expose this through the agents field:
{
"prompt": "Review the auth module and fix any security issues",
"agents": {
"security-reviewer": {
"description": "Reviews code for security vulnerabilities. Use proactively.",
"prompt": "You are a security expert. Focus on OWASP top 10.",
"tools": ["Read", "Grep", "Glob"],
"model": "haiku"
},
"fixer": {
"description": "Fixes code issues found by reviewers.",
"prompt": "Apply minimal, targeted fixes.",
"tools": ["Read", "Edit", "Write", "Bash"],
"model": "sonnet"
}
}
}
This maps directly to Claude Code's --agents CLI flag. Each subagent gets:
- Its own context window — exploration doesn't pollute the main conversation
- Its own model — route cheap tasks to Haiku, complex ones to Opus
- Scoped tools — a reviewer only gets read access, a fixer gets write access
- Custom system prompt — specialized behavior per task type
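Since `--agents` takes the definitions as a single JSON argument, the proxy only needs to serialize the field. A sketch, assuming a hypothetical `SubagentDef` type based on the example request and a hypothetical `agentsFlag` helper:

```typescript
// Hypothetical type for one entry of the agents field, based on the
// example request above.
interface SubagentDef {
  description: string;
  prompt: string;
  tools?: string[];
  model?: string;
}

// Serialize the agents map into the JSON argument passed to --agents.
// Returns no flags when the field is absent or empty.
function agentsFlag(agents?: Record<string, SubagentDef>): string[] {
  if (!agents || Object.keys(agents).length === 0) return [];
  return ["--agents", JSON.stringify(agents)];
}
```

Because spawnClaude single-quotes every argument before it reaches the remote shell, the JSON string survives the SSH hop without extra escaping.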
flowchart TD
A[API Request] --> B[Main Agent - Opus]
B --> C{Delegate?}
C -->|Security review| D[security-reviewer - Haiku]
C -->|Fix issues| E[fixer - Sonnet]
C -->|Direct work| F[Main agent handles it]
D -->|Findings| B
E -->|Changes made| B
B --> G[Return result]
style B fill:#e3f2fd
style D fill:#fff3e0
style E fill:#c8e6c9
When subagents are used, the modelUsage in the response shows separate token counts and costs per model — confirming which subagents handled which parts of the task.
Making It Available via Dynamic MCP
The final piece: making the codespace agent available to any AI system. By wrapping the /codespace/execute endpoint in a Dynamic MCP tool, any connected agent — Claude in the browser, a Slack bot, an n8n workflow — can invoke a full coding agent on a GitHub Codespace with a single tool call.
The tool accepts all the same parameters (prompt, sessionId, agents, etc.), calls the API, and returns a clean summary with the result, session ID, cost, and model used.
How This Compares to Stripe's Minions
| Feature | Stripe Minions | Our Setup |
|---|---|---|
| Execution environment | Internal devboxes (EC2) | GitHub Codespaces |
| Coding agent | Goose (forked) | Claude Code |
| Orchestration | Custom blueprints | API + CLI flags |
| Tool management | Toolshed (400+ MCP tools) | Dynamic MCP (DB-backed) |
| Subagents | Custom implementation | Native Claude Code subagents |
| CI feedback loop | Built-in (max 2 rounds) | Not yet |
| PR generation | Automatic | Not yet |
| Cost | Significant infrastructure | Free tier (120 core-hours/mo) |
The biggest difference: Stripe built custom orchestration ("blueprints") that interleave deterministic steps with agentic ones. We're using Claude Code's native capabilities instead. This trades some control for dramatically less code.
What's Next
Inspired by Stripe's architecture and Block's Goose codebase:
Blueprints — Define multi-step workflows: lint → implement → test → fix CI → push. Deterministic nodes run shell commands, agentic nodes invoke Claude.
Context hydration — Before the agent starts, automatically gather relevant context (recent commits, open issues, project docs) and inject it into the system prompt.
CI feedback loops — Run tests after changes, feed failures back to the agent. Cap at 2 rounds like Stripe does.
Automatic PR creation — After the agent finishes, create a branch and open a PR with a structured description.
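None of this is built yet, but the CI feedback loop shows how a deterministic step and an agentic step could interleave. The sketch below is hypothetical throughout: `runTests` stands in for a scripted shell step, `askAgent` for a call to the execute endpoint, and both are synchronous only to keep the example short.

```typescript
interface TestRun {
  passed: boolean;
  output: string;
}

// Run the tests, hand failures to the agent, and stop after maxRounds
// fix attempts. Both callbacks are injected stand-ins, not real APIs.
function ciFeedbackLoop(
  runTests: () => TestRun,
  askAgent: (prompt: string) => void,
  maxRounds = 2,
): { passed: boolean; rounds: number } {
  let run = runTests();
  let rounds = 0;
  while (!run.passed && rounds < maxRounds) {
    rounds += 1;
    askAgent(`The test suite failed. Fix the failures:\n\n${run.output}`);
    run = runTests();
  }
  return { passed: run.passed, rounds };
}
```

Capping the rounds keeps a stuck agent from looping indefinitely; whatever state the tests are in after the last round is reported back to the caller.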
The Setup
To replicate this:
| Component | Purpose |
|---|---|
| GitHub Codespace | Isolated dev environment with Claude Code |
| Fastify API server | Proxy endpoint on VPS |
| GitHub CLI (gh) | SSH tunnel to codespace |
| Claude Code (VS Code ext) | The coding agent |
| gh auth with codespace scope | API access to codespaces |
The total infrastructure cost: a cheap VPS for the API proxy, plus GitHub's free codespace hours. The agent runs on your Claude subscription — no API token billing.
Final Thoughts
Stripe's Minions write-up showed that autonomous coding agents are production-ready. But you don't need Stripe's infrastructure to build one. A GitHub Codespace, Claude Code, and a simple proxy API get you surprisingly far.
The key insight from both Stripe's approach and ours: the agent needs a real development environment, not a stripped-down sandbox. It needs to run builds, execute tests, read documentation, and use real tools. Codespaces give you that for free.
The next frontier is orchestration — defining repeatable workflows that mix scripted steps with agent creativity. Stripe calls them blueprints. Goose calls them recipes. Whatever you call them, they're the bridge between "AI that writes code" and "AI that ships code."
Built and deployed using Claude Code on a GitHub Codespace — the same system described in this article.