Building Autonomous Coding Agents with GitHub Codespaces and Claude Code

Stripe recently published how they built Minions — autonomous coding agents that execute end-to-end engineering tasks without human intervention. The key ingredients: isolated environments, a coding agent (they forked Block's Goose), and an orchestration layer that mixes deterministic steps with agentic creativity.

We built something similar using GitHub Codespaces and Claude Code. Here's the architecture — and it's simpler than you'd think.

The Idea

The goal is straightforward: send an API request with a task description, have an AI agent execute it in a full development environment, and get the results back.

No browser UI. No manual setup. Just an HTTP endpoint that turns a prompt into executed code.

sequenceDiagram
    participant Client as API Client
    participant VPS as blle-api (VPS)
    participant GH as GitHub CLI
    participant CS as Codespace
    participant Claude as Claude Code

    Client->>VPS: POST /codespace/execute
    VPS->>GH: gh codespace ssh
    GH->>CS: SSH tunnel
    CS->>Claude: claude -p "task..."
    Claude->>Claude: Read, Edit, Bash, etc.
    Claude-->>CS: Result
    CS-->>VPS: stdout piped back
    VPS-->>Client: JSON or SSE stream

Why Codespaces?

Stripe uses internal "devboxes": pre-warmed EC2 instances loaded with their codebase. GitHub Codespaces gives you something comparable, and personal accounts get a free monthly quota (120 core-hours, which works out to 60 hours on a 2-core machine):

Pre-loaded with your code — it's tied to your repo
Full Linux environment — install anything, run any command
Isolated from production — no risk of breaking live systems
Auto-sleep and auto-wake — stops when idle, wakes on SSH
Claude Code pre-installed — via the VS Code extension

The codespace is the sandbox. The agent gets full permissions inside it but can't touch anything outside.

The Proxy Architecture

The API server (a Fastify app on a VPS) doesn't run Claude Code itself. It proxies requests to the codespace via gh codespace ssh:

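import { spawn } from "node:child_process";

// GH_BIN (the path to the gh executable) is defined elsewhere in the server.

// Wrap a value in single quotes for the remote shell, escaping embedded single quotes.
const shellQuote = (s: string) => `'${s.replace(/'/g, "'\\''")}'`;

function spawnClaude(args: string[], cwd: string | undefined, codespaceName: string) {
  const claudeCmd = args.map(shellQuote).join(" ");
  // Pick the most recently modified extension binary; fall back to `claude` on the PATH.
  const findClaude = 'CLAUDE_BIN=$(ls -t /home/vscode/.vscode-remote/extensions/' +
    'anthropic.claude-code-*/resources/native-binary/claude 2>/dev/null | head -1); ' +
    'if [ -z "$CLAUDE_BIN" ]; then CLAUDE_BIN=claude; fi';
  // cwd is quoted as well, so a path containing a quote cannot break the command.
  const remoteCmd = cwd
    ? `${findClaude} && cd ${shellQuote(cwd)} && "$CLAUDE_BIN" ${claudeCmd}`
    : `${findClaude} && "$CLAUDE_BIN" ${claudeCmd}`;

  // stdin is ignored; stdout and stderr are piped back to the API server.
  return spawn(GH_BIN, ["codespace", "ssh", "-c", codespaceName, "--", remoteCmd], {
    env: { ...process.env },
    stdio: ["ignore", "pipe", "pipe"],
  });
}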

A few details worth noting:

Claude binary discovery — Claude Code is installed as a VS Code extension, not globally. The binary path includes the version number, so we glob for the latest one.
Auto-wake — If the codespace is sleeping, gh codespace ssh wakes it up automatically. No need to check state first.
--dangerously-skip-permissions — The agent runs with full tool access. No confirmation prompts. This is the "autonomous" part.

The API

A single endpoint does everything:

POST /codespace/execute

Request Body

Field	Default	Description
`prompt`	(required)	The task for Claude to execute
`stream`	`false`	SSE streaming vs batch JSON
`maxTurns`	`10`	Agent iteration limit
`resume`	`"resume"`	`"resume"` (by sessionId), `"continue"` (most recent), `"new"`
`sessionId`	—	Session ID from a previous response
`model`	—	Model override (sonnet, opus, haiku)
`systemPrompt`	—	Custom system prompt
`allowedTools`	all	Tool allowlist
`disallowedTools`	none	Tool denylist
`agents`	—	Custom subagent definitions
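As a rough sketch, the simpler fields in this table could be translated into Claude Code CLI arguments along these lines. The ExecuteRequest interface and the buildClaudeArgs name are illustrative rather than taken from the server, and session, streaming, and subagent fields are left out to keep it short.

```typescript
// Hypothetical request shape; field names follow the table above.
interface ExecuteRequest {
  prompt: string;
  maxTurns?: number;
  model?: string;
  systemPrompt?: string;
  allowedTools?: string[];
  disallowedTools?: string[];
}

// Translate request fields into Claude Code CLI arguments.
function buildClaudeArgs(req: ExecuteRequest): string[] {
  // -p runs Claude Code non-interactively; the permissions flag removes confirmation prompts.
  const args = ["-p", req.prompt, "--dangerously-skip-permissions"];
  // Default of 10 mirrors the table above.
  args.push("--max-turns", String(req.maxTurns ?? 10));
  if (req.model) args.push("--model", req.model);
  if (req.systemPrompt) args.push("--system-prompt", req.systemPrompt);
  // The CLI accepts a comma-separated list of tool names.
  if (req.allowedTools && req.allowedTools.length > 0) {
    args.push("--allowedTools", req.allowedTools.join(","));
  }
  if (req.disallowedTools && req.disallowedTools.length > 0) {
    args.push("--disallowedTools", req.disallowedTools.join(","));
  }
  return args;
}
```

Returning an argument array rather than a command string keeps shell quoting in one place: each element is quoted once, just before the command is sent over SSH.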

Non-Streaming Response

{
  "id": "execution-uuid",
  "sessionId": "claude-session-uuid",
  "exitCode": 0,
  "result": [{
    "type": "result",
    "result": "Hello world.",
    "duration_ms": 2144,
    "total_cost_usd": 0.055
  }],
  "durationMs": 10879
}

The sessionId is the key to multi-turn conversations. Send it back in the next request to resume the same conversation — the agent remembers everything from previous turns.
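One way the batch response could be assembled from the process output is sketched below. It assumes Claude Code prints JSON messages carrying a session_id field, one per line, and that anything else on stdout (an SSH banner, say) can be ignored. The parseMessages and buildResponse names are illustrative.

```typescript
// Assumed shape of a message printed by Claude Code; only the fields used here.
interface ClaudeMessage {
  type: string;
  session_id?: string;
  [key: string]: unknown;
}

interface ExecuteResponse {
  id: string;
  sessionId: string | null;
  exitCode: number;
  result: ClaudeMessage[];
  durationMs: number;
}

// Parse newline-delimited JSON, skipping blank or non-JSON lines.
function parseMessages(stdout: string): ClaudeMessage[] {
  const messages: ClaudeMessage[] = [];
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      messages.push(JSON.parse(trimmed) as ClaudeMessage);
    } catch {
      // Not valid JSON: ignore the line rather than fail the whole request.
    }
  }
  return messages;
}

// Build the JSON body returned to the API client.
function buildResponse(
  id: string,
  stdout: string,
  exitCode: number,
  startedAt: number,
  finishedAt: number
): ExecuteResponse {
  const all = parseMessages(stdout);
  const withSession = all.find(m => typeof m.session_id === "string");
  return {
    id,
    sessionId: withSession?.session_id ?? null,
    exitCode,
    result: all.filter(m => m.type === "result"),
    durationMs: finishedAt - startedAt,
  };
}
```

Passing timestamps in as arguments keeps the function pure, which makes it easy to test without spawning anything.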

Streaming

For long-running tasks, SSE streaming lets you see what the agent is doing in real-time:

curl -N -X POST https://api.example.com/codespace/execute \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"prompt": "Fix the failing tests", "stream": true, "resume": "new"}'

Events arrive in order:

execution_start — Execution ID assigned
system (init) — Session info, available tools, connected MCP servers
assistant — Claude's responses and tool calls as they happen
result — Final summary with cost, duration, token usage
execution_complete — Exit code and total duration
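The event flow above can be sketched as a small converter from newline-delimited JSON on stdout to Server-Sent Events frames. This assumes each stdout line is one JSON message whose type field doubles as the SSE event name. The NdjsonToSse class and the "message" fallback are illustrative choices, not part of Claude Code.

```typescript
// Format one Server-Sent Events frame: an event name plus a single-line JSON payload.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Buffers stdout chunks and emits one SSE frame per complete JSON line.
class NdjsonToSse {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";

    const frames: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) continue;
      try {
        const msg = JSON.parse(trimmed) as { type?: unknown };
        const event = typeof msg.type === "string" ? msg.type : "message";
        frames.push(sseFrame(event, msg));
      } catch {
        // Skip output that is not JSON instead of breaking the stream.
      }
    }
    return frames;
  }
}
```

The buffering matters because a pipe delivers arbitrary chunks: a single message can arrive split across two data events, and two messages can arrive in one.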

Two important implementation details for streaming:

Claude CLI requires --verbose when using --output-format stream-json
The CLAUDECODE environment variable must be cleared to allow spawning Claude from within another Claude session
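Both details fit in two small helpers, sketched here under a couple of assumptions: that the batch path uses --output-format json, and that the environment is cleaned wherever the claude process is actually launched. The outputArgs and childEnv names are illustrative.

```typescript
// Output-format flags for print mode.
// Assumption: the batch (non-streaming) path uses "json".
function outputArgs(stream: boolean): string[] {
  return stream
    ? ["--output-format", "stream-json", "--verbose"] // stream-json needs --verbose
    : ["--output-format", "json"];
}

// Copy an environment and drop CLAUDECODE so a nested claude process is allowed to start.
function childEnv(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const copy = { ...env };
  delete copy["CLAUDECODE"];
  return copy;
}
```

If the variable is set inside the codespace shell rather than on the API server, the equivalent fix is to run unset CLAUDECODE at the start of the remote command.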

Session Resume

Every response includes a sessionId. Send it back with the next prompt and the agent picks up the conversation where it left off:

# First call: start a new session
curl -X POST .../codespace/execute \
  -d '{"prompt": "Remember the number 42. Just say OK.", "resume": "new"}'
# Response includes: sessionId = "93394d3e-..."

# Second call: resume that session
curl -X POST .../codespace/execute \
  -d '{"prompt": "What number did I ask you to remember?", "sessionId": "93394d3e-..."}'
# Response: "42."

Three resume modes:

resume (default) — Resume a specific session by ID
continue — Resume the most recent session in the working directory
new — Start fresh
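A minimal sketch of how the three modes could map onto Claude Code's --resume and --continue flags. Rejecting the default mode when no sessionId is supplied is an assumption; a server could just as reasonably fall back to a new session. The resumeArgs name is illustrative.

```typescript
type ResumeMode = "resume" | "continue" | "new";

// Map a resume mode (and optional session ID) to CLI arguments.
function resumeArgs(mode: ResumeMode = "resume", sessionId?: string): string[] {
  switch (mode) {
    case "resume":
      // Assumption: resuming by ID without an ID is treated as a client error.
      if (!sessionId) {
        throw new Error('resume mode "resume" requires a sessionId');
      }
      return ["--resume", sessionId];
    case "continue":
      // Picks up the most recent session in the working directory.
      return ["--continue"];
    case "new":
      // No flag: Claude Code starts a fresh session.
      return [];
  }
}
```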

Custom Subagents

Claude Code has native support for subagents — specialized AI workers that the main agent can delegate to. We expose this through the agents field:

{
  "prompt": "Review the auth module and fix any security issues",
  "agents": {
    "security-reviewer": {
      "description": "Reviews code for security vulnerabilities. Use proactively.",
      "prompt": "You are a security expert. Focus on OWASP top 10.",
      "tools": ["Read", "Grep", "Glob"],
      "model": "haiku"
    },
    "fixer": {
      "description": "Fixes code issues found by reviewers.",
      "prompt": "Apply minimal, targeted fixes.",
      "tools": ["Read", "Edit", "Write", "Bash"],
      "model": "sonnet"
    }
  }
}

This maps directly to Claude Code's --agents CLI flag. Each subagent gets:

Its own context window — exploration doesn't pollute the main conversation
Its own model — route cheap tasks to Haiku, complex ones to Opus
Scoped tools — a reviewer only gets read access, a fixer gets write access
Custom system prompt — specialized behavior per task type
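Passing the agents field through can be as small as the sketch below: check that each definition has the two fields the example relies on, then serialize the whole object for --agents. The SubagentDef interface mirrors the request example above. The validation rule and the agentsArgs name are assumptions for illustration.

```typescript
// Mirrors the subagent objects in the request example.
interface SubagentDef {
  description: string;
  prompt: string;
  tools?: string[];
  model?: string;
}

// Turn the request's agents object into CLI arguments.
function agentsArgs(agents?: Record<string, SubagentDef>): string[] {
  if (!agents) return [];
  const names = Object.keys(agents);
  if (names.length === 0) return [];

  for (const agentName of names) {
    const def = agents[agentName];
    // Assumption: a subagent without a description or prompt is rejected up front.
    if (!def || !def.description || !def.prompt) {
      throw new Error(`subagent "${agentName}" needs a description and a prompt`);
    }
  }
  // --agents takes the definitions as a single JSON string.
  return ["--agents", JSON.stringify(agents)];
}
```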

flowchart TD
    A[API Request] --> B[Main Agent - Opus]
    B --> C{Delegate?}
    C -->|Security review| D[security-reviewer - Haiku]
    C -->|Fix issues| E[fixer - Sonnet]
    C -->|Direct work| F[Main agent handles it]
    D -->|Findings| B
    E -->|Changes made| B
    B --> G[Return result]

    style B fill:#e3f2fd
    style D fill:#fff3e0
    style E fill:#c8e6c9

When subagents are used, the modelUsage field in the response breaks token counts and costs down per model, which shows whether each subagent's model was actually invoked.

Making It Available via Dynamic MCP

The final piece: making the codespace agent available to any AI system. By wrapping the /codespace/execute endpoint in a Dynamic MCP tool, any connected agent — Claude in the browser, a Slack bot, an n8n workflow — can invoke a full coding agent on a GitHub Codespace with a single tool call.

The tool accepts all the same parameters (prompt, sessionId, agents, etc.), calls the API, and returns a clean summary with the result, session ID, cost, and model used.
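The summary step might look something like this sketch. It assumes the API response shape shown earlier, with the cost and a modelUsage object on the final result message. The summarize name and the output layout are illustrative, not the actual tool.

```typescript
// Only the fields the summary needs; modelUsage is assumed to be keyed by model name.
interface ResultMessage {
  type: string;
  result?: string;
  total_cost_usd?: number;
  modelUsage?: Record<string, unknown>;
}

interface ExecuteResult {
  sessionId: string | null;
  exitCode: number;
  result: ResultMessage[];
}

// Condense an API response into a few lines of text for the calling agent.
function summarize(res: ExecuteResult): string {
  const last = res.result.filter(m => m.type === "result").pop();
  const lines = [
    last?.result ?? `(no result, exit code ${res.exitCode})`,
    `session: ${res.sessionId ?? "unknown"}`,
  ];
  const cost = last?.total_cost_usd;
  if (cost !== undefined) lines.push(`cost: $${cost.toFixed(3)}`);
  const models = Object.keys(last?.modelUsage ?? {});
  if (models.length > 0) lines.push(`models: ${models.join(", ")}`);
  return lines.join("\n");
}
```

Keeping the session ID in the summary lets the calling agent pass it back on its next tool call to continue the same conversation.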

How This Compares to Stripe's Minions

Feature	Stripe Minions	Our Setup
Execution environment	Internal devboxes (EC2)	GitHub Codespaces
Coding agent	Goose (forked)	Claude Code
Orchestration	Custom blueprints	API + CLI flags
Tool management	Toolshed (400+ MCP tools)	Dynamic MCP (DB-backed)
Subagents	Custom implementation	Native Claude Code subagents
CI feedback loop	Built-in (max 2 rounds)	Not yet
PR generation	Automatic	Not yet
Cost	Significant infrastructure	Free tier (120 core-hours/mo)

The biggest difference: Stripe built custom orchestration ("blueprints") that interleave deterministic steps with agentic ones. We're using Claude Code's native capabilities instead. This trades some control for dramatically less code.

What's Next

Inspired by Stripe's architecture and Block's Goose codebase:

Blueprints — Define multi-step workflows: lint → implement → test → fix CI → push. Deterministic nodes run shell commands, agentic nodes invoke Claude.

Context hydration — Before the agent starts, automatically gather relevant context (recent commits, open issues, project docs) and inject it into the system prompt.

CI feedback loops — Run tests after changes, feed failures back to the agent. Cap at 2 rounds like Stripe does.

Automatic PR creation — After the agent finishes, create a branch and open a PR with a structured description.
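None of this exists yet, but the CI feedback loop is simple enough to sketch. The version below assumes two hypothetical hooks, runTests and askAgent, which would wrap a test command and a call to /codespace/execute respectively. The cap of two rounds follows the Stripe figure mentioned above.

```typescript
interface TestRun {
  passed: boolean;
  output: string;
}

interface LoopResult {
  passed: boolean;
  rounds: number; // number of fix attempts handed to the agent
}

// Run tests, feed failures back to the agent, and stop after maxRounds attempts.
// runTests and askAgent are hypothetical hooks supplied by the caller.
async function ciFeedbackLoop(
  runTests: () => Promise<TestRun>,
  askAgent: (prompt: string) => Promise<void>,
  maxRounds = 2
): Promise<LoopResult> {
  let run = await runTests();
  let rounds = 0;
  while (!run.passed && rounds < maxRounds) {
    rounds += 1;
    await askAgent(`The test suite failed. Fix the code.\n\n${run.output}`);
    run = await runTests();
  }
  return { passed: run.passed, rounds };
}
```

The test run is a deterministic node and the fix is an agentic one, so this is also the smallest possible example of a blueprint.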

The Setup

To replicate this:

Component	Purpose
GitHub Codespace	Isolated dev environment with Claude Code
Fastify API server	Proxy endpoint on VPS
GitHub CLI (`gh`)	SSH tunnel to codespace
Claude Code (VS Code ext)	The coding agent
`gh auth` with `codespace` scope	API access to codespaces

The total infrastructure cost: a cheap VPS for the API proxy, plus GitHub's free codespace hours. The agent runs on your Claude subscription — no API token billing.

Final Thoughts

Stripe's Minions write-up showed that autonomous coding agents can run in production. But you don't need Stripe's infrastructure to build one. A GitHub Codespace, Claude Code, and a simple proxy API get you surprisingly far.

The key insight from both Stripe's approach and ours: the agent needs a real development environment, not a stripped-down sandbox. It needs to run builds, execute tests, read documentation, and use real tools. Codespaces give you that within the free quota.

The next frontier is orchestration — defining repeatable workflows that mix scripted steps with agent creativity. Stripe calls them blueprints. Goose calls them recipes. Whatever you call them, they're the bridge between "AI that writes code" and "AI that ships code."

Built and deployed using Claude Code on a GitHub Codespace — the same system described in this article.
