
Building Autonomous Coding Agents with GitHub Codespaces and Claude Code

Stripe recently published how they built Minions — autonomous coding agents that execute end-to-end engineering tasks without human intervention. The key ingredients: isolated environments, a coding agent (they forked Block's Goose), and an orchestration layer that mixes deterministic steps with agentic creativity.

We built something similar using GitHub Codespaces and Claude Code. Here's the architecture — and it's simpler than you'd think.

The Idea

The goal is straightforward: send an API request with a task description, have an AI agent execute it in a full development environment, and get the results back.

No browser UI. No manual setup. Just an HTTP endpoint that turns a prompt into executed code.

sequenceDiagram
    participant Client as API Client
    participant VPS as blle-api (VPS)
    participant GH as GitHub CLI
    participant CS as Codespace
    participant Claude as Claude Code

    Client->>VPS: POST /codespace/execute
    VPS->>GH: gh codespace ssh
    GH->>CS: SSH tunnel
    CS->>Claude: claude -p "task..."
    Claude->>Claude: Read, Edit, Bash, etc.
    Claude-->>CS: Result
    CS-->>VPS: stdout piped back
    VPS-->>Client: JSON or SSE stream

Why Codespaces?

Stripe uses internal "devboxes", pre-warmed EC2 instances loaded with their codebase. GitHub Codespaces offers a comparable environment, and the free tier includes 120 core-hours per month (about 60 hours on a 2-core machine):

  • Pre-loaded with your code — it's tied to your repo
  • Full Linux environment — install anything, run any command
  • Isolated from production — no risk of breaking live systems
  • Auto-sleep and auto-wake — stops when idle, wakes on SSH
  • Claude Code pre-installed — via the VS Code extension

The codespace is the sandbox. The agent gets full permissions inside it but can't touch anything outside.
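Because the free quota is counted in core-hours, the wall-clock time it buys depends on the machine size. A quick sketch of that arithmetic, assuming usage is metered as cores multiplied by hours and ignoring storage; the function name is ours, not part of any GitHub API:

```typescript
// Free-tier figure quoted above; adjust it if your plan differs.
const FREE_CORE_HOURS_PER_MONTH = 120;

// Wall-clock hours of active codespace time the quota covers for a machine size.
// Assumes usage is metered as cores x hours; storage is billed separately and ignored here.
function freeHoursPerMonth(cores: number): number {
  if (!Number.isInteger(cores) || cores <= 0) {
    throw new Error("cores must be a positive integer");
  }
  return FREE_CORE_HOURS_PER_MONTH / cores;
}

const twoCoreHours = freeHoursPerMonth(2);  // 60
const fourCoreHours = freeHoursPerMonth(4); // 30
```

Auto-sleep matters here: only the time the codespace is actually running counts against the quota.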

The Proxy Architecture

The API server (a Fastify app on a VPS) doesn't run Claude Code itself. It proxies requests to the codespace via gh codespace ssh:

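import { spawn } from "node:child_process";

// Assumption: GH_BIN points at the GitHub CLI; fall back to "gh" on the PATH.
const GH_BIN = process.env.GH_BIN ?? "gh";

// Single-quote a value for the remote shell, escaping embedded single quotes.
const shellQuote = (s: string) => `'${s.replace(/'/g, "'\\''")}'`;

function spawnClaude(args: string[], cwd: string | undefined, codespaceName: string) {
  const claudeCmd = args.map(shellQuote).join(" ");
  // Pick the most recently modified extension binary; fall back to claude on the PATH.
  const findClaude = 'CLAUDE_BIN=$(ls -t /home/vscode/.vscode-remote/extensions/' +
    'anthropic.claude-code-*/resources/native-binary/claude 2>/dev/null | head -1); ' +
    'if [ -z "$CLAUDE_BIN" ]; then CLAUDE_BIN=claude; fi';
  // Quote cwd as well so a path containing a quote cannot break out of the command.
  const remoteCmd = cwd
    ? `${findClaude} && cd ${shellQuote(cwd)} && "$CLAUDE_BIN" ${claudeCmd}`
    : `${findClaude} && "$CLAUDE_BIN" ${claudeCmd}`;

  return spawn(GH_BIN, ["codespace", "ssh", "-c", codespaceName, "--", remoteCmd], {
    env: { ...process.env },
    stdio: ["ignore", "pipe", "pipe"],
  });
}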

A few details worth noting:

  • Claude binary discovery — Claude Code is installed as a VS Code extension, not globally. The binary path includes the version number, so we glob for it and take the most recently modified match, falling back to claude on the PATH if nothing is found.
  • Auto-wake — If the codespace is sleeping, gh codespace ssh wakes it up automatically. No need to check state first.
  • --dangerously-skip-permissions — The agent runs with full tool access. No confirmation prompts. This is the "autonomous" part.

The API

A single endpoint does everything:

POST /codespace/execute

Request Body

| Field | Default | Description |
| --- | --- | --- |
| prompt | (required) | The task for Claude to execute |
| stream | false | SSE streaming vs batch JSON |
| maxTurns | 10 | Agent iteration limit |
| resume | "resume" | "resume" (by sessionId), "continue" (most recent), "new" |
| sessionId | | Session ID from a previous response |
| model | | Model override (sonnet, opus, haiku) |
| systemPrompt | | Custom system prompt |
| allowedTools | all | Tool allowlist |
| disallowedTools | none | Tool denylist |
| agents | | Custom subagent definitions |
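To make the request fields concrete, here is a rough sketch of how they could be translated into Claude Code CLI arguments before being handed to the SSH proxy. The ExecuteRequest type and buildClaudeArgs function are hypothetical names for illustration, and the mapping is our reading of the fields above, not the exact server code:

```typescript
// Hypothetical request shape mirroring part of the request body fields.
interface ExecuteRequest {
  prompt: string;
  stream?: boolean;
  maxTurns?: number;
  model?: string;
  systemPrompt?: string;
  allowedTools?: string[];
  disallowedTools?: string[];
}

// Translate request fields into Claude Code CLI arguments.
// Session handling (resume, sessionId) and agents are left out to keep the sketch short.
function buildClaudeArgs(req: ExecuteRequest): string[] {
  const args = ["-p", req.prompt, "--dangerously-skip-permissions"];
  // stream-json requires --verbose; batch mode asks for a single JSON document.
  if (req.stream) {
    args.push("--output-format", "stream-json", "--verbose");
  } else {
    args.push("--output-format", "json");
  }
  args.push("--max-turns", String(req.maxTurns ?? 10));
  if (req.model) args.push("--model", req.model);
  if (req.systemPrompt) args.push("--system-prompt", req.systemPrompt);
  if (req.allowedTools?.length) args.push("--allowedTools", req.allowedTools.join(","));
  if (req.disallowedTools?.length) args.push("--disallowedTools", req.disallowedTools.join(","));
  return args;
}

const exampleArgs = buildClaudeArgs({ prompt: "Fix the failing tests", model: "sonnet" });
```

Returning an array instead of a command string keeps quoting in one place: each element can be shell-escaped individually right before it crosses the SSH boundary.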

Non-Streaming Response

{
  "id": "execution-uuid",
  "sessionId": "claude-session-uuid",
  "exitCode": 0,
  "result": [{
    "type": "result",
    "result": "Hello world.",
    "duration_ms": 2144,
    "total_cost_usd": 0.055
  }],
  "durationMs": 10879
}

The sessionId is the key to multi-turn conversations. Send it back in the next request to resume the same conversation — the agent remembers everything from previous turns.
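On the client side, the response can be reduced to the few values most callers care about. The types below are inferred from the sample response and may be missing fields that real responses carry; summarizeResponse is a hypothetical helper:

```typescript
// Shapes inferred from the sample response; real responses may carry more fields.
interface ResultEvent {
  type: string;
  result?: string;
  duration_ms?: number;
  total_cost_usd?: number;
}

interface ExecuteResponse {
  id: string;
  sessionId: string;
  exitCode: number;
  result: ResultEvent[];
  durationMs: number;
}

// Pull out the final text, the cost, and the session ID needed to resume later.
function summarizeResponse(res: ExecuteResponse) {
  const finalEvent = res.result.filter(e => e.type === "result").pop();
  return {
    ok: res.exitCode === 0,
    text: finalEvent?.result ?? "",
    costUsd: finalEvent?.total_cost_usd ?? 0,
    sessionId: res.sessionId,
  };
}

const sampleSummary = summarizeResponse({
  id: "execution-uuid",
  sessionId: "claude-session-uuid",
  exitCode: 0,
  result: [{ type: "result", result: "Hello world.", duration_ms: 2144, total_cost_usd: 0.055 }],
  durationMs: 10879,
});
```

Storing sampleSummary.sessionId alongside the task is enough to pick the conversation back up in a later request.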

Streaming

For long-running tasks, SSE streaming lets you see what the agent is doing in real time:

curl -N -X POST https://api.example.com/codespace/execute \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"prompt": "Fix the failing tests", "stream": true, "resume": "new"}'

Events arrive in order:

  1. execution_start — Execution ID assigned
  2. system (init) — Session info, available tools, connected MCP servers
  3. assistant — Claude's responses and tool calls as they happen
  4. result — Final summary with cost, duration, token usage
  5. execution_complete — Exit code and total duration
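A client consuming this stream has to cope with events that are split across network chunks. The sketch below assumes each event arrives as a "data: <json>" line followed by a blank line, with the event name in a "type" field of the JSON payload; that wire format is an assumption on our part, so adjust the parsing if your server also sends "event:" lines:

```typescript
// Assumption: each SSE event is a "data: <json>" line followed by a blank line,
// and the JSON payload has a "type" field such as "assistant" or "result".
interface StreamEvent {
  type: string;
  [key: string]: unknown;
}

// Split a buffer of SSE text into parsed events plus the unfinished tail,
// which the caller prepends to the next network chunk.
function parseSseBuffer(buffer: string): { events: StreamEvent[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? ""; // the last piece may be an incomplete event
  const events: StreamEvent[] = [];
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter(line => line.startsWith("data:"))
      .map(line => line.slice(5).trim())
      .join("\n");
    if (data) events.push(JSON.parse(data) as StreamEvent);
  }
  return { events, rest };
}

const parsedChunk = parseSseBuffer(
  'data: {"type":"execution_start","id":"abc"}\n\ndata: {"type":"assis'
);
// parsedChunk.events holds the execution_start event; parsedChunk.rest holds the partial one.
```

Keeping the leftover text in rest means a JSON payload that straddles two chunks is parsed only once it is complete.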

Two important implementation details for streaming:

  • Claude CLI requires --verbose when using --output-format stream-json
  • The CLAUDECODE environment variable must be cleared to allow spawning Claude from within another Claude session
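The second point can be handled by building a cleaned copy of the environment for the process that launches Claude. A minimal sketch, assuming that dropping the variable from the child's environment is all that is needed; envWithoutClaudeCode is a hypothetical helper name:

```typescript
// Return a copy of an environment map without CLAUDECODE so a nested claude
// process does not detect that it was started from another Claude session.
function envWithoutClaudeCode(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const copy = { ...env };
  delete copy["CLAUDECODE"];
  return copy;
}

const childEnv = envWithoutClaudeCode({ PATH: "/usr/bin", CLAUDECODE: "1" });
// childEnv keeps PATH and no longer has CLAUDECODE
```

The result can be passed as the env option when spawning the child process; the original map is left untouched.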

Session Resume

This is where it gets powerful. Every response includes a sessionId. Send it back to continue the conversation:

# First call: start a new session
curl -X POST .../codespace/execute \
  -d '{"prompt": "Remember the number 42. Just say OK.", "resume": "new"}'
# Response includes: sessionId = "93394d3e-..."

# Second call: resume that session
curl -X POST .../codespace/execute \
  -d '{"prompt": "What number did I ask you to remember?", "sessionId": "93394d3e-..."}'
# Response: "42."

Three resume modes:

  • resume (default) — Resume a specific session by ID
  • continue — Resume the most recent session in the working directory
  • new — Start fresh
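On the server, the three modes map naturally onto Claude Code's --resume and --continue flags. The sketch below is one way to express that mapping; rejecting "resume" without a sessionId is our assumption, and a real server might prefer to fall back to a new session instead:

```typescript
type ResumeMode = "resume" | "continue" | "new";

// Map the API's resume mode onto Claude Code's session flags.
// Assumption: "resume" without a sessionId is rejected instead of silently starting fresh.
function sessionFlags(mode: ResumeMode = "resume", sessionId?: string): string[] {
  switch (mode) {
    case "resume":
      if (!sessionId) throw new Error('resume mode "resume" requires a sessionId');
      return ["--resume", sessionId];
    case "continue":
      return ["--continue"];
    case "new":
      return [];
  }
}

const resumeFlags = sessionFlags("resume", "example-session-id"); // ["--resume", "example-session-id"]
const continueFlags = sessionFlags("continue");                   // ["--continue"]
const newFlags = sessionFlags("new");                             // []
```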

Custom Subagents

Claude Code has native support for subagents — specialized AI workers that the main agent can delegate to. We expose this through the agents field:

{
  "prompt": "Review the auth module and fix any security issues",
  "agents": {
    "security-reviewer": {
      "description": "Reviews code for security vulnerabilities. Use proactively.",
      "prompt": "You are a security expert. Focus on OWASP top 10.",
      "tools": ["Read", "Grep", "Glob"],
      "model": "haiku"
    },
    "fixer": {
      "description": "Fixes code issues found by reviewers.",
      "prompt": "Apply minimal, targeted fixes.",
      "tools": ["Read", "Edit", "Write", "Bash"],
      "model": "sonnet"
    }
  }
}

This maps directly to Claude Code's --agents CLI flag. Each subagent gets:

  • Its own context window — exploration doesn't pollute the main conversation
  • Its own model — route cheap tasks to Haiku, complex ones to Opus
  • Scoped tools — a reviewer only gets read access, a fixer gets write access
  • Custom system prompt — specialized behavior per task type

flowchart TD
    A[API Request] --> B[Main Agent - Opus]
    B --> C{Delegate?}
    C -->|Security review| D[security-reviewer - Haiku]
    C -->|Fix issues| E[fixer - Sonnet]
    C -->|Direct work| F[Main agent handles it]
    D -->|Findings| B
    E -->|Changes made| B
    B --> G[Return result]

    style B fill:#e3f2fd
    style D fill:#fff3e0
    style E fill:#c8e6c9
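Since the --agents flag takes a single JSON string, the agents field can be validated and serialized in one small step. The AgentDefinition type follows the field names in the request example; agentsFlag and its validation rule are hypothetical:

```typescript
// Field names follow the agents example in the request body.
interface AgentDefinition {
  description: string;
  prompt: string;
  tools?: string[];
  model?: string;
}

// Build the CLI arguments for custom subagents: --agents takes one JSON string.
function agentsFlag(agents: Record<string, AgentDefinition>): string[] {
  const entries = Object.entries(agents);
  if (entries.length === 0) return [];
  for (const [agentName, def] of entries) {
    // Assumption: every subagent needs at least a description and a prompt.
    if (!def.description || !def.prompt) {
      throw new Error(`agent "${agentName}" needs a description and a prompt`);
    }
  }
  return ["--agents", JSON.stringify(agents)];
}

const reviewerFlag = agentsFlag({
  "security-reviewer": {
    description: "Reviews code for security vulnerabilities. Use proactively.",
    prompt: "You are a security expert. Focus on OWASP top 10.",
    tools: ["Read", "Grep", "Glob"],
    model: "haiku",
  },
});
```

Failing fast on a malformed definition gives the API caller a clear error instead of an opaque CLI failure inside the codespace.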

When subagents are used, the modelUsage in the response shows separate token counts and costs per model — confirming which subagents handled which parts of the task.
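A per-model breakdown is easy to derive from that data. The sketch below assumes modelUsage is an object keyed by model name whose entries carry inputTokens, outputTokens, and costUSD; those field names are assumptions to verify against a real response, and the numbers in the example are made up:

```typescript
// Assumed shape of one modelUsage entry; verify the field names against a real response.
interface ModelUsageEntry {
  inputTokens: number;
  outputTokens: number;
  costUSD: number;
}

// Produce one line per model, most expensive first.
function usageReport(modelUsage: Record<string, ModelUsageEntry>): string[] {
  return Object.entries(modelUsage)
    .sort(([, a], [, b]) => b.costUSD - a.costUSD)
    .map(([model, u]) =>
      `${model}: ${u.inputTokens} in / ${u.outputTokens} out, $${u.costUSD.toFixed(4)}`);
}

// Illustrative values only, with placeholder model names.
const usageLines = usageReport({
  "model-a": { inputTokens: 1200, outputTokens: 300, costUSD: 0.0123 },
  "model-b": { inputTokens: 8000, outputTokens: 900, costUSD: 0.0871 },
});
```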

Making It Available via Dynamic MCP

The final piece: making the codespace agent available to any AI system. By wrapping the /codespace/execute endpoint in a Dynamic MCP tool, any connected agent — Claude in the browser, a Slack bot, an n8n workflow — can invoke a full coding agent on a GitHub Codespace with a single tool call.

The tool accepts all the same parameters (prompt, sessionId, agents, etc.), calls the API, and returns a clean summary with the result, session ID, cost, and model used.
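As a rough picture of what such a tool could look like, the sketch below pairs a tool descriptor with the summary formatter. The name, description, and inputSchema layout follow the general MCP tool format; the tool name, the schema contents, and toolSummary are hypothetical, and the model used is omitted because the sample response does not show where it is reported:

```typescript
// Hypothetical tool descriptor in the general MCP tool format.
const codespaceExecuteTool = {
  name: "codespace_execute",
  description: "Run a coding task with Claude Code inside a GitHub Codespace.",
  inputSchema: {
    type: "object",
    properties: {
      prompt: { type: "string", description: "The task for Claude to execute" },
      sessionId: { type: "string", description: "Session ID from a previous response" },
      resume: { type: "string", enum: ["resume", "continue", "new"] },
      maxTurns: { type: "number" },
      model: { type: "string" },
      agents: { type: "object", description: "Custom subagent definitions" },
    },
    required: ["prompt"],
  },
};

// Format the API response into the short summary returned to the calling agent.
function toolSummary(res: {
  sessionId: string;
  result: { result?: string; total_cost_usd?: number }[];
}): string {
  const last = res.result[res.result.length - 1];
  return [
    `Result: ${last?.result ?? "(no result)"}`,
    `Session: ${res.sessionId}`,
    `Cost: $${(last?.total_cost_usd ?? 0).toFixed(3)}`,
  ].join("\n");
}

const summaryText = toolSummary({
  sessionId: "claude-session-uuid",
  result: [{ result: "Hello world.", total_cost_usd: 0.055 }],
});
```

Including the session ID in the summary lets the calling agent chain follow-up tool calls into the same conversation.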

How This Compares to Stripe's Minions

| Feature | Stripe Minions | Our Setup |
| --- | --- | --- |
| Execution environment | Internal devboxes (EC2) | GitHub Codespaces |
| Coding agent | Goose (forked) | Claude Code |
| Orchestration | Custom blueprints | API + CLI flags |
| Tool management | Toolshed (400+ MCP tools) | Dynamic MCP (DB-backed) |
| Subagents | Custom implementation | Native Claude Code subagents |
| CI feedback loop | Built-in (max 2 rounds) | Not yet |
| PR generation | Automatic | Not yet |
| Cost | Significant infrastructure | Free tier (120 core-hours/mo) |

The biggest difference: Stripe built custom orchestration ("blueprints") that interleaves deterministic steps with agentic ones. We're using Claude Code's native capabilities instead. This trades some control for dramatically less code.

What's Next

Inspired by Stripe's architecture and Block's Goose codebase:

Blueprints — Define multi-step workflows: lint → implement → test → fix CI → push. Deterministic nodes run shell commands, agentic nodes invoke Claude.
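Since none of this is built yet, the following is only a sketch of how a blueprint could be modelled: an ordered list of nodes with a runner that stops at the first failure. Every name here is hypothetical, and the executors are injected so the runner itself stays deterministic and testable:

```typescript
// Deterministic nodes run a shell command; agentic nodes hand a prompt to the coding agent.
type BlueprintNode =
  | { kind: "shell"; label: string; command: string }
  | { kind: "agent"; label: string; prompt: string };

interface StepOutcome { ok: boolean; output: string; }
interface NodeResult extends StepOutcome { label: string; }

interface Executors {
  runShell(command: string): StepOutcome;
  runAgent(prompt: string): StepOutcome;
}

// Run nodes in order and stop at the first failure.
// Synchronous for brevity; a real runner would be async and stream output.
function runBlueprint(nodes: BlueprintNode[], exec: Executors): NodeResult[] {
  const results: NodeResult[] = [];
  for (const node of nodes) {
    const outcome = node.kind === "shell"
      ? exec.runShell(node.command)
      : exec.runAgent(node.prompt);
    results.push({ label: node.label, ok: outcome.ok, output: outcome.output });
    if (!outcome.ok) break;
  }
  return results;
}

const implementAndTest: BlueprintNode[] = [
  { kind: "shell", label: "lint", command: "npm run lint" },
  { kind: "agent", label: "implement", prompt: "Implement the feature described in the task" },
  { kind: "shell", label: "test", command: "npm test" },
];
```

In this shape, runShell would wrap a plain SSH command and runAgent would call the /codespace/execute endpoint, so the agentic steps reuse everything described earlier.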

Context hydration — Before the agent starts, automatically gather relevant context (recent commits, open issues, project docs) and inject it into the system prompt.
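One possible shape for this step is a pure function that renders whatever was gathered into a system prompt. The TaskContext type, the section titles, and the per-section cap are all assumptions made for illustration:

```typescript
// Hypothetical bundle of context gathered before the agent starts.
interface TaskContext {
  recentCommits: string[];
  openIssues: string[];
  projectDocs: string[];
}

// Render the gathered context as a system prompt, skipping empty sections
// and capping each section so the prompt stays a predictable size.
function hydrateSystemPrompt(base: string, ctx: TaskContext, maxItems = 5): string {
  const sections: [string, string[]][] = [
    ["Recent commits", ctx.recentCommits],
    ["Open issues", ctx.openIssues],
    ["Project docs", ctx.projectDocs],
  ];
  const parts = [base];
  for (const [title, items] of sections) {
    if (items.length === 0) continue;
    const lines = items.slice(0, maxItems).map(item => `- ${item}`);
    parts.push(`${title}:\n${lines.join("\n")}`);
  }
  return parts.join("\n\n");
}

const hydrated = hydrateSystemPrompt("You are working in this repository.", {
  recentCommits: ["fix auth redirect"],
  openIssues: [],
  projectDocs: ["README.md"],
});
```

The output would be passed through the existing systemPrompt request field, so no API change is needed.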

CI feedback loops — Run tests after changes, feed failures back to the agent. Cap at 2 rounds like Stripe does.
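The loop itself is small once test running and the agent call are abstracted away. A sketch under those assumptions, with hypothetical callback names and the two-round cap as the default:

```typescript
interface TestRun { passed: boolean; log: string; }

// Run tests and, on failure, hand the log back to the agent, at most maxRounds times.
// runTests and askAgentToFix are injected so the loop itself stays deterministic.
function ciFeedbackLoop(
  runTests: () => TestRun,
  askAgentToFix: (failureLog: string) => void,
  maxRounds = 2
): { passed: boolean; fixRounds: number } {
  let run = runTests();
  let fixRounds = 0;
  while (!run.passed && fixRounds < maxRounds) {
    askAgentToFix(run.log);
    fixRounds += 1;
    run = runTests();
  }
  return { passed: run.passed, fixRounds };
}
```

With the default cap, a task that never goes green costs three test runs and two agent calls before the loop gives up and reports the failure.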

Automatic PR creation — After the agent finishes, create a branch and open a PR with a structured description.
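Because the GitHub CLI is already part of the setup, opening the PR could be one more gh invocation. The sketch below only builds the arguments for gh pr create; the PullRequestSpec type and the two-section body layout are hypothetical:

```typescript
// Hypothetical description of the PR to open once the agent has finished.
interface PullRequestSpec {
  branch: string;
  base: string;
  title: string;
  summary: string;
  testing: string;
}

// Build arguments for `gh pr create` with a structured body.
function prCreateArgs(spec: PullRequestSpec): string[] {
  const body = ["## Summary", spec.summary, "", "## Testing", spec.testing].join("\n");
  return [
    "pr", "create",
    "--head", spec.branch,
    "--base", spec.base,
    "--title", spec.title,
    "--body", body,
  ];
}

const prArgs = prCreateArgs({
  branch: "agent/fix-failing-tests",
  base: "main",
  title: "Fix failing tests",
  summary: "Repairs the failing test suite.",
  testing: "Ran the test suite inside the codespace.",
});
```

The summary and testing sections could be filled from the agent's final result, which keeps the PR description consistent across runs.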

The Setup

To replicate this:

| Component | Purpose |
| --- | --- |
| GitHub Codespace | Isolated dev environment with Claude Code |
| Fastify API server | Proxy endpoint on VPS |
| GitHub CLI (gh) | SSH tunnel to codespace |
| Claude Code (VS Code ext) | The coding agent |
| gh auth with codespace scope | API access to codespaces |

The total infrastructure cost: a cheap VPS for the API proxy, plus GitHub's free codespace hours. The agent runs on your Claude subscription — no API token billing.

Final Thoughts

Stripe's Minions write-up showed that autonomous coding agents are production-ready. But you don't need Stripe's infrastructure to build one. A GitHub Codespace, Claude Code, and a simple proxy API get you surprisingly far.

The key insight from both Stripe's approach and ours: the agent needs a real development environment, not a stripped-down sandbox. It needs to run builds, execute tests, read documentation, and use real tools. Codespaces give you that within the free tier.

The next frontier is orchestration — defining repeatable workflows that mix scripted steps with agent creativity. Stripe calls them blueprints. Goose calls them recipes. Whatever you call them, they're the bridge between "AI that writes code" and "AI that ships code."


Built and deployed using Claude Code on a GitHub Codespace — the same system described in this article.
