## Threat Model
Carpenter’s threat model is prompt injection, not adversarial users. The danger is untrusted data — web content, webhooks, API responses — manipulating the AI into generating harmful code.
This reframing produces a distinctive architecture: code is sanitized before the reviewer sees it, credentials never leave the platform process, and every state change is persisted as a file on disk.
## The Review Pipeline
When an agent calls submit_code, the code passes through six stages before execution:
| Stage | What Happens |
|---|---|
| 1. Hash check | SHA-256 against approval cache. Previously approved identical code skips review. |
| 2. Import check | import * is unconditionally rejected — no retry. |
| 3. AST parse | Reject syntax errors before spending API tokens. |
| 4. Injection scan | Regex detection of suspicious patterns. Advisory flags for the reviewer. |
| 5. Histogram analysis | A cheap model examines per-source word frequency histograms (comments, strings, docstrings separately) for injection patterns. Advisory, non-blocking. |
| 6. Sanitize + Review | The distinctive step (see below). |
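The cheap gates (stages 1–3) can be sketched in a few lines. `pre_review` and the in-memory `approved_hashes` cache are illustrative names, not Carpenter's actual API, and the substring scan for `import *` is a stand-in for whatever check the real stage performs:

```python
import ast
import hashlib

approved_hashes: set[str] = set()  # stand-in for the persistent approval cache


def pre_review(code: str) -> str:
    # Stage 1: identical, previously approved code skips review entirely.
    if hashlib.sha256(code.encode()).hexdigest() in approved_hashes:
        return "SKIP_REVIEW"
    # Stage 2: wildcard imports are rejected unconditionally, no retry.
    # (Crude textual check for the sketch's sake.)
    if "import *" in code:
        return "REJECTED"
    # Stage 3: fail fast on syntax errors before spending API tokens.
    try:
        ast.parse(code)
    except SyntaxError:
        return "REWORK"
    return "CONTINUE"  # proceed to stages 4-6
```

Everything up to this point costs no model calls, which is the point: the expensive reviewer only ever sees code that is novel, policy-clean, and parseable.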
## Sanitization: The Core Defense
Before the reviewer AI sees the code, it is sanitized:
- String and bytes literals → placeholders (`S1`, `S2`, …)
- F-strings → single placeholders
- User-defined variables → sequential identifiers (`a`, `b`, `c`, …)
- Comments and docstrings → stripped
- Imported names and builtins → preserved
The reviewer sees structure and intent without payload content. An attacker can embed instructions in string literals, but the reviewer never sees them — it judges the code’s shape, not its data.
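A minimal sanitizer along these lines can be built on Python's `ast` module. This sketch is ours, not Carpenter's implementation: it collapses string/bytes literals and f-strings to placeholders and renames variable references, while comments disappear as a side effect of parsing. It departs from the real pipeline in small ways noted in the comments:

```python
import ast
import builtins

PRESERVE = set(dir(builtins))  # builtins survive renaming


def _alias(i: int) -> str:
    """Spreadsheet-style names: a, b, ..., z, aa, ab, ..."""
    out = ""
    i += 1
    while i:
        i, r = divmod(i - 1, 26)
        out = chr(97 + r) + out
    return out


def sanitize(source: str) -> str:
    """Requires Python 3.9+ for ast.unparse. Comments never reach the AST,
    so they are stripped for free. Docstrings become placeholders here
    rather than being removed outright, and function/argument names are
    left untouched -- both simplifications of this sketch."""
    tree = ast.parse(source)
    # Names introduced by imports are preserved verbatim.
    imported = {
        alias.asname or alias.name.split(".")[0]
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom))
        for alias in node.names
    }
    names: dict[str, str] = {}
    count = 0

    class _Sanitize(ast.NodeTransformer):
        def visit_Constant(self, node):
            nonlocal count
            if isinstance(node.value, (str, bytes)):
                count += 1  # payload replaced by an opaque placeholder
                return ast.copy_location(ast.Constant(f"S{count}"), node)
            return node

        def visit_JoinedStr(self, node):
            nonlocal count
            count += 1  # the whole f-string collapses to one placeholder
            return ast.copy_location(ast.Constant(f"S{count}"), node)

        def visit_Name(self, node):
            if node.id not in imported and node.id not in PRESERVE:
                node.id = names.setdefault(node.id, _alias(len(names)))
            return node

    return ast.unparse(ast.fix_missing_locations(_Sanitize().visit(tree)))
```

For example, `secret = 'hunter2'` followed by `print(secret)` comes out as `a = 'S1'` and `print(a)`: the call structure survives, the payload does not.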
## Review Outcomes
| Verdict | Meaning |
|---|---|
| APPROVE | Execute the code |
| REWORK | Fixable issues — agent retries (up to 3 times) |
| MAJOR | Security concern — requires human decision |
| REJECTED | Policy violation — no retry |
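Verdict handling reduces to a small loop. The hook functions passed in here (`review`, `execute`, `revise`, `escalate`) are hypothetical stand-ins for the platform's internals, not Carpenter's real signatures:

```python
def handle_submission(code, review, execute, revise, escalate, max_retries=3):
    attempts = 0
    while True:
        verdict = review(code)
        if verdict == "APPROVE":
            return execute(code)
        if verdict == "REWORK" and attempts < max_retries:
            attempts += 1
            code = revise(code)  # agent fixes the flagged issues and resubmits
            continue
        if verdict == "MAJOR":
            return escalate(code)  # requires a human decision
        return None  # REJECTED, or REWORK retries exhausted
```

Note that exhausting the retry budget behaves like a rejection: REWORK is only a soft verdict for the first three attempts.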
## Adversarial Review Mode
An optional mode where the reviewer must find issues. Zero findings means insufficient review, not clean code. Three-pass escalation: standard review → escalated scrutiny → model escalation or human handoff.
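The escalation logic can be sketched as follows; `review_pass` is a hypothetical hook returning a list of findings, and the pass labels are illustrative:

```python
def adversarial_review(code, review_pass):
    # Passes 1-2: a clean pass is treated as insufficient review,
    # so scrutiny escalates instead of the code being approved.
    for level in ("standard", "escalated"):
        findings = review_pass(code, level)
        if findings:
            return findings
    # Pass 3: still nothing found -> stronger model, or hand off to a human.
    return "MODEL_ESCALATION_OR_HUMAN"
```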
## Network Egress Control
Executors run with default-deny network egress. Only the platform callback endpoint is reachable. All web access goes through act/web.py via the callback — the platform process makes the actual outbound request; the executor never touches the external network directly.
This is enforced at the network level (Docker isolation, network namespaces, or iptables rules), not just by convention.
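One concrete way to realize default-deny is an iptables policy applied inside the executor's network namespace. The rule set below is a sketch under our own assumptions about chain layout and flags, not taken from Carpenter:

```python
def egress_rules(callback_host: str, callback_port: int) -> list[str]:
    """Default-deny egress that whitelists only the platform callback
    endpoint. Illustrative only; a real deployment would also pin the
    interface and handle DNS."""
    return [
        # Allow new connections solely to the platform callback endpoint.
        f"iptables -A OUTPUT -d {callback_host} -p tcp --dport {callback_port} -j ACCEPT",
        # Allow return traffic on connections already accepted above.
        "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # Everything else is dropped by policy.
        "iptables -P OUTPUT DROP",
    ]
```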
## Credential Isolation
Credentials never enter the executor environment. The tool system is an RPC client — when executor code calls a tool that needs credentials, the request goes to the platform process, which holds all secrets, rate limits, and audit logs.
| Tool tier | Mechanism | Credential exposure |
|---|---|---|
| Callback | HTTP POST to platform | None |
| Direct | Pure Python in executor | None needed |
| Environment | Credential injection | Explicitly configured per-tool |
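From the executor's side, a callback-tier tool call is just an HTTP POST carrying the session ID and tool arguments. The endpoint URL and field names below are assumptions for illustration:

```python
import json
import urllib.request

CALLBACK_URL = "http://platform.internal:8443/tool"  # assumed endpoint


def build_tool_request(session_id: str, tool: str, **kwargs) -> urllib.request.Request:
    # No credentials appear anywhere in this payload: the executor only
    # names the tool; the platform attaches secrets on its own side.
    body = json.dumps({"session": session_id, "tool": tool, "args": kwargs})
    return urllib.request.Request(
        CALLBACK_URL,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )


def call_tool(session_id: str, tool: str, **kwargs):
    with urllib.request.urlopen(build_tool_request(session_id, tool, **kwargs)) as resp:
        return json.load(resp)
```

Because the platform process makes the real outbound request, rate limiting and audit logging happen at a single choke point rather than inside each sandbox.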
## Execution Sessions
Every code execution gets a platform-generated UUID session, created before the subprocess launches. The executor sends this session ID in callback requests; the platform validates that the session exists, is reviewed, and hasn’t expired. A compromised executor cannot spoof review status — session IDs are opaque tokens whose meaning is determined by the platform’s database.
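A minimal version of that server-side validation looks like this; the in-memory store and field names are hypothetical stand-ins for the platform's database:

```python
import time
import uuid

sessions: dict[str, dict] = {}  # stand-in for the platform's session store


def create_session(ttl_seconds: int = 900) -> str:
    sid = str(uuid.uuid4())  # opaque token: its meaning lives only server-side
    sessions[sid] = {"reviewed": False, "expires": time.time() + ttl_seconds}
    return sid


def mark_reviewed(sid: str) -> None:
    sessions[sid]["reviewed"] = True  # set only after the review pipeline passes


def validate_session(sid: str) -> bool:
    # A callback is honored only if the platform's own records say the
    # session exists, passed review, and has not expired. Nothing the
    # executor sends can alter those records.
    s = sessions.get(sid)
    return bool(s and s["reviewed"] and time.time() < s["expires"])
```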