
Security Model


Threat Model

Carpenter’s threat model is prompt injection, not adversarial users. The danger is untrusted data — web content, webhooks, API responses — manipulating the AI into generating harmful code.

This reframing produces a distinctive architecture: code is sanitized before the reviewer sees it, credentials never leave the platform process, and every state change is persisted as a file on disk.

The Review Pipeline

When an agent calls submit_code, the code passes through six stages before execution:

| Stage | What Happens |
| --- | --- |
| 1. Hash check | SHA-256 against approval cache. Previously approved identical code skips review. |
| 2. Import check | `import *` is unconditionally rejected — no retry. |
| 3. AST parse | Reject syntax errors before spending API tokens. |
| 4. Injection scan | Regex detection of suspicious patterns. Advisory flags for the reviewer. |
| 5. Histogram analysis | A cheap model examines per-source word frequency histograms (comments, strings, docstrings separately) for injection patterns. Advisory, non-blocking. |
| 6. Sanitize + Review | The distinctive step (see below). |
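
The first three stages are cheap and deterministic, so they can run before any model is called. A minimal sketch of that ordering, assuming hypothetical names (`APPROVAL_CACHE`, `pre_review_checks`) that are not Carpenter's actual API:

```python
import ast
import hashlib
import re

APPROVAL_CACHE: set[str] = set()  # SHA-256 digests of previously approved code

def pre_review_checks(code: str) -> str:
    """Run the cheap, deterministic stages (1-3) before any model call."""
    # Stage 1: hash check. Identical, previously approved code skips review.
    if hashlib.sha256(code.encode()).hexdigest() in APPROVAL_CACHE:
        return "approved (cache hit)"

    # Stage 2: import check. A wildcard import is rejected with no retry.
    if re.search(r"^\s*from\s+\S+\s+import\s+\*", code, re.MULTILINE):
        return "rejected: import *"

    # Stage 3: AST parse. Syntax errors are rejected before spending API tokens.
    try:
        ast.parse(code)
    except SyntaxError:
        return "rejected: syntax error"

    return "proceed to stages 4-6"
```

The ordering matters: the hash check costs almost nothing, and the parse gate ensures no reviewer tokens are spent on code that cannot run anyway.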

Sanitization: The Core Defense

Before the reviewer AI sees the code, it is sanitized:

  • String and bytes literals → placeholders (S1, S2, …)
  • F-strings → single placeholders
  • User-defined variables → sequential identifiers (a, b, c, …)
  • Comments and docstrings → stripped
  • Imported names and builtins → preserved

The reviewer sees structure and intent without payload content. An attacker can embed instructions in string literals, but the reviewer never sees them — it judges the code’s shape, not its data.
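
The literal-masking part of this idea can be sketched with an `ast.NodeTransformer`. This is an illustration, not Carpenter's sanitizer: it handles string, bytes, and f-string payloads (comments never survive `ast.unparse`, so they are stripped for free), but omits the variable renaming and docstring handling the real pipeline performs.

```python
import ast

class Sanitizer(ast.NodeTransformer):
    """Replace string/bytes payloads with sequential placeholders."""

    def __init__(self) -> None:
        self.count = 0

    def _placeholder(self, node: ast.AST) -> ast.Constant:
        self.count += 1
        return ast.copy_location(ast.Constant(value=f"S{self.count}"), node)

    def visit_Constant(self, node: ast.Constant):
        if isinstance(node.value, (str, bytes)):
            return self._placeholder(node)
        return node  # numbers, booleans, None pass through untouched

    def visit_JoinedStr(self, node: ast.JoinedStr):
        # An entire f-string collapses to a single placeholder.
        return self._placeholder(node)

def sanitize(code: str) -> str:
    tree = Sanitizer().visit(ast.parse(code))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Whatever instructions an attacker smuggles into a literal, the reviewer receives only `'S1'`, `'S2'`, and so on.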

Review Outcomes

| Verdict | Meaning |
| --- | --- |
| APPROVE | Execute the code |
| REWORK | Fixable issues — agent retries (up to 3 times) |
| MAJOR | Security concern — requires human decision |
| REJECTED | Policy violation — no retry |
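
A sketch of how these verdicts drive the control flow, with `review` and `rework` standing in for the model calls (the function names and return shapes are assumptions):

```python
MAX_REWORKS = 3  # the documented retry cap for REWORK verdicts

def review_loop(code, review, rework):
    """Map each verdict to its action; only REWORK loops, and only 3 times."""
    for attempt in range(MAX_REWORKS + 1):
        verdict, notes = review(code)
        if verdict == "APPROVE":
            return "execute", code
        if verdict == "REJECTED":
            return "stop", notes            # policy violation: no retry
        if verdict == "MAJOR":
            return "escalate_to_human", notes
        if verdict == "REWORK" and attempt < MAX_REWORKS:
            code = rework(code, notes)      # agent revises and resubmits
    return "stop", "rework limit exhausted"
```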

Adversarial Review Mode

An optional mode where the reviewer must find issues. Zero findings means insufficient review, not clean code. Three-pass escalation: standard review → escalated scrutiny → model escalation or human handoff.
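
The escalation ladder can be sketched as follows; `review_pass` and `escalate` are hypothetical stand-ins for the model calls. The key inversion is that an empty findings list ratchets scrutiny up instead of counting as approval:

```python
def adversarial_review(code, review_pass, escalate):
    """Three-pass escalation: zero findings means insufficient review."""
    for scrutiny in ("standard", "escalated"):
        findings = review_pass(code, scrutiny)
        if findings:
            return findings  # a substantive report ends the loop
    # Two passes with zero findings: hand off to a stronger model or a human.
    return escalate(code)
```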

Network Egress Control

Executors run with default-deny network egress. Only the platform callback endpoint is reachable. All web access goes through act/web.py via the callback — the platform process makes the actual outbound request; the executor never touches the external network directly.

This is enforced at the network level (Docker isolation, network namespaces, or iptables rules), not just by convention.
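
As one possible enforcement (the network name, image, and callback URL below are hypothetical), a Docker `--internal` network gives exactly this shape: containers on it can reach each other, including the platform's callback endpoint, but have no route to the outside world.

```python
def executor_run_argv(image: str, session_id: str) -> list[str]:
    """Build a `docker run` command for a default-deny executor.

    Assumes the network was created with:
        docker network create --internal carpenter-internal
    and that the platform container is attached to it as `platform`.
    """
    return [
        "docker", "run", "--rm",
        "--network", "carpenter-internal",  # internal: no external egress
        "-e", f"SESSION_ID={session_id}",
        "-e", "CALLBACK_URL=http://platform:8080/callback",
        image,
    ]
```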

Credential Isolation

Credentials never enter the executor environment. The tool system is an RPC client — when executor code calls a tool that needs credentials, the request goes to the platform process, which holds all secrets, rate limits, and audit logs.

| Tool tier | Mechanism | Credential exposure |
| --- | --- | --- |
| Callback | HTTP POST to platform | None |
| Direct | Pure Python in executor | None needed |
| Environment | Credential injection | Explicitly configured per-tool |
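
A sketch of the Callback tier from the executor's side, assuming a hypothetical endpoint and payload shape. Note what the payload does not contain: no API keys, no tokens; the platform attaches secrets, applies rate limits, and writes the audit log before performing the real operation.

```python
import json
import urllib.request

CALLBACK_URL = "http://platform:8080/callback"  # hypothetical endpoint

def build_rpc_payload(session_id: str, tool: str, args: dict) -> bytes:
    """Everything the executor ever sends: a session ID plus the request."""
    return json.dumps(
        {"session_id": session_id, "tool": tool, "args": args}
    ).encode()

def call_tool(session_id: str, tool: str, args: dict) -> dict:
    """Executor-side stub for a Callback-tier tool."""
    req = urllib.request.Request(
        CALLBACK_URL,
        data=build_rpc_payload(session_id, tool, args),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```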

Execution Sessions

Every code execution gets a platform-generated UUID session, created before the subprocess launches. The executor sends this session ID in callback requests; the platform validates that the session exists, is reviewed, and hasn’t expired. A compromised executor cannot spoof review status — session IDs are opaque tokens whose meaning is determined by the platform’s database.
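
The platform-side lifecycle can be sketched like this (store shape and function names are assumptions). Because `SESSIONS` lives only in the platform process, nothing the executor sends can change a session's `reviewed` flag:

```python
import time
import uuid

SESSIONS: dict[str, dict] = {}  # platform-side session store

def create_session(ttl_seconds: int = 600) -> str:
    """Mint an opaque session ID before the subprocess launches."""
    sid = str(uuid.uuid4())
    SESSIONS[sid] = {"reviewed": False, "expires": time.time() + ttl_seconds}
    return sid

def mark_reviewed(sid: str) -> None:
    SESSIONS[sid]["reviewed"] = True

def validate(sid: str) -> bool:
    """Honor a callback only if the platform's own records say the session
    exists, is reviewed, and has not expired; executor claims carry no weight."""
    s = SESSIONS.get(sid)
    return bool(s) and s["reviewed"] and time.time() < s["expires"]
```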