## Threat Model
Carpenter’s threat model is prompt injection, not adversarial users. The danger is untrusted data — web content, webhooks, API responses — manipulating the AI into generating harmful code.
This reframing produces a distinctive architecture: code is sanitized before the reviewer sees it, credentials never leave the platform process, and every state change is persisted as a file on disk.
## The Review Pipeline
When an agent calls submit_code, the code passes through six stages before execution:
| Stage | What Happens |
|---|---|
| 1. Hash check | SHA-256 against approval cache. Previously approved identical code skips review. |
| 2. Import check | import * is unconditionally rejected — no retry. |
| 3. AST parse | Reject syntax errors before spending API tokens. |
| 4. Injection scan | Regex detection of suspicious patterns. Advisory flags for the reviewer. |
| 5. Histogram analysis | A cheap model examines per-source word frequency histograms (comments, strings, docstrings separately) for injection patterns. Advisory, non-blocking. |
| 6. Sanitize + Review | The distinctive step (see below). |
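The cheap gates (stages 1–3) can be sketched in a few lines. `pre_review` and the in-memory `approved_hashes` cache are illustrative names, not Carpenter's actual API, and the substring scan for `import *` is a stand-in for whatever check the real stage performs:

```python
import ast
import hashlib

approved_hashes: set[str] = set()  # stand-in for the persistent approval cache


def pre_review(code: str) -> str:
    # Stage 1: identical, previously approved code skips review entirely.
    if hashlib.sha256(code.encode()).hexdigest() in approved_hashes:
        return "SKIP_REVIEW"
    # Stage 2: wildcard imports are rejected unconditionally, no retry.
    # (Crude textual check for the sketch's sake.)
    if "import *" in code:
        return "REJECTED"
    # Stage 3: fail fast on syntax errors before spending API tokens.
    try:
        ast.parse(code)
    except SyntaxError:
        return "REWORK"
    return "CONTINUE"  # proceed to stages 4-6
```

Everything up to this point costs no model calls, which is the point: the expensive reviewer only ever sees code that is novel, policy-clean, and parseable.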
## Sanitization: The Core Defense
Before the reviewer AI sees the code, it is sanitized:
- String and bytes literals → placeholders (`S1`, `S2`, …)
- F-strings → single placeholders
- User-defined variables → sequential identifiers (`a`, `b`, `c`, …)
- Comments and docstrings → stripped
- Imported names and builtins → preserved
The reviewer sees structure and intent without payload content. An attacker can embed instructions in string literals, but the reviewer never sees them — it judges the code’s shape, not its data.
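A minimal sanitizer along these lines can be built on Python's `ast` module. This sketch is ours, not Carpenter's implementation: it collapses string/bytes literals and f-strings to placeholders and renames variable references, while comments disappear as a side effect of parsing. It departs from the real pipeline in small ways noted in the comments:

```python
import ast
import builtins

PRESERVE = set(dir(builtins))  # builtins survive renaming


def _alias(i: int) -> str:
    """Spreadsheet-style names: a, b, ..., z, aa, ab, ..."""
    out = ""
    i += 1
    while i:
        i, r = divmod(i - 1, 26)
        out = chr(97 + r) + out
    return out


def sanitize(source: str) -> str:
    """Requires Python 3.9+ for ast.unparse. Comments never reach the AST,
    so they are stripped for free. Docstrings become placeholders here
    rather than being removed outright, and function/argument names are
    left untouched -- both simplifications of this sketch."""
    tree = ast.parse(source)
    # Names introduced by imports are preserved verbatim.
    imported = {
        alias.asname or alias.name.split(".")[0]
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom))
        for alias in node.names
    }
    names: dict[str, str] = {}
    count = 0

    class _Sanitize(ast.NodeTransformer):
        def visit_Constant(self, node):
            nonlocal count
            if isinstance(node.value, (str, bytes)):
                count += 1  # payload replaced by an opaque placeholder
                return ast.copy_location(ast.Constant(f"S{count}"), node)
            return node

        def visit_JoinedStr(self, node):
            nonlocal count
            count += 1  # the whole f-string collapses to one placeholder
            return ast.copy_location(ast.Constant(f"S{count}"), node)

        def visit_Name(self, node):
            if node.id not in imported and node.id not in PRESERVE:
                node.id = names.setdefault(node.id, _alias(len(names)))
            return node

    return ast.unparse(ast.fix_missing_locations(_Sanitize().visit(tree)))
```

For example, `secret = 'hunter2'` followed by `print(secret)` comes out as `a = 'S1'` and `print(a)`: the call structure survives, the payload does not.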
## Review Outcomes
| Verdict | Meaning |
|---|---|
| APPROVE | Execute the code |
| REWORK | Fixable issues — agent retries (up to 3 times) |
| MAJOR | Security concern — requires human decision |
| REJECTED | Policy violation — no retry |
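Verdict handling reduces to a small loop. The hook functions passed in here (`review`, `execute`, `revise`, `escalate`) are hypothetical stand-ins for the platform's internals, not Carpenter's real signatures:

```python
def handle_submission(code, review, execute, revise, escalate, max_retries=3):
    attempts = 0
    while True:
        verdict = review(code)
        if verdict == "APPROVE":
            return execute(code)
        if verdict == "REWORK" and attempts < max_retries:
            attempts += 1
            code = revise(code)  # agent fixes the flagged issues and resubmits
            continue
        if verdict == "MAJOR":
            return escalate(code)  # requires a human decision
        return None  # REJECTED, or REWORK retries exhausted
```

Note that exhausting the retry budget behaves like a rejection: REWORK is only a soft verdict for the first three attempts.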
## Adversarial Review Mode
An optional mode where the reviewer must find issues. Zero findings means insufficient review, not clean code. Three-pass escalation: standard review → escalated scrutiny → model escalation or human handoff.
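The escalation logic can be sketched as follows; `review_pass` is a hypothetical hook returning a list of findings, and the pass labels are illustrative:

```python
def adversarial_review(code, review_pass):
    # Passes 1-2: a clean pass is treated as insufficient review,
    # so scrutiny escalates instead of the code being approved.
    for level in ("standard", "escalated"):
        findings = review_pass(code, level)
        if findings:
            return findings
    # Pass 3: still nothing found -> stronger model, or hand off to a human.
    return "MODEL_ESCALATION_OR_HUMAN"
```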
## Network Egress Control
Executors run with default-deny network egress. Only the platform callback endpoint is reachable. All web access goes through act/web.py via the callback — the platform process makes the actual outbound request; the executor never touches the external network directly.
This is enforced at the network level (Docker isolation, network namespaces, or iptables rules), not just by convention.
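One concrete way to realize default-deny is an iptables policy applied inside the executor's network namespace. The rule set below is a sketch under our own assumptions about chain layout and flags, not taken from Carpenter:

```python
def egress_rules(callback_host: str, callback_port: int) -> list[str]:
    """Default-deny egress that whitelists only the platform callback
    endpoint. Illustrative only; a real deployment would also pin the
    interface and handle DNS."""
    return [
        # Allow new connections solely to the platform callback endpoint.
        f"iptables -A OUTPUT -d {callback_host} -p tcp --dport {callback_port} -j ACCEPT",
        # Allow return traffic on connections already accepted above.
        "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # Everything else is dropped by policy.
        "iptables -P OUTPUT DROP",
    ]
```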
## Credential Isolation
Credentials never enter the executor environment. The tool system is an RPC client — when executor code calls a tool that needs credentials, the request goes to the platform process, which holds all secrets, rate limits, and audit logs.
| Tool tier | Mechanism | Credential exposure |
|---|---|---|
| Callback | HTTP POST to platform | None |
| Direct | Pure Python in executor | None needed |
| Environment | Credential injection | Explicitly configured per-tool |
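From the executor's side, a callback-tier tool call is just an HTTP POST carrying the session ID and tool arguments. The endpoint URL and field names below are assumptions for illustration:

```python
import json
import urllib.request

CALLBACK_URL = "http://platform.internal:8443/tool"  # assumed endpoint


def build_tool_request(session_id: str, tool: str, **kwargs) -> urllib.request.Request:
    # No credentials appear anywhere in this payload: the executor only
    # names the tool; the platform attaches secrets on its own side.
    body = json.dumps({"session": session_id, "tool": tool, "args": kwargs})
    return urllib.request.Request(
        CALLBACK_URL,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )


def call_tool(session_id: str, tool: str, **kwargs):
    with urllib.request.urlopen(build_tool_request(session_id, tool, **kwargs)) as resp:
        return json.load(resp)
```

Because the platform process makes the real outbound request, rate limiting and audit logging happen at a single choke point rather than inside each sandbox.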
## Execution Sessions
Every code execution gets a platform-generated UUID session, created before the subprocess launches. The executor sends this session ID in callback requests; the platform validates that the session exists, is reviewed, and hasn’t expired. A compromised executor cannot spoof review status — session IDs are opaque tokens whose meaning is determined by the platform’s database.
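A minimal version of that server-side validation looks like this; the in-memory store and field names are hypothetical stand-ins for the platform's database:

```python
import time
import uuid

sessions: dict[str, dict] = {}  # stand-in for the platform's session store


def create_session(ttl_seconds: int = 900) -> str:
    sid = str(uuid.uuid4())  # opaque token: its meaning lives only server-side
    sessions[sid] = {"reviewed": False, "expires": time.time() + ttl_seconds}
    return sid


def mark_reviewed(sid: str) -> None:
    sessions[sid]["reviewed"] = True  # set only after the review pipeline passes


def validate_session(sid: str) -> bool:
    # A callback is honored only if the platform's own records say the
    # session exists, passed review, and has not expired. Nothing the
    # executor sends can alter those records.
    s = sessions.get(sid)
    return bool(s and s["reviewed"] and time.time() < s["expires"])
```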