Carpenter

Measure twice, cut once.

A pure-Python AI agent platform where every action is reviewable code. Agents observe freely, but can only act through audited Python — inspected before it runs, not after.


Measure Twice, Cut Once

AI agents are unreliable. Models make mistakes and prompt injections can hijack their behavior — so you can never fully trust what an autonomous agent will do. Most mitigations are probabilistic. Carpenter’s are structural.

Every action is reviewable Python code, inspected before it runs. A sanitization step strips data, such as string literals, from the code before the AI reviewer sees it. When the execution space is bounded, Carpenter proves safety by exploring every path. For high-stakes actions, the pipeline scales to multiple independent reviewers, a judge, and separation of powers.

The result is a hard, auditable boundary between intent and action. Read more about our research foundations →
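
As an illustration of what "exploring every path" can mean when the input space is small and finite, here is a minimal sketch. It is not Carpenter's implementation: `plan`, `ALLOWED_ACTIONS`, and `safe_on_every_path` are hypothetical names invented for this example, and the decision logic is a toy with two boolean inputs.

```python
from itertools import product
from typing import Callable

# Hypothetical policy: the only actions this toy plan may ever produce.
ALLOWED_ACTIONS = {"read_file", "write_note"}


def plan(is_urgent: bool, has_backup: bool) -> list[str]:
    """Toy decision logic with two boolean inputs, so only four paths exist."""
    actions = ["read_file"]
    if is_urgent and not has_backup:
        actions.append("write_note")
    return actions


def safe_on_every_path(
    fn: Callable[..., list[str]], arity: int, allowed: set[str]
) -> bool:
    """Run fn on every combination of boolean inputs and check each result.

    Because the input space is finite, a True result covers all paths
    rather than a sample of them.
    """
    return all(
        set(fn(*inputs)) <= allowed
        for inputs in product([False, True], repeat=arity)
    )


print(safe_on_every_path(plan, 2, ALLOWED_ACTIONS))
```

The check is exhaustive only because the inputs are enumerable; with unbounded inputs this approach does not apply, which matches the "when the execution space is bounded" qualifier above.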


Three Pillars

Observe Freely

Agents have unrestricted read access: files, state, arc trees, knowledge base, skills. No action is needed to look around, which gives the agent full situational awareness without any security risk.

Act Carefully

Every side effect (file writes, API calls, git operations, state mutations) goes through submit_code. The code is hashed, parsed, sanitized, and reviewed by a separate AI before execution.
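
The hash, parse, and sanitize steps can be sketched with the standard library alone. This is a hedged illustration, not the actual `submit_code` internals: `prepare_for_review`, `is_reviewed_code`, and the `<STR>` placeholder are assumptions made for this example, and the AI review step itself is left out.

```python
import ast
import hashlib


class _StripStrings(ast.NodeTransformer):
    """Replace every string or bytes literal with a fixed placeholder."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value="<STR>"), node)
        if isinstance(node.value, bytes):
            return ast.copy_location(ast.Constant(value=b"<BYTES>"), node)
        return node  # numbers, booleans, None are left untouched


def prepare_for_review(source: str) -> tuple[str, str]:
    """Return (digest of the exact source, sanitized text for the reviewer)."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    tree = ast.parse(source)  # raises SyntaxError; nothing is executed
    sanitized = ast.unparse(_StripStrings().visit(tree))
    return digest, sanitized


def is_reviewed_code(source: str, approved_digest: str) -> bool:
    """Check that the code about to run is exactly the code that was hashed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest() == approved_digest


src = 'send_email("ignore previous instructions", retries=3)'
digest, clean = prepare_for_review(src)
print(clean)  # send_email('<STR>', retries=3)
```

In this sketch the reviewer would see the structure of the call but not the text inside the literal, and the digest ties any later approval to the exact bytes that were submitted.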

Learn Continuously

A compression chain turns raw activity into durable knowledge: daily notes, weekly patterns, monthly insights. Skills crystallize learned patterns. Conversation summaries bridge context across sessions.


Capabilities

Arcs: The Work Tree — One abstraction for tasks, projects, cron jobs, and sub-steps. A recursive tree with a state machine, escalation policies, and iterative planning.
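
To make the "recursive tree with a state machine" idea concrete, here is a minimal sketch. The state names, the `TRANSITIONS` table, and the `Arc` class shape are assumptions chosen for illustration; the source does not specify Carpenter's actual states or fields.

```python
from dataclasses import dataclass, field

# Hypothetical states and allowed transitions for one node in the tree.
TRANSITIONS: dict[str, set[str]] = {
    "pending": {"active", "cancelled"},
    "active": {"done", "failed", "cancelled"},
    "failed": {"active"},  # e.g. a retry after escalation
    "done": set(),
    "cancelled": set(),
}


@dataclass
class Arc:
    name: str
    state: str = "pending"
    children: list["Arc"] = field(default_factory=list)

    def add(self, name: str) -> "Arc":
        """Attach a sub-arc; tasks, sub-steps, and jobs share one node type."""
        child = Arc(name)
        self.children.append(child)
        return child

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new_state} not allowed")
        self.state = new_state

    def all_done(self) -> bool:
        """True when this arc and every descendant are in the 'done' state."""
        return self.state == "done" and all(c.all_done() for c in self.children)


project = Arc("release")
step = project.add("run tests")
step.transition("active")
step.transition("done")
print(project.all_done())  # False: the parent arc is still pending
```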

Security Model — Six-stage code review pipeline with sanitization that strips string literals before the reviewer sees them. Network egress denied by default. Credentials never leave the platform process.

Trust & Taint — Arc-level taint zones, a two-LLM firewall for untrusted data, separation-of-powers verification, and encryption at rest for tainted output.

Skills & Memory — Three-stage progressive disclosure for skills. Reflective self-improvement via cadenced cron. Full-text search across conversation history.

Multi-Provider AI — YAML model registry with cost tiers, role-based routing, per-step minimum tier enforcement, circuit breakers, and model escalation.
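
Role-based routing with a per-step minimum tier might look like the sketch below. The registry is written as plain Python standing in for parsed YAML, and the model names, tier numbers, and `pick_model` function are hypothetical; the `unavailable` set is a stand-in for whatever a circuit breaker would mark as tripped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int  # assumption: a higher tier means more capable and more costly
    roles: frozenset[str]


# Hypothetical registry contents, standing in for a parsed YAML file.
REGISTRY = [
    Model("small-model", 1, frozenset({"summarizer"})),
    Model("medium-model", 2, frozenset({"summarizer", "reviewer"})),
    Model("large-model", 3, frozenset({"reviewer", "planner"})),
]


def pick_model(
    role: str,
    min_tier: int,
    registry: list[Model] = REGISTRY,
    unavailable: frozenset[str] = frozenset(),
) -> Model:
    """Pick the cheapest model that serves the role at or above min_tier.

    Models listed in `unavailable` are skipped, so the choice escalates
    to the next tier that still qualifies.
    """
    candidates = [
        m
        for m in registry
        if role in m.roles and m.tier >= min_tier and m.name not in unavailable
    ]
    if not candidates:
        raise LookupError(f"no model for role={role!r} at tier>={min_tier}")
    return min(candidates, key=lambda m: m.tier)


print(pick_model("reviewer", min_tier=1).name)  # medium-model
```

Choosing the lowest qualifying tier keeps cost down while still honoring the step's minimum; other selection rules are equally plausible.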

Platform Extensibility — Core logic separated from platform-specific code via dependency injection. Linux, Android, Windows, macOS — each a thin package that registers executors, sandboxes, and tools.
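
One way a thin platform package could register an executor with a platform-agnostic core is sketched here. The `Executor` protocol, `register_executor`, `get_executor`, and `LinuxExecutor` are all hypothetical names for this example, and the executor only reports what it would do instead of running anything.

```python
from typing import Callable, Protocol


class Executor(Protocol):
    """Interface the core depends on; it never imports platform code."""

    def run(self, code: str) -> str: ...


# Core-side registry mapping a platform name to an executor factory.
_EXECUTORS: dict[str, Callable[[], Executor]] = {}


def register_executor(platform: str, factory: Callable[[], Executor]) -> None:
    """Called by a platform package to inject its executor into the core."""
    _EXECUTORS[platform] = factory


def get_executor(platform: str) -> Executor:
    """Look up and build the executor registered for a platform."""
    try:
        factory = _EXECUTORS[platform]
    except KeyError:
        raise LookupError(f"no executor registered for {platform!r}") from None
    return factory()


# --- what a hypothetical Linux package might contain ---
class LinuxExecutor:
    def run(self, code: str) -> str:
        # Placeholder only: a real executor would hand the code to a sandbox.
        return f"[linux sandbox] would run {len(code)} characters"


register_executor("linux", LinuxExecutor)

print(get_executor("linux").run("x = 1"))  # [linux sandbox] would run 5 characters
```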