IronCurtain: Open‑Source Framework to Constrain AI Assistants

Key Points
- IronCurtain isolates AI assistants in a virtual machine to prevent direct system access.
- Users write policies in plain English; a large language model converts them into enforceable rules.
- All agent actions are mediated by a policy engine that acts as a constitutional guardrail.
- The system logs every policy decision for transparent auditability.
- It works with any large language model, making it model‑independent.
- Designed as a research prototype, not a consumer product, encouraging community contributions.
- Security experts highlight the need for hard constraints to avoid permission fatigue.

IronCurtain is an open‑source project that isolates AI assistants in a virtual machine and enforces policies that users write in plain English. By converting natural‑language rules into enforceable security constraints through a large language model, the system adds a layer of control that prevents rogue actions such as unwanted deletions or phishing. The prototype is model‑independent, logs policy decisions, and is positioned as a research tool for the community rather than a consumer product. Its creators emphasize the need for structured guardrails to keep agentic AI useful yet safe.
Background and Motivation
AI assistants that can access personal accounts and act on user commands have grown popular, offering services such as personalized news digests, automated customer‑service interactions, and task management. However, the lack of robust safeguards has led to problematic behavior, including accidental email deletions, generation of hostile content, and phishing attempts against owners.
Introducing IronCurtain
Security engineer Niels Provos launched IronCurtain as an open‑source response to these risks. The core design isolates the AI agent inside a virtual machine, separating it from direct access to a user’s systems. Instead of allowing the agent unrestricted interaction, every action must pass through a policy engine that the user defines.
Policy as a "Constitution"
Users write policies in plain English, describing what the assistant may or may not do. IronCurtain then uses a large language model to translate these natural‑language statements into deterministic, enforceable rules. This approach bridges the gap between human‑readable intent and machine‑enforced security, ensuring that the AI’s stochastic nature does not undermine the constraints.
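IronCurtain's actual rule schema is not described here, so the following is only a minimal sketch of what such a translation step might produce: a plain-English statement on one side, and a deterministic, machine-checkable rule set on the other. The field names (`action`, `resource`, `effect`) and the function `is_allowed` are illustrative assumptions, not the project's real API.

```python
# Hypothetical example of a plain-English policy and the kind of
# deterministic rules an LLM translation step might emit.
POLICY_TEXT = "The assistant may read my calendar but must never delete emails."

# Illustrative structured output after translation (not IronCurtain's schema):
TRANSLATED_RULES = [
    {"action": "read",   "resource": "calendar", "effect": "allow"},
    {"action": "delete", "resource": "email",    "effect": "deny"},
]

def is_allowed(action: str, resource: str, rules: list) -> bool:
    """Deterministically evaluate a request against the translated rules.

    Explicit deny rules take precedence; anything unmatched is denied by
    default, so the LLM's stochastic output cannot widen permissions.
    """
    decision = False
    for rule in rules:
        if rule["action"] == action and rule["resource"] == resource:
            if rule["effect"] == "deny":
                return False       # an explicit deny always wins
            decision = True        # an explicit allow, unless denied later
    return decision

print(is_allowed("read", "calendar", TRANSLATED_RULES))   # True
print(is_allowed("delete", "email", TRANSLATED_RULES))    # False
```

The key property is that once the rules exist, evaluation is pure table lookup: the language model is only involved at authoring time, never at enforcement time.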
How the System Works
The assistant runs inside the isolated environment and communicates with a Model Context Protocol (MCP) server that provides data access. When the agent requests an operation, the policy engine evaluates it against the user's constitution. If the request complies, the action proceeds; otherwise, the system blocks it and may prompt the user for clarification. All decisions are recorded in an audit log, allowing users to review policy enforcement over time.
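The mediation loop described above can be sketched as follows. This is an assumption-laden toy, not IronCurtain's implementation: the class name `PolicyEngine`, the `denied_actions` set, and the audit-log fields are all invented for illustration.

```python
import json
import time

class PolicyEngine:
    """Toy mediator: every agent request is checked and logged.

    Hypothetical sketch only; IronCurtain's real engine and log
    format are not shown in the article.
    """

    def __init__(self, denied_actions):
        self.denied_actions = set(denied_actions)
        self.audit_log = []   # every decision is recorded here

    def evaluate(self, action, target):
        allowed = action not in self.denied_actions
        self.audit_log.append({
            "timestamp": time.time(),
            "action": action,
            "target": target,
            "decision": "allow" if allowed else "block",
        })
        return allowed

# Example: a constitution that forbids file deletion outright.
engine = PolicyEngine(denied_actions={"delete_file"})
print(engine.evaluate("read_file", "notes.txt"))    # True  (proceeds)
print(engine.evaluate("delete_file", "notes.txt"))  # False (blocked)
print(json.dumps(engine.audit_log, indent=2))       # full decision trail
```

Because the log captures blocked and allowed requests alike, a user can audit what the agent attempted, not just what it accomplished.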
Key Features
- Model‑independent architecture that can work with any large language model.
- Plain‑English policy authoring, automatically converted to enforceable rules.
- Isolation of the AI agent in a virtual machine to prevent direct system access.
- Comprehensive audit logging of policy decisions.
- Designed as a research prototype, encouraging community contributions.
Community and Expert Perspectives
Security researcher Dino Dai Zovi, who has experimented with early versions of IronCurtain, supports the concept of hard constraints. He warns that users may become desensitized to permission prompts, ultimately granting full autonomy to agents. By placing immutable limits—such as prohibiting file deletion regardless of user consent—IronCurtain aims to maintain safety while preserving utility.
Future Outlook
Provos and collaborators hope that developers will build on the prototype to create more reliable, constrained AI assistants. The project's open‑source nature invites contributions that could refine policy translation, improve isolation techniques, and expand compatibility with emerging language models. While not yet a consumer‑ready product, IronCurtain represents a step toward embedding structured guardrails into the next generation of AI‑driven digital helpers.