Sandbox the Author, Flag the Release: Governing OpenAI Codex with Unleash

Alex Casalboni

Developer Advocate

June 24, 2026

OpenAI Codex does not wait for you to type. It reads your repository, edits files, runs your tests, and can finish a task on its own and hand you back a pull request. That is a real shift from autocomplete. The agent is the author now, and you are closer to a reviewer than a typist.

This is useful, and it changes where the risk sits. When a person writes a change, they build up a mental model of what it does and why. When an agent writes the change, that model lives in the prompt and the diff, not in a teammate’s head. The code can still be correct. The gap is in understanding, and it shows up at two different moments: while the code is being written, and after it ships.

Codex addresses the first moment well. It runs inside an operating-system sandbox and asks for approval before it acts. What it does not address, because no coding tool can, is the second moment. Once a change is merged and deployed, the question is no longer “should the agent be allowed to run this command” but “what happens to real users if this code misbehaves.” That second question is what FeatureOps answers.

So there are two boundaries worth thinking about. One protects your machine and your repository while the agent works. The other protects your users and your uptime after the agent’s code goes live. Codex gives you the first. Unleash gives you the second. They are stronger together than either is alone.

The authoring boundary: what Codex already controls

Codex ships with a safety model that most engineering leaders will appreciate. Every command and file edit runs inside a sandbox built on the operating system itself, Apple Seatbelt on macOS and bubblewrap on Linux. In the default mode, the agent can read and write inside your working directory, and network access is off unless you turn it on. You choose how much freedom to grant, from read-only, to write-in-workspace, to full access for trusted environments.

On top of the sandbox, Codex asks before it acts. When the agent wants to run a tool, it shows you what the tool is and what it is about to do, then waits for you to allow it once, allow it for the session, always allow it, or cancel. For a read-only lookup that is a small courtesy. For a write operation, like creating a flag or enabling one in production via the Unleash MCP server, it is a genuine control.

This is the part people miss when they worry that AI agents will “YOLO” changes into production. They do not. The agent’s actions are gated, and a human is in the loop by default. The understanding gap is real, but it is a comprehension problem at authoring time, not a process bypass. Normal pull request review still applies on top of everything Codex does locally.

Why the authoring boundary is not enough

Here is the catch. The sandbox and the approval prompt protect the act of writing code. They have no opinion about what that code does next week, at three in the morning, when it is serving real traffic.

Think about what gets approved. A developer asks Codex to add a new payment provider. The agent writes the integration, the developer reviews the diff, approves the file edits, and merges a clean pull request. Every control did its job. Then the new provider has a bad afternoon, starts timing out, and there is no quick way to turn it off without a redeploy. The code was written safely. But it was shipped without a runtime switch.

This is not hypothetical. In June 2025 a missing feature flag caused a global Google Cloud outage that lasted more than three hours. Google’s own postmortem was blunt: “if the change had been protected by a feature flag, recovery would have taken seconds instead of hours.” Google has world-class engineers. The change still shipped without a runtime boundary, and that was the difference between a blip and an incident.

The broader trend points the same way. The DORA State of AI-Assisted Software Development report found that delivery stability tends to dip as AI usage goes up. More code, written faster, lands in production. The tooling that keeps releases stable has not kept pace with the tooling that writes the code. Feature flags are how you close that gap, and they matter more, not less, when an agent is doing the writing.

The runtime boundary: feature flags as the second control

A feature flag is a runtime switch around a piece of code. The code can merge and deploy while the flag is off. You turn it on for internal users, then a small percentage of traffic or a specific segment or tenant, then everyone, watching as you go. If something looks wrong, you turn it off. No redeploy, no rollback branch, no incident bridge at midnight.
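To make the "internal users, then a small percentage, then everyone" progression concrete, here is a minimal sketch of how a percentage rollout can stay stable per user. It is not the Unleash SDK: the names `stableHash`, `bucketFor`, `RolloutState`, and `isEnabledFor` are hypothetical, and the hashing scheme is an assumption chosen for brevity.

```typescript
// Illustrative sketch only. Not the Unleash SDK; all names here are hypothetical.

/** Simple polynomial string hash, kept within unsigned 32-bit range. */
function stableHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

/** Maps a (flag, user) pair to a bucket from 1 to 100 that never changes. */
function bucketFor(flagName: string, userId: string): number {
  return (stableHash(flagName + ":" + userId) % 100) + 1;
}

interface RolloutState {
  enabled: boolean;       // master switch: false means nobody gets the new path
  rolloutPercent: number; // 0 to 100
}

/** A user is in the rollout when the flag is on and their bucket is within the percentage. */
function isEnabledFor(flagName: string, userId: string, state: RolloutState): boolean {
  if (!state.enabled) {
    return false;
  }
  return bucketFor(flagName, userId) <= state.rolloutPercent;
}
```

Because each user keeps the same bucket, raising the percentage only adds users to the rollout, and setting `enabled` to false removes everyone at once without a redeploy.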

For AI-written code this is exactly the control you want. The change can ship the moment it is ready, because shipping and releasing are now separate decisions. The agent and its reviewer get speed. The operations team gets a switch. The blast radius of any single change is something you decide, not something you discover.

We built the Unleash MCP server so that Codex can manage these flags as part of writing the code, not as a separate chore afterward. With the Unleash server connected, the agent can look at a change and decide whether it needs a flag, check whether a suitable flag already exists so it does not create a duplicate, create one with a name and type that follow your conventions, and wrap the new code behind it. It can also list and audit your existing flags and help clean them up once a feature is fully rolled out.
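The check-then-create step can be pictured as a small decision function. The sketch below is an illustration under stated assumptions, not the MCP server's actual logic: `proposeFlag`, `FlagProposal`, and `ExistingFlag` are hypothetical names, the kebab-case pattern stands in for whatever naming convention your team uses, and treating an archived name as unavailable is a simplifying assumption.

```typescript
// Illustrative sketch only. The real work happens through the Unleash MCP server;
// every name below is hypothetical.

type FlagType = "release" | "experiment" | "operational" | "kill-switch" | "permission";

interface ExistingFlag {
  name: string;
  archived: boolean;
}

type FlagProposal =
  | { action: "reuse"; name: string }
  | { action: "create"; name: string; type: FlagType }
  | { action: "reject"; reason: string };

// Assumed convention: lowercase kebab-case, for example "shipping-acme-provider".
const FLAG_NAME_PATTERN = /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

function proposeFlag(name: string, type: FlagType, existing: ExistingFlag[]): FlagProposal {
  if (!FLAG_NAME_PATTERN.test(name)) {
    return { action: "reject", reason: "name does not follow the kebab-case convention" };
  }
  // Reuse an active flag instead of creating a duplicate.
  if (existing.some((flag) => flag.name === name && !flag.archived)) {
    return { action: "reuse", name: name };
  }
  // Assumption: an archived flag's name is treated as taken.
  if (existing.some((flag) => flag.name === name && flag.archived)) {
    return { action: "reject", reason: "name belongs to an archived flag" };
  }
  return { action: "create", name: name, type: type };
}
```

In the Codex flow, a "create" result is what would surface in the approval prompt, so the developer still confirms the flag before it exists.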

The result is that the runtime boundary gets built at the same time as the code. When Codex finishes a risky change, the change is already wrapped in a flag and shipped disabled. You decide when it turns on.

What it looks like in practice

A developer opens Codex in their editor and asks it to add support for a new shipping provider. The agent recognizes this as the kind of external integration that should be guarded, so it checks with Unleash, sees there is no existing flag for it, and proposes creating one. Codex shows the developer the proposed flag and its name and waits for approval. The developer approves. The agent writes the integration and wraps the call to the new provider behind the flag, leaving the existing provider as the fallback path.


// Flag off (the state at merge time) keeps every order on the existing provider.
if (unleash.isEnabled("shipping-acme-provider", context)) {
  return acmeShipping.getRates(order);
} else {
  return currentShipping.getRates(order);
}

The pull request merges with the flag off. The team enables it in staging, tries it, then turns it on for five percent of production traffic. The rates look right and the error rate stays flat, so they increase it over a few days to everyone. A week later the old provider’s code path is dead, so the developer asks Codex to remove it, along with the flag check, from the codebase and then archive the flag in Unleash.
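The cleanup step at the end of that story follows a simple rule: a flag that has been at 100 percent for long enough is a candidate for removal. Here is a rough sketch of that rule. The `FlagSnapshot` shape and `cleanupCandidates` function are hypothetical, and the seven-day default simply mirrors the "a week later" in the example rather than any Unleash setting.

```typescript
// Illustrative sketch only. The data shape and threshold are assumptions.

interface FlagSnapshot {
  name: string;
  rolloutPercent: number;          // current production rollout, 0 to 100
  atCurrentPercentSinceMs: number; // epoch milliseconds of the last rollout change
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Returns the names of flags that have been fully rolled out for at least minDaysAtFull days. */
function cleanupCandidates(
  flags: FlagSnapshot[],
  nowMs: number,
  minDaysAtFull: number = 7
): string[] {
  return flags
    .filter((flag) => flag.rolloutPercent === 100)
    .filter((flag) => nowMs - flag.atCurrentPercentSinceMs >= minDaysAtFull * DAY_MS)
    .map((flag) => flag.name);
}
```

A list like this is only a starting point. Removing the flag check and the dead code path is still a normal code change that goes through review.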

Two boundaries did their jobs. Codex’s sandbox and approval prompt governed the agent while it wrote and changed files. The Unleash flag governed the integration after it shipped. Neither one alone would have given the team both safe authoring and safe release.

The case that makes it obvious: code you did not watch

Codex can also work asynchronously. You hand it a task, it runs in its own isolated cloud environment, and it comes back with a pull request while you do something else. Several of these can run at once.

When you delegate a task and review the resulting pull request, you did not watch the code get written line by line. You are reviewing an outcome, not a process. That is fine, as long as the outcome can ship behind a switch. A flag turns an unsupervised pull request into a contained one. It merges disabled, and you control the rollout exactly as you would for code a teammate wrote. The more work you delegate, the more the runtime boundary earns its place.

Enforcing both boundaries across the organization

For a single developer, both boundaries are a matter of good habits. For an organization, they should be policy, and both Codex and Unleash let you make them policy.

On the Codex side, administrators can enforce configuration centrally across the CLI, the editor extension, and the cloud. You can require a particular sandbox mode, keep approval prompts on, and define which external tools the agent is allowed to use, so the Unleash server is permitted and unknown ones are not. Individual developers cannot quietly opt out.

On the Unleash side, governance lives in the platform. The agent only ever has the permissions of the access token it was given, so it cannot exceed what you allow. You can require change requests so that turning a flag on in production needs human approval, regardless of what any agent asks for. Flag evaluation happens inside your own applications, so no user data is sent to Unleash to make a decision.
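The way token permissions and change requests combine can be modeled in a few lines. This is a simplified mental model, not Unleash's access-control implementation: `TokenScope`, `ReleasePolicy`, and `requestToggle` are hypothetical names, and the three environments are assumed for the example.

```typescript
// Illustrative sketch only. A simplified model, not Unleash's permission system.

type DeployEnvironment = "development" | "staging" | "production";

interface TokenScope {
  canToggle: DeployEnvironment[]; // environments the agent's token may change
}

interface ReleasePolicy {
  changeRequestRequired: DeployEnvironment[]; // environments that need human approval
}

type ToggleOutcome = "applied" | "pending-approval" | "denied";

function requestToggle(
  env: DeployEnvironment,
  token: TokenScope,
  policy: ReleasePolicy
): ToggleOutcome {
  // The token is the ceiling: the agent cannot act outside its scope.
  if (token.canToggle.indexOf(env) === -1) {
    return "denied";
  }
  // Even with permission, a protected environment waits for a human.
  if (policy.changeRequestRequired.indexOf(env) !== -1) {
    return "pending-approval";
  }
  return "applied";
}
```

The ordering is the point of the sketch: the token is checked first, and the change request applies on top of it, so no prompt to the agent can skip either step.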

These two layers line up neatly. Codex governs how the agent runs. Unleash governs how the release ships. One constrains the author, the other constrains the rollout, and you can enforce both from the top down rather than hoping every developer remembers.

The full release picture

Creating a flag is the start. Unleash provides the rest of the release lifecycle. Rollout strategies let you target specific users, regions, or percentages, so you can move from internal users to a small cohort to everyone on a schedule you control. Impact Metrics pull production signals like error rates and latency straight from your application and tie them to the flag, so you can see how a feature is actually behaving and pause a rollout automatically when something drifts.
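The "pause a rollout automatically when something drifts" idea comes down to comparing live signals against guardrails before each step. The sketch below shows that decision in isolation. It does not use the Impact Metrics API: `MetricsSnapshot`, `Guardrail`, `ROLLOUT_STAGES`, and `nextStep` are hypothetical, and the stage percentages are assumed values.

```typescript
// Illustrative sketch only. Not the Impact Metrics API; names and stages are assumptions.

interface MetricsSnapshot {
  errorRate: number;    // fraction of failed requests, 0 to 1
  p95LatencyMs: number; // 95th percentile latency in milliseconds
}

interface Guardrail {
  maxErrorRate: number;
  maxP95LatencyMs: number;
}

type RolloutDecision =
  | { action: "pause"; reason: string }
  | { action: "advance"; nextPercent: number }
  | { action: "done" };

// Hypothetical schedule: small cohort first, then wider, then everyone.
const ROLLOUT_STAGES: number[] = [5, 25, 50, 100];

function nextStep(
  currentPercent: number,
  metrics: MetricsSnapshot,
  guardrail: Guardrail
): RolloutDecision {
  if (metrics.errorRate > guardrail.maxErrorRate) {
    return { action: "pause", reason: "error rate above guardrail" };
  }
  if (metrics.p95LatencyMs > guardrail.maxP95LatencyMs) {
    return { action: "pause", reason: "p95 latency above guardrail" };
  }
  // Healthy: move to the first stage larger than the current one.
  for (const stage of ROLLOUT_STAGES) {
    if (stage > currentPercent) {
      return { action: "advance", nextPercent: stage };
    }
  }
  return { action: "done" };
}
```

Health checks run before any advance, so a drifting metric always wins over the schedule.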

Getting started

The integration is open source on both sides. Codex’s command-line tool is open source, and Unleash’s open-source core gives you feature flag infrastructure at no cost. Connecting the Unleash MCP server to Codex takes a few lines of configuration and a personal access token, and the same setup works in the Codex CLI and the editor extension.

The Unleash MCP server also works with Claude Code, Cursor, GitHub Copilot, Kiro, and OpenCode, so the runtime boundary stays consistent no matter which assistant your developers prefer.

Two boundaries, one safe path to production

AI agents have made writing code dramatically faster. They have not, on their own, made shipping it safer. Codex closes part of that gap with a sandbox and an approval prompt that keep a human in control while the agent works. Feature flags close the rest, by giving you a switch on every change after it ships.

Sandbox the author. Flag the release. Codex governs the agent, Unleash governs the rollout, and you can enforce both across the whole organization. The agent writes the code, a human approves the actions, and the release goes out behind a flag you control. That is what safe by default looks like when the author is a machine.
