Everything you can’t do with environment variables (and what you actually should)

Wojtek Gawroński

Developer Advocate

June 23, 2026

Let me be clear upfront: environment variables are great and they have their place. This is not a hit piece.

The Twelve-Factor App got this right in 2011. Environment variables are the universal interface for process-level configuration. Every language reads them. Every container runtime injects them. Every CI/CD pipeline sets them. Every orchestrator manages them. Every operating system has them. They’re the closest thing we have to a truly cross-platform, cross-language, cross-framework configuration primitive.

You have used environment variables for so many things:

Database connection strings: Different per environment, sensitive, known at deploy time.
API keys and secrets: Hopefully, through a secrets manager, and not hardcoded in a .env file committed to source control.
Service URLs and ports: Where is the auth service? What port does the cache listen on?
Environment detection: Think NODE_ENV=production or similar. Simple. Effective. Universal.
Resource allocation hints: Worker count, pool size, memory limits, execution environment-specific settings (e.g., JVM settings) – values that are set once per deployment and don’t change during the process lifetime.

These are process-lifetime, environment-specific, context-free values. They’re Layer 2 decisions (set at deploy time, same for every request) and some are even Layer 1 (baked into the build). Environment variables handle them perfectly.
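Here is a minimal sketch of that pattern in Python: read everything once at startup into an immutable object and never look at the environment again. The variable names (DATABASE_URL, WORKER_COUNT, APP_ENV) and the Settings class are my own illustrative assumptions, not taken from any particular framework.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Process-lifetime configuration, resolved once at startup."""
    database_url: str
    worker_count: int
    app_env: str


def load_settings(env=os.environ) -> Settings:
    # Hypothetical variable names; a missing DATABASE_URL fails fast with KeyError.
    return Settings(
        database_url=env["DATABASE_URL"],
        worker_count=int(env.get("WORKER_COUNT", "4")),
        app_env=env.get("APP_ENV", "development"),
    )
```

Freezing the object is a deliberate choice in this sketch: it makes the "set once, same for every request" contract explicit in the code.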

The problem starts when we ask env vars to do something they were never designed for. Some of you may have already spotted (or experienced) that in the examples listed above.

Where environment variables break down

No per-request context

This is the fundamental limitation. An environment variable is the same value for every request hitting that process. It can’t know who the user is, which tenant they belong to, what region they’re in, or what plan they’re on.


# This is the same for every request, every user, forever
FEATURE_NEW_CHECKOUT=true

Want to enable the new checkout for 5% of users? Want to roll it out to enterprise customers first? Or show it to internal testers but not production users? You can’t. The variable is true or false, and it’s the same true or false for every request.
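To make the limitation concrete, here is a rough Python sketch. The RequestContext type and both function names are hypothetical. The first function is all an environment variable can express. The second shows the kind of input a per-request decision would need, which the environment cannot supply.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical request attributes, for illustration only.
    user_id: str
    plan: str
    internal: bool = False


def new_checkout_from_env(env=os.environ) -> bool:
    # No context parameter: the answer cannot vary per request.
    return env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true"


def new_checkout_for(ctx: RequestContext) -> bool:
    # A per-request decision: internal testers and enterprise customers first.
    return ctx.internal or ctx.plan == "enterprise"
```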

The workaround teams build: a proxy layer that reads a database or config service, resolves targeting rules, and… congratulations, you’ve built a feature flag system.

Changes require restart

Modifying an environment variable on a running process requires at minimum a restart, and with container orchestrators like Kubernetes, typically a rolling update or pod recreation. That means:


# Changing this value in production:
# 1. Update the ConfigMap
# 2. Trigger a rolling restart (or wait for a controller to do it; running pods never re-read env vars)
# 3. Wait for new pods to be ready
# 4. Old pods drain connections
# Total: minutes, at best
env:
  - name: ENABLE_CACHE_V2
    valueFrom:
      configMapKeyRef:
        name: feature-config
        key: ENABLE_CACHE_V2

During an incident, “minutes, at best” is an eternity. If the value you need to change is a kill switch for a failing dependency, and the fastest you can change it is a rolling restart across your fleet – that’s not a kill switch. That’s a slow, polite request for the problem to eventually go away.

Reversibility is a first-class property of runtime control. An environment variable that requires a rolling restart to change doesn’t provide it.

Compare that to runtime flag evaluation: the SDK polls or receives a push update, the flag changes within milliseconds to seconds, and the next request sees the new value. No restart, no rolling update, no draining required.
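A minimal sketch of the idea, assuming a hypothetical in-process FlagStore that some background poller or push handler keeps fresh. This is not any real SDK's API, and the `fetch` callable stands in for whatever talks to the flag service.

```python
import threading
from typing import Callable, Dict


class FlagStore:
    """In-memory flag cache that a background poller or push handler updates."""

    def __init__(self, initial: Dict[str, bool]):
        self._lock = threading.Lock()
        self._flags = dict(initial)

    def refresh(self, fetch: Callable[[], Dict[str, bool]]) -> None:
        # `fetch` is a stand-in for a call to a flag service (hypothetical).
        fresh = fetch()
        with self._lock:
            self._flags = dict(fresh)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        with self._lock:
            return self._flags.get(name, default)


store = FlagStore({"ENABLE_CACHE_V2": True})
before = store.is_enabled("ENABLE_CACHE_V2")
store.refresh(lambda: {"ENABLE_CACHE_V2": False})  # operator flips the kill switch
after = store.is_enabled("ENABLE_CACHE_V2")
```

The process never restarts here: the request that ran before the refresh saw the old value, and the one after it sees the new one. Real SDKs add polling intervals, streaming, and fallback behavior on top of this.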

No coordination across processes

Each process reads its own copy of the environment at startup. There’s no mechanism for coordinating a change across all instances simultaneously. If you update a ConfigMap, pods pick up the change at different times depending on restart order, readiness probe timing, and rolling update strategy.

Again, for something like a kill switch – where you need every instance to stop calling a failing dependency now – this uncoordinated rollout means you have stragglers making bad calls for the duration of the update window.
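A back-of-the-envelope model of that window, in Python. It assumes a deliberately simplified rolling update (fixed batch size, fixed time per batch), and the fleet numbers are made up for illustration.

```python
import math
from typing import List, Tuple


def stale_pods_over_time(
    total_pods: int, batch_size: int, seconds_per_batch: int
) -> List[Tuple[int, int]]:
    """Return (elapsed_seconds, pods_still_on_old_value) after each batch."""
    batches = math.ceil(total_pods / batch_size)
    timeline = []
    for done in range(1, batches + 1):
        remaining = max(total_pods - done * batch_size, 0)
        timeline.append((done * seconds_per_batch, remaining))
    return timeline


# Hypothetical fleet: 30 pods, 5 replaced per batch, 45 seconds per batch.
timeline = stale_pods_over_time(30, 5, 45)
```

In this made-up fleet, the last pod stops using the old value only after the sixth batch. Until then, every entry with a non-zero second element is a set of stragglers still calling the failing dependency.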

No audit trail

Who changed ENABLE_DARK_MODE from false to true in production at 3 AM? If it’s in a ConfigMap managed by GitOps, you might have a commit hash. If someone ran kubectl edit configmap during an incident, you might have nothing. The bottom line: it depends on your tooling.

Additionally, environment variables have no native concept of change tracking, approval workflows, or audit logs. They were not designed to do that in the first place. The value is just there. It was changed. By someone. Probably.

You can argue that GitOps is the answer, but I would counterargue that it only covers half the story. A commit hash tells you what was deployed and when. It doesn’t tell you what was activated, for whom, or why. That gap – between deployment audit and runtime audit – is where incidents go untraced and compliance auditors start asking uncomfortable questions.
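For contrast, here is a sketch of what a runtime audit record could carry. The FlagChange and AuditLog names, the fields, and the sample values are all hypothetical, and a real system would persist entries rather than hold them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class FlagChange:
    flag: str
    old_value: bool
    new_value: bool
    changed_by: str
    reason: str
    changed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditLog:
    """Append-only list of flag changes (in memory here, for illustration)."""

    def __init__(self) -> None:
        self._entries: List[FlagChange] = []

    def record(self, change: FlagChange) -> None:
        self._entries.append(change)

    def history(self, flag: str) -> List[FlagChange]:
        return [e for e in self._entries if e.flag == flag]


log = AuditLog()
log.record(
    FlagChange("ENABLE_DARK_MODE", False, True, "alice@example.com", "incident mitigation")
)
```

The point of the sketch is the shape of the record: who, what, when, and why travel together, which a bare environment variable has no place to store.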

No gradual rollout

It’s all or nothing. When you flip FEATURE_NEW_PRICING=true, every user on every restarted instance of your application sees it at once. There’s no feedback loop of the form: enable for 1%, watch the metrics, ramp to 10%, then 50%, then 100%. The only way to approximate gradual rollout with environment variables is to deploy different values to different sets of instances – which means maintaining parallel deployments, separate routing, and a coordination overhead that grows linearly with the number of variants you want to test.
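One common way to get that feedback loop is stable hash bucketing. The Python sketch below is an illustration of the general technique under my own assumptions (SHA-256, 100 buckets, the function names), not a description of how any specific flag system implements it.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag: str, user_id: str, percentage: int) -> bool:
    # A user is included when their bucket falls below the rollout percentage,
    # so raising the percentage only ever adds users, never removes them.
    return bucket(flag, user_id) < percentage
```

Because the bucket depends only on the flag name and the user ID, the same user gets the same answer on every instance and every request, and ramping from 1% to 10% keeps everyone who was already included.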

No targeting

You can’t say “enable this for tenants in the EU” or “show this to users on the enterprise plan” with an environment variable. At the process level, the variable doesn’t know about tenants, plans, or regions. Those are contextual concepts, bound at the request level.
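Targeting, by contrast, treats rules as data and evaluates them against the request context. A small sketch, where the Rule type, the attribute names, and the OR semantics are all assumptions I picked for illustration:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Rule:
    # The rule matches when the context attribute is one of the allowed values.
    attribute: str
    allowed: FrozenSet[str]


def matches(rules: Tuple[Rule, ...], context: Dict[str, str]) -> bool:
    """True if any rule matches the request context (rules are OR-ed)."""
    return any(context.get(r.attribute) in r.allowed for r in rules)


rules = (
    Rule("region", frozenset({"eu"})),
    Rule("plan", frozenset({"enterprise"})),
)
```

The rules can change at runtime without touching the process, while the context arrives with each request. Neither half fits in a value that is fixed at startup.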

The Layer 1.5 trap

If you look at this list of gaps, you’ll notice they all point in the same direction: environment variables are a deploy-time mechanism being asked to make request-time decisions.

They sit between Layer 1 (baked into the artifact) and Layer 2 (configured at deployment). They’re more flexible than a hardcoded constant, but they’re still fundamentally static from the perspective of a running process. I would call this Layer 1.5, and it’s the exact spot where most teams get stuck.

The progression usually looks like this:

1. Start with environment variables. Simple, works, no dependencies.
2. Need to change something without restarting. Build a config service that reads from a database. Application polls it.
3. Need to know who the user is. Add context passing. Application resolves the value based on user attributes.
4. Need an audit trail. Add logging around config changes.
5. Need gradual rollout. Add percentage-based logic.
6. Need targeting. Add rule evaluation.

At step 2, you stopped using environment variables and started building a feature flag system. You might not realize it yet, because it still feels like configuration.

This is how most homegrown feature flag systems are born. A database table replaces the .env file. A REST endpoint replaces the config reload. An if-statement reads the value. It’s usually Layer 1.5 dressed up as Layer 3. It works, until you need the things that actually make Layer 3 valuable: per-request context, consistent targeting, lifecycle management, edge evaluation, and governance.

At that point you’ve built a distributed system with caching, consistency requirements, and a multi-language client to call that service, and maintaining it competes with feature delivery for the same engineers’ time.

I think the problems and pains are visceral enough, so I want to pause here. We’ll explore this homegrown maturity trap in depth later in the series.

The Boundary

Here’s the rule I like to use:

Environment variables are for process-lifetime,
environment-specific, context-free values.

If the value:

Is the same for every request → environment variable is fine
Changes less often than you deploy → environment variable is fine
Doesn’t need to know about request-level attributes → environment variable is fine
Has no operational urgency for change (i.e., it isn’t a kill switch) → environment variable is fine

If the value:

Needs to change without a deployment → you need runtime control
Needs to differ contextually per-request → you need runtime control
Steers gradual rollout or any other flexible exposure → you need runtime control
Needs an audit trail of who changed what and when → you need runtime control
Is a kill switch for an incident response → you need runtime control

The second list is Layer 3 territory. That’s where runtime control belongs, with mechanisms such as feature flags, contextual evaluation, and externalized decision-making.
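If it helps, the second list can be written down as a tiny checklist function. This is only a restatement of the rule above in Python; the ValueProfile type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueProfile:
    changes_without_deploy: bool = False
    varies_per_request: bool = False
    steers_gradual_rollout: bool = False
    needs_audit_trail: bool = False
    is_kill_switch: bool = False


def needs_runtime_control(p: ValueProfile) -> bool:
    # Any single "yes" moves the value out of environment-variable territory.
    return any(
        (
            p.changes_without_deploy,
            p.varies_per_request,
            p.steers_gradual_rollout,
            p.needs_audit_trail,
            p.is_kill_switch,
        )
    )
```

A database connection string answers "no" to all five and stays an environment variable. A kill switch answers "yes" to at least one and does not.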

What this means practically

I’m not suggesting you rip out your environment variables. That would be absurd. They do exactly what they’re supposed to do, and they do it well.

What I’m suggesting is that the next time you reach for an environment variable to control something that’s really a runtime decision – a rollout, an experiment, a kill switch, a per-tenant behavior – pause and ask: am I using a Layer 1.5 tool for a Layer 3 problem?

Because the cost of that mismatch isn’t the cost of the environment variable itself. It’s the restart during the incident; the “all users at once” rollout that you can’t roll back without another deployment; the config change at 3 AM with no audit trail; and the workaround code you build around a mechanism that was never designed for what you’re asking it to do.

Environment variables got you here. They’re solid. They’re staying.

But like any other tool, they are not a silver bullet. And pretending they are is how teams end up with a dirt road where a paved path for Layer 3 should be.

Environment variables aren’t the only Layer 1-2 mechanism being stretched beyond its design. Next, we look at what happens when static configuration baked into your artifacts hits the same ceiling, and why your CI/CD pipeline’s speed, like the speed of light, is an upper bound: in this case, on your reaction time.

This is the 2nd post of “The Runtime Control Layer” – a series on FeatureOps for infrastructure engineers.

Previously: Build, Deploy, or Request: Where your configuration decisions actually belong.
Next: Static Config in Your Artifacts: The Comfort and the Cost.
