Tired of Cleaning Up Stale Feature Flags? Let AI Do the Work!

June 26, 2025

Article by Melinda Fekete

Let’s be honest: cleaning up unused feature flags isn’t exactly an exciting task. Unless you’re one of those magical humans who love thanking lines of code that no longer spark joy and then letting them go, dealing with stale flags is probably not your favorite chore.

This isn’t just our observation; it’s a widely felt pain point in the developer community: “Incumbent tools warn, but cleanup is still manual diff-hunting in code (zombie flags, anyone?)”.

Having a good process for addressing technical debt is a non-negotiable part of managing code complexity. So during a recent hackathon, we decided to explore this topic. We asked ourselves: could we get AI to do the tedious task of cleaning up feature flags for us?

Here’s what we discovered when we put AI-powered coding tools to the test.

Why using AI for flag cleanup makes so much sense

Handing this task over to an AI is a pretty clear win. Here’s why:

It’s a low-creativity, limited-scope task. This is a repetitive job that humans often dread but AI is especially good at. Coding assistants can quickly navigate and understand a codebase, catch nuances and edge cases, and do so across technologies and programming languages.
It’s non-negotiable for managing codebase complexity. So it’s in our best interest to find and experiment with new tools that take the headache out of the equation.
It helps you stay focused. Instead of stopping what you’re doing to hunt down every use of an old flag, you can let a tool run in the background. Imagine just getting a PR that’s ready for you to review.

Unleash already gives you helpful reminders to clean up your flags, but we wanted to see if we could cut out the manual work completely.

Dissecting flag removal

Before we could automate anything, we needed a solid process. We started by dissecting exactly what “removing a flag” means. Our initial goal was to create the perfect AI prompt that encapsulated our deep understanding of feature flags and their removal process, thereby creating an AI expert for flag cleanup.

So, let’s break down what we actually do when we remove a flag:

First, you make a decision. You typically go with one of the following options:

Keep the feature: Make the “on” path of the code permanent.
Discard the feature: Remove the feature and stick with the “off” or fallback behavior.
Keep one version: If you were testing variants, pick one and remove the code paths for the others.
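To make the first two options concrete, here is a minimal Python sketch. The `is_enabled` helper and the greeting functions are hypothetical stand-ins invented for illustration; they are not Unleash SDK calls.

```python
# Hypothetical stand-in for a feature flag client; not a real SDK call.
ENABLED_FLAGS = {"ai-cleanup-experiment"}


def is_enabled(flag_name: str) -> bool:
    return flag_name in ENABLED_FLAGS


# Before cleanup: the flag guards two code paths.
def greeting_before(user: str) -> str:
    if is_enabled("ai-cleanup-experiment"):
        return f"Welcome back, {user}!"  # "on" path (the new feature)
    return f"Hello, {user}."  # "off" path (the fallback)


# After cleanup, keeping the feature: the "on" path becomes permanent.
def greeting_keep(user: str) -> str:
    return f"Welcome back, {user}!"


# After cleanup, discarding the feature: only the fallback remains.
def greeting_discard(user: str) -> str:
    return f"Hello, {user}."
```

In both cleaned-up versions the conditional and the flag lookup are gone, which is the simplification the cleanup is after. Picking one variant works the same way, with more than two branches to collapse.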

Then, you roughly follow these steps:

Find all the places a specific flag is used in the code.
Understand what the code is doing.
Remove the conditional logic and associated code paths that are no longer needed, thereby simplifying the code’s flow.
Clean up anything left over, like unused imports, comments, dead code, or redundant boolean expressions that can now be simplified.
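The first step, finding every usage, can be roughly sketched as a plain text search. This is only an illustration of the manual approach, not how any AI tool works internally; `find_flag_usages` and the in-memory `sources` mapping (file path to file contents) are assumptions made to keep the example self-contained.

```python
import re


def find_flag_usages(sources: dict[str, str], flag_name: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, stripped_line) for every line mentioning the flag."""
    # Match the flag name as a whole token, so "my-flag" does not match "my-flag-v2".
    pattern = re.compile(r"(?<![\w-])" + re.escape(flag_name) + r"(?![\w-])")
    hits = []
    for path, text in sorted(sources.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, line_number, line.strip()))
    return hits
```

A search like this only covers the first step. Understanding the code around each hit and simplifying it is the part that still takes judgment.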

With this checklist, we wrote a very detailed set of instructions, complete with examples of what to do—and, more importantly, what not to do. We thought being super specific was the key.

We were wrong.

Sometimes, less is more

It turns out that some AI coding tools are very good at ignoring long, complicated instructions. We quickly realized that in most cases, less is more.

Many of the AI tools we tried are already pretty good at removing flags without much guidance. In fact, our super-detailed instructions sometimes confused them and led to worse results. The key takeaway? You need to experiment. Iterate, test, and find the sweet spot for your specific tool and environment. And given how quickly things move in this space, treat this as an ongoing journey of discovery, not a one-time setup.

Another fun caveat: these models are non-deterministic, so you can run the exact same prompt twice with the same model and environment and get a perfect result one time and an entirely unjustified 1,000-line diff the next. We saw even more variance between different models and the tools they run in.
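One way to catch the oversized runs early is a rough size check on the generated diff before anyone reviews it. The sketch below is an assumption on our part rather than something taken from a specific tool; the function names and the 200-line threshold are arbitrary placeholders.

```python
def count_changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff, skipping file headers."""
    changed = 0
    for line in unified_diff.splitlines():
        # Skip "--- a/file" and "+++ b/file" headers. Known simplification:
        # a changed line whose own content starts with "--" or "++" is skipped too.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def diff_looks_suspicious(unified_diff: str, max_changed_lines: int = 200) -> bool:
    """Flag diffs that touch more lines than a single flag removal plausibly needs."""
    return count_changed_lines(unified_diff) > max_changed_lines
```

A flagged diff is not necessarily wrong, but it is a cheap signal that the run deserves a retry or a closer look.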

Our tried-and-true tools that get the job done

The AI tooling space is moving at lightning speed, so this list might be outdated by the time you finish this sentence. Nevertheless, here’s a snapshot of the tools we’ve experimented with:

So far, our team’s favorites have been OpenAI Codex, Aider with Gemini 2.5 Pro, and JetBrains Junie. Keep in mind that the final output is shaped by three main factors: the clarity of your prompt, the model’s capabilities, and the tool the model runs in.

The magic of simple prompts

The good news is that you usually don’t need a complicated prompt.

Try prompting your coding assistant with natural language like:

“Remove the ai-cleanup-experiment feature flag. Keep the feature—it should always be enabled now.”

And for features that didn’t make the cut:

“Remove the ai-cleanup-experiment feature flag. Discard it and use the fallback behavior.”

This simple approach works surprisingly well most of the time. The AI will find the flag, clean up the conditional logic, and remove the leftover pieces.
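If you run cleanups regularly, the two prompts above can be generated from a small template. This is a minimal sketch; `build_cleanup_prompt` is a hypothetical helper, and its wording is lightly adapted from the prompts quoted above.

```python
def build_cleanup_prompt(flag_name: str, keep_feature: bool) -> str:
    """Build a short natural-language cleanup prompt for a coding assistant."""
    if keep_feature:
        decision = "Keep the feature. It should always be enabled now."
    else:
        decision = "Discard it and use the fallback behavior."
    return f"Remove the {flag_name} feature flag. {decision}"


print(build_cleanup_prompt("ai-cleanup-experiment", keep_feature=True))
```

Keeping the template this short is deliberate, since longer instructions did not reliably improve results in our experiments.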

Once you know what works, you can begin to automate more of the prompt and context management around it. Some tools, like Cursor, let you create reusable instructions. And with GitHub Copilot, you can even assign an existing GitHub issue to Copilot and have it open a PR for you.

As great as this is, it’s not perfect. Sometimes the coding assistant does a flawless job. Other times… not so much. So for now, use these tools as what they are: assistants. They do the heavy lifting, but you’re still the one who needs to approve the final result.

What we’re thinking about next

This whole experiment made us realize something important. The real value isn’t just in the time you save on the cleanup itself, but in automating the entire process. Automatically created reminders and tasks are just as important: they ensure that cleanup doesn’t get swept under the rug or, worse, forgotten altogether. We want to leave virtually no excuse for not cleaning up flags.

So what if Unleash could handle the flag removal for you? Imagine marking a feature flag as completed in Unleash. That single click would automatically create a cleanup issue in GitHub, assign it to a coding agent, and have the PR waiting for you.

We’re actively exploring how to make AI-powered flag cleanup a reliable, integrated part of Unleash. If you’ve been experimenting with AI workflows, or if the idea of automated tech debt removal excites you as much as it excites us, we’d love to hear from you!
