Feature flags could transform how you test software


Feature flagging tools can easily be integrated into a testing environment. Along the way, these tools have a tendency to simplify infrastructure.

Why test?

However you slice it, testing is critical to software development. The quality of software running in production is a key income driver for mature organizations. 

They cannot treat failure as unexpected. Understandably, preventing failure becomes a big focus. That means testing early, testing often.

Testing is a great way to make sure that the quality of a product remains consistent and reliable. But, just as software is constantly evolving, so are testing techniques and approaches.

Let’s talk about some of them.

Testing types

There are a number of ways to test software. They generally fall into three categories:

  • Unit Tests: These are for testing individual units or code components. Code coverage is king here. Unit tests should ensure the specific code is working exactly as expected.
  • Integration Tests: These take a more holistic approach to testing a code base. Integration tests focus on how different pieces work together. They can be a bit more abstract, and focus on reducing problem sets.  
  • Functional/System Tests: These tests focus on the end-user/client experience. The intent is to answer the question, “How does the piece of software function in practice?” Often functional and system tests are considered the final authority in testing. 

You might encounter other types of tests such as performance tests or security tests. While they each have their own function, for the purpose of discussion they can be considered part of one of the larger three test categories. 

Testing with feature flags

Traditional testing tends to spread functionality across a number of different branches or commits. The more versions you have out in production, the more branches and commits you need to maintain for testing.

Testing with feature flags is simpler than that. The flags keep functionality within a single code base. They determine whether a piece of functionality is active or not.

This can be true regardless of how many versions are out in the wild. The complexity comes from your own unique runtime control needs. If it makes sense to incorporate different personas for integration or system tests, there’s definitely space for it. 
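To make the single code base idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: FakeFlags is a hypothetical in-memory stand-in for a flag client (not a real SDK), and "new-discount" is a made-up flag name. The point is only that both behaviors live side by side and each test picks one.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFlags:
    """Hypothetical in-memory stand-in for a feature flag client."""
    enabled: set = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


def checkout_total(prices: list, flags: FakeFlags) -> float:
    """Both code paths live in the same code base; the flag picks one."""
    total = sum(prices)
    if flags.is_enabled("new-discount"):  # hypothetical flag name
        total *= 0.9  # new behavior: 10% discount
    return round(total, 2)


def test_flag_off():
    # No flags enabled: the existing behavior is exercised.
    assert checkout_total([10.0, 5.0], FakeFlags()) == 15.0


def test_flag_on():
    # Same code, same commit; only the flag state differs.
    assert checkout_total([10.0, 5.0], FakeFlags({"new-discount"})) == 13.5
```

No extra branch is needed to test the new behavior. The test simply hands in a different flag state.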

Mirror production

Ideally, your feature management software will include targeting. This includes the ability to target subgroups for use cases like entitlements or canary releases.

You really want to duplicate your production environment as closely as possible. You should try to fill your tests with as many possible contexts as you can think of.

For example, if a feature is only available to beta users, then the test should be run with both activated and deactivated contexts in the form of a “beta on” context and “beta off” context.
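As a rough sketch of that “beta on” and “beta off” pair, the Python below runs the same function under both contexts. The targeting rule, the context fields, and the widget names are all hypothetical; in a real setup the rule would live in your feature management tool rather than in a local function.

```python
def beta_feature_enabled(context: dict) -> bool:
    """Hypothetical targeting rule: the feature is on only for beta users."""
    return context.get("beta") == "on"


def render_dashboard(context: dict) -> list:
    """Return the widgets a user with this context would see."""
    widgets = ["overview"]
    if beta_feature_enabled(context):
        widgets.append("beta-insights")
    return widgets


# One context per side of the targeting rule (field names are assumptions).
BETA_ON = {"userId": "test-user-1", "beta": "on"}
BETA_OFF = {"userId": "test-user-2", "beta": "off"}


def test_beta_on_context():
    assert render_dashboard(BETA_ON) == ["overview", "beta-insights"]


def test_beta_off_context():
    assert render_dashboard(BETA_OFF) == ["overview"]
```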

When doing higher-level testing, be mindful that context should be consistent across different systems.

One potential exception: Some flags are toggleable, but don’t have any associated targeting rules. You might want to create a special set of targeting rules anyway so that you’re able to test both states of a piece of functionality. You should be mindful of the enablement context in case you’re dealing with adjacent systems.

Another exception: You may have flags that use a percentage rollout. Usually this isn’t the best way to test functionality, as you’re relying on chance. In any case, you should just treat the flag configuration the same way you would a simple on/off feature flag.
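One way to read “treat it like an on/off flag” is to pin the rollout to 0% and 100% in tests, so the result never depends on which bucket a test user happens to land in. The sketch below assumes a made-up bucketing function built on SHA-256; it is illustrative only and is not how any particular flag vendor assigns buckets.

```python
import hashlib


def in_rollout(user_id: str, percentage: int) -> bool:
    """Illustrative bucketing only; not any vendor's actual algorithm."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket is 0..99
    return bucket < percentage


def banner(user_id: str, percentage: int) -> str:
    return "new banner" if in_rollout(user_id, percentage) else "old banner"


# Tests pin the rollout to 0% and 100% so the outcome never depends on chance.
def test_rollout_off():
    assert banner("any-user", 0) == "old banner"


def test_rollout_on():
    assert banner("any-user", 100) == "new banner"
```

Anything between 0 and 100 would make the assertion depend on the user ID, which is exactly the reliance on chance you want to avoid.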

Test-specific contexts

Testing scenarios don’t always have to mimic real life, however. Instead of mirroring production with similar targeting rules, you can instead use test-specific contexts.

For example, a “customerPlan” context might seem intuitive for targeting rules with flags such as “show-new-feature.” But in a testing scenario, it might make more sense to have a context like “is-new-feature-version” with values of “disabled,” “v1,” and “v2.”
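Here is a small Python sketch of what a test-specific context could look like in practice. The resolution function and the page titles are hypothetical; only the “is-new-feature-version” field and its three values come from the example above.

```python
def new_feature_version(context: dict) -> str:
    """Hypothetical lookup: read the test-only context field directly."""
    version = context.get("is-new-feature-version", "disabled")
    if version not in {"disabled", "v1", "v2"}:
        raise ValueError(f"unknown version: {version}")
    return version


def page_title(context: dict) -> str:
    """Pick the behavior under test based on the test-specific context."""
    version = new_feature_version(context)
    if version == "v2":
        return "Insights"
    if version == "v1":
        return "Reports (new)"
    return "Reports"


def test_each_version():
    # One explicit case per value; nothing about customer plans is needed.
    assert page_title({"is-new-feature-version": "disabled"}) == "Reports"
    assert page_title({"is-new-feature-version": "v1"}) == "Reports (new)"
    assert page_title({"is-new-feature-version": "v2"}) == "Insights"
```

The test reads as a plain list of versions, which is the clarity this approach trades production similarity for.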

You might think this approach limits how similar a testing environment is to production, or gets in the way of team collaboration, and you’re right. But this does ensure that the setup is clear for those actually conducting the tests.

Obviously there are pros and cons to mirroring and not mirroring environments. Your team may find some sort of hybrid approach works best.

Using Edge to keep performance up

When you add testing flags, you’re essentially introducing another platform into the general flow of your software’s behavior. That extra dependency can work against things like quick flag initialization and fast context changes.

Here’s where a near-side proxy comes into play. For users of Unleash, this solution comes in the form of Unleash Edge.

Unleash Edge can service SDKs with the same capacity as the host service while needing only a fraction of the requirements. This is true whether your Unleash instance is SaaS or self-hosted.

By placing at least one instance of Unleash Edge adjacent to the testing environment, any context change from a front-end client will be nearly as fast as a local check from one of the server-side SDK instances.
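On the test side, using a nearby Edge instance mostly comes down to which URL your SDK is pointed at. The Python sketch below shows one possible way to choose it. The environment variable names TEST_EDGE_URL and UNLEASH_URL, and the URLs in the usage note, are placeholders invented for this example, not settings defined by Unleash.

```python
import os


def resolve_flag_api_url(env=None) -> str:
    """Prefer a nearby Edge instance when one is configured for the test run.

    TEST_EDGE_URL and UNLEASH_URL are hypothetical variable names.
    """
    env = os.environ if env is None else env
    # Fall back to the upstream instance when no Edge URL is set (or it is empty).
    return env.get("TEST_EDGE_URL") or env["UNLEASH_URL"]
```

With both variables set, for instance TEST_EDGE_URL=http://edge.test.example/api and UNLEASH_URL=https://unleash.example/api, the function returns the Edge address, and that value is what you would hand to your SDK as its API URL.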

Your SDK will benefit from quicker initialization times across the board, no matter which SDK you’re using. The latency between the service and your SDK will be brought down a heck of a lot.

Similar to its production counterparts, in a test environment Unleash Edge benefits from a cluster configuration. This means better performance. It also means a huge amount of resilience. 


Testing with feature flags is often seen as a much bigger obstacle than it actually is. 

Of course, adjusting to a new methodology is a challenge in and of itself, and should not be discounted. But if you’re looking to engage in runtime control mechanisms like feature flags for the first time, don’t let that be a blocker. 

The benefits really are huge across the entire development life-cycle: from production, to debugging, to release. 
