Feature flag best practices in a nutshell
Feature flags provide some great opportunities when it comes to decreasing time-to-market while keeping control of possible issues impacting your end-users. In the blog post “What are the feature flag best practices?” we looked into best practices while working with feature flags. Feature flag lifetime is still a challenge. Flags are easy to create, often forgotten about thereafter, right? After all, usually feature flags are to be considered technical debt in your code. Our experience is that feature flags tend to live for too long.
Some feature flag lifetime statistics first
What is the range of lifetimes of a feature flag? This was a question ignited by Kent Beck on twitter, and triggered our curiosity here at Unleash.
Fortunately, Unleash has been in use by many applications over the last few years and we have gathered statistics from 130 applications. Here is some of what we found; On average, an application uses 9 flags, but there is a huge spread. Probably because an application can either be a small micro-service or a huge monolith. We went in and looked at all flags used by these applications, and plotted when they were marked as “archived”, which is an active act of the user when the flag is not needed anymore.
Histogram showing the number of flags that fall inside each bucket. You can see from the chart that most flags only live for 5 days.
From the histogram, we can see that about 50 percent of the flags are archived within the 50 first days. INF is used to collect all flags active for more than 425 days, and 11% falls into this bucket. We suspect these flags are mostly killing switches. The rest of the flags, about 39%, seems to live anywhere between 50 days 425 days. We have not investigated whether these flags have been in active use, or simply been forgotten about.
The second question Beck asked was the change frequency of feature flags. Luckily, we have metrics for that too. What we see from our numbers is that most flags are changed 5 times in their life span (median). Below is a histogram of the change frequency of the inspected feature flags.
Histogram showing the change frequency for feature flags. You can see from the chart that most flags are changed 5 times or less.
In the next paragraphs, we will look into the different types of flags. We will also share our thoughts on what they are used for and how long to keep them around.
Long lived vs. Short lived feature flags lifetime
The expected feature flag lifetime typically depends on its purpose for being created. Our experience is that there are three broad categories of feature flags:
- Release flags
- Experiment flags
- Operational flags
These categories also indicate the expected feature flag lifetime.
In this blog post by Peter Hodgson, a fourth category, «Permission feature flags» are also introduced. This category is to enable features to certain sets of users, e.g. Alpha users, premium features to VIP customers, and similar. We in Unleash will argue that this is not a good practice to use feature flags for your up-sell purposes. When you are providing premium features to your premium customers, you need to make sure you can easily track this in your CRM system. Obviously, you could integrate your favorite feature flag management system with your back-office systems. Then again, we believe you should aim for best-of-breed systems to operate your business.
The release flags are created in order to verify that the newly created feature works as expected in the live production environment. The business reason to utilize release flags is to decrease time spent on some of the testing and validation of the developed feature upfront release to production. By doing this, the risk of issues increases. To balance this, release flags are introduced. A best practice is to use a Gradual roll-out pattern.
Gradual rollout pattern
The gradual roll-out pattern is to enable the newly deployed feature for a small number of users. Often this pattern is referred to as Canary releases. At Unleash we usually refer to this pattern as the Gradual roll-out pattern, as we think this is more descriptive. First, the newly developed feature is enabled only for the developer, or the developer team. This is easily achieved by using the userWithId activation strategy in Unleash. This allows the development team to verify the deployed feature without the risk of impacting any end users.
Second, the team might enable the feature for some or all employees of the company. This will increase the number of users facing the new feature while limiting the impact of any wrongdoing. By adding a strategy type called ‘TeamID’ and adding team members to this activation strategy, the team will see the new feature. The last step is to enable the new feature for a small percentage of the end-users. Depending on the traffic, this can be anywhere between 1% – 25% of the users. The purpose of this step is to validate the release by exposing it only to a small number of end-users. If any issues are discovered, the team has an easy way back. This step is done by a gradual roll-out strategy in Unleash.
The product owner can decide to introduce the new feature to the Alpha/Beta or users in the Early adopter program. From a product management point of view, this is a great way to engage with your most valued customers, providing them with new features before it is generally available.
General availability pattern
The purpose of the general availability pattern is to decouple code deployment to production from general availability to customers. The main reason for using this pattern is to decrease the use of feature branches. In what is a feature flag service blog post we have walked through why using feature branches are bad for your time-to-market.
The general availability pattern wraps the new features not yet available to the end-users using a feature flag. This allows the development teams to continue to put the new, and sometimes yet unfinished features, into production while the product owner is in control of when it will be released to the end-users. The business reasons for using the general availability pattern can be many. It might be connected to contractual reasons. Some customers have different requirements on how often they are willing to take new updates in the code.
Planned marketing campaigns are other reasons why the general availability pattern is used. In this situation, the marketing campaign of the new features requires the feature to be generally available at a specific date and time to get the maximum ROI of the campaign. Using the general availability pattern allows the product manager to enable the new functionality for the public at the right time for the marketing campaign. One example is if the new features are introduced to the public live on stage at a conference.
Release flags and expected lifetime
It’s easy to understand that release feature flags usually have a short expected lifetime. Usually, these feature flags live for days or weeks. Sometimes they might live for some months, even though there should be strong reasons for them to live this long. When the new feature is out and generally available, the team should plan for cleaning up the feature flags as part of the technical debt management. This step is usually what we find most teams tend to skip, due to the need or desire to quickly move on to the next part to be developed.
The purpose of experiment flags is to allow rapid learning and introduce more fact-based decision-making into the product development process. The feature flags are in this pattern enabling different variants of the feature to different segments of the customer base. By applying metrics to the experiments, the product owner or product manager is able to know what variant gives the best business outcome.
The best-known pattern is the A/B/n testing. In this pattern, at least two variants (A and B) of the same feature are developed and put into production in parallel. The next step is now to decide how much of the traffic should be forwarded to each of the variants. Typically this would be a 50/50 split, although different configurations might be used, depending on the context of the experiment. To achieve this, the variant support in Unleash is being used in combination with a suitable activation strategy. In some situations, it might be needed to run more than two variants in parallel parts of the experiment. This is easily achieved by adding more variants to the feature flags.
Expected feature flag lifetime
The expected lifetime of the experiment flags obviously depends on the type of experiment and the amount of traffic available. To get a statistically valid result of the experiment, the dataset (e.g. the user segments and the metrics in the experiments) are of most importance. We will not dive into these details in this blog post. From our experience, the experiment flags usually live for weeks.
Sometimes they live for months, but we do advise against them to live for multiple months. The reason for this is that usually, you should have a clear direction after running the experiment in a month or two. Also, it is our experience that it is quite complex and time-consuming running multiple experiments in parallel over a long period of time. Then it is better to decrease complexity, move onto the next experiment. If the decision proved to be wrong – pick it up again and improve.
The purpose of operational flags is to introduce a quick safety mechanism to increase control in situations where not all parts of the system might be under the control of the DevOps team. The best known operational flag pattern is «Kill switches».
The Kill switch feature flag pattern as feature flags that wrap in less important or flaky parts of your system. The reason for doing this is usually one of two. The first reason is that this allows the DevOps team to disable these less important parts of the system in periods when there is unusually high traffic. The reason to disable part of the system is the fact that in such unusual times, it is better to keep parts of the business up compared to close it all down.
The other scenario where Kill switches are a typical pattern is when the system is dependent on 3rd party integrations. In this scenario, the DevOps team usually wants an easy way to disable this integration while having the business as undisrupted as possible. From our experience, 3rd party integrations often are challenging to work with since parts of the system is outside the control of the development team. Read more about kill switches and best practices.
Expected feature flag lifetime
It is in the operational flag category where we see the largest variance in the expected lifetime of the feature flags. As with the Release category, they are usually introduced as part of a release of a new feature and they should be removed from the code as soon as the DevOps team gets confident that there are no issues. Still, quite often we do see that the DevOps team decides to keep some of the Operational flags for a long period of time, sometimes even permanent. From our experience, this decision makes sense.
Feature flag best practices – Summary
As for most of the topics in product development, the question of «what is the expected lifetime of a feature flag» is «it depends». At Unleash, we strongly believe what is most important, is that the development team has a good understanding of the purpose of the feature flag. Then the team needs to make a conscious decision on how to handle the feature flag as part of the technical debt management. What is clear – feature flags tend to live for too long.
Want to try Unleash?
|START FREE||PLAY WITH DEMO|