Un-breaking breaking changes using feature flags
When we got a feature request that we really wanted to implement, because it would be a great improvement to our product, we found ourselves in a bind: the change would most likely break other customers’ workflows. In this article, I describe how I managed to roll out the change without triggering a flood of angry emails.
Recently, a customer of our managed hosting platform for Drupal and WordPress asked us to tighten the security of their website deployment. We realized that the necessary changes would not only help this customer but also improve website safety for all our customers. We love this kind of universally useful feature request because, for operational efficiency, we try to avoid making changes that benefit only individual customers.
However, it was easy to see that the restrictions we had to implement for this feature request came with a high risk of breaking existing workflows. People were accustomed to the current level of restrictions, and when you’re above a certain number of customers, you can be sure that some of them will have set up processes that rely on this exact status quo.
A breaking change requires copious advance notice. But the customer who had raised the issue was more or less blocked until we shipped a solution. So how do you make a change in your production environment, but limit its effects to exactly those customers whom it will benefit, not disrupt?
The solution to this Catch-22 came to me late at night, right before I fell asleep. (Let me know if you have the same kind of brain.) It’s called “feature flags”.
Feature flags
Feature flags are used to enable new features for a limited number of users only. Common reasons to limit the use of a feature to a smaller circle, at least initially, are that you’d like to prevent bursts in resource usage, or that you want to quickly discover issues that only show up in production without the risk that they’ll impact all your customers at once.
Technically, a feature flag allows you to add code paths alongside or instead of existing ones, and to switch those new code paths on or off based on specific conditions. Because the state of a feature flag is usually based on information outside of your code, you can enable or disable a feature at any time without having to deploy updates.
Usually, the state of a feature flag is derived from other data, for example a database field or an environment variable. If you have an early access program, the feature condition could be customer.eap_member == true.
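A minimal Ruby sketch of both kinds of condition, assuming a hypothetical Customer struct with an eap_member attribute (standing in for a database record) and a hypothetical FEATURE_NEW_DASHBOARD environment variable. Neither name comes from the original setup; the point is only that the flag’s state is read from data at runtime, not hard-coded.

```ruby
# Hypothetical customer record; in a real app this would be loaded from the database.
Customer = Struct.new(:id, :eap_member, keyword_init: true)

# Flag state derived from a database field: on for Early Access Program members.
def eap_feature_enabled?(customer)
  customer.eap_member == true
end

# Flag state derived from a (hypothetical) environment variable. Changing the
# variable, and restarting the process so it sees the new value, flips the flag
# without shipping new code.
def new_dashboard_enabled?
  ENV.fetch("FEATURE_NEW_DASHBOARD", "false") == "true"
end

# The flag selects between the new and the existing code path.
def dashboard_for(customer)
  if eap_feature_enabled?(customer) || new_dashboard_enabled?
    "new dashboard"      # new code path
  else
    "classic dashboard"  # existing code path
  end
end
```

With the environment variable unset, only EAP members get the new path; setting it to "true" opens the new path for everyone.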
In a gradual rollout scenario, you might want to make the new feature available to only a fraction of your customer base for starters. You could pick those customers in a targeted or random way, write their customer IDs to a database table with a feature identifier, and define the feature state like this:
feature_enabled?(customer: current_user.customer_id, feature: 17)
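One way this could look, as a sketch under stated assumptions: an in-memory Set of [customer_id, feature_id] pairs stands in for the hypothetical database table, and enroll_random_fraction is an illustrative helper for the random selection. Only the feature_enabled?(customer:, feature:) signature is taken from the line above.

```ruby
require "set"

# In-memory stand-in for a hypothetical feature/customer table holding
# (customer_id, feature_id) rows; a real setup would query the database.
FEATURE_CUSTOMERS = Set.new

# Randomly pick the given fraction of customers and record them for the feature.
# Passing a seeded Random makes the selection reproducible.
def enroll_random_fraction(customer_ids, feature:, fraction:, random: Random.new)
  count = (customer_ids.size * fraction).round
  customer_ids.sample(count, random: random).each do |id|
    FEATURE_CUSTOMERS << [id, feature]
  end
end

# The flag is on for a customer if a matching row exists.
def feature_enabled?(customer:, feature:)
  FEATURE_CUSTOMERS.include?([customer, feature])
end
```

For a targeted pick, you would add rows directly, e.g. FEATURE_CUSTOMERS << [4711, 17] for a hypothetical customer 4711. Enrolling 10% of 100 customers enables feature 17 for exactly 10 of them and leaves other features untouched.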
It’s easy to build a simple feature flag implementation yourself. In more complex scenarios, I recommend you check out mature solutions like the Ruby gems Rollout and Flipper.
Feature flags for breaking changes
In my security change situation, I realized that I was dealing with just another feature rollout, only with the tiniest of target groups. I basically had to create an “early security improvement program” with the customer who requested the change as its only member. This would enable us to roll out the change immediately for them, and after a generous notice period, for all our customers.
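Such a single-member program, followed by a switch-on for everyone once the notice period ends, could be modeled roughly like this. This is a hypothetical sketch, not our actual implementation: the FeatureFlag struct, the customer ID 4711, and the end date of the notice period are all made up for illustration.

```ruby
require "date"

# Hypothetical flag definition: an explicit list of member customers, plus an
# optional date from which the flag applies to everyone (end of the notice period).
FeatureFlag = Struct.new(:name, :customer_ids, :everyone_from, keyword_init: true) do
  # Active for listed customers right away, and for all customers once the
  # notice period has ended. A nil date means "members only" indefinitely.
  def active_for?(customer_id, today: Date.today)
    return true if customer_ids.include?(customer_id)

    !everyone_from.nil? && today >= everyone_from
  end
end

# The "early security improvement program": a single member.
FULL_ISOLATION = FeatureFlag.new(
  name: :full_isolation,
  customer_ids: [4711],               # hypothetical ID of the requesting customer
  everyone_from: Date.new(2030, 1, 1) # hypothetical end of the notice period
)
```

Keeping the date in data rather than code means the notice period can be extended without a deploy.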
The resulting logic in our infrastructure code turned out to be very simple:
set_up_website_environment
# Stricter permissions only where the :full_isolation flag is switched on
if feature_active?(:full_isolation)
  set_more_restrictive_permissions
else
  set_the_usual_permissions
end
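The snippet above relies on helpers the article does not show. To exercise the branching on its own, here is a runnable sketch with hypothetical stand-ins: the flag store is a plain hash, the customer ID is passed explicitly (the original feature_active? presumably reads it from context), and the permission steps return a description instead of touching the file system.

```ruby
# Hypothetical flag store: which customers have which flags switched on.
ACTIVE_FLAGS = { full_isolation: [4711] }.freeze

def feature_active?(flag, customer_id)
  ACTIVE_FLAGS.fetch(flag, []).include?(customer_id)
end

# Hypothetical stand-ins for the real provisioning steps.
def set_up_website_environment(customer_id)
  { customer_id: customer_id }
end

def set_more_restrictive_permissions(env)
  env.merge(permissions: "restricted")
end

def set_the_usual_permissions(env)
  env.merge(permissions: "usual")
end

# Same branching as the infrastructure snippet: one flag check decides
# which permission set a website environment gets.
def provision_website(customer_id)
  env = set_up_website_environment(customer_id)
  if feature_active?(:full_isolation, customer_id)
    set_more_restrictive_permissions(env)
  else
    set_the_usual_permissions(env)
  end
end
```

provision_website(4711) yields restricted permissions, while any customer not listed under :full_isolation keeps the usual ones.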
At the time of writing, the new code has long been deployed to production. Funnily enough, it’s still completely dormant because our customer has not yet found a good time to activate it. We might end up doing the full rollout process right away. Thanks to the feature flag already being in place, we’ll apply the change first to customers in our Early Access Program, and a bit later to everyone else, including the customer who initiated the whole project.
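A staged rollout like this can be expressed as a flag whose audience widens step by step. The following is a hypothetical sketch: the stage names (:off, :eap, :everyone), the StagedFlag class, and the Customer struct are illustrative assumptions, not part of our platform.

```ruby
# Hypothetical rollout stages, in order; each stage widens the audience.
STAGES = %i[off eap everyone].freeze

Customer = Struct.new(:id, :eap_member, keyword_init: true)

class StagedFlag
  attr_reader :stage

  def initialize(stage = :off)
    self.stage = stage
  end

  # Reject unknown stage names so a typo cannot silently disable the flag.
  def stage=(new_stage)
    raise ArgumentError, "unknown stage: #{new_stage}" unless STAGES.include?(new_stage)

    @stage = new_stage
  end

  def active_for?(customer)
    case stage
    when :off then false
    when :eap then customer.eap_member == true
    when :everyone then true
    end
  end
end
```

The flag starts dormant (:off); moving it to :eap enables the change for Early Access Program members only, and :everyone completes the rollout.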
Summary
Feature flags allow you to have your cake, and eat it, too. You can roll out new code, but limit its activation dynamically in a data-driven manner.
Next time you have to introduce a code change that might not make everyone equally happy, consider using a feature flag to control its blast radius.