A/B testing Archives - SD Times
https://sdtimes.com/tag/ab-testing/

Harness updates platform with Test Intelligence, Feature Flags, and more
https://sdtimes.com/softwaredev/harness-updates-platform-with-test-intelligence-feature-flags-and-more/ (Wed, 16 Jun 2021)

Harness announced that it is leveling up its software delivery pipeline with new test intelligence, feature flags and cloud autostopping capabilities. 

Harness’s new test intelligence feature reduces test cycle time by up to 98% by using AI/ML workflows to prioritize and optimize test execution without compromising quality. The new capabilities shift failed tests earlier into the build cycle so that developers can quickly find out if a fix worked. 
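The article does not spell out how the test selection works, so the following is only a generic sketch of test prioritization, with invented data structures: run the tests that failed recently, or that cover files changed in the current commit, ahead of everything else so developers hear about breakage sooner.

```python
# Generic test-prioritization sketch; not Harness's implementation.
def prioritize(tests, changed_files, recent_failures):
    def score(test):
        touches_change = bool(test["covers"] & changed_files)   # impacted by this commit
        failed_recently = test["name"] in recent_failures        # historically broken/flaky
        return (failed_recently, touches_change)                 # True sorts before False
    return sorted(tests, key=score, reverse=True)

tests = [
    {"name": "test_login",    "covers": {"auth.py"}},
    {"name": "test_checkout", "covers": {"cart.py", "payment.py"}},
    {"name": "test_search",   "covers": {"search.py"}},
]
ordered = prioritize(tests, changed_files={"payment.py"}, recent_failures={"test_search"})
print([t["name"] for t in ordered])   # test_search, then test_checkout, then test_login
```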

The new feature flag capabilities enable developers to release new features without making them visible to users. It also makes it easier to try capabilities such as A/B testing or software functionality variations like one- or two-step checkout. 

Developers currently use multiple toolsets and pipelines for software delivery, which limits their velocity, productivity and deployment frequency: context switching, plus babysitting configuration and upgrades, introduces toil. The new Unified Pipeline enables them to manage all aspects of software delivery from a single tool, the company explained.

Harness also integrated its acquisition of Lightwing technology into its Cloud Cost Management module to enable engineering teams to auto-stop and restart their non-production environments within seconds. 

“Significant costs and many hours are incurred daily as engineering teams continuously build, test and deploy software,” said Jyoti Bansal, the CEO and cofounder of Harness. “The new Harness platform gives developers the only pipeline they’ll need. Customers can now do it all from one platform—so they can ultimately deliver software at scale quickly, reliably and securely.”

Additional details on the expanded capabilities within Harness’ platform are available on the company’s website.

SD Times Open-Source Project of the Week: Lumos
https://sdtimes.com/softwaredev/sd-times-open-source-project-of-the-week-lumos/ (Fri, 10 Jul 2020)

This week’s featured open-source project is Lumos, a Python library built to compare metrics between two datasets, accounting for population differences and invariant features.

Lumos was open sourced this month by Microsoft. In a technical paper that shows the results from a real-world deployment of Lumos in Microsoft RTC applications, the Microsoft team wrote: “Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners.”

“It has enabled engineering teams to detect 100s of real changes in metrics and reject 1000s of false alarms detected by anomaly detectors,” the team added.

According to Microsoft, regressions commonly surface due to genuine product regressions, changes in user population, and bias due to telemetry loss or processing. 

The application of Lumos has resulted in freeing up as much as 95% of the time allocated to metric-based investigations. Now that the project has been open sourced, it can be coupled with any production system to manage the volume of alerting efficiently, Microsoft explained.

The solution uses a new methodology that includes existing, domain-specific anomaly detectors, but was found to reduce the false-positive alert rate by over 90%. It also provides insight into locating the root cause of a Key Performance Indicator (KPI) incident. 

Lumos uses the principles of A/B testing and compares the dataset before the metric anomaly and the dataset during the metric anomaly. The configuration file contains hyper-parameters for running the workflow and details which columns in the dataset correspond to the metric, invariant, and hypothesis columns. 
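Lumos’s actual configuration schema and API live in the project’s repository; the Python sketch below only illustrates the before/during comparison described above, with hypothetical column names and config keys, and a crude reweighting step to separate population-mix shifts from genuine metric regressions.

```python
import pandas as pd

# Hypothetical column names and config keys; this is not Lumos's actual schema.
config = {
    "metric_col": "call_drop_rate",        # KPI under investigation
    "invariant_cols": ["client_version"],  # should be distributed the same in both datasets
    "hypothesis_cols": ["network_type"],   # candidate explanations for a population shift
}

def compare(before: pd.DataFrame, during: pd.DataFrame, cfg: dict) -> dict:
    # 1. Flag invariant columns whose distribution shifted; if they did, the
    #    comparison itself may be suspect (e.g. telemetry loss in one slice).
    shifted = [
        c for c in cfg["invariant_cols"]
        if not before[c].value_counts(normalize=True).sort_index().round(2)
               .equals(during[c].value_counts(normalize=True).sort_index().round(2))
    ]
    # 2. Crudely reweight the "during" metric to the "before" population mix so a
    #    change in user mix is not mistaken for a genuine product regression.
    key = cfg["hypothesis_cols"][0]
    weights = before[key].value_counts(normalize=True)
    reweighted = (during.groupby(key)[cfg["metric_col"]].mean() * weights).sum()
    return {
        "invariant_shifts": shifted,
        "before_mean": before[cfg["metric_col"]].mean(),
        "during_mean_reweighted": reweighted,
    }

before = pd.DataFrame({"call_drop_rate": [0.02, 0.05], "client_version": ["1.2", "1.2"],
                       "network_type": ["wifi", "cell"]})
during = pd.DataFrame({"call_drop_rate": [0.02, 0.09], "client_version": ["1.2", "1.2"],
                       "network_type": ["wifi", "cell"]})
print(compare(before, during, config))
```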

This project has adopted the Microsoft Open Source Code of Conduct and is welcoming contributions and suggestions. 

Feature experimentation: Walk before you run
https://sdtimes.com/softwaredev/feature-experimentation-walk-before-you-run/ (Thu, 09 Jul 2020)

Software innovation doesn’t happen without taking risks along the way. But risks can be scary for businesses afraid of making mistakes. 

There is another way, according to Jon Noronha, senior vice president of product at Optimizely, a progressive delivery and experimentation platform provider. Feature experimentation, he said, allows businesses to go to market quicker while improving product quality and minimizing the fear of failure.  

“I like to think of feature experimentation as a safety net. It’s something that gives people the confidence to do something bold or risky,” he said. “Imagine you are jumping on a trapeze with no net. You’re going to be really scared to take even the smallest step because if you fall, you’re going to really hurt yourself. When there is a net, you know the worst thing that can happen is you land on the net and bounce a little bit.”

RELATED CONTENT:
Waving the flag for feature experimentation
Speed releases with feature flags

Feature experimentation is that net that allows you to leap, but catches you if you fall, Noronha explained. It enables businesses to take small risks, roll changes out to a few users, and measure the impact before releasing them to 100% of the user base.

Christopher Condo, a principal analyst at the research firm Forrester, said, “In order to be innovative, you need to really understand what your customers want and be willing to try new experiences. Using feature experimentation allows businesses to be more Agile, more willing to put out smaller pieces of functionality, test it with users and continue to iterate and grow.” 

However, there are still some steps businesses need to take before they can squeeze out the benefits of feature experimentation. They need to learn to walk before they can run.

Progressive Delivery: Walk
Progressive delivery is the walk that comes before the run (feature experimentation), according to Dave Karow, continuous delivery evangelist at Split, a feature flag, experimentation and CD solution provider. Progressive delivery assumes you have the “crawl” part already in place, which is continuous delivery and continuous integration. For instance, teams need to have a centralized source of information in place where developers can check in code and have it automatically tested for basic sanity with no human intervention, Karow explained.

Without that, you won’t see the true promise of progressive delivery, John Kodumal, CTO and co-founder of LaunchDarkly, a feature flag and toggle management company, added.

“Imagine a developer is going to work on a feature, take a copy of the source code and take a copy of their plan and work on it for some time. When they are done, they have to merge their code back into the source code that is going to go out into production,” Karow explained. “In the meantime, other developers have been making other changes. What happens is literally referred to in the community as ‘merge hell.’ You get to a point where you think you finished your work and you have to merge back in and then you discover all these conflicts. That’s the crawl stuff. It’s about making changes to the software faster and synchronizing with coworkers to find problems in near real-time.” 

Once you have the crawl part situated, the progressive delivery part leverages feature flags (also known as feature toggles, bits or flippers) to get features into production faster without breaking the application. According to Optimizely’s Noronha, feature flags are one layer of the safety net that feature experimentation offers. They let development teams try things at lower risk, rolling functionality out slowly and gradually with the goal of catching bugs or errors before they become widespread. “It’s making it easier to roll things out faster, but be able to stop rollouts without a lot of drama,” Karow said.

Some examples of feature flags

Feature flags come in several different flavors. Among them are: 

  • Release flags that enable trunk-based development. “Release Toggles allow incomplete and un-tested codepaths to be shipped to production as latent code which may never be turned on,” Pete Hodgson, an independent software delivery consultant, wrote in a post on MartinFowler.com.
  • Experiment flags that leverage A/B testing to make data-driven optimizations. “By their nature Experiment Toggles are highly dynamic – each incoming request is likely on behalf of a different user and thus might be routed differently than the last,” Hodgson wrote.
  • Ops flags, which enable teams to control operational aspects of their solution’s behavior. Hodgson explained “We might introduce an Ops Toggle when rolling out a new feature which has unclear performance implications so that system operators can disable or degrade that feature quickly in production if needed.”
  • Permission flags that can change the features or experience for certain users. “For example we may have a set of ‘premium’ features which we only toggle on for our paying customers. Or perhaps we have a set of “alpha” features which are only available to internal users and another set of “beta” features which are only available to internal users plus beta users,” Hodgson wrote.
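As a rough, vendor-neutral illustration of how those flavors show up in code, here is a minimal Python sketch. The flag names, the static dictionary and the checkout example are invented; a real platform evaluates flags per user and per environment through an SDK rather than a module-level dict.

```python
import random

# Hypothetical flag store; names and values are invented for illustration only.
FLAGS = {
    "release.new_checkout": True,                # release flag: latent code on/off
    "experiment.one_step_checkout": 0.10,        # experiment flag: 10% get variant B
    "ops.show_recommendations": True,            # ops flag: kill switch under load
    "permission.premium_reports": {"premium"},   # permission flag: paid plans only
}

def render_checkout(plan: str) -> list[str]:
    if not FLAGS["release.new_checkout"]:
        return ["legacy checkout"]               # feature shipped but not yet released
    parts = []
    if random.random() < FLAGS["experiment.one_step_checkout"]:
        parts.append("one-step checkout")        # experiment variant
    else:
        parts.append("two-step checkout")        # control
    if FLAGS["ops.show_recommendations"]:
        parts.append("recommendations panel")    # operators can switch this off quickly
    if plan in FLAGS["permission.premium_reports"]:
        parts.append("premium reports link")     # entitlement for paying customers
    return parts

print(render_checkout("premium"))
```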

One way to look at it is through the concept of canary releases, according to Kodumal, which is the idea of being able to release some change and controlling the exposure of that change to a smaller audience to validate that change before rolling it out more broadly. 

These flags help minimize the blast radius of possible messy situations, according to Forrester’s Condo. “You’re slowly gauging the success of your application based on: Is it working as planned? Do customers find it useful? Are they complaining? Has the call value gone up or stayed steady? Are the error logs growing?” As developers implement progressive delivery, they will become better at detecting when things are broken, Condo explained.

“The first thing is to get the hygiene right so you can build software more often with less drama. Implement progressive delivery so you can get that all the way to production. Then dip your toes into experimentation by making sure you have that data automated,” said Split’s Karow. 

Feature experimentation: Run
Feature experimentation is similar to progressive delivery, but with better data, according to Karow.

“Feature experimentation takes progressive delivery further by looking at the data and not just learning whether or not something blew up, but why it did,” he said. 

Being able to consume the data and understand why things happen enables businesses to make better data-driven decisions. The whole reason you do smaller releases is to actually confirm they were having the impact you were looking for, that there were no bugs, and that you are meeting users’ expectations, according to Optimizely’s Noronha.

It does that through A/B testing, multi-armed bandits, and chaos experiments, according to LaunchDarkly’s Kodumal. A/B testing compares multiple versions of a feature to see how each is received. A multi-armed bandit is a variation of an A/B test: instead of waiting for the test to complete, it uses algorithms to shift traffic allocation toward the variants that are performing best. And chaos experiments focus on finding out what breaks rather than looking for what works.
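The difference between a fixed-split A/B test and a multi-armed bandit can be sketched in a few lines of Python. The conversion rates and the epsilon value below are invented for illustration and are not drawn from any of the vendors quoted here.

```python
import random

TRUE_CONVERSION = {"A": 0.10, "B": 0.12}   # hypothetical "real" rates, unknown to the tester

def simulate(bandit: bool, epsilon: float = 0.1, visitors: int = 10_000) -> dict:
    shown = {"A": 0, "B": 0}
    converted = {"A": 0, "B": 0}
    for _ in range(visitors):
        if bandit and sum(shown.values()) > 100 and random.random() > epsilon:
            # exploit: send most traffic to the variant with the best observed rate
            arm = max(shown, key=lambda v: converted[v] / max(shown[v], 1))
        else:
            # explore: a fixed A/B test always assigns uniformly at random
            arm = random.choice(["A", "B"])
        shown[arm] += 1
        converted[arm] += random.random() < TRUE_CONVERSION[arm]
    return {"shown": shown, "converted": converted}

print("fixed A/B test    :", simulate(bandit=False))
print("epsilon-greedy MAB:", simulate(bandit=True))
```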

“You might drive a feature experiment that is intended to do something like improve engagement around a specific feature you are building,” said Kodumal. “You define the metric, build the experiment, and validate whether or not the change being made is being received positively.”

The reason why feature experimentation is becoming so popular is because it enables development teams to deploy code without actually turning it on right away. You can deploy it into production, test it in production, without the general user base seeing it, and either release it or keep it hidden until it’s ready, Forrester’s Condo explained. 

In some cases, a business may decide to release the feature or new solution to its users, but give them the ability to turn it on or off themselves and see how many people like the enhanced experience. “Feature experimentation makes that feature a system of record. It becomes part of how you deliver experiences to your customers in a varied experience,” said Condo. “It’s like the idea of Google. How many times on Google or Gmail has it said ‘here is a brand new experience, do you want to use it?’ And you said ‘no I’m not ready.’ It is allowing companies to modernize in smaller pieces rather than all at once.”

What feature experimentation does is focus on the measurement side, while progressive delivery focuses on releasing smaller pieces. “Now you are comparing the 10% release against the other 90% to see what the difference is, measuring that, understanding the impact, quantifying it, and learning what’s actually working,” said Optimizely’s Noronha.
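A minimal sketch of that 10%-versus-90% comparison, assuming a simple conversion metric and a two-proportion z-test; the traffic numbers are invented, and commercial stats engines typically layer sequential testing and multiple-comparison corrections on top of a basic test like this.

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    # z-score for the difference between two conversion rates
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = two_proportion_z(conv_a=4_500, n_a=90_000,   # control (90%): 5.0% conversion
                     conv_b=560,   n_b=10_000)   # treatment (10%): 5.6% conversion
print(f"z = {z:.2f}")   # ~2.6; above ~1.96, so the lift is unlikely to be noise at ~95% confidence
```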

While it does reduce risks for businesses, it doesn’t eliminate the chance for failure. Karow explained businesses have to be willing to accept failure or they are not going to get very far. “At the end of the day, what really matters is whether a feature is going to help a user or make them want to use it or not. What a lot of these techniques are about is how do I get hard data to prove what actually works,” Karow explained. 

To get started, Noronha recommends looking for parts of the user experience that drive traffic and making simple changes to experiment with. Once teams prove it out and get it entrenched in one area, the practice can spread to other areas more easily.

“It’s sort of addictive. Once people get used to working in this way, they don’t want to go back to just launching things. They start to resent not knowing what the adoption of their product is,” he said. 

Noronha expects progressive delivery and feature experimentation will eventually merge. “Everyone’s going to roll out into small pieces, and everyone’s going to measure how those things are doing against the control,” he said. 

What both progressive delivery and feature experimentation do is provide the ability to de-risk your investment in new software and R&D. “They give you the tooling you need to think about decomposing those big risky things into smaller, achievable things where you have faster feedback loops from customers,” LaunchDarkly’s Kodumal added.

Experimenting with A/B testing
A/B testing is one of the most common types of experiments, according to John Kodumal, CTO and co-founder of LaunchDarkly, a feature flag and toggle management company.

It is the method of comparing two versions of an application or functionality. Previously, it was more commonly used for front-end or visual aesthetic changes done to a website rather than a product. For instance, one could take a button that was blue and make it red, and see if that drives more clicks, Jon Noronha, senior vice president of product at Optimizely, a progressive delivery and experimentation platform provider, explained. “In the past several years, we’ve really transitioned to focusing more on what I would call feature experimentation, which is really building technology that helps people test the core logic of how their product is actually built,” he said. 

A/B testing is used in feature experimentation to test out two competing theories and see which one achieves the result the team is looking for. Christopher Condo, a principal analyst at the research firm Forrester, explained that “It requires someone to know and say ‘I think if we alter this experience to the end user, we can improve the value.’ You as a developer want to get a deeper understanding of what kind of changes can improve the UX and so A/B testing comes into play now to show different experiences from different people and how they are being used.”

According to Dave Karow, continuous delivery evangelist at Split, a feature flag, experimentation and CD solution provider, this is especially useful in environments where a “very important person” within the business has an opinion, or the “highest paid person” on the team wants you to do something and a majority of the team members don’t agree. He explained that what someone thinks is going to work usually doesn’t, eight or nine times out of 10. But with A/B testing, developers can still test out that theory, and if it fails they can provide metrics and data on why it didn’t work without having to release it to all their customers.

A good A/B test statistical engine should be able to tell you within a few days which experience or feature is better. Once you know which version is performing better, you can slowly replace it and continue to iterate to see if you can make it work even better, Condo explained. 

Kodumal explained A/B testing works better with feature experimentation because in progressive delivery the customer base you are gradually delivering to is too small to run full experiments on and achieve the statistical significance of a fully rigorous experiment.

“We often find that teams get value out of some of the simpler use cases in progressive delivery before moving onto full experimentation,” he said.  

Feature experimentation is for any company with user-facing technology
Feature experimentation has already been used among industry leaders like eBay, LinkedIn and Netflix for years. 

“Major redesigns…improve your service by allowing members to find the content they want to watch faster. However, they are too risky to roll out without extensive A/B testing, which enables us to prove that the new experience is preferred over the old,” Netflix wrote in a 2016 blog post explaining its experimentation platform. 

Up until recently, it was only available to those large companies because it was expensive. The alternative was to build your own product, with the time and costs associated with that. “Now there is a growing marketplace of solutions that allow anyone to do the same amount of rigor without having to spend years and millions of dollars building it in-house,” said Dave Karow, continuous delivery evangelist at Split, a feature flag, experimentation and CD solution provider.

Additionally, feature experimentation used to be a hard process to get started with, with no real guidelines to follow. What has started to happen is that large companies are sharing how their engineering teams operate and providing more information on what goes on behind the scenes, according to Christopher Condo, a principal analyst at the research firm Forrester. “In the past, you never gave away the recipe or what you were doing. It was always considered intellectual property. But today, sharing information, people realize that it’s really helping the whole industry for everybody to get better education about how these things work,” Condo said.

Today, the practice has expanded into something that every major company with some kind of user-facing technology can and should take advantage of, according to Jon Noronha, senior vice president of product at Optimizely, a progressive delivery and experimentation platform provider.

Noronha predicts feature experimentation “will eventually grow to be adopted the same way we see things like source control and branching. It’s going to go from something that just big technology companies do to something that every business has to have to keep up.”

“Companies that are able to provide that innovation faster and bring that functionality that consumers are demanding, they are the ones that are succeeding, and the ones that aren’t are the ones that are left behind and that consumers are starting to move away from,” John Kodumal, CTO and co-founder of LaunchDarkly, a feature flag and toggle management company, added.

SD Times Open-Source Project of the Week: spark-inequality-impact
https://sdtimes.com/open-source/sd-times-open-source-project-of-the-week-spark-inequality-impact/ (Fri, 29 May 2020)

LinkedIn is sharing its “Project Every Member” initiative with the open sourcing of spark-inequality-impact, an Apache Spark library that can be used by other organizations in any domain where measuring and reducing inequality, or avoiding unintended inequality consequences may be desirable.  

“This work is furthering our commitment to closing the network gap and making sure everyone has a fair shot at finding and accessing opportunities, regardless of their background or connections,” LinkedIn wrote in a blog post.

LinkedIn announced last month that it would be building inclusive products through A/B testing in the initiative called Project Every Member. 

LinkedIn stated that any change on its platform is subjected to a series of testing and analysis processes to ensure that it achieves intended product goals and business objectives through A/B testing. The best way to go about it is to start by giving a preview of the change or feature to a few members for a limited time, and then measure the results. 

The Atkinson index is then used to determine which end of the distribution contributed most to the observed inequality. It also allows developers to encode other information about the population being measured into the analysis, helping to overcome shortcomings of A/B testing alone.

LinkedIn decided to implement Atkinson index computations using Apache Spark due to scalability considerations with respect to the size of the data over which to compute inequality, for example, the number of individuals who are part of specific A/B tests and the number of times inequality needs to be computed. 

While inequality metrics can already be computed on R and Python, they typically require users to fit all the data in memory within a single machine. 

“We are releasing a package that leverages the fact that the Atkinson index can be decomposed as a sum, which means the data does not need to be held in memory all at once. We then use it as part of a larger pipeline that applies it to many A/B tests at once,” LinkedIn wrote.
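That “decomposed as a sum” property can be shown in plain Python (the actual spark-inequality-impact library exposes a Spark API, and its function names are not reproduced here): for an aversion parameter epsilon between 0 and 1, the Atkinson index needs only the count, the sum, and the sum of y^(1 - epsilon), so each partition can be reduced to three numbers and the partials merged.

```python
import math
from functools import reduce

EPS = 0.5  # inequality-aversion parameter (0 < EPS < 1 in this sketch)

def partial(values):                 # per-partition aggregation: three plain sums
    return (len(values), sum(values), sum(v ** (1 - EPS) for v in values))

def merge(a, b):                     # combine partials from two partitions
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def atkinson(n, total, power_sum):
    mean = total / n
    ede = (power_sum / n) ** (1 / (1 - EPS))   # "equally distributed equivalent" value
    return 1 - ede / mean                      # 0 = perfect equality; values near 1 = high inequality

partitions = [[10, 20, 30], [40, 50], [5, 500]]   # toy data split across "workers"
n, total, power_sum = reduce(merge, map(partial, partitions))
print(f"Atkinson index (eps={EPS}): {atkinson(n, total, power_sum):.3f}")
```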

The code is available on GitHub.

Process drives better feature flag management
https://sdtimes.com/softwaredev/process-drives-better-feature-flag-management/ (Tue, 03 Dec 2019)

Feature experimentation platforms make feature flagging easier to do and easier to manage, but even the greatest tools don’t contain all the DNA an organization needs to succeed. Without a process in place, developers, product managers, and even salespeople may be able to turn flags on and off at will, which may not be the most prudent course of action. When a process is in place, businesses are in a better position to ensure their feature experimentation aligns with their goals.

RELATED CONTENT:
Waving the flag for feature experimentation
Speed releases with feature flags

“Trouble can ensue if everyone is turning things on and off willy-nilly,” said Chris Condo, senior analyst at Forrester. “It should be understood that certain people have the authority to do things and other people don’t. If any salesperson, any account manager, anyone, can have access to this, I think that that’s asking for trouble.”

Feature experimentation platforms tend to provide visibility and insight into who’s seeing what, when, and why they’re seeing it. Condo said LaunchDarkly’s designation as a “feature management” platform is ingenious because it suggests that feature flagging should be an ongoing practice. However, it’s not just clever marketing; the platform (and some others) provide true feature management capabilities.

“If you’re littering your code with code paths that aren’t being executed, you’re going to end up with technical debt that you want to keep in check,” said John Kodumal, CTO and co-founder of LaunchDarkly. “You need tools that are prescriptive about telling you when to remove feature flags. If you don’t have that, you can end up with a lot of problems.”

For example, one company built its own feature flagging system, but feature flags were never deleted. When the system crashed, the website reverted to an old version that was a decade out of date because there were a lot of code paths no one had ever seen before. The old version of the site was non-functional, and apparently no one had tested it. So, the end result was a site-wide outage.

Tools aren’t everything
Tools are essential for sound feature flag management, but tools can’t guarantee sound feature flag management, particularly if they’re not supported by processes.

“Process, tools, and culture have to work together. You have to have a process that allows you to remove flags as quickly as possible,” said Kodumal.

Conversely, while a process may exist, developers may still not feel comfortable deleting flags if they don’t trust the tooling. Alternatively, feature flag removal might be something that’s done during a cleanup week as opposed to continuously. 

“A set of practices and techniques are only useful if they’re adopted consistently across an organization. The technology is sort of the prerequisite for that, but it’s not the whole story,” said Jon Noronha, senior director of product at Optimizely. “Big established companies that haven’t always done things like feature flagging and experimentation [need help making] that jump along the way.”

Documentation is also important, according to Sophie Harpur, product manager at Split.io. That means documenting the hypothesis, what was tested, how it was tested, what was learned by turning the feature on and off, and next steps, including the next feature that’s going to be released.

“Having that all in a centralized place is really important to foster an experimentation culture,” said Harpur. “Having that documentation of experiments gives you that opportunity to make sure things are being done correctly, so allowing people to review what people are testing, getting more eyes on it to make sure people are testing correctly and effectively. We’re always keen for process around experimentation.”

Speed releases with feature flags
https://sdtimes.com/softwaredev/speed-releases-with-feature-flags/ (Tue, 03 Dec 2019)

Feature experimentation platforms can help organizations deliver software faster because they are aware of expected and unexpected feature-related behavior sooner. As organizations continue to accelerate their release cycles, they continue to ship ever smaller amounts of code that should be experimented with, individually. 

“It’s the next obvious step in continuous delivery [because] continuous delivery is all about shipping the smallest unit of change possible, which is an individual code path,” said John Kodumal, CTO and co-founder of LaunchDarkly. “Feature planning is fundamentally about that, and what that unlocks is pretty game-changing.”

RELATED CONTENT: Waving the flag for feature experimentation

Feature flagging also enables teams to work even more independently than they have been, which also tends to accelerate development.

“Customers using DVCS and feature branches have told us it was really expensive to take a long branch, merge that back into the mainline code base and deploy it because the two code branches diverged pretty quickly,” said Kodumal. “Integration time is extremely expensive.”

With feature flagging, teams can merge code and keep it turned off and then turn it on in an isolated environment, like a staging server or a developer’s production account, which also helps teams move faster and be more productive. However, “move faster” doesn’t necessarily equate to “move faster and break things.” Organizations need to take risk management into account, because the “value” they’re delivering isn’t as valuable as they think if it’s being delivered at the price of customer angst or the company’s reputation. 

“Release velocity has gone up in many cases, but increasingly you see teams scratching their heads wondering about the actual value of all these launches,” said Jon Noronha, senior director of product at Optimizely. “People find themselves in ‘the feature factory’ where they’re shipping one thing after another, but uncertain about the actual [business] outcomes.”

Increasingly, businesses want to understand the value and outcomes of investments. Feature flagging helps.

“People want to be able to use feature flagging for all kinds of software, not just the web, desktop or mobile devices, but things like Amazon Fire Sticks or Roku boxes,” said LaunchDarkly’s Kodumal. “Whatever it is, the story’s the same: they need to deploy small units of code that have changed quickly across any networked piece of software.”

Speed does not trump quality
Digital experiences in all channels are constantly resetting customers’ expectations. While some organizations may deliver great experiences in one or more channels, they may fall short in others. Yet, from the customer perspective, the best example in any channel is the standard by which all others are measured. 

“People expect the software they’re using on a day-to-day basis to be flawless and they don’t tolerate crashes the way they used to,” said Kodumal. “When we as consumers trust software companies with so much of our lives, it’s just not acceptable for the bank app to crash or for the banking app to be unavailable because the company needs three hours of downtime to upgrade the app.”

Feature flagging allows users to see and measure expected and unexpected impacts.

“Doing things in a data-driven way can [help you identify expected and unexpected impacts] quickly and allow you to reduce the number of customers that are affected,” said Sophie Harpur, product manager at Split.io. “You should have metrics tied to every feature release and make sure you’re detecting things that you’re not expecting.”

Another benefit of feature flagging is the ability to move beyond brain-dead minimum viable products (MVPs) to MVPs that enable customers to test-drive software on their own data.

“I can put out an MVP to a select set of alpha users who are maybe super users of my product and get their direct feedback. If I’m using a feature experimentation platform and you’re not, one of the problems you’re going to have is you’re going to have to deploy that code to a staging server, but users won’t be able to use it on their data because your software is quarantined on a separate set of servers,” said Chris Condo, senior analyst at Forrester. “Meanwhile, my users are testing my feature directly on their data. They can look at it, see how it works, try their complex queries or look at the data in different ways. I’ve lowered risk and I’ve increased my velocity because I know I can control the exposure of that feature.”

Waving the flag for feature experimentation
https://sdtimes.com/softwaredev/waving-the-flag-for-feature-experimentation/ (Tue, 03 Dec 2019)

Digital transformation is making companies more software-dependent than they’ve ever been. As analog products become digital and manual processes are superseded by their automated or machine-assisted equivalents, organizations are building more apps at the core and edge to compete more effectively. One way of hedging bets in an organization’s favor is using feature flagging to determine what is and is not translating to business value. 

Of course, feature flagging isn’t a new concept for developers, but the thought processes around it and its usage are evolving.

“I think the major change is that people realize that instead of being the exception to how engineering works where they’re using feature flags because they have a high-risk feature or they’re not quite sure how it’s going to work, now they’re looking at it as like maybe that’s how we should manage things all the time,” said Chris Condo, senior analyst at Forrester.

What’s new about feature flagging
Feature experimentation platforms tend to support a range of methods that may include feature flagging, A/B testing, and canary and blue-green deployments. The platforms also reflect the shrinking nature of what’s being released, with some platforms supporting increments as small as a code path.

Interestingly, the users of such platforms are evolving to include not only developers and DevOps teams, but product managers, designers, and even line-of-business users. Who can access feature flags depends on the platform selected and the customer’s culture and practices.

“The needs of the product manager, the developer, the designer and the analyst come together in this world,” said Jon Noronha, senior director of product at Optimizely. “Increasingly, [developers] are collaborating directly with their product manager or with the data scientist that might be assigned to their team or going to a designer to answer certain questions upfront of a project that they might not have even asked before.”

Apparently, the collaboration is having some interesting side effects, like developers telling their product managers they want agreement on success metrics before they start writing code. That’s a stark contrast to the traditional way of working in which the product manager tells developers what features to build, and then developers build them. Instead, it’s becoming more of a collaborative effort in which goals are collaboratively defined.

“I think a lot of different functions now can benefit from experimentation. Obviously, marketing has played a part in optimization but also thinking of the HR department in your company. Can they test their job opening pages? How can they benefit from experimentation? Are they getting the right candidates in?” said Sophie Harpur, product manager at Split.io. “I think it kind of can go across the board, so making it accessible to everyone across the org.”

Feature flagging has been a developer concept historically. Product managers haven’t been able to access anything or change anything themselves. Similarly, a salesperson couldn’t turn a feature on or off for a prospect that wants to try something in beta.

“If you can only change things by playing around with command-line tools or editing values in the database, it’s kind of an archaic and error-prone process even for developers, so they’re not likely to use it all that often,” said John Kodumal, CTO and co-founder of LaunchDarkly. “It’s a niche technique you’re only going to use when you absolutely have to do it.”

Without a feature experimentation platform, a product manager usually files a Jira ticket. With a feature management platform, the same product manager can access the feature flags they need and modify them. 

Driving positive business impacts
As more companies adopt a quantitative mindset, they’re compelled to measure the effectiveness of individual features through experimentation. Toward that end, organizations are monitoring technical metrics such as feature and API performance, and business metrics such as increasing customer engagement.

“Restaurants, airlines, and car manufacturers realize that in order to compete in 2019, they need to have the best software on the market,” said Optimizely’s Noronha. “They need to bring that in-house, build it themselves and adopt some of the best practices that the Silicon Valley elite use. Those companies use feature flagging pervasively throughout their processes.”

There are also some organizational dynamics that are fueling the demand for feature flagging, including CEOs who are questioning the value of software investments and sought-after, recently acquired talent that wonders why the code base is such a mess and how anyone manages to get work done.

“I think those guys are there to make forward progress on both the infrastructure and on the value of the work,” said Optimizely’s Noronha. “Feature flags are often the first step in just getting your infrastructure under control. You fence off certain areas of your products where you can make progress on them independently. One customer [called this] ‘containing the blast radius of new changes,’ which I really liked.”

Feature flags also allow developers to change the narrative about their own success metrics. Instead of telling the CEO they built 30 new features last quarter, they can show how much the new features increased the value delivered to the customer, which is what the CEO cares more about. Demonstrating positive business impact as verified by data also tends to lower the barriers to future funding.

Importantly, feature flagging enables companies to test small changes and detect if they’re impacting the metrics that the organization cares most about.

“A lot of the kind of large enterprise companies are all moving in that direction,” said Split’s Harpur. “It’s thinking about that transformation of organizations turning into digital companies and then turning into experimentation companies.”

Experimentation allows organizations to move faster and invest in the right things. The overarching benefit is staying ahead of the curve. 

“To be data-driven is to let your [users] tell you what they like rather than having the highest paid person tell you what you should be doing,” said Lizzie Eardley, senior data scientist at Split.io.

Where canary and blue-green releases fit in
Feature flagging can be done in place of canary and blue-green releases or they can be implemented as complementary practices, depending on the goal.

In a canary release, a change is pushed out to a small group of users to see how it performs. If it performs well, then it’s rolled out to more users. Feature flagging allows a more precise selection of users, down to individual users.

“Typically, what happens in a canary release is that particular code is put out there and anybody can get access to it. It’s just that it’s only available on a certain set of systems and then you can measure whether or not it’s ready for further deployment,” said Forrester’s Condo. “If you’re putting out a brand new product and during that canary release you only have alpha users and then maybe beta users and then you decide it’s actually performing well, let’s spread it out. If it’s just a microservice that’s been updated, a small piece of your website that’s changed or has a new feature, maybe use a feature flag and measure the impact. I think there’s the right tool for the right situation.”
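One common mechanism behind that per-user precision is deterministic hashing: the user ID is hashed into a stable bucket, so a given user keeps the same experience as a canary grows from 1% to 100%, and individual users such as alpha testers can be allow-listed explicitly. The sketch below is generic Python, not any particular vendor's implementation.

```python
import hashlib

def bucket(user_id: str, flag_key: str, buckets: int = 100) -> int:
    # Stable bucket per (flag, user) pair: the same user always lands in the same bucket.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

def flag_enabled(user_id: str, flag_key: str, rollout_pct: int, allow_list=frozenset()) -> bool:
    if user_id in allow_list:                        # individual targeting, e.g. alpha users
        return True
    return bucket(user_id, flag_key) < rollout_pct   # percentage-based canary

print(flag_enabled("user-42", "new-recs", rollout_pct=5, allow_list={"employee-1"}))
```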

Blue-green deployments involve identical hardware environments, one of which runs live software while the other remains idle. New code is deployed to the idle environment and tested. Users are then switched over to the formerly idle servers that now run the updated software, and the process continues.

“With blue-green deployments you can flip back and forth between one version and a newly launched version, but it’s just two versions, really, because nobody scales blue-green beyond that,” said LaunchDarkly’s Kodumal. “Feature flagging allows you to do things in a more fine-grained way, so if you have 20 different developers committing and releasing at the same time you have the granularity to say these things are not risky, so let’s turn them on, and those other things are risky, so let’s turn them off. And at deploy time, with blue-green, it’s still a very binary decision: either all of the new code is being deployed or not.”

With feature flagging, the level of granularity can be as fine as a code path, so it’s possible to decide whether an individual code path should be executed. 

On the other hand, not everything needs to be feature-flagged. For example, a simple bug fix may not warrant feature flag overhead. If it’s an infrastructure configuration change, a blue-green release may make more sense than a feature flag.

Another type of testing that tends to be supported by feature experimentation platforms is A/B testing in which companies are experimenting with two different shopping cart flows, or two different site designs, to determine which is most successful, statistically speaking. 

“Feature flagging and A/B testing have gone down parallel tracks,” said Optimizely’s Noronha. “You’ve had development teams implementing feature flags for the purpose of continuous integration and deployment, and product analytics where there’s been this evolution of A/B testing from being something that just the biggest tech companies do to something that’s much more mainstream. Those practices have converged into something that a combined product and engineering team does to monitor their progress.”

In short, feature experimentation isn’t one thing. There are different ways to experiment, each of which has its benefits and drawbacks, depending on the use case.

Using feature flags for entitlements
Feature flags can be used as a means of controlling access rights based on a subscription. Instead of having huge buckets into which customers fall, such as Basic, Professional, or Enterprise product levels, feature flagging can allow individually customized products. 

“You’re seeing companies take feature flagging one step further,” said Forrester’s Condo. “They’re saying, ‘Hey, instead of managing multiple levels of licenses and people having to install keys or do these complicated setups, we can simply put them in different demographic [categories] and turn features on or off, or give them the ability to turn features on or off and let them decide what level of subscription they should have.’”

LaunchDarkly uses its own platform for many things, including changing the rate limits of its API based on the customer or characteristics of their traffic. That way, the company can customize the flow based on an individual customer’s requirements.

“We can impact not only the way people develop software, but how businesses run their software because the cost of customization is lower,” said LaunchDarkly’s Kodumal. “Being able to release specific versions to specific customers is incredibly powerful.”

SD Times GitHub project of the week: Wasabi
https://sdtimes.com/ab-testing/sd-times-github-project-week-wasabi/ (Fri, 07 Apr 2017)

It’s tax season, which means you might be utilizing some of financial technology provider Intuit’s products this month. Intuit doesn’t just offer tax and financial products; it also offers developers and teams a data-driven, flexible and scalable open-source platform: Wasabi.

Wasabi A/B testing service is an API-driven project which lets users own their data while running experiments across the web, mobile and desktop. It’s actually tested in production at Intuit, where the company uses it as the experimentation platform for TurboTax, QuickBooks, Mint.com, and other Intuit offerings.

It’s packed with features, and at this time, only Mac OS X is supported.

Wasabi by Intuit

To start, Wasabi runs on servers in the cloud and on-premise, so users have complete control over their data. It’s 100% API-driven, according to Intuit, which means the Wasabi REST API is compatible with any language and environment.

Developers will also be able to assign users into experiments in real time, as a way to preserve traffic for other parallel A/B tests. There are also well-defined interfaces for plugging in your own access control, sending data to data pipelines, and providing fully custom bucket allocations, according to Intuit’s GitHub page.

Additionally, developers can spin up a Wasabi Docker instance in five minutes and be in production with the platform, instrumentation and experiments within a day, according to the GitHub page.
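Because the service is API-first, clients in any language drive it over HTTP. The Python sketch below shows the general shape of fetching a bucket assignment and branching on it; the host, endpoint path and response fields are placeholders rather than Wasabi's documented routes, which are covered in Intuit's GitHub documentation.

```python
import requests

WASABI = "http://localhost:8080/api/v1"   # placeholder host; adjust for a real deployment

def get_assignment(app: str, experiment: str, user_id: str) -> str:
    # Hypothetical endpoint shape; consult the Wasabi docs for the real routes.
    url = f"{WASABI}/assignments/applications/{app}/experiments/{experiment}/users/{user_id}"
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    return resp.json().get("assignment", "control")   # bucket label for this user

if get_assignment("MyApp", "checkout_test", "user-42") == "variant":
    print("render one-step checkout")   # experiment bucket
else:
    print("render two-step checkout")   # control bucket
```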

Developers can get started with a complete Wasabi stack by checking out the documentation on Intuit’s GitHub page.

Top 5 trending projects on GitHub this week:
#1. Interactive Coding Challenges: This project received a huge update; now it includes Interactive Python coding interview challenges, with algorithms and Anki flashcards. 
#2. Reactide: Reactide is the first dedicated IDE for React web application development.
#3. Bash Guide: A guide to learn Bash.
#4. FreeCodeCamp: Last week it was in the fifth spot, and this week it moved up. FreeCodeCamp, the ever-trending GitHub project!
#5. Mastodon: A GNU Social-compatible microblogging server.

Solving product integration testing challenges as fast as Netflix
https://sdtimes.com/ab-testing/solving-product-integration-testing-challenges-fast-netflix/ (Wed, 06 Jul 2016)

With top-rated shows like “Orange is the New Black” or “House of Cards,” Netflix needs to have a well-versed integration test team to make sure each of its 80 million users are getting a great experience. With such a fast-paced environment, challenges are sure to come to the surface.

Netflix’s product engineering integration test team recently looked at three major challenges that it has encountered while ensuring quality experiences for Netflix users. These challenges included testing and monitoring for “High Impact Titles (HITs),” A/B testing, and global launches.

(Related: Automation is critical for testing)

HITs pose a problem for the test team because they are the shows that have the highest visibility and need to be tested extensively. (For instance, “Orange is the New Black” drew 6.7 million viewers in the first 72 hours it was online.) This means the team will be testing weeks before and long after the launch date of a HIT to make sure the platform’s running smoothly.

Netflix has two phases with different strategies in order to make sure HITs deliver a good member experience. The first strategy starts before the title launches. Complex test cases are created to “verify that the right kind of titles are promoted to the right member profiles,” according to a Netflix tech blog post. Automated tests won’t work here since the “system is in flux,” so most of the testing during this phase is manual.

The testing doesn’t end here. Netflix engineers will write tests that check to see if the title continues to find its correct audience organically. Netflix said it has 600 hours of original programming in addition to all of its licensed content, which means manual testing in this phase is just not enough.

“Once the title is launched, there are generic assumptions we can make about it, because data and promotional logic for that title will not change—e.g. number of episodes > 0 for TV shows, title is searchable (for both movies and TV shows), etc.,” according to the Netflix tech team. “This enables us to use automation to continuously monitor them and check if features related to every title continue to work correctly.”
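Those post-launch “generic assumptions” translate naturally into simple automated invariants. The sketch below is hypothetical Python with a stand-in catalog client, since Netflix's internal services are not public.

```python
class FakeCatalog:
    """Stand-in for an internal catalog/search service; returns canned data."""
    def get_title(self, title_id: str) -> dict:
        return {"type": "tv_show", "episodes": 13, "searchable": True}

def check_launched_title(catalog, title_id: str) -> list[str]:
    title = catalog.get_title(title_id)
    failures = []
    if title["type"] == "tv_show" and title["episodes"] <= 0:
        failures.append("TV show has no episodes")       # number of episodes > 0
    if not title["searchable"]:
        failures.append("title is not searchable")       # applies to movies and TV shows
    return failures                                      # empty list means the invariants hold

assert check_launched_title(FakeCatalog(), "orange-is-the-new-black") == []
```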

Netflix also has a variety of A/B tests running, and a major challenge with adding end-to-end automation for its A/B tests is the variety of components that needed to be automated, according to the tech team. To solve this, Netflix decided to implement its automation by accessing microservices through their REST endpoints.

With this approach, the team was able to obtain test runtimes within a range of four to 90 seconds. With Java-based automation, they estimated the median runtime would have been between five and six minutes, according to their blog post.

Another challenge for Netflix was its simultaneous launch in 130 countries in 2015. Originally, Netflix designed its tests to run in a loop over each country. This made each test log significantly larger, which made it difficult to investigate failures.

The team decided to use the Jenkins Matrix plug-in to parallelize its tests. Now, tests in each country would run in parallel. Additionally, Netflix used an opt-in model so the team could write automated tests that are global-ready. Currently, automation is running globally for Netflix, and it covers all high-priority integration test classes, including HITs in the regions where a title is available, according to the blog post.

For the future, Netflix’s automation projects in its road map include workflow-based tests, alert integration, and chaos integration.

Guest View: How to launch features like Facebook
https://sdtimes.com/ab-testing/guest-view-launch-features-like-facebook/ (Thu, 26 May 2016)

With more than 1.5 billion monthly active users (MAUs), Facebook has a lot of factors to take into account for each mobile release. If they followed mobile “best practices,” they’d have some pretty significant problems:

  • Deploying to millions of users in one fell swoop (greatly increasing the risk of widespread crashes)
  • Asking users to update their apps nearly every week
  • Sending unproven features that could tip their KPIs either direction

The goal is to mitigate risk
Different users have different levels of tolerance for experimental features and the inevitable bugs. While users closer to the development of your product, such as your coworkers or beta testers, may be tolerant of bugs or experimental features, end users are not.


Facebook utilizes their internal feature-flagging tool called Gatekeeper to push out new features and ensure that they’re driving the right KPIs (and not causing crashes). In client code, each new path is wrapped in a gate check.

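Gatekeeper is internal to Facebook and its client API is not public, so the following Python sketch is only a hypothetical approximation of what a gate check looks like, with invented gate names and targeting rules.

```python
def gatekeeper_check(gate: str, user: dict) -> bool:
    # In the real system, targeting rules (employees only, specific countries,
    # percentage rollouts, etc.) are managed centrally; these are invented examples.
    rules = {
        "new_composer": lambda u: u.get("is_employee") or u.get("country") == "PK",
    }
    return rules.get(gate, lambda u: False)(user)

user = {"id": 42, "is_employee": False, "country": "PK"}

if gatekeeper_check("new_composer", user):
    print("render new composer")   # gated feature path
else:
    print("render old composer")   # existing behavior
```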

This code combined with Gatekeeper allows Facebook to toggle features on and off for any segment of mobile users. They’re able to deploy a new feature to only Zuckerberg’s personal phone, users in Pakistan, or a small percentage of users in California.

This also allows them to break down the release process to gain the maximum amount of qualitative and quantitative data possible, before they roll out to their entire user base. To do so they utilize mobile feature flags to do:

  • Dogfooding
  • Beta Testing
  • A/B Testing
  • Field Testing and Staged Rollouts
  • Conditional Logic

Let’s break it down.

Dogfooding
“Fail early. Fail often. But fail internally.” — Jason Mark, Fast Company

Dogfooding refers to the process in which a company uses its own product for testing purposes (i.e. eating your own dog food). This process allows them to uncover bugs or usability issues well before any end user sees the product.

Rather than giving employees test devices that they may or may not use on a regular basis, Facebook does something special: They forcibly update employees to early test versions of their apps. Using Gatekeeper, they’re able to enable features only to users who are identified as Facebook employees.

(Related: Putting testing back into DevOps)

Using feature flags for internal testing opens channels for immediate and honest feedback. Employees are quick to share their thoughts and gripes, helping the company validate new changes early on.

The team at Facebook has even built a specific function called “Rage Shake” to help employees report device states during crashes. As the name implies, employees can violently shake their phones to send the device state to Facebook’s QA team.

Beta testing
Moving on, Facebook conducts beta tests to validate features on users with less of a personal stake. With mobile beta testing, users have to opt in, and they tend to be the innovator/early adopter crowd who want to test out the cutting edge. This also means they have a bit of a higher tolerance for bugs and crashes than your typical end user.


While both iOS and Android have their own systems, they’re slow, clunky and hard to use. iOS also comes with a major limitation: a cap of 2,000 beta testers. For Facebook, that’s only 0.00012% of their user base. “This is not enough testers to validate apps broadly at scale,” said Christian Legnitto, a former Facebook engineer. Using feature flagging allows Facebook to surpass this limitation and add users seamlessly by user IDs.

A/B testing
To get quantitative data on how users will behave when shown the new changes, Facebook uses an internal tool called Airlock in conjunction with Gatekeeper that allows them to allocate users for an A/B test.


This allows them to get a different type of feedback: quantitative data on how users are reacting to a new change and whether it’ll help them achieve their goals.

Field testing and staged rollouts
“We often test new experiences with a small percentage of the global community,” an Instagram spokesman told The Verge.

The last stage before a full release is to run field tests, which are experiments conducted under actual use conditions (as opposed to within a lab). In this case, it means that Facebook is deploying its new features to select groups, such as when they rolled out new mobile profiles to users in the U.K. and California, how they did a staged rollout over six months with Timeline, or how they’re testing a new UI for Instagram.


Feature flags for Facebook
While most mobile teams deploy to hundreds of thousands of users in one fell swoop, Facebook has done a superb job of creating a methodology that allows them to release validated and tested features out to their users. With their amalgam of dogfooding, beta testing, A/B testing and field tests, they’re able to meticulously segment and glean qualitative and quantitative data from different personas on the tolerance scale.

If mobile is to advance as an industry, we need to stop waiting on Apple to make the necessary changes to the App Store and start taking back the control. By either building or buying A/B testing and feature flagging solutions for mobile, more teams will be able to closely monitor deployment, validate features early on, and release on their own terms.
