Wednesday, July 30, 2014

The Data Driven Quality Mindset

"Success is not delivering a feature; success is learning how to solve the customer's problem." - Mark Cook, VP of Products at Kodak

I've talked recently about the 4th wave of testing called Data Driven Quality (DDQ). I also elucidated what I believe are the technical prerequisites to achieving DDQ. Getting a fast delivery/rollback system and a telemetry system is not sufficient to achieve the data driven lifestyle. It requires a fundamentally different way of thinking. This is what I call the Data Driven Quality Mindset.

Data driven quality turns on its head much of the value system that was effective in previous waves of software quality. The data driven quality mindset is about matching form to function. It requires the acceptance of a different risk curve. It requires a new set of metrics. It is about listening, not asserting. Data driven quality is based on embracing failure instead of fearing it. And finally, it is about impact, not shipping.

Quality is the matching of form to function. It is about jobs to be done and the suitability of an object to accomplish those jobs. Traditional testing operates from the view that quality is equivalent to correctness. Verifying correctness is a huge job. It is a combinatorial explosion of potential test cases, all of which must be run to be sure of quality. Data driven quality throws out this notion. It says that correctness is not an aspect of quality. The only thing that matters is whether the software accomplishes the task at hand efficiently. This reduces the test matrix considerably. Instead of testing every possible path through the software, it becomes necessary to test only the paths users actually take, and data tells us which paths these are. The test matrix then drops from something like O(2^n) to closer to O(m), where n is the number of branches in the code and m is the number of action sequences users take. Data driven testers must give up the futile task of comprehensive testing in favor of focusing on the golden paths users take through the software. If a tree falls in the forest and no one is there to hear it, does it make a noise? Does it matter? Likewise with a bug down a path no user will follow.
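To make the O(2^n) versus O(m) contrast concrete, here is a minimal sketch of how telemetry might be used to pick the golden paths. It assumes each user session has been logged as an ordered list of action names; the function names, the 90% share threshold, and the sample data are all hypothetical illustrations, not a prescribed tool.

```python
from collections import Counter

def golden_paths(sessions, share=0.9):
    """Return the most frequent action sequences that together account for at
    least `share` of all observed sessions."""
    counts = Counter(tuple(s) for s in sessions)
    total = sum(counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    for path, n in counts.most_common():
        if covered / total >= share:
            break  # enough of real usage is already represented
        selected.append(path)
        covered += n
    return selected

def exhaustive_path_bound(branches):
    """Worst-case path count for code with `branches` independent two-way branches."""
    return 2 ** branches

# Hypothetical telemetry: each session is the ordered list of actions a user took.
sessions = ([["open", "search", "buy"]] * 6
            + [["open", "search"]] * 3
            + [["open", "settings"]])
print(golden_paths(sessions))      # [('open', 'search', 'buy'), ('open', 'search')]
print(exhaustive_path_bound(30))   # 1073741824
```

Thirty branches already imply about a billion paths, while this (made-up) usage data says two action sequences cover 90% of sessions. Real telemetry will be messier, but the shape of the argument is the same.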

Success in a data driven quality world demands a different risk curve than the old world. Big up front testing assumes that the cost to fix an issue rises exponentially the further along the process we get. Everyone has seen a chart like the following:

[Chart: the cost to fix an issue rising exponentially with each later stage of the process]

In the world of boxed software, this is true. Most decisions are made early in the process, and changing them late is expensive. Because testing is cumulative and exhaustive, a late bug fix requires re-running many tests, which is also expensive. Fixing an issue after release is even more expensive: the massive regression suites have to be run, and even then there is little self-hosting, so the risks are magnified.

Data driven quality changes the dynamics and thus changes the cost curve. This in turn changes the amount of risk appropriate to take at any given time. When a late fix is very expensive, it is imperative to find issues early, even though finding issues early is itself expensive. When making a fix is quick and cheap, the value in finding an issue early is not high. It is better to lazily evaluate issues: wait until they manifest in the real world before making a fix. In this way, many latent issues will never need to be fixed. The cost of finding issues late may also be lower, because broad user testing is much cheaper than paid test engineers. It is also more comprehensive and more representative of the real world.

Traditional testers refuse to ship anything without exhaustive testing up front. It is the only way to be reasonably sure the product will not have expensive issues later. Data driven quality encourages shipping with minimum viable quality and then fixing issues as they arise. This means forgoing most of the up front testing. It means giving up the security blanket of a comprehensive test pass.

Big up front testing is metrics-driven. It just uses different metrics than data driven quality. The metrics for success in traditional testing are things like pass rates, bug counts, and code coverage. None of these are important in a data driven quality world. Pass rates do not indicate quality. This is potentially a whole post by itself, but for now it suffices to say that pass rates are arbitrary. Not all test cases are of equal importance. Additionally, test cases can be factored at many levels. A large number of failing unimportant cases can cause a pass rate to drop precipitously without lowering product quality. Likewise, a large number of passing unimportant cases can overwhelm a single failing important one.
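A small sketch shows how arbitrary a raw pass rate can be. It assumes, purely for illustration, that each test case can be tagged with the number of user sessions that exercise it; the `CaseResult` type, the weights, and the case names are hypothetical. Weighting by usage is one possible alternative, not a standard metric.

```python
from dataclasses import dataclass

@dataclass
class CaseResult:
    name: str
    passed: bool
    sessions: int  # hypothetical weight: how many user sessions exercise this case

def raw_pass_rate(results):
    """Every test case counts equally, as in a traditional pass-rate report."""
    return sum(r.passed for r in results) / len(results)

def weighted_pass_rate(results):
    """Each case counts in proportion to how much real usage it represents."""
    total = sum(r.sessions for r in results)
    return sum(r.sessions for r in results if r.passed) / total

# Scenario A: the one path most users take fails; 99 obscure cases pass.
a = [CaseResult("checkout", False, 900)] + [
    CaseResult(f"edge_{i}", True, 1) for i in range(99)]
# Scenario B: the same path passes; the 99 obscure cases fail.
b = [CaseResult("checkout", True, 900)] + [
    CaseResult(f"edge_{i}", False, 1) for i in range(99)]

print(raw_pass_rate(a), round(weighted_pass_rate(a), 3))  # 0.99 0.099
print(raw_pass_rate(b), round(weighted_pass_rate(b), 3))  # 0.01 0.901
```

The raw pass rate calls the broken product nearly perfect and the working one a disaster; the usage-weighted view says the opposite.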

Perhaps bug counts are a better metric. In fact, they are, but they are not sufficiently better. If quality is the fit of form and function, bugs that do not reflect this fit obscure the view of true quality. Latent issues can come to dominate the counts and render invisible those bugs that truly affect user happiness. Every failing test case may cause a bug to be filed, whether it is an important indicator of the user experience or not. These in turn take up large amounts of investigation and triage time, not to mention time to fix them. In the end, fixing latent issues does not appreciably improve the experience of the end user. It is merely an onanistic exercise.

Code coverage, likewise, says little about code quality. The testing process in Windows Vista stressed high code coverage and yet the quality experienced by users suffered greatly. Code coverage can be useful to find areas that have not been probed, but coverage of an area says nothing about the quality of the code or the experience. Rather than code coverage, user path coverage is a better metric. What are the paths a user will take through the software? Do they work appropriately?
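User path coverage can be computed much like code coverage, just over sessions instead of lines. The sketch below assumes the same hypothetical telemetry shape as before (one ordered action list per session) and a hypothetical set of action sequences that currently have passing tests. It reports the share of real sessions those tests represent and lists the untested paths, most frequent first.

```python
from collections import Counter

def user_path_coverage(sessions, tested_paths):
    """Fraction of observed sessions whose exact action sequence is exercised by
    a passing test, plus the untested paths ordered by how often users take them."""
    counts = Counter(tuple(s) for s in sessions)
    total = sum(counts.values())
    if total == 0:
        return 0.0, []
    covered = sum(n for path, n in counts.items() if path in tested_paths)
    gaps = [path for path, _ in counts.most_common() if path not in tested_paths]
    return covered / total, gaps

sessions = ([["open", "search", "buy"]] * 6
            + [["open", "search"]] * 3
            + [["open", "settings"]])
tested = {("open", "search", "buy"), ("open", "settings")}
print(user_path_coverage(sessions, tested))
# (0.7, [('open', 'search')])
```

Matching exact sequences is a deliberate simplification; a real system would likely need to group near-identical paths before this number means much.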

Metrics in data driven quality must reflect what users do with the software and how well they are able to accomplish those tasks. They can be as simple as a few key performance indicators (KPIs). A search engine might measure only repeat use. A storefront might measure only sales numbers. They could be finer grained. What percentage of users are using this feature? Are they getting to the end? If so, how quickly are they doing so? How many resources (memory, CPU, battery, etc.) are they using in doing so? These kinds of metrics can be optimized for. Improving them appreciably improves the experience of the user and thus their engagement with the software.
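The finer-grained questions above (is anyone using it, do they finish, how long does it take) map naturally onto a small funnel calculation. This is a minimal sketch assuming a hypothetical event log in which a feature emits `feature_start` and `feature_end` events with a per-session timestamp; the event names, the `Event` type, and the sample data are illustrative only.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Event:
    user: str
    name: str  # hypothetical event names: "feature_start" / "feature_end"
    t: float   # seconds since the session began

def feature_kpis(events, active_users):
    """Adoption, completion, and median time-to-complete for one feature."""
    starts, ends = {}, {}
    for e in events:
        if e.name == "feature_start":
            starts.setdefault(e.user, e.t)   # keep the first start per user
        elif e.name == "feature_end":
            ends.setdefault(e.user, e.t)     # keep the first end per user
    done = [u for u in starts if u in ends and ends[u] >= starts[u]]
    return {
        "adoption": len(starts) / active_users if active_users else 0.0,
        "completion": len(done) / len(starts) if starts else 0.0,
        "median_seconds": median(ends[u] - starts[u] for u in done) if done else None,
    }

events = [Event("a", "feature_start", 0), Event("a", "feature_end", 30),
          Event("b", "feature_start", 5), Event("b", "feature_end", 25),
          Event("c", "feature_start", 10)]
print(feature_kpis(events, active_users=4))
# adoption 0.75, completion ~0.667, median_seconds 25.0
```

Resource metrics such as memory or battery would be additional fields on the same events; they are left out to keep the sketch short.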

There is a term called HiPPO (highest paid person's opinion) that describes how decisions are too often made on software projects. Someone asserts that users want to have a particular feature. Someone else may disagree. Assertions are bandied about. In the end the tie is usually broken by the highest ranking person present. This applies to bug fixes as well as features. Test finds a bug and argues that it should be fixed. Dev may disagree. Assertions are exchanged. Whether the bug is ultimately fixed or not comes down to the opinion of the relevant manager. Very rarely is the correctness of the decision ever verified. Decisions are made by gut, not data.

In data driven quality, quality decisions must be made with data. Opinions and assertions do not matter. If an issue is in doubt, run an experiment. If adding a feature or fixing a bug improves the KPI, it should be accepted. If it does not, it should be rejected. If the data is not available, sufficient instrumentation should be added and an experiment designed to tease out the data. If the KPIs are correct, there can be no arguing with the results. It is no longer about the HiPPO. Even managers must concede to data.
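"Run an experiment" usually means an A/B test: show the change to one group, withhold it from another, and compare the KPI. The sketch below assumes the KPI is a proportion (for example, the share of sessions that convert) and that both groups are large enough for a normal approximation; it uses a standard two-proportion z-test, which is one common choice rather than the only valid one. The function name, the 0.05 threshold, and the sample numbers are hypothetical.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test comparing control (A) with treatment (B).

    Assumes the KPI is a rate (e.g. sessions that convert) and that both
    samples are large enough for the normal approximation to hold."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance in the data; cannot test")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {"lift": p_b - p_a, "z": z, "p_value": p_value,
            "accept": p_value < alpha and p_b > p_a}

# Hypothetical experiment: 10,000 sessions per arm, 10.0% vs 11.0% conversion.
result = ab_test(1000, 10_000, 1100, 10_000)
print(round(result["z"], 2), round(result["p_value"], 3), result["accept"])
# 2.31 0.021 True
```

The `accept` rule encodes the post's decision policy: keep the change only if it moves the KPI in the right direction by more than chance would explain.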

It is important to note that the data is often counter-intuitive. Many times things that would seem obvious turn out not to work and things that seem irrelevant are important. Always run experiments and always listen to them.

Data driven quality requires taking risks. I covered this in my post on Try.Fail.Learn.Improve. Data driven quality is about being agile. About responding to events as they happen. In theory, reality and theory are the same. In reality, they are different. Because of this, it is important to take an empiricist view. Try things. See what works. Follow the bread crumbs wherever they lead. Data driven quality provides tools for experimentation. Use them. Embrace them.

Management must support this effort. If people are punished for failure, they will become risk averse. If they are risk averse, they will not try new things. Without trying new things, progress will grind to a halt. Embrace failure. Managers should encourage their teams to fail fast and fail early. This means supporting those who fail and rewarding attempts, not success.

Finally, data driven quality requires a change in the very nature of what is rewarded. Traditional software processes reward shipping. This is bad. Shipping something users do not want is of no value. In fact, it is arguably of negative value because it complicates the user experience and it adds to the maintenance burden of the software. Instead of rewarding shipping, managers in a data driven quality model must reward impact. Reward the team (not individuals) for improving the KPIs and other metrics. These are, after all, what people use the software for and thus what the company is paid for.

Team is the important denominator here. Individuals will be taking risks which may or may not pay off. One individual may not be able to conduct sufficient experiments to stumble across success. A team should be able to. Rewards at the individual level will distort behavior and reward luck more than proper behavior.

The data driven quality culture is radically different from the big up front testing culture. As Clayton Christensen points out in his books, the values of the organization can impede adoption of a new system. It is important to explicitly adopt not just new processes, but new values. Changing values is never a fast process. The transition may take a while. Don't give up. Instead, learn from failure and improve.

1 comment:

  1. I think these phrases in your post are extremely controversial. "Data driven quality encourages shipping with minimum viable quality and then fixing issues as they arise. This means foregoing most of the up front testing. It means giving up the security blanket of a comprehensive test pass." Other gurus of DDB disagree. DDQ is about taking the information you gather from production and improving a pre-production test pass. See