It seems obvious that test pass rates are important. The higher the pass rate, the better the quality of the product. The lower the pass rate, the more known issues there are and the worse the quality. It then follows that teams should drive their pass rates up. I’ve shipped many products where the exit criteria included a specified pass rate, usually 95% passing or higher. For most of my career I agreed with that logic. I was wrong. I have come to understand that pass rates are irrelevant. Pass rates don’t tell you the true state of the product. What matters is which bugs remain in the product, and pass rates don’t show that.
The typical argument for pass rates is that they represent the quality of the product. This assumes that the tests represent the ideal product: if they all passed, the product would be error-free (or close enough). Each case is then an important aspect of this ideal state, and any deviation from 100% passing is a failure to achieve the ideal. This isn’t true, though. How many times have you shipped a product with 100% passing tests? Why? You probably rationalized that certain failures were not important. You were probably right. Not every case represents this ideal state. Consider a test that calls a COM API and checks the return result. Suppose you pass in a bad argument and the return result is E_FAIL. Is that a pass? Perhaps. A lot of testers would fail this because it wasn’t E_INVALIDARG. Fair point. It should be. Would you stop the product from shipping because of this, though? Perhaps not. The reality is that not all cases are important. Not all cases represent whether the product is ready to ship or not.
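To make that concrete, here is a minimal sketch of the kind of check described above. CreateWidget is a hypothetical stand-in for whatever COM API is under test; only the HRESULT values are real COM codes.

```cpp
// Minimal sketch, assuming a Windows build. CreateWidget is hypothetical.
#include <windows.h>
#include <cstdio>

// Hypothetical API under test: it rejects bad input, but with the "wrong" code.
HRESULT CreateWidget(int count, void** widget)
{
    if (widget == nullptr || count < 0)
        return E_FAIL;                 // Arguably this should be E_INVALIDARG.
    *widget = nullptr;
    return S_OK;
}

int main()
{
    void* widget = nullptr;
    HRESULT hr = CreateWidget(-1, &widget);   // Deliberately bad argument.

    if (hr == E_INVALIDARG)
        std::printf("PASS: bad argument rejected with E_INVALIDARG\n");
    else if (FAILED(hr))
        std::printf("FAIL(?): rejected, but with 0x%08lX instead of E_INVALIDARG\n",
                    static_cast<unsigned long>(hr));
    else
        std::printf("FAIL: bad argument was accepted\n");
    return 0;
}
```

The test reports a failure, and the failure is real, but it is the sort of failure you would log as a minor bug rather than a reason to hold the release.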
Another argument is that 100% passing is a bright line that is easy to see; anything less is hard to see. Did we have 871 or 872 passing tests yesterday? If it was 871 yesterday and 871 today, are they the same 129 failures? Determining this can be hard, and it’s a good way to miss a bug. It is easy to remember that everything passed yesterday and that no bugs are hiding in the zero failures. I’ve made this argument. It is true as far as it goes, but it only matters if humans are interpreting the results. Today we can use programs to analyze the failures automatically and to compare today’s results with yesterday’s.
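As a rough illustration, a small tool can diff the failure lists from two runs so that nobody has to eyeball whether today’s 129 failures are yesterday’s 129 failures. This sketch assumes each run writes one failing test name per line to a text file; the file names are made up.

```cpp
// Sketch of automated failure comparison between two test runs.
#include <fstream>
#include <iostream>
#include <set>
#include <string>

static std::set<std::string> LoadFailures(const std::string& path)
{
    std::set<std::string> names;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line); )
        if (!line.empty())
            names.insert(line);
    return names;
}

int main()
{
    const auto yesterday = LoadFailures("failures_yesterday.txt");
    const auto today     = LoadFailures("failures_today.txt");

    // New failures are the interesting ones: same count but different names is a regression.
    for (const auto& name : today)
        if (yesterday.count(name) == 0)
            std::cout << "NEW FAILURE: " << name << '\n';

    for (const auto& name : yesterday)
        if (today.count(name) == 0)
            std::cout << "fixed since yesterday: " << name << '\n';
    return 0;
}
```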
As soon as the line is not 100% passing, rates do not matter. There is no inherent difference between the quality of a product with 99% passing tests and the quality of a product with 80% passing tests. “Really?” you say. “Isn’t that a difference of 19%? That’s a lot of test cases.” Yes, that is true. But how valuable are those cases? Imagine a test suite with 100 test cases, only one of which touches some core functionality. If that case fails, you have a 99% pass rate. You also don’t have a product that should ship. On the other hand, imagine a test suite for the same software with 1000 cases, written by testers who were much more zealous and coded 200 cases that intersect that one bug. Perhaps it was in some activation code. That is an 80% pass rate, yet these two pass rates represent the exact same situation. The pass rate does not correlate with quality. Likewise, one could imagine a suite of 1000 cases where the 200 failures were bugs in the tests themselves. That is also an 80% pass rate, and a shippable product.
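The arithmetic is easy to verify. This toy calculation simply mirrors the numbers in the example above: one real defect, two suites, two very different pass rates.

```cpp
// Toy illustration of the arithmetic above; the counts match the example in the text.
#include <cstdio>

int main()
{
    // Suite A: 100 cases, 1 of which hits the activation bug.
    const int totalA = 100,  failingA = 1;
    // Suite B: 1000 cases, 200 of which intersect the same single bug.
    const int totalB = 1000, failingB = 200;

    std::printf("Suite A pass rate: %.0f%%\n", 100.0 * (totalA - failingA) / totalA); // 99%
    std::printf("Suite B pass rate: %.0f%%\n", 100.0 * (totalB - failingB) / totalB); // 80%
    // Same product, same single defect blocking shipment in both cases.
    return 0;
}
```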
The critical takeaway is that bugs matter, not tests. Failing tests represent bugs, but not equally. There is no way to determine, from a pass rate, how important the failures are. Are they the “wrong return result” sort or the “your API won’t activate” sort? You would hold the product for the second, but not the first. Test pass/fail rates do not provide the critical context about what is failing, and without that context you cannot know whether the product should ship. Test cases are a means to an end, not the end in themselves. They are merely a way to reveal the defects in a product. After they do so, their utility is gone. The defects (bugs) become the critical information. Rather than worrying about pass rates, it is better to worry about how many critical bugs are left. When all of the critical bugs are fixed, it is time to ship the product, whether the pass rate is high or low.
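A sketch of what that gate might look like, with a hypothetical bug list standing in for a real bug tracker: triage each failing test into a bug, collapse duplicates, and hold the release only while critical bugs remain open.

```cpp
// Sketch of a ship gate based on open critical bugs rather than pass rate.
// The Bug struct and the sample data are hypothetical.
#include <iostream>
#include <string>
#include <vector>

enum class Severity { Critical, Major, Minor };

struct Bug
{
    std::string title;
    Severity    severity;
    bool        fixed;
};

int main()
{
    // Each failing test was triaged into a bug; duplicate failures collapse into one entry.
    const std::vector<Bug> bugs = {
        {"API returns E_FAIL instead of E_INVALIDARG", Severity::Minor,    false},
        {"Activation fails on a clean install",        Severity::Critical, true },
    };

    int openCritical = 0;
    for (const auto& bug : bugs)
        if (bug.severity == Severity::Critical && !bug.fixed)
            ++openCritical;

    // Ship when the critical-bug count hits zero, whatever the pass rate says.
    if (openCritical == 0)
        std::cout << "Ready to ship\n";
    else
        std::cout << "Hold: " << openCritical << " critical bug(s) still open\n";
    return 0;
}
```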
All that being said, there is some utility in driving up pass rates. Failing cases can mask real failures: a test that fails early for a known, benign reason never exercises the code behind that point, so a new regression there can go unnoticed. Much like code coverage, the absolute pass rate does not matter, but the act of driving the pass rate up can yield benefits.
Indeed. Too many people forget that the reason we test software is to gain knowledge about the state of the product. The purpose of testing isn't to find bugs or even to raise the quality of the product, it's to obtain information about the product. What we do with that information can affect the quality of a product, but in and of itself testing has no bearing on product quality.
With that frame of mind, pass rates are only of interest if, as you say, failures are masking the ability to obtain accurate information about the product (i.e. preventing other tests from running).
"Failing cases can mask real failures"
I love that sentence.