Friday, March 14, 2008

The soap opera test antipattern

If you are coming from a romantic programmer attitude, or simply didn't use to care about testing your code, then every single line of test code is valuable and adds some stability to your system.

After a while, though, the mass of testing code can increase significantly and become problematic if not correctly managed. I pointed you to the Coplien vs Martin video in my previous post. I won't claim that I've found the solution to the issue, but some thoughts on the topic might be worth sharing.

Starting to test

When embracing TDD or test-first, or – less ambitiously – when starting to use xUnit frameworks for testing, you simply have to start somewhere. You choose the target class or component, define the test goal, and code your test using assertions to check the result. If the light is green then the code is fine; if it's red… well, you have a problem. You solve the problem, refactor the solution to make it better in a green-to-green transition, then move to the next feature, or the next test (which will be the same thing, if you are a TDD purist).
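A minimal example of such a test, in JUnit 4 style (ShoppingCart is just a hypothetical class standing in for your target, not code from any real project):

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ShoppingCartTest {

    @Test
    public void emptyCartHasZeroTotal() {
        // a clear goal: a freshly created cart has an empty total
        ShoppingCart cart = new ShoppingCart();

        // green light: the code is fine; red light: you have a problem
        assertEquals(0.0, cart.getTotal(), 0.001);
    }
}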

Every test adds stability and confidence to your code base, so it should be a good thing. Unfortunately, once the mass of test code reaches a certain weight it starts making refactoring harder: it is extra code that any refactoring will touch, so refactoring estimates get more pessimistic and the whole application becomes less flexible.

Why does this happen? I suspect testing skills tend to be a little underestimated. JUnit examples are pretty simple, and some urban legends (like "JUnit is only for unit tests") are misleading. Testing somehow is a lot better than not testing at all. Put all of this together in a large-scale project and you're stuck.

The soap opera test antipattern

The most typical symptom of this situation is what I call the soap-opera test: a test that looks like an endless script.

@Test
public void testSomething() {
    // create object A
    // do something with A
    // assert something about A
    // do something else with A
    // assert something about A
    // create object B
    // assert something about B
    // do something with B
    // assert something about B
    // do something with B and A
    // assert something about B and A
}

The main reason why I named this one "soap opera" is straightforward: there is no clear plot, there are many characters whose roles are unclear, things happen slowly, conversations are filled with a lot of "do you really mean what you said?", and there is no defined end. The second reason is that I always dreamed of naming a pattern, or an antipattern… somehow.

Even though I was too lazy (or too sensible) to put real code in there, some issues are pretty evident:

  • the test looks like a long script;
  • if you're lucky, the purpose of the test is in the method name or in the Javadoc; the assertions are too many for the test to be readable, or for the purpose to emerge by simply reading the code;
  • I bet a beer that 90% of the lines in a test like this are simply cut and pasted from another test in the same class (if this is the only test in your system, the bet is off);
  • the test can turn red for too many reasons;
  • it really looks like the inertial test code mass mentioned before.

What's the point of "looks like a long script"? My opinion is simply that it doesn't have to look like that! A good test has a well-defined structure:

  1. Set up
  2. Declare the expected results
  3. Exercise the unit under test
  4. Get the actual results
  5. Assert that the actual results match the expected results

I grabbed the list from here; the original article covers many JUnit antipatterns (but calls the soap opera "the overly complex test", which is a lot less glamorous). Setting up can't be accomplished entirely in the setUp() method, because some preparation is obviously test-specific. Steps 3 and 4 often overlap, especially if you're testing a function. But the whole point is that this is definitely a structure, while a script is something far less formed.
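To make it concrete, here is a sketch of a test that follows this structure, step by step (again with the hypothetical ShoppingCart; the names are mine, not from any real API):

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ShoppingCartTest {

    @Test
    public void totalIsTheSumOfItemPrices() {
        // 1. Set up
        ShoppingCart cart = new ShoppingCart();
        cart.add(new Item("book", 20.0));
        cart.add(new Item("pen", 2.5));

        // 2. Declare the expected results
        double expectedTotal = 22.5;

        // 3. Exercise the unit under test
        // 4. Get the actual results (3 and 4 overlap here, as noted above)
        double actualTotal = cart.getTotal();

        // 5. Assert that the actual results match the expected results
        assertEquals(expectedTotal, actualTotal, 0.001);
    }
}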

Multiplying the asserts has a thrilling effect: when something goes wrong, all of your tests start turning red. In theory a test should test one and only one feature. There are obviously dependent features, but a well-formed test suite will help you a lot in problem determination by pointing right at the root cause. If the testing code for a feature is duplicated all over the test suite… you just get a lot of red lights but no hint about where the problem is.
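Splitting the soap opera into one test per feature keeps each red light meaningful; a hypothetical sketch, same assumed domain classes as above:

import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CartFeatureTest {

    @Test
    public void newCartIsEmpty() {
        // one feature, one reason to fail
        assertTrue(new ShoppingCart().isEmpty());
    }

    @Test
    public void addingAnItemMakesTheCartNonEmpty() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(new Item("book", 20.0));
        assertFalse(cart.isEmpty());
    }
}

When the second test turns red and the first stays green, you already know the problem is in adding items, not in cart creation.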

Testing against implicit interfaces

Even if you clean up your testing code and refactor towards a one feature/one test situation, you'll still experience some inertia due to the testing code. This definitely smells: we were told that unit tests are supposed to help refactoring, allowing us to change the implementation while checking behavior on the interface. The problem is that we often do this only in step 3 of the list above, while we depend on the application's implicit interfaces when creating test objects, and sometimes also when asserting the correctness of the result. Creating a test object might be a nontrivial process – especially if the application does not provide a standard way of doing it, such as Factories or the like – and it tends to be repeated all over the testing code. If you're depending on a convention, changing it will probably have a heavier impact.
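One way to keep this under control is to concentrate the creation knowledge in a single helper (an Object Mother, or a builder), so that a change in the creation convention touches one place instead of the whole suite. A sketch, under the same hypothetical domain:

// the only place in the test suite that knows how a ShoppingCart is built;
// if the creation convention changes, only this method changes
public class CartMother {

    public static ShoppingCart cartWith(double... prices) {
        ShoppingCart cart = new ShoppingCart();
        for (double price : prices) {
            cart.add(new Item("item", price));
        }
        return cart;
    }
}

Tests then call CartMother.cartWith(20.0, 2.5) instead of repeating the construction logic, and the implicit interface is depended upon exactly once.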

In general, when writing a test, step 3 is very short: basically just a line of code, depending on the interface you've chosen. Dependencies and coupling sneak in through test preparation and test verification, and you've got to keep them under control to avoid getting stuck by your own test code base.


6 comments:

Giulio Cesare Solaroli said...

Great post!

With a single flaw, IMHO, when you say that "[...] a well formed test suite will help you a lot in problem determination by pointing right to the root cause".

In my experience, the more the structure/architecture of tests and of the real code are independent, the better.

This implies that a single change in the real code could trigger multiple tests and raise a lot of flags. It also implies that you don't have an immediate clue about what is going wrong when a light turns red.

But you are free to improve the architecture of your code without having to keep the test suites updated.

Unit tests are not a debugging tool. They are a confidence building tool.

You sometimes add tests just to help you "debug" your code, but usually these tests are very implementation-specific. If you need to change the implementation, it is better to throw away all such tests and possibly write new ones, if that helps.

But I admit that it is not always easy to sort out which are the architecture-specific tests (to keep no matter what) and which are the implementation-specific ones (that can safely be thrown away).

Unknown said...

Hi Giulio,

the number of flags should tell you something about the depth of the error. And if your tests are layered, they'll help you locate it. A surface error should raise few flags, a deeper one should raise a lot; if you look at the tests layer by layer you'll get a clue about where to look.

Of course, tests are only one of the tools in play. The debugger is another one. Knowing what you just typed helps, as does checking the SCM log or the continuous integration system, if you have one.

In general, one single error is easy to find. Tons of errors are easy as well. A few uncorrelated errors are a more interesting puzzle.

Giulio Cesare Solaroli said...

"the number of flag should tell you something about the depth of the error"

In my experience this has seldom happened. Possibly because I like to keep the arrangement of tests independent from the real code, in order to avoid tautological (aka useless) tests.

Anonymous said...

I'm totally skeptical about the use of automated testing (and a strong believer in human testers) for web-based apps; actually, the soap opera antipattern seems to me to be the very essence of automated tests in such cases. But what matters here is the name you found: it's beautiful, beautiful, you surely deserve to baptize it!

Unknown said...

@Giulio

I like "House M.D." and "C.S.I.", and this is my default approach to problem determination. I would also like a walking stick, so it's no surprise if I see (or pretend to see) "patterns" in the red test flags.

Unknown said...

@Pietro

Testing the presentation layer is often performed by recording user activity, so it's no surprise it looks like a script. Testing the service layer, or unit testing, doesn't have to.

An interesting point made by Coplien in the video was that testing "imposes" a service layer, making developers lose focus on the user interaction happening in the GUI, which is more related to concepts like usability and user experience that cannot easily be automated.

I am proud of the name, but your subconscious is telling us that you actually like soap operas, and also which one... ;-)