In my last blog post I described the characteristics of good Acceptance Tests and how I tend to use a Domain Specific Language (DSL) to specify my Acceptance Test cases. This time I’d like to describe each of these desirable characteristics in a bit more detail and to reinforce why DSLs help.
To refresh your memory, here are the desirable characteristics:
- Relevance – A good Acceptance test should assert behaviour from the perspective of some user of the System Under Test (SUT).
- Reliability / Repeatability – The test should give consistent, repeatable results.
- Isolation – The test should work in isolation and not depend on, or be affected by, the results of other tests or other systems.
- Ease of Development – We want to create lots of tests, so they should be as easy as possible to write.
- Ease of Maintenance – When we change code that breaks tests, we want to home in on the problem and fix it quickly.
Relevance
A good Acceptance Test will assert some specific, high-level function of the system. It is not the job of Acceptance Tests to exercise nearly every line of code; that is the job of unit tests. Acceptance Tests are expensive to run, and to maintain, and so should not be used to explore every corner-case. Instead they should concentrate on the key use-cases that represent the value that the system delivers to its users.
This focus on the value that the system delivers is a key feature of acceptance testing. These tests should be whole system tests, written from the perspective of an external user of the system.
Building a suite of effective, relevant Acceptance Tests is best achieved by making it a part of the process, not an after-thought. This is the role of Acceptance Tests as “Executable Specifications” of the behaviour of the system. When we first begin work on a new Story, we should write a collection of Acceptance Tests that assert that the value of the Story is met. A good strategy for this is to define a series of “Acceptance Criteria” for each Story and make it a team rule that you aren’t finished until there is at least one Acceptance Test for each Acceptance Criterion. This approach quite quickly builds an effective suite of tests.
If you are working in a legacy environment, with code that was originally written without such tests, then write a few defensive tests that assert the principal, high-value use-cases of your system. Then adopt the “at least one test per Acceptance Criterion” strategy for all new work.
Another important process point is that while it should be ok for anyone to write an Acceptance Test, developers own their maintenance. Developers will write code that breaks the tests and so they MUST own the responsibility to fix them when they do. It is a nasty anti-pattern to leave the maintenance of Acceptance Tests to the “Test team” – don’t do it.
Reliability / Repeatability
For tests to be both reliable and repeatable they must control the state of the SUT. It is not good enough to take a cut of production data and run a series of tests on that data. You need the test to have full control so that it can put the system into some desired state before it begins making assertions. Another common anti-pattern is a test suite that assumes that tests will be run in a certain order, where a later test depends upon the successful completion of an earlier test. This is a complex, unmaintainable mess and I have never seen it work, although sadly I have seen lots of teams try – don’t do that either!
Your DSL should have simple ways of establishing common starting points for your system: Defining a product catalogue so that you can place orders; Creating users so that you can login and do things; Initialising an Order-Book in a trading system so that you can test trading; Moving your Game to level 17, and provisioning your troopers so that you can test killing the alien king!
Ideally these common set-the-scene activities should be represented as economically as possible in your DSL so that you can get the system into the desired state with a simple one-line statement.
On a trading system that I worked on, we could create a new OrderBook and initialise it with a simple DSL statement rather like this:
trading.createOrderBook("name: testBook", "bids: 10@20, 11@21, 12@22", "offers: 10@30, 11@31, 12@32");
Behind the scenes of this one-liner there is A LOT going on! We made a call to create the financial instrument on which the OrderBook would be based. The DSL provided lots of sensible default values for this instrument, though if we really cared we could override these defaults to specify something more specific. Once the Instrument was created, the OrderBook on it was opened, and then we would parse the bids and offers and place a series of Orders in the market, resulting in an OrderBook in our desired state. Each of these operations was available from the DSL too, so we had powerful, flexible control that was composable and allowed us to create these higher-level mechanisms. None of this is rocket science; in fact it is pretty simple, provided that you get the level of abstraction right in your DSL.
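To make that composition concrete, here is a minimal Java sketch of how such a one-liner might be built from lower-level operations. It is an illustration only, not the real DSL: `TradingDriver`, `TradingDsl`, `RecordingTradingDriver`, the `defaultBook` default and the reading of `10@20` as quantity@price are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lower-level operations that the one-liner composes. */
interface TradingDriver {
    void createInstrument(String name);
    void openOrderBook(String name);
    void placeOrder(String book, String side, long quantity, long price);
}

/** Hypothetical DSL layer; an illustration, not the original implementation. */
final class TradingDsl {
    private final TradingDriver driver;

    TradingDsl(TradingDriver driver) {
        this.driver = driver;
    }

    void createOrderBook(String... args) {
        Map<String, String> params = parse(args);
        // Assumed default: a test that does not care about the name can omit it.
        String name = params.getOrDefault("name", "defaultBook");
        driver.createInstrument(name);
        driver.openOrderBook(name);
        placeOrders(name, "BID", params.getOrDefault("bids", ""));
        placeOrders(name, "OFFER", params.getOrDefault("offers", ""));
    }

    /** Splits each "key: value" argument at its first colon. */
    static Map<String, String> parse(String... args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int colon = arg.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected 'key: value' but got: " + arg);
            }
            params.put(arg.substring(0, colon).trim(), arg.substring(colon + 1).trim());
        }
        return params;
    }

    private void placeOrders(String book, String side, String orders) {
        if (orders.isEmpty()) {
            return;
        }
        for (String order : orders.split(",")) {
            String[] parts = order.trim().split("@"); // assumed to mean quantity@price
            driver.placeOrder(book, side,
                    Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim()));
        }
    }
}

/** Test double that records the calls the DSL makes, in order. */
final class RecordingTradingDriver implements TradingDriver {
    final List<String> calls = new ArrayList<>();

    @Override public void createInstrument(String name) { calls.add("createInstrument " + name); }
    @Override public void openOrderBook(String name) { calls.add("openOrderBook " + name); }
    @Override public void placeOrder(String book, String side, long quantity, long price) {
        calls.add("placeOrder " + book + " " + side + " " + quantity + "@" + price);
    }
}
```

Because each step goes through the driver interface, the same small operations stay available to tests that need finer control, while the one-liner covers the common case.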
If you want your tests to be repeatable as well as reliable, it is no good literally creating an OrderBook called ‘testBook’ in the SUT. If we decided to re-run our test, the second invocation would find that an OrderBook called ‘testBook’ already existed! In addition, that OrderBook would be populated with whatever state the last test left it in when it finished – no good at all. So the DSL uses aliasing. The DSL uses ‘testBook’ as the identifier for the OrderBook within the scope of a test, but maps that to a generated unique name that is regenerated each time you run the test. So the Instrument and OrderBook in the example above were probably called something like ‘instr12937987987817253’ and ‘book872348729384792834’ in the SUT.
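Aliasing needs very little code. The sketch below is one possible way to do it, assuming one registry instance is created per test case; the class name `TestAliases` and the use of a random UUID as the unique suffix are my assumptions for illustration, not the scheme used on the trading system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical alias registry; create one instance per test case. */
final class TestAliases {
    private final Map<String, String> realNames = new HashMap<>();

    /** Returns the unique SUT name for an alias, generating it on first use. */
    String resolve(String alias) {
        return realNames.computeIfAbsent(alias, a -> a + "-" + UUID.randomUUID());
    }
}
```

Within one test, every mention of ‘testBook’ resolves to the same real name; a re-run, or another test running alongside, gets a fresh registry and therefore a different real name.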
If you want to test the significant behaviours of your system, and so control its state in a deterministic way, a common problem is time. If your code is getting its time from the system clock you have no control. How will you test daylight-saving time changes, or long-running multi-hour or multi-day processes? The answer is simple: fake time, at least from the perspective of your system. I wrote a blog post on managing time in unit tests on my personal blog here. For acceptance tests it is a little more complex. If your system is message based, it is simple. Instead of using the system clock as a source of time, use messages to pulse time out. Then replace the production time source with one that is under the control of your Acceptance Test infrastructure, add some ‘fastForward’ messages to your DSL and you can start building time-travel tests. If your system is not message based, add a back-door, perhaps a simple multi-cast messaging protocol. Whatever it takes, bring time under control and you gain a great deal of power in your ability to assert the state of your system.
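As a simplified, in-process illustration of a replaceable time source, the sketch below extends the standard `java.time.Clock` so that time only moves when the test says so. The class name `ControllableClock` and its `fastForward` method are assumptions for this example; in a message-based system the ‘fastForward’ would arrive as a message rather than a direct method call.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

/** Hypothetical test-controlled clock; time moves only when fastForward is called. */
final class ControllableClock extends Clock {
    private final ZoneId zone;
    private Instant now; // guarded by this

    ControllableClock(Instant start, ZoneId zone) {
        this.now = start;
        this.zone = zone;
    }

    /** Moves the clock forward by the given amount. */
    synchronized void fastForward(Duration amount) {
        now = now.plus(amount);
    }

    @Override
    public synchronized Instant instant() {
        return now;
    }

    @Override
    public ZoneId getZone() {
        return zone;
    }

    /** Returns an independent copy in the new zone; it does not follow later fastForward calls. */
    @Override
    public Clock withZone(ZoneId newZone) {
        return new ControllableClock(instant(), newZone);
    }
}
```

Production code that asks an injected `Clock` for the time, rather than calling the system clock directly, can be handed this implementation in the test environment, which makes multi-hour or daylight-saving scenarios a matter of one `fastForward` call.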
Isolation
Acceptance Tests are expensive things. They take a long time to start and are slow to run, at least in comparison with unit tests. There is little that you can do about the ‘slow to run’ problem other than speed up your system – which may be no bad thing. As for being slow to start, the best way to tackle that is to minimise the number of times that you start the system. For small simple systems it may be acceptable, though inefficient, to start the system as part of the setup for each test and then shut it down afterwards. For real systems though, this very quickly becomes untenable.
I think that it is generally a good strategy to separate deployment from testing. We want to test our deployment mechanisms, so that when we are ready to release we know that it will work. We want our Acceptance Tests to take place in a production-like test environment. So this is a good chance for us to rehearse our deployment in our production-like test environment.
So step one in acceptance testing is to deploy the system and start it running using the same deployment tools that we will use to release our software into production. We then run all of our acceptance tests and then shut the system down at the conclusion of the Acceptance test stage.
So now we only incur the start-up and shutdown costs once. This is much more efficient. The only downside is that our tests must be isolated from one another to prevent a later test’s expectations of the state of the system being compromised by an earlier one. The best strategy for this is to use the natural partitioning of your system to isolate the tests from one another. Most multi-user systems will have natural ideas that isolate the work of one user from another. A user account is an obvious starting point, but generally there will be other shared resources that you can use too. For example, in the trading systems that I have worked on recently, we often use user-account and OrderBook to provide us with our desired isolation. So every test usually begins by creating a new account and a new OrderBook. A good friend of mine jokes that all commercial systems are just based on “People, Stuff and Deals”. So use the People and the Stuff to isolate your tests, which will generally be more interested in the Deals 😉
In my experience this has been simple for any multi-user system that I have come across. I can imagine that if you are building a single-user system, perhaps a mobile app, then finding your isolation concepts may be a bit trickier. However, this is such a useful and powerful idea that I think that it is worth expending a bit of ingenuity to see if you can find such an idea in your domain model.
If you follow my advice and start up your system once for all Acceptance Tests, then isolation is essential even if you are running your tests sequentially. The nice thing though is that if your tests are isolated, at least in a multi-user system, you can now parallelise them. So your cycle-time is dependent only on the duration of your slowest test and the depth of your test-hardware budget.
There is another aspect of test isolation that is important in retaining the ability to define the state of your SUT: external systems. Where your system communicates with an external system beyond your immediate control, it is important to isolate your SUT from that external system. My preferred approach is to provide stubs which are under the control of your test infrastructure, again via your DSL. Ideally these stubs should sit just outside the boundaries of your SUT, so that your system communicates with the stubs using precisely the same mechanism when running tests as when running in production. If your system talks to an external system via a REST API over HTTP, run a web-server to host your stub and implement as much of the REST API as you care about. If it talks to a system via multi-cast messaging or TCP/IP, make your stub respond to those protocols. To achieve this you will need to implement a back-channel so that your DSL can communicate with the stubs, both to collect results and to insert inputs.
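For the HTTP case, a stub can be very small. The following is a rough sketch using the HTTP server that ships with the JDK (`com.sun.net.httpserver.HttpServer`). The class `StubExternalService` and its methods are hypothetical names for this example, and the back-channel is simplified to direct method calls on the assumption that the stub runs in the same process as the test infrastructure; a stub hosted elsewhere would need a remote back-channel instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stub for an external HTTP service, controlled by the test infrastructure. */
final class StubExternalService {
    private final HttpServer server;
    private final Map<String, String> cannedResponses = new ConcurrentHashMap<>();
    private final List<String> receivedRequests = new CopyOnWriteArrayList<>();

    StubExternalService() {
        try {
            server = HttpServer.create(new InetSocketAddress(0), 0); // port 0: any free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            receivedRequests.add(exchange.getRequestMethod() + " " + path);
            String body = cannedResponses.get(path);
            if (body == null) {
                exchange.sendResponseHeaders(404, -1); // -1: no response body
            } else {
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, bytes.length);
                exchange.getResponseBody().write(bytes);
            }
            exchange.close();
        });
        server.start();
    }

    /** Back-channel input: what the stub should return for a given path. */
    void respondWith(String path, String body) {
        cannedResponses.put(path, body);
    }

    /** Back-channel output: the requests the SUT has sent so far. */
    List<String> receivedRequests() {
        return List.copyOf(receivedRequests);
    }

    /** The port to configure the SUT with in place of the real service. */
    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

The SUT is pointed at `http://localhost:<port>` instead of the real service, so it exercises exactly the same HTTP code path as in production while the test decides what comes back.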
It is a mistake to include systems outside of your control within the scope of your testing. Limit the scope of your SUT to the code that you are responsible for developing and maintaining. If you are concerned that the third-party end of the interface between you and them may change, implement some tests that assert your understanding of that interface, but these are not Acceptance Tests, they are technical integration tests, something else entirely.
Ease of Development
Any test is really about providing some inputs, collecting some outputs and verifying that for the given inputs the outputs are those that you expect. So our tests need to be able to interact with our system to provide it with inputs and collect useful outputs. Too many testing tools conflate the ideas of writing test cases and defining the technical interactions with your system. This takes two tricky problems and makes you think about both at the same time.
When I am writing Acceptance Test cases I want to be thinking of only one thing, “what is the objective of my test?”. I hope that you have already seen several reasons why I like the DSL approach. For me though, this is the killer feature. An effective DSL allows me to worry only about the state that I want to put the system in, the interactions that I want to have with the system and the state that I expect it to be in afterwards – the classic “Given, When, Then” of BDD.
If I have designed my DSL well, then I should be able to express my desired “Given, When, Then” in the language of the problem domain with absolutely no reference to the technicalities of how that desire will be expressed in terms of system interaction. Lower layers of my DSL implementation will concern themselves with the technicalities. This separation of concerns, alongside ideas like aliasing, is at the heart of an effective Acceptance Test DSL. This is not really a technology problem. I have seen this pattern implemented in Java, Python and Ruby, and running on top of the FitNesse testing framework. This is a design approach, not a technology.
This may sound daunting, but it is really not difficult. You can write the bare bones of such a DSL in a day or two, maybe less. Enough to get started with some simple tests. You can then evolve the DSL itself as you add more tests. I like the strategy of writing the test case I want, making up any new language that I need to achieve that and then implementing any new additions to the language that the test requires. All the time being careful to retain an appropriate level of abstraction and working hard to ensure that no element of the technicalities of the communication with the SUT creeps into the language of the tests.
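To show how little is needed to get started, here is a bare-bones sketch of the layering, using an ordering example. Everything in it is hypothetical: `ShopDsl`, `ShopDriver` and the in-memory driver are made-up names, and the in-memory driver only stands in for a real one that would talk to the SUT through its UI, API or messaging.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical driver: the only layer that knows how to talk to the SUT. */
interface ShopDriver {
    void addProduct(String product, long priceInPence);
    void placeOrder(String user, String product);
    long amountBilled(String user);
}

/** Hypothetical DSL: test cases see only problem-domain language. */
final class ShopDsl {
    private final ShopDriver driver;

    ShopDsl(ShopDriver driver) {
        this.driver = driver;
    }

    void givenCatalogueContains(String product, long priceInPence) {
        driver.addProduct(product, priceInPence);
    }

    void whenUserOrders(String user, String product) {
        driver.placeOrder(user, product);
    }

    void thenUserIsBilled(String user, long expectedPence) {
        long actual = driver.amountBilled(user);
        if (actual != expectedPence) {
            throw new AssertionError("Expected " + user + " to be billed "
                    + expectedPence + " but was billed " + actual);
        }
    }
}

/** In-memory stand-in so the sketch runs; a real driver would call the SUT. */
final class InMemoryShopDriver implements ShopDriver {
    private final Map<String, Long> prices = new HashMap<>();
    private final Map<String, Long> billed = new HashMap<>();

    @Override
    public void addProduct(String product, long priceInPence) {
        prices.put(product, priceInPence);
    }

    @Override
    public void placeOrder(String user, String product) {
        Long price = prices.get(product);
        if (price == null) {
            throw new IllegalArgumentException("Unknown product: " + product);
        }
        billed.merge(user, price, Long::sum);
    }

    @Override
    public long amountBilled(String user) {
        return billed.getOrDefault(user, 0L);
    }
}
```

A test case then reads as three domain-level lines, for example `shop.givenCatalogueContains("book", 1500); shop.whenUserOrders("alice", "book"); shop.thenUserIsBilled("alice", 1500);`, and nothing in it changes if the driver is swapped for one that talks to the real system.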
Ease of Maintenance
Maintaining such tests is largely about ensuring that any breakages that are introduced by changes to the SUT are easy to identify and simple to fix. The DSL again assists enormously in both of these efforts. By keeping the language of the test as close as possible to the language of the problem domain we have automatically created an ‘air-gap’ between the SUT and the test cases that we define in our DSL. If we define a test to select a product from a catalogue, place an order for that product and confirm that we are billed appropriately, this is pretty abstract stuff. Unless the business changes fairly radically, it is likely that this test will still make sense whatever the detail of our implementation. So the SUT can change a lot, and while the test may become invalidated because one of the drivers that translate the DSL to concrete interactions is broken, this driver will usually be shared by many tests, so fixing the driver will fix a whole load of tests.
We can again use the DSL to help us gain good insight into test failures. At the level of our DSL we can report errors, again in the language of the problem domain, translating failures into clear descriptions of what went wrong. We can decorate this information with lower-level detail of the interactions that took place at the driver-level, helping to identify the root cause of the problem.
Over time your DSL and test infrastructure will grow to become big and complex, but the steps to get there are small and simple. The value of such a testing system is hard to over-estimate. It is a liberating thing when you can make any change to your code and trust that your testing will highlight any likely failure. It makes development go very fast, despite the investment in testing.
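One simple way to get this kind of reporting is a failure type that leads with the domain-level description and appends whatever the driver recorded. The sketch below assumes the driver keeps a list of its interactions as strings; `DomainFailure` and the message layout are made-up for illustration.

```java
import java.util.List;

/** Hypothetical failure raised by the DSL: domain message first, driver detail after. */
final class DomainFailure extends AssertionError {
    DomainFailure(String domainMessage, List<String> driverInteractions) {
        super(format(domainMessage, driverInteractions));
    }

    /** Builds the report: one domain-level line, then one indented line per interaction. */
    static String format(String domainMessage, List<String> driverInteractions) {
        StringBuilder text = new StringBuilder(domainMessage);
        text.append("\nDriver interactions:");
        for (String interaction : driverInteractions) {
            text.append("\n  ").append(interaction);
        }
        return text.toString();
    }
}
```

The first line tells anyone reading the build report what went wrong in business terms, and the indented lines give the developer fixing it a trail back to the failing interaction.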
There is more on Acceptance Testing in my book, Continuous Delivery.