This is the first part in a three-part blog series on how to write code so you can build testable systems with external dependencies. The link to part 2 is at the end of the post.
There has been some discussion over the last year or so about testing. Some of this discussion has been naive but well-meaning. The intent is good: we all want to build reliable and robust systems. But it has been naive, because the chosen path leads to the opposite.
The general thrust of this discussion is about “mocking out external interfaces”, which on the surface seems like a sensible thing to do. After all, we want to test our software and we don’t want to have these slow external interfaces (like a database, a message queue, a file system) impact our development. So clearly, the approach is to mock/stub out the calls to AWS/MQ/database in our code and “pow” – tested, reliable software.
Except that isn’t what we get. What we actually have now is coupled, fragile software, and those tests you spent all that time writing are basically testing “does String.equals() work?”
It’s clear that there’s a gulf in both education and experience in this space. So, prompted by the recent discussions on Slack, I’ll present a narrative on the design and development of a multi-component software system.
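To make that concrete, here is a minimal sketch of the kind of test this approach tends to produce. The `fetch_text` wrapper and the client’s `get_text` method are hypothetical names invented for illustration; they are not a real AWS API.

```python
from unittest.mock import Mock


def fetch_text(storage_client, bucket, key):
    """Hypothetical thin wrapper around an external storage client."""
    return storage_client.get_text(bucket, key)


def test_fetch_text_with_stub():
    # The stub is told exactly what to return...
    client = Mock()
    client.get_text.return_value = "1 2 3"

    # ...so this assertion only checks that a string equals itself.
    assert fetch_text(client, "some-bucket", "numbers.txt") == "1 2 3"

    # This pins the test to the exact call made, so refactoring fetch_text
    # can break the test even when its behaviour is unchanged.
    client.get_text.assert_called_once_with("some-bucket", "numbers.txt")


test_fetch_text_with_stub()
```

The test passes, yet it says nothing about whether the real storage service behaves the way the stub pretends it does.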
First, we have some context so people can do some reading.
- The assertion that we need “AWS stubs” to test systems that are deployed to AWS
- A wonderful video by Gary Bernhardt about testing, which covers a lot of what I’m going to go through: “Boundaries” (Destroy All Software Talks)
- A wonderful blog post by Ken Scambler about why mocks and stubs should be avoided: “To Kill a Mockingtest” (realestate.com.au Tech Blog)
Second, I’m going to talk a little bit about testing before launching into some solution design so that people can understand the why part.
Testing is hard. Much harder than people give it credit for, and what’s worse, most people think that testing is easier than it is, which leads to lots and lots of terrible tests. Mocks and stubs are a design smell. Ken’s comments in the Slack conversation and his post provide a more concrete description of why.
Along with testing being hard, there are different concerns with testing, and this is fundamentally where the big issue occurs. Not all testing is symmetrical, and not all techniques are sensible or desirable.
Within a software system we have 3 main categories of tests.
- Functionality tests – these are tests that ensure that the software WE are writing behaves as WE expect it to. The most notable type of test encountered in this space is the unit test.
- Contract verification – these are fixtures that VALIDATE that the components and interfaces our unit-tested software depends on continue to work as expected. Think of these as a sort of pre-condition. They’re not so much tests as they are contract verification. It just so happens that a lot of the testing frameworks we have in the software ecosystem are very good for building contract verification suites.
- Smoke tests – these are fixtures that VALIDATE that the components deployed into an environment are correctly configured, that the interfaces are available as expected and that they all operate together correctly. This can be a single verification, it can be a sub-set of the contract verification tests, or it can be a synthetic end-to-end transaction through the system. So many choices, so many options.
For the purposes of this narrative, I’m only going to be interested in the first 2, as they are the general considerations for component design and development. This doesn’t mean, and shouldn’t be taken to mean, that for an operational system the 3rd category isn’t equally, or even more, important – it’s just that I’m going to deliberately put it out of scope for now.
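As a rough illustration of the second category, a contract verification can be written as a reusable check that any implementation of an interface must pass. The `InMemoryDataStore` class, its `write`/`read` methods and `verify_data_store_contract` are all hypothetical names assumed for this sketch.

```python
class InMemoryDataStore:
    """Hypothetical stand-in; a real implementation might wrap S3 or a database."""

    def __init__(self):
        self._items = {}

    def write(self, key, data):
        self._items[key] = data

    def read(self, key):
        return self._items[key]


def verify_data_store_contract(store):
    """Checks the behaviour our code relies on, whatever the implementation."""
    store.write("contract-check", "1 2 3")
    assert store.read("contract-check") == "1 2 3", "round trip must preserve data"

    store.write("contract-check", "4 5 6")
    assert store.read("contract-check") == "4 5 6", "write must overwrite"


# The same check could be pointed at a real, externally backed store
# at suitable points in the development cycle.
verify_data_store_contract(InMemoryDataStore())
```

The point of the sketch is that the check exercises the contract, not our business logic, so it can run against whichever implementation sits behind the interface.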
Ok, context done. Let’s do some design. Step one is to have a look at the problem we want to solve, and fortunately for us we have a spare one.
The system receives an SQS event which indicates which files in S3 to load and process. The files contain some numbers on which we “do math”, and the result is written out to S3.
Many people would at this point launch into TDD, and while that might seem sensible, I’d always advocate spending some time thinking about the problem first, along with some analysis and preliminary high-level design.
30 minutes later, add a small amount of Ken Scambler for sanity and we have the following initial thoughts about how the system design will proceed. Note, this isn’t set in stone, but when doing TDD, it’s not some random wandering in the dark about where your design will end up – you should be doing science here. Have a hypothesis, and let the code help you work that all out.
We can see the main components, and have identified the basic flows. Nothing too exciting, probably 30 minutes’ worth of chatting with Ken. For those interested, he did a “functional style” analysis, and with both of us working on the design we ended up with substantially the same system components and interaction design. It was a lot of fun. Recommended. Would pair with Ken again.
Now we want to think about one of the most interesting parts of the implementation: how will the use of the data store work with the event queue? Part of the requirements says that the events are only to be removed when the data store items have been successfully processed, so we need some form of signalling between the two. We could couple the two together with some horrible “if” code and expose the innards of the event queue, but that is guaranteed to be hard to test. Instead we’ll inject some processor into the event queue, which seems like the best approach. Writing code will test it out, but if you don’t know the direction you’re heading in – you’ll just wander all over the map.
(Note: You’ll see that I’ve put some form of “attach()” method in the interface/contract. This gives me some way of doing “authentication” / “connection” to the external systems. Probably not going to implement in the initial phases, but just a reminder that it’s probably going to be important at some point)
The important part of this is the process(Processor p):boolean method. This enables us to “tell, don’t ask” when processing things on the event queue. For now, we’re only going to get one type of thing on that queue so this is probably the simplest implementation and all that is needed. If there was a bunch of different things on the event queue, I’d probably construct the event queue with some form of Factory that would allow each of the events to be processed, but no need for that now.
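A minimal sketch of how that contract might look in code follows. Everything here is an assumption made for illustration: the `InMemoryEventQueue` class, the `pending()` helper, string events, and in particular the meaning of the boolean result (taken here to be “an event was processed and removed”).

```python
from collections import deque
from typing import Callable, Iterable, Protocol

# Assumed shape of a processor: returns True when the event was handled.
Processor = Callable[[str], bool]


class EventQueue(Protocol):
    def attach(self) -> None: ...
    def process(self, processor: Processor) -> bool: ...


class InMemoryEventQueue:
    """Hypothetical in-memory implementation of the EventQueue contract."""

    def __init__(self, events: Iterable[str]):
        self._events = deque(events)

    def attach(self) -> None:
        # Nothing to connect to in memory; a real queue might authenticate here.
        pass

    def process(self, processor: Processor) -> bool:
        if not self._events:
            return False
        event = self._events[0]  # peek, do not remove yet
        if processor(event):
            self._events.popleft()  # remove only after successful processing
            return True
        return False  # the event stays on the queue for another attempt

    def pending(self) -> int:
        return len(self._events)


queue: EventQueue = InMemoryEventQueue(["file-a", "file-b"])
queue.process(lambda event: True)  # handles and removes "file-a"
```

The caller never asks the queue for its contents; it hands over a processor and the queue decides whether the event can be removed, which keeps the queue’s innards hidden.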
The last little bits are pretty similar, and don’t really require any major thought – just simple data sources and sinks.
As stated above, names-not-final. There’s nothing about what I’ve scrawled here that is “forcing” me to do it in this way, and the code may well change my thoughts as I get into it. However, spending the 60 minutes or so to draw these 4 pictures and talk with Ken gives me confidence I have a robust solution that’s going to fulfil the solution requirements as well as have the right sorts of extension points. The discussion and some thought experiments mean that I’m pretty sure I can implement this solution using any underlying implementation technologies: files, databases, queues, sockets etc. This is the most important thing when designing something – it’s not about “can I build this using technology <X>”, it’s “can I build this in ANY technology”.
Finally, if we look at this slightly differently, we have the classic “boundaries” model, where our business logic (the calculation) is all in the “middle” and our external interfaces shield it from the horrible messiness of the outside world. Functional core, imperative shell. This is another good indication that our design proposal has merit.
This also helps us understand where our testing styles should be going. We should have our unit/functional tests for the “functional core”, and contract/verification tests for our “imperative shell”. Our code is the core – this is really, really important and is the key point that needs to be made from this entire narrative. Our job is not (NOT!) to test the AWS libraries, the DB connection libraries, the SNS/SQS libraries – these can be verified at run-time using smoke tests, or at various points in the development cycle using contract/validation tests.
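The split can be sketched roughly as follows. The source only says the system will “do math”, so the use of a sum, the whitespace-separated input format, and the names `parse_numbers`, `calculate` and `run` are all placeholders assumed for illustration.

```python
from typing import Callable, Iterable, List


# Functional core: pure functions, no I/O, trivially unit-testable.
def parse_numbers(text: str) -> List[float]:
    return [float(token) for token in text.split()]


def calculate(numbers: Iterable[float]) -> float:
    # Placeholder for the real "math"; a sum is assumed here.
    return sum(numbers)


# Imperative shell: only wires external reads and writes around the core.
def run(read: Callable[[str], str], write: Callable[[str, str], None], key: str) -> None:
    result = calculate(parse_numbers(read(key)))
    write(key + ".result", str(result))


# Unit tests for the core need no mocks, stubs or external services.
assert parse_numbers("1 2 3") == [1.0, 2.0, 3.0]
assert calculate([1.0, 2.0, 3.0]) == 6.0
```

Whether `read` and `write` are backed by S3, files or a plain dictionary is then a concern for contract verification, not for the unit tests.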
For people who worry about the protocols – that’s not a testing job, that’s a business logic task. If the payload in the event queue is “different”, then the system should just fail (gracefully or otherwise). Once the contract is broken, you are no longer obliged to carry on as if nothing happened, and can make sensible decisions about your own reactions. Under no circumstances should you attempt to “massage/hide” the broken contract. This leads to hard-to-detect errors and is a significant source of production failures. Just fail early – and in close proximity to the broken contract. This is a fundamental of good software implementations.
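Failing early at the boundary might look something like this sketch. The payload format (a JSON object with a `files` list of strings) and the names `BrokenContractError` and `parse_event` are assumptions for illustration; the real event format is not specified here.

```python
import json
from typing import List


class BrokenContractError(ValueError):
    """Raised as soon as an incoming payload violates the agreed contract."""


def parse_event(payload: str) -> List[str]:
    try:
        body = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise BrokenContractError("payload is not valid JSON") from exc

    files = body.get("files") if isinstance(body, dict) else None
    if (
        not isinstance(files, list)
        or not files
        or not all(isinstance(name, str) for name in files)
    ):
        # No defaults, no guessing, no massaging: reject it right here.
        raise BrokenContractError("expected a non-empty 'files' list of strings")
    return files


assert parse_event('{"files": ["numbers.txt"]}') == ["numbers.txt"]
```

The failure is raised right where the broken payload is first seen, so nothing further downstream has to guess what a half-valid event means.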