How to Squeeze Test Driven Development on Legacy Systems
We all love T.D.D. We know its benefits, we have read a thousand tutorials on how to build a system using this technique. But this not feasible for currently legacy systems.
What is TDD?
Test-driven development (TDD) is a software DEVELOPMENT process that relies on the repetition of a very short development cycle.
We turn requirements into very specific test cases.
We improve software so all tests pass.
This is opposed to incorporating functionality that has not been proven to comply with requirements.
Created in 2003 by Kent Beck (also the xUnit testing Framework testing system author).
1) Add a test (it must fail)
Solely based on behavior. Forgetting everything about accidental implementation.
2) Run all tests. The new test must fail. All the rest should pass.
If the all test passes restart the process or ...
4) (optionally) make a refactor (when code stinks).
NEVER DO BOTH 1 and 4 Together.
Testability and better class interfaces
Isolation on failures (less debugger or logging uses).
Design by contracts.
Bottom up building.
Normal Use cases and Exceptions (Alternate cases) separation.
Full branches coverage (we cannot add code without a test covering it).
Instant feedback / psychological rewards.
Small steps incremental approach.
Based on Wittgenstein learning ideas by incremental examples and Cognitive Behavioral Therapy.
Defer implementation issues and Premature optimization.
Test must be in full environmental control.
TDD can detect coupling problems.
Solving them leads to cleaner code focused on business logic alone and encapsulating implementation decisions.
We must deal with coupling problems using test doubles: mocks, stubs, fake objects, spy, proxies, dummy objects, etc.
Working on existing systems
According to the popular myth, we can't use TDD on existing systems. This is not true. Let's show an example.
The real world example
We have a ticketing system needing to showcase several artists performing live-streaming during COVID-19 pandemic.
Users can search for artists based on a type-ahead selector on a React application.
System performs database queries on a heavy concurrent back-end system.
We need to remove redundant SQL queries matching part of artists names.
Like SQL Operator is very expensive on relational systems, and we are not allowed to change back-end architecture.
We need to simplify redundant searches. That's all.
SELECT * FROM ARTISTS WHERE ((artist.fullname LIKE '%Arcade Fire%') OR (artist.fullname LIKE '%Radiohead.%') OR (artist.fullname LIKE '%Radiohead%') OR (artist.fullname LIKE '%Sigur Ros%') OR (artist.fullname LIKE '%Sigur%'))) …
System will execute just:
SELECT * FROM ARTISTS WHERE ((artist.fullname LIKE '%Arcade Fire%') OR (artist.fullname LIKE '%Radiohead%') OR (artist.fullname LIKE '%Sigur%'))) …
Since this part is redundant and expensive to the database.
OR (artist.fullname LIKE '%Radiohead.%') OR (artist.fullname LIKE '%Sigur Ros%')
Let's get to work
Always start with the simplest problem
1) Add a test (empty case)
- Class LikePatternSimplifier is not created yet.
- No function simplify() is defined.
- We number tests according to definition order.
- First test is the easiest one and also the Zero Case of Zombies methodology.
Test fails (as expected). Let’s create the class and the function.
3) Write the simplest possible solution to make the test pass.
- First solution is always hard-coded.
Continue with another trivial case
1) Add a test (simple expression)
3) And the simplest solution for both cases.
Works like a charm in both cases.
We are taking baby steps, slicing the problem and following divide and conquer principle.
Continue with another (not so simple) case:
1) Add a test (two independent expressions).
Code works correctly without changes. Is this a good test?
We will discuss it on a more advanced article.
Let's move on.
Continue with a desired and juicy business case.
1) Add a test (one expression containing the other).
Let’s make it work.
3) Add the simplest solution for all the already written cases.
This is an ugly algorithmic solution, but we will improve it with a refactoring once we become more confident.
We cannot fake it anymore. We need to make it.
Ugly, not performant, undeclarative and complex.
We don't care. We need to gain confidence and learn on the domain.
Luckily, we will soon have time for better solutions.
Continue with another case.
1) Add a test (left expression containing the right one)
… and we are Green, so we are covering the business rule stating that terms order is not relevant. (Commutative Property).
We make it explicit so no smart refactor can ever break it!
Move on with another (not so simple) case.
1) Add a test (Capitalization is not relevant to MYSQL engine but our users might not be aware of that)
2) We run the tests and the new one is broken. Let’s fix it!
3) The simplest solution for all the already written cases (with the new case).
… and tests are all green again with the ugly improved solution.
Code smells and we have several test cases. We need a better solution.
4) Let’s refactor the solution with a more efficient and readable one
Let’s test first production scenario requested by customers.
1) Add two unrelated redundant prefixes
And it works !
We inject it on our legacy code:
Let's inject it.
What really happened
Up to here, we worked in isolation scenario.
Software development is a group activity.
The quality assurance engineers found additional possible benefits.
Pattern could be in the middle of the string.
Customer agreed to add this functionality.
Lets consider those cases
New cases are broken since they were not represented by a previous one. We keep fixing them.
4) Let’s change the solution to cover all previous cases and the new ones.
The end is the beginning
TDD works in all stages.
Using CI/CD codefix went into production.
Once we submitted the intelligent SQL simplifier something bad happened.
This was actual SQL after terms of bad handling:
SELECT * FROM artists WHERE (())
This SQL generation mistaken as an empty condition.
So we will fix it TDD Way. We isolate the defect and add it as a broken TDD Case
Of Course, it fails since previous implementation brought an empty solution (and a customer complaint).
We can fix it by doing a duplicate's remover case-insensitive pre-processor at the beginning of simplify function:
To see if we must test private methods please visit shoulditestprivatemethods.com
Tests are green again.
Not dealing with case-sensitive duplicate's algorithm worked again.
Lets consider a different order.
Against our intuition we see it fails.
This is because the unit is bringing a ‘Yes’ instead of a ‘yes’.
The solution depends on the product owner. We can:
1) normalize all outputs.
2) change the test based on the property that our SQL Engine is case-insensitive on text fields.
We choose 2)
Tests are green again
We add more tests considering mixed cases
Tests worked with the new Solution and given expected SQL.
Case went to peer review.
One of the reviewers asked about not like comparison finding an improvement opportunity.
We asked our customer-on-site for agreement.
If user chooses not to see ‘head’ => it is choosing not to see ‘Radiohead‘ and Talking heads.
In SQL: NOT LIKE ‘%head%’ implies NOT LIKE ‘%Radiohead%’ which is redundant in an AND condition.
Our simplifier was already aware of that, so we injected in a second place being confident tests were already covering that scenario.
Quality Assurance engineering should add a broken integration test before correcting the implementation that should comply with it.
Implementing on a big system requires special techniques to gradually remove coupling.
TDD influenced all the written code. We have all the new code covered. (17 unit tests and 3 SQL Generation tests).
We gained confidence on every new case add ensuring they didn’t break previous ones.
We should only test private methods using method objects/FunctionAsObject or reflection.
We used TDD on Development, Code Review, QA fixing and Production Defects Life-cycle.
We developed a parallel customer system (the tests). They will always be our first omnipresent user.
Solutions were as simple as possible.
It’s very possible to make TDD on existing projects with lots of code.
TDD does not replace or overlap QA Process and tasks.
Multiple roles were involved and added value: Developers, QA engineers, Customers and Code Reviewers.
We faked it until we made it.
TDD does not guarantee a good design.
We should never change or optimize not covered code.
Legacy Code is all the code without tests.
Part of the objective of this series of articles is to generate spaces for debate and discussion on software design.
We look forward to comments and suggestions on this article.
IF you like this article please let me know, so I can write more on TDD on Legacy systems.