No abstraction is better than the wrong abstraction
The DRY principle is a widely accepted guideline for writing maintainable and readable code. As described by David Thomas and Andrew Hunt in The Pragmatic Programmer:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
This statement is often boiled down to “Don’t repeat yourself”. It is an essential heuristic for writing great code (no one wants to apply the same bug fix in 10+ different places in the codebase!), but there’s some important nuance that is often missed when the advice is applied dogmatically.
Coding principles like this are incredibly valuable, but they’re often written from the perspective of “what the finished product should look like”. As a consequence, the explanations and discussions around them miss crucial details of the messy journey to the finished product, and that’s what people most often need help with!
Unfortunately, “Don’t repeat yourself” is often interpreted as “create an abstraction, such as a new function or class for this particular snippet, before you have a chance to copy-paste code”. But there are many pitfalls to this approach:
- You haven’t taken the time to observe more than one or two use cases for this logic. How can you be more than 50% certain that the interface is what you’ll need in the future?
- The requirements and responsibilities of this abstraction will likely change as you get deeper into solving the problem you’re writing the code for.
- Premature abstraction can lead to a greater sin: non-orthogonal architecture and heavy coupling.
Designing code for future use is another pitfall, but that’s a topic for another post. In my observation so far, the wrong abstraction is often hastily modified as new requirements come in, and it can become baked into the codebase. Working around it or changing it is often a much harder task than pulling out a few duplicated parts later down the line.
Sandi Metz offers a great pragmatic solution for dealing with this:
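To make the “hastily modified” part concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `render_name` helper, its flags, and the shape of the `user` dict are made up for illustration); it only shows how a shared helper tends to absorb one flag per new requirement.

```python
# Hypothetical example: a "shared" name formatter that three callers depend on.
# It started as a one-liner; each flag below was bolted on for one new caller.
def render_name(user, *, for_email=False, for_csv=False, include_title=True):
    if for_csv:
        # Added for the CSV export, which wants "Last, First" and no title.
        return f"{user['last']}, {user['first']}"
    parts = []
    if include_title and user.get("title"):
        parts.append(user["title"])
    parts.append(user["first"])
    if not for_email:
        # Added for the email greeting, which is on first-name terms.
        parts.append(user["last"])
    return " ".join(parts)


user = {"title": "Dr.", "first": "Ada", "last": "Lovelace"}

profile_header = render_name(user)
email_greeting = render_name(user, for_email=True, include_title=False)
csv_cell = render_name(user, for_csv=True)
```

Notice that the flags already interact: `for_csv=True` silently ignores `include_title`. Every caller now has to know about options that exist only for the other callers, which is the kind of coupling the list above warns about.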
If you find yourself in this situation, resist being driven by sunk costs. When dealing with the wrong abstraction, the fastest way forward is back. Do the following:
1. Re-introduce duplication by inlining the abstracted code back into every caller.
2. Within each caller, use the parameters being passed to determine the subset of the inlined code that this specific caller executes.
3. Delete the bits that aren't needed for this particular caller.
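Here is a rough sketch of what those three steps can look like in Python. The `summarize` helper, its flags, and the two callers are hypothetical names invented for this example, not anything from Sandi Metz’s post.

```python
# Before: a hypothetical shared helper, steered by flags.
def summarize(orders, *, include_tax=False, round_result=False):
    total = sum(o["amount"] for o in orders)
    if include_tax:
        total += sum(o["amount"] * o["tax_rate"] for o in orders)
    if round_result:
        total = round(total, 2)
    return total


def invoice_total(orders):
    return summarize(orders, include_tax=True, round_result=True)


def dashboard_total(orders):
    return summarize(orders)


# After: the body is inlined into each caller (step 1), the flags that caller
# passed decide which branches apply (step 2), and the rest is deleted (step 3).
def invoice_total_inlined(orders):
    total = sum(o["amount"] for o in orders)
    total += sum(o["amount"] * o["tax_rate"] for o in orders)
    return round(total, 2)


def dashboard_total_inlined(orders):
    return sum(o["amount"] for o in orders)


orders = [
    {"amount": 10.0, "tax_rate": 0.2},
    {"amount": 5.0, "tax_rate": 0.1},
]
```

There is now some duplication (the `sum` over amounts appears twice), but each function says exactly what it does, and either one can change without touching the other.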
This advice makes me think of another proverb which I’ve found useful:
No matter how far down the wrong road you have gone, turn back now!
Re-implementing the duplication (or writing the duplication in the first place) lets a better abstraction coalesce: you have more context, and hopefully a better idea of the requirements and flow of logic in the system you’re trying to design.
Duplication is a natural part of the writing process. You will never write the perfect version on the first attempt (probably not even on the 10th attempt). But you can iterate on your messy code that works and improve it! So embrace it: get your thoughts on the page. Make it work, then make it maintainable (then make it fast!). Once a working version is implemented, refactor and rewrite parts where necessary.
Once you’ve had a chance to work through the problem a few times, the right abstractions are more likely to jump out at you as you review what you’ve written.
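As a small illustration, imagine three functions that were each written independently and each ended up with the same whitespace-and-case cleanup at the top. The names below (`normalize`, `tag_slug`, `search_key`, `dedupe_names`) are hypothetical; the sketch assumes the duplication was spotted only after all three existed.

```python
def normalize(text):
    """The part all three callers turned out to share, and nothing more."""
    # Collapse runs of whitespace, trim the ends, and lowercase.
    return " ".join(text.split()).lower()


def tag_slug(tag):
    # Tags become URL-friendly slugs.
    return normalize(tag).replace(" ", "-")


def search_key(query):
    # Search queries are matched on the normalized text as-is.
    return normalize(query)


def dedupe_names(names):
    # Keep the first spelling of each name, comparing on the normalized form.
    seen = set()
    unique = []
    for name in names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```

The extracted helper takes no flags and knows nothing about slugs, search, or de-duplication. It was sized by looking at what the three callers already had in common, not by guessing what a fourth caller might want.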