This is an oldie (I first heard it in the 80s) but it's one of my all-time favorites. While it can be told about any two classes of people, it really applies to a lot of code I encounter:
A physicist is showing a thermos to her friend, a programmer.
"It's amazing", she said. "You put a cold drink inside and regardless of how hot it is outside the drink stays cold".
The programmer is suitably impressed.
"But that's not all", she continued. "You can put a *hot* drink inside and no matter how cold it is outside the drink stays hot".
Now the programmer is perplexed.
Plaintively he asks, "But how does it know?"
I think of this whenever I read code that contains a gratuitous state variable that explains the type or content of some data structure rather than making the data structure self-explanatory. It's even more annoying when it's a class.
Having to coordinate two variables is a recipe for bugs down the road. Seems like it should be a beginner's mistake but I see it all the time in "non beginner" code.
An engineer, a physicist, a mathematician, and an AI researcher were asked to name the greatest invention of all time.
The engineer chose fire, which gave humanity power over matter. The physicist chose the wheel, which gave humanity the power over space. The mathematician chose the alphabet, which gave humanity power over symbols. The AI researcher chose the thermos bottle.
"Why a thermos bottle?" the others asked. "Because the thermos keeps hot liquids hot in winter and cold liquids cold in summer.", said the AI researcher. "Yes - so what?" "Think about it.", intoned the researcher reverently. "That little bottle - how does it know?"
Similar physics mystery: You're a pool of water in the bottom of a bucket, looking out at the stars. Somebody spins the bucket, making the stars spin. So you (the water) arrange your molecules in a parabolic shape, thicker at the sides of the bucket and thin in the middle.
How do you (the water) know? How do you know that the stars aren't holes in a paper sheet with light shining through? How do you know the universe isn't spinning, and you're standing still?
Love the joke, but to me it points to something other than gratuitous state variables. I think of it as "if all you have is a hammer, everything looks like a nail", with the hammer being "procedurally solving problems".
Could you share an example of this? I could be misinterpreting this but say you're building a survey system and you need to keep track of answer types (e.g. text, date, etc) and the answer itself, how can you collapse that into one field?
Your thermos does not have a switch which you must set to "hot" or "cold" before inserting a liquid, handles solids as well as liquids, and doesn't require you to even think about where the threshold might lie between the "hot" and "cold" settings. Instead it just does its best to prevent an energy exchange in either direction without having to even know the variables involved. And the "same subroutine" is basically used for different sized thermoses.
In code I see people not understand this all the time. Here's an example: let's say you present the user with their previous orders, and give them the option to filter those orders by some criterion.
The shitty way to do this is to have two variables:
So instead of just checking if a filter has been assigned, you have a separate boolean. What happens if the boolean is true but the filter is null? What would the vice versa case even mean?
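A minimal sketch of the single-variable version, in Python. The original snippet isn't shown here, so `Order`, `OrderFilter`, and `visible_orders` are made-up names for illustration only. The point is that one optional value can't get into the "flag says yes, filter says no" state:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Order:
    item: str
    total: float


# Hypothetical alias: a filter is just a predicate over orders.
OrderFilter = Callable[[Order], bool]


def visible_orders(orders: List[Order],
                   order_filter: Optional[OrderFilter] = None) -> List[Order]:
    # One variable carries both facts: whether a filter exists and what it is.
    if order_filter is None:
        return list(orders)
    return [o for o in orders if order_filter(o)]


orders = [Order("book", 12.0), Order("laptop", 900.0)]
everything = visible_orders(orders)
expensive = visible_orders(orders, lambda o: o.total > 100)
```

There is still a branch on `None`, but there is no second variable that can disagree with the first.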
The global codebase is riddled with dumb errors like this.
Even better, let's just have the default orderFilter be the equivalent of '*'.
If there's always a filter, then there's no longer a branch at the point where it's used, so that's one less path that needs to be tested and maintained.
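Roughly like this, as a Python sketch. `match_all` and `visible_orders` are names I'm making up, and orders are plain dicts just to keep it short; the default plays the role of the '*' orderFilter:

```python
from typing import Callable, Iterable, List


def match_all(order: dict) -> bool:
    # Assumed default: the equivalent of '*', accepts every order.
    return True


def visible_orders(orders: Iterable[dict],
                   order_filter: Callable[[dict], bool] = match_all) -> List[dict]:
    # No "is there a filter?" check: there always is one.
    return [o for o in orders if order_filter(o)]


orders = [{"item": "book", "total": 12.0}, {"item": "laptop", "total": 900.0}]
all_orders = visible_orders(orders)
cheap = visible_orders(orders, lambda o: o["total"] < 100)
```

The call site is the same whether or not the user picked a filter, which is the thermos not needing a hot/cold switch.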
I used to work with a programmer whom I found difficult because his code was an ever-increasing pile of "if" statements, one for each new case that came along. Conversely, he would say coding is relatively easy and that I was overcomplicating a problem by thinking about it; all I needed to do was "add an if statement here".
Yep! One of the things I point out most frequently in code reviews is stuff like this. However, null is an imperfect system for capturing such state. It's not self-documenting, and it breaks down when you have more than 2 states to represent.
Languages with Sum Types represent this much more elegantly with arbitrary numbers of variants and force you to check which one you have before accessing the more specific data (e.g. the filter) inside.
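Python doesn't have sum types in the ML/Rust sense, but you can approximate the idea with a `Union` of small dataclasses. This is only a sketch: the variant names (`NoFilter`, `ByItem`, `ByMinTotal`) are invented for illustration, and unlike a real sum type nothing forces the check at runtime; a static type checker is what narrows the type after each `isinstance`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NoFilter:
    pass


@dataclass(frozen=True)
class ByItem:
    item: str


@dataclass(frozen=True)
class ByMinTotal:
    minimum: float


# The "sum type": a filter is exactly one of these variants.
Filter = Union[NoFilter, ByItem, ByMinTotal]


def matches(f: Filter, item: str, total: float) -> bool:
    # Each variant's data is only touched after checking which variant we have.
    if isinstance(f, NoFilter):
        return True
    if isinstance(f, ByItem):
        return f.item == item
    if isinstance(f, ByMinTotal):
        return total >= f.minimum
    raise TypeError(f"unknown filter variant: {f!r}")
```

Adding a fourth state means adding a fourth variant, not inventing another magic value for null to mean.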
Our code is riddled with heavily overloaded meanings in a single value/domain, and if you have a variety of filters, you end up with this atrocity:
Foo *f1; // hey, just test for NULL
double f2; // hey, just use isnan()
int f3; // aaaaah, crap
bool f3_specified; // god why
std::string f4; // requires heavy drugs to solve existential catastrophes
And the variety of code that has to work with either of these examples.
What happens if the boolean is true but the filter is null?
An assertion fails miserably.
What would the vice versa case even mean?
That an assertion failed miserably.
What happens if the filter is one? Minus one? Equal to PC (IP), or BP? What if a NULL filter has to search for NULL values in a dataset? What if we are looking for NaN values in a corrupted one (this one is even trickier)?
I'm not sure they were suggesting using type-specific sentinel values (like null and nan), but to always use null or point to a value.
I like wrapping the information in a data structure exactly as you suggested, and if you never mutate the dereferenced pointer, it's equivalent to what they proposed. Big if, though.
The best is to do this generically with https://en.m.wikipedia.org/wiki/Option_type , since this pattern comes up regularly, not just with this one domain concept of "Filter"
100% agree - it's quite a common trap for inexperienced developers to write out every case and condition separately when they shouldn't. As a similar example, instead of writing "req.source = host + port", people would write:
After a few quarters of services, layers, and people being added and removed, it becomes a 500-line monstrosity and sits in every critical path. And because now it's a 500-line class (half of which is defunct, but good luck figuring out which half), nobody has time to read through it and figure out that it should have been a single assignment statement.
Having the two separated gives the ability to disable/re-enable the filter without losing it. That's often very useful. In other cases you're very right.
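For that case, a sketch of what I'd consider the legitimate version (Python, with made-up names `ToggleableFilter` and `apply`). The difference from the redundant-boolean pattern is that the predicate is always present, so the flag records an independent fact rather than duplicating "is there a filter?":

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToggleableFilter:
    predicate: Callable[[float], bool]  # always set, never None
    enabled: bool = True                # independent of the predicate

    def apply(self, totals: List[float]) -> List[float]:
        if not self.enabled:
            return list(totals)
        return [t for t in totals if self.predicate(t)]


big_orders = ToggleableFilter(lambda t: t > 100)
filtered = big_orders.apply([12.0, 900.0])
big_orders.enabled = False            # switch off, predicate is kept
unfiltered = big_orders.apply([12.0, 900.0])
big_orders.enabled = True             # switch back on, nothing to rebuild
```

Every combination of the two fields means something here, so there is no invalid state to assert against.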
In something like TypeScript you could represent this as a union type. If I'm expecting a location I could take a string ("Baghdad"), a lat/long pair, an enum, etc. With a union type I can specify that it could be any of these, since they all serve the same purpose. In Java I'd either do method overloading or expect to receive an object that implements a method that gives the location in a common format. The wrong way to do it would be to have a function that takes both a lat/long input and a string input, declares both nullable, and just expects to get only one of them.
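The same idea sketched in Python rather than TypeScript, using `typing.Union`. `LatLong`, `Location`, and `location_query` are hypothetical names, and the output format is just an assumption for the example:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LatLong:
    lat: float
    lon: float


# One parameter that is either a place name or a coordinate pair.
Location = Union[str, LatLong]


def location_query(location: Location) -> str:
    # Normalize whichever form we were given into one common format.
    if isinstance(location, LatLong):
        return f"{location.lat},{location.lon}"
    return location
```

Compare with a signature like `f(name=None, lat_long=None)`, where "both given" and "neither given" are representable and have to be handled somewhere.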
I think they are referring to this sort of thing (the reader will have to use their imagination and assume there is additional functionality in this class):
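class ReadingMaterial:
    def __init__(self, is_magazine=False):
        # State flag that describes what kind of object this is
        self.is_magazine = is_magazine

    def is_periodical(self):
        return self.is_magazine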
Using a base class with subclasses (e.g. a Magazine and a Book class, or something) and inheritance is much clearer than monkeying around with state.
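Something along these lines, as a rough sketch (the subclass names come from the suggestion above; everything else is assumed):

```python
class ReadingMaterial:
    def is_periodical(self) -> bool:
        # Subclasses answer this; there is no flag to keep in sync.
        raise NotImplementedError


class Book(ReadingMaterial):
    def is_periodical(self) -> bool:
        return False


class Magazine(ReadingMaterial):
    def is_periodical(self) -> bool:
        return True


shelf = [Book(), Magazine()]
periodicals = [m for m in shelf if m.is_periodical()]
```

The object's type is the state, so it can't be a Magazine with the flag accidentally left at False.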
In graphics programming the typical example of this pattern would be when someone does bespoke computations on coordinates with a bunch of if-elses instead of deriving the proper matrix equation that "just works" due to math.
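A tiny illustration of what I mean, in Python, using 2D rotation as an assumed example (the function name `rotate` is mine). One rotation matrix covers every angle, so there's no per-quadrant or "if angle == 90" special-casing:

```python
import math
from typing import Tuple


def rotate(point: Tuple[float, float], angle: float) -> Tuple[float, float]:
    # Standard 2D rotation matrix applied to (x, y); angle is in radians,
    # counter-clockwise.
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


quarter_turn = rotate((1.0, 0.0), math.pi / 2)   # approximately (0, 1)
half_turn = rotate((1.0, 0.0), math.pi)          # approximately (-1, 0)
```

Results are only exact up to floating-point rounding, which is still a better deal than maintaining the if-else ladder.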
Reminds me of the five-value boolean we used to have: is_deleted. Woe to the person who expected that to be 1 or 0. Five distinct and overloaded values were possible, though I now forget them. Probably something like pending, fully deleted, being restored, not deleted, and restored.
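If it had been modeled honestly it might have looked something like this Python sketch. The member names are just the guesses above, and `is_visible` is an invented helper whose rule is purely an assumption:

```python
from enum import Enum


class DeletionState(Enum):
    # Guessed states; the real five values are not known.
    NOT_DELETED = "not_deleted"
    PENDING = "pending"
    DELETED = "deleted"
    BEING_RESTORED = "being_restored"
    RESTORED = "restored"


def is_visible(state: DeletionState) -> bool:
    # Assumed rule for illustration: only these two states show the record.
    return state in (DeletionState.NOT_DELETED, DeletionState.RESTORED)
```

At least then nobody writes `if is_deleted:` and gets surprised by value number four.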