Introduction
Hail Mary local variable initialization is the cargo cult programming practice of pre-initializing a local variable with some default value, “just in case”, even though that value will be overwritten in all code paths before it is ever read. It is commonly done under the impression that it reduces the chances of error, but in reality it achieves the exact opposite: it increases the chances of error.
(Useful pre-reading: About these papers)
What it is
Again and again I see programmers writing code like the following:
    int a = 0;                  // line 1: declared and pre-initialized "just in case"
    if( someCondition )         // line 2
        a = 1;                  // line 3
    else                        // line 4
        a = -1;                 // line 5
    Console.WriteLine( a );     // line 6: the first and only read of 'a'
In this example, variable a is declared and initialized at line 1, but then it receives another value either at line 3 or line 5, before it is read at line 6.
A surprisingly large number of programmers are under the impression that a plain local variable declaration like int a; is somehow incomplete. They have trained themselves to see such declarations as missing something important, without which bad things might happen. As a result, they believe that when a local is declared it must always be pre-initialized with some value, even when a meaningful value is not yet available.
The belief is so popular that it enjoys alleged “best practice” status, even “common knowledge” status, despite being dead wrong.
Why people do it
The practice of indiscriminately pre-initializing all variables was not always wrong. It started back in the dark ages of the first Fortran and C compilers, when it was kind of necessary. Compilers back then had a combination of unfortunate characteristics:
- They required all local variables within a function to be declared up-front.
- They were not smart enough to detect an attempt to read an uninitialized variable.
Back in those days, accidental reading of uninitialized variables was a very common mistake, leading to many a monstrous bug. (See The Mother of All Bugs.) After having to troubleshoot and fix a few bugs of this kind, every new programmer would quickly learn to always pre-initialize every local variable without asking why.
The practice of blindly pre-initializing everything continued well into the 1990s, even though by that time compilers were fully capable of issuing warnings about accessing uninitialized variables. The practice continued because programmers refused to believe that they could be outsmarted by a compiler, so they either did not enable the associated warnings or deliberately disabled them.
After decades of blindly pre-initializing everything, the practice became a cargo cult habit, so programmers keep doing it today, even in modern languages like Java and C#, without really knowing why they are doing it, or asking themselves whether the practice has any downsides.
And as it turns out, there are plenty.
What is wrong with it
A number of things:
It violates the Principle of Least Astonishment
When I see that a variable is initialized to a certain value, I am tempted to assume, based on the type of the variable and the initial value, that it has a certain role to play in the algorithm which follows. For example, seeing an integer initialized with zero prepares me to see it being used as a counter, or as an accumulating sum; when that is what I expect, it is rather disappointing to look further down only to discover that none of that happens, and the variable is overwritten with something entirely different before it is ever used.
However, that’s just an annoyance.
It confuses syntax highlighting
When a variable receives a value only once, it is an effectively immutable variable. However, when a variable receives a value twice, then it is by definition mutable. If you have any self-esteem whatsoever, you are using a modern Integrated Development Environment (IDE) and you have configured it to syntax-color mutable variables differently from immutable ones. Hail-Mary initialization will cause many of your local variables to be colored as mutable, even though they were never meant to be mutable. This is a misleading signal, and coping with it causes cognitive overhead.
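As a small illustration, here is a minimal sketch in C# (Console.ReadLine is the only real API used; the coloring behavior is that of an IDE configured as described above):

    using System;

    static class HighlightingDemo
    {
        static void Main()
        {
            // Assigned twice: by definition mutable, so an IDE that colors
            // reassigned locals will flag it, even though the initial "" is
            // never read.
            string name = "";
            name = Console.ReadLine() ?? "";

            // Assigned exactly once: effectively immutable, and colored as such.
            string shout = name.ToUpperInvariant();
            Console.WriteLine( shout );
        }
    }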
However, that’s just an annoyance too.
It leads to misuse of the type system
Some data types do not have default values that you can pre-initialize a variable with, so the desire to always pre-initialize everything sometimes leads to misuse of the type system. For example, some languages (e.g. C#) support explicit nullability of reference types, which means that you cannot pre-initialize a non-nullable reference variable with null. If, in your desire to pre-initialize everything, you decide to turn a non-nullable reference into a nullable reference, then you have just committed an act of sabotage against yourself and anyone else who will ever look at that code, by making it considerably more complicated than it needed to be. The same applies to enums: people often add special “invalid” or “unknown” values to their enums for no good reason other than to accommodate their craving for Hail-Mary initialization. Such counterfeit values add needless complexity to everything.
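Here is a sketch of the enum case in C#, with illustrative names:

    // Without Hail-Mary initialization the enum needs no counterfeit member:
    enum Direction { North, South, East, West }

    static class Compass
    {
        static Direction Opposite( Direction direction )
        {
            Direction result; // not pre-initialized; there is no meaningful default anyway
            switch( direction )
            {
                case Direction.North: result = Direction.South; break;
                case Direction.South: result = Direction.North; break;
                case Direction.East:  result = Direction.West;  break;
                default:              result = Direction.East;  break;
            }
            return result;
        }
    }

    // The Hail-Mary alternative smuggles a bogus value into the type itself:
    //     enum Direction { Unknown, North, South, East, West }
    //     Direction result = Direction.Unknown;
    // and from then on every consumer of Direction has to wonder whether it
    // might be handed an "Unknown".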
It prevents the compiler from issuing useful warnings
Modern compilers of most mainstream programming languages do extensive data flow analysis and are fully capable of pointing out any situation where a variable is used without first having been initialized. Thus, accidental use of uninitialized variables is never a problem today.
- If you say “but I do not see any such warnings” then you are trying to write code without first having figured out how to enable all warnings that your compiler can issue. Do not do that. Stop whatever it is that you are doing, figure out how to enable all warnings, enable them, and only then continue coding.
- If you say “but my compiler does not support issuing such warnings” then you are using the wrong compiler. Stop using that compiler, and start using a different one.
- If you say “but there is no such compiler for the language I use” then throw away everything and start from scratch with a different language. I do not care what it takes; in the 3rd millennium you cannot be programming without flow analysis warnings.
Once you have warnings about uninitialized variables, the superfluous initialization of a variable becomes bad practice, because it circumvents the very checks that the compiler does for you, and opens up the possibility of error:
If you begin by pre-initializing a variable with a value which is by definition meaningless (since a meaningful value is not yet known at that point; otherwise you would have just used that meaningful value and been done with it), then as far as the compiler can tell, the variable has been initialized. The compiler does not know that the initial value is meaningless. Thus, if you forget, further down, to assign an actual meaningful value to that variable, the compiler will not be able to warn you. You have deliberately sent yourself back in time, to the dark ages of the first compilers, when warnings for uninitialized variables had not been invented yet. Essentially, you have circumvented the safety checks of the compiler and achieved the exact opposite of what you were trying to accomplish: instead of decreasing the chances of error, you have increased them.
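To make this concrete, here is a sketch in C#, with made-up sizes and prices; note that in C# the definite-assignment check is in fact an error, not just a warning:

    static class PriceList
    {
        // Hail-Mary version: the "medium" case was forgotten, but as far as the
        // compiler can tell 'price' is initialized, so it stays silent and
        // "medium" silently costs 0.
        static int Price( string size )
        {
            int price = 0;
            if( size == "small" )
                price = 5;
            else if( size == "large" )
                price = 9;
            return price;
        }

        // Without the pre-initialization, the same omission is caught at
        // compile time: error CS0165: Use of unassigned local variable 'price'.
        static int SaferPrice( string size )
        {
            int price;
            if( size == "small" )
                price = 5;
            else if( size == "large" )
                price = 9;
            return price;
        }
    }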
Fortunately, modern compilers are not only capable of issuing a warning if you attempt to use an uninitialized variable; they are also capable of issuing a warning when you unnecessarily initialize a variable. Unfortunately, programmers who keep making these mistakes tend to have both of those warnings disabled.
Conclusion
Make sure you have all warnings enabled, and never initialize any variable before you have a meaningful value to assign to it.
(This post has evolved from an original answer of mine on CodeReview.StackExchange.com.)
Old comments
- Neolisk 2012-01-04 20:34:48 UTC
Your point would be 200% right if only compilers always worked right. For example, when you have an if statement checking AAA Is Nothing, VS warns you that AAA might be Nothing during this call, so be careful… I guess I have to thank Microsoft for that, but the point stands. :)