
I don't get the buzz about null pointers. They are not a problem. They become a problem when you start checking for null where null is not acceptable (most of the places, null can't be a meaningful input), which is where the original intent starts to become unclear.

Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense). Even C crashes safely on null pointer dereference.



The problem is that the type system doesn't encode whether a pointer can be null or not, so unless your documentation is amazing (I doubt it) you'll eventually end up getting passed a pointer from someone else, or giving a pointer to someone else, with a different assumption about whether null is allowed. Boom segfault.

There are several solutions in C++.

1. Always check for null. Kind of annoying and lots of people don't for whatever reason.

2. Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

3. Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

So no great solution. Personally I would use a smart pointer type, document it as well as I can, and always check for null.


> so unless your documentation is amazing

As I pointed out, I usually don't have null pointers at all. There might be a few places where a value can very obviously be null, but they are few and far between. No need for documentation.

> Boom segfault.

Which is exactly the right thing to happen (it's the C version of Exceptions in dynamic languages) since the code was incorrect.

> Always check for null. Kind of annoying and lots of people don't for whatever reason.

And now what do you do if you detected null but it was not allowed? Throw an exception? You can have that for free by just not checking.

> Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

C++ references don't really protect you from NULL. It's just a different syntax for the same thing.

> Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

Yes, pain, big pain. So much line noise and typing (in both senses) around what is essentially an int.

Just don't check. I don't get why there are so many places for null pointers. (As I said elsewhere, I don't have a solution for JSON-style microscopic programming, and I don't care about that.)


How do C++ references not protect you from NULL?

From the standard:

> A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]


Better to try it than to read standards (and inevitably misread them, or read them differently than the compiler authors do)...

    #include <stdio.h>
    void test(int& x)
    {
        printf("Hello, world\n"); fflush(stdout);
        printf("and the number is: %d\n", x);
    }
    int main(void)
    {
        int *x = NULL;
        test(*x);
        return 0;
    }
It's just a syntactic discipline. Null references are undefined in C++, just as NULL dereferences are undefined in C.


> Always check for null. Kind of annoying and lots of people don't for whatever reason.

use assert or some custom always-assert-even-on-release macro.

> Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

well, only use references for non nullable pointers. Otherwise use plain (or smart) pointers.

> Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

std::optional<T&> is unfortunately not part of C++17, mostly because people couldn't agree on operator= semantics:

    int x = 1;
    int y = 2;
    std::optional<int&> oint = x;

    oint = y;
    // which one is true?
    assert(x == y);        // deep assign
    assert(&*oint == &y);  // rebind

boost::optional supports references and IIRC rebinds on plain assignment.


For parameters where you aren't transferring ownership, just use raw pointers/references. If it's a required parameter make it a reference. If it's optional, use a pointer and check for null.


The problem in C derivative languages is that it isn't enough.

There is no guarantee that the pointers are valid, even if not null.

So besides checking for null, to be correct we need to call OS APIs to check for pointer integrity as well.


That's on the client code. You can't stop callers from doing something stupid (or cosmic rays from flipping bits). At that point crashing fast is preferable.


I disagree, the premise of design-by-contract and clean code methodologies is that functions must ensure the integrity of the data handled by them.


That's theoretic talk. Try doing it in practice and still get things done / be able to maintain the code / see the forest for the trees.

But I have a feeling that we are lacking a bit of context here. Some people seem to focus on web-application style of programming (understandably) where you have lots of trust issues. Whenever data is carried across trust boundaries it needs to be checked (this applies to integrity in general, of which null safety is just a small part).

(On the other hand, deserialization is not about validation of function arguments. Deserializers should assert integrity on the spot before calling into deeper nested functions).


I did it in practice, during the MFC's glory days, several years ago.

Making use of ASSERT_VALID(), VERIFY(), AfxCheckMemory(), AfxIsValidAddress(), AfxIsMemoryBlock() and many other helper functions.

A style enforced at the company's code reviews, which helped our code quality a lot.


> They become a problem when you start checking for null where null is not acceptable (most of the places null can't be a meaningful input), which is where is where the original intent starts to become unclear.

The trouble is it's not always clear whether null is a meaningful value or not. The type of an expression should tell you what values it might take - that's the whole point of a type. Having the same type for pointers that could be null and pointers that will never be null is like having the same type for integers and floating-point numbers.

> Even C crashes safely on null pointer dereference.

False. A null pointer dereference in C is undefined behaviour and can therefore easily be a security flaw (there's been at least one local root vulnerability in Linux due to this).


In database speak, if your function takes nullable values it is "denormalized". If you design your data structures and flows right, you really don't have many of these.

> False. A null pointer dereference in C is undefined behaviour and can therefore easily be a security flaw (there's been at least one local root vulnerability in Linux due to this).

Yes, in theory... and yes, I heard there are also some stupid compiler optimizations around this. Still, the point is, what do you get from checking for a condition that shouldn't happen? At most an assert is justified!

You don't check for the infinitely many other conditions that shouldn't happen, either.


> If you design your data structures and flows right, you really don't have many of these.

So it's really important to be able to tell which ones they are!

> what do you get from checking for a condition that shouldn't happen? At most an assert is justified!

An assert is a check, still takes up a lot of reading time if you have to do it for every parameter on every function.

> You don't check for the infinitely many other conditions that shouldn't happen, either.

I check for all the things that the code allows to happen, because sooner or later they will happen. You have to do this if your code ever accepts user input; any input that is possible will, sooner or later, find its way in. So if something shouldn't happen the best course is to make it impossible in the code.


> I check for all the things that the code allows to happen, because sooner or later they will happen.

So if you know it will happen, why don't you fix the bug instead of "recovering"?

> So if something shouldn't happen the best course is to make it impossible in the code.

Exactly what I was suggesting. But adding a check is not "making it impossible". You need to go to the call site and fix the invalid call.


Code changes all the time; you can fix the calls you know about, right now, but you're relying on everyone who calls your function to infer some extra context about it and get it right every time. The defect will keep reoccurring. Whereas if you make it part of the function's type that it can't be called that way, then the compiler enforces that it will be used correctly, automatically, and you never have to fix the same problem again.


That's programming. Defects come (and often stay) as you type.

Yes, it would help a tiny little bit if the compiler would check for nullable pointers. But this comes at a cost of annotation work / maintenance work / worse modularity, and it only helps a bit. There are so many more invariants (is this integer in the range 3-42 and odd or divisible by 12, is that integer a valid index into this other dynamic array...) that you can't practically check with a static system, and they are much much worse because many of them manifest themselves in much subtler ways.

All these formalisms work in some toy examples, but adhering to them makes a big mess as soon as it gets a little more complicated. Most basic example: const. I use const almost only for pointers to static const strings, because otherwise it always ends up in a big mess somehow (because const in one place is non-const in another place, guaranteed). And let's not talk about the const-mess in C++, where so many functions are duplicated just for const, without any practical benefit.

I probably wouldn't mind a very light-weight opt-in guarantee-not-null syntax. But it doesn't matter for me, at all.


> Yes, it would help a tiny little bit if the compiler would check for nullable pointers. But this comes at a cost of annotation work / maintenance work / worse modularity, and it only helps a bit. There are so many more invariants (is this integer in the range 3-42 and odd or divisible by 12, is that integer a valid index into this other dynamic array...) that you can't practically check with a static system, and they are much much worse because many of them manifest themselves in much subtler ways.

Not my experience. It's no more work since you know when you're writing whether it's nullable, it's better for maintenance since you can immediately see which variables are nullable. As for other invariants: look at how often you see a real app you're using fail, or look at the bugs that manifest in production; they're rarely the subtle ones, most of them are the simple stupid cases.

> All these formalisms work in some toy examples, but adhering to them makes a big mess as soon as it gets a little more complicated. Most basic example, const.

No, const is a poor example; it's a wrong abstraction and it's difficult to use as a result.


>There are so many more invariants [...] that you can't practically check with a static system

You can't statically check the invariants directly, but you can make types that statically enforce that the invariants are dynamically checked.

That way you can be sure your assumptions hold when the data gets to your code, and any invalid data will be recognised as such, rather than manifesting as weird behaviour.


So that's managed languages? They exist and have their justifications, but come at a performance hit, and much like static types can practically only check a subset of invariants. Just because there are so many of them.

The way I think about it is simply that I write a program with code and the compiler statically enforces that what I specified there holds at runtime. Code is a pretty nice way to express invariants :-)


> Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense).

That's just asking for a disaster. Null pointer dereferences are undefined behavior in C, and compilers can and do make these assumptions, leading to all kinds of horrors. (https://software.intel.com/en-us/blogs/2015/04/20/null-point...)


> most of the places null can't be a meaningful input

"Maybe something" is not rare. In fact it's often just as common as "definitely something" in my experience. A customer maybe has a shipping address. You could bend over backwards and say that either he has an address or else the address reference must point to a special no-address value instead of being null.

But in that scenario you still need to check whether it's the no-address value or an actual address. Otherwise you might end up doing things like trying to set the street of the no-address value, which could either throw an exception or (worse) suddenly everyone without an address lives at that address.

So the only real solution is to use proper types to model "maybe something". You can use a list of addresses for the customer. That naturally supports 0..N. Or you could use a proper option type (which is just a collection with max count 1).


Data design problem. C / pointer style is not good at this JSON-structs style of programming. I agree that algebraic (like Haskell's) datatypes could help ruling out some invalid objects here.

On the other hand, you can reap huge benefits from data-oriented programming (basically, make a separate email table that has a "foreign key" to the customer table) to do processing in the large. This is where C is a very good fit and you have no null pointers :-)


> Let it go. Let it crash

It must be nice to work in an environment where nobody cares about results.


> It must be nice to work in an environment where nobody cares about results.

Not crashing when there's a bug is not "success", it's just "zero visible errors"!

If your workplace really cares about results, they'll understand how futile it is to keep executing even though there's a bug. You start second-guessing your own program, which makes the code quality worse as the assumptions it makes are no longer clear; you won't be alerted to problems, as the program won't be aware of any; there'll be no guarantee that your code isn't just working "by accident" at the moment.

I'd love it if the code I deal with just crashed instead of pretending everything's alright, because it makes errors far harder to diagnose.


No one is talking about keeping execution instead of crashing. C++ smart pointers do not do that, optional types do not do that, no programming language whose creators are in their right mind has ever pursued that. You have just produced a classic example of a straw man argument.


>> Grandparent: "I think you should just let your code crash."

> Parent: "I think that is a bad idea."

It seems pretty clear that we're talking about crashing here.


There is a third option to crashing vs. not crashing: not letting the code compile if there is any chance it might crash. That is what this "buzz about null pointers" is about.


Oh, right. Yeah, that would be nice, but my company doesn't use languages with that feature, and we can't re-write our software, so we're stuck with Option types.


What magic compiler can certify my program will never abruptly stop, or worse do something unintended, because of hardware failure?


I detect a moving goalpost here, but if you're genuinely concerned that hardware might change the value of your bits then nothing else but lockstep execution will help. e.g. TI Hercules: http://www.ti.com/lsds/ti/microcontrollers-16-bit-32-bit/c20...

(Checking for NULL in your program won't save you from hardware failure either)


I hope I'm just being pedantic rather than moving goalposts, I'm all for better assurances. To me the only compiler "not letting the code compile if there is any chance it might crash" is either magical or never compiles a thing.


"not letting the code compile if there is any chance it might crash"; well, normally we assume correct operation for the hardware here in order to make that achievable, otherwise things could simply crash at any time. You're always vulnerable to single-event-upsets from cosmic rays or even alpha decay inside the chip packaging. Being pedantic about this in a discussion of compiler correctness is useless derailing.

"Rowhammer" over in https://news.ycombinator.com/item?id=15515044 is perhaps the most likely problem of hardware non-correctness to worry about.


When talking correctness (or security) I would think pedantry would be welcome. You'd have a point about derailment if the comment I replied to had given a specific example that refuses to compile anything with a chance of crashing, e.g. through guarantees of its type system, and then I came along asking about cosmic rays. But they didn't; they were vague, and I'm still left wondering about an example, because even Haskell programs crash with a segfault or bus error sometimes. It's perfectly fine that a compiler can't cover those cases, but it still signed off on a program with a chance of crashing, no cosmic rays involved; at best it assures a lower likelihood of crashing than another compiler.


> Even C crashes safely on null pointer dereference.

Does it? What if you needed to clean up external resources? What if some data was left in an inconsistent state? What if you were in the middle of writing to a file? Etc.

I do agree that crashing is often the right thing to do and of course the machine can die anyway, but sometimes you want to at least attempt to recover.

Also, in much of the software I've written in recent years, crashing was not acceptable: just because one thing (e.g. a request, message, or task) failed unexpectedly does not mean that others will too, and we still want the others to be attempted. Sometimes you can get away with letting the program crash and then restarting it to run the other work, but often things are done concurrently and restarting would lose any in-progress state. If the work takes a long time, restarting all work because one piece failed is often unacceptable.


Yes, considering communication heavy architectures, it's unfortunate that with C the unit of encapsulation is the process. And there's bad support for inter-process communication, etc.

Maybe have a look at Erlang? I suppose it's more aligned with what you want to do. There you also "crash", but only managed, green, processes with very convenient communication channels.


Yes, indeed.

Sadly I’ve never had the opportunity to use Erlang for work; I’d certainly like to give it a try. I suppose I should try it (or Elixir) on a side project sometime, since it’s always a bad idea to go in blind.


> Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense). Even C crashes safely on null pointer dereference.

If the amount of work done before the crash has some value and is lost by the crash, then crashing just isn't an option.


What everybody is missing here is that null pointers where none was expected are just an instance of wrong code. How do you protect yourself from wrong code? (Hint: not by adding more code!)


Perhaps you're just missing the complexity of the real world?

I'm working on quite big finite element desktop applications, where just reading the data can take quite a bit of time. The user can then apply all kinds of different operations on this data, and if one of these operations fails because of a null pointer - then sure, the code is wrong - but the user nevertheless doesn't want to lose all of their previous work and wants to be able to save the changed data.

Sure, there are cases where a failed operation might have corrupted the data, but you can't tell that in every case, and often the data is still valid and only the operation couldn't be applied.

If I've learned something over the years, then that there's not one solution that works for all cases.


So you weren't expecting bad user input? Bad for you!

This is called validation, and in a validation routine you expect invalid data. Check inputs at the trust boundaries, and off you go.

Note this is NOT about just null pointers but about integrity in general.


Well, if you want a serious discussion then don't imply something just to be able to make a point.

But if you just want to win an argument: here you go.


Then you should run the risky operations in a different process. It will not only protect you from the operation crashing your main program, but also from data corruption if you use readonly access for the shared data.


>Even C crashes safely on null pointer dereference.

Uh, that's a rather optimistic way to look at this. In C, NULL pointer dereference is UB, so you shouldn't rely on this behaviour. For one thing, the compiler is allowed to do weird things if it can statically determine that a NULL dereference takes place, for instance by dropping any code that can be proved to lead to one. Not so safe or friendly.

Most of the time, for userland apps, dereferencing a NULL pointer for reading or writing will cause a segfault, but that might not be the case for bare metal/MMU-less applications. Furthermore, even when it does crash it simply terminates the process without any unwinding, which might leave persistent data in an inconsistent state.

The real problem with pointers is that they're effectively an algebraic data type, they can point to something or be NULL. But the type system can't enforce that, there's no concept of non-nullable pointers in C.

Take the prototype of nanosleep(2) for instance:

    int nanosleep(const struct timespec *req, struct timespec *rem);
Turns out that the 2nd parameter is NULLable if you don't care about the remaining time. The first one isn't, however, so passing NULL there is a big no-no.

In rust the equivalent signature would be something like:

    fn nanosleep(req: TimeSpec, rem: Option<&mut TimeSpec>) -> Result<(), i32>;
Which is a lot more explicit (mind you, that's probably not what the API of nanosleep would look like in rust since output parameters are not very common, you'd probably return rem in the Result).

The problem is the same on the library side. When you're passed a pointer, should you assume it's nullable? Should you check for it? If you want to be safe you do, and you end up with a bunch of redundant checks.

It's also easy to end up with a NULL-ish pointer that's not NULL if you happen to offset a NULL pointer by mistake (like a NULL array or NULL struct pointer). That can be pretty tricky to track down since the crash doesn't occur when the offset is calculated but rather when the bogus quasi-NULL pointer is dereferenced:

    #include <stdio.h>
    
    struct s {
      int a;
      int b;
    };
    
    void bar(int *i) {
      if (i == NULL) {
        printf("NULL\n");
      } else {
        printf("%d\n", *i);  /* i is invalid but not NULL, this (probably) crashes */
      }
    }
    
    void foo(struct s *ps) {
      int *b = &ps->b;      /* ps assumed not-NULL here */
    
      bar(b);
    }
    
    int main(void) {
      foo(NULL);            /* This is where the actual error takes place,
                               foo's argument is not NULLable */
      return 0;
    }
Also consider what happens if we're using a NULL pointer to a very large buffer. Dereferencing the beginning of the buffer would probably crash but higher addresses might end up in mapped pages and you start accessing and possibly overwriting random parts of the runtime.


There are so many ways things can break if used incorrectly. I don't disagree with most of the things you listed. But what's the conclusion?

> The real problem with pointers is that they're effectively an algebraic data type

No, they are not. Join my religion and preach: pointers are just data, pointers are just data. And be free.

C is about machine representation, and it's typed just enough to map to hardware (specify ABI of functions). Don't be deluded to think you can represent all (or even many) invariants in types. Some other languages try to do that (and it always ends up in a big mess).

If you want more "safety" - which (for the things you still need to implement) just means crash in a more friendly way, not less - then use a managed language. But they come with their own pros and cons, of course.

To address the issue with nanosleep, this is not what we were talking about. nanosleep does expect a null pointer in the second argument. (And I think in this rare case an assert(req) would be justified. Or maybe just split it into 2 functions).


But they are not just data because the language handles NULL differently from other values. The compiler is allowed to treat NULL pointers differently from other pointer values. NULL is special in C, it's not just a pointer to address 0.

>C is about machine representation, and it's typed just enough to map to hardware (specify ABI of functions).

I don't understand that at all. How do enums map to hardware? How do structs? Clearly the line is arbitrary. Would adding generics mean that C maps less to hardware? What would that even mean?

And if C is all about talking directly to the hardware, how come the language has no first class support for SIMD, for preload, for the various CPU flags? And to get back to our point, how does NULL map to hardware? A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

Join my religion and preach: C is not a macro assembler, C is not a macro assembler.

>Don't be deluded to think you can represent all (or even many) invariants in types. Some other languages try to do that (and it always ends up in a big mess).

[citation needed]

>To address the issue with nanosleep, this is not what we were talking about. nanosleep does expect a null pointer in the second argument. (And I think in this rare case an assert(req) would be justified. Or maybe just split it into 2 functions).

My point is that in this case the rem pointer is an algebraic data type. Passing NULL here doesn't mean "please store rem at (void*)0", it means "I don't want rem". Except this is not expressed in C's type system, so you don't know that unless you read nanosleep's implementation or its documentation. The rem pointer, unlike the req pointer, is nullable, therefore it's effectively an option type, but as far as C is concerned they're the same type.

If you assert() it then you can catch the problem early on at a runtime cost (and only when the situation eventually arises). If you make this part of the type system the compiler can validate that statically at compile time.


> How do enums map to hardware? How do structs?

As integers? Sequentially in memory? And that's about as much as I care.

> Would adding generics mean that C maps less to hardware? What would that even mean?

It means that you have control. That's what you need for good and (equally important) consistent performance and low memory footprint. You have pretty good basic low-level abstractions (integers, pointers, structs, functions) that are very convenient to work with (much more convenient than your typical assembler language), and at the same time as low-level as you care to go most of the time. And even then, toolchains have good support to go even lower when you need to.

> [citation needed]

My personal experience with simple languages like C, Python, and complex typed ones like C++, Haskell. (There are also boilerplate ones like Java (not interested, thanks), or ones that swallow errors too easily to be productive, like sh, Javascript).

Also, look at other projects. Look how successful game programmers use C++ (In my filter bubble I watched people like Jonathan Blow, Mike Acton, Casey Muratori on Youtube, and have been following Sean Barrett or looked at imgui or quake3). What they do is pretty much C. I have also been following the Haskell community quite a bit and watched C++ experiments like boost, and they simply have a different focus. They don't get things done (in general), their builds are time consuming and brittle, their code is hard to understand, etc...

> And if C is all about talking directly to the hardware, how come the language has no first class support for SIMD, for preload, for the various CPU flags?

It's not assembler... It's "portable assembler", as some people like to say. And if you need support for SIMD etc, just take second class support or drop to inline assembler. No point in arguing here.

> And to get back to our point, how does NULL map to hardware? A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

There are always these language lawyer type unfortunate exceptions and complexities that can be explained by a little bit of history. Personally I'm mostly on x86 and I sometimes assume it's binary 0, but actually I don't really care that much.

> A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

Making my point. It's just data. I think NULL is actually just ((void *)0), but it's the representation of 0 that is special on some architectures. That the representation might not be all binary 0s on some architectures is probably pretty complicated to explain. As far as I'm concerned it's a language lawyer thing and I don't care.

C might technically not be simple, but your use of it can (and maybe should) be simple.

> My point is that in this case the rem pointer is an algebraic data type.

What is it? If you insist on looking at it like that, why don't you use Haskell? Though you will have to take a performance hit and might have a harder time developing new features, because modularity is very bad due to complex typed interfaces.

By the way, I know how the nanosleep interface works. I don't have any problem understanding and remembering that and using it correctly (and it's not like I have used it more than 3 times in my life). I don't think I would make an error, but I wouldn't care if the null option would go away completely, either.

It's much harder to correctly use struct timeval / timespec than calling nanosleep. The null thing is totally a theoretic non-issue.


Not in many important environments, e.g. those without virtual memory, where the zero page can't be mapped invalid.

Talk to me about how good an idea it is not to check potentially-NULL pointers in embedded systems.


Sure I know about that... I don't do embedded, but the point is, if your program is wrong, it goes wrong. So? Fix it!


An embedded program dereferencing NULL may cause some serious damage, even physical. Code has bugs: avoiding the most serious failure modes is important, pragmatically speaking.



