There is no unified theory of SQL NULL

As I dig deeper into SQL and uncover its warts, one that I keep revisiting is this awful concept of SQL NULL. Way back when, I previously covered this in a way that was, at best simplistic, and at worst, wrong. Now that I’ve had more time to uncover all these little minefields, the best way I can describe null is that any attempt to summarize it will be either wrong or incomplete (e.g., any complete and accurate summary of null would be equivalent to its definition in the spec). But let me try again anyway, if only to illustrate why I think so. Here’s why null eludes description:

Null is a value

Null is a value (you can use it anywhere that requires a value), but not really (values have types). The SQL standard makes references to a “null value” multiple times. Sometimes this means null stands for “unknown value.” This holds when you use it in ternary (three-valued) logic – for example, NULL=NULL is NULL, NULL=1 is NULL, and NULL < 1 is NULL. These all make sense because the results of those expressions aren’t knowable.

Null isn’t a value

Sometimes null means “the condition of having no value.” You might get this when you SUM over nothing. This is surprising, since COUNT(NULL) is 0, but SUM(NULL) is NULL. In other cases (OUTER JOIN, I’m looking at you) – it means, vaguely, “nothing” or “undefined”, because there is no data there. This is not the same as an “unknown value” — if the right hand side of an outer join is empty, those null placeholders aren’t some value that is unknown, they’re undefined.

When stored in the contents of a row, null roughly indicates the state of not having a value — it doesn’t have the properties of other values: type, reflexivity, and so on. Many people advise against giving it an application-defined meaning of “unknown.” Personally, I think this is very useful, and have never run into problems with it when used consistently.

Some (very well informed) people will argue that SQL NULL is a value, full stop. But I think accepting this means that “value” has essentially no meaning as we usually understand. Allowing NULL to stand in for the condition of having no value or an undefined state is more accommodating.

Null isn’t reflexive

Null doesn’t equal anything, not even itself. A clause like WHERE column1 = NULL will always return zero rows. More confusingly, a clause like WHERE column1 != NULL will also always return zero rows. This means that there are a lot cases that might seem to be tautologies, but in fact aren’t: statements like column1 = 'a' OR column1 != 'a' won’t return rows where column1 is NULL. Also, null is not equivalent to itself when it appears as the value of a column for the purpose of unique indexing.

Null is reflexive

For grouping, sorting, and some other set operations, null is considered “not distinct” (which is different from “not being equal to itself”). In other words, null doesn’t equal itself, but is treated like it is for sorting and grouping purposes. So here we are, NULL != NULL even though they are all grouped together in GROUP BY statements and nobody bats an eye ¯\_(ツ)_/¯.

Null is falsey

In almost all logical expressions, null is falsey…

Null isn’t falsey

…But there are random exceptions to this rule – for example, in PostgreSQL, an expression that evaluates to NULL satisfies a check constraint, which has the effect and behavior of an expression that evaluates to True. Takeaway: NULL is falsey, until it isn’t.

Crap, where do we go from here?

The best way to summarize SQL NULL is that it is, essentially, a bag of individual special cases that got plugged in anywhere a special case was needed. There’s no grand unified theory. It isn’t even internally consistent in each domain it appears. You cannot extrapolate behavior from a consistent set of principles. It’s just special cases all the way down.

Read the docs.


Leave a Reply

Your email address will not be published.