Unravelling `not` in Python
For this next blog post in my series on Python's syntactic sugar, I'm tackling what would seem to be a very simple bit of syntax, but which actually requires diving into multiple layers to fully implement: `not`.
On the surface, the definition of `not` is very straightforward:
> The operator `not` yields `True` if its argument is false, `False` otherwise.
That seems simple enough, right? But when you begin to dive into what is "true" or "false" – sometimes called "truthy" and "falsey", respectively – you quickly discover that there's a decent amount that goes into that definition.
(As with the other posts in this series, the C code is for those who want to follow all the breadcrumbs, but you can feel free to skip it if you want.)
The implementation of `not`
Looking at the bytecode, you notice there's a single opcode dedicated to `not` called `UNARY_NOT`.
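One way to check this yourself is the `dis` module. The full instruction listing differs between CPython versions, so this small sketch only looks for `UNARY_NOT` among the opcode names rather than assuming an exact listing:

```python
import dis

# Compile the expression in "eval" mode and collect the opcode names.
code = compile("not a", "<example>", "eval")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

print(opnames)
print("UNARY_NOT" in opnames)
```

Calling `dis.dis("not a")` prints the same information in the usual human-readable form.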
The implementation of `UNARY_NOT` essentially calls a C function called `PyObject_IsTrue()` and returns the inverse of the return value: `True` for `False`, `False` for `True`.
Defining what is true
The trickiness with unravelling `not` starts with defining what is true. Looking at the C implementation of `PyObject_IsTrue()`, you see there are a few possible ways to figure out the truth of an object.
When you look at the C implementation, the rule seems to be:
1. If `True`, then `True`
2. If `False`, then `False`
3. If `None`, then `False`
4. Whatever `__bool__` returns, as long as it's an instance of `bool` (that's what calling `nb_bool` represents)
5. Calling `len()` on the object (that's what calling `mp_length` and `sq_length` represent):
   - Greater than `0`, then `True`
   - Otherwise `False`
6. If none of the above applies, then `True`
Rules 1 through 3 and 6 are straightforward; rules 4 and 5 require going deeper into detail.
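To make the rules concrete, here is a small sketch that runs `not` over one object per case. The class names (`AlwaysFalse`, `Sized`, `Plain`) are made up for illustration:

```python
class AlwaysFalse:
    """Rule 4: __bool__() decides."""

    def __bool__(self):
        return False


class Sized:
    """Rule 5: no __bool__(), so __len__() decides."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size


class Plain:
    """Rule 6: neither method is defined, so the object is true."""


examples = [True, False, None, AlwaysFalse(), Sized(0), Sized(2), Plain()]
results = [not obj for obj in examples]
print(results)  # [False, True, True, True, True, False, False]
```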
`__bool__`
The definition of the special/magic method `__bool__` basically says that the method is used "to implement truth value testing" and should return `True` or `False`. Pretty simple.
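The "should return `True` or `False`" part is enforced: in CPython, a `__bool__` that returns something other than a `bool` causes a `TypeError` rather than having its result coerced. A quick sketch, with `ReturnsInt` as a hypothetical class:

```python
class ReturnsInt:
    def __bool__(self):
        return 1  # Truthy, but not a bool.


try:
    result = not ReturnsInt()
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
else:
    error = None
    print(result)
```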
`len()`
The built-in `len()` function returns an integer representing how many items are in a container. The implementation of calculating an object's length is represented by the `sq_length` slot (length of sequences) and the `mp_length` slot (length of dicts/maps).
You might think it would be a simple thing to ask an object to tell you its length, but it turns out there are two layers to this.
`__len__`
The first layer is the special/magic method `__len__`. As you might expect, it "should return the length of the object, an integer >= 0". But the wrinkle here is that "integer" doesn't mean `int`, but actually an object that you can "losslessly convert ... to an integer object". So how do you do that sort of conversion?
`__index__`
"To losslessly convert the numeric object to an integer object", you use the `__index__` special/magic method. Specifically, the `PyNumber_Index()` function is used to handle the conversion. The function is a little too long to bother pasting in here, but what it does is:
1. If the argument is an instance of `int`, return it
2. Otherwise, call `__index__` on the object
3. If `__index__` returns an exact instance of `int`, return it (technically returning a subclass is only deprecated, but let's leave the old ways behind us 😉)
4. Otherwise raise `TypeError`
At the Python level this is exposed via `operator.index()`. Unfortunately it doesn't implement `PyNumber_Index()` semantics, so it's actually inaccurate from the perspective of `not` and `len()`. If it were to implement those semantics, it would look like:
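A minimal sketch of the steps listed above. The name `index_like` is hypothetical, and a plain attribute lookup on the type stands in for the C-level slot lookup, which is a simplification:

```python
def index_like(obj, /):
    """Losslessly convert obj to an int, following the PyNumber_Index() steps."""
    # An int (or a subclass of int) is returned as-is.
    if isinstance(obj, int):
        return obj

    type_ = type(obj)
    try:
        # Special methods are looked up on the type, not the instance.
        __index__ = type_.__index__
    except AttributeError:
        raise TypeError(
            f"{type_.__name__!r} object cannot be interpreted as an integer"
        ) from None

    result = __index__(obj)
    # Only an exact int is accepted; subclasses are treated as an error here.
    if type(result) is int:
        return result
    raise TypeError(
        f"__index__ returned non-int (type {type(result).__name__})"
    )


class Three:
    def __index__(self):
        return 3


print(index_like(5))        # 5
print(index_like(Three()))  # 3
```

Passing a `float` such as `3.0` raises `TypeError`, since `float` doesn't define `__index__`.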
`len()` implementation
One interesting thing about the implementation of `len()` is that it always returns an exact `int`. So while `__index__()` or `__len__()` could return a subclass, the way it's implemented at the C level using `PyLong_FromSsize_t()` guarantees that a direct `int` instance will always be returned.
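This is easy to observe from Python. In the sketch below, `MyInt`, `Three`, and the two container classes are made up for illustration; one `__len__` returns an `int` subclass and the other returns an object that only defines `__index__`:

```python
class MyInt(int):
    """A hypothetical int subclass."""


class Three:
    """Not an int, but convertible to one via __index__()."""

    def __index__(self):
        return 3


class SubclassLength:
    def __len__(self):
        return MyInt(2)


class IndexLength:
    def __len__(self):
        return Three()


subclass_result = len(SubclassLength())
index_result = len(IndexLength())

print(subclass_result, type(subclass_result) is int)  # 2 True
print(index_result, type(index_result) is int)        # 3 True
```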
Otherwise `len()` does some basic sanity checks about what `__len__()` and `__index__()` return, such as being a subclass of `int`, being greater than or equal to `0`, etc. As such, you can implement `len()` as:
import builtins

# `Object`, `_mro_getattr()`, and `_index()` are assumed to be defined
# elsewhere (e.g. in the desugar project).
def len(obj: Object, /) -> int:
    """Return the number of items in a container."""
    # https://github.com/python/cpython/blob/v3.8.3/Python/bltinmodule.c#L1536-L1557
    # https://github.com/python/cpython/blob/v3.8.3/Objects/abstract.c#L45-L63
    # https://github.com/python/cpython/blob/v3.8.3/Objects/typeobject.c#L6184-L6209
    type_ = builtins.type(obj)
    try:
        __len__ = _mro_getattr(type_, "__len__")
    except AttributeError:
        raise TypeError(f"type {type_!r} does not have a __len__() method")
    length = __len__(obj)
    # Due to len() using PyObject_Size() (which returns Py_ssize_t),
    # the returned value is always a direct instance of int via
    # PyLong_FromSsize_t().
    index = int(_index(length))
    if index < 0:
        raise ValueError("__len__() should return >= 0")
    else:
        return index
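The error branches can be checked against the built-in `len()`. This sketch uses hypothetical classes and a small helper, `try_len`, that reports the exception type instead of propagating it:

```python
class Negative:
    def __len__(self):
        return -1


class FloatLength:
    def __len__(self):
        return 3.0  # float has no __index__().


class NoLength:
    pass


def try_len(obj):
    """Return len(obj), or the name of the exception it raised."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(try_len([1, 2, 3]))      # 3
print(try_len(Negative()))     # ValueError
print(try_len(FloatLength()))  # TypeError
print(try_len(NoLength()))     # TypeError
```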
Implementing `operator.truth()`
In a lot of programming languages that define the `not` operation, it's a common idiom to turn an object into its comparative boolean value by passing it to `not` twice via `not not`: once to get the inverted boolean value, and the second time to invert the inversion to get the boolean value that you originally wanted.
In Python we don't need this idiom. Thanks to `bool()` (and specifically `bool.__new__()`), we have a function call that we can use to get the boolean value; it's exposed via `operator.truth()`. And if you look at that method you will discover it uses `PyObject_IsTrue()` to determine the boolean value for an object. Looking at `slot_nb_bool`, you will see that it ends up doing what `PyObject_IsTrue()` does. What all of this means is that if we can implement the analogue of `PyObject_IsTrue()` then we can determine what boolean value an object represents.
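As a quick illustration that the three spellings agree, this sketch compares `not not`, `bool()`, and `operator.truth()` over a handful of arbitrary sample values:

```python
import operator

values = [0, 1, "", "text", [], [0], None, object()]

# All three should produce the very same bool object for each value.
all_agree = all(
    (not not value) is bool(value) is operator.truth(value)
    for value in values
)

print(all_agree)                  # True
print([bool(v) for v in values])  # [False, True, False, True, False, True, False, True]
```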
Using the outline from earlier and what we have covered up until now, we can implement `operator.truth()` for this logic (I'm choosing not to implement `bool` because I don't want to have to implement all of its numeric functions and I have not come up with a good way to make `True` and `False` from scratch that inherit from `1` and `0`, respectively, in pure Python):
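A minimal sketch of that logic, following the six rules from earlier. Two simplifications are assumed here: special methods are found with a plain `getattr()` on the type rather than a careful MRO walk, and the built-in `len()` stands in for a hand-written one:

```python
def truth(obj, /):
    """Return True if obj is true, False otherwise."""
    # Rules 1-3: the singletons are answered by identity checks.
    if obj is True:
        return True
    if obj is False or obj is None:
        return False

    type_ = type(obj)

    # Rule 4: __bool__() decides, but it must return an actual bool.
    __bool__ = getattr(type_, "__bool__", None)
    if __bool__ is not None:
        result = __bool__(obj)
        if isinstance(result, bool):
            return result
        raise TypeError(
            f"__bool__ should return bool, returned {type(result).__name__}"
        )

    # Rule 5: fall back to the length of the object.
    if getattr(type_, "__len__", None) is not None:
        return len(obj) > 0

    # Rule 6: everything else is true.
    return True


samples = [True, False, None, 0, 1, 0.0, "", "a", [], [0], {}, {1: 2}, object()]
print(all(truth(value) is bool(value) for value in samples))  # True
```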
Implementing `not`
With `operator.truth()` implemented, getting `operator.not_()` to work is just `lambda a, /: False if truth(a) else True`. The end result is simple, but getting here took a bit of work. 😉
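Written out as a function and checked against the real thing, it might look like the sketch below. To keep the example self-contained, the standard library's `operator.truth()` is used as a stand-in for a hand-written `truth()`:

```python
import operator


def not_(a, /):
    """Return the inverse of the boolean value of a."""
    return False if operator.truth(a) else True


samples = [0, 1, "", "text", [], [0], None, object()]
matches = all(not_(value) is operator.not_(value) for value in samples)
print(matches)  # True
```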
As always, the code in this post can be found in my desugar project.