Unravelling membership testing
This post in my series on Python's syntactic sugar, I am going to cover membership testing: in
and not in
. As the language reference says, "the operators in and not in test for membership". In other words, in
and not in
are used to check if an object is contained by some container of other objects (e.g. a list, tuple, set, dict, bytes, string, etc.). Much like with not
, Python supports a very direct way for objects to implement membership testing and a logical fallback to something that objects may have already implemented.
But to begin, let's look at the technical details.
How it works at the C level
If you disassemble in
and not in
you will see that they use COMPARE_OP
to implement membership testing.
As covered in the rich comparison blog post, COMPARE_OP
works its way down to cmp_outcome()
which calls PySequence_Contains()
and then returns True
or False
based on the result of that call.
The first thing that PySequence_Contains()
does is it tries calling __contains__()
on the object's type if it exists. If it does, then it's called and the return value is checked to see if it's None
as implemented by slot_sq_contains
. If it happens to be None
, then TypeError
is raised to short-circuit things (which will make sense in a second). For all other return values, the truth value of the object is returned.
But if __contains__()
is not defined, then PySequence_IterSearch()
is used to iterate over the container object to look for the object being searched for. If any item returned via iteration matches by identity or equality then the container is said to contain the object.
(I am purposefully going to punt here on unravelling iteration and save that for when I tackle for
loops.)
Implementation
So, __contains__()
is called and its result is passed to operator.truth()
as long as it isn't None
(which leads to TypeError
). If __contains__()
doesn't exist, though, then the result is any(x is item or x == item for x in container)
is returned (which is literally what the language reference specifies). This is exposed as operator.contains()
.
This means that a in b
unravels to operator.contains(b, a)
and a not in b
unravels to operator.not_(operator.contains(b, a))
.
As always, the code from this blog post can be found in my desugar project (which also includes an implementation of any()
).