11 Dec 2020 2 min read syntactic sugar

Unravelling membership testing

This post in my series on Python's syntactic sugar, I am going to cover membership testing: in and not in. As the language reference says, "the operators in and not in test for membership". In other words, in and not in are used to check if an object is contained by some container of other objects (e.g. a list, tuple, set, dict, bytes, string, etc.). Much like with not, Python supports a very direct way for objects to implement membership testing and a logical fallback to something that objects may have already implemented.

But to begin, let's look at the technical details.

How it works at the C level

If you disassemble in and not in you will see that they use COMPARE_OP to implement membership testing.

>>> import dis
>>> def spam(a, b):
...   b in a
...   b not in a
... 
>>> dis.dis(spam)
  2           0 LOAD_FAST                1 (b)
              2 LOAD_FAST                0 (a)
              4 COMPARE_OP               6 (in)
              6 POP_TOP

  3           8 LOAD_FAST                1 (b)
             10 LOAD_FAST                0 (a)
             12 COMPARE_OP               7 (not in)
             14 POP_TOP
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

Bytecode disassembly of in and not in

As covered in the rich comparison blog post, COMPARE_OP works its way down to cmp_outcome() which calls PySequence_Contains() and then returns True or False based on the result of that call.

The first thing that PySequence_Contains() does is it tries calling __contains__() on the object's type if it exists. If it does, then it's called and the return value is checked to see if it's None as implemented by slot_sq_contains. If it happens to be None, then TypeError is raised to short-circuit things (which will make sense in a second). For all other return values, the truth value of the object is returned.

But if __contains__() is not defined, then PySequence_IterSearch() is used to iterate over the container object to look for the object being searched for. If any item returned via iteration matches by identity or equality then the container is said to contain the object.

(I am purposefully going to punt here on unravelling iteration and save that for when I tackle for loops.)

Implementation

So, __contains__() is called and its result is passed to operator.truth() as long as it isn't None (which leads to TypeError). If __contains__() doesn't exist, though, then the result is any(x is item or x == item for x in container) is returned (which is literally what the language reference specifies). This is exposed as operator.contains().

def __contains__(container: Any, item: Any, /) -> bool:
    """Check if the first item contains the second item: `b in a`."""
    container_type = type(container)
    try:
        contains_method = debuiltins._mro_getattr(container_type, "__contains__")
    except AttributeError:
        # Cheating until `for` is unravelled (and thus iterators).
        return debuiltins.any(x is item or x == item for x in container)
    else:
        if contains_method is None:
            raise TypeError(f"{container_type.__name__!r} object is not a container")
        is_contained = contains_method(container, item)
        return truth(is_contained)

Implementation of operator.contains()

This means that a in b unravels to operator.contains(b, a) and a not in b unravels to operator.not_(operator.contains(b, a)).

As always, the code from this blog post can be found in my desugar project (which also includes an implementation of any()).

How it works at the C level

Implementation

You might also like...