I wonder how many people realize that Python has a lot of syntactic sugar? I'm not claiming it's like a Lisp-based language where the syntax is as bare bones as possible (although the Lisp comparison is not entirely unfounded), but much of Python's syntax isn't technically needed as under the hood a good chunk of it is just function calls.

But so what? Why care about how Python devolves into less syntax and more function calls? There are two reasons, really. One is that it's educational to know how Python actually functions, which helps you understand and debug when something goes awry. Two, it helps detail the bare minimum you need to implement the language.

And so, to both educate myself and to think about what might be required to implement Python for WebAssembly or a bare bones C API, I am writing this blog post about what attribute access looks like when you look beneath the syntax.

Now you could try to piece together exactly what is going on with attribute access by reading the Python language reference. That might lead you to the attribute reference expression and the data model for customizing attribute access, but there's a lot to try to comprehend and tie together into a single story of how attribute access works. And so I prefer to go through the CPython source code to tease out what is going on in the interpreter (and I'm specifically using the CPython 3.8.3 tag of the repository so I have stable links and am using the latest release at the time of writing).

Now there will be some C code in the beginning of this post, but I don't expect you to fully understand what's going on with it. I will explicitly say what you should get from the C code, so if you don't have any background in C it shouldn't hurt your understanding of what I'm about to talk about.

Looking at the bytecode

OK, so let's try to pull apart the following expression:

obj.attr

Probably the most straightforward place to start is by examining the bytecode. So, let's disassemble this line and see what the compiler emits:

>>> def example(): 
...     obj.attr
... 
>>> import dis
>>> dis.dis(example)
  2           0 LOAD_GLOBAL              0 (obj)
              2 LOAD_ATTR                1 (attr)
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

The key opcode here is LOAD_ATTR. (In case you're interested, it replaces the object on the top of the stack with the result of accessing the named attribute as specified in co_names[i].)
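If you would rather check this programmatically than read the disassembly by eye, here is a minimal sketch using dis.get_instructions(). It assumes a CPython version where plain attribute access still compiles to an opcode named LOAD_ATTR; opcode names and layouts are an implementation detail and can change between releases.

```python
import dis


def example():
    obj.attr  # never executed; we only inspect the compiled bytecode


# Collect (opname, argval) pairs for every instruction in the function.
instructions = [(ins.opname, ins.argval) for ins in dis.get_instructions(example)]

# Keep only the attribute loads; argval is the resolved attribute name.
attr_loads = [argval for opname, argval in instructions if opname == "LOAD_ATTR"]
print(attr_loads)

# The name itself lives in the code object's co_names tuple.
print(example.__code__.co_names)
```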

CPython's interpreter loop is kept in Python/ceval.c. At its core is a massive switch statement that branches based on the opcode to be executed. Looking there you find the following lines of C for LOAD_ATTR:

        case TARGET(LOAD_ATTR): {
            PyObject *name = GETITEM(names, oparg);
            PyObject *owner = TOP();
            PyObject *res = PyObject_GetAttr(owner, name);
            Py_DECREF(owner);
            SET_TOP(res);
            if (res == NULL)
                goto error;
            DISPATCH();
        }
https://github.com/python/cpython/blob/6f8c8320e9eac9bc7a7f653b43506e75916ce8e8/Python/ceval.c#L2963-L2972

Most of that is just stack manipulation code that we can ignore. The key bit is the PyObject_GetAttr() call which is what truly implements attribute access.

That function name looks like another function name ...

Now that name sure looks like getattr(), but in the convention of C function names that CPython uses. Poking around in Python/bltinmodule.c, which houses all of the built-ins in Python, we can check if this hunch is true. Searching that file for "getattr", you find the line which binds the "getattr" name to the builtin_getattr() function.

static PyObject *
builtin_getattr(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *v, *name, *result;

    if (!_PyArg_CheckPositional("getattr", nargs, 2, 3))
        return NULL;

    v = args[0];
    name = args[1];
    if (!PyUnicode_Check(name)) {
        PyErr_SetString(PyExc_TypeError,
                        "getattr(): attribute name must be string");
        return NULL;
    }
    if (nargs > 2) {
        if (_PyObject_LookupAttr(v, name, &result) == 0) {
            PyObject *dflt = args[2];
            Py_INCREF(dflt);
            return dflt;
        }
    }
    else {
        result = PyObject_GetAttr(v, name);
    }
    return result;
}
https://github.com/python/cpython/blob/6f8c8320e9eac9bc7a7f653b43506e75916ce8e8/Python/bltinmodule.c#L1060-L1086

There's a bunch of stuff to tease apart parameters and such that doesn't interest us, but you will notice that if you only pass in two arguments to getattr() it ends up calling PyObject_GetAttr().

What does this mean? Well, it means you can directly desugar obj.attr to getattr(obj, "attr")! And that also means that if we can understand PyObject_GetAttr() then we can understand how getattr() works and thus how attribute access works in Python.
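A quick way to convince yourself of the equivalence is to compare the two spellings side by side. The Point class below is a made-up example, not something from CPython.

```python
class Point:
    """Hypothetical class used only to compare the two spellings."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)

# Instance attribute: both spellings return the very same object.
assert p.x is getattr(p, "x")

# Each access to a method creates a fresh bound method, so compare with ==.
assert p.norm1 == getattr(p, "norm1")
assert getattr(p, "norm1")() == 7

# A missing attribute fails the same way through either spelling.
failures = []
for access in (lambda: p.z, lambda: getattr(p, "z")):
    try:
        access()
    except AttributeError as exc:
        failures.append(type(exc).__name__)
print(failures)  # ['AttributeError', 'AttributeError']
```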

Unravelling getattr()

At this point I'm going to stop pasting in C code as the complexity of the code just goes up from here and it is no longer serving the purpose of demonstrating that obj.attr is syntax for getattr(obj, "attr"). I will continue to point out the relevant C code as comments in the pseudo-code for those that want to follow along in the bowels of CPython, though. Also note that the Python code should be considered pseudo-code as there is attribute access itself in the code implementing attribute access, but at the C level it isn't passing through normal attribute access machinery. So while you may see a . used syntactically in the pseudo-code, know that at the C level the attribute access is not recursive and is actually functioning the way you might naively assume it would.

What we know so far

At this point we know two things about getattr(). One is that it takes (at least) two arguments. Two, the second argument must be a str (or an instance of a subclass of str); when it isn't, TypeError is raised with a static string argument (which is probably static for performance purposes).
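Both facts are easy to observe with the real built-in. The sketch below uses two made-up classes, Name and Config. It deliberately does not print the error message, since the exact wording is version-dependent.

```python
class Name(str):
    """Hypothetical str subclass; still acceptable as an attribute name."""


class Config:
    """Hypothetical class with a single class attribute."""

    debug = True


cfg = Config()

# An instance of a str subclass passes the type check.
value = getattr(cfg, Name("debug"))

# Anything that is not a str is rejected with TypeError.
try:
    getattr(cfg, 42)
except TypeError as exc:
    error = exc

print(value, type(error).__name__)
```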

from typing import Any

NOTHING = object()  # Sentinel: the caller supplied no default.

def getattr(obj: Any, attr: str, default: Any = NOTHING) -> Any:
    if not isinstance(attr, str):
        raise TypeError("getattr(): attribute name must be string")

    ...  # Fill in with PyObject_GetAttr().
Function signature for getattr()

Looking up attributes via special methods

Attribute access on an object is implemented via two special methods. The first method is __getattribute__() which is called when trying to access any and all attributes. The second is __getattr__() which is called when __getattribute__() raises an AttributeError. The former method is (nowadays) always expected to be defined while the latter method is optional.

Python looks up special methods on an object's type, not the object itself. To be clear, I am very specifically using the word "type" here; the type of an instance is its class, and the type of a class is its metaclass (type, unless you say otherwise). It's luckily very easy to get the type of something thanks to the type constructor returning an object's type: type(obj).
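A small experiment makes the "type, not instance" rule concrete. In this sketch (the Plain class is made up), a __getattr__ stored on the instance is ignored, while the same hook stored on the class takes effect.

```python
class Plain:
    """Hypothetical empty class."""


obj = Plain()

# Storing the hook on the instance has no effect: only type(obj) is
# consulted when the interpreter needs __getattr__().
obj.__getattr__ = lambda name: "from instance"
try:
    obj.missing
    instance_hook_used = True
except AttributeError:
    instance_hook_used = False

# Putting the hook on the class does take effect.
Plain.__getattr__ = lambda self, name: "from type"
result = obj.missing

print(instance_hook_used, result)  # False from type
```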

We also need to know the method resolution order (MRO) of the type. This specifies the order of the type hierarchy for an object. The algorithm used by Python is from the Dylan programming language and it's called C3. From Python code the MRO is exposed by type(obj).mro().
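As a small illustration, here is the MRO of a classic diamond hierarchy (the class names are arbitrary). Note how both B and C come before their shared base A.

```python
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D(B, C):
    pass


# type(obj).mro() returns the classes in the order they will be searched.
names = [cls.__name__ for cls in type(D()).mro()]
print(names)  # ['D', 'B', 'C', 'A', 'object']
```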

Working off of an object's type is on purpose as this allows for faster lookup and access. In general it eliminates an extra lookup by skipping the instance every time we look for something. At an internal CPython level it allows for having special methods live in a struct field for very fast lookup. So while it might seem a little odd at first glance to be somewhat ignoring the direct object and to use its type instead, it's very much on purpose.

Now in the name of simplicity I am going to cheat a little and have getattr() handle both __getattribute__() and __getattr__() methods explicitly, while in CPython it does some trickery under the hood to make an object handle both methods itself. In the end, though, the semantics are the same for our purposes.

# Based on https://github.com/python/cpython/tree/v3.8.3.
from __future__ import annotations
import builtins
from typing import Any

NOTHING = builtins.object()  # C: NULL


def getattr(obj: Any, attr: str, default: Any = NOTHING) -> Any:
    """Implement attribute access via __getattribute__ and __getattr__."""
    # Python/bltinmodule.c:builtin_getattr
    if not isinstance(attr, str):
        raise TypeError("getattr(): attribute name must be string")

    obj_type_mro = type(obj).mro()
    attr_exc = NOTHING
    for base in obj_type_mro:
        if "__getattribute__" in base.__dict__:
            try:
                return base.__dict__["__getattribute__"](obj, attr)
            except AttributeError as exc:
                attr_exc = exc
                break
    # Objects/typeobject.c:slot_tp_getattr_hook
    # It is cheating to do this here as CPython actually rebinds the tp_getattro
    # slot with a wrapper that handles __getattr__() when present.
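    for base in obj_type_mro:
        if "__getattr__" in base.__dict__:
            try:
                return base.__dict__["__getattr__"](obj, attr)
            except AttributeError as exc:
                # __getattr__() may itself fail; let the default (if any) apply.
                attr_exc = exc
                break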

    if default is not NOTHING:
        return default
    elif attr_exc is not NOTHING:
        raise attr_exc
    else:
        raise AttributeError(f"{type(obj).__name__!r} object has no attribute {attr!r}")
Pseudo-code implementing getattr()

Unravelling object.__getattribute__()

While getting an implementation of getattr() is nice, it unfortunately doesn't tell us a whole lot about Python's rules for looking up an attribute since so much is handled in an object's __getattribute__() method. As such, I will cover how object.__getattribute__() works.

Looking for a data descriptor

The first substantial thing we are going to do in object.__getattribute__() is look for a data descriptor on the type. In case you have never heard of descriptors, they are a way to programmatically control how an individual attribute works. Even if the name is new to you, if you have been using Python for a while I suspect you have used descriptors: property, classmethod, and staticmethod are all descriptors.

There are two kinds of descriptors: data and non-data. Both kinds of descriptors define a __get__ method for getting what the attribute should be. Data descriptors also define a __set__ or __delete__ method (or both) while non-data descriptors do not; property is a data descriptor, classmethod and staticmethod are non-data descriptors.
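The standard library can tell the two kinds apart for you via inspect.isdatadescriptor(). The sketch below classifies the built-in descriptors mentioned above alongside two made-up descriptor classes, Data and NonData.

```python
import inspect


class NonData:
    """Hypothetical non-data descriptor: defines only __get__."""

    def __get__(self, instance, owner=None):
        return "non-data"


class Data:
    """Hypothetical data descriptor: defines __get__ and __set__."""

    def __get__(self, instance, owner=None):
        return "data"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


def func():
    pass


checks = {
    "property": inspect.isdatadescriptor(property()),
    "classmethod": inspect.isdatadescriptor(classmethod(func)),
    "staticmethod": inspect.isdatadescriptor(staticmethod(func)),
    "Data": inspect.isdatadescriptor(Data()),
    "NonData": inspect.isdatadescriptor(NonData()),
}
print(checks)
```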

If we can't find a data descriptor for the attribute on the type, the next place we look is on the object itself. This is straightforward thanks to objects having a __dict__ attribute that stores the attributes of the object itself in a dictionary.

If the object itself doesn't have the attribute then we see if a non-data descriptor exists. Since we already searched for a descriptor previously we can assume that if it was found but not already used when we looked for a data descriptor then it's a non-data descriptor.

And finally, if we found the attribute on the type and it wasn't a descriptor, we return that. So to summarize, the search order for an attribute is:

  1. Data descriptor on the types
  2. Anything on the object itself
  3. Non-data descriptor on the types
  4. Anything on the types

You will notice that data descriptors come first, then the object's own data, and only then whatever else the type provides. All of this makes sense when you think about how self.attr = val in an __init__() method is storing data on an object. Chances are that if you did that then you want that value to win over a method or something similar. And you want data descriptors first since if you bothered to programmatically define an attribute you probably meant for that to always be used.
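The search order can be observed directly by giving an instance and its class competing entries under the same names. This is a sketch with made-up classes; the instance dictionary is written to directly so that the data descriptor's __set__() isn't triggered.

```python
class DataDesc:
    """Hypothetical data descriptor."""

    def __get__(self, instance, owner=None):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Hypothetical non-data descriptor."""

    def __get__(self, instance, owner=None):
        return "non-data descriptor"


class Example:
    a = DataDesc()       # shadowed on the instance below
    b = NonDataDesc()    # shadowed on the instance below
    c = "class attribute"  # shadowed on the instance below
    d = NonDataDesc()    # not shadowed
    e = "class attribute"  # not shadowed


obj = Example()
# Write straight into the instance dictionary, bypassing __set__().
obj.__dict__.update(a="instance", b="instance", c="instance")

results = {name: getattr(obj, name) for name in "abcde"}
for name, value in results.items():
    print(name, "->", value)
```

Only a is served by the type even though the instance has an entry for it; b and c come from the instance, and d and e fall through to the type.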

class object:
    def __getattribute__(self: Any, attr: str) -> Any:
        """Attribute access."""
        # Objects/object.c:PyObject_GenericGetAttr
        self_type = type(self)
        if not isinstance(attr, str):
            raise TypeError(
                f"attribute name must be string, not {type(attr).__name__!r}"
            )

        type_attr = descriptor_type_get = NOTHING
        for base in self_type.mro():
            if attr in base.__dict__:
                type_attr = base.__dict__[attr]
                type_attr_type = type(type_attr)
                if "__get__" in type_attr_type.__dict__:
                    descriptor_type_get = type_attr_type.__dict__["__get__"]
                    # Include/descrobject.h:PyDescr_IsData
                    if ("__set__" in type_attr_type.__dict__
                            or "__delete__" in type_attr_type.__dict__):
                        # Data descriptor.
                        return descriptor_type_get(type_attr, self, self_type)
                    else:
                        break  # Non-data descriptor.
                else:
                    break  # Plain object.

        if attr in self.__dict__:
            return self.__dict__[attr]
        elif type_attr is not NOTHING:
            if descriptor_type_get is not NOTHING:
                return descriptor_type_get(type_attr, self, self_type)
            else:
                return type_attr
        else:
            raise AttributeError(f"{self_type.__name__!r} object has no attribute {attr!r}")
Pseudo-code for object.__getattribute__()

Summary

As you can see, there's a bunch of things going on when looking up an attribute in Python. While I would say no individual part is overly complicated conceptually, all together it does lead to a lot going on. This is also why some people try to minimize attribute access in very performance-critical Python code: it avoids all of this machinery.
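The usual form of that trick is to hoist a repeated attribute lookup out of a hot loop into a local variable. Here is a sketch with two hypothetical functions that do the same work. Whether the second one is measurably faster depends on your interpreter version, so treat it as something to benchmark (for example with timeit) rather than a guaranteed win.

```python
def squares_plain(n):
    result = []
    for i in range(n):
        result.append(i * i)  # looks up "append" on every iteration
    return result


def squares_cached(n):
    result = []
    append = result.append  # one attribute lookup, reused below
    for i in range(n):
        append(i * i)
    return result


# Both versions produce identical output; only the lookup count differs.
print(squares_plain(5) == squares_cached(5))  # True
```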

And as a historical note, almost all of these semantics came to Python as part of new-style classes compared to "classic" classes. This distinction went away in Python 3 when classic classes were left behind, so if you don't know about classic classes then that's probably a good thing.

Hopefully you found this interesting and educational. If people enjoyed this enough I may write more posts where I unravel more of Python's syntactic sugar. You can find the code above at https://github.com/brettcannon/desugar.