Nov 2, 2024

Don't return named tuples in new APIs

In my opinion, you should only introduce a named tuple to your code when you're updating a preexisting API that was already returning a tuple or you are wrapping a tuple return value from another API.

Let's start with when you should use named tuples. Usually an API returns a tuple when there are only a couple of items in it and the name of the function returning the tuple is enough to explain what each item is. But sometimes your API expands and you find that your tuple is no longer self-documenting purely based on the name of the API (e.g., get_mouse_position() very likely returns a two-item tuple of the X and Y coordinates on the screen, while app_state() could be a tuple of anything). When you need your return type to describe itself and a plain tuple isn't cutting it anymore, that's when you reach for a named tuple.
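
As a rough sketch of that kind of upgrade, here is how an app_state() that used to return a plain tuple might switch to a named tuple without breaking anyone. The field names and values are hypothetical, made up purely for illustration:

```python
from typing import NamedTuple


class AppState(NamedTuple):
    """Named replacement for the plain tuple app_state() used to return."""

    # Hypothetical fields; the original tuple's contents are not specified.
    running: bool
    open_documents: int
    unsaved_changes: bool


def app_state() -> AppState:
    # Hypothetical values; a real implementation would inspect the application.
    return AppState(running=True, open_documents=3, unsaved_changes=False)


# Existing callers that unpack or index the result keep working unchanged.
running, open_documents, unsaved_changes = app_state()
first_item = app_state()[0]

# New callers can use names instead of remembering positions.
state = app_state()
print(state.open_documents)
```

Because AppState is still a tuple, nothing changes for old callers; the names are purely additive.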

So why not start out that way? In a word: simplicity. Now, some of you might be saying to yourself, "but I use named tuples because they are so simple to define!" And that might be true for when you define your data structure (and I'll touch on this "simplicity of definition" angle later), but it actually makes your API more complex for both you and your users. For you, it doubles the data access API surface of your return type, as you now have to support index-based and attribute-based data access forever (or until you choose to break your users and change your return type so it doesn't support both approaches). This leads to writing tests for both ways of accessing your data, not just one of them. And you shouldn't skimp on this, because you don't know whether your users will use indexes or attribute names to access the data structure, nor can you guarantee someone won't break your code in the future by dropping the named tuple and switching to some custom type. Thanks to Python's support of structural typing (aka duck typing), you can't assume people are using a type checker, and thus the structure of your return type becomes your API contract. So you need to test both ways of using your return type to exercise the contract you have with your users, which is more work than if you had chosen just a tuple or just a class instead of a named tuple.
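
To make that doubled surface concrete, here is a minimal sketch of the tests a named tuple return type ends up needing. The get_point() function is a hypothetical API invented for this example, and the test functions are plain functions with asserts rather than being tied to any particular test framework:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])


def get_point():
    # Hypothetical API under test.
    return Point(1, 2, 3)


def test_attribute_access():
    point = get_point()
    assert (point.x, point.y, point.z) == (1, 2, 3)


def test_index_access():
    point = get_point()
    assert (point[0], point[1], point[2]) == (1, 2, 3)


def test_unpacking():
    x, y, z = get_point()
    assert (x, y, z) == (1, 2, 3)
```

Had get_point() returned a plain class, only the first test would be part of the contract; had it returned a plain tuple, only the last two.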

Named tuples are also a bit more complex for users. If you're reaching for a named tuple, you're essentially signalling upfront that the data structure is too big or complex for a tuple alone to work. And yet using a named tuple means you are supporting the tuple approach even though you didn't think it was a good idea from the start. On top of that, the tuple API allows for things that you probably don't want people doing with your return type, like slicing, iterating over all the items as if they are homogeneous, etc. Basically, my argument is that the "flexibility" of having index-based access to the data on top of attribute-based access isn't flexible in a good way.
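
Here is a small sketch of the sort of thing I mean. PointTuple and PointClass are names I made up for the comparison; the behavior shown is just what the tuple API gives you for free, whether you want it or not:

```python
import dataclasses
from collections import namedtuple

PointTuple = namedtuple("PointTuple", ["x", "y", "z"])


@dataclasses.dataclass
class PointClass:
    x: int
    y: int
    z: int


pt = PointTuple(1, 2, 3)

# Slicing works, and the result is a plain tuple with no field names.
first_two = pt[:2]

# Iteration treats x, y, and z as interchangeable items.
total = sum(pt)

# A named tuple compares equal to any plain tuple with the same items.
same_as_plain = pt == (1, 2, 3)

# A regular class only supports what you chose to expose.
pc = PointClass(1, 2, 3)
try:
    pc[0]
except TypeError:
    indexable = False
else:
    indexable = True
```

None of the tuple operations above raise an error, so users can come to rely on them without you ever having meant to offer them.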

So why do people still reach for named tuples when defining return types for new APIs? I think it's because people find it faster to define a named tuple than to write out a new class. Compare this:

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y', 'z'])

To this:

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

So there is a clear difference in the amount of typing. But there are three more ways to define the same data structure that might not be so burdensome. One is dataclasses:

import dataclasses

@dataclasses.dataclass
class Point:
    x: int
    y: int
    z: int

Another is simply a dictionary (although I know some prefer attribute-based access to data so much that they won't use this option). Toss in a TypedDict and you get editor support as well:

import typing

class Point(typing.TypedDict):
    x: int
    y: int
    z: int

# Alternatively ...
Point = typing.TypedDict("Point", {"x": int, "y": int, "z": int})
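
As a quick sketch of what using it looks like: at runtime a TypedDict instance is just a plain dict, so access is key-based only and the type information exists purely for type checkers and editors. The values here are arbitrary:

```python
import typing


class Point(typing.TypedDict):
    x: int
    y: int
    z: int


# Calling the TypedDict builds an ordinary dictionary.
point = Point(x=1, y=2, z=3)

print(type(point) is dict)
print(point["x"])

# There is no attribute-based access to the fields.
try:
    point.x
except AttributeError:
    has_attribute_access = False
else:
    has_attribute_access = True
```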

A third option is types.SimpleNamespace if you really want attributes without defining a class:

import types
Point = lambda x, y, z: types.SimpleNamespace(x=x, y=y, z=z)

If none of these options work for you, then you can always hope that somehow I convince enough people that my record/struct idea is a good one and it gets into the language. 😁

My key point in all of this is to prefer readability and ergonomics over brevity in your code. That means avoiding named tuples except where you are expanding or tweaking an existing API and the named tuple improves on the plain tuple that's already being used.
