Unravelling `for` statements
As part of my series on Python's syntactic sugar, I am going to cover the for
statement. As usual, I will be diving into CPython's C code, but understanding or even reading those parts of this post won't be required in order to understand how the unravelling works.
The bytecode
Let's start with a simple for
statement:
Passing the code through the dis
module gives us:
Looking at the output you may notice that the GET_ITER
and FOR_ITER
opcodes seem to be specific to for
statements while the rest are general opcodes.
Looking up GET_ITER
in the eval loop, it appears to mostly be calling PyObject_GetIter()
. I happen to know that it is mostly equivalent to the built-in function iter()
. Since I also know that there will probably be something to do with next()
, I'm going to handle defining both functions later in this post.
Speaking of next()
, the FOR_ITER
opcode is semantically a call to __next__()
on an object's type (as represented by (*iter->ob_type->tp_iternext)(iter)
in the C code). What that means is that understanding how iter()
and next()
work is critical to understanding how for
statements work.
The built-ins
Before covering how iter()
and next()
work, we should define two important terms. An iterable is a container which can return its contained items one at a time. An iterator is an object which streams those contained items. So while every iterator is conceptually an iterable (and you should always make your iterators also an iterable; it isn't much work), the reverse is not true and not all iterables are an iterator on their own.
iter()
The reason I bothered defining those words is because the purpose of iter()
is to take an iterable and return its iterator. Now, what iter()
considers an iterable depends on whether you give it one or two arguments.
iter(iterable)
Starting with the semantics of the function when there's a single argument, we see that the implementation calls PyObject_GetIter()
to get the iterator. The pseudocode for this function, which I will explain in a moment, is:
The first step is looking for the __iter__()
special method on an iterable. Calling this is meant to return an iterator (which is explicitly checked for on the return value). At the C level, the definition of an "iterator" is an object that defines __next__()
; both the iterable and iterator protocols can also be checked via their requisite collections.abc
classes and issubclass()
(I didn't do it this way in the pseudocode simply because the hasattr()
checks are closer to how the C code is written; in actual Python code I would probably use the abstract base classes).
There is a footnote relating to __iter__()
that says if the attribute on the class is set to None
that it won't be used (although I have not come across any explicit code doing this check). I think this is implicitly supported due to how the implementation is written, i.e. None
is not callable and lacks the appropriate methods that are being checked for.
The second step is what happens if __iter__()
isn't available? In that case there's a check to see if we are dealing with a sequence by looking for the __getitem__()
special method. If the object turns out to be a sequence, then an instance of PySeqIter_Type
is returned whose approximate Python implementation would be:
I say "approximate" because the CPython version supports pickling and I don't want to be bothered. 😁
If all of the above fails, then TypeError
is ultimately raised.
iter(callable, sentinel)
If you recall earlier, I mentioned that iter()
had a two-argument version. In this case iter()
is rather different than what we have already discussed:
As you can see there's a check to see if the first argument is callable, and if it is then an instance of the PyCallIter_Type
iterator is returned. With the function being so short, the important question is what does the PyCallIter_Type
iterator do?
The use-case for this form of iter()
is for when you have a no-argument callable that you persistently call until a certain value is returned representing that there isn't anything more coming from the callable.
def _call_iter(callable, sentinel):
while True:
val = callable()
if val == sentinel:
return
else:
yield val
An example of where this could be useful is if you are reading a file a chunk at a time, stopping once b""
is returned:
with open(path, "rb") as file:
for chunk in iter(lambda: file.read(chunk_size), b""):
...
Pulling it all together and doing everything appropriately, the definition of iter()
is:
next()
To get the next value from an iterator, we pass it to the next()
built-in function. It takes an iterator and can optionally take a default value. If the iterator has a value to return, then it's returned. If the iterator is exhausted, though, either StopIteration
is raised or the default value is returned if it's been provided.
The semantics
For each value returned by the iterable's iterator, they are assigned to the loop's target (sometimes called the loop variant), the statements in the body are run, and this repeats until the iterator is exhausted. If there is an else
clause for the for
loop, then it is executed if a break
statement isn't encountered. Breaking this all down gives us:
- Get the iterable's iterator
- Assign the iterator's
next
value to the loop target - Execute the body
- Repeat until the iterator is done
- If there's an
else
clause and abreak
statement isn't hit, execute theelse
clause's body
Kind of sounds like a while
loop with an assignment, doesn't it?
- Get the iterator with
iter()
- Call
next()
and assign the result to the loop target - As long as calling next didn't raise
StopIteration
, execute the body - Repeat as necessary
- Run the
else
clause as appropriate
for
without an else
clause
Let's start with the simpler case of not having an else
clause. In that case we can translate:
to:
If you squint a bit it sort of looks like the for
loop example. It's definitely a lot more verbose and would be a pain to write out every time you wanted to iterate, but it would get the work done.
for
with an else
clause
Now you might have noticed that we used a break
statement in our unravelling above which would cause a while
loop's else
clause to always be skipped. How do we change the translation to work when an else
clause is present?
Let's update our example first:
To eliminate our use of break
in our original unravelling, we need to come up with another way to denote when the iterator is out of items. A variable simply tracking whether there are more values should be enough to get us what we are after.
_iter = iter(b)
_looping = True
while _looping:
try:
a = next(_iter)
except StopIteration:
_looping = False
continue
else:
c
else:
d
del _iter, _looping
This unravelling has a rather convenient side-effect of getting to rely on the while
loop's own else
clause and its semantics to get what we are after for the for
loop!
And that's it! We can leverage a while
loop to implement for
loops with no discernible semantic changes (except for the temp variables).