Iterators, Iterables and the `next()` Function

Python Assets

2023-03-29

If you have heard about iterators or iterable objects, which is quite common in Python, or read some code that makes use of the next() or iter() functions, and if you are interested in knowing what all that is about, this is the article for you. Iterators and iterable objects (or just iterables) are used all the time, though often indirectly, and form the basis of the for loop and more complex functionality like generators (yield) and asynchronous tasks (async).

Both iterators and iterables are Python objects. Thus, as a first formal definition of these concepts, let's say that an iterator is an object that implements the __next__() method, while an iterable is an object that implements the __iter__() method (in Python, methods whose names begin and end with a double underscore are known as magic methods). Consequently, if a is an iterator and b is an iterable object, both the a.__next__() and b.__iter__() calls are necessarily valid. Regarding its functionality, the task of an iterator is to fetch items from a container, which is any data type that allows storing more than one value, such as lists, tuples and dictionaries. A container is iterable if there is an iterator capable of fetching its items.
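
These definitions can be checked on a built-in object. The following is only a minimal sketch that inspects a list and the iterator that iter() returns for it; the variable names are made up for the illustration.

```python
numbers = [1, 2, 3]                 # A list is an iterable...
numbers_iterator = iter(numbers)    # ...and iter() hands back an iterator for it.

# The iterable implements __iter__() but not __next__().
print(hasattr(numbers, "__iter__"))             # True
print(hasattr(numbers, "__next__"))             # False

# The iterator implements __next__().
print(hasattr(numbers_iterator, "__next__"))    # True
```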

To better understand these formal and functional definitions, it would be good to identify the problem that iterators and iterables come to solve. So let's assume that we are Guido van Rossum, the creator of Python, in the late 1980s and that we are about to invent one of the most popular languages in the world. It's time to implement the functionality of the for loop. As we know, the for keyword is used to loop over the items of a container one by one. For example:

# Print each item on a new line.
my_list = ["A", "B", "C", "D"]
for item in my_list:
    print(item)

If we had to write the code that interprets such a for loop, without going into too much detail about how the Python interpreter works, the most logical thing to do would be to use a while loop. The official Python interpreter is written in C (hence the name "CPython"), and C's while works essentially like Python's. So let's take the liberty of writing it in Python as well, resulting in something like this:

# Implementation of the former code using while.
my_list = ["A", "B", "C", "D"]
i = 0
while i < len(my_list):
    item = my_list[i]
    print(item)         # <----- For loop body.
    i += 1

It seems then that the internal logic of the for loop to traverse the items of a container would be as follows: (a) create a counter starting at zero (line 3), (b) get the item in that position and store it in some variable (line 5), (c) execute the body of the loop (line 6), (d) increment the counter by one (line 7). Then the process starts again from step (b) and repeats until the counter equals the number of items (line 4).

Great! We have already implemented the for loop in our yet-to-be-created Python. However, we have only tested it with a list. Does it work with a tuple?

my_tuple = ("A", "B", "C", "D")
i = 0
while i < len(my_tuple):
    item = my_tuple[i]
    print(item)         # <----- For loop body.
    i += 1

It does. Does it work with a dictionary?

my_dict = {"a": 1, "b": 2, "c": 3}
i = 0
while i < len(my_dict):
    item = my_dict[i]
    print(item)         # <----- For loop body.
    i += 1

It doesn't. It throws the following exception:

Traceback (most recent call last):
    (...)
    item = my_dict[i]
           ~~~~~~~^^^
KeyError: 0

This happens because our for implementation relies on a counter to fetch each item. But dictionary keys might not be numbers at all, as in this case ("a", "b", "c").

Does it work with sets?

my_set = {"A", "B", "C", "D"}
i = 0
while i < len(my_set):
    item = my_set[i]
    print(item)         # <----- For loop body.
    i += 1

It doesn't work either: sets don't even support the my_set[i] syntax, since their items are not ordered. Instead, my_set.pop() would have to be used to fetch an item (and pop() also removes that item from the set).
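
To make the failure concrete, here is a small sketch of what happens when a set is indexed, and of the side effect of pop(). The error text in the comment is the one CPython produces.

```python
my_set = {"A", "B", "C", "D"}

try:
    my_set[0]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)    # 'set' object is not subscriptable

# pop() does return an item, but it removes it from the set too.
item = my_set.pop()
print(len(my_set))          # 3
```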

However, all these containers and many others are supported by Python's for loop:

my_dict = {"a": 1, "b": 2, "c": 3}
my_set = {"A", "B", "C", "D"}
my_range = range(1, 10)

for key in my_dict:
    print(key)

for item in my_set:
    print(item)

for number in my_range:
    print(number)

So we should modify our original implementation to consider not only lists and tuples, but also dictionaries, sets, ranges, and every other object that can be looped over with for. Although this could be done by checking the data type of the container and running the appropriate code to loop over it (by using keys if it's a dictionary, pop() if it's a set, etc.), it wouldn't be a smart implementation. This solution would not only lead to boilerplate code; it would also mean that Python could only loop over the containers that were considered while writing the interpreter, and thus programmers wouldn't have the possibility of creating their own for-supported (i.e. iterable) objects (and this is something that Python actually makes possible).
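
A sketch of what that type-checking approach might look like is shown below. The naive_for() function is hypothetical and deliberately incomplete: it handles only lists, tuples and sets, and passes each item to a body callback that stands in for the loop body.

```python
def naive_for(container, body):
    """Hypothetical loop that dispatches on the container type."""
    if isinstance(container, (list, tuple)):
        # Sequences: fetch each item by position.
        i = 0
        while i < len(container):
            body(container[i])
            i += 1
    elif isinstance(container, set):
        # Sets: pop items from a copy so the original is left intact.
        remaining = container.copy()
        while remaining:
            body(remaining.pop())
    else:
        # Every type not listed above is simply unsupported.
        raise TypeError(f"cannot loop over {type(container).__name__}")


collected = []
naive_for(("A", "B"), collected.append)
print(collected)    # ['A', 'B']

try:
    naive_for(range(3), collected.append)
except TypeError as exc:
    print(exc)      # cannot loop over range
```

Every new container type would need its own branch, which is exactly the limitation described above.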

This is where iterators come in. The implementation of the for loop doesn't need to be aware of every container data type. Instead, each data type (lists, tuples, dictionaries, ranges, arrays, strings, etc.) will be responsible for providing its own iterator that implements the logic to loop over its items with a common interface. The iterator interface is quite simple: expose a __next__() method that returns the next item from the container or raises the StopIteration exception when there are no remaining items to return. That's why we said at the beginning that the task of an iterator is to fetch items from a container. For example, the list iterator could look something like this:

class ListIterator:

    def __init__(self, lst):
        self.lst = lst
        self.i = 0

    def __next__(self):
        if self.i >= len(self.lst):
            raise StopIteration
        item = self.lst[self.i]
        self.i += 1
        return item

This class receives a list as an argument and returns its items in order on every call to __next__(). When the internal counter indicates that all items have been returned, it raises StopIteration. Let's check this behaviour:

my_list = ["A", "B", "C", "D"]
list_iterator = ListIterator(my_list)
print(list_iterator.__next__())    # A
print(list_iterator.__next__())    # B
print(list_iterator.__next__())    # C
print(list_iterator.__next__())    # D
print(list_iterator.__next__())    # StopIteration

Instead of calling __next__() directly, we would typically use the next() built-in function:

my_list = ["A", "B", "C", "D"]
list_iterator = ListIterator(my_list)
print(next(list_iterator))    # A
print(next(list_iterator))    # B
print(next(list_iterator))    # C
print(next(list_iterator))    # D
print(next(list_iterator))    # StopIteration

The next() wrapper just calls the __next__() method of the iterator it receives as an argument. The sole difference is that next() supports a second argument that allows us to specify a default return value when the iterator throws StopIteration.
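
As a brief illustration of that second argument, the sketch below passes None as the default value, so the third call returns None instead of raising StopIteration.

```python
letters = iter(["A", "B"])

first = next(letters, None)
second = next(letters, None)
# The iterator is exhausted here, so the default is returned.
third = next(letters, None)

print(first, second, third)    # A B None
```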

Now the for implementation no longer has to be case-by-case and can just call next() until the iterator throws StopIteration. So whenever Python comes across code like this:

my_list = ["A", "B", "C", "D"]
for item in my_list:
    print(item)

The interpreter will internally run:

container = ["A", "B", "C", "D"]
# Get an iterator for this container.
container_iterator = iter(container)
while True:
    try:
        item = next(container_iterator)
    except StopIteration:
        break
    print(item)         # <----- For loop body.

Thus, it doesn't matter what data type the container has, as long as it provides its own iterator, and that is precisely the definition of an iterable object. An object is iterable if it implements the __iter__() method, which must return an iterator for that iterable object. But instead of calling container.__iter__() directly, the iter() built-in is used.
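
Since any object with an __iter__() method qualifies, we can make our own classes work with for. The following is a minimal sketch; Countdown and CountdownIterator are hypothetical names invented for this example.

```python
class CountdownIterator:
    """Iterator: keeps the loop state and implements __next__()."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class Countdown:
    """Iterable: only knows how to build an iterator for itself."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


for number in Countdown(3):
    print(number)    # 3, then 2, then 1
```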

Python's built-in containers (such as lists, tuples, dictionaries, sets, ranges, etc.) provide their own iterators, so they are all iterable objects. We can check this by calling iter() on some container objects:

>>> my_list = ["A", "B", "C", "D"]
>>> my_dict = {"a": 1, "b": 2, "c": 3}
>>> my_set = {"A", "B", "C", "D"}
>>> my_range = range(1, 10)
>>> iter(my_list)
<list_iterator object at 0x000001CB73A657E0>
>>> iter(my_dict)
<dict_keyiterator object at 0x000001CB73C15BC0>
>>> iter(my_set)
<set_iterator object at 0x000001CB73A1BA00>
>>> iter(my_range)
<range_iterator object at 0x000001CB73C5F190>

list_iterator is just a C implementation (in the official Python interpreter) of our ListIterator. The code is even available on GitHub. The equivalent of our ListIterator.__init__() is as follows:

static PyObject *
list_iter(PyObject *seq)
{
    _PyListIterObject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(_PyListIterObject, &PyListIter_Type);
    if (it == NULL)
        return NULL;
    /* This line is equivalent to our `self.i = 0` */
    it->it_index = 0;
    /* `self.lst = lst` */
    it->it_seq = (PyListObject *)Py_NewRef(seq);
    _PyObject_GC_TRACK(it);
    return (PyObject *)it;
}

And the equivalent of ListIterator.__next__():

static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    /* The opposite of our `if self.i >= len(self.lst):` check */
    if (it->it_index < PyList_GET_SIZE(seq)) {
        /* item = self.lst[self.i] */
        item = PyList_GET_ITEM(seq, it->it_index);
        /* self.i += 1 */
        ++it->it_index;
        /* return item */
        return item;
    }

    it->it_seq = NULL;
    Py_DECREF(seq);
    return NULL;
}

What is missing here is the part responsible for raising StopIteration, which is done elsewhere in the CPython code since that behaviour is common to all iterators.

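dict_keyiterator, set_iterator, and range_iterator do the same for dictionaries, sets, and ranges, respectively. A simplified implementation of set_iterator in Python might look like this (unlike the real one, it consumes the set as it goes, since pop() removes each item it returns):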

class SetIterator:

    def __init__(self, set_):
        self.set = set_

    def __next__(self):
        if not self.set:
            raise StopIteration
        return self.set.pop()
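
Since pop() removes items, one way to keep the caller's set intact, offered only as a sketch, is to pop from a private copy. CPython's real set_iterator does not copy anything, so this variation is an illustration rather than a description of the actual implementation.

```python
class SetIterator:
    """Variation of the sketch above that pops from a private copy."""

    def __init__(self, set_):
        # Copy the set so that the caller's object is not emptied.
        self.remaining = set_.copy()

    def __next__(self):
        if not self.remaining:
            raise StopIteration
        return self.remaining.pop()


my_set = {"A", "B", "C"}
it = SetIterator(my_set)
items = [next(it), next(it), next(it)]
print(sorted(items))    # ['A', 'B', 'C']
print(len(my_set))      # 3
```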

If an object a is iterable, we can get its own iterator by calling iter(a). If it is not iterable, iter() throws TypeError. For example, integers are not iterable:

>>> a = 5
>>> iter(a)
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

If an object is not iterable, then we can't loop over it either, since Python tries to get an iterator when interpreting a for block. Even the exception is exactly the same:

>>> a = 5
>>> for n in a:
...     print(n)
...
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

If the it object is an iterator, then you can call next(it) repeatedly to get each item from the container which it is attached to. Once every item has been returned, successive calls to next(it) raise StopIteration. There is no way to reset an iterator in order to retrieve items from the beginning after StopIteration has been thrown. Should you need to loop over the items again, simply create another iterator with a new call to iter().

>>> my_list = ["A", "B", "C"]
>>> it = iter(my_list)
>>> next(it)
'A'
>>> next(it)
'B'
>>> next(it)
'C'
>>> next(it)                          # No more items to return.
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
StopIteration
>>> it = iter(my_list)                # Create a new iterator.
>>> next(it)
'A'

A for loop can be used with either an iterable or an iterator. The following two snippets are equivalent:

my_list = ["A", "B", "C", "D"]
# Python internally gets the `list_iterator` by calling
# `iter(my_list)` and loops over it.
for item in my_list:
    print(item)

# We might get the iterator first and then loop
# over it.
list_iterator = iter(my_list)
for item in list_iterator:
    print(item)

Note that in the second case, Python will still try to call iter(list_iterator). That's why iterators, in addition to implementing __next__(), must also implement __iter__() and return a reference to themselves so they can be used in a for loop. Therefore, our ListIterator, to be considered an iterator according to the interface defined in the documentation, should look as follows:

class ListIterator:

    def __init__(self, lst):
        self.lst = lst
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= len(self.lst):
            raise StopIteration
        item = self.lst[self.i]
        self.i += 1
        return item
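
The same rule can be observed on the built-in list_iterator. In this short sketch, calling iter() on an iterator returns that same object, which is why a for loop can resume an iterator that has already been partially consumed.

```python
my_list = ["A", "B", "C"]
list_iterator = iter(my_list)

# Calling iter() on an iterator returns the very same object...
print(iter(list_iterator) is list_iterator)    # True

# ...whereas calling it on the list creates a new iterator each time.
print(iter(my_list) is iter(my_list))          # False

# A for loop over the iterator continues from its current position.
first = next(list_iterator)
rest = [item for item in list_iterator]
print(first, rest)    # A ['B', 'C']
```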

It would be reasonable to ask: why are iterables and iterators two different objects? Couldn't a list implement __next__() directly, so that it's not necessary to first get its iterator via iter()? Such an implementation is perfectly plausible, but it would have the (unwanted) effect that after traversing a list it would not be possible to traverse it again (unless it is recreated). There are objects where this is the expected behavior, such as a file, which, once all its content has been read, cannot be read again unless it is closed and reopened.
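
Nested loops make the difference visible. In the sketch below, the first pair of loops gets a fresh iterator from the list each time, while the second pair shares a single iterator, which mimics what would happen if the list itself kept track of the position.

```python
my_list = ["A", "B"]

# Each for loop asks the list for a fresh iterator, so the inner
# loop restarts from the first item on every pass of the outer one.
pairs = []
for first in my_list:
    for second in my_list:
        pairs.append(first + second)
print(pairs)           # ['AA', 'AB', 'BA', 'BB']

# With one shared iterator, the inner loop uses up the items
# and the outer loop has nothing left to fetch.
shared = iter(my_list)
shared_pairs = []
for first in shared:
    for second in shared:
        shared_pairs.append(first + second)
print(shared_pairs)    # ['AB']
```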

That concludes the explanation of iterators and iterable objects. Even if we're not actually going to create our own Python interpreter, it's extremely useful to know how it implements many of the mechanisms we use every day in order to write more pythonic and efficient code. Finally, let's see some real cases where using or creating iterators by hand is a smart way to solve a problem.

Example 1. Using `next()` and `iter()`

See Generating Prime Numbers for an example that uses the next() and iter() functions to solve a factorization algorithm.

Example 2. Implementing an iterator

Suppose you have a names.txt file that contains a name on each line:

John
Sophie
Casey
Daniel

If you want to print each name on a new line, you can do:

with open("names.txt", encoding="utf8") as f:
    for name in f:
        # Remove the newline and print the name.
        print(name.strip())

f is an iterator that returns a line from the file on each call to next(), with the benefit of not loading the whole file into memory: each line can be discarded once the next one has been read, which makes it possible to read very large files. The alternative is f.readlines(), which returns a list containing every single line of the file and therefore holds all of them in memory at once.
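
The sketch below checks both claims on a throwaway file created with the tempfile module, so that it does not depend on names.txt being present. The file contents are made up for the example.

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf8") as tmp:
    tmp.write("John\nSophie\nCasey\n")
    path = tmp.name

with open(path, encoding="utf8") as f:
    is_own_iterator = iter(f) is f    # A file object is its own iterator.
    first_line = next(f)              # Reads just one line.
    second_line = next(f)             # Reads the following line.

with open(path, encoding="utf8") as f:
    all_lines = f.readlines()         # Loads every line into a list.

os.remove(path)

print(is_own_iterator)     # True
print(repr(first_line))    # 'John\n'
print(all_lines)           # ['John\n', 'Sophie\n', 'Casey\n']
```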

Now what if names are separated by hyphens instead of newlines? The file would look like this:

John-Sophie-Casey-Daniel

How would you loop over this new file to print one name per line? Of course, you can load the entire content of the file and separate it via the split("-") method:

with open("names.txt", encoding="utf8") as f:
    names = f.read().split("-")
    for name in names:
        print(name)

This may work for small files, but what if you have a really big file with billions of names? You would quickly run out of memory. But since you know about iterators, you can implement a new iterator that takes care of reading the names separated by hyphens one by one without storing the whole file content in memory.

class NameIterator:

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def __next__(self):
        # This list stores read characters until finding a
        # hyphen or the end of file.
        charbuffer = []
        while True:
            # Read char by char.
            c = self.f.read(1)
            # If we found a hyphen, return characters read until now.
            if c == "-":
                break
            # If there are no more characters to be read...
            elif not c:
                if charbuffer:
                    # ...and our buffer is not empty yet, return
                    # the stored characters.
                    break
                else:
                    # ...and our buffer is empty, communicate that
                    # the iterator has been exhausted.
                    raise StopIteration
            charbuffer.append(c)
        # Join the buffer content in a single string.
        return "".join(charbuffer)

with open("names.txt", encoding="utf8") as f:
    for name in NameIterator(f):
        print(name)
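
As mentioned in the introduction, generators are built on top of the same protocol, so the class above could also be written as a generator function. The following is only a sketch of that alternative, not part of the original solution: iter_names() and its separator parameter are hypothetical, and io.StringIO stands in for an open text file.

```python
import io


def iter_names(f, separator="-"):
    """Yield names from a file-like object, one at a time."""
    charbuffer = []
    while True:
        # Read char by char, as NameIterator does.
        c = f.read(1)
        if not c:
            # End of file: emit what is left, if anything, and stop.
            if charbuffer:
                yield "".join(charbuffer)
            return
        if c == separator:
            yield "".join(charbuffer)
            charbuffer = []
        else:
            charbuffer.append(c)


fake_file = io.StringIO("John-Sophie-Casey-Daniel")
names = list(iter_names(fake_file))
print(names)    # ['John', 'Sophie', 'Casey', 'Daniel']
```

Python creates the iterator object for us here: calling iter_names() returns a generator, which implements both __iter__() and __next__().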
