At a glance, the yield statement is used to define generators, replacing the return of a function to provide a result to its caller without destroying local variables. Unlike a function, which starts with a new set of variables on each call, a generator resumes execution where it left off.
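To make the difference concrete, here is a minimal sketch. The names next_id_function and next_id_generator are made up for illustration; the built-in next() used here is explained further below.

```python
def next_id_function():
    # A plain function: "counter" is created from scratch on every call.
    counter = 0
    counter += 1
    return counter


def next_id_generator():
    # A generator: "counter" survives between resumptions.
    counter = 0
    while True:
        counter += 1
        yield counter


function_results = [next_id_function() for _ in range(3)]

ids = next_id_generator()
generator_results = [next(ids) for _ in range(3)]

print(function_results)   # [1, 1, 1]
print(generator_results)  # [1, 2, 3]
```

The function keeps returning 1 because its local variable is rebuilt on each call, while the generator keeps counting because its local state is preserved across yields.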
About Python Generators
Since the yield keyword is only used with generators, it makes sense to recall the concept of generators first.
The idea of generators is to calculate a series of results one-by-one on demand (on the fly). In the simplest case, a generator can be used like a list, where each element is calculated lazily. Let's compare a list of powers of two with a generator of doubled numbers. The session is shown first for Python 3.x and then for Python 2.x; the two differ only in how the types are printed:
[python]
>>> # First, we define a list
>>> the_list = [2**x for x in range(5)]
>>>
>>> # Type check: yes, it's a list
>>> type(the_list)
<class 'list'>
>>>
>>> # Iterate over items and print them
>>> for element in the_list:
...     print(element)
...
1
2
4
8
16
>>>
>>> # How about the length?
>>> len(the_list)
5
>>>
>>> # Ok, now a generator.
>>> # As easy as list comprehensions, but with '()' instead of '[]':
>>> the_generator = (x+x for x in range(3))
>>>
>>> # Type check: yes, it's a generator
>>> type(the_generator)
<class 'generator'>
>>>
>>> # Iterate over items and print them
>>> for element in the_generator:
...     print(element)
...
0
2
4
>>>
>>> # Everything looks the same, but the length...
>>> len(the_generator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
[/python]
[python]
>>> # First, we define a list
>>> the_list = [2**x for x in range(5)]
>>>
>>> # Type check: yes, it's a list
>>> type(the_list)
<type 'list'>
>>>
>>> # Iterate over items and print them
>>> for element in the_list:
...     print(element)
...
1
2
4
8
16
>>>
>>> # How about the length?
>>> len(the_list)
5
>>>
>>> # Ok, now a generator.
>>> # As easy as list comprehensions, but with '()' instead of '[]':
>>> the_generator = (x+x for x in range(3))
>>>
>>> # Type check: yes, it's a generator
>>> type(the_generator)
<type 'generator'>
>>>
>>> # Iterate over items and print them
>>> for element in the_generator:
...     print(element)
...
0
2
4
>>>
>>> # Everything looks the same, but the length...
>>> len(the_generator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
[/python]
Iterating over the list and the generator looks exactly the same. However, although the generator is iterable, it is not a collection, and thus has no length. Collections (lists, tuples, sets, etc.) keep all values in memory, and we can access them whenever needed. A generator calculates the values on the fly and forgets them, so it has no overview of its own result set.
Generators are especially useful for memory-intensive tasks, where there is no need to keep all of the elements of a memory-heavy list accessible at the same time. Calculating a series of values one-by-one can also be useful in situations where the complete result is never needed, yielding intermediate results to the caller until some requirement is satisfied and further processing stops.
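Both points can be sketched in a few lines. The variable names are illustrative, and the exact sizes reported by sys.getsizeof depend on the Python build, so only their relation matters here.

```python
import sys

# The list stores all 100000 results; the generator stores only its state.
squares_list = [n * n for n in range(100000)]
squares_gen = (n * n for n in range(100000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True

# Stop as soon as the requirement is satisfied:
# the remaining squares are never calculated.
first_big = None
for value in squares_gen:
    if value > 1000:
        first_big = value
        break

print(first_big)  # 1024
```

Note that sys.getsizeof measures only the container itself, not the integers it refers to, so the real gap in memory use is even larger.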
Using the Python "yield" keyword
A good example is a search task, where typically there is no need to wait for all results to be found. When performing a file-system search, a user would be happier to receive results on the fly rather than wait for the search engine to go through every single file and only afterwards return results. Are there any people who really navigate through all Google search results until the last page?
Since a search like this is awkward to express as a list comprehension, we are going to define a generator using a function with the yield statement. The yield statement should be placed where the generator returns an intermediate result to the caller and sleeps until the next invocation occurs. Let's define a generator that searches for some keyword in a huge text file line by line.
[python]
def search(keyword, filename):
    print('generator started')
    # "with" closes the file even if the generator is abandoned early
    with open(filename, 'r') as f:
        # Looping through the file line by line
        for line in f:
            if keyword in line:
                # If keyword found, hand the line to the caller
                yield line
[/python]
Now, assuming that my "directory.txt" file contains a huge list of names and phone numbers, let's find someone with "Python" in the name:
[python]
>>> the_generator = search('Python', 'directory.txt')
>>> # Nothing happened
[/python]
When we call the search function, its body does not run. The generator function only returns the generator object, acting as a constructor (shown for Python 3.x first, then for Python 2.x):
[python]
>>> type(search)
<class 'function'>
>>> type(the_generator)
<class 'generator'>
[/python]
[python]
>>> type(search)
<type 'function'>
>>> type(the_generator)
<type 'generator'>
[/python]
This is a bit tricky, since everything below def search(keyword, filename): is normally meant to execute after calling it, but not in the case of generators. In fact, there was even a long discussion suggesting the use of "gen" or other keywords to define a generator. However, Guido decided to stick with "def", and that's it. You can read the motivation in PEP 255.
To make the newly created generator calculate something, we need to access it via the iterator protocol, i.e. ask it for the next value. This is shown first with the built-in next() function, then with the Python 2.x next() method:
[python]
>>> print(next(the_generator))
generator started
Anton Pythonio 111-222-333
[/python]
[python]
>>> print(the_generator.next())
generator started
Anton Pythonio 111-222-333
[/python]
The debug string was printed, and we got the first search result without looking through the whole file. Now let's request the next match:
[python]
>>> print(next(the_generator))
Fritz Pythonmann 128-256-512
[/python]
[python]
>>> print(the_generator.next())
Fritz Pythonmann 128-256-512
[/python]
The generator resumed right after the last yield statement and went through the loop until it hit the yield statement again. Note that the debug string is not printed this time, because the code above the loop runs only once. However, Fritz is still not the right guy. Next, please:
[python]
>>> print(next(the_generator))
Guido Pythonista 123-456-789
[/python]
[python]
>>> print(the_generator.next())
Guido Pythonista 123-456-789
[/python]
Finally, we found him. Now you could call him and say "thank you" for great generators in Python!
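If you want to try this walkthrough without a real directory file, the following sketch writes a small temporary file first. The sample entries are made up for illustration (including the non-matching "Maria Rubino" line), and the debug print is left out to keep the output clean.

```python
import os
import tempfile


def search(keyword, filename):
    with open(filename, 'r') as f:
        for line in f:
            if keyword in line:
                yield line


# Made-up sample data standing in for "directory.txt"
sample = (
    "Anton Pythonio 111-222-333\n"
    "Maria Rubino 444-555-666\n"
    "Fritz Pythonmann 128-256-512\n"
    "Guido Pythonista 123-456-789\n"
)

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    matches = search('Python', path)
    first = next(matches)      # reads the file only up to the first match
    remaining = list(matches)  # drains the rest of the matches
finally:
    os.remove(path)

print(first.strip())   # Anton Pythonio 111-222-333
print(len(remaining))  # 2
```

Calling list() on a generator is a quick way to collect everything it still has to offer, at the cost of giving up the laziness.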
More generator details and examples
As you may have noticed, the first time the function runs, it goes from the beginning until it reaches the yield statement, returning the first result to the caller. Then, each subsequent call resumes the generator code where it left off. If the generator function finishes without hitting another yield statement, a StopIteration exception is raised (just like all iterators do when they are exhausted).
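A small sketch of what exhaustion looks like in practice. The generator two_values is a made-up example; the optional second argument of next() is a standard way to avoid the exception.

```python
def two_values():
    yield 'first'
    yield 'second'


gen = two_values()
collected = [next(gen), next(gen)]

# A third request finds no more yield statements to run.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# next() accepts a default that is returned instead of raising.
fallback = next(gen, 'no more values')

# A for loop catches StopIteration behind the scenes.
looped = [value for value in two_values()]

print(collected)  # ['first', 'second']
print(exhausted)  # True
print(fallback)   # no more values
```

This is why generators work in for loops without any special handling: the loop simply ends when StopIteration is raised.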
To reach a yield on subsequent calls, the generator can contain a loop or multiple yield statements:
[python]
def hold_client(name):
    yield 'Hello, %s! You will be connected soon' % name
    yield 'Dear %s, could you please wait a bit.' % name
    yield 'Sorry %s, we will play a nice music for you!' % name
    yield '%s, your call is extremely important to us!' % name
[/python]
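As a usage sketch, each request to the generator runs it up to the next yield statement, so the messages come out in the order they are written. The client name "Anna" is just an example value.

```python
def hold_client(name):
    yield 'Hello, %s! You will be connected soon' % name
    yield 'Dear %s, could you please wait a bit.' % name
    yield 'Sorry %s, we will play a nice music for you!' % name
    yield '%s, your call is extremely important to us!' % name


messages = hold_client('Anna')

# Ask for the greeting explicitly...
greeting = next(messages)
print(greeting)

# ...and let a for loop consume whatever is left.
follow_ups = []
for message in messages:
    follow_ups.append(message)
    print(message)
```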
It usually makes more sense to use a generator as a conveyor, chaining functions to work on some sequence efficiently. A good example is buffering: fetching data in large chunks and processing in small chunks:
[python]
def buffered_read():
    while True:
        # fetch_big_chunk() stands for any function that returns a large batch of items
        buffer = fetch_big_chunk()
        for small_chunk in buffer:
            yield small_chunk
[/python]
This approach allows the processing function to abstract away from any buffering issues. It can just get the values one by one using the generator that will take care of buffering.
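Here is a runnable sketch of such a conveyor. Since fetch_big_chunk is not defined above, this version assumes a hypothetical fetch_big_chunks generator that slices an in-memory list into batches, and only_even is a made-up processing step added for illustration.

```python
def fetch_big_chunks(data, chunk_size):
    # Hypothetical stand-in for a slow source (disk, network, database)
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def buffered_read(data, chunk_size):
    # Fetch in large chunks, hand out items one by one
    for buffer in fetch_big_chunks(data, chunk_size):
        for small_chunk in buffer:
            yield small_chunk


def only_even(numbers):
    # A processing step that knows nothing about buffering
    for number in numbers:
        if number % 2 == 0:
            yield number


pipeline = only_even(buffered_read(list(range(10)), chunk_size=4))
result = list(pipeline)
print(result)  # [0, 2, 4, 6, 8]
```

Nothing is fetched or filtered until list() starts pulling values through the chain, and each stage holds at most one chunk at a time.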
Even simple tasks can be more efficient using the idea of generators. In Python 2.x, the common range() function is often replaced by xrange(), which produces values on demand instead of creating the whole list at once:
[python]
>>> # In Python 3.x, "range" is already a lazy object, not a list
>>> type(range(0, 3))
<class 'range'>
>>> # xrange does not exist in Python 3.x
[/python]
[python]
>>> # "range" returns a list
>>> type(range(0, 3))
<type 'list'>
>>> # xrange returns a generator-like object "xrange"
>>> type(xrange(0, 3))
<type 'xrange'>
>>>
>>> # It can be used in loops just like range
>>> for i in xrange(0, 3):
...     print(i)
...
0
1
2
[/python]
And finally, a "classical" example of generators: calculating the first n Fibonacci numbers:
[python]
def fibonacci(n):
    curr = 1
    prev = 0
    counter = 0
    while counter < n:
        yield curr
        prev, curr = curr, prev + curr
        counter += 1
[/python]
Numbers are calculated until the counter reaches 'n'. This example is so popular because the Fibonacci sequence is infinite, so it can never be held in memory as a whole.
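As a usage sketch, the generator can be collected into a list when n is small. The second function, fibonacci_forever, is an illustrative variant that is not part of the example above: it drops the counter entirely and leaves it to the caller to decide when to stop, here with itertools.islice.

```python
from itertools import islice


def fibonacci(n):
    curr = 1
    prev = 0
    counter = 0
    while counter < n:
        yield curr
        prev, curr = curr, prev + curr
        counter += 1


def fibonacci_forever():
    # Illustrative variant: never stops on its own
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr


first_ten = list(fibonacci(10))
also_first_ten = list(islice(fibonacci_forever(), 10))

print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Calling list() directly on fibonacci_forever() would never finish, which is exactly the case where lazy evaluation is required rather than merely convenient.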
So far the most practical aspects of Python generators have been described. For more detailed info and an interesting discussion take a look at the Python Enhancement Proposal 255, which discusses the feature of the language in detail.
Happy Pythoning!