Last Updated: Sunday 2nd February 2014

At a glance, the yield statement is used to define generators. It replaces the return of a function, providing a result to its caller without destroying the function's local variables. A regular function starts with a new set of variables on each call. A generator, by contrast, resumes execution where it left off.
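Here is a minimal sketch of that difference in Python 3. Both function names are hypothetical. The regular function starts over with fresh locals on every call. The generator keeps its locals between resumptions.

```python
def count_function():
    count = 0          # a fresh local variable on every call
    count += 1
    return count


def count_generator():
    count = 0          # initialized once, when the generator body first runs
    while True:
        count += 1
        yield count    # pause here; `count` survives until the next resume


print(count_function(), count_function())  # 1 1

gen = count_generator()
print(next(gen), next(gen))  # 1 2
```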

About Python Generators

Since the yield keyword is only used with generators, it makes sense to recall the concept of generators first.

The idea of generators is to calculate a series of results one by one, on demand (on the fly). In the simplest case, a generator can be used like a list whose elements are calculated lazily. Let's compare a list and a generator that do the same thing: return powers of two.
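A minimal Python 3 sketch of that comparison follows; the names the_list and the_generator are illustrative. The list comprehension builds all ten values immediately. The generator expression computes each value only when the loop asks for it.

```python
# A list comprehension computes all ten values up front and stores them
the_list = [2 ** x for x in range(10)]

# A generator expression computes each value only when it is requested
the_generator = (2 ** x for x in range(10))

for value in the_list:
    print(value, end=" ")   # 1 2 4 8 16 32 64 128 256 512
print()

for value in the_generator:
    print(value, end=" ")   # 1 2 4 8 16 32 64 128 256 512
print()
```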

Iterating over the list and the generator looks exactly the same. However, although the generator is iterable, it is not a collection, and thus has no length. Collections (lists, tuples, sets, etc.) keep all their values in memory, so we can access them whenever needed. A generator calculates the values on the fly and then forgets them, so it has no overview of its own result set.
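This short sketch makes both points concrete, reusing the same illustrative names. First, len() works on the list but raises TypeError on the generator. Second, once the generator's values have been produced they are gone, so a second pass finds nothing.

```python
the_list = [2 ** x for x in range(10)]
the_generator = (2 ** x for x in range(10))

print(len(the_list))  # 10

# Generators do not implement __len__, so len() raises TypeError
try:
    len(the_generator)
    generator_has_len = True
except TypeError:
    generator_has_len = False
print(generator_has_len)  # False

# Values are forgotten once produced: a second pass finds nothing left
first_pass = list(the_generator)
second_pass = list(the_generator)
print(first_pass == the_list, second_pass)  # True []
```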

Generators are especially useful for memory-intensive tasks where there is no need to keep all of the elements of a memory-heavy list accessible at the same time. Calculating a series of values one by one is also useful when the complete result is never needed. The generator yields intermediate results to the caller until some requirement is satisfied, and then further processing stops.
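Here is a minimal sketch of stopping early. powers_of_two and the produced list are hypothetical and exist only to make the laziness visible. The generator below never ends, so a list of all its values would be impossible. Even so, the search stops after computing just the values it needs.

```python
import itertools

produced = []  # records every value the generator actually computes


def powers_of_two():
    """Yield 1, 2, 4, 8, ... without end; each value is computed on demand."""
    for exponent in itertools.count():
        value = 2 ** exponent
        produced.append(value)
        yield value


# Stop as soon as the requirement is met: the first power of two above 1000
first_big = next(p for p in powers_of_two() if p > 1000)
print(first_big)      # 1024
print(len(produced))  # 11: only 2**0 through 2**10 were ever computed
```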

Using the Python "yield" keyword

A good example is a search task, where there is typically no need to wait for all results to be found. When performing a file-system search, a user would be happier to receive results on the fly rather than wait for the search engine to go through every single file and only then return results. Does anyone really click through Google search results all the way to the last page?

A list comprehension would build the complete list of matches before returning anything, so instead we are going to define a generator using a function with the yield statement. The yield statement goes at the point where the generator returns an intermediate result to the caller and then pauses until the next request arrives. Let's define a generator that searches a huge text file for some keyword, line by line.

Now, assuming that my "directory.txt" file contains a huge list of names and phone numbers, let's find someone with "Python" in the name.
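Here is a minimal Python 3 sketch of such a search generator. So that the example runs anywhere, it writes a tiny hypothetical stand-in for directory.txt (made-up names and numbers) to a temporary file. Two details are choices of this sketch, not necessarily of the original code: the with block that closes the file, and the stripping of trailing newlines.

```python
import os
import tempfile


def search(keyword, filename):
    """Yield each line of `filename` that contains `keyword`, one at a time."""
    print("generator started")  # debug message: shows when the body first runs
    with open(filename) as f:
        for line in f:  # the file is read lazily, line by line
            if keyword in line:
                # Hand the match to the caller and pause here until asked again
                yield line.rstrip("\n")


# Hypothetical stand-in for directory.txt (made-up names and numbers)
sample = "Ann Smith 555-0101\nMonty Python 555-0102\nAda Python 555-0103\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

matches = search("Python", tmp.name)
first = next(matches)   # prints "generator started", then finds the first match
second = next(matches)  # resumes after the yield and finds the next match
print(first)   # Monty Python 555-0102
print(second)  # Ada Python 555-0103

matches.close()  # finish the generator early, which also closes the file
os.remove(tmp.name)
```

Calling matches.close() before deleting the file matters on some platforms. The suspended generator is still inside the with block, so the file stays open until the generator is closed or exhausted.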

When we call the search function, its body does not run. The generator function only returns a generator object, acting as a constructor.
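Here is a minimal sketch of that behavior, using a hypothetical generator named traced. Creating it prints nothing. inspect.getgeneratorstate() (available since Python 3.2) reports that the generator has not started yet.

```python
import inspect


def traced():
    """A hypothetical generator that announces when its body starts running."""
    print("body started")
    yield 1


gen = traced()                          # prints nothing: the body has not run
print(type(gen).__name__)               # generator
print(inspect.getgeneratorstate(gen))   # GEN_CREATED
```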

This is a bit tricky, since everything below def search(keyword, filename): would normally execute when the function is called, but not in the case of generators. In fact, there was even a long discussion about using "gen" or some other new keyword to define generators. However, Guido decided to stick with "def", and that's it. You can read the motivation in PEP 255.

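To make the newly created generator calculate something, we need to access it via the iterator protocol, i.e. call its next method. That is gen.next() in Python 2 and gen.__next__() in Python 3. Alternatively, use the built-in next(gen), which works in Python 2.6+ and Python 3.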

The debug string was printed, and we got the first search result without looking through the whole file. Now let's request the next match.

The generator resumed right after the last yield statement and went through the loop until it hit yield again. However, Fritz is still not the right guy. Next, please!

Finally, we found him. Now you could call him and say "thank you" for great generators in Python!
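You can reproduce the same next()-driven walk-through in a few lines with a hypothetical two_steps generator; no directory file is needed. The sketch shows that the generator stays suspended at its yield between calls. It also shows that in Python 3 the method behind next() is called __next__() (Python 2 named it next()).

```python
import inspect


def two_steps():
    """A hypothetical two-value generator used to show next() and resuming."""
    print("running up to the first yield")
    yield "first"
    print("resumed after the first yield")
    yield "second"


gen = two_steps()
first = next(gen)                       # runs the body up to the first yield
state = inspect.getgeneratorstate(gen)  # paused at that yield
second = gen.__next__()                 # same as next(gen) in Python 3
print(first, state, second)             # first GEN_SUSPENDED second
```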

More generator details and examples

As you may have noticed, the first time the generator runs, it starts at the beginning of the function body and continues until it reaches a yield statement, returning the first result to the caller. Each subsequent call then resumes the generator code where it left off. If the generator function finishes without reaching another yield statement, the call raises a StopIteration exception (just like every iterator does when it is exhausted).
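Here is a minimal sketch with a hypothetical one_value generator. Once its body finishes without reaching another yield, next() raises StopIteration. A for loop treats that exception as the normal end of iteration.

```python
def one_value():
    """A hypothetical generator that yields once and then returns."""
    yield "only value"
    # Falling off the end here makes the next request raise StopIteration


gen = one_value()
print(next(gen))  # only value

try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# A for loop catches StopIteration itself and simply stops
for item in one_value():
    print(item)  # only value
```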

To produce values on subsequent calls, the generator can contain a loop or multiple yield statements.
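Both styles are shown below in a minimal sketch; the function names are hypothetical. Each next() runs the body forward to the following yield, whether that yield is a separate statement or the same statement reached again inside a loop.

```python
def multi_yield():
    """Several yield statements: each next() runs to the following one."""
    yield "first"
    yield "second"
    yield "third"


def countdown(n):
    """A loop around a single yield statement produces many values."""
    while n > 0:
        yield n
        n -= 1


print(list(multi_yield()))  # ['first', 'second', 'third']
print(list(countdown(3)))   # [3, 2, 1]
```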

It usually makes more sense to use a generator as a conveyor, chaining functions to work on some sequence efficiently. A good example is buffering: fetching data in large chunks and processing it in small chunks.

This approach lets the processing function ignore buffering entirely. It just receives the values one by one from the generator, which takes care of the buffering.
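Here is a minimal runnable sketch of that buffering idea. The data source is simulated. make_fetcher and fetch_big_chunk are hypothetical names standing in for whatever actually delivers data in large chunks, such as a socket, a file, or a database cursor.

```python
def make_fetcher(data, chunk_size):
    """Hypothetical data source: returns up to `chunk_size` items per call,
    and an empty list once everything has been fetched."""
    position = 0

    def fetch_big_chunk():
        nonlocal position
        chunk = data[position:position + chunk_size]
        position += len(chunk)
        return chunk

    return fetch_big_chunk


def buffered_read(fetch_big_chunk):
    """Fetch data in large chunks, but hand it out one item at a time."""
    while True:
        buffer = fetch_big_chunk()
        if not buffer:  # source exhausted: end the generator
            return
        for item in buffer:
            yield item


def process(items):
    """The consumer sees a flat stream and knows nothing about chunking."""
    return sum(items)


fetch = make_fetcher(list(range(10)), chunk_size=4)
print(process(buffered_read(fetch)))  # 45
```

Here the chunk size (4) is deliberately tiny so the chunk boundaries are easy to follow. A real source would use much larger chunks, and process() would not change at all.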

Even simple tasks can be more efficient using the idea of generators. In Python 2.x, the common range() function is often replaced by xrange(), which produces values on demand instead of creating the whole list at once.
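Here is a minimal sketch of the same idea written as a generator. my_xrange is a hypothetical name, and it handles only positive steps. Note that Python 2's real xrange() is a lazy sequence object rather than a generator. In Python 3, the built-in range() took over that role.

```python
def my_xrange(start, stop, step=1):
    """A generator version of a lazy range (positive step only, for brevity)."""
    current = start
    while current < stop:
        yield current
        current += step


print(list(my_xrange(0, 10, 3)))  # [0, 3, 6, 9]

# In Python 3, the built-in range() is already lazy, so even a huge range
# does not build a list of its values in memory
huge = range(10 ** 9)
print(huge[5], len(huge))  # 5 1000000000
```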

And finally, a "classical" example of generators: calculating the first n Fibonacci numbers.
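Here is a minimal sketch, assuming the sequence starts at 0 (some presentations start at 1). The variable is named nxt so that it does not shadow the built-in next().

```python
def fibonacci(n):
    """Yield the first n Fibonacci numbers, one at a time."""
    curr, nxt = 0, 1
    counter = 0
    while counter < n:
        yield curr
        curr, nxt = nxt, curr + nxt
        counter += 1


print(list(fibonacci(10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```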

The numbers are calculated until the counter reaches n. This example is popular because the Fibonacci sequence is infinite, so it could never fit in memory as a whole. A generator produces only as many terms as the caller asks for.

That covers the most practical aspects of Python generators. For more detailed information and an interesting discussion, take a look at Python Enhancement Proposal 255 (PEP 255), which discusses this language feature in detail.

Happy Pythoning!

  • pinkfloyda

    In the above code, the_generator = (x+x for x in range(3)), I think the output should be 0 2 4

    • Jackson Cooper

      Yeah, you’re right. Thanks, it’s been fixed.

  • Bharat Khatri

    ‘generator started’ won’t be printed every time you call next(the_generator), but only the first time.

  • RG1985

    What do you mean by “When we call the search function, its body code does not run”? If this statement is true, the debug message should not appear on the next call either! I tried to understand what you are saying using this example:

    def generator():
        print("generator start")
        for x in xrange(10):
            yield x


    generator start
    generator start
    According to your explanation, we should get:
    generator start