Snake_Bytes #2: Generators

Welcome to the second installment of Snake_Bytes! Today we're going to take a quick glance at generators.

Normal functions produce one result from their body using the return statement. You call the function, computation happens, then a value is returned. You can put return statements in different branches (early exits, for example), but the important point is that each call exits only once.
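As a minimal sketch of that single-exit behavior (the function name and labels are illustrative assumptions, not from the original post), here is a function with several return statements, only one of which runs per call:

```python
def classify(n):
    """Return a label for n. Exactly one return executes per call."""
    if n < 0:
        return "negative"  # early exit: nothing below runs
    if n == 0:
        return "zero"      # another early exit
    return "positive"      # the fall-through case


print(classify(-5))
print(classify(0))
print(classify(7))
```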

On the other hand, functions acting as generators conceptually return more than once: they hand back values without fully exiting. Overloading the return statement for this would be confusing, so multiple returns are marked with the yield expression instead. (Note that yield is an expression, not a statement, so it can appear in places where return cannot.) Any function whose body contains yield is automatically a generator function.
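A minimal sketch of such a function follows. The name and the yielded values are illustrative assumptions rather than the post's original listing: it yields once up front, then twice on each pass through a loop, then falls off the end.

```python
def start_then_pairs(n):
    """Hypothetical generator: one yield up front, two per loop pass."""
    yield "start"        # yielded once, before the loop
    for i in range(n):
        yield i          # first yield of this pass
        yield i * 10     # second yield of this pass
    # Falling off the end finishes the generator (no more values).


print(list(start_then_pairs(2)))
```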

This example yields once at the beginning, then twice for every pass through the loop, then it exits. Where do these yielded values end up? Through some compiler magic, calling a generator function does not run its body right away. Instead it returns an iterator that you can consume with any of Python's iteration constructs, such as a for loop or a list comprehension.
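To make the consuming side concrete, here is a small sketch using a hypothetical squares generator (not from the original post). It shows three common ways to pull values out: next(), a for loop, and a list comprehension.

```python
def squares(n):
    """Hypothetical generator: yields i * i for i in 0..n-1."""
    for i in range(n):
        yield i * i


gen = squares(4)        # nothing runs yet; we only get an iterator back
first = next(gen)       # runs the body up to the first yield, giving 0

rest = []
for value in gen:       # the for loop resumes where next() left off
    rest.append(value)  # collects 1, 4, 9

plus_one = [x + 1 for x in squares(3)]  # a comprehension consumes it too
total = sum(squares(4))                 # so does any function taking an iterable

print(first, rest, plus_one, total)
```

Note that a generator is single-use: once gen is exhausted, iterating it again produces nothing, so call squares again for a fresh pass.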

(Aside: The compiler magic is sometimes referred to as a state-machine transformation, and sometimes as "inversion of control." You can imagine the compiler taking scissors to the function and making cuts at all the yield expressions. Each piece, called a block, is labeled with an integer ID and then wired up behind what is essentially a big switch statement. The integer IDs are designed to mimic the expected control flow. Treat this as a mental model: it is how some languages compile generators, while CPython itself keeps the function's frame alive and resumes it at the point where it last yielded.)
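To illustrate the mental model, here is a hand-written sketch of what such a transformation could produce for a two-yield generator. The class and its state numbering are assumptions made for illustration; this is not code that Python generates.

```python
def greet():
    """The generator we are imagining being cut into blocks."""
    yield "hello"
    yield "world"


class GreetStateMachine:
    """Hand-rolled equivalent: one integer state per block between yields."""

    def __init__(self):
        self._state = 0  # ID of the block to run next

    def __iter__(self):
        return self

    def __next__(self):
        # The "big switch statement", written as an if chain.
        if self._state == 0:
            self._state = 1
            return "hello"
        if self._state == 1:
            self._state = 2
            return "world"
        raise StopIteration  # no blocks left


print(list(greet()), list(GreetStateMachine()))
```

Both versions produce the same sequence; the generator function simply lets the language do the bookkeeping.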

Mechanics aside, what can you use generators for?

  • Lazy streaming: producing one item at a time rather than eagerly building and returning a complete collection, which reduces memory pressure
  • Eliminating the boilerplate of implementing iterators (like the tricky __next__ method and StopIteration exception)
  • Generally good practice: separating the recipe for producing values from the code that consumes them
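
The points above can be sketched together in a few lines. The generator names here are hypothetical: one produces an endless stream, another filters it, and the consumer decides how much to take using itertools.islice from the standard library.

```python
import itertools


def naturals():
    """Hypothetical infinite stream: 0, 1, 2, ... produced on demand."""
    n = 0
    while True:
        yield n
        n += 1


def evens(numbers):
    """Recipe for filtering; it knows nothing about who consumes it."""
    for n in numbers:
        if n % 2 == 0:
            yield n


# The consumer picks how many items it wants. Only those are ever computed,
# so the infinite stream never has to fit in memory.
first_five = list(itertools.islice(evens(naturals()), 5))
print(first_five)
```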

One of my favorite applications of generators is wrapping an API that paginates results with limit/offset.
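Here is a minimal sketch of the idea. It assumes a hypothetical fetch_page(limit, offset) function standing in for a real API call, backed here by an in-memory list so the example runs on its own. The generator hides the paging and hands the caller one flat stream of records.

```python
# Fake data standing in for whatever the remote API would return.
FAKE_RECORDS = [f"record-{i}" for i in range(7)]


def fetch_page(limit, offset):
    """Hypothetical stand-in for a paginated API call."""
    return FAKE_RECORDS[offset:offset + limit]


def iter_records(fetch, limit=3):
    """Yield records one at a time, requesting pages only as needed."""
    offset = 0
    while True:
        page = fetch(limit=limit, offset=offset)
        if not page:
            return            # an empty page means there is nothing left
        yield from page       # hand out this page's items one by one
        offset += len(page)   # advance to the next page


for record in iter_records(fetch_page):
    print(record)
```

Because the generator is lazy, a caller that stops after the first few records never triggers requests for the remaining pages.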

You can imagine generators on top of APIs, database result sets, lines in a file, or all sorts of things that look like streams. We'll leave this edition with a haiku:

About Ghadi Shayban

I work on backend data engineering at PokitDok, and I'm passionate about bringing health care informatics a couple decades forward. If I were stranded on a desert island and had a laptop, I'd open up Emacs and a Clojure REPL. When not on a computer keyboard, I can be found on a piano keyboard, playing around Charleston, SC.
