A Subtle Python Threading Bug That Isn't About Threading
I recently encountered a puzzling bug in code that looked something like this:
import threading
import numpy as np

def case1(num_threads=8):
    nums = np.arange(2000)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(sum(batch)))
        for batch in np.array_split(nums, num_threads)
    ]

    for t in threads:
        t.start()

    for t in threads:
        t.join()
    return sum(results)
This is a pretty common pattern for handling embarrassingly parallel problems in Python. We have a worker function (in this case, sum) that takes a chunk of our data, processes it, and returns a result. My real use case involved loading a large number of small parquet files from cloud storage, an IO-bound problem and therefore a good candidate for threading. Unfortunately, the above code has a fatal flaw. Can you spot it?
I'll give you a hint: all the talk about threading is irrelevant.
The issue is how Python handles closures, like the lambda function in the example.
Closures in Python
Closure: A closure is a function object that has access to variables from its enclosing lexical scope, even after that scope has finished executing.
In the code above, the lambda function is a closure because it captures batch. Here batch is captured by reference, not by value, so all threads end up using the same batch value, the last one from the loop. Beyond lambdas, Python creates closures whenever you have nested functions, generator expressions, or decorator functions that access variables from their enclosing scope.
Variable Capturing: The process by which a closure gains access to and retains references to variables from its enclosing scope, allowing it to use those variables even after the enclosing scope has finished executing.
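To see what "retains references" means in practice, here is a minimal sketch (the names outer, show, and msg are just illustrative): rebinding the captured name after the inner function is defined changes what the closure returns.
>>> def outer():
...     msg = "before"
...     def show():
...         return msg  # msg is looked up in the enclosing scope when show() runs
...     msg = "after"   # rebinding is visible to the closure
...     return show
...
>>> outer()()
'after'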
Python creates closures using a mechanism called "cell objects". When a nested function references a variable from an enclosing scope, Python creates a cell to hold a reference to that variable. The closure stores these cells in its __closure__ attribute, which you can inspect on any closure function. Each cell tracks the variable itself, not a copy of its value at creation time. This is why all lambdas created in a loop often end up with the same value: they all point to the same cell, which references whatever the loop variable holds after the loop has finished iterating.
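A quick sketch of inspecting this in a CPython REPL; the lambdas have to live inside a function, because module-level names are globals rather than closure variables. All three lambdas share a single cell, which ends up holding the loop's final value.
>>> def make_funcs():
...     funcs = []
...     for i in range(3):
...         funcs.append(lambda: i)  # every lambda closes over the same i
...     return funcs
...
>>> funcs = make_funcs()
>>> [f() for f in funcs]
[2, 2, 2]
>>> [f.__closure__[0].cell_contents for f in funcs]  # the same cell, seen through each lambda
[2, 2, 2]
>>> funcs[0].__closure__[0] is funcs[1].__closure__[0]
True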
This behavior differs from languages like Rust or Go, where loop variables are typically captured by value. JavaScript has similar late binding, but block scoping with let creates a new binding for each iteration. Python's approach is consistent with its general philosophy of variables as name bindings rather than storage locations, but it can surprise developers coming from other languages.
A Minimal Example
Now that we're all up to speed on closures, let's break the problem down to its simplest form. We use a lambda in a list comprehension to initialize x, a list of functions. On the following line, another list comprehension evaluates each function in x and collects the results in a new list.
>>> x = [lambda: i for i in range(5)]
>>> [f() for f in x]
[4, 4, 4, 4, 4]
As you can see, the output is [4, 4, 4, 4, 4], not the [0, 1, 2, 3, 4] you might expect.
To get the expected output, we must capture the value at lambda creation:
>>> x = [lambda i=i: i for i in range(5)]
>>> [f() for f in x]
[0, 1, 2, 3, 4]
A subtle change, but very different behavior.
In this snippet a default argument is used to capture the current value of i at each iteration. Default arguments are evaluated when the function is defined, so each lambda gets its own parameter i that shadows the outer i. This effectively captures the value by creating a new binding in the lambda's local scope.
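For comparison, a factory function is another common way to get the same effect, since each call to the factory creates a fresh scope for the lambda to close over (make_getter is just an illustrative name):
>>> def make_getter(i):
...     return lambda: i  # closes over the factory's local i, one per call
...
>>> x = [make_getter(i) for i in range(5)]
>>> [f() for f in x]
[0, 1, 2, 3, 4]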
Late Binding in Python
This behavior is called late binding. Python uses late binding because it keeps the language's semantics simple and uniform: names are always resolved at runtime, and objects are created when their definitions are executed. This design makes closures lightweight, but it can lead to subtle bugs if you're not careful.
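This applies to any free name, not just loop variables. A small sketch of what "resolved at runtime" looks like:
>>> def shout():
...     return greeting.upper()  # greeting is looked up each time shout() runs
...
>>> greeting = "hello"
>>> shout()
'HELLO'
>>> greeting = "goodbye"
>>> shout()
'GOODBYE'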
Contrast with Other Languages
Most compiled languages use early binding by default, capturing the value at the time the closure is created. This avoids the bug shown above, enables (often significant) compiler optimizations, and reduces runtime errors via compiler checks.
Some argue that late binding offers more expressiveness, allowing programmers to write more flexible and dynamic code. I'm personally unconvinced; intuitive code is good code. The same expressiveness can be achieved with early binding through explicit captures or higher-order functions, without the footgun of unexpected variable sharing. The "flexibility" of late binding is really just ambiguity that leads to bugs like the one we've seen, along with missed optimization opportunities that compiled languages can leverage.
Takeaway
Understanding the subtle details of how your programming language works is crucial for writing correct code. Python's late binding behavior is consistent and predictable once you know about it, but it violates many programmers' intuitions. When creating closures in loops, always be explicit about which values you want to capture. This can be achieved with default arguments, factory functions, or functools.partial.
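For completeness, here is a sketch of one possible fix for the opening example, using a default argument so each thread captures its own chunk at lambda-creation time (case1_fixed is just an illustrative name; a functools.partial over a named worker function would work just as well):
import threading
import numpy as np

def case1_fixed(num_threads=8):
    nums = np.arange(2000)
    results = []
    threads = [
        # batch=batch evaluates the default once per lambda, capturing the current chunk
        threading.Thread(target=lambda batch=batch: results.append(sum(batch)))
        for batch in np.array_split(nums, num_threads)
    ]

    for t in threads:
        t.start()

    for t in threads:
        t.join()
    return sum(results)  # now sums every chunk, not copies of the last one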
Remember, clear and explicit is good; future readers of your code (statistically likely to be you) will thank you.