Concurrency & Parallelism in Python: Threads, Async, and Multiprocessing Explained
Ever wondered why Python's asyncio typically underperforms JavaScript in I/O-bound scenarios, or why CPU-bound tasks struggle with Python threads due to the GIL? Let’s break it down—without the jargon overload.
Part 1: Cores, Threads, and How Your CPU Works
What is a Core?
A core is an independent processing unit within a CPU that can execute instructions separately, enabling parallel processing—where multiple tasks run simultaneously across cores.
For example, the Intel Core i7-12700H has 14 physical cores (6 high-performance and 8 efficiency cores), allowing it to execute up to 14 tasks in parallel under ideal conditions, though real-world performance depends on workload, thermal limits, and shared resources like memory bandwidth.
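You can check how many hardware threads the OS sees on your own machine with the standard library:

```python
import os

# Number of logical CPUs (hardware threads) visible to the OS.
# May be higher than the physical core count when SMT/Hyper-Threading is enabled.
print(os.cpu_count())
```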
What is a Thread?
Hardware Thread:
A physical execution thread on a CPU core. A single core can support multiple hardware threads via Simultaneous Multithreading (SMT) (Intel’s proprietary implementation is branded as Hyper-Threading). Each hardware thread can independently execute instructions, though they share the core’s resources (e.g., ALU, cache).
Software Thread:
A virtual thread managed by the operating system, representing a unit of execution within a process. The OS scheduler maps software threads to hardware threads for execution. Multiple software threads can run concurrently on a single hardware thread (via time-slicing) or in parallel (if multiple hardware threads are available).
Process vs Thread
Process: An instance of a running program with isolated memory space (code, data, heap, stack).
Thread: A lightweight execution unit within a process that shares the process's memory space.
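A quick way to see the difference: threads inside one process mutate the same objects directly, with no copying or message passing involved. A minimal sketch:

```python
import threading

shared = []  # one list object, visible to every thread in the process

def worker(i):
    # Appends to the *same* list: threads share the process's memory space.
    shared.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))  # all three appends landed in the one shared list
```

A separate process, by contrast, would get its own copy of `shared` and leave the parent's list untouched.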
Concurrency
Concurrency is when multiple tasks are in progress at the same time, but not necessarily running at the exact same moment.
Two Types of Concurrency:
Parallel Concurrency:
Tasks actually run at the same time on different cores. Also known as Parallel Processing.
Requires:
Multi-core CPU
Independent tasks that can be scheduled in parallel
Non-Parallel Concurrency:
Tasks take turns running on a single core. The core rapidly switches between its hardware threads, giving the illusion of simultaneous execution, but only one thread is executing at any given moment.
Achieved via:
Time-slicing: The OS scheduler rapidly switches threads (often every few ms)
Hardware thread switching (when SMT/Hyper-Threading is present)
Part 2: Concurrency in Python — Threads, Multiprocessing, and Asyncio
The GIL (Global Interpreter Lock)
CPython, the standard Python interpreter, uses a Global Interpreter Lock (GIL).
It ensures that only one thread executes Python bytecode at a time, protecting the interpreter's internal state (such as reference counts) from race conditions and memory corruption.
It simplifies the design and implementation of CPython, making it easier to maintain.
While the GIL's performance trade-off was reasonable when most systems were single-core, it has become a significant bottleneck for CPU-bound multi-threaded workloads on modern multi-core processors.
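The bottleneck is easy to observe: on a stock (GIL) CPython build, splitting a pure-Python countdown across two threads does not cut the wall time, because the threads take turns holding the GIL. A rough timing sketch (exact numbers vary by machine):

```python
import threading
import time

N = 2_000_000

def countdown(n):
    # Pure-Python CPU work: holds the GIL while running.
    while n > 0:
        n -= 1

# Sequential baseline: one thread does all the work.
start = time.perf_counter()
countdown(N)
sequential = time.perf_counter() - start

# Two threads split the work, but contend for the single GIL.
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL build, `threaded` is roughly equal to (or worse than) `sequential`.
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```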
I/O-bound tasks
Operations where the program spends most of its time waiting for input/output (I/O) operations to complete, rather than performing CPU-heavy calculations.
e.g., file I/O operations, HTTP network requests, database queries.
CPU-bound tasks
Operations where the program spends most of its time performing heavy computational work that fully utilizes the CPU.
e.g., image processing, machine learning model training.
Python Threads: Concurrent but not Parallel
```python
import threading

def task():
    print("Running")

for _ in range(5):
    threading.Thread(target=task).start()
```
The program runs as a single process with multiple software threads.
All these threads share the same GIL, so even on a multi-core machine, only one thread executes Python bytecode at any moment.
Threads in Python are useful for I/O-bound tasks but not for CPU-bound operations.
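For I/O-bound work the picture flips: while one thread waits, the GIL is released and another thread runs. A sketch with `concurrent.futures`, using `time.sleep` as a stand-in for a blocking network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(0.2)  # stand-in for blocking I/O; the GIL is released while waiting
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(4)))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # the four 0.2s waits overlap instead of adding up
```

With four workers, the total wall time stays close to 0.2s rather than the 0.8s a sequential loop would take.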
Python Asyncio: Concurrent but not Parallel
```python
import asyncio

async def download():
    await asyncio.sleep(1)
    print("Done")

async def main():
    await asyncio.gather(*(download() for _ in range(5)))

asyncio.run(main())
```
The program runs as a single process with a single software thread.
All tasks in asyncio are managed by an event loop that runs within that same thread, using cooperative multitasking (tasks yield control back to the event loop while they wait).
asyncio is useful for I/O-bound tasks but not for CPU-bound operations: it enables non-blocking I/O, but CPU-bound work would block the event loop.
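When an async program must call blocking code it does not control, one common escape hatch is asyncio.to_thread (Python 3.9+), which runs the call in a worker thread so the event loop stays responsive. A minimal sketch, with `blocking_read` standing in for a legacy blocking API:

```python
import asyncio
import time

def blocking_read():
    time.sleep(0.2)  # stand-in for a blocking call (legacy file/DB API)
    return "data"

async def main():
    # The three blocking calls run in worker threads and overlap,
    # while the event loop itself is never blocked.
    return await asyncio.gather(*(asyncio.to_thread(blocking_read) for _ in range(3)))

print(asyncio.run(main()))  # → ['data', 'data', 'data']
```

Note this helps with blocking I/O, not CPU-bound work: the worker threads still share the GIL.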
Python Multiprocessing: True Parallelism
```python
from multiprocessing import Process

def task():
    print("Running")

if __name__ == "__main__":  # required on platforms that spawn new processes (Windows, macOS)
    for _ in range(5):
        Process(target=task).start()
```
The program runs as multiple processes (each process has its own thread and memory space).
All tasks in multiprocessing are managed by separate processes that run in parallel across CPU cores (when available), giving true parallelism for CPU-bound operations.
multiprocessing is useful for CPU-bound tasks because it bypasses the GIL: each process has its own interpreter and its own GIL, so they run in parallel. It is less efficient than asyncio for I/O-bound operations, since processes carry more startup and communication overhead and are not designed for managing non-blocking I/O in a cooperative manner.
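In practice, the higher-level Pool API handles process creation, work distribution, and result passing for you. A minimal sketch:

```python
from multiprocessing import Pool

def square(n):
    # Runs in a worker process, with its own interpreter and its own GIL.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() distributes the inputs across workers and gathers results in order.
        print(pool.map(square, range(5)))  # → [0, 1, 4, 9, 16]
```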
Bonus: Why Python Needs the GIL (When JavaScript Doesn’t)
You might wonder: "If Python's asyncio is single-threaded and event-driven like JavaScript, why does it still need the GIL?"
The answer lies in how the two languages handle memory sharing and task isolation:
1. Python’s Problem: Shared Memory Requires the GIL
asyncio coroutines share the same memory space.

```python
counter = 0

async def increment():
    global counter
    counter += 1  # mutates shared state on the process's single heap
```
Coroutines in one event loop never truly run at the same instant; they switch only at await points, so this increment() runs to completion uninterrupted. The deeper issue is that every coroutine, and any thread in the process, operates on the same heap of Python objects, and the GIL is what keeps CPython's internals (reference counts, object state) from being corrupted by concurrent access. Without the GIL, the interpreter itself would need fine-grained locking for safety, complicating far more than just async code.
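That said, application-level races are still possible in asyncio whenever a read-modify-write spans an await, because the event loop may run other coroutines at that suspension point. A sketch (`unsafe_increment` is an illustrative name, not an asyncio API):

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter
    await asyncio.sleep(0)   # suspension point: other coroutines run here
    counter = current + 1    # may overwrite increments made while suspended

async def main():
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(5)))
    return counter

print(asyncio.run(main()))  # → 1, not 5: every coroutine read 0 before any wrote back
```

The fix at the application level is an asyncio.Lock around the read-modify-write, or simply keeping the whole update between await points.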
2. JavaScript’s Advantage: No Shared Memory, No GIL Needed
JavaScript's model is single-threaded with run-to-completion semantics: each task (event handler, promise callback) finishes before the next one starts, and parallelism happens only in Web Workers, which do not share memory by default. Data is passed via:
Closures: isolated callback scope.
Promises/async-await: chainable, non-blocking operations.
Message passing (e.g., postMessage in Web Workers).
Example:
```javascript
let counter = 0;
setTimeout(() => { counter++; }, 0); // runs to completion
setTimeout(() => { counter++; }, 0); // runs after; no overlap
```
Each setTimeout callback executes atomically: no interleaving means no races.
Why Can’t Python Ditch the GIL?
Historical Design: Python’s C API and reference counting rely on the GIL. Many C extensions rely on the GIL for thread safety. Removing the GIL would require significant changes to these core aspects of Python, potentially breaking compatibility with existing code and slowing down single-threaded performance.
JavaScript's Head Start: Born for the web, where event-driven isolation was mandatory, JavaScript's architecture avoided the need for a global lock from the start. Python added asyncio later, on top of a shared-memory foundation.
Python Evolution: PEP 703 made the GIL optional, and Python 3.13 ships an experimental free-threaded build that runs without it, allowing multi-threaded applications to take full advantage of multiple cores. This is still an ongoing effort, and compatibility and performance implications are being evaluated.
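You can check whether your interpreter is a free-threaded build; a sketch using the build-configuration flag that free-threaded CPython sets:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded ("no-GIL") builds of CPython 3.13+;
# on ordinary builds the variable is absent or 0.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(sys.version_info[:2], "free-threaded build:", free_threaded)
```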