Concurrency & Parallelism in Python: Threads, Async, and Multiprocessing Explained
Ever wondered why Python's asyncio typically underperforms JavaScript in I/O-bound scenarios, or why CPU-bound tasks struggle with Python threads due to the GIL? Let’s break it down—without the jargon overload.
Part 1: Cores, Threads, and How Your CPU Works
What is a Core?
A core is an independent processing unit within a CPU that can execute instructions separately, enabling parallel processing—where multiple tasks run simultaneously across cores.
For example, the Intel Core i7-12700H has 14 physical cores (6 high-performance and 8 efficiency cores), allowing it to execute up to 14 tasks in parallel under ideal conditions, though real-world performance depends on workload, thermal limits, and shared resources like memory bandwidth.
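You can check how many hardware threads the OS sees on your own machine with the standard library:

```python
import os

# Number of logical CPUs (hardware threads) visible to the OS.
# May be higher than the physical core count when SMT/Hyper-Threading is enabled.
print(os.cpu_count())
```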
What is a Thread?
Hardware Thread:
A physical execution thread on a CPU core. A single core can support multiple hardware threads via Simultaneous Multithreading (SMT) (Intel’s proprietary implementation is branded as Hyper-Threading). Each hardware thread can independently execute instructions, though they share the core’s resources (e.g., ALU, cache).
Software Thread:
A virtual thread managed by the operating system, representing a unit of execution within a process. The OS scheduler maps software threads to hardware threads for execution. Multiple software threads can run concurrently on a single hardware thread (via time-slicing) or in parallel (if multiple hardware threads are available).
Process vs Thread
Process: An instance of a running program with isolated memory space (code, data, heap, stack).
Thread: A lightweight execution unit within a process that shares the process's memory space.
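A quick way to see the difference: threads inside one process mutate the same objects directly, with no copying or message passing involved. A minimal sketch:

```python
import threading

shared = []  # one list object, visible to every thread in the process

def worker(i):
    # Appends to the *same* list: threads share the process's memory space.
    shared.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))  # all three appends landed in the one shared list
```

A separate process, by contrast, would get its own copy of `shared` and leave the parent's list untouched.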
Concurrency
Concurrency is when multiple tasks are in progress at the same time, but not necessarily running at the exact same moment.
Two Types of Concurrency:
Parallel Concurrency:
Tasks actually run at the same time on different cores. Also known as Parallel Processing.
Requires:
Multi-core CPU
Independent tasks that can be scheduled in parallel
Non-Parallel Concurrency:
Tasks take turns running on a single core. The core rapidly switches between its hardware threads, giving the illusion of simultaneous execution, but only one thread is executing at any given moment.
Achieved via:
Time-slicing: The OS scheduler rapidly switches threads (often every few ms)
Hardware thread switching (when SMT/Hyper-Threading is present)
Part 2: Concurrency in Python — Threads, Multiprocessing, and Asyncio
The GIL (Global Interpreter Lock)
CPython, the standard Python interpreter, uses a Global Interpreter Lock (GIL).
It ensures that only one thread executes Python bytecode at a time, protecting the interpreter's internal state (such as reference counts) from race conditions and memory corruption.
It simplifies the design and implementation of CPython, making it easier to maintain.
While the GIL's performance trade-off was reasonable when most systems were single-core, it has become a significant bottleneck for CPU-bound multi-threaded workloads on modern multi-core processors.
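The bottleneck is easy to observe: on a stock (GIL) CPython build, splitting a pure-Python countdown across two threads does not cut the wall time, because the threads take turns holding the GIL. A rough timing sketch (exact numbers vary by machine):

```python
import threading
import time

N = 2_000_000

def countdown(n):
    # Pure-Python CPU work: holds the GIL while running.
    while n > 0:
        n -= 1

# Sequential baseline: one thread does all the work.
start = time.perf_counter()
countdown(N)
sequential = time.perf_counter() - start

# Two threads split the work, but contend for the single GIL.
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL build, `threaded` is roughly equal to (or worse than) `sequential`.
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```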
I/O-bound tasks
Operations where the program spends most of its time waiting for input/output (I/O) operations to complete, rather than performing CPU-heavy calculations.
e.g., file I/O operations, HTTP network requests, database queries.
CPU-bound tasks
Operations where the program spends most of its time performing heavy computational work that fully utilizes the CPU.
e.g., image processing, machine learning model training.
Python Threads: Concurrent but not Parallel
```python
import threading

def task():
    print("Running")

for _ in range(5):
    threading.Thread(target=task).start()
```
The program runs as a single process with multiple software threads.
All these threads share the same GIL, so even on a multi-core machine, only one thread executes Python bytecode at any moment.
Threads in Python are useful for I/O-bound tasks but not for CPU-bound operations.
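For I/O-bound work the picture flips: while one thread waits, the GIL is released and another thread runs. A sketch with `concurrent.futures`, using `time.sleep` as a stand-in for a blocking network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(0.2)  # stand-in for blocking I/O; the GIL is released while waiting
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(4)))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # the four 0.2s waits overlap instead of adding up
```

With four workers, the total wall time stays close to 0.2s rather than the 0.8s a sequential loop would take.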
Python Asyncio: Concurrent but not Parallel
```python
import asyncio

async def download():
    await asyncio.sleep(1)
    print("Done")

async def main():
    await asyncio.gather(*(download() for _ in range(5)))

asyncio.run(main())
```
The program runs as a single process with a single software thread.
All tasks in asyncio are managed by an event loop that runs within that same thread, using cooperative multitasking (tasks yield control back to the event loop while they wait).
asyncio is useful for I/O-bound tasks but not for CPU-bound operations: it enables non-blocking I/O, but CPU-bound work would block the event loop.
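When an async program must call blocking code it does not control, one common escape hatch is asyncio.to_thread (Python 3.9+), which runs the call in a worker thread so the event loop stays responsive. A minimal sketch, with `blocking_read` standing in for a legacy blocking API:

```python
import asyncio
import time

def blocking_read():
    time.sleep(0.2)  # stand-in for a blocking call (legacy file/DB API)
    return "data"

async def main():
    # The three blocking calls run in worker threads and overlap,
    # while the event loop itself is never blocked.
    return await asyncio.gather(*(asyncio.to_thread(blocking_read) for _ in range(3)))

print(asyncio.run(main()))  # → ['data', 'data', 'data']
```

Note this helps with blocking I/O, not CPU-bound work: the worker threads still share the GIL.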
Python Multiprocessing: True Parallelism
```python
from multiprocessing import Process

def task():
    print("Running")

if __name__ == "__main__":  # required on platforms that spawn new processes (Windows, macOS)
    for _ in range(5):
        Process(target=task).start()
```
The program runs as multiple processes (each process has its own thread and memory space).
All tasks in multiprocessing are managed by separate processes that run in parallel across CPU cores (when available), giving true parallelism for CPU-bound operations.
multiprocessing is useful for CPU-bound tasks because it bypasses the GIL: each process has its own interpreter and its own GIL, so they run in parallel. It is less efficient than asyncio for I/O-bound operations, since processes carry more startup and communication overhead and are not designed for managing non-blocking I/O in a cooperative manner.
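In practice, the higher-level Pool API handles process creation, work distribution, and result passing for you. A minimal sketch:

```python
from multiprocessing import Pool

def square(n):
    # Runs in a worker process, with its own interpreter and its own GIL.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() distributes the inputs across workers and gathers results in order.
        print(pool.map(square, range(5)))  # → [0, 1, 4, 9, 16]
```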
Bonus: Why Python Needs the GIL (When JavaScript Doesn’t)
You might wonder: "If Python's asyncio is single-threaded and event-driven like JavaScript, why does it still need the GIL?"
The answer lies in how the two languages handle memory sharing and task isolation:
1. Python’s Problem: Shared Memory Requires the GIL
asyncio coroutines share the same memory space.

```python
counter = 0

async def increment():
    global counter
    counter += 1  # mutates shared state on the process's single heap
```
Coroutines in one event loop never truly run at the same instant; they switch only at await points, so this increment() runs to completion uninterrupted. The deeper issue is that every coroutine, and any thread in the process, operates on the same heap of Python objects, and the GIL is what keeps CPython's internals (reference counts, object state) from being corrupted by concurrent access. Without the GIL, the interpreter itself would need fine-grained locking for safety, complicating far more than just async code.
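That said, application-level races are still possible in asyncio whenever a read-modify-write spans an await, because the event loop may run other coroutines at that suspension point. A sketch (`unsafe_increment` is an illustrative name, not an asyncio API):

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter
    await asyncio.sleep(0)   # suspension point: other coroutines run here
    counter = current + 1    # may overwrite increments made while suspended

async def main():
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(5)))
    return counter

print(asyncio.run(main()))  # → 1, not 5: every coroutine read 0 before any wrote back
```

The fix at the application level is an asyncio.Lock around the read-modify-write, or simply keeping the whole update between await points.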
2. JavaScript’s Advantage: No Shared Memory, No GIL Needed
JavaScript's model is single-threaded with run-to-completion semantics: each task (event handler, promise callback) finishes before the next one starts, and parallelism happens only in Web Workers, which do not share memory by default. Data is passed via:
Closures: isolated callback scope.
Promises/async-await: chainable, non-blocking operations.
Message passing (e.g., postMessage in Web Workers).
Example:
```javascript
let counter = 0;
setTimeout(() => { counter++; }, 0); // runs to completion
setTimeout(() => { counter++; }, 0); // runs after; no overlap
```
Each setTimeout callback executes atomically: no interleaving means no races.
Why Can’t Python Ditch the GIL?
Historical Design: Python’s C API and reference counting rely on the GIL. Many C extensions rely on the GIL for thread safety. Removing the GIL would require significant changes to these core aspects of Python, potentially breaking compatibility with existing code and slowing down single-threaded performance.
JavaScript's Head Start: Born for the web, where event-driven isolation was mandatory, JavaScript's architecture avoided the need for a global lock from the start. Python added asyncio later, on top of a shared-memory foundation.
Python Evolution: PEP 703 made the GIL optional, and Python 3.13 ships an experimental free-threaded build that runs without it, allowing multi-threaded applications to take full advantage of multiple cores. This is still an ongoing effort, and compatibility and performance implications are being evaluated.
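You can check whether your interpreter is a free-threaded build; a sketch using the build-configuration flag that free-threaded CPython sets:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded ("no-GIL") builds of CPython 3.13+;
# on ordinary builds the variable is absent or 0.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(sys.version_info[:2], "free-threaded build:", free_threaded)
```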