🚀 Learn how to use the multiprocessing module to run tasks in parallel and improve performance for CPU-bound operations like heavy computations, image processing, or data transformation.
This section covers:
- 🔍 The difference between threading and multiprocessing
- 📦 Using
Processto create individual processes - 🔄 Sharing work across multiple cores using
Pool - 🧰 Managing process pools with
ProcessPoolExecutor - 🧪 Real-world example – parallel image resizing
- 💡 Hidden notes and best practices for safe multiprocessing
| Concept | Description |
|---|---|
| Multiprocessing | Run code in separate processes (not threads) |
| CPU-bound vs I/O-bound | When to choose multiprocessing over multithreading |
multiprocessing.Process |
Create and manage individual processes |
multiprocessing.Pool |
Distribute tasks across multiple CPU cores |
concurrent.futures.ProcessPoolExecutor |
High-level interface for managing process pools |
| Shared State & IPC | Share data safely between processes |
| Best Practices | Avoiding bottlenees, managing resources, debugging |
| Feature | Thread | Process |
|---|---|---|
| Memory Sharing | Shared within a process | Each has its own memory space |
| Communication | Easy via shared memory | Requires inter-process communication |
| Overhead | Low | Higher due to new process creation |
| Execution Model | Cooperative (GIL limits true parallelism) | True parallelism on multi-core CPUs |
| Ideal For | I/O-bound tasks | CPU-bound tasks |
🔹 In Python, threads are great for waiting, but not for computing — due to the Global Interpreter Lock (GIL). 🔹 Use multiprocessing when you want real parallel execution for compute-heavy jobs.
Let’s build a simple example that performs two heavy calculations in parallel.
import time
def cpu_intensive_task(n):
print(f"Starting task {n}")
result = sum(i * i for i in range(10**6))
time.sleep(2) # Simulate computation time
print(f"Task {n} completed")
return resultfrom multiprocessing import Process
import time
start_time = time.perf_counter()
p1 = Process(target=cpu_intensive_task, args=(1,))
p2 = Process(target=cpu_intensive_task, args=(2,))
p1.start()
p2.start()
p1.join()
p2.join()
end_time = time.perf_counter()
print(f"Total time taken: {end_time - start_time:.2f}s")🔸 Output will show both tasks running concurrently and completing faster than sequential runs.
Use multiprocessing to resize images using Pillow.
pip install pillowfrom PIL import Image
import os
from multiprocessing import Pool
import time
IMAGES_DIR = "images"
OUTPUT_DIR = "resized"
def resize_image(filename):
img_path = os.path.join(IMAGES_DIR, filename)
with Image.open(img_path) as img:
resized_img = img.resize((300, 300))
resized_img.save(os.path.join(OUTPUT_DIR, filename))
print(f"Resized {filename}")
if __name__ == "__main__":
if not os.path.exists(OUTPUT_DIR):
os.makedirs(OUTPUT_DIR)
files = [f for f in os.listdir(IMAGES_DIR) if f.endswith(".jpg")]
start_time = time.time()
with Pool(processes=4) as pool:
pool.map(resize_image, files)
end_time = time.time()
print(f"Total time: {end_time - start_time:.2f}s")🔸 This script resizes all .jpg images in the images/ folder using 4 worker processes.
For a higher-level API, use ProcessPoolExecutor from concurrent.futures.
from concurrent.futures import ProcessPoolExecutor
import time
def square(n):
time.sleep(1) # Simulate CPU-bound task
return n * n
if __name__ == "__main__":
numbers = list(range(1, 11))
with ProcessPoolExecutor(max_workers=4) as executor:
results = list(executor.map(square, numbers))
print("Squares:", results)| Practice | Description |
|---|---|
🧠 Prefer ProcessPoolExecutor unless you need fine-grained control |
Use high-level APIs when possible for simplicity and better error handling |
🧮 Use Pool.map() or executor.map() for homogeneous tasks |
For uniform function calls across an iterable — clean and efficient |
| 📁 Ensure input files/dirs exist before starting workers | Prevent processes from failing due to missing resources |
| 📦 Don’t share large objects — it increases IPC overhead | Copying large data between processes is expensive; prefer shared memory or avoid sharing entirely |
🧾 Always guard if __name__ == '__main__' in Windows |
Prevents infinite spawning of subprocesses on Windows |
🧩 Use multiprocessing.Queue or Manager to share state |
For inter-process communication (IPC) — safe and effective |
| 🧯 Limit number of processes — usually up to number of CPU cores | Avoid overloading the system; match to available hardware |
| 🧽 Clean up output directories after processing | Especially useful in batch jobs like image resizing or file conversion |
| Shared memory adds complexity and potential bugs | |
| 🧪 Use logging inside processes for debugging | Helps trace execution flow and identify bottlenecks or failures |
| 🕒 Handle timeouts and hanging processes gracefully | Use .join(timeout) or .terminate() if needed |
| 🧬 Always close or join() processes explicitly | Ensures all background work completes before main process exits |
| 📝 Use daemon=False and explicit joins unless you want background termination | So processes complete their work before program ends |
🧼 Use context managers (with) for executors and pools |
Automatically handles shutdown and cleanup |
| 🧰 Serialize only picklable data | Not all Python objects can be passed between processes — ensure your data is compatible |
When processes need to share data, use:
from multiprocessing import Process, Value, Array
def modify_values(counter, names):
with counter.get_lock(): # Atomic update
counter.value += 1
names[0] = "Alice"
if __name__ == '__main__':
counter = Value('i', 0)
names = Array('c', b'John Doe')
p = Process(target=modify_values, args=(counter, names))
p.start()
p.join()
print("Counter:", counter.value)
print("Name:", names.value.decode())🔸 Value and Array allow sharing primitive types between processes.
Use Manager() for more complex structures like lists or dicts.
from multiprocessing import Process, Manager
import time
def update_data(data_dict):
time.sleep(1)
data_dict['status'] = 'completed'
if __name__ == '__main__':
with Manager() as manager:
shared_data = manager.dict({'status': 'in progress'})
p = Process(target=update_data, args=(shared_data,))
p.start()
p.join()
print("Final status:", shared_data['status'])🔸 manager.dict() creates a dictionary that can be safely shared between processes.
Let’s hash multiple files in parallel using multiprocessing.
import hashlib
from multiprocessing import Process
import os
def hash_file(filename):
hasher = hashlib.sha256()
with open(filename, 'rb') as f:
while chunk := f.read(8192):
hasher.update(chunk)
print(f"{os.path.basename(filename)}: {hasher.hexdigest()}")
if __name__ == '__main__':
files = ["file1.txt", "file2.txt", "file3.txt"]
processes = [Process(target=hash_file, args=(f,)) for f in files]
for p in processes:
p.start()
for p in processes:
p.join()🔸 This is useful for integrity checks, backups, or batch file processing.
🧠 Hidden Notes & Tips
- 🧠 On Windows, always protect your entry point with
if __name__ == '__main__': - 🧲 Use
Manager()only when needed — it adds overhead. - 🧏♂️ Use
logginginside processes for better debugging. - 🧾
map()blocks until all processes complete;imap()gives partial results early. - 📦 Prefer
ProcessPoolExecutorover rawProcessfor simplicity. - 🧵 Never assume order of process completion — use
pool.apply_async()for async behavior. - 🧩 Use
multiprocessing.Valueormultiprocessing.Arrayfor low-overhead shared variables. - 🧰 Monitor CPU usage during multiprocessing — some tasks may not benefit from it.
| Tool | Purpose |
|---|---|
multiprocessing.Process |
Create and manage individual processes |
multiprocessing.Pool |
Manage a fixed number of worker processes |
concurrent.futures.ProcessPoolExecutor |
High-level interface for process pools |
multiprocessing.Value / Array |
Share basic types between processes |
Manager |
Share complex objects like dicts and lists |
map() / apply_async() |
Dispatch tasks to worker processes |
| Best Practice | Use if __name__ == '__main__' in Windows |
| Real-world | Batch image processing, data crunching, encryption/hash generation |
🎉 Congratulations! You now understand how to use Python multiprocessing to execute CPU-bound tasks in parallel, including:
- Creating and managing
Processinstances - Using
Poolfor efficient parallelization - Choosing between shared memory and message passing
- Applying multiprocessing in real applications like image resizing and file hashing
This completes our journey through Python concurrency and parallelism!