Section_29_Multiprocessing

🧬 Section 29: Multiprocessing

Running CPU-bound Tasks in Parallel with Python

🚀 Learn how to use the multiprocessing module to run tasks in parallel and improve performance for CPU-bound operations like heavy computations, image processing, or data transformation.

This section covers:

🔍 The difference between threading and multiprocessing
📦 Using Process to create individual processes
🔄 Sharing work across multiple cores using Pool
🧰 Managing process pools with ProcessPoolExecutor
🧪 Real-world example – parallel image resizing
💡 Hidden notes and best practices for safe multiprocessing

🧠 What You'll Learn

Concept	Description
Multiprocessing	Run code in separate processes (not threads)
CPU-bound vs I/O-bound	When to choose multiprocessing over multithreading
`multiprocessing.Process`	Create and manage individual processes
`multiprocessing.Pool`	Distribute tasks across multiple CPU cores
`concurrent.futures.ProcessPoolExecutor`	High-level interface for managing process pools
Shared State & IPC	Share data safely between processes
Best Practices	Avoiding bottlenees, managing resources, debugging

🔍 Processes vs Threads – Key Differences

Feature	Thread	Process
Memory Sharing	Shared within a process	Each has its own memory space
Communication	Easy via shared memory	Requires inter-process communication
Overhead	Low	Higher due to new process creation
Execution Model	Cooperative (GIL limits true parallelism)	True parallelism on multi-core CPUs
Ideal For	I/O-bound tasks	CPU-bound tasks

🔹 In Python, threads are great for waiting, but not for computing — due to the Global Interpreter Lock (GIL). 🔹 Use multiprocessing when you want real parallel execution for compute-heavy jobs.

🧱 Basic Example – Running Two CPU-bound Tasks in Parallel

Let’s build a simple example that performs two heavy calculations in parallel.

🚀 Define a Heavy Task

import time

def cpu_intensive_task(n):
    print(f"Starting task {n}")
    result = sum(i * i for i in range(10**6))
    time.sleep(2)  # Simulate computation time
    print(f"Task {n} completed")
    return result

🧩 Create and Start Processes

from multiprocessing import Process
import time

start_time = time.perf_counter()

p1 = Process(target=cpu_intensive_task, args=(1,))
p2 = Process(target=cpu_intensive_task, args=(2,))

p1.start()
p2.start()

p1.join()
p2.join()

end_time = time.perf_counter()
print(f"Total time taken: {end_time - start_time:.2f}s")

🔸 Output will show both tasks running concurrently and completing faster than sequential runs.

🧰 Advanced Example – Resize Images in Parallel

Use multiprocessing to resize images using Pillow.

🖼️ Install Pillow First

pip install pillow

🛠️ Image Resizing Script

from PIL import Image
import os
from multiprocessing import Pool
import time

IMAGES_DIR = "images"
OUTPUT_DIR = "resized"

def resize_image(filename):
    img_path = os.path.join(IMAGES_DIR, filename)
    with Image.open(img_path) as img:
        resized_img = img.resize((300, 300))
        resized_img.save(os.path.join(OUTPUT_DIR, filename))
    print(f"Resized {filename}")

if __name__ == "__main__":
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    files = [f for f in os.listdir(IMAGES_DIR) if f.endswith(".jpg")]

    start_time = time.time()

    with Pool(processes=4) as pool:
        pool.map(resize_image, files)

    end_time = time.time()
    print(f"Total time: {end_time - start_time:.2f}s")

🔸 This script resizes all .jpg images in the images/ folder using 4 worker processes.

🧩 Using `ProcessPoolExecutor` – Cleaner Interface

For a higher-level API, use ProcessPoolExecutor from concurrent.futures.

from concurrent.futures import ProcessPoolExecutor
import time

def square(n):
    time.sleep(1)  # Simulate CPU-bound task
    return n * n

if __name__ == "__main__":
    numbers = list(range(1, 11))

    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, numbers))

    print("Squares:", results)

✅ Best Practices for Safe Multiprocessing

Practice	Description
🧠 Prefer `ProcessPoolExecutor` unless you need fine-grained control	Use high-level APIs when possible for simplicity and better error handling
🧮 Use `Pool.map()` or `executor.map()` for homogeneous tasks	For uniform function calls across an iterable — clean and efficient
📁 Ensure input files/dirs exist before starting workers	Prevent processes from failing due to missing resources
📦 Don’t share large objects — it increases IPC overhead	Copying large data between processes is expensive; prefer shared memory or avoid sharing entirely
🧾 Always guard `if __name__ == '__main__'` in Windows	Prevents infinite spawning of subprocesses on Windows
🧩 Use `multiprocessing.Queue` or `Manager` to share state	For inter-process communication (IPC) — safe and effective
🧯 Limit number of processes — usually up to number of CPU cores	Avoid overloading the system; match to available hardware
🧽 Clean up output directories after processing	Especially useful in batch jobs like image resizing or file conversion
⚠️ Avoid shared state where possible	Shared memory adds complexity and potential bugs
🧪 Use logging inside processes for debugging	Helps trace execution flow and identify bottlenecks or failures
🕒 Handle timeouts and hanging processes gracefully	Use `.join(timeout)` or `.terminate()` if needed
🧬 Always close or join() processes explicitly	Ensures all background work completes before main process exits
📝 Use daemon=False and explicit joins unless you want background termination	So processes complete their work before program ends
🧼 Use context managers (`with`) for executors and pools	Automatically handles shutdown and cleanup
🧰 Serialize only picklable data	Not all Python objects can be passed between processes — ensure your data is compatible

📦 Inter-Process Communication (IPC)

When processes need to share data, use:

🧾 Shared Memory – `Value`, `Array`

from multiprocessing import Process, Value, Array

def modify_values(counter, names):
    with counter.get_lock():  # Atomic update
        counter.value += 1
    names[0] = "Alice"

if __name__ == '__main__':
    counter = Value('i', 0)
    names = Array('c', b'John Doe')

    p = Process(target=modify_values, args=(counter, names))
    p.start()
    p.join()

    print("Counter:", counter.value)
    print("Name:", names.value.decode())

🔸 Value and Array allow sharing primitive types between processes.

🧭 Manager – Share Complex Data

Use Manager() for more complex structures like lists or dicts.

from multiprocessing import Process, Manager
import time

def update_data(data_dict):
    time.sleep(1)
    data_dict['status'] = 'completed'

if __name__ == '__main__':
    with Manager() as manager:
        shared_data = manager.dict({'status': 'in progress'})

        p = Process(target=update_data, args=(shared_data,))
        p.start()
        p.join()

        print("Final status:", shared_data['status'])

🔸 manager.dict() creates a dictionary that can be safely shared between processes.

🧪 Real-World Example – Parallel File Hashing

Let’s hash multiple files in parallel using multiprocessing.

import hashlib
from multiprocessing import Process
import os

def hash_file(filename):
    hasher = hashlib.sha256()
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            hasher.update(chunk)
    print(f"{os.path.basename(filename)}: {hasher.hexdigest()}")

if __name__ == '__main__':
    files = ["file1.txt", "file2.txt", "file3.txt"]

    processes = [Process(target=hash_file, args=(f,)) for f in files]

    for p in processes:
        p.start()
    for p in processes:
        p.join()

🔸 This is useful for integrity checks, backups, or batch file processing.

🧠 Hidden Notes & Tips

🧠 On Windows, always protect your entry point with if __name__ == '__main__':
🧲 Use Manager() only when needed — it adds overhead.
🧏‍♂️ Use logging inside processes for better debugging.
🧾 map() blocks until all processes complete; imap() gives partial results early.
📦 Prefer ProcessPoolExecutor over raw Process for simplicity.
🧵 Never assume order of process completion — use pool.apply_async() for async behavior.
🧩 Use multiprocessing.Value or multiprocessing.Array for low-overhead shared variables.
🧰 Monitor CPU usage during multiprocessing — some tasks may not benefit from it.

📌 Summary

Tool	Purpose
`multiprocessing.Process`	Create and manage individual processes
`multiprocessing.Pool`	Manage a fixed number of worker processes
`concurrent.futures.ProcessPoolExecutor`	High-level interface for process pools
`multiprocessing.Value` / `Array`	Share basic types between processes
`Manager`	Share complex objects like dicts and lists
`map()` / `apply_async()`	Dispatch tasks to worker processes
Best Practice	Use `if __name__ == '__main__'` in Windows
Real-world	Batch image processing, data crunching, encryption/hash generation

🎉 Congratulations! You now understand how to use Python multiprocessing to execute CPU-bound tasks in parallel, including:

Creating and managing Process instances
Using Pool for efficient parallelization
Choosing between shared memory and message passing
Applying multiprocessing in real applications like image resizing and file hashing

This completes our journey through Python concurrency and parallelism!

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

🧬 Section 29: Multiprocessing

Running CPU-bound Tasks in Parallel with Python

🧠 What You'll Learn

🔍 Processes vs Threads – Key Differences

🧱 Basic Example – Running Two CPU-bound Tasks in Parallel

🚀 Define a Heavy Task

🧩 Create and Start Processes

🧰 Advanced Example – Resize Images in Parallel

🖼️ Install Pillow First

🛠️ Image Resizing Script

🧩 Using `ProcessPoolExecutor` – Cleaner Interface

✅ Best Practices for Safe Multiprocessing

📦 Inter-Process Communication (IPC)

🧾 Shared Memory – `Value`, `Array`

🧭 Manager – Share Complex Data

🧪 Real-World Example – Parallel File Hashing

🧠 Hidden Notes & Tips

📌 Summary

FilesExpand file tree

Section_29_Multiprocessing

Directory actions

More options

Directory actions

More options

Latest commit

History

Section_29_Multiprocessing

Folders and files

parent directory

README.md

🧬 Section 29: Multiprocessing

Running CPU-bound Tasks in Parallel with Python

🧠 What You'll Learn

🔍 Processes vs Threads – Key Differences

🧱 Basic Example – Running Two CPU-bound Tasks in Parallel

🚀 Define a Heavy Task

🧩 Create and Start Processes

🧰 Advanced Example – Resize Images in Parallel

🖼️ Install Pillow First

🛠️ Image Resizing Script

🧩 Using ProcessPoolExecutor – Cleaner Interface

✅ Best Practices for Safe Multiprocessing

📦 Inter-Process Communication (IPC)

🧾 Shared Memory – Value, Array

🧭 Manager – Share Complex Data

🧪 Real-World Example – Parallel File Hashing

🧠 Hidden Notes & Tips

📌 Summary

🧩 Using `ProcessPoolExecutor` – Cleaner Interface

🧾 Shared Memory – `Value`, `Array`