Python's flexibility and rich library support make it well suited to applications such as web development, data analysis, artificial intelligence, and automation. As data and computing needs grow, however, the constraints of running Python code on a single core become more apparent, particularly for workloads like data processing and machine learning. Parallel processing overcomes this by executing tasks simultaneously, which can dramatically improve throughput.
Python's multiprocessing module enables separate processes to execute concurrently across many CPU cores, avoiding the Global Interpreter Lock (GIL) and achieving true parallelization. This makes multiprocessing a strong tool for developers to take advantage of multi-core CPUs and improve program efficiency for computationally heavy activities.
Ready to put your Python skills to work? Join Index.dev and work remotely on high-paying projects in the UK and US!
What is Multiprocessing?
Multiprocessing refers to a system's capacity to execute multiple operations simultaneously. Put simply, it means utilising two or more central processing units (CPUs) within a single computer system, distributing work across several independent processes.
Processing units concurrently utilise the main memory and peripherals to execute programs. A multiprocessing application is divided into smaller components that operate autonomously. The operating system assigns each process to a processor.
Python includes a built-in library called multiprocessing that provides process-based parallelism. Before working with multiprocessing, it is essential to have a thorough understanding of the Process object.
Multiprocessing for Python Developers
As the complexity and scope of modern applications have increased, multiprocessing has become a vital tool for Python developers. Because operations like data processing, machine learning, and simulations require more computational capacity, the ability to run several processes concurrently is critical.
This method not only improves performance but also assures efficient use of multi-core computers, circumventing the limits imposed by Python's Global Interpreter Lock (GIL). However, finding qualified Python developers capable of properly applying multiprocessing methods has been difficult.
The increased demand for Python knowledge across numerous industries, along with rapid technological change, has created a very competitive job market. Companies compete for top talent, making it tough to hire Python developers who understand Python's capabilities and can improve performance through multiprocessing. As a result, hiring skilled Python developers today requires a deliberate strategy and a sharp eye for candidates with the right combination of skills and experience.
Read more: Top 10 Python Libraries For Data Visualization
Why Use Multiprocessing in Python?
Executing many processes on a single processor is a significant challenge. As the quantity of processes continues to rise, the processor will need to interrupt the current process and transition to the next in order to maintain their progress. Consequently, it will be necessary to pause each task, impeding its performance.
It may be conceptualised as an employee within a company who is assigned tasks across many departments. If one individual is responsible for overseeing sales, accounts, and backend operations, they will need to pause sales activities while working on accounts, and vice versa.
Now assume there are several employees, each assigned a distinct task. The work becomes far more manageable, and this is why multiprocessing in Python is crucial: separate processes function like those distinct employees, facilitating the handling and administration of diverse workloads. A multiprocessing system can be shown as:
- A system that incorporates many central processors
- A multi-core processor: a single computing unit with numerous independent core processing units
Within the context of multiprocessing, the system has the capability to partition and allocate work to several processors.
Benefits of Multiprocessing
- Concurrent Execution: Allows numerous processes to execute simultaneously, increasing efficiency over sequential execution.
- Processes vs. Threads: Processes differ from threads in that they execute independently and have their own memory. Threads share memory inside a process.
- Bypassing GIL: Multiprocessing avoids Python's Global Interpreter Lock (GIL) by employing distinct interpreters for each process, as opposed to threading.
- Optimized Resource Allocation: The operating system schedules tasks separately, allowing for more efficient use of CPU resources.
- Ideal for CPU-bound tasks: Multiprocessing performs particularly well in CPU-intensive applications.
- Threading for I/O-Bound Tasks: Threading can be useful in applications that rely significantly on I/O operations.
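The CPU-bound benefit above can be sketched with a small, self-contained timing experiment. The helper `cpu_bound` is a hypothetical stand-in for any CPU-heavy function; the actual speedup depends on your core count and process start-up overhead (note that `Process` discards the return value, so this sketch measures time only):

```python
import multiprocessing
import time

def cpu_bound(n):
    # Purely CPU-bound busy work: sum of squares
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    N = 1_000_000

    # Sequential baseline: four calls, one after another
    start = time.perf_counter()
    for _ in range(4):
        cpu_bound(N)
    seq = time.perf_counter() - start

    # Parallel: four processes, one call each (return values are discarded)
    start = time.perf_counter()
    procs = [multiprocessing.Process(target=cpu_bound, args=(N,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    par = time.perf_counter() - start

    print(f"sequential: {seq:.2f}s  parallel: {par:.2f}s")
```

On a machine with four or more free cores, the parallel timing should be noticeably lower than the sequential one; on a single-core machine it may even be slower because of process start-up costs.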
What Is the Multiprocessing Module?
Python's multiprocessing module offers an API akin to that of the threading module, but it achieves local and remote concurrency by avoiding the Global Interpreter Lock (GIL) by using subprocesses rather than threads. Multiprocessing controls the execution of the complete program, which is resource-intensive and runs in segregated memory, whereas threading controls the execution of lightweight code that shares memory. The Process class is the main tool for managing numerous processes in applications, although this module also provides other classes and parallelism-related utilities.
Read more: Understanding Data Types in Python Programming
What Is the Process Class?
This section aims to provide a comprehensive understanding of the concept of a process and how to effectively discover, utilise, and manage processes in Python. As elucidated in the GNU C Library: "Processes are the fundamental entities responsible for distributing system resources." Every process possesses its individual address space and typically operates with a single thread of control. A process carries out the execution of a program. It is possible to have several processes running the same program. However, each process possesses its own distinct copy of the program, residing within its own address space, and executes it autonomously from the other copies.
However, how does that appear in Python? Up to this point, we have provided explanations and examples of what a process is and how it differs from a thread. However, we have not yet delved into any actual code. Let's modify that and provide a straightforward example of a Python process:
#!/usr/bin/env python
import os
# A very, very simple process.
if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

Which will produce the following output:
[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144112

Every executing Python script or program functions as an independent process.
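That identity is visible through the os module. As a minimal sketch, each Process spawned from a script gets its own PID, distinct from the parent's (the function name `report` is illustrative):

```python
import os
import multiprocessing

def report():
    # The child runs in its own process, with its own PID
    print(f"child pid={os.getpid()}, parent pid={os.getppid()}")

if __name__ == "__main__":
    print(f"main pid={os.getpid()}")
    p = multiprocessing.Process(target=report)
    p.start()
    p.join()
```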
Basic Multiprocessing
Let's utilise Python's Multiprocessing module to create a simple application that shows concurrent programming.
Consider the following code, task(), which sleeps for 0.5 seconds and prints before and after the sleep:
import time
def task():
    print('Sleeping for 0.5 seconds')
    time.sleep(0.5)
    print('Finished sleeping')

To define a process, we simply use the multiprocessing module:
...
import multiprocessing
p1 = multiprocessing.Process(target=task)
p2 = multiprocessing.Process(target=task)

The target parameter to Process() provides the target function that the process will execute. However, these processes do not begin until we start them:
...
p1.start()
p2.start()
A full concurrent program would be the following:
import multiprocessing
import time

def task():
    print('Sleeping for 0.5 seconds')
    time.sleep(0.5)
    print('Finished sleeping')

if __name__ == "__main__":
    start_time = time.perf_counter()

    # Creates two processes
    p1 = multiprocessing.Process(target=task)
    p2 = multiprocessing.Process(target=task)

    # Starts both processes
    p1.start()
    p2.start()

    finish_time = time.perf_counter()
    print(f"Program finished in {finish_time-start_time} seconds")
We must enclose our main program in an if __name__ == "__main__": guard, or else the multiprocessing module will complain. This safety mechanism matters because, under start methods such as spawn, each child process re-imports the main module; the guard prevents the child from recursively creating new processes of its own.
However, there is an issue with the code since the program timer is shown before the processes we established are run. Here's the result of the code above:
Program finished in 0.012921249988721684 seconds
Sleeping for 0.5 seconds
Sleeping for 0.5 seconds
Finished sleeping
Finished sleeping

To get the two processes to execute before the time is printed, we must use the join() method. This is because three processes are running concurrently: p1, p2, and the main process. The main process is the one that keeps track of time and reports how long it took to complete. We should run the finish_time line only after the processes p1 and p2 have completed. We only need to add this bit of code immediately after the start() function calls.
...
p1.join()
p2.join()

The join() method makes the calling process wait until the joined process has completed. Here's the output after adding the join statements:
Sleeping for 0.5 seconds
Sleeping for 0.5 seconds
Finished sleeping
Finished sleeping
Program finished in 0.5688213340181392 seconds

Using similar reasoning, we can run multiple processes. The following is the whole code, changed from above to have 10 processes:
import multiprocessing
import time

def task():
    print('Sleeping for 0.5 seconds')
    time.sleep(0.5)
    print('Finished sleeping')

if __name__ == "__main__":
    start_time = time.perf_counter()
    processes = []

    # Creates 10 processes, then starts them
    for i in range(10):
        p = multiprocessing.Process(target=task)
        p.start()
        processes.append(p)

    # Joins all the processes
    for p in processes:
        p.join()

    finish_time = time.perf_counter()
    print(f"Program finished in {finish_time-start_time} seconds")

High-paying jobs, flexible work, and top companies await! Become part of Index.dev’s elite talent network and secure remote projects with excellent pay!
Python Multiprocessing Process, Queue and Locks
The Python multiprocessing module provides several classes for developing a parallel program; three of the most basic are Process, Queue, and Lock. But first, let's start with some simple code. To size a parallel program sensibly, it helps to know how many cores your computer has, and the multiprocessing module can tell you. The following code displays the number of cores on your machine.
import multiprocessing
print("Number of cpu : ", multiprocessing.cpu_count())

The output will vary depending on your computer.
Python Multiprocessing Process Class
The Process class is an abstraction that creates another Python process, gives it code to run, and lets the parent program manage its execution. Two methods are crucial: start() and join(). First, we construct a function for the process to execute, then we create a Process object. Nothing happens until we instruct the object to begin with start(); the process then runs and produces its results. Finally, we call join() to wait for the process to complete.
Without the join() call, the parent will not wait for the process to finish. If you create a large number of processes and never join them, finished children can linger as zombie processes and exhaust resources, leaving you to clean them up manually. One crucial note: to pass arguments to the target function, use the args keyword parameter. The code below shows how to use the Process class.
from multiprocessing import Process

def print_func(continent='Asia'):
    print('The name of continent is : ', continent)

if __name__ == "__main__":  # confirms that the code is under main function
    names = ['America', 'Europe', 'Africa']
    procs = []

    # instantiating without any argument
    proc = Process(target=print_func)
    procs.append(proc)
    proc.start()

    # instantiating process with arguments
    for name in names:
        proc = Process(target=print_func, args=(name,))
        procs.append(proc)
        proc.start()

    # complete the processes
    for proc in procs:
        proc.join()

The output prints the default continent ('Asia') plus the three names passed as arguments; the order may vary between runs.
Python Multiprocessing Queue Class
If you have basic knowledge of computer data structures, you are probably familiar with queues. The multiprocessing module includes a Queue class, which is exactly a First-In-First-Out data structure. Queues may hold any picklable Python object (though simple ones are preferred) and are highly handy for transferring data between processes. They are especially valuable when passed as a parameter to a Process's target function, which allows that process to consume the data. We put data into a queue using the put() method and retrieve items using get(). For a simple example, see the code below.
from multiprocessing import Queue

colors = ['red', 'green', 'blue', 'black']
cnt = 1

# instantiating a queue object
queue = Queue()
print('pushing items to queue:')
for color in colors:
    print('item no: ', cnt, ' ', color)
    queue.put(color)
    cnt += 1

print('\npopping items from queue:')
cnt = 0
while not queue.empty():
    print('item no: ', cnt, ' ', queue.get())
    cnt += 1

Read more: 13 Python Algorithms Every Developer Should Know
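The queue example above runs entirely in one process. Queues really shine when handed to a Process target, so the child can consume items the parent produces. A minimal sketch, with an illustrative `consumer` function and a None sentinel to signal the end of the data:

```python
from multiprocessing import Process, Queue

def consumer(q):
    # Pull items until the None sentinel arrives
    while True:
        item = q.get()
        if item is None:
            break
        print('consumed:', item)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=consumer, args=(q,))
    p.start()
    for color in ['red', 'green', 'blue']:
        q.put(color)
    q.put(None)  # sentinel: tells the consumer to stop
    p.join()
```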
Python Multiprocessing Lock Class
The Lock class has a straightforward duty: it permits code to claim a lock so that no other process may run the equivalent code until the lock is released. The class therefore has two main tasks, claiming the lock with the acquire() method and releasing it with the release() method.
In multiprocessing, a lock serialises access to shared state: a process holding the lock blocks the others at acquire() until it calls release(), so critical operations run one at a time. The code below implements the lock mechanism of an ATM-like machine.
import multiprocessing

# Withdrawal function
def wthdrw(bal, lock):
    for _ in range(10000):
        lock.acquire()
        bal.value = bal.value - 1
        lock.release()

# Deposit function
def dpst(bal, lock):
    for _ in range(10000):
        lock.acquire()
        bal.value = bal.value + 1
        lock.release()

def transact():
    # initial balance
    bal = multiprocessing.Value('i', 100)
    # creating lock object
    lock = multiprocessing.Lock()
    # creating processes
    proc1 = multiprocessing.Process(target=wthdrw, args=(bal, lock))
    proc2 = multiprocessing.Process(target=dpst, args=(bal, lock))
    # starting processes
    proc1.start()
    proc2.start()
    # waiting for processes to finish
    proc1.join()
    proc2.join()
    # printing final balance
    print("Final balance = {}".format(bal.value))

if __name__ == "__main__":
    for _ in range(10):
        # performing transaction process
        transact()

Output:
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
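To see why the lock matters, consider a hypothetical variant with the lock removed. The read-modify-write on bal.value is not atomic across processes, so concurrent updates can be lost and the final balance may drift away from 100; whether corruption actually appears on a given run depends on timing, the machine, and the start method:

```python
import multiprocessing

def wthdrw_unsafe(bal):
    for _ in range(10000):
        bal.value -= 1  # read-modify-write without a lock: updates can be lost

def dpst_unsafe(bal):
    for _ in range(10000):
        bal.value += 1

if __name__ == "__main__":
    bal = multiprocessing.Value('i', 100)
    p1 = multiprocessing.Process(target=wthdrw_unsafe, args=(bal,))
    p2 = multiprocessing.Process(target=dpst_unsafe, args=(bal,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Final balance =", bal.value)  # often not 100
```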
Python Multiprocessing Pipe Class
When employing multiprocessing in Python, pipes serve as a communication channel. Pipes are useful when you need to establish communication between two processes. Pipe() returns two connection objects, one for each end of the pipe, which interact using the send() and recv() methods. Let's look at an example to get a better grasp: in the code below, a Pipe transports data from the child connection to the parent.
import multiprocessing
from multiprocessing import Process, Pipe

def exm_function(c):
    c.send(['Hi! This is child info'])
    c.close()

if __name__ == '__main__':
    par_c, chi_c = Pipe()
    mp1 = multiprocessing.Process(target=exm_function, args=(chi_c,))
    mp1.start()
    print(par_c.recv())
    mp1.join()

Output:
['Hi! This is child info']
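Pipe() connections are duplex by default, so the same pair can carry traffic in both directions. A small sketch of a round trip (the function and message here are illustrative):

```python
from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()       # receive from the parent
    conn.send(msg.upper())  # reply on the same connection
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('hello from parent')
    print(parent_conn.recv())  # prints 'HELLO FROM PARENT'
    p.join()
```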
Python Multiprocessing Pool
The Python multiprocessing Pool is used to execute a function across multiple input values in parallel, distributing the input data among several processes (data parallelism). Consider the following example of a multiprocessing pool.
Example:
from multiprocessing import Pool
import time

w = (["V", 5], ["X", 2], ["Y", 1], ["Z", 3])

def work_log(data_for_work):
    print(" Process name is %s waiting time is %s seconds"
          % (data_for_work[0], data_for_work[1]))
    time.sleep(int(data_for_work[1]))
    print(" Process %s Executed." % data_for_work[0])

def handler():
    p = Pool(2)
    p.map(work_log, w)

if __name__ == '__main__':
    handler()

Output:
Process name is V waiting time is 5 seconds
Process V Executed.
Process name is X waiting time is 2 seconds
Process X Executed.
Process name is Y waiting time is 1 seconds
Process Y Executed.
Process name is Z waiting time is 3 seconds
Process Z Executed.
Proxy Objects
A proxy object refers to a shared object that lives in a separate (server) process; the shared object is called the referent. Multiple proxy objects may share the same referent. A proxy exposes methods that invoke the corresponding methods of its referent. The following is an example of proxy objects.
Example:
from multiprocessing import Manager

if __name__ == '__main__':
    manager = Manager()
    l = manager.list([i*i for i in range(10)])
    print(l)
    print(repr(l))
    print(l[4])
    print(l[2:5])

Output:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
<ListProxy object, typeid 'list' at 0x7f063621ea10>
16
[4, 9, 16]

Proxy objects are picklable, which allows us to transmit them between processes. They can also be used to control the level of synchronisation over the shared object.
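As a sketch of proxies in action, a Manager dict can be mutated from several worker processes through its proxy; each write is forwarded to the manager's server process. The `worker` function and the key-doubling logic are illustrative:

```python
from multiprocessing import Manager, Process

def worker(shared, key):
    # Writes go through the proxy to the manager's server process
    shared[key] = key * 2

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()  # returns a DictProxy
        procs = [Process(target=worker, args=(d, k)) for k in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(d))  # {0: 0, 1: 2, 2: 4, 3: 6} (key order may vary)
```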
Read more: Designing Generative AI Applications: 5 Key Principles to Follow
Difference Between Multithreading and Multiprocessing in Python
Multithreading refers to a processor's capacity to execute multiple threads concurrently within a single process. Multiprocessing refers to a system's capacity to run multiple processes simultaneously, each of which can execute one or more threads of its own.
In a typical diagram of the two models, multithreading shows several threads sharing the same code, data, and files while running on distinct registers and stacks, whereas multiprocessing replicates the code, data, and files for each process, resulting in increased overhead.
Definitions:
- Executing several threads at once while sharing memory inside a single process is known as multithreading.
- Running several processes at once, each with its own memory and resources, is known as multiprocessing.
Features:
- Multithreading utilizes distinct registers and stacks but shares files, data, and code.
- Code, data, and files are duplicated during multiprocessing, which raises overhead.
Use cases:
- For IO-bound activities (like network or database operations), multithreading works best.
- For CPU-bound operations (such calculations and simulations), multiprocessing works best.
Concurrency vs. Parallelism:
- Because of Python's GIL, multithreading accomplishes concurrency (tasks interleaving) but not genuine parallelism.
- With the use of many CPU cores, multiprocessing makes parallelism possible.
Performance:
- Because of resource contention and GIL restrictions, multithreading may cause CPU-bound processes to lag.
- Although multiprocessing uses all of the CPU cores for demanding tasks, it has greater overhead.
Overhead:
- Managing threads is lightweight, but performance may be constrained.
- Managing processes requires a lot of resources, but it leads to more CPU use and more parallelism.
Work from anywhere, get paid well: Apply for remote Python projects on Index.dev. Sign up now →
Conclusion
The constraints of CPython's Global Interpreter Lock (GIL) become apparent as data and computation needs increase. By providing genuine parallelism across several CPU cores, the multiprocessing module offers a reliable way past those limitations. Multiprocessing excels at CPU-bound activities, increasing speed and efficiency, in contrast to threading, which provides concurrency but not true parallelism. It enables developers to create, coordinate, and manage processes, improving performance for complex workloads and making the best use of available resources. To properly harness the power of contemporary multi-core processors and optimise the performance of their programs, developers should become proficient in Python's multiprocessing features.
For Python Developers:
Ready to build amazing things with Python? Become part of Index.dev's remote developer network and work on full-time, long-term projects.
For Clients:
Got a project? Let’s talk! Index.dev can assist you in selecting and hiring highly-skilled Python developers who understand how to manage data structures and functions in building high-performance applications.
Index developers can help you:
- Use the best-performing algorithms and data structures for processing.
- Take advantage of Python and its attributes and libraries for efficient data manipulation.
- Use clean, readable, and scalable code in order to solve any challenge.
With a vast database of resources and stringent vetting techniques, you get Python specialists with good command of data manipulation, data structures, and functions.