Multithreading in Python

November 30, 2018

Applications often need to run several tasks at the same time. This is where the concept of multithreading comes into play. This post explains how to use the threading module in Python.

Introduction

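Multithreading, a.k.a. threading, in Python is a concept by which multiple threads are launched in the same process to achieve concurrency and multitasking within the same application. Executing different threads is similar to executing different functions at the same time within the same process. Note that in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads take turns rather than running truly in parallel.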

What is Multithreading (Threading)?

Multithreading can be understood as executing multiple threads concurrently within the same process. These threads share the memory space of the process.
For example, an IDE such as PyCharm or a Jupyter Notebook keeps autosaving your code as you make changes, which illustrates multiple tasks being performed within the same process.

Threading is generally preferred for lightweight tasks, since all the threads run within the same process and use the memory allocated to that process.

Multithreading shouldn’t be confused with multiprocessing, where an application runs two or more independent processes that do not share state with each other.
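To make the shared memory space concrete, here is a minimal sketch in which three threads append to the same list. The names shared_items, add_item and run_workers are made up for this illustration.

```python
import threading

shared_items = []  # one list in the process's memory, visible to every thread


def add_item(item):
    # Each thread appends to the very same list object.
    shared_items.append(item)


def run_workers(count=3):
    shared_items.clear()
    threads = [threading.Thread(target=add_item, args=(n,)) for n in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait so that every append has happened
    return sorted(shared_items)


if __name__ == "__main__":
    print(f"Items added by the threads: {run_workers()}")
```

If add_item were run in separate processes through multiprocessing.Process instead, each process would append to its own copy of the list and the list in the parent process would stay empty.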

When to use Multithreading?

When your tasks are lightweight.
When you have objects being shared across the same application.
When you want to create responsive web UIs.
When you have multiple tasks which are I/O bound.
When you want a lag-free application with a backend database connection.
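As a rough sketch of the I/O bound case, the example below fakes three downloads with time.sleep and runs them in separate threads. The function names and the 0.2 second delay are assumptions made for this illustration; a real task would wait on a network call or a file instead.

```python
import threading
import time


def fake_download(name, results):
    time.sleep(0.2)  # stands in for waiting on a network or disk
    results[name] = f"{name} done"


def download_all(names):
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, results))
        for name in names
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every download to finish
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = download_all(["a.csv", "b.csv", "c.csv"])
    print(results)
    print(f"Took about {elapsed:.2f} seconds")
```

Since the threads spend their time waiting rather than computing, the three waits overlap and the total should be close to a single 0.2 second wait rather than the 0.6 seconds a sequential loop would need.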

Threading Example

To create a thread, define the function it should run and pass it to threading.Thread along with its arguments,

import threading

def new_function(a, b, kw1=None, kw2=10):
    print("Hello")

# args are passed positionally (a=1, b=2), kwargs by name
new_thread = threading.Thread(target=new_function, args=(1, 2), kwargs={'kw1': 1, 'kw2': '2'})

To start the above thread,

new_thread.start()

Let’s take a simple example of an application where you collect the negative and positive elements of a list in 2 separate threads.

Note: This is just a simple example for the purpose of demonstrating threading.

Let’s write 2 functions,

compute_negative() - to get all the negative elements from the list
compute_positive() - to get all the positive elements from the list

import threading

def compute_negative(arr):
    new_arr = []
    for i in arr:
        if i<0:
            new_arr.append(i)
    print(f"Negative Elements: {new_arr}")

def compute_positive(arr):
    new_arr = []
    for i in arr:
        if i>0:
            new_arr.append(i)  
    print(f"Positive Elements: {new_arr}")

if __name__ == '__main__':
    
    a = [1, 2, -4, 5, -7, 6, 10, -50, 100, -87, 20]
    print(f"Input List: {a}")
    
    t1 = threading.Thread(target=compute_negative, args=(a,)) # Create Thread 1
    t2 = threading.Thread(target=compute_positive, args=(a,)) # Create Thread 2

    t1.start()  # Thread 1 starts here
    t2.start()  # Thread 2 starts here

Input List: [1, 2, -4, 5, -7, 6, 10, -50, 100, -87, 20]
Negative Elements: [-4, -7, -50, -87]
Positive Elements: [1, 2, 5, 6, 10, 100, 20]

What is join in Threading?

Before we define what join is, let’s analyse our problem statement.

There is an application running 2 threads. Thread 1 completes 30 seconds before thread 2, and the main program has more code to run after starting them. Will that code wait for the threads to complete?

If you guessed that it would not wait, then you are right. The main thread carries on immediately after starting the threads, although the Python process itself stays alive until every non-daemon thread has finished. This is where the join method of a thread comes into play.

Calling join on a thread makes the caller wait until that thread finishes. So, calling join on each thread after starting them makes the application wait for all of them to complete before moving on.

Let’s make more sense of join with the below 2 examples,

Example 1 : Threading without Join

Let’s create a program in which 2 threads each display the time 3 times with a 0.5 second delay, add some code after starting the threads, and look at the execution.

import threading
import time


def print_time(val):
    """
    Display the time 3 times 
    with a 0.5 second delay
    """
    for i in range(3):
        time.sleep(0.5)
        print("Process:{0} Time is {1}".format(val, time.time()))


if __name__ == '__main__':
    
    t1 = threading.Thread(target=print_time, args=(1,))
    t2 = threading.Thread(target=print_time, args=(2,))

    t1.start()
    t2.start()
    
    print("Threading Complete. We are at the end of the program.")

Threading Complete. We are at the end of the program.
Process:1 Time is 1543436647.7216341
Process:2 Time is 1543436647.722194
Process:1 Time is 1543436648.2265742
Process:2 Time is 1543436648.227299
Process:1 Time is 1543436648.729373
Process:2 Time is 1543436648.731555

It is evident from the above example that the line of code after starting the 2 threads is executed even before the threads complete their execution.

So, how do you wait for the threads to complete before you continue with the execution of the rest of the program?

This is where join comes in handy. Now let’s analyze the same example along with join.

Here the program waits for the threads to complete before reaching the end.

Example 2 : Threading with Join

import threading
import time


def print_time(val):
    """
    Display the time 3 times 
    with a 0.5 second delay
    """
    for i in range(3):
        time.sleep(0.5)
        print("Process:{0} Time is {1}".format(val, time.time()))


if __name__ == '__main__':
    
    t1 = threading.Thread(target=print_time, args=(1,))
    t2 = threading.Thread(target=print_time, args=(2,))

    t1.start()
    t2.start()
    
    t1.join()
    t2.join()
    
    print("Threading Complete. We are at the end of the program.")

Process:1 Time is 1543436975.869845
Process:2 Time is 1543436975.8704278
Process:2 Time is 1543436976.37433
Process:1 Time is 1543436976.37479
Process:2 Time is 1543436976.87863
Process:1 Time is 1543436976.878934
Threading Complete. We are at the end of the program.
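join also accepts an optional timeout in seconds, which is handy when you do not want to wait forever. The sketch below is a small illustration with made-up function names: it waits briefly, checks is_alive() to see whether the thread is still running, and then waits without a limit.

```python
import threading
import time


def slow_task():
    time.sleep(0.5)  # pretend to do half a second of work


def wait_briefly():
    worker = threading.Thread(target=slow_task)
    worker.start()

    worker.join(timeout=0.1)            # give up waiting after 0.1 seconds
    still_running = worker.is_alive()   # True if the timeout expired first

    worker.join()                       # now wait until the thread finishes
    finished = not worker.is_alive()
    return still_running, finished


if __name__ == "__main__":
    still_running, finished = wait_briefly()
    print(f"Still running after short join: {still_running}")
    print(f"Finished after full join: {finished}")
```

join itself always returns None, so is_alive() is the way to tell whether the thread finished or the timeout ran out.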

Multithreading Practical Use Case

Let’s visualize a simple practical example where multithreading comes into play.

Say, for example, you build a frontend GUI that shows the user data exported from a database. Whenever the data is refreshed in the database, the UI must be refreshed as well. If this is performed as a sequential task, the UI would freeze for some time while the backend data is being refreshed.

Approach with Multithreading:
For our above use case, we can have 2 threads running at the same time, where one refreshes the data in the background and the other displays the data available at the moment. The user is not hindered while the backend data is being refreshed in a separate thread.
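A minimal sketch of this approach might look as follows. There is no real GUI or database here: fetch_rows is a hypothetical stand-in for a slow query, and the loop in run_ui plays the role of the UI that keeps drawing whatever data is available.

```python
import threading
import time

state = {"snapshot": (0, [])}  # (version, rows) currently shown by the "UI"


def fetch_rows(version):
    # Hypothetical stand-in for a slow database query.
    time.sleep(0.1)
    return [f"row-{version}-{i}" for i in range(3)]


def refresh_loop(rounds):
    for version in range(1, rounds + 1):
        rows = fetch_rows(version)
        # A single assignment swaps in the new snapshot, so a reader
        # never sees a version paired with rows from another version.
        state["snapshot"] = (version, rows)


def run_ui(rounds=3):
    state["snapshot"] = (0, [])
    refresher = threading.Thread(target=refresh_loop, args=(rounds,))
    refresher.start()

    versions_drawn = []
    while refresher.is_alive():
        version, rows = state["snapshot"]  # draw what is available right now
        versions_drawn.append(version)
        time.sleep(0.02)                   # the UI stays responsive meanwhile

    refresher.join()
    return versions_drawn, state["snapshot"]


if __name__ == "__main__":
    versions_drawn, final = run_ui()
    print(f"Versions drawn while refreshing: {versions_drawn}")
    print(f"Final snapshot: {final}")
```

The display loop never blocks on the query; it only reads the latest snapshot, while the refresher thread does the slow work in the background.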

Shared Objects - Thread Lock

Now, what if you wish to share data among the running threads? Because threads live in the same process, they can all read and modify the same objects, which is one of the most useful aspects of threading.

What happens if two or more threads try to make changes to a shared object at the same time?
This is known as a race condition, and it can produce unexpected and inconsistent results. Thread locks help to combat this issue.

A thread lock can be held by only one thread at a time. If every thread acquires the lock before changing a shared object, only one thread can change that object at any moment.

This locking mechanism establishes a clean synchronization between the threads, thereby helping to avoid unexpected results due to simultaneous execution.

Practical Use Case: For example, sharing objects would be very useful in a case where there is a frontend UI to display a table’s data and this table’s data is manipulated from 2 data sources that are refreshed every 5 minutes. If either of these 2 data refreshes is delayed and both threads try to manipulate the same object at the same time, it might lead to inconsistent results.

Example 1 : Threading without Lock

Let’s create a scenario with a race condition that leads to inconsistent results.

Our program would consist of 2 functions,

refresh_val() - increment val 100000 times
main() - create 2 threads which call refresh_val simultaneously

We will call this main function 10 times in our code.

import threading

val = 0 # global variable val

def refresh_val():
    """
    Increment val 100000 times
    """
    global val
    counter = 100000
    while counter > 0:
        val += 1
        counter -= 1


def main():
    global val
    val = 0
    
    # creating threads
    t1 = threading.Thread(target=refresh_val)
    t2 = threading.Thread(target=refresh_val)

    # start threads
    t1.start()
    t2.start()

    # wait until threads complete
    t1.join()
    t2.join()


if __name__ == "__main__":
    for i in range(1,11):
        main()
        print("Step {0}: val = {1}".format(i, val))

Step 1: val = 200000
Step 2: val = 191360
Step 3: val = 200000
Step 4: val = 200000
Step 5: val = 200000
Step 6: val = 199331
Step 7: val = 200000
Step 8: val = 200000
Step 9: val = 157380
Step 10: val = 200000

Let’s perform the same operation with the locking mechanism present in threading.

A threading.Lock object provides 2 methods,

acquire() - take the lock, blocking until it is available
release() - release the lock so that another thread can acquire it

When a lock is acquired by a thread for a shared object, no other thread that uses the same lock can make changes to this object at the same time. If another thread attempts to acquire the lock, it has to wait until the lock is released.

Methods to use a lock

Method 1 :

import threading

lock = threading.Lock() # create a lock
lock.acquire() # Acquire the lock (blocks until it is free)
try:
    pass # code goes here
finally:
    lock.release() # Release the lock, even if the code above raises

Method 2 :

import threading

lock = threading.Lock() # create a lock
with lock: # acquired on entry, released on exit
    pass # code goes here

Example 2 : Threading with Lock

Let’s perform the same operation using a lock.

As you can see below, the value is incremented as expected without any inconsistency.

import threading

val = 0 # global variable val

lock = threading.Lock() # create a lock

def refresh_val():
    """
    Increment val 100000 times
    """
    global val
    counter = 100000
    while counter > 0:
        lock.acquire() # Acquire the lock
        val += 1
        lock.release() # Release the lock
        counter -= 1


def main():
    global val
    val = 0
    
    # creating threads
    t1 = threading.Thread(target=refresh_val)
    t2 = threading.Thread(target=refresh_val)

    # start threads
    t1.start()
    t2.start()

    # wait until threads complete
    t1.join()
    t2.join()


if __name__ == "__main__":
    for i in range(1,11):
        main()
        print("Step {0}: val = {1}".format(i, val))

Step 1: val = 200000
Step 2: val = 200000
Step 3: val = 200000
Step 4: val = 200000
Step 5: val = 200000
Step 6: val = 200000
Step 7: val = 200000
Step 8: val = 200000
Step 9: val = 200000
Step 10: val = 200000
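The same counter can also be written with the with statement from Method 2. The following is a small sketch using made-up names (counter_lock, total, add_many, run) rather than the ones from the example above.

```python
import threading

counter_lock = threading.Lock()
total = 0  # shared counter


def add_many(times):
    global total
    for _ in range(times):
        with counter_lock:  # acquired here, released when the block ends
            total += 1


def run(times=100000):
    global total
    total = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(f"total = {run()}")
```

The with form is usually the safer choice, because the lock is released even if the code inside the block raises an exception.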

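When not to use Multithreading

Not suitable for CPU intensive tasks, since CPython’s GIL lets only one thread run Python code at a time.
Having multiple heavyweight threads can slow down your main process.
Individual threads cannot be killed from outside; they have to finish on their own or be asked to stop.
Creating too many threads for a single application might make your code more complex and the process slower.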
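Since a thread cannot be killed, a common workaround is to ask it to stop and let it exit on its own. Here is a minimal sketch using threading.Event; the worker function and the timings are assumptions made for this illustration.

```python
import threading
import time


def worker(stop_event, ticks):
    # Keep working until someone sets the event.
    while not stop_event.is_set():
        ticks.append(time.time())
        stop_event.wait(0.05)  # pause, but wake up early if asked to stop


def run_and_stop():
    stop_event = threading.Event()
    ticks = []
    t = threading.Thread(target=worker, args=(stop_event, ticks))
    t.start()

    time.sleep(0.2)   # let the worker run for a while
    stop_event.set()  # politely ask the worker to stop
    t.join()          # wait for it to exit its loop
    return len(ticks), t.is_alive()


if __name__ == "__main__":
    count, alive = run_and_stop()
    print(f"Worker ticked {count} times, still alive: {alive}")
```

The thread only stops because it checks the event itself, so this works for loops you control and not for a thread stuck inside a long blocking call.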

Conclusion

To summarise, multithreading can be used in cases where you would like to perform multiple tasks within the same application while accessing some shared objects.

To avoid inconsistent results caused by race conditions, the threading lock mechanism can be used.

I hope that by the end of this post you can leverage multithreading based on your requirements.

Comments and feedback are welcome. Cheers!