
How to make Python code run in parallel (and faster) using PoolExecutor?

In this tutorial, we will look at how to make a process run in parallel in Python. By default, a Python program uses only a single core of the CPU. By making the program run in parallel, we can use multiple CPU cores. I have 2 cores in my system; you may have 4 or more. Using multiple CPU cores results in faster execution. So, let’s start.

Let’s learn how to take advantage of the CPU cores and processing power of your system by making your program run in parallel. Python ships with a module that converts a normal program into parallel code with a change of just 2-3 lines. There are other ways to run code in parallel, such as Python’s threading and multiprocessing modules, but this is one of the cleanest. Now let’s look at Python’s concurrent.futures module.

What is concurrent.futures?

It is a standard-library module of Python that helps achieve asynchronous execution of a program. It is available in Python 3.2+ and provides a high-level interface for running tasks asynchronously. Tasks can be executed either with threads using ThreadPoolExecutor or with separate processes using ProcessPoolExecutor. Both are subclasses of the Executor class of this module.

What is Executor class?

It is the abstract class of the concurrent.futures module. It is not meant to be used directly; instead, we use it through its subclasses, which are as follows:

  • ThreadPoolExecutor
  • ProcessPoolExecutor

For more information about this module, please refer to https://docs.python.org/3/library/concurrent.futures.html
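To see how little code changes between the two executors, here is a minimal sketch (using a hypothetical square function, not part of this tutorial’s example): the same map call works with a thread pool or a process pool, and only the class name changes.

import concurrent.futures

def square(n):
    ## A trivial task, used only to demonstrate the Executor interface
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    ## Threads share memory and are suited to I/O-bound work
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(square, numbers)))

    ## Separate processes are suited to CPU-bound work
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(square, numbers)))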


Python is a good programming language for repetitive tasks and automation. But it can be slow when you need to, say, process a large amount of data or fetch data from many URLs. If you want to process a huge number of images, for example, parallel processing will definitely reduce the computation time. In this tutorial we will discuss an example of fetching data from URLs.

Fetching data from URL

Normal Approach

import concurrent.futures
import time
from bs4 import BeautifulSoup
import requests

## Defining urls
urls = [
    "http://www.google.com",
    "http://www.wikipedia.org",
    "http://www.youtube.com",
    "http://www.quora.com",
    "http://www.python.org"
] * 10


def get_html(url):
    ## Get the response body for the URL
    site = requests.get(url).text
    ## Parse the response text into an HTML tree
    html = BeautifulSoup(site, 'lxml')
    return html


start = time.time()

## Simply parse all urls, one after another
for e in map(get_html, urls):
    pass

end = time.time()
print("Time taken to fetch data is {} secs".format(end - start))

So, here we are fetching the data/HTML of the listed URLs, as you would if you wanted to scrape data from websites. This is also a common task in data science, because we often need to fetch data from websites.

If we run this program without a parallel implementation, it takes quite a long time. In my case the output is:

Output (Normal Approach): Time taken to fetch data is 69.38949847221375 secs

Let’s see what the difference is if we use a parallel approach.

import concurrent.futures
import time
from bs4 import BeautifulSoup
import requests

urls = [
    "http://www.google.com",
    "http://www.wikipedia.org",
    "http://www.youtube.com",
    "http://www.quora.com",
    "http://www.python.org"
] * 10


def get_html(url):
    ## Get the response body for the URL
    site = requests.get(url).text
    ## Parse the response text into an HTML tree
    html = BeautifulSoup(site, 'lxml')
    return html


start = time.time()

## Using the concurrent.futures module
## Here we are using ThreadPoolExecutor, not ProcessPoolExecutor
with concurrent.futures.ThreadPoolExecutor() as executor:
    for e in executor.map(get_html, urls):
        pass

end = time.time()
print("Time taken to fetch data is {} secs".format(end - start))

Output (Pooled Approach): Time taken to fetch data is 19.51672649383545 secs

So, comparing the two runs, the pooled approach is roughly 3.5 times faster. Here we are using only a small amount of data, but in real use cases there may be thousands of URLs or more than 10k images to process, and that is where this approach becomes really useful.
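One thing to note is that iterating over executor.map re-raises the first exception from a worker, so for flaky network calls you may prefer submit() together with as_completed(). Here is a minimal sketch of that pattern, reusing the urls list and get_html function from the example above:

import concurrent.futures

## Reusing urls and get_html from the parallel example above
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    ## Schedule every URL and keep a mapping from future to URL
    future_to_url = {executor.submit(get_html, url): url for url in urls}

    ## Handle results as soon as each download finishes
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            html = future.result()
            print("Fetched {} ({} characters of text)".format(url, len(html.text)))
        except Exception as exc:
            print("{} failed: {}".format(url, exc))

Because as_completed yields futures in the order they finish, a few slow URLs do not hold up the results of the fast ones.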

Will it always boost speed?

Generally, parallel execution really does help boost the speed of a program, but there are cases where it will not. If the tasks are independent and CPU-bound, ProcessPoolExecutor is the right choice; ThreadPoolExecutor is useful when the work is I/O-bound.

Python (specifically CPython) also has the concept of the “Global Interpreter Lock” (GIL), which allows only one thread to execute Python bytecode at a time. Keeping the GIL in mind will help you make better use of the pool executors.
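To make the GIL’s effect concrete, here is a minimal sketch using a hypothetical CPU-bound busy_work function (not part of the tutorial’s example). It runs the same jobs on a thread pool and on a process pool; on a multi-core machine the process pool should finish noticeably faster, though exact timings will vary.

import concurrent.futures
import time

def busy_work(n):
    ## A CPU-bound task: pure Python arithmetic, so the GIL matters
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    ## Threads share one GIL, so CPU-bound work barely speeds up
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        list(executor.map(busy_work, jobs))
    print("Threads:   {:.2f} secs".format(time.time() - start))

    ## Separate processes each get their own interpreter and GIL
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        list(executor.map(busy_work, jobs))
    print("Processes: {:.2f} secs".format(time.time() - start))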

I hope this gives you a clear idea of the pool executors and how to use them in practice.

Applications for parallel/asynchronous processing:

  • Parsing data from CSV or Excel files, or fetching from URLs (see the sketch below)
  • Processing a whole batch of images for image processing
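As a small illustration of the first point, here is a minimal sketch that counts the rows of several CSV files in parallel; the file names are hypothetical placeholders.

import concurrent.futures
import csv

def count_rows(path):
    ## Count data rows in a single CSV file (minus the header row)
    with open(path, newline='') as f:
        return path, sum(1 for _ in csv.reader(f)) - 1

if __name__ == "__main__":
    ## Hypothetical file names; replace with your own paths
    files = ["sales_jan.csv", "sales_feb.csv", "sales_mar.csv"]

    with concurrent.futures.ProcessPoolExecutor() as executor:
        for path, rows in executor.map(count_rows, files):
            print("{}: {} rows".format(path, rows))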

Conclusion

In this blog, we have seen how to run a program in parallel with the help of Python’s concurrent.futures module and how it helps boost the program’s speed. The example above illustrates the use of this module. Parallel processing is really helpful in many applications: much of the time your CPU sits idle, and by using multiple cores you can really make your program run faster.

I hope you enjoyed reading this blog.
