Enable Javascript

Please enable Javascript to view website properly

Toll Free 1800 889 7020

Looking for an Expert Development Team? Take two weeks Trial! Try Now

How to Make Python Run Parallel Using PoolExecutor?

In this article, we will check out how to make the process run parallel in Python. And also, we will learn what concurrent.futures is, How to fetch data from URL, etc. By default, the program uses a single core of the CPU, as in Python. So, by making the program run parallel, we will use multiple cores of the CPU. I have two cores in my system. You may have 4 or more. Using multiple cores of the CPU will result in an increased speed of execution. So, let’s start.

Let’s learn how to take advantage of a whole lot of CPU cores or processing power of your system by running your program in parallel. It is a module of Python which will convert the normal program into parallel code with changes in 2-3 lines of code. Though there are some other ways also to run parallel, this is one of the good ways to do this, such as we can use the multithreading and multiprocessing module of Python as well. Now let’s look at concurrent.futures module of Python.

What is concurrent.futures?

It is the standard module of Python, and it helps in achieving asynchronous execution of the Program. This module is available in Python 3.0+ providing a high-level interface for the asynchronous task. It can either be performed with threads using ThreadPoolExecutor or segregate processes using ProcessPoolExecutor. These are both defined under the Executor Class of this module.

What is the Executor class?

It is the abstract class of concurrent.futures module. It can be used directly and will use with its subclasses, and they are as follows:

  • ThreadPoolExecutor
  • ProcessPoolExecutor

For more information about this module, please refer to https://docs.python.org/3/library/concurrent.futures.html

article

Python is a good programming language for repetitive tasks or automation. But it will be slow if you want to assume processing a large amount of data or fetch data from URL etc. Suppose you want to process a hell lot of images, and if will use parallel processing, it will reduce computation time. Here this tutorial will discuss examples of fetching data using URL.

Fetching data from URL

Normal Approach

1. import concurrent.futures 2. import time 3. from bs4 import BeautifulSoup 4. import requests 5. 6. ## Defining urls 7. urls = [ 8. "http://www.google.com", 9. "http://www.wikipedia.org", 10. "http://www.youtube.com", 11. "http://www.quora.com", 12. "http://www.python.org" 13. ]*10 14. 15. 16. 17. 18. def get_html(url): 19. ## getting the request object 20. site = requests.get(url).text 21. ## Parse the request object into the html file 22. html = BeautifulSoup(site,'lxml') 23. 24. return html 25. 26. 27. start = time.time() 28. 29. ## simple parsing all urls 30. for e in map(get_html, urls): 31. pass 32. 33. end = time.time() 34. print(end - start)

So, here we are fetching data/Html code of the following URLs for any reason supposed to want to scrape data from websites. It can be one of the major applications in data science as well. Because many times there is a need to fetch data from websites.

If we will run this program without parallel implementation, it will take so long. In my case Output is

Output(Normal Approach) -: Time taken to fetch data is 69.38949847221375 secs

Let’s see what the difference will be if will use a parallel approach.

1. import concurrent.futures 2. import time 3. from bs4 import BeautifulSoup 4. import requests 5. 6. urls = [ 7. "http://www.google.com", 8. "http://www.wikipedia.org", 9. "http://www.youtube.com", 10. "http://www.quora.com", 11. "http://www.python.org" 12. ]*10 13. 14. 15. 16. def get_html(url): 17. ## getting the request object 18. site = requests.get(url).text 19. ## Parse the request object into the html file 20. html = BeautifulSoup(site,'lxml') 21. 22. return html 23. 24. 25. start = time.time() 26. ## Using concurrent.futures module ## Here we are using ThreadPoolExecutor not Pool Executor 27. with concurrent.futures.ThreadPoolExecutor() as executor: 28. for e in executor.map(get_html, urls): 29. pass 30. 31. end = time.time() 32. print("Time taken to get data is {} secs".format(finish - begin))

Output (Pooled Approach) -: Time taken to fetch data is 19.51672649383545 secs

So, if we will check the difference, it is three times faster. There we are taking less data, but in real use cases, there may be thousands of URLs or more than 10k images to a process where you will really find this approach useful.

Will it always boost up speed?

Generally, it is noticed that this parallel execution is really helping in boosting the speed of a certain program. But for sure! There may be the case that it will fail. As such, if we have independent processes, then we can make use of ProcessPoolExecutor, and it will be useful when CPU bound is there, and ThreadPoolExecutor will be useful when I/O bound is there.

As in Python Django development concept of “Global Interpreter lock” (GIL) is also there. Hence keeping that concept in mind will make better use of Pool Executor.

I hope this gives you a clear idea of Pool Executor and how to use it practically.

Applications for parallel/asynchronous processing:

  • Parsing data from CSV, Excel, URL
  • Processing whole bunch of images for Image Processing

Conclusion

In this blog, we have seen with the help of concurrent.futures module of Python, how to run program parallel and helps in boosting up the speed. We have also explained the usage of this module with the help of some examples. In many applications, this parallel processing is really helpful. The more times your CPU is idle, you can make it run faster by using multiple cores of the CPU.

Recent Blogs

Categories

NSS Note
Some of our clients
team