Reduce time-complexity using Joblib in Python


Are you a machine learning enthusiast? Or a freaking nerd who is always concerned about time and space optimization in your code? Either way, this blog is for making developers wiser while coding.


With the advancement of the AI era, many new machine learning algorithms and optimization techniques have been invented in a cut-throat race for the best time and space complexity. Keeping this in mind, let me introduce a not-very-old yet very powerful Python library: Joblib. Joblib brought a breakthrough in various areas of Python work, such as loading large NumPy arrays, serializing and persisting Python objects, and speeding up the Python functions you build. It does this through parallel computing (with the loky (default), multiprocessing, and threading backends), memoization (not a typo; there is no 'r'), and caching.

This blog will give you shivers and shrieks as you are awed by the performance of Joblib. Let's cut to the chase. We will cross the following milestones.

  • Dive into Joblib

    Features which made Joblib an Avenger

    How to get started?

    Main Conceptual Features of Joblib

    Joblib functional areas of optimization in Python

  • Implementations of Joblib

  • Conclusion


Dive into Joblib

Joblib is a library written purely in Python by scikit-learn developers. It focuses on optimizing the persistence of Python objects and the execution of Python functions. This awesome library has become popular for cutting running time, especially when handling big data. It provides lightweight pipelining in Python.

Problem: We face many challenges when dealing with large data. Computationally intensive functions take a huge amount of time and space, and so does persisting huge data as a pickle and loading it back.

Solution: Joblib

Features which made Joblib an Avenger in reducing time-complexity:

  • 1. Fast disk caching and lazy evaluation, using a hashing technique
  • 2. Capable of distributing jobs (parallelization) using a Parallel helper
  • 3. Compression when persisting large data
  • 4. Best known for handling large data
  • 5. Specific optimizations for handling large NumPy arrays
  • 6. Memoization: a function called again with the same arguments is not recomputed; instead, the output is loaded back from the cache, using memmapping for arrays
  • 7. Cherry on the cake: no dependencies (except Python itself)

Later, we will look over the practical examples of the above features one by one. Stay tuned!

How to get started?

You can install Joblib using pip as follows:

pip install joblib

Main Conceptual Features of Joblib


Parallel Computing:

  • 1. Parallel class

    Concurrency is controlled by the n_jobs argument, which sets how many workers (processes or threads) the OS runs at the same time; n_jobs=-1 uses all available CPUs. Generally it is set to the number of CPU (processor) cores, but the right value depends on the task: for an I/O-intensive task that spends little time on the processor, you can use more workers than cores.

    class joblib.Parallel(n_jobs=None, backend=None, verbose=0, timeout=None, pre_dispatch='2 * n_jobs', batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r', prefer=None, require=None)

    You can also explore the backend argument, which offers options such as 'loky' (the default, process-based), 'multiprocessing', and 'threading'. For more info, check out the documentation.

  • 2. delayed decorator

    delayed is a decorator that captures a function together with its arguments: instead of calling the function, it returns a (function, args, kwargs) tuple, while you keep writing ordinary function-call syntax.

    joblib.delayed(function, check_pickle=None)
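A minimal sketch of how the two pieces fit together, assuming a recent joblib release (where delayed takes only the function): delayed records a call without running it, and Parallel executes a batch of such recorded calls on n_jobs workers. The choice of math.sqrt and n_jobs=2 is purely illustrative.

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed() does not call sqrt; it only records the call
task = delayed(sqrt)(16)
print(task)  # a tuple: (sqrt, (16,), {})

# Parallel consumes an iterable of recorded calls and runs them on 2 workers
# (the default loky backend uses separate processes)
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)  # results come back in input order
```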

Caching (Memoization)

  • 1. Memory class

    Lazy evaluation of a Python function, in simple terms, means that code assigned to a variable executes only when its result is needed by other computations. Caching the result of a function to avoid recomputing it is termed memoization.

    class joblib.memory.Memory(location=None, backend='local', cachedir=None, mmap_mode=None, compress=False, verbose=1, bytes_limit=None, backend_options={})

    It also avoids rerunning the function with the same arguments. The Memory class stores results on disk and, when a function is called again with the same arguments, loads the cached output back. A hash of the inputs tells whether the output has already been computed: if not, it is computed and stored; otherwise, the cached value is loaded. It is optimized in particular for large NumPy arrays.

    Output is saved in a pickle file in the cache directory.

  • 2. Memory.cache()

    Memory.cache(func) returns a callable object that wraps the function and stashes its return value on disk each time it is called with new arguments.
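A minimal sketch of Memory.cache, assuming a throwaway cache directory from tempfile (real code would point location at a persistent folder). The add function and the calls list are hypothetical: the list only records when the function body really runs, to show that a repeated call is served from the cache. joblib.hash illustrates the hashing that recognizes repeated arguments.

```python
import tempfile

import joblib
from joblib import Memory

# Throwaway cache directory for this demo; use a persistent path in real code
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = []  # records every time the function body actually executes

def add(a, b):
    calls.append((a, b))
    return a + b

cached_add = memory.cache(add)

print(cached_add(2, 3))  # computed and written to the cache
print(cached_add(2, 3))  # same arguments: loaded from the cache
print(cached_add(4, 5))  # new arguments: computed again
print(len(calls))        # the body ran twice, not three times

# Equal inputs give equal hashes, which is how a repeated call is recognized
print(joblib.hash([1, 2, 3]) == joblib.hash([1, 2, 3]))
```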

Data Persistence

Joblib helps persist any data structure or your machine learning model. It is often a better replacement for pickle from Python's standard library, especially for objects that hold large NumPy arrays. Unlike pickle.dump, which needs an open file object, joblib.dump accepts either a filename or a file object.
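A minimal sketch of both ways to persist an object, assuming a temporary directory; the model dictionary is a hypothetical stand-in for any picklable object, such as a trained estimator.

```python
import os
import tempfile

import joblib

# Hypothetical "model": any picklable Python object works the same way
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}
tmp_dir = tempfile.mkdtemp()

# 1) Persist by filename
path = os.path.join(tmp_dir, "model.joblib")
joblib.dump(model, path)
from_path = joblib.load(path)

# 2) Persist through an already-open file object
obj_path = os.path.join(tmp_dir, "model_fileobj.joblib")
with open(obj_path, "wb") as f:
    joblib.dump(model, f)
with open(obj_path, "rb") as f:
    from_file = joblib.load(f)

print(from_path == model, from_file == model)  # True True
```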

The breakthrough is saving space during pickling, achieved by joblib's compression techniques: Joblib compresses the data before writing it to disk, so the persisted object is saved in compressed form. Each supported file extension, such as .z, .gz, .bz2, .xz, or .lzma, maps to its own compression method. For more info, visit the following link: http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html


Implementations of Joblib


  • 1. Run over loops (Embarrassingly parallel for loops)

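A minimal sketch comparing a plain for loop with the same loop run through Parallel. The fetch function is hypothetical: time.sleep stands in for waiting on disk or network, so prefer="threads" is used, following the earlier note that I/O-bound work can use more workers. Actual timings depend on the machine.

```python
import time

from joblib import Parallel, delayed

def fetch(i):
    """Hypothetical I/O-bound task: sleeping stands in for waiting on I/O."""
    time.sleep(0.25)
    return i * i

inputs = range(8)

# Plain for loop: tasks run one after another
start = time.perf_counter()
sequential = [fetch(i) for i in inputs]
sequential_time = time.perf_counter() - start

# Same loop with Parallel: up to 4 tasks wait at the same time
start = time.perf_counter()
parallel = Parallel(n_jobs=4, prefer="threads")(delayed(fetch)(i) for i in inputs)
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f} s, parallel: {parallel_time:.2f} s")
print(sequential == parallel)  # True: Parallel preserves input order
```
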
  • 2. Reload large NumPy arrays (Memoize pattern)

    As mentioned previously, memoization means loading the output from the cache when a function is called again with the same arguments.

    The Memory class is also used with NumPy when a calculation takes long or a large array has to be loaded. This can be done with mmap_mode (memory mapping) or simply with the decorator.

    Memmapping (memory mapping) mode helps when reloading large NumPy arrays from the cache: the array is mapped from the cache file on disk instead of being read fully into memory. The memory.cache decorator can be used as well.


    When the square function is called again with the same argument, the result is memoized: the arguments are hashed to find the cached output, which is reopened with mmap_mode (memory mapping) to speed up loading.


    With the decorator, calling fun1 again with the same arguments follows the same memoize pattern.

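A minimal sketch of both variants, assuming a throwaway cache directory from tempfile. The square and fun1 bodies are hypothetical stand-ins for the functions in the lost screenshots. With mmap_mode="r", a cache hit is expected to return a read-only numpy.memmap backed by the cache file rather than a fresh in-memory copy.

```python
import tempfile

import numpy as np
from joblib import Memory

# mmap_mode="r": cached NumPy arrays are reopened as read-only memory maps
memory = Memory(location=tempfile.mkdtemp(), mmap_mode="r", verbose=0)

def square(x):
    return np.square(x)

cached_square = memory.cache(square)

data = np.arange(1_000_000, dtype=np.float64)
first = cached_square(data)   # computed, then stored on disk
second = cached_square(data)  # same argument: reopened from the cache file
print(type(second))           # expected: numpy.memmap, paged in from disk on access

# The same caching through the decorator syntax
@memory.cache
def fun1(a, b):
    return a * b

product = fun1(data, 2.0)        # computed
product_again = fun1(data, 2.0)  # memoized: loaded from the cache
print(np.array_equal(product, product_again))
```
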
  • 3. Python function (Memoize (caching) + Parallelism)

    Caching: the first call to a custom Python function took 5.01 s.


    When the same function is called again with the same arguments, it takes 0 s, since the output is loaded from the cache.

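A minimal sketch that combines both ideas, assuming a throwaway cache directory. The slow_square and sum_of_squares functions are hypothetical: the sleep stands in for heavy computation, Parallel (with threads, since sleeping is not CPU work) speeds up the first, uncached call, and Memory makes the repeated call nearly free. Timings depend on the machine.

```python
import tempfile
import time

from joblib import Memory, Parallel, delayed

memory = Memory(location=tempfile.mkdtemp(), verbose=0)

def slow_square(x):
    """Hypothetical expensive step: the sleep stands in for heavy computation."""
    time.sleep(0.5)
    return x * x

@memory.cache
def sum_of_squares(n):
    # Parallelism speeds up the first (uncached) computation
    values = Parallel(n_jobs=4, prefer="threads")(
        delayed(slow_square)(i) for i in range(n)
    )
    return sum(values)

start = time.perf_counter()
first = sum_of_squares(4)   # computed: four 0.5 s tasks run side by side
first_time = time.perf_counter() - start

start = time.perf_counter()
second = sum_of_squares(4)  # same argument: loaded from the cache
second_time = time.perf_counter() - start

print(f"first call: {first_time:.2f} s, second call: {second_time:.4f} s")
```
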
  • 4. Serialization and Persistence

    With Joblib you can persist objects to filenames or even to file objects. The Python object can be any data structure, or even your machine learning or deep learning model. Let's look at the ways to dump and load objects.

    - Plain persisting and loading of a list object


    The persisted file can be compressed with the compress argument. This saves space, which in turn affects how long it takes to load the object (less data to read from disk, but extra work to decompress it).


    - Next, a '.z' compressed file is dumped; the zlib compression method is chosen from the extension.


    - Next, a .gz compressed file is dumped, using the gzip compression method at compression level 3.


    - Finally, comparing the file sizes shows the difference in storage between the various forms of pickle file.

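The four persistence variants described in this item can be reproduced with a minimal sketch like the following, assuming a temporary directory and a hypothetical, highly repetitive list as the data (repetitive data compresses well, so the size differences are easy to see). Exact byte counts depend on the data and the joblib version.

```python
import os
import tempfile

import joblib

# Hypothetical large, repetitive list standing in for real data
data = [i % 10 for i in range(100_000)]
tmp_dir = tempfile.mkdtemp()

paths = {
    "plain": os.path.join(tmp_dir, "data.pkl"),
    "compress=3": os.path.join(tmp_dir, "data_c3.pkl"),
    ".z": os.path.join(tmp_dir, "data.pkl.z"),
    ".gz": os.path.join(tmp_dir, "data.pkl.gz"),
}

joblib.dump(data, paths["plain"])                      # no compression
joblib.dump(data, paths["compress=3"], compress=3)     # zlib (default method), level 3
joblib.dump(data, paths[".z"])                         # zlib, inferred from the ".z" extension
joblib.dump(data, paths[".gz"], compress=("gzip", 3))  # gzip, level 3

sizes = {}
for label, path in paths.items():
    restored = joblib.load(path)        # every variant loads back the same list
    sizes[label] = os.path.getsize(path)
    print(f"{label:>10}: {sizes[label]:>8} bytes, round-trip ok: {restored == data}")
```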


Conclusion

Every beginning has an end. Well, that's the cycle of nature! Likewise, our blog has come to an end. We have seen how Joblib is a lifesaver when handling huge data that would otherwise take a lot of space and time. This blog has described this lightweight pipelining library, which is capable of optimizing both time and space. Features like parallelism, memoization and caching, and file compression make it stand out among ML/AI tools.

In machine learning, a huge model pickle file no longer has to consume a lot of space, and it can be loaded more quickly. But life isn't always fair, right? Jokes apart!

Joblib is not always faster, though: with small amounts of data, its overhead can outweigh the gains. Above all, it is recommended over the pickle library for object persistence, and it is worth considering whenever you need to run tasks in parallel.
