If you worked on core Machine Learning applications like Churn Prediction for any industry, Classifying people in categories will come across loading and looping over tons of data. Hence, Python Development services provide an amazing library for handling panel data i.e., multidimensional data sets which are Pandas but still it is lacking if don’t use the correct method to loop over for larger datasets. Therefore in this blog, will look at how to optimize your python code to avoid taking a long time.
Different methods of Pandas will explore in this blog:
 Crude Looping with indices
 Looping with iterrows() function available in pandas
 Lopping with apply() method for dataframe
 Vectorization in Pandas
This blog will use a dataset of Churn modeling that contains many variables such as SEX, EDUCATION, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, etc.
You can check out the dataset here : https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Loading the dataset using pandas
 Import pandas as pd
 data = pd.read_csv('./Churn_Data.csv')
It will try to optimize one part of the calculation supposed for building a machine learning model. In which we have to derive a new variable which comprises the sum of the payment in the last 2 months.
So, the payment function is defined as follows:
 def sum_payment(pay1,pay2):
 return (pay1 + pay2)

Crude Looping with indices
This section will see how crude looping i.e., iterating with indices for the whole dataframe. Here in our dataset, we have 30k data points. Let’s create a new variable in dataframe for the payment of the last two months.
1. def total_pay_last_two(): 2. pay_last_two = [] 3. ## looping over length of data 4. for i in range(0, len(data)): 5. ## adding last 2 months payment i.e, pay_amt5 and pay_amt6 6. s = sum_payment(data.iloc[i]['PAY_AMT5'],data.iloc[i]['PAY_AMT6']) 7. pay_last_two.append(s) 8. return pay_last_two In the above code, we are using indices form to get values from data. Let’s see how much will it take, we will use a magic command %%timeit in the jupyter notebook otherwise timeit or time module can be used as well.
1. %%timeit 2. data['pay_last_two'] = total_pay_last_two() Output :
14.6 s ± 407 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This output says that 1 loop takes about 14s which is very high.

iterrows() method
Now, we will explore the iterrows() method of Pandas. It will enhance the speed of execution. iterrows() is a generator that gives an index to loop over along with the current row of datafarme. It is optimized for working with pandas. Let’s check out with the help of an example.
1. def total_pay_last_two(): 2. pay_last_two = [] 3. for i, row in data.iterrows(): 4. ## Now we can access the data row without index 5. s = sum_payment(row['PAY_AMT5'], row['PAY_AMT6']) 6. pay_last_two.append(s) 7. return pay_last_two We will see how much time this iterrows() function takes:
1. %%timeit 2. data['pay_last_two'] = total_pay_last_two() Output :
3.99 s ± 229 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Iterrows() function is faster than crude looping through indices. Which is around 5 times faster.

apply () method
At last, we will explore the apply method for the dataframe. apply () method access the whole dataframe at an instance and it uses anonymous lambda function to fetch rows.
1. %%timeit 2. data['pay_last_two'] = data.apply(lambda row: sum_payment(row['PAY_AMT5'], row['PAY_AMT6']), axis=1) Output :
1.51 s ± 57.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
apply() method is faster than the iterrows() method. Here it shows, 3 times faster. Hence, in many cases, we have to use the apply() method which makes the program more efficient.
There are several other methods, such as panda series vectorization, or we can use Nampy for vectorized implementation.
Conclusion
By integrating all the methods above we can say that we can use the optimized version of Panda for larger datasets. More commonly, the apply() method or vectorized implementation is used for quick calculations. Now, you added one new thing to your toolbox.