Hi all! I’m Melissa, and in this video I’ll explain how to speed up adding
multiple rows to a Pandas DataFrame. I have faced this problem many times, and I
was confused about which method to use because there are many ways to do it.
But when you have to build an application, you are interested
in performance, so I decided to try the most popular methods and check their speed. The first
option is the append function. Let’s create a DataFrame with five rows and
four columns full of random numbers. We want to append
10,000 rows using a for loop. In each iteration we create a dictionary and
append it to the DataFrame. Meanwhile, we measure the time needed to perform this
task. The result shows that the append function takes 18 seconds. The second
option is the loc function. As before, we create a DataFrame with five rows and four
columns full of random numbers, and the rows to add are the same 10,000.
For this method, we create a NumPy array in each iteration and then add
it to the DataFrame using the loc function. As before, we measure the time
needed to perform this task, and the result shows that the loc function
is slower than the append function. Now let’s dive into the last method. We create an
empty list called row_list and we append a dictionary to it 5 times. This dictionary
has four key-value pairs. Now we enter the for loop, and in each step we
create a dictionary. Instead of adding the dictionary to the DataFrame, we append it to
row_list. At the end of the loop we add the list to the DataFrame. The result
shows that appending a dictionary to a list in the loop and then inserting the
list into the DataFrame is a better choice:
in fact, it takes less than one second. Next time you need to
add rows to a DataFrame, don’t forget this useful method. That’s all for today.
Thank you for watching this video, and don’t forget to like and subscribe. See you
next time, bye!
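
The three options compared in the video could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shown on screen: the column names, the function names and `N_ROWS` are hypothetical, and the row count is reduced from the video's 10,000 so the sketch runs quickly. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the first option uses `pd.concat` inside the loop as a stand-in for it.

```python
import time

import numpy as np
import pandas as pd

COLUMNS = ["a", "b", "c", "d"]  # hypothetical column names
N_ROWS = 1_000  # the video adds 10,000 rows; smaller here to keep the run short


def make_start_frame(rng):
    """Build a DataFrame with five rows and four columns of random numbers."""
    return pd.DataFrame(rng.random((5, 4)), columns=COLUMNS)


def add_rows_concat(df, rows):
    """Option 1: grow the DataFrame by one row in each iteration.

    pd.concat is a stand-in for DataFrame.append (removed in pandas 2.0).
    """
    for row in rows:
        new_row = pd.DataFrame([dict(zip(COLUMNS, row))])
        df = pd.concat([df, new_row], ignore_index=True)
    return df


def add_rows_loc(df, rows):
    """Option 2: assign a NumPy array to a new index label with loc."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for row in rows:
        df.loc[len(df)] = row
    return df


def add_rows_list(df, rows):
    """Option 3: collect dictionaries in a list, then build the DataFrame once."""
    row_list = df.to_dict("records")  # start from the five existing rows
    for row in rows:
        row_list.append(dict(zip(COLUMNS, row)))
    return pd.DataFrame(row_list, columns=COLUMNS)


def timed(func, df, rows):
    """Run func and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(df, rows)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
start_df = make_start_frame(rng)
new_rows = rng.random((N_ROWS, 4))

results = {}
for func in (add_rows_concat, add_rows_loc, add_rows_list):
    results[func.__name__], elapsed = timed(func, start_df, new_rows)
    print(f"{func.__name__}: {elapsed:.3f} s")
```

The absolute timings depend on the machine and the pandas version, so they will not match the figures quoted in the video. All three functions should produce the same final DataFrame; only the cost of getting there differs.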
