Analysis of Top 50 Spotify Songs using Python

Which songs were most popular in 2019?

Dec 27, 2019


What were the most popular songs of 2019? Which genres were most popular in 2019? Which artists were most popular?

In this post we will analyze the Top 50 Spotify Songs data set, which can be found on Kaggle. The data set provides the 50 most listened to songs on Spotify in 2019. It was extracted from the Organize Your Music site.

The data set contains the following fields:

Track.Name – Name of Track
Artist.Name – Name of the Artist
Genre – Genre of Track
Beats.Per.Minute – Tempo of the Song
Energy – The higher the value, the more energetic the song.
Danceability – The higher the value, the easier it is to dance to the song.
Loudness..dB.. – The higher the value, the louder the song.
Liveness – The higher the value, the more likely the song is a live recording.
Valence. – The higher the value, the more positive the mood of the song.
Length. – The duration of the song.
Acousticness.. – The higher the value, the more acoustic the song.
Speechiness. – The higher the value, the more spoken word the song contains.
Popularity – The higher the value, the more popular the song.

To get started, let’s read the data into a pandas DataFrame and print the first five rows:

import pandas as pd

df = pd.read_csv("top50.csv", encoding="ISO-8859-1")

print(df.head())

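The encoding argument matters because some track names contain accented characters. The following is a minimal sketch of what goes wrong without it, using a tiny in-memory CSV with made-up rows rather than the real top50.csv; the variable names raw and toy are only for illustration:

```python
import io
import pandas as pd

# Toy CSV bytes encoded as ISO-8859-1 (Latin-1); the rows are made up for illustration.
raw = "Track.Name,Genre\nCanción A,pop\nSong B,edm\n".encode("ISO-8859-1")

# Decoding as UTF-8 fails because the Latin-1 byte for "ó" is not valid UTF-8 here.
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Passing the matching encoding reads the same bytes cleanly.
toy = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(utf8_ok)
print(toy["Track.Name"].tolist())
```

If read_csv raises a UnicodeDecodeError on a file like this, trying encoding="ISO-8859-1" is a reasonable first step.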

We can sort the DataFrame by 'Popularity' in descending order to see the five most popular songs:

df.sort_values('Popularity', ascending=False, inplace=True)

print(df.head())


The five most popular songs and their artists are:

Bad Guy, Billie Eilish
Goodbyes (Feat. Young Thug), Post Malone
Callaita, Bad Bunny
Money In The Grave (Drake ft. Rick Ross), Drake
China, Anuel AA
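Note that sorting with inplace=True reorders df itself. If you would rather leave the original row order untouched, pandas also offers nlargest. A minimal sketch, using a toy DataFrame with made-up titles and scores in place of the real data:

```python
import pandas as pd

# Toy data with made-up titles and scores, standing in for the real data set.
toy = pd.DataFrame({
    "Track.Name": ["Song A", "Song B", "Song C", "Song D"],
    "Popularity": [70, 95, 88, 91],
})

# nlargest returns the top rows as a new frame and does not modify `toy`.
top_two = toy.nlargest(2, "Popularity")
print(top_two)
```

On the real data, df.nlargest(5, 'Popularity') should give the same five rows as sorting and calling head(), although rows with tied scores may come back in a different order.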

For the rest of this post we will focus on ‘Genre’ and ‘Artist.Name’, but feel free to do further analysis on the other columns.

Let’s start by looking at how frequently each genre appears in the top 50 list, using the Counter class from Python’s collections module. Let’s import ‘Counter’ from collections and print the frequency of each genre:

from collections import Counter 
print(Counter(df['Genre'].values))


We can also limit the output to the five most common genres:

print(Counter(df['Genre'].values).most_common(5))

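As an aside, pandas can produce the same counts without Counter through Series.value_counts, which sorts by frequency in descending order. A minimal sketch on a toy Series with made-up genre labels, checking that both approaches agree:

```python
from collections import Counter
import pandas as pd

# Made-up genre labels for illustration only.
genres = pd.Series(["pop", "edm", "pop", "latin", "pop", "edm"], name="Genre")

# value_counts returns a Series of counts, most frequent first.
counts = genres.value_counts()
top_two_pandas = counts.head(2).to_dict()

# The Counter approach used in this post gives the same top values.
top_two_counter = dict(Counter(genres).most_common(2))

print(top_two_pandas)
print(top_two_counter)
```

On the real data this would be df['Genre'].value_counts().head(5).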

We can then use matplotlib to display the results on a bar chart:

import matplotlib.pyplot as plt
bar_plot = dict(Counter(df['Genre'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()


We see that, in the top 50 list, dance pop appears 8 times, pop appears 7 times, latin appears 5 times, and canadian hip hop and edm each appear 3 times.
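The bar chart above has no axis labels, and longer genre names can overlap on the x-axis. A minimal sketch of a labeled version, assuming made-up genre labels and a hypothetical output file name (genre_counts.png):

```python
from collections import Counter
import matplotlib.pyplot as plt

# Made-up genre labels for illustration only.
genres = ["pop", "edm", "pop", "latin", "pop", "edm", "dance pop"]

# Split the (label, count) pairs into two tuples for plotting.
labels, counts = zip(*Counter(genres).most_common(3))

fig, ax = plt.subplots()
ax.bar(labels, counts)
ax.set_xlabel("Genre")
ax.set_ylabel("Number of songs")
ax.set_title("Most common genres (toy data)")
# Rotate the tick labels so longer genre names do not overlap.
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
# Save to a file; in an interactive session plt.show() works as well.
fig.savefig("genre_counts.png")
```

The same pattern applies to the real counts by replacing the toy list with df['Genre'].values.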

We can do the same for artists:

print(Counter(df['Artist.Name'].values))


We can then restrict the output to the five most common artists:

print(Counter(df['Artist.Name'].values).most_common(5))


And plot the results:

bar_plot = dict(Counter(df['Artist.Name'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()


We can see that songs by Ed Sheeran appear 4 times in the top 50 list, while Billie Eilish, Post Malone, Sech and Lil Nas X each appear twice.

To reuse this code, we can define a function that takes the name of a categorical column. It prints the count of every value in that column, prints a dictionary of the five most common values with their counts, and displays those five counts as a bar chart:

def get_frequencies(column_name):
    # Count how often each value appears in the given column of df.
    counts = Counter(df[column_name].values)
    print(counts)
    # Keep only the five most common values for the summary and the chart.
    top_five = dict(counts.most_common(5))
    print(top_five)
    plt.bar(*zip(*top_five.items()))
    plt.show()
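As a side note, get_frequencies relies on the global df and only prints its results. A minimal sketch of a variant that takes the DataFrame as an argument and returns the counts; the name top_frequencies and the parameter n are hypothetical and not part of the original analysis:

```python
from collections import Counter
import pandas as pd

def top_frequencies(frame, column_name, n=5):
    """Return the n most common values in a column as a dict of value -> count."""
    return dict(Counter(frame[column_name]).most_common(n))

# Toy frame with made-up genre labels, standing in for the real data set.
toy = pd.DataFrame({"Genre": ["pop", "edm", "pop", "latin", "pop", "edm"]})
result = top_frequencies(toy, "Genre", n=2)
print(result)
```

Returning the counts rather than printing them makes the helper easier to test and to combine with other code. The rest of this post continues with get_frequencies as defined above.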

Now if we call this function with ‘Genre’ we get: