Analysis of Top 50 Spotify Songs using Python
Which songs were most popular in 2019?
What were the most popular songs of 2019? Which genres were most popular in 2019? Which artists were most popular?
In this post we will analyze the Top 50 Spotify Songs data set, which can be found on Kaggle. The data set provides the 50 most listened to songs on Spotify in 2019. It was extracted from the Organize Your Music site.
The data set contains the following fields:
- Track.Name: Name of the track
- Artist.Name: Name of the artist
- Genre: Genre of the track
- Beats.Per.Minute: Tempo of the song
- Energy: The energy of the song; the higher the value, the more energetic the song
- Danceability: The higher the value, the easier it is to dance to the song
- Loudness..dB..: The higher the value, the louder the song
- Liveness: The higher the value, the more likely the song is a live recording
- Valence.: The higher the value, the more positive the mood of the song
- Length.: The duration of the song
- Acousticness..: The higher the value, the more acoustic the song
- Speechiness.: The higher the value, the more spoken word the song contains
- Popularity: The higher the value, the more popular the song
To get started, let's read the data into a pandas dataframe and print the first five rows:
import pandas as pd
df = pd.read_csv("top50.csv", encoding="ISO-8859-1")
print(df.head())
We can sort the dataframe by 'Popularity' in descending order to see the five most popular songs:
df.sort_values('Popularity', ascending = False, inplace = True)
print(df.head())
The five most popular songs and their artists are:
- Bad Guy, Billie Eilish
- Goodbyes (Feat. Young Thug), Post Malone
- Callaita, Bad Bunny
- Money In The Grave (Drake ft. Rick Ross), Drake
- China, Anuel AA
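The sort above reorders df in place. If you would rather leave the row order untouched, pandas also offers nlargest, which returns the top rows as a new dataframe. The sketch below uses a tiny made-up dataframe in place of top50.csv; the song and artist names in it are placeholders, not values from the data set.

```python
import pandas as pd

# Tiny stand-in for top50.csv; these rows are made up for illustration.
sample = pd.DataFrame({
    "Track.Name": ["Song A", "Song B", "Song C", "Song D"],
    "Artist.Name": ["Artist 1", "Artist 2", "Artist 3", "Artist 1"],
    "Popularity": [88, 95, 70, 91],
})

# nlargest returns a new dataframe with the n highest values, highest first,
# and leaves `sample` in its original order.
top2 = sample.nlargest(2, "Popularity")[["Track.Name", "Artist.Name", "Popularity"]]
print(top2)
```

With the real data, the equivalent call would be df.nlargest(5, 'Popularity').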
For the rest of this post we will focus on 'Genre' and 'Artist.Name', but feel free to do further analysis on some of the other columns.
Let's start by looking at how frequently each genre appears in the top 50 list, using the Counter class from Python's collections module. Let's import 'Counter' from collections and print the frequency of each genre:
from collections import Counter
print(Counter(df['Genre'].values))
We can also limit the output to the five most common genres:
print(Counter(df['Genre'].values).most_common(5))
We can then use matplotlib to display the results on a bar chart:
import matplotlib.pyplot as plt
bar_plot = dict(Counter(df['Genre'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()
We see that, in the top 50 list, songs that fall under the genre of dance pop appear 8 times, pop appears 7 times, latin appears 5 times, Canadian hip hop appears 3 times and edm appears 3 times.
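As an aside, the same kind of count can be produced without Counter, using the value_counts method that pandas provides on a column. The following is a minimal sketch on a made-up list of genres rather than the real data; value_counts sorts the result from most to least frequent, so head picks out the top entries.

```python
import pandas as pd

# Made-up genre column standing in for df['Genre'].
genres = pd.Series(["pop", "latin", "pop", "edm", "latin", "pop"], name="Genre")

# value_counts returns the counts sorted in descending order,
# so head(2) keeps the two most frequent genres.
top2 = genres.value_counts().head(2)
print(top2)
```

On the real dataframe this would read df['Genre'].value_counts().head(5).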
We can do the same for artists:
print(Counter(df['Artist.Name'].values))
and restrict to the most common five artists:
print(Counter(df['Artist.Name'].values).most_common(5))
And plot the results:
bar_plot = dict(Counter(df['Artist.Name'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()
We can see that songs by Ed Sheeran appear 4 times in the top 50 list, and Billie Eilish, Post Malone, Sech and Lil Nas X each appear twice.
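One thing to keep in mind when reading a result like this: when several values share the same count, most_common(5) cannot show them all, and it keeps the ones that were encountered first. Since df was sorted by popularity earlier, ties here are broken in favor of the artists that appear higher in the sorted dataframe. The sketch below illustrates this on a made-up list of artist names, and shows one possible way to keep every value tied with the last place; the variable names are my own.

```python
from collections import Counter

# Made-up artist list: "A" appears three times, "B", "C" and "D" twice each.
artists = ["A", "B", "A", "C", "B", "D", "C", "A", "D"]
counts = Counter(artists)

# most_common(2) has to pick one of the three two-song artists;
# it keeps the one that was seen first in the input.
print(counts.most_common(2))

# To keep every artist tied with the n-th place, cut on the count instead.
n = 2
cutoff = counts.most_common(n)[-1][1]
with_ties = [(name, c) for name, c in counts.most_common() if c >= cutoff]
print(with_ties)
```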
To reuse this code, we can define a function that takes the name of a categorical column, prints the count of each value in that column, prints the five most common values as a dictionary, and displays those five in a bar chart:
def get_frequencies(column_name):
    # Count how often each value appears in the column
    counts = Counter(df[column_name].values)
    print(counts)
    # Keep only the five most common values
    top_five = dict(counts.most_common(5))
    print(top_five)
    # Plot the top five as a bar chart
    plt.bar(*zip(*top_five.items()))
    plt.show()
Now if we call this function with 'Genre' we get:
get_frequencies('Genre')
and with 'Artist.Name':
get_frequencies('Artist.Name')
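Note that get_frequencies reads the global df and always uses the top five. If you want something easier to reuse, one option is to pass the data in and return the counts instead of printing them. The function below is a hypothetical variant, not code from the original post; the name top_frequencies and its parameters are my own. It is demonstrated on a plain dictionary of made-up values, but the same call should work with the dataframe, since iterating over a pandas column yields its values.

```python
from collections import Counter

def top_frequencies(data, column_name, n=5):
    """Return the n most common values in data[column_name] as a dict."""
    counts = Counter(data[column_name])
    return dict(counts.most_common(n))

# Made-up data standing in for the dataframe.
sample = {"Genre": ["pop", "latin", "pop", "edm", "latin", "pop"]}
result = top_frequencies(sample, "Genre", n=2)
print(result)
```

The returned dictionary can be passed to plt.bar(*zip(*result.items())) in the same way as before.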
I'll stop here, but feel free to play around with the data and code yourself. The code from this post is available on GitHub. Thank you for reading!