VOOZH about

URL: https://towardsdatascience.com/analysis-of-top-50-spotify-songs-using-python-5a278dee980c/

โ‡ฑ Analysis of Top 50 Spotify Songs using Python | Towards Data Science


Analysis of Top 50 Spotify Songs using Python

Which songs were most popular in 2019?

3 min read
๐Ÿ‘ Photo by John Tekridis on Pexels
Photo by John Tekridis on Pexels

What were the most popular songs of 2019? Which genres were most popular in 2019? Which artists were most popular?

In this post we will analyze the Top 50 Spotify Songs data set, which can be found on Kaggle. The data set provides the 50 most listened to songs on Spotify in 2019. It was extracted from the Organize Your Music site.

The data set contains the following fields:

  1. Track.Name โ€“ Name of Track
  2. Artist.Name โ€“ Name of the Artist
  3. Genre โ€“ Genre of Track
  4. Beats.Per.Minute โ€“ Tempo of the Song
  5. Energy โ€“ The energy of Song โ€“ the higher the value the more energetic
  6. Danceability โ€“ Thee higher the value, the easier it is to dance to the song
  7. Loudness..dB.. โ€“ The higher the value, the louder the song.
  8. Liveness โ€“ The higher the value, the more likely the song is a live recording.
  9. Valence. โ€“ The higher the value, the more positive mood for the song.
  10. Length. โ€“ The duration of the song.
  11. Acousticness.. The higher the value the more acoustic the song
  12. Speechiness. โ€“ The higher the value the more spoken word the song contains
  13. Popularity โ€“ The higher the value the more popular the song is.

To get started letโ€™s read the data into a pandas dataframe and print the first five rows:

import pandas as pd
df = pd.read_csv("top50.csv", encoding="ISO-8859-1")
print(df.head())
๐Ÿ‘ Image

We can sort the dataframe in descending order to see the top 5 most popular songs:

df.sort_values('Popularity', ascending = False, inplace = True)
print(df.head())
๐Ÿ‘ Image

The top five popular songs and the corresponding artist are:

  1. Bad Guy, Billie Eilish
  2. Goodbyes (Feat. Young Thug), Post Malone
  3. Callaita, Bad Bunny
  4. Money In The Grave (Drake ft. Rick Ross), Drake
  5. China, Anuel AA

For the rest of this post we will focus on โ€˜Genreโ€™ and โ€˜Artist.Nameโ€™ but feel free to do further analysis on some of the other columns.

Letโ€™s start by looking at how frequent each genre appears in the top 50 list with the Counter method from the python collections module. Letโ€™s import โ€˜Counterโ€™ from collections and print the frequencies for each genre:

from collections import Counter 
print(Counter(df['Genre'].values))
๐Ÿ‘ Image

We can also limit the output to the most common 5 genres:

print(Counter(df['Genre'].values).most_common(5))
๐Ÿ‘ Image

We can then use matplotlib to display the results on a bar chart:

import matplotlib.pyplot as plt
bar_plot = dict(Counter(df['Genre'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()
๐Ÿ‘ Image

We see that, in the top 50 list, songs that fall under the genre of dance pop appear 8 times, pop appears 7 times, latin appears 5 times, Canadian hip hop appears 3 times and edm appears 3 times.

We can do the same for artist:

print(Counter(df['Artist.Name'].values))
๐Ÿ‘ Image

and restrict to the most common five artists:

print(Counter(df['Artist.Name'].values).most_common(5))
๐Ÿ‘ Image

And plot the results:

bar_plot = dict(Counter(df['Artist.Name'].values).most_common(5))
plt.bar(*zip(*bar_plot.items()))
plt.show()
๐Ÿ‘ Image

We can see that songs by Ed Sheeran appear 4 times in the top 50 list and Billie Ellish, Post Malone, Sech and Lil Nas X each appear twice.

For code reuse purposes we can define a function that takes in the column name for a categorical variable and prints a dictionary with column values and the number of times that value appears as well as a bar chart displaying the frequencies for each value:

def get_frequencies(column_name):
 print(Counter(df[column_name].values))
 print(dict(Counter(df[column_name].values).most_common(5)))
 bar_plot = dict(Counter(df[column_name].values).most_common(5))
 plt.bar(*zip(*bar_plot.items()))
 plt.show()

Now if we call this function with โ€˜Genreโ€™ we get:

get_frequencies('Genre')
๐Ÿ‘ Image

and with โ€˜Artist.Nameโ€™:

get_frequencies('Artist.Name')
๐Ÿ‘ Image

Iโ€™ll stop here but feel free to play around with the data and code yourself. The code from this post is available on GitHub. Thank you for reading!


Written By

Sadrach Pierre, Ph.D.

Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

Write for TDS

Related Articles