FIFA Women’s World Cup 2023 visualized with Plotly
A data scientist's review in five plots
In July and August 2023, Australia and New Zealand hosted the FIFA Women’s World Cup. A total of 32 national teams competed, with Spain taking home the trophy for the first time. Major sporting events always generate a lot of data, and I took this as an opportunity to learn how to use Plotly.
Plotly is an open-source graphing library for creating interactive plots. It can be used offline or online and has interfaces for several programming languages. I use Python, the language I am most familiar with, to create static plots. The code is available on GitHub.
In five data stories, we will try out different Plotly features and illustrate interesting facts about the history of the World Cup and this year’s tournament:
- Historical World Cup participation
- Player age and tournament performance
- Players’ clubs
- Countries sending the most active players
- Men’s and Women’s World Cup prize money in comparison
1: Participation in the Women’s World Cup
Historically, women were banned from playing soccer in many countries. The German Football Association (DFB) decided in 1955 that "the lack of elegance could damage the delicate body and soul of women and would be an attack on morals and society". It was not until 1970 that the ban was lifted. Today, women play soccer around the world, with restrictions remaining in only a few countries.
Our first data story shows the participation of national women’s soccer teams in the World Cup as a bar chart. Since the first edition in 1991, the tournament has been held nine times. Nations from every continent have participated. Some have participated in every World Cup, while others have participated only once. I was surprised to see that North Korea participated four times!
The code for the static bar chart is in the notebook on GitHub. The legend is moved from its default position on the right to the top of the graph to accommodate the many entries.
2: Team age and performance
The national team rosters are publicly available on Wikipedia. We show the players’ ages as one box plot per team, colored by the team’s ranking in the tournament. As far as I can see, there is no clear pattern: teams with all kinds of median ages have similar chances of reaching the knockout phase. Haiti and Zambia stand out with very young squads.
To generate this plot, I used Plotly’s box function and added grid lines on the categorical axis.
3: Where do the players spend their careers?
The rosters also list each player’s usual club, where she plays when she is not on national team duty. We count the players per club and show only the top 30. Well-known European clubs such as Barcelona, Chelsea, PSG, and Arsenal dominate the list. The top Asian clubs are Incheon Hyundai Steel and Wuhan Jianghan University, each of which supplies many players to its country’s national team.
This is a standard line plot, with the legend turned off since there is only one line.
4: How are the players distributed around the world?
We sum up the players according to the country in which their club is located. Countries with fewer than 10 players are grouped as "Others". Many World Cup squad members play for clubs in England and the US, where women’s soccer is well represented. World Cup winner Spain also attracts many top players.
Here I created a stacked bar chart with the categorical axis ordered by each category’s total. The colors come from the qualitative G10 sequence, one of Plotly’s built-in color sequences.
5: Prize money
Finally, let’s take a look at whether it pays to enter and win a World Cup. I was surprised to see that before the 2007 Women’s World Cup, no prize money was paid out at all. In the 2023 Australia/New Zealand tournament, a total of US$100 million was available. Compared with the US$1 billion distributed in Qatar during the 2022 Men’s World Cup, this is still low. If we plot both curves on a logarithmic scale, the women’s prize money appears to be growing faster than the men’s. So at some point in the future, there may be equal pay.
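Why does a logarithmic scale make growth rates comparable? Here is a short sketch that assumes, as an idealization and not as a claim about the actual data, that prize money grows by a constant factor per tournament. The symbols are introduced only for illustration: P_0 is the prize money in a reference edition, P_n the prize money n tournaments later, and r the growth rate per tournament.

```latex
P_n = P_0 \,(1 + r)^n
\quad\Longrightarrow\quad
\log_{10} P_n = \log_{10} P_0 + n \,\log_{10}(1 + r)
```

On a logarithmic y-axis, each series is then a straight line with slope log10(1 + r). A steeper line means faster relative growth, regardless of how large the absolute amounts are, which is why the women’s curve can rise more steeply even though it sits far below the men’s.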
For the prize money graph, I created two subplots, each containing two lines, and set the y-axis scale to logarithmic. Because the logarithm of zero is undefined, the zero payments in the early years of the women’s tournament cannot be shown on this scale.
Reflection
I found Plotly convenient and intuitive to use, and all the standard statistical plots are available. Plot customization works a little differently from matplotlib, so it will take me some time to adjust. The interactive features are great for exploring data, and readers are encouraged to download the full notebook and play around with it. For a blog post, however, I found it difficult to embed interactive plots, so I resorted to static ones.
References
- Notebook on GitHub: https://github.com/crlna16/medium_notebooks/blob/384a0d07e0aa65e35e7086bf8fe67c1d8e5e679e/plotly/fifa23.ipynb
- All historical and squad data is taken from Wikipedia: https://en.wikipedia.org/wiki/2023_FIFA_Women’s_World_Cup_squads
- FIFA Men’s World Cup Prize Money: https://www.totalsportal.com/football/fifa-world-cup-prize-money/
- How to create a Plotly visualization and embed it on websites: https://towardsdatascience.com/how-to-create-a-plotly-visualization-and-embed-it-on-websites-517c1a78568b
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.