
URL: https://towardsdatascience.com/fifa-womens-world-cup-2023-visualized-with-plotly-a7277edf6278/

FIFA Women’s World Cup 2023 visualized with Plotly

A data scientist's review in five plots

Photo by Your Lifestyle Business on Unsplash

In July and August 2023, Australia and New Zealand hosted the FIFA Women’s World Cup. A total of 32 national teams competed, with Spain taking home the trophy for the first time. Major sporting events always generate a lot of data, and I took this as an opportunity to learn how to use Plotly.

Plotly is an open-source graphing library for creating interactive plots. It can be used offline or online and integrates with various programming languages. I am using Python, as I am most familiar with it, and creating static plots. The code is available on GitHub.

In five data stories, we will try out different Plotly features and illustrate interesting facts about the history of the World Cup and this year's tournament:

  1. Historical World Cup participation
  2. Player age and tournament performance
  3. Players’ clubs
  4. Countries sending the most active players
  5. Men’s and Women’s World Cup prize money in comparison

1: Participation in the Women’s World Cup

Historically, women were banned from playing soccer in many countries. The German Football Association (DFB) decided in 1955 that "the lack of elegance could damage the delicate body and soul of women and would be an attack on morals and society". It was not until 1970 that the ban was lifted. Today, women play soccer around the world, with restrictions remaining in only a few countries.

Our first data story shows the participation of national women’s soccer teams in the World Cup as a bar chart. Since the first edition in 1991, the tournament has been held nine times. Nations from every continent have participated. Some have participated in every World Cup, while others have participated only once. I was surprised to see that North Korea participated four times!

World cup participation per country. Data: Wikipedia. Image: Author.

This is the code to generate the static bar chart. The legend is moved from its default position on the right to the top of the graph to accommodate the many entries.

2: Team age and performance

The national team rosters are public on Wikipedia. We show the age of the national players in box plots. The color code refers to the team's ranking in the tournament. As far as I can see, there is no clear pattern: teams with all kinds of median ages have similar chances of reaching the knockout phase of the tournament. Haiti and Zambia stand out with very young squads.

Age of the players in a nation’s roster. Data: Wikipedia. Image: Author.

To generate this plot, I used the plotly box function and added grid lines on the categorical axis.

3: Where do the players spend their careers?

The rosters also provide information on the players' usual clubs, where they play when they are not part of their national team. We aggregate the number of players per club and show only the top 30. Well-known European clubs such as Barcelona, Chelsea, PSG, and Arsenal dominate the list. The top Asian clubs are Incheon Hyundai Steel and Wuhan Jianghan University, which are associated with a large number of players from their respective countries.

Top 30 soccer clubs represented in the World Cup. Data: Wikipedia. Image: Author.

This is a standard line plot, with the legend turned off since there is only one line.

4: How are the players distributed around the world?

We sum up the players according to the country where their club is located. Countries with fewer than 10 players are grouped as "Others". Many World Cup squad members play for clubs in England and the US, where women's soccer is well represented. World Cup winner Spain also attracts many top players.

World Cup squad members’ clubs. Data: Wikipedia. Image: Author.

Here I created a stacked bar chart with the categorical axis ordered by each category's total. The color scheme is the qualitative G10 scheme, which is one of Plotly's built-in color schemes.

5: Prize money

Finally, let's take a look at whether it pays to enter and win a World Cup tournament. I was surprised to see that until the 2007 Women's World Cup, no prize money was paid out at all. In the 2023 Australia/New Zealand tournament, a total of US$100 million was available to be paid out. Compared to the US$1 billion distributed in Qatar during the 2022 Men's World Cup, this number is still low. If we plot both curves on a logarithmic scale, where a constant growth rate shows up as a straight line, we can see that the women's prize money appears to be growing faster than the men's. So at some point in the future, there may be equal pay.

Prize money for the Women’s and Men’s World Cup. Data: FIFA / Wikipedia. Image: Author.

For this graph, I created two subplots, each containing two lines. The y-axis scale is set to logarithmic. The zero payments the women received in the early years of their tournament cannot be shown on this scale.

Woman waiting for equal pay. Photo by Magnet.me on Unsplash

Reflection

I found Plotly to be both convenient and intuitive to use. All the standard statistical plots are available. It will take me some time to get used to the plot customization, which is a little different from what I am used to in matplotlib. The interactive features are great for exploring data, and readers are encouraged to download the full notebook and play around with it. For the purposes of a blog post, I found it challenging to embed interactive plots, so I resorted to static plots.

Written By

Caroline Arnold
