Efficient Frontier in Python – Detailed Tutorial
Implementing Modern Portfolio Theory in Python
Without Packages
Introduction
Harry Markowitz introduced modern portfolio theory in his 1952 paper titled _Portfolio Selection_. He begins by outlining that portfolio selection is a two-step process; firstly, an investor must consider the future performance of the available assets (in terms of both risk and return) and subsequently, a decision can be made about how to construct the portfolio (i.e. how much money to allocate to each asset).
Markowitz focuses on the portfolio construction aspect and leaves the more speculative task of predicting future performance to the reader. In fact, throughout the paper, returns are assumed to follow a simple Gaussian (Normal) distribution. This assumption underpins the whole of modern portfolio theory, but it has drawn much criticism because stock returns have been shown not to follow a normal distribution.
However, let’s put the critics to one side and write some Python code to plot the Markowitz bullet and better understand the mathematical theory that earned Markowitz his Nobel Prize.
The Data
Our data consists of daily returns for a universe of six stocks, as shown below. The returns range from the beginning of 2000 to late 2018.
From this data, we can immediately get the expected annualised returns, the variances, and the covariance matrix using two lines of code. Note that by using past data like this, we assume that the future will follow past trends. This is a highly contentious assumption in the financial markets, but it will do for now.
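As a rough illustration of that step, here is a minimal numpy-only sketch. The synthetic `daily_returns` array stands in for the real data, and the scaling factor of 252 trading days per year is an assumption of this sketch, not necessarily the one used in the notebook.

```python
import numpy as np

# Assumption: 252 trading days per year (the annualisation factor is not stated above).
TRADING_DAYS = 252

# Synthetic stand-in for the real data: one row per day, one column per stock.
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=(1000, 6))

# Annualised expected return per asset: mean daily return scaled up linearly.
mus = daily_returns.mean(axis=0) * TRADING_DAYS

# Annualised covariance matrix; rowvar=False because the assets are in columns.
cov = np.cov(daily_returns, rowvar=False) * TRADING_DAYS

# The diagonal of the covariance matrix holds each asset's annualised variance.
variances = np.diag(cov)
```

Linear scaling of the mean is the simplest choice; compounding the mean daily return would give slightly different figures.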
I also made a YouTube tutorial, using the same code as in this article, if video is your preferred Medium of learning!
Getting The Portfolios – Simplistic Version
Without using any packages other than numpy, we can quickly and easily create a mean-variance plot using the following code. A mean-variance plot allows us to see the trade-off between risk and return for each portfolio.
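A minimal sketch of this kind of random sampling is shown here. The expected returns and covariance matrix are made-up placeholder values, and the variable names are choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inputs: annualised expected returns and covariance for six assets.
mus = np.array([0.08, 0.10, 0.12, 0.07, 0.15, 0.09])
factors = rng.normal(size=(6, 6))
cov = factors @ factors.T * 0.01  # symmetric positive semi-definite by construction

n_portfolios = 1000
n_assets = 5  # number of assets considered in each portfolio

mean_variance_pairs = []
for _ in range(n_portfolios):
    # Pick a random subset of assets, without repeats.
    assets = rng.choice(len(mus), size=n_assets, replace=False)

    # Draw random positive weights and normalise them to sum to one.
    weights = rng.random(n_assets)
    weights /= weights.sum()

    # Expected return: weighted sum of the chosen assets' expected returns.
    portfolio_return = float(weights @ mus[assets])

    # Variance: w' S w, where S is the covariance sub-matrix of the chosen assets.
    sub_cov = cov[np.ix_(assets, assets)]
    portfolio_variance = float(weights @ sub_cov @ weights)

    mean_variance_pairs.append((portfolio_return, portfolio_variance))
```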
Some things to note about this code:
- We use the following formula for expected return
- And the following formula for portfolio variance (which includes covariances as well as individual variances)
- You can adjust the number of assets to be considered in each portfolio.
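For reference, the two formulas mentioned in the list can be written in standard notation as follows. The symbols are choices made for this sketch: $w_i$ is the weight of asset $i$ (the weights sum to one), $E(R_i)$ is its expected return, and $\sigma_{ij}$ is the covariance between assets $i$ and $j$, with $\sigma_{ii} = \sigma_i^2$.

```latex
E(R_p) = \sum_{i=1}^{n} w_i \, E(R_i) = \mathbf{w}^\top \boldsymbol{\mu}

\sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i \, w_j \, \sigma_{ij} = \mathbf{w}^\top \Sigma \, \mathbf{w}

% Two-asset special case, showing the covariance term explicitly:
\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 \, w_1 w_2 \, \sigma_{12}
```

The two-asset case shows why diversification helps: when $\sigma_{12}$ is small or negative, the portfolio variance falls below the weighted average of the individual variances.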
Using the list of mean-variance pairs we have generated, we can now plot these portfolios 🎉
In this plot, each point represents a portfolio. It is important to note that the majority of the portfolios shown are not on the efficient frontier. We address this in the next section of this post.
I am using the graphing library Plotly here; if you have not tried it, I strongly recommend giving it a chance! Plots are fully interactive in Jupyter Notebook, which is a great help for data exploration. See here for the interactive version of this plot.
More Complex Version
Previously we were randomly sampling portfolio weights; let’s try to be more efficient (pun intended). We can introduce the concept of domination so that we are not unnecessarily sampling ‘bad’ portfolios.
As a (somewhat) rational investor, if presented with two portfolios with equal return, we will choose the one with lower risk, and given two portfolios with equal risk we will choose the one with a higher return. Yes? 🤞 What does this mean in terms of the plot above? The top left is the best place to be, and any portfolio with another portfolio above it and to the left is dominated and would not be chosen by any rational investor.
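The idea of domination can be captured in a small predicate. This sketch assumes each portfolio is a `(expected_return, variance)` tuple and uses the usual Pareto definition (no worse on both measures, strictly better on at least one), which is slightly broader than "above and to the left". The function name `dominates` is hypothetical.

```python
def dominates(a, b):
    """Return True if portfolio a dominates portfolio b.

    Each portfolio is an (expected_return, variance) tuple.
    """
    ret_a, var_a = a
    ret_b, var_b = b
    # a must be at least as good as b on both return and risk...
    no_worse = ret_a >= ret_b and var_a <= var_b
    # ...and strictly better on at least one of them.
    strictly_better = ret_a > ret_b or var_a < var_b
    return no_worse and strictly_better


# Higher return and lower variance: the first portfolio dominates.
higher_and_safer = dominates((0.10, 0.02), (0.08, 0.03))
# Higher return but also higher variance: neither portfolio dominates.
trade_off = dominates((0.10, 0.04), (0.08, 0.03))
```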
So, let’s ensure our code does not include any portfolios dominated by a portfolio we have already sampled.
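One way to sketch this is to reject any candidate that is dominated by a portfolio already kept. The inputs are placeholder values, the ticker names are hypothetical, and for brevity every portfolio here uses all six assets.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical tickers and inputs, for illustration only.
asset_names = ["A", "B", "C", "D", "E", "F"]
mus = np.array([0.08, 0.10, 0.12, 0.07, 0.15, 0.09])
factors = rng.normal(size=(6, 6))
cov = factors @ factors.T * 0.01  # symmetric positive semi-definite by construction

kept = []
for _ in range(2000):
    # Random positive weights, normalised to sum to one.
    weights = rng.random(len(mus))
    weights /= weights.sum()

    ret = float(weights @ mus)
    var = float(weights @ cov @ weights)

    # Skip the candidate if a kept portfolio has at least its return
    # with no more variance.
    if any(p["return"] >= ret and p["variance"] <= var for p in kept):
        continue

    kept.append({
        "return": ret,
        "variance": var,
        "weights": dict(zip(asset_names, weights.round(3).tolist())),
    })
```

Note that the check only looks backwards: a portfolio kept early on can later be dominated by a newer sample and still remain in `kept`.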
In this version, we also store the asset names and weights for each portfolio. Using the interactive feature of Plotly, we can hover over each point on our plot, and it will tell us the assets and weights for each. You can see this statically in the screenshot below but click here to see the interactive plot!
Using this sampling technique, we end up very quickly sampling portfolios along the efficient frontier. Some of the early sampled portfolios are clearly visible below the efficient frontier; however, this can be remedied very easily by only adding portfolios after a certain iteration (similar to the idea of a ‘burn-in period’ in MCMC).
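The burn-in idea could look something like the sketch below. It assumes the dominance check still runs against every accepted portfolio, but only portfolios accepted from iteration `burn_in` onwards are reported. The function name, its parameters, and the input values are all hypothetical.

```python
import numpy as np


def sample_with_burn_in(mus, cov, n_iter=3000, burn_in=500, seed=0):
    """Sample non-dominated portfolios, reporting only those found after burn_in."""
    rng = np.random.default_rng(seed)
    seen = []      # every accepted (return, variance) pair, used for the dominance check
    reported = []  # accepted pairs from iteration burn_in onwards
    for i in range(n_iter):
        weights = rng.random(len(mus))
        weights /= weights.sum()
        ret = float(weights @ mus)
        var = float(weights @ cov @ weights)

        # Reject candidates dominated by anything accepted so far.
        if any(r >= ret and v <= var for r, v in seen):
            continue

        seen.append((ret, var))
        if i >= burn_in:
            reported.append((ret, var))
    return seen, reported


# Placeholder inputs, for illustration only.
mus = np.array([0.08, 0.10, 0.12, 0.07, 0.15, 0.09])
factors = np.random.default_rng(1).normal(size=(6, 6))
cov = factors @ factors.T * 0.01

seen, reported = sample_with_burn_in(mus, cov)
```

Plotting `reported` instead of `seen` hides the early samples that were accepted before the search had moved towards the frontier.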
To those who made it this far, thank you for reading! Please feel free to ask any questions in the comments below. The full code in Jupyter Notebook form can be viewed here. You can also download the data and notebook used from my Github.