Introducing my book: Python for Finance Cookbook
My short story of going from Medium articles to a book contract
In this article, I wanted to share my story of how writing articles on Towards Data Science started quite an adventure, which ended up as a published book. I will also describe what I learned on the way and what tools helped me in the process. Let’s start!
How it began
Sometime in February 2019, an Acquisitions Editor contacted me to ask whether I would be interested in authoring a book. I had already written a few articles on Medium, but I was a bit overwhelmed by the scale of the project. However, it sounded like an exciting idea and a chance to scale up my little writing hobby. Soon, I agreed to take on the challenge.
The proposal
The first step of the entire process was the proposal. The publisher already had an idea about the general topic of the book (the main theme, not the content itself), what type of book it would be, and whom it should target.
This is a good place to mention the format of the book. As already indicated in the title, it is a cookbook and focuses on the hands-on approach. Each chapter contains a selection of recipes, and each of them is a standalone task that an analyst can encounter while working in the financial domain.
While writing the proposal, I had the freedom to choose which recipes I would like to include in the chapters. The proposal had to be quite detailed: for each recipe, I had to include the title, a short summary, and the expected page count. This proved to be quite tricky, but I will get back to it later.
After a few iterations and incorporating some constructive feedback from the editor, the proposal was ready. I signed the contract and started writing.
The process
I received some introductory documents from the publisher. They included some general information about the style the book should follow, but also some tips on the writing process itself. According to them, by writing ~1–2 pages a day, it should not be a problem to meet the deadlines and complete the book in time.
At least for me, most of the time it did not really look like that. Writing a book can be similar to having a second job, at least time-wise. It can be challenging to combine the two and still have enough time for personal life.
To make it easier, I tried to do as much work during the commute as I could. As I have to take a 40-minute train ride twice a day, it only makes sense to use this time as efficiently as possible. Previously, I had used that time to read books or work on articles, so I was already used to working in such conditions.
Even when I managed to work on the book twice a day during the commute, I rarely produced enough content to fill the 1–2 page quota, so most of the time I also worked in the evenings. It often happened, though, that I was simply too tired after the day at work and the long commute to do anything constructive in the evening. As a result, I had a few delays with respect to the initial schedule. Thankfully, the editors were supportive and encouraging throughout the process.
I was writing the book from March 2019 to early January 2020. For each chapter, I followed a similar workflow. First I gathered resources for the chapter I was currently working on. Then, I started by stating the problem at hand and writing the code that solves it. After the coding part was complete, I would write the introduction to the recipe and describe in detail how the code worked (step by step). In most of the recipes, I also wrote a There’s more section, which contained some more advanced topics concerning that recipe or focused on another possible approach to the task.
Most of the time, I was writing the content of the recipes in the same Jupyter Notebooks as the code. This seemed most convenient, as I would transfer everything (text + code) to the publisher’s Markdown-based text editor all at once. However, there were also problems connected to it. More on that in the section below.
Another thing worth mentioning is that while writing the book, the outline evolved. Some of the recipes I planned in the proposal were merged together, some were replaced. I think it is really difficult to write a proposal that will not change when the writing actually starts.
What I learned…
In this short section, I will describe some key takeaways I learned while writing the book and what I would have done differently. I will mostly write about the process itself, not coding in particular.
- Giving a realistic page count while writing the proposal was much harder than expected. I gravely underestimated the number of pages required for each chapter. First of all, I did not account for the size of the images, as I had no clear idea of how many images would be in each chapter. Second, I underestimated the space occupied by the code. Given that the code was broken into numbered steps and each step had its own short description, this significantly impacted the page count.
- More frequent reality checks with the publisher’s editor. As I mentioned before, I was mostly writing the content of the chapters in the Jupyter Notebook, where I also presented the solution to the given task. I only moved the chapter to the publisher’s editor when I thought I was done with it. The problem with this approach is that I had no real indicator of page count in Jupyter. So it often happened that I was well above the target when I moved all the content to the editor. Also, when moving the code to the editor, it often did not fit the page width nicely and required further refactoring – a back and forth between the Notebook and the editor.
- Nine months in data science is really a lot, as the industry is constantly developing. I experienced this in a few ways. First, by the time I finished the book, some of the Python libraries had been renamed, which required correcting the code in multiple chapters. Second, some functionalities (around downloading data) were deprecated and, as a result, I removed two full recipes. Third, while writing the later chapters I stumbled upon some new and cool things I would have liked to use in the earlier chapters. However, there was usually no time (and no pages) left to incorporate them into the already written parts. That last point resulted in multiple article ideas, which I hope to publish soon 🙂
- The code required many rounds of refactoring. The reasons for that included correcting the formatting (so it matched the page width), accounting for changes in libraries, etc. Also, as I was improving as a developer during the writing process, I often went back to the earlier chapters to refactor the code so that it matched the standards of the later ones. One thing to remember is that modifying the code did not only happen in the Jupyter Notebook but also in the publisher’s editor where the book was stored, so it was crucial to correctly update both sources.
- One thing I had to learn to work with was delayed gratification. In the case of Medium articles, as soon as the article is finished, it can be published within seconds and I can see its reception. I know, this is vanity speaking. But with the book, it took close to a year to finish and required some determination. In the end, the satisfaction of delivering the final version of the book and it being processed for printing was great and worth the wait. Additionally, during the writing of the book, I managed to find some spare time to prepare a few articles. As I mentioned before, I had quite a few ideas. For example, I was investigating libraries used for backtesting trading strategies in Python. While I used one library in the book (and stuck with it for consistency), I did quite some research on the other one and it would be a shame not to share it as a series of articles (started by this one).
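The page-width problem mentioned in the list above could, in principle, be caught before moving a chapter to the publisher's editor. Below is a minimal sketch of such a check, not the workflow actually used for the book. It relies only on the fact that a notebook file is JSON with a list of cells. The function name `find_long_code_lines` and the 80-character limit are assumptions made for illustration; the real limit depends on the publisher's layout.

```python
import json


def find_long_code_lines(notebook_json, max_width=80):
    """Return (cell_index, line_number, length) for every code line wider than max_width.

    max_width=80 is an assumed limit, not the publisher's actual page width.
    """
    notebook = json.loads(notebook_json)
    too_long = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        # Only code cells matter for the page-width check
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        for line_number, line in enumerate(text.splitlines(), start=1):
            if len(line) > max_width:
                too_long.append((cell_index, line_number, len(line)))
    return too_long


# Tiny in-memory notebook used only for demonstration
demo_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Heading " + "x" * 100]},
        {"cell_type": "code", "source": ["import math\n", "y = " + "1 + " * 30 + "1\n"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

print(find_long_code_lines(demo_notebook))
```

For a real chapter, the content of the `.ipynb` file would be read from disk and passed in as a string. Running a check like this after each recipe, rather than once per chapter, would shorten the back and forth between the Notebook and the editor.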
Tools that made my life easier
- Trello – it allows us to create Kanban-style lists of tasks. It is a great tool for organizing projects, both for personal use and for entire teams. For the book I created a board and divided it into sections such as backlog, doing, revision, done, ideas. This way, whenever I came up with a new idea or wanted to keep track of something I still had to refactor, I could easily use Trello for organizing the tasks. I also use it for organizing my blog writing 🙂
- GitHub – arguably the most popular platform for version control. You can find the book’s GitHub repository here.
- Grammarly – a great tool for making sure your writing is relatively "correct". It saves a lot of time by showing potential mistakes. However, a thorough read is also important as Grammarly can miss some complex edge cases. Grammarly can also be installed as a Chrome extension and works well with multiple websites (Medium included!).
- Hemingway App – this web app is great for analyzing the text and showing its complexity. It offers additional help by pointing out which sentences are too long or complex and could use some tuning (for example, breaking them down into two separate sentences). In general, the suggestions from the Hemingway App make the text easier to read and understand.
- draw.io – a really handy tool for creating publication-quality diagrams. It works as a browser drag-and-drop tool, offering a lot of icons and images. On top of that, it allows for saving your projects, loading them again for potential corrections and exporting to multiple formats (including PDF).
- Jupyter Notebook / Visual Studio Code – most of the time, I used Jupyter Notebooks (with nbextensions such as the table of contents, spellchecker, etc.) for both coding and writing. Whenever I needed to prepare a separate .py script, I used VS Code, which is my go-to text editor.
The book itself
After all the information about the process itself, it is time to finally present the book. As mentioned before, the title is Python for Finance Cookbook and the book contains over 50 hands-on recipes.
The book targets people who have some working knowledge of Python and also some knowledge of quantitative finance/machine learning/deep learning. In the recipes, I provide a high-level overview of the theory behind the techniques used and often refer to papers/books for an in-depth read. But generally, I focus on explaining the implementation in Python rather than the underlying theory.
The book is divided into 10 chapters:
Chapter 1, Financial Data and Preprocessing, explores how financial data is different from other types of data commonly used in machine learning tasks. I show how to download data from different sources and preprocess it for further analysis.
Chapter 2, Technical Analysis in Python, demonstrates the basics of technical analysis as well as how to quickly create elegant dashboards in Python. The reader will be able to draw some insights into patterns emerging from a selection of the most commonly used indicators (such as MACD and RSI).
Chapter 3, Time Series Modeling, introduces the basics of time series modeling (including time series decomposition and statistical stationarity). Then, I look at two of the most widely used approaches to time series modeling – exponential smoothing methods and ARIMA class models. Lastly, I present a novel approach to modeling a time series using the additive model from Facebook’s Prophet library.
Chapter 4, Multi-Factor Models, shows how to estimate various factor models in Python. I start with the simplest one-factor model and then explain how to estimate more advanced three-, four-, and five-factor models.
Chapter 5, Modeling Volatility with GARCH Class Models, introduces the reader to the concept of volatility forecasting using (G)ARCH class models, how to choose the best-fitting model, and how to interpret the results.
Chapter 6, Monte Carlo Simulations in Finance, introduces the reader to the concept of Monte Carlo simulations and how to use them for simulating stock prices, the valuation of European/American options, and for calculating the VaR.
Chapter 7, Asset Allocation in Python, introduces the concept of Modern Portfolio Theory and shows how to obtain the Efficient Frontier in Python. Then, I look at how to identify specific portfolios, such as the minimum variance or the maximum Sharpe ratio. I also show how to evaluate the performance of such portfolios.
Chapter 8, Identifying Credit Default with Machine Learning, presents a case of using machine learning for predicting credit default. The chapter presents the complete pipeline from loading the data, through various preprocessing stages to estimating the classifier.
Chapter 9, Advanced Machine Learning Models in Finance, introduces a selection of advanced classifiers (including stacking multiple models). Additionally, I look at how to deal with class imbalance, use Bayesian optimization for hyperparameter tuning, and retrieve feature importance from a model.
Chapter 10, Deep Learning in Finance, demonstrates how to use deep learning techniques for working with time series and tabular data. The networks are trained using PyTorch (with possible GPU acceleration).
Conclusions
Summing up, I am very happy to have undertaken writing the book and I feel proud to have completed it. It required a significant amount of effort and self-determination. I had moments of doubt during the process, but the words of support from my close ones helped me get through with it.
I hope that the book will be helpful for people wanting to learn how to use Python for solving practical tasks in the financial domain. If you are interested in purchasing the book, you can get it on Amazon or Packt’s website.
If you have any questions regarding the process of writing the book or some feedback on the book itself, I would be happy to read it in the comments. You can also reach out to me on Twitter.
Until next time!