Turn VS Code into a One-Stop Shop for ML Experiments

How to run and evaluate experiments without leaving your IDE

Nov 21, 2022

Image generated using Midjourney

One of the biggest threats to productivity in recent times is context switching. The term originates from computer science, but applied to humans, it refers to stopping work on one task, performing a different one, and then picking the initial task back up.

During a work day, you might want to check something on Stack Overflow, for example, which normalization technique to choose for your project. While doing so, you start exploring the documentation of scikit-learn to see which approaches are already implemented and how they compare against each other. This might lead you to some interesting comparison articles on Medium or video tutorials on YouTube. And you know how it goes from there…

While working with data, a potential threat appears anytime we need to leave our IDE and move to the browser. It might be to look up a tricky programming problem we are working on or see the results of our experiments using a platform of our choice. Wouldn’t it be nice to not only code our experiments in the IDE but also evaluate and compare them there as well?

In this article, I will show how to utilize DVC’s new extension to run and evaluate experiments directly from VS Code.

DVC extension for VS Code

You have probably heard about DVC (data version control) and you know that it is a Git-like system used for versioning your data and models. Essentially, it allows us to track data with Git without actually storing the data in the Git repository.

What you might not have heard is that DVC also facilitates running ML experiments. In our projects, we might run dozens if not hundreds of experiments, each one slightly different from the previous ones. As the number of experiments grows, DVC can track and version them. It also allows us to compare their most relevant dependencies (data, scripts, etc.), parameters, and metrics. Once we are happy with the outcome of the experiments, we can commit only the relevant ones to Git.

Recently, iterative.ai (the company behind DVC) has also released an open-source VS Code extension that brings a lot of very useful functionalities to one of the most popular IDEs out there. I will mention the two that stand out most for me.

First, the extension offers experiment bookkeeping with an emphasis on reproducibility. We can quickly run new experiments, track their versions (code, models, parameters, data, etc.), and compare their results in a comprehensive table. Then, with a click of a button, we can switch our codebase and artifacts to any of the experiments we have run. You can see a quick preview of the evaluation table in the following GIF.

👁 Image

Second, the extension has extended plotting functionalities. Using DVC, we can track, visualize and evaluate the performance of our experiments using interactive plots. For example, we can show two ROC curves or confusion matrices side by side to inspect which experiment achieved better performance. And to make it even better, the extension also offers live plotting of certain metrics. For example, we can visualize the model’s current loss or validation performance over epochs/estimators while it is still being trained.

In practice, the DVC extension extends the functionalities of some of the tabs, adds a brand new tab used for running and evaluating experiments, and registers new commands in the Command Palette (on macOS, accessed with Cmd + Shift + P). I highly recommend using the DVC: Get Started command to inspect all of the extension’s functionalities.

Lastly, the DVC extension is currently in the beta stage, but we can already use it to improve our workflow.

Hands-on example

Problem definition

In this toy example, we attempt to identify fraudulent credit card transactions. The dataset (available on Kaggle) can be considered highly imbalanced, with only 0.17% of the observations belonging to the positive class.

We will use Random Forest and LightGBM classifiers and evaluate their performance using recall, precision, and the F1 Score.
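To see why accuracy is not on that list, consider a baseline that never flags a transaction as fraud. The sketch below is only an illustration: the class counts (492 frauds among 284,807 transactions) are taken from the dataset's description on Kaggle and are an assumption here, as the article itself only quotes the 0.17% share.

```python
# Class counts as listed in the dataset's Kaggle description (an assumption here).
n_total = 284_807
n_fraud = 492

positive_share = n_fraud / n_total

# Baseline "model": predict the negative class for every single transaction.
true_positives = 0
false_negatives = n_fraud
true_negatives = n_total - n_fraud

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"positive share:    {positive_share:.2%}")  # 0.17%
print(f"baseline accuracy: {accuracy:.2%}")        # 99.83%
print(f"baseline recall:   {recall:.2%}")          # 0.00%
```

A model that detects no fraud at all still scores 99.83% accuracy, which is why recall, precision, and the F1 Score are more informative for this problem.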

The following diagram represents the simplified steps of the experimentation workflow using DVC. First, we set up the experiment by adjusting the scripts (for example, different ways of splitting data, selecting different preprocessing steps, tracking different metrics, etc.) or the considered parameters (this includes choosing the model, its hyperparameters, etc.). Then, we run the project’s pipeline, that is, all the steps that are between the input data and the final output. Lastly, we evaluate the tracked metrics and inspect potential plots. If we are happy with the outcome, we can stop at this step, commit the changes to Git/DVC and focus on putting everything into production. If we want to continue experimenting, we go back to the first step and iterate.

👁 Image

Below, we present the structure of this project’s codebase:

📦 vscode_exp_tracking_with_dvc
┣ 📂 .dvc
┣ 📂 data
┃ ┣ 📂 processed
┃ ┃ ┣ 📜 X_test.csv
┃ ┃ ┣ 📜 X_train.csv
┃ ┃ ┣ 📜 y_test.csv
┃ ┃ ┗ 📜 y_train.csv
┃ ┣ 📂 raw
┃ ┃ ┗ 📜 creditcard.csv
┃ ┣ 📜 .gitignore
┃ ┗ 📜 raw.dvc
┣ 📂 models
┃ ┗ 📜 model.joblib
┣ 📂 src
┃ ┣ 📂 stages
┃ ┃ ┣ 📜 data_split.py
┃ ┃ ┣ 📜 eval.py
┃ ┃ ┗ 📜 train.py
┃ ┗ 📂 utils
┃ ┃ ┣ 📜 __init__.py
┃ ┃ ┗ 📜 params.py
┣ 📜 .dvcignore
┣ 📜 .gitignore
┣ 📜 README.md
┣ 📜 dvc.lock
┣ 📜 dvc.yaml
┣ 📜 metrics.json
┣ 📜 params.yaml
┗ 📜 requirements.txt

The codebase might look a bit daunting, but it actually represents the state of the project at its end. I believe it is beneficial to spend a minute inspecting it, as it shows a potential setup that makes it easy to experiment with different models/parameters.

In order to show the entire experimentation workflow, I will take you through all the steps I had to follow in order to arrive at this final project’s structure. This way, it will be easier for you to reproduce the workflow for your own projects.

To follow along, I would suggest creating a new virtual environment, installing the contents of requirements.txt, and downloading the DVC VS Code extension.

Setting up DVC

After setting up the project’s directory, we need to initialize DVC tracking. Please bear in mind that this step is not required if we are cloning a project that already has DVC initialized in it. For the sake of this tutorial, we assume that we are starting the project from scratch. To do so, we run the following command in the terminal:

dvc init

Running this command creates 3 files: .dvc/.gitignore, .dvc/config, and .dvcignore. We then need to commit those files using Git and push them to our repository.

We also need to indicate a remote for DVC, that is, the place where the objects (data, models, etc.) will actually be stored and versioned. The easiest solution is to store them locally. We can set it up using the following commands:

mkdir -p /tmp/dvc-storage
dvc remote add local /tmp/dvc-storage
dvc push -r local

Naturally, we can also use cloud solutions as the remote (Amazon S3, Azure Blob Storage, Google Cloud Storage, DagsHub, etc.).

Tracking data and setting up a DVC pipeline

As the next step, we download the data from Kaggle and place the CSV file in the data/raw directory. As the name suggests, this is the raw data file. All the transformations we apply to it (for example, preprocessing or splitting) will be executed by separate Python scripts and the changes to the dataset will be tracked using DVC.

Then, we start tracking the data. To do so, we run the following command:

dvc add data/raw

Executing the command creates a small text file called raw.dvc that stores information on how to access the data. As the text file itself is small, we version it with Git instead of the original data file (which is versioned by DVC). After versioning the raw.dvc file with Git, we run one more command to push the data to the local remote:

dvc push -r local

Now our raw data is tracked using DVC. We can proceed to create the DVC pipeline, which stores information about all the steps that are executed within our project, including their respective dependencies and outputs.

There are different ways of creating a DVC pipeline. One of them would be to use a collection of dvc run commands (executed in the terminal) and specify everything as parameters of those commands. I have used that approach for another project, which you can read more about here. Alternatively, we can directly create the dvc.yaml file containing the very same information. For this project, I chose the second approach. Personally, I find it a bit easier to create the YAML file directly instead of writing very long, heavily parametrized commands using the CLI.

We are almost at the experimentation stage! Just a few more things need clarification. Our pipeline consists of 3 steps:

data_split – this step consumes the raw data (located in data/raw directory) and splits it into training and test sets. Those sets are stored as CSV files in the data/processed directory.
train – trains the ML model of our choice and stores it as a .joblib file in the models directory.
eval – evaluates the model’s performance on the test set and outputs the tracked metrics to the metrics.json file.
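To make the last step more concrete, here is a minimal sketch of what an evaluation stage could compute and write out. It is not the project's actual eval.py: the function names and the metric keys are hypothetical, and the metrics are calculated by hand (instead of with scikit-learn) to keep the example dependency-free.

```python
import json
from pathlib import Path


def classification_metrics(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Hypothetical key names; DVC reads whatever keys the JSON file contains.
    return {"precision": precision, "recall": recall, "f1_score": f1}


def write_metrics(metrics, path="metrics.json"):
    """Store the metrics as JSON so that DVC can track them."""
    Path(path).write_text(json.dumps(metrics, indent=2))


# Toy labels: 3 frauds, of which the model catches 2 and raises no false alarms.
toy_metrics = classification_metrics(
    y_true=[0, 0, 1, 1, 1, 0],
    y_pred=[0, 0, 1, 0, 1, 0],
)
print(json.dumps(toy_metrics, indent=2))
```

In a real stage, we would call write_metrics with the metrics computed on the test set, so that the file listed under metrics in dvc.yaml is refreshed on every run.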

Our project’s pipeline looks as follows.

stages:
  data_split:
    cmd: python src/stages/data_split.py --config=params.yaml
    deps:
    - src/stages/data_split.py
    - data/raw
    params:
    - base
    - data_split
    outs:
    - data/processed
  train:
    cmd: python src/stages/train.py --config=params.yaml
    deps:
    - src/stages/train.py
    - data/processed
    params:
    - base
    - train
    outs:
    - models/model.joblib
  eval:
    cmd: python src/stages/eval.py --config=params.yaml
    deps:
    - src/stages/eval.py
    - data/processed
    - models/model.joblib
    params:
    - base
    - data_split
    - train
    metrics:
    - metrics.json:
        cache: false

After creating the file, we run the dvc repro command to reproduce the entire pipeline. As we have not run it before, all 3 steps will be executed in a sequence, while storing and versioning their outputs.

All of the steps are configurable, that is, they have their dedicated sets of parameters that we can tweak while running experiments. For the training stage, those are quite intuitive as they are simply the hyperparameters of the models. For the splitting stage, it can be the fraction of the dataset that we want to use for the test set.

The parameters are stored in the params.yaml file and we use a combination of argparse and python-box libraries to load them into our scripts. For brevity’s sake, we do not cover those parts here, but I highly recommend inspecting the .py scripts in this project to see how everything is set up.
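For readers who do not want to open the repository right away, the following is a rough sketch of that loading pattern, not the project's actual code. To keep it dependency-light, it uses SimpleNamespace from the standard library as a stand-in for python-box's Box, and the helper names (to_namespace, parse_args, load_params) are hypothetical. Reading the file itself assumes PyYAML is installed.

```python
import argparse
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively convert nested dicts into attribute-accessible namespaces."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj


def parse_args(argv=None):
    """Parse the --config argument that each stage receives in dvc.yaml."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="Path to params.yaml")
    return parser.parse_args(argv)


def load_params(config_path):
    """Read the YAML file and expose its contents with dot access."""
    import yaml  # PyYAML, imported lazily so the rest runs without it

    with open(config_path) as f:
        return to_namespace(yaml.safe_load(f))


# Example with a dict shaped like a part of params.yaml.
params = to_namespace(
    {
        "data_split": {"test_size": 0.2},
        "train": {"model_type": "rf", "params": {"class_weight": None}},
    }
)
print(params.data_split.test_size)  # 0.2
print(params.train.model_type)      # rf
```

Inside a stage script, the two pieces would be combined as load_params(parse_args().config), after which settings can be read as params.train.model_type instead of params["train"]["model_type"].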

Our initial params.yaml file contains the following settings:

base:
  project: credit_card_fraud_detection
  data_path: data/raw
  data_file: creditcard.csv
  data_processed_path: data/processed
  exclude_cols:
  - Time
  target_col: Class
  random_state: 42

data_split:
  test_size: 0.2
  processed_path: data/processed

train:
  model_type: rf
  model_dir: models
  model_path: models/model.joblib
  params:
    n_estimators: 100
    max_depth: 10
    class_weight: null

Most of the settings are quite straightforward. However, it is worth mentioning that the null value in the YAML file (used for the class_weight hyperparameter) will be interpreted as None in Python once the parameters are loaded.
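The sketch below shows one way the train settings could be turned into a classifier. It is an assumption about how such a script might look rather than the project's actual train.py, and build_model is a hypothetical helper. The classifiers are imported lazily, so scikit-learn and LightGBM are only needed for the model that is actually requested.

```python
def build_model(model_type, params, random_state):
    """Instantiate the classifier selected by train.model_type.

    `params` is the dict stored under train.params; a YAML null arrives
    here as None, which both classifiers accept for class_weight.
    """
    if model_type == "rf":
        from sklearn.ensemble import RandomForestClassifier

        return RandomForestClassifier(random_state=random_state, **params)
    if model_type == "lgbm":
        from lightgbm import LGBMClassifier

        return LGBMClassifier(random_state=random_state, **params)
    raise ValueError(f"Unsupported model_type: {model_type!r}")


# Settings mirroring the train section of params.yaml after loading.
train_params = {"n_estimators": 100, "max_depth": 10, "class_weight": None}

# With scikit-learn installed, this would return a configured Random Forest:
# model = build_model("rf", train_params, random_state=42)
```

Keeping the choice of the model behind a single parameter is what makes it possible to switch classifiers between experiments without touching the code.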

Experimenting with DVC

Before diving into the experiments, I believe it is worth clarifying that running and evaluating experiments is not a new functionality of the VS Code extension. We could have used the command line to carry out almost all of the tasks available via the extension. What the extension adds is an interactive GUI that we can use to improve our workflow. In the end, both DVC and the VS Code extension read experiments’ data files such as metrics.json or params.yaml.

First, let’s explore the new DVC tab. We can access it using the icon on the navigation bar on the left-hand side of the IDE. We start by listing the tab’s panels:

COLUMNS – in this panel, we can select which columns (representing metrics, parameters, versions of data/models, etc.) we want to see in the experiment evaluation table.
EXPERIMENTS – contains the list of executed experiments.
SORT BY – we can use this panel to configure how the experiments table is sorted.
FILTER BY – we can use this panel to view/add/remove metrics and parameter filters, which are applied to the evaluation table.
PLOTS – contains utilities to fine-tune the plotting dashboard.

For this tutorial, only the first two will be relevant. You can see their preview in the following image.

👁 Image

We have additionally marked two handy buttons, which display the experiments evaluation table (1) and the plots (2). In this article, we only focus on the table, which we present in the following screenshot. By the way, we can also inspect the table in the terminal by running dvc exp show.

👁 Image

First, we explain the experiment column on the left-hand side. I have already run some experiments, so the table is populated. By default, the table contains the workspace (current codebase), the current Git branch (in this case, main), and the previous 2 commits to that branch. Each of those can have arbitrarily many experiments, which have their respective names and IDs. For an experiment to be registered in the table, we simply need to run it using DVC (either from the extension or the command line). We do not need to commit those experiments to Git.

Using the default settings, the table is quite massive as it contains:

experiments’ IDs and their run date,
all the requested metrics (from the metrics.json file),
all the parameters (from the params.yaml file),
the version of the files in the src dir, the data directories and the stored models.

As that is quite a handful and the table is not that easy to read, we use the COLUMNS panel to hide the columns that we know we will not be changing during this tutorial. As a result, the smaller table is much easier to read. You can see it below.

👁 Image

To run a new experiment, we right-click on the row of the table that we want to use as the starting point. In this case, we want to create a new experiment from the current workspace, as the following image illustrates.

👁 Image

We select the first option – Modify and Run. Clicking on it opens a pop-up window, in which we can select which parameters from the params.yaml file we want to modify. For this simple experiment, we just change the classifier from Random Forest to LightGBM.

👁 Image

After selecting the parameter and clicking OK (or pressing Enter), another pop-up follows. There, we have to input the new value of the parameter.

👁 Image

Pressing Enter will accept the new value and run the experiment using the new set of parameters. Some additional things worth mentioning:

In case we selected multiple parameters to modify, we would have to provide the new values in a sequence, each in a separate dialog window. As the name of the parameter we are currently modifying is visible at the top, it is quite intuitive and straightforward.
Adjusting those parameters via the DVC extension modifies the params.yaml file with the new parameter values.
We could have achieved the very same result by running the following command in the terminal: dvc exp run --set-param train.model_type=lgbm.
Running the dvc exp run command without any arguments will result in running the experiment with the default settings, that is, the current contents of the params.yaml file.
The new model is versioned separately and we can see its ID in the last column.
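Conceptually, each override addresses a value in the nested params.yaml structure using a dotted path. The following sketch mimics that idea on a plain dictionary. It is only a mental model and not DVC's implementation, and set_param is a hypothetical helper.

```python
import copy


def set_param(params, dotted_key, value):
    """Return a copy of `params` with the value at `dotted_key` replaced."""
    updated = copy.deepcopy(params)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node[key]
    # Design choice of this sketch: reject unknown keys to catch typos early.
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value
    return updated


params = {"train": {"model_type": "rf", "params": {"n_estimators": 100}}}
new_params = set_param(params, "train.model_type", "lgbm")

print(new_params["train"]["model_type"])  # lgbm
print(params["train"]["model_type"])      # rf (the original is left untouched)
```

The same dotted notation reaches deeper levels as well, for example train.params.n_estimators for a hyperparameter of the model.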

Below, we can see the results of our experiment. They appear in two places: in the workspace row and in the experiment displayed under the main branch (as we were operating from that branch).

👁 Image

Based on the table, we can arrive at a few conclusions, for example:

the LightGBM model seems to be severely underperforming and requires more tuning,
various Random Forest models achieve a precision of 100% on the test set,
the most balanced model, that is, the one with the highest F1 Score, is the Random Forest model with the balanced class weights.

We will not dive deeper into evaluating the performance of the model, as that is not the goal of this article. Instead, we mention one more handy feature of the extension. By right-clicking on any of the past experiments, we can see the following options:

👁 Image

Using those options, we can easily:

Apply all the experiment’s changes to our current workspace.
Create a new branch using the given experiment’s settings.
Run a new experiment by adjusting the parameters of any of the previously executed experiments.
Remove the experiments from the table. The experiments created with DVC are ephemeral, that is, they are only stored in DVC and not on Git (unless we commit those).

That concludes the most basic scenario of using the new VS Code extension to create and evaluate experiments using DVC.

Additional functionalities

So far, we have covered the basic workflow of using the DVC extension for VS Code. However, the extension (and DVC itself) provides many other functionalities that you might find interesting for your projects:

The experiments are smartly cached. Rerunning an experiment with the exact same parameters will simply result in the loading of the cached results.
The extension has quite a lot of nice functionalities around plotting. First, it can store plots that are the outputs of particular stages of the pipeline, for example, a confusion matrix heatmap or the ROC curve. Second, it has live-plotting functionalities which allow us to see, for example, the training and validation loss of our model while it is still being trained.
We can create queues of experiments that will run in sequence. That feature is especially useful if the experiments take quite a bit of time to run and we do not want to constantly monitor if the first experiment is done before we run the next one.
Using simple scripts, we can create loops over the dvc exp run command to search for the best combination of hyperparameters using an exhaustive or random grid search. Alternatively, we can use the CLI for that as well by combining the --set-param and --queue arguments of the dvc exp run command. You can find more information about the grid search implementation in this article.
With the SORT BY and FILTER BY panels we can modify the experiments table to make the evaluation easier and more intuitive.
We are not restricted to using the extension locally. We can launch VS Code on a virtual machine or connect our local IDE to any of the cloud environments (GitHub Codespaces, Google Colab, etc.).
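As an illustration of the grid search idea mentioned above, the following sketch generates one queued dvc exp run command per combination of hyperparameters. The grid values are made up for the example, queued_commands is a hypothetical helper, and the script only prints the commands instead of executing them.

```python
import itertools
import shlex


def queued_commands(grid):
    """Build one `dvc exp run --queue` command per combination in the grid."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        cmd = ["dvc", "exp", "run", "--queue"]
        for name, value in zip(names, values):
            cmd += ["--set-param", f"{name}={value}"]
        commands.append(cmd)
    return commands


# Hypothetical grid over two Random Forest hyperparameters from params.yaml.
grid = {
    "train.params.n_estimators": [100, 200],
    "train.params.max_depth": [5, 10],
}

for cmd in queued_commands(grid):
    print(shlex.join(cmd))
```

To actually queue the experiments, each command list could be passed to subprocess.run. How the queue is then executed depends on the DVC version, so it is worth checking the documentation before automating that part.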

Takeaways

In this article, we presented how to integrate the new DVC VS Code extension into our workflows. Using the extension, we can easily run and evaluate experiments without leaving the IDE. This way, we can increase our productivity by potentially avoiding issues related to context switching.

As always, any constructive feedback is more than welcome. You can reach out to me on Twitter or in the comments. You can find all the code used for this article in this repository.

References

Mark, G., Gudith, D., & Klocke, U. (2008, April). The cost of interrupted work: more speed and stress. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (pp. 107–110).
https://github.com/iterative/vscode-dvc
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
https://dvc.org/doc

All images, unless noted otherwise, are by the author.

Written By

Eryk Lewinson

URL: https://towardsdatascience.com/turn-vs-code-into-a-one-stop-shop-for-ml-experiments-49c97c47db27/