Random Walks with Restart Explained

Understand the Random Walk with Restart algorithm and its associated implementation in Python

Feb 1, 2021

4 min read

Image from Unplash

The scope of this article is to explain and focus around the conceptual understanding behind the random walk with restart algorithm. A strong mathematical understanding will not be provided here, but I have left links to resources where those interested can investigate further into the mathematics behind this algorithm. I will also provide a documented Python script at the end of the article associated with the implementation of this algorithm.

Table of Content

Introduction
What is a Random Walk?
Random Walk with Restart
Advantages
Disadvantages
Python Implementation
Resources

Introduction

This algorithm is highly applicable in research as well as the industry. If you’re working with time series data, or able to structure your data as a network then a random walk / random walk with reset can be implemented on it. In finance, it is often argued that the changes in stock prices follow a random walk. This algorithm works well with other stochastic processes like Markov Chains and Monte Carlo Simulations. This simple to understand algorithm was first formally introduced by Karl Pearson, a mathematician who dedicated the majority of his life to building a strong foundation around mathematical statistics.

In our examples below we will be running a random walk on either of the following things. 1) A connected graph G with vertices V and edges E connecting multiple nodes to each other OR 2) A NxN matrix Note : A matrix can be converted into a graph and vice versa

Common problems where this algorithm is applied are :

Recommendation Systems
Community Detection
Clustering
Image Segmentation

What is a Random Walk?

A random walk is a type of stochastic process. The simplest explanation of a random walk would be through walking. Imagine that each step you take is determined probabilistically. This implies that at each index of time, you have moved in a certain direction based on a probabilistic outcome. This algorithm explores the relationship to each step that you would take and its distance from the initial starting point.

👁 Figure 1 : 2D Random walk diagram (Image provided by Author)

Figure 1 : 2D Random walk diagram (Image provided by Author)

If you started at the origin (large black circle in the image above) with an equal probability of moving in any direction at each time step. The image above outlines 3 possible walks after 4 time steps, each with a different ending point from the origin. If this process was simulated a large number of times with a large enough time step, then you can create a probability distribution associated with the outcome of each random walk ending at a specific location.

In this 2-dimensional example, after many iterations of a random walk, the distribution will look like a 3D bell curve (as shown in figure 2). This can be replicated in N dimensions.

👁 Figure 2 : Probability distribution of random walks on 2D plane (Image provided by author)

Figure 2 : Probability distribution of random walks on 2D plane (Image provided by author)

Given this probability distribution, it can be represented as the closeness between a pair of positions.

Random Walk with Restart

Random walk with restart is exactly as a random walk but with one extra component to it. This component can be denoted as the restart probability. Essentially indicating that for every step taken in any direction there is a probability associated with going back to the initial starting position, the origin. In our example above of randomly moving in any direction, imagine now that there is a chance that you would instantly teleport back to the origin after every step based on this restart probability.

This article has an excellent explanation of the mathematics behind the random walk with restart algorithm.

Advantages

Provides a strong representation of similarity between two nodes in a weighted & unweighted graph
Has a variety of applications and can be used be used in joint with other stochastic models like Markov Chains

Disadvantages

Using the same restart probability for every node
There is no systematic way of choosing the restart probability for each application

Implementation

Here I’ve implemented the random walk with restart algorithm on a given network G. Note, that there are many variations of this algorithm and can be substantially improved depending on the problem you’re trying to solve. The resulting output of this script can be used on other ML algorithms and models like kNN and k means.

A very simple, 1 line implementation of this algorithm can be through a library called networkX. They have a built-in function called PageRank which is essentially a random walk with restart. Below I’ve included a simple example from the networkX documentation for PageRank.

import networkx as nx

G = nx.DiGraph(nx.path_graph(4)) # create the network
pr = nx.pagerank(G, alpha=0.9) # apply PageRank

Resources

If you liked this article, then you might also like my other articles where I’ve explained things like the Monte Carlo method and Markov Chains.

Node2Vec Explained

Link Prediction Recommendation Engines with Node2Vec

Mining & Modelling Character Networks – Part II

Bayesian A/B Testing Explained

Markov Chain Explained

Monte Carlo Method Explained

Written By

Vatsal

See all from Vatsal

Machine Learning, Mathematics, Programming, Python, Towards Data Science

Share This Article

Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

Write for TDS

URL: https://towardsdatascience.com/random-walks-with-restart-explained-77c3fe216bca/