In the real world, networks are collections of interconnected nodes. To represent such a network we need a data structure with a similar shape, and the graph is exactly that.
A graph contains vertices (which represent the nodes of the network) connected by edges (which represent the interconnections between nodes).
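As a minimal sketch of this idea, a graph can be stored as an adjacency list that maps each vertex to the set of its neighbors. The helper name `build_adjacency` and the four-node toy edge list are assumptions made for illustration only.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: store the edge in both directions
    return adjacency

# Hypothetical toy network with four nodes (not the article's dataset).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
graph = build_adjacency(edges)

print(sorted(graph[2]))  # neighbors of node 2 -> [0, 1, 3]
```

Sets are used for the neighbor lists so that a duplicated edge in the input does not create a duplicated neighbor.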
DeepWalk is an algorithm for learning latent representations of the vertices in a network. These latent representations encode the social relations between vertices in a continuous vector space. DeepWalk uses randomized path traversal (random walks) to capture the local structure of a network: it treats the walks as sequences, analogous to sentences, and uses them to train a Skip-Gram language model.
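The Skip-Gram model predicts the words surrounding a given word by maximizing the co-occurrence probability among the words that appear within a window, w, in a sentence. For our implementation, we will use a Word2Vec implementation, which models this probability with a softmax over the dot products of word vectors (approximated in practice by hierarchical softmax or negative sampling). Cosine similarity is typically used afterwards to compare the learned embeddings.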
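As a rough illustration of how a co-occurrence probability can be computed from embedding vectors, the sketch below applies a softmax to the dot products between one center vector and a few candidate context vectors. This is the textbook full-softmax formulation of skip-gram; practical Word2Vec implementations approximate it with hierarchical softmax or negative sampling. The function name and all vector values are made up for the example.

```python
import math

def context_probabilities(center_vec, output_vecs):
    """Softmax over dot products: P(context = j | center) for each candidate j."""
    scores = [sum(c * o for c, o in zip(center_vec, out)) for out in output_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-dimensional vectors, chosen only to make the scores easy to read.
center = [1.0, 0.0]
outputs = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]  # dot products: 2, 0, -1

probs = context_probabilities(center, outputs)
print(probs)  # the first candidate gets the highest probability
```

Training adjusts the vectors so that contexts actually observed inside the window receive a higher probability than the rest of the vocabulary.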
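The DeepWalk process operates in a few steps: it generates random walks starting from the vertices of the graph, then treats those walks as sentences to train a Skip-Gram model.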
Given an undirected graph G = (V, E), with n = |V| and m = |E|, a natural random walk is a stochastic process that starts from a given vertex and, at each step, selects one of the current vertex's neighbors uniformly at random to visit.
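The definition above can be sketched in a few lines of Python. The names `random_walk` and `build_corpus`, the parameters `walks_per_node` and `walk_length`, and the toy adjacency dictionary are assumptions for illustration, not the article's original code.

```python
import random

def random_walk(adjacency, start, walk_length, rng):
    """Walk from `start`, choosing a neighbor uniformly at random at each step."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(adjacency[walk[-1]])  # sorted for reproducibility
        if not neighbors:  # isolated vertex: the walk cannot continue
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every vertex; each walk acts as a sentence."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the start vertices in a fresh random order
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus

# Hypothetical toy graph: vertex -> set of neighbors.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
corpus = build_corpus(adjacency, walks_per_node=2, walk_length=5)

print(len(corpus))  # 4 vertices x 2 walks each -> 8
```

Each walk in `corpus` is a list of vertex IDs. Converting the IDs to strings makes the corpus look like a list of tokenized sentences, which is the input format a word-embedding trainer expects.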
Word embedding is a way to map words to feature vectors of a fixed size, which makes words easier to process. In 2013, Google proposed word2vec, a group of related models used to produce word embeddings. In the skip-gram architecture of word2vec, the input is the center word and the predictions are the context words. Consider an array of words W: if W(i) is the input (center word) and the sliding window size is 2, then W(i-2), W(i-1), W(i+1), and W(i+2) are the context words.
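To make the sliding window concrete, the following sketch enumerates the (center, context) training pairs for a sequence. The function name `skipgram_pairs` and the sample sentence are assumptions for illustration; at the edges of the sequence the window is simply truncated.

```python
def skipgram_pairs(sequence, window):
    """Return (center, context) pairs for every position in the sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)                  # clip the window at the start
        hi = min(len(sequence), i + window + 1)  # clip the window at the end
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, sequence[j]))
    return pairs

# Hypothetical sentence; with window=2 the center W(i) pairs with W(i-2)..W(i+2).
words = ["the", "quick", "brown", "fox", "jumps"]
pairs = skipgram_pairs(words, window=2)

print([ctx for center, ctx in pairs if center == "brown"])
# -> ['the', 'quick', 'fox', 'jumps']
```

In DeepWalk the same pairing is applied to random walks, with vertex IDs taking the place of words.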
[Figure: template architecture of the skip-gram model]
Output:
       node_1  node_2
    0       0    1972
    1       0    5111
    2       0     138
    3       0    3053
    4       0    1473
    -------------
    # length of nodes
    5908
    -------------
    # embedding shape
    (5908, 64)