![]() |
VOOZH | about |
Word embeddings enable models to interpret text by converting words into numerical vectors. Traditional methods like Word2Vec and GloVe generate fixed embeddings, assigning the same vector to a word regardless of its context.
ELMo (Embeddings from Language Models) addresses this limitation by producing contextualized embeddings that vary based on surrounding words. This approach allows models to better capture the meaning of words in different contexts, improving performance in tasks like sentiment analysis, named entity recognition and question answering.
ELMo (Embeddings from Language Models) generates word vectors by considering the entire sentence. Unlike traditional methods, ELMo derives word meanings from the internal states of a deep bi-directional LSTM network trained as a language model. Its Key characteristics are:
A bidirectional language model (biLM) is trained on a large text corpus. The model uses two separate LSTMs:
For each word, the model captures contextual information from both directions. The hidden states from the forward and backward LSTMs are summed up to form a contextualized embedding. These embeddings vary depending on the wordβs role in the sentence. ELMo also combines outputs from multiple LSTM layers, capturing both syntactic and semantic patterns.
Once trained, the biLM is used to generate embeddings for specific NLP tasks.
This phase allows ELMo to be applied to tasks like named entity recognition, sentiment analysis and text classification where understanding context is crucial.
Consider the word "bank" in two different contexts:
Static embeddings would assign the same vector to both, failing to capture the difference. ELMo generates context-dependent vectors which correctly differentiates between these meanings. It adapts based on sentence-level context, providing more accurate representations.
We can implement ELMo embeddings using TensorFlow and TensorFlow Hub. Here is a step-by-step guide with explanations at each stage.
Tensorflowis used for building and running deep learning models and tensorflow_hub allows us to load pretrained models such as ELMo. You can install it using:
pip install tensorflow tensorflow_hub
We define a function that:
Output:
We can see that our model is working fine.
ELMo significantly improves performance across a variety of NLP tasks:
| Feature | Word2Vec / GloVe | ELMo | BERT / RoBERTa |
|---|---|---|---|
| Contextual | Static word representation | Contextualized based on sentence context | Contextualized based on full input |
| Architecture | Shallow neural networks | Bidirectional LSTM language model | Transformer-based |
| Training Objective | Word co-occurrence prediction | Forward and backward language modeling | Masked language modeling |
| Model Complexity | Low | Moderate | High |
| Fine-tuning | Not designed for fine-tuning | Supports task-specific fine-tuning | Designed for fine-tuning |
ELMo introduced the idea of context in word meanings and still influences modern NLP although it has been surpassed by transformer-based models like BERT and RoBERTa in recent years.