Developers building LLM retrieval pipelines often have to choose between two approaches: traditional sparse embeddings or dense embeddings. Sparse embeddings work well for keyword matching. They are common in natural language processing (NLP), and they are high-dimensional, with most of their values equal to zero. Each dimension represents a token from the vocabulary of one (or multiple) languages, and a non-zero value shows how relevant that token is to a specific document.
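To make this concrete, here is a minimal sketch that assumes a tiny made-up vocabulary and uses raw term counts as weights (a classic bag-of-words scheme, not BGE-M3). It stores a sparse embedding as a mapping from token ID to weight, so the zero dimensions are never materialized, and it scores two texts with a dot product over only the tokens they share.

```python
from collections import Counter

# Hypothetical toy vocabulary; a real model's vocabulary has tens of thousands of entries.
VOCAB = {"milvus": 0, "is": 1, "a": 2, "vector": 3, "database": 4,
         "for": 5, "similarity": 6, "search": 7, "fast": 8}


def sparse_embed(text: str) -> dict[int, float]:
    """Map text to {token_id: weight}, storing only the non-zero dimensions.

    Weights are raw term counts (bag of words), used purely for illustration.
    """
    counts = Counter(word for word in text.lower().split() if word in VOCAB)
    return {VOCAB[word]: float(n) for word, n in counts.items()}


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product that only visits dimensions that are non-zero in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[token] for token, weight in a.items() if token in b)


query = sparse_embed("vector database")
doc = sparse_embed("Milvus is a vector database for similarity search")
print(query)
print(sparse_dot(query, doc))
print(len(doc), "non-zero of", len(VOCAB), "dimensions")
```

A text that shares no vocabulary terms with the document, such as "fast", scores zero, which is exactly the exact-match behavior that makes sparse embeddings precise for keywords.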
Dense embeddings, on the other hand, have far fewer dimensions, and nearly all of their values are non-zero. As the name suggests, dense embeddings are jam-packed with information. This makes them well suited to semantic search, where the goal is to match the “spirit” of a query’s meaning rather than its exact wording.
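By contrast, dense embeddings are usually compared with cosine similarity. The sketch below uses made-up 4-dimensional vectors standing in for model output (real models use hundreds or thousands of dimensions); the labels are illustrative, and the point is that two texts can score as close even when they share no words.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two dense vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings": every dimension carries some signal, none is zero.
vector_db = [0.8, 0.1, 0.5, 0.3]                 # e.g. "vector database"
similarity_search_engine = [0.7, 0.2, 0.6, 0.2]  # e.g. "similarity search engine"
banana_bread_recipe = [-0.3, 0.9, -0.2, 0.1]     # e.g. "banana bread recipe"

print(round(cosine_similarity(vector_db, similarity_search_engine), 3))
print(round(cosine_similarity(vector_db, banana_bread_recipe), 3))
```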
BGE-M3 is a machine learning model that can create an advanced type of embedding called a “learned sparse embedding.” The nice thing about these learned embeddings is that they combine the best of both worlds: the precision of sparse embeddings and the semantic richness of dense embeddings. Instead of weighting tokens by simple keyword counts, the model uses each token’s surrounding context to learn how important that token is, so the resulting sparse weights reflect meaning as well as exact word choice. Ultimately, this yields an embedding that is rich with relevant information.
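A minimal sketch of a learned sparse embedding, following the lexical-weight formula described in the BGE-M3 paper: each token’s contextualized vector h is projected to one score by a learned weight vector w_lex and passed through ReLU; a repeated token keeps its largest weight, and two texts are scored by summing weight products over shared tokens. The hidden vectors and `w_lex` below are made-up numbers standing in for a trained model, and a real implementation may also add a bias term and typically drops special tokens such as [CLS] and [SEP].

```python
def relu(x: float) -> float:
    return max(0.0, x)


def lexical_weights(tokens: list[str], hidden: list[list[float]],
                    w_lex: list[float]) -> dict[str, float]:
    """One non-negative learned weight per distinct token: w_t = ReLU(w_lex . h_i)."""
    weights: dict[str, float] = {}
    for token, h in zip(tokens, hidden):
        w = relu(sum(a * b for a, b in zip(w_lex, h)))
        if w > 0:  # zero weights are simply not stored, which keeps the vector sparse
            weights[token] = max(w, weights.get(token, 0.0))  # repeated token: keep max
    return weights


def lexical_score(q: dict[str, float], d: dict[str, float]) -> float:
    """Sum of weight products over tokens present in both texts."""
    return sum(w * d[t] for t, w in q.items() if t in d)


# Hypothetical 3-dimensional contextualized vectors and projection (a trained model learns these).
w_lex = [0.5, -0.2, 0.1]
q_tokens = ["vector", "database", "search"]
q_hidden = [[1.0, 0.2, 0.3], [0.8, 0.1, 0.0], [-0.5, 0.4, 0.2]]
d_tokens = ["milvus", "vector", "database", "vector"]
d_hidden = [[0.9, 0.0, 0.4], [0.6, 0.5, 0.1], [1.2, 0.3, 0.2], [0.9, 0.1, 0.0]]

q = lexical_weights(q_tokens, q_hidden, w_lex)
d = lexical_weights(d_tokens, d_hidden, w_lex)
print(q)
print(d)
print(lexical_score(q, d))
```

In this toy setup, "search" gets a weight of zero from its context and drops out of the query’s sparse vector, while the second occurrence of "vector" in the document outweighs the first.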
Bidirectional Encoder Representations from Transformers (or BERT) is more than meets the eye. It is the underlying architecture that enables advanced machine learning models like BGE-M3 and SPLADE.
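BERT approaches text differently than traditional models. Instead of reading a text string sequentially, it examines the whole sequence at once, taking the relationships between all of its tokens into account in both directions. BERT learns to do this through two separate pre-training tasks: masked language modeling, which predicts hidden words from the context on both sides, and next sentence prediction, which judges whether one sentence follows another. The two tasks are trained together, and their combined signal enriches the meaning of the representations the model produces.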
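The masked-language-modeling task can be sketched as a data-preparation step. The original BERT recipe selects about 15% of token positions for prediction; of those, 80% become [MASK], 10% become a random token and 10% stay unchanged. This sketch assumes independent per-position sampling (as many libraries do), keeps words whole for readability, and omits the model that learns to recover the hidden tokens.

```python
import random

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}


def mask_tokens(tokens: list[str], vocab: list[str], rng: random.Random,
                mask_prob: float = 0.15) -> tuple[list[str], dict[int, str]]:
    """Return (corrupted tokens, {position: original token}) for MLM training."""
    corrupted = list(tokens)
    targets: dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        targets[i] = tok  # the model must predict the original token at this position
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets


tokens = ["[CLS]", "milvus", "is", "a", "vector", "database", "built", "for",
          "scalable", "similarity", "search", ".", "[SEP]"]
vocab = [t for t in tokens if t not in SPECIAL]
# A higher mask_prob than 0.15 makes masking visible on such a short sentence.
corrupted, targets = mask_tokens(tokens, vocab, random.Random(0), mask_prob=0.3)
print(corrupted)
print(targets)
```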
When BERT analyzes a query, each encoder layer uses multi-head self-attention: several attention heads run in parallel, and each head computes its own attention pattern independently of the others. This lets different heads pick up different relationships between tokens, free from each other’s influence, and combining their outputs yields a richer, more robust representation.
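A minimal sketch of multi-head scaled dot-product self-attention in plain Python. Each head has its own query, key and value projections and computes its attention weights without looking at the other heads; the per-head outputs are then concatenated. The tiny sizes and random weights are assumptions for illustration, and a real BERT layer also applies an output projection, residual connections, layer normalization and a feed-forward network, all omitted here.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """One head: every token attends to every token, left and right, at once."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x: Matrix, heads: list[tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Run each head independently, then concatenate their outputs per token."""
    outputs = [attention_head(x, *params) for params in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(42)
d_model, n_heads = 4, 2
d_head = d_model // n_heads
x = random_matrix(5, d_model, rng)  # 5 tokens with 4-dimensional input embeddings
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3)) for _ in range(n_heads)]
out = multi_head_attention(x, heads)
print(len(out), len(out[0]))  # 5 tokens, each with n_heads * d_head = 4 values
```

Swapping out the second head’s weights leaves the first head’s half of every output vector unchanged, which is what "independent" means here.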
It’s important to understand how BERT works because BGE-M3 is based on it. The following example demonstrates the process.
Let’s take a basic query and see how BERT creates an embedding from it:
Milvus is a vector database built for scalable similarity search.
The first step is to convert the words in the query string to tokens.
Figure: Converting words in a query string to tokens.
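You’ll notice that the model added [CLS] to the beginning and [SEP] to the end of the token sequence. These special tokens mark the start and end of the input: [SEP] separates sentences, and [CLS] (short for “classification”) is a position whose final output is often used to represent the sequence as a whole.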
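The tokenization step can be sketched with WordPiece-style greedy longest-match-first splitting, the scheme BERT uses: each word is split into the longest pieces found in the vocabulary, and pieces that continue a word carry a ## prefix. The vocabulary below is a tiny made-up one, so the splits will not match a real BERT tokenizer (BGE-M3 itself uses the SentencePiece-based vocabulary of XLM-RoBERTa).

```python
import re

# Hypothetical miniature vocabulary; real BERT vocabularies hold about 30,000 pieces.
VOCAB = {"[CLS]", "[SEP]", "[UNK]", "mil", "##vus", "is", "a", "vector",
         "data", "##base", "built", "for", "scal", "##able", "similar",
         "##ity", "search", "."}


def wordpiece(word: str) -> list[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in VOCAB:
                break
            end -= 1
        else:  # no piece matched: the whole word becomes [UNK]
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text: str) -> list[str]:
    words = re.findall(r"\w+|[^\w\s]", text.lower())  # split off punctuation
    tokens = [piece for word in words for piece in wordpiece(word)]
    return ["[CLS]"] + tokens + ["[SEP]"]


print(tokenize("Milvus is a vector database built for scalable similarity search."))
```

With this toy vocabulary, "Milvus" becomes mil + ##vus and "database" becomes data + ##base, with [CLS] and [SEP] wrapped around the whole sequence.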
Next, it needs to convert the tokens into an embedding.
Figure: Converting tokens into an embedding.
The first part of this process is the embedding layer. Here, a token embedding matrix converts each token into a vector. Next, BERT adds positional embeddings, because the order of the words matters and these embeddings keep the relative positions intact. Finally, segment embeddings record which sentence each token belongs to, which tracks the breaks between sentences. The three vectors are added together element-wise to form each token’s input embedding.
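The three lookups can be sketched as an element-wise sum. This assumes tiny 4-dimensional tables filled with small random numbers and hypothetical token IDs; in a trained model all three tables are learned parameters, and BERT-base uses 768 dimensions.

```python
import random

DIM = 4  # toy size; BERT-base uses 768


def make_table(rows: int, rng: random.Random) -> list[list[float]]:
    """Stand-in for a learned embedding matrix: one DIM-sized row per ID."""
    return [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(rows)]


def input_embeddings(token_ids: list[int], segment_ids: list[int],
                     token_table: list[list[float]], position_table: list[list[float]],
                     segment_table: list[list[float]]) -> list[list[float]]:
    """Sum the token, position and segment embeddings at every position."""
    return [
        [t + p + s for t, p, s in zip(token_table[tok], position_table[pos], segment_table[seg])]
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]


rng = random.Random(7)
token_table = make_table(18, rng)      # hypothetical vocabulary of 18 pieces
position_table = make_table(512, rng)  # BERT supports up to 512 positions
segment_table = make_table(2, rng)     # sentence A = 0, sentence B = 1

token_ids = [0, 3, 4, 5, 6, 1]         # hypothetical IDs, [CLS] ... [SEP]
segment_ids = [0] * len(token_ids)     # a single sentence: every token is in segment 0
embeddings = input_embeddings(token_ids, segment_ids, token_table, position_table, segment_table)
print(len(embeddings), len(embeddings[0]))
```

Because positions are added in, the same token ID produces different input vectors at different positions.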
In the diagram, the embedding output at this stage is shown in a single color because each vector carries information only about its own token, position and segment, not about the rest of the sentence. To build richer, denser representations, these embeddings pass through a stack of encoder layers. Unlike the pre-training tasks described above, the encoder layers do not work independently: each layer takes the previous layer’s output as its input, so the embeddings are revised continually as they move through the stack. At every layer, the other tokens in the sequence provide critical context for refining each token’s representation.
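The sequential nature of the stack can be sketched with a deliberately crude, hypothetical stand-in for an encoder layer: it blends each token’s vector with the average of the whole sequence and applies tanh. It is not BERT’s real layer (BERT-base stacks 12 attention-plus-feed-forward layers), but it shows how each layer consumes the previous layer’s output and how context accumulates.

```python
import math


def toy_encoder_layer(seq: list[list[float]], mix: float = 0.5) -> list[list[float]]:
    """Hypothetical stand-in for one encoder layer: blend each token with the sequence mean."""
    dim = len(seq[0])
    mean = [sum(vec[k] for vec in seq) / len(seq) for k in range(dim)]
    return [[math.tanh((1 - mix) * vec[k] + mix * mean[k]) for k in range(dim)] for vec in seq]


def encode(seq: list[list[float]], num_layers: int) -> list[list[float]]:
    """Apply layers one after another; each layer sees the previous layer's output."""
    for _ in range(num_layers):
        seq = toy_encoder_layer(seq)
    return seq


# Three one-hot "token" vectors: before encoding, no token knows about the others.
tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for n in (0, 1, 12):
    first = encode(tokens, n)[0]
    print(n, [round(v, 3) for v in first])
```

After a single layer, the first token’s vector already has non-zero values in every dimension, picked up from its neighbors.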
Once this process finishes, the output is a sequence of contextualized embeddings, one per token, that is much denser in information than the pre-encoder output. These vectors can be used individually for token-level processing, or pooled into a single, dense representation of the whole input.
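Pooling the per-token output into one vector can be sketched in two common ways, shown here with made-up numbers: take the vector at position 0, where [CLS] sits (the BGE-M3 paper uses the normalized [CLS] state as its dense embedding), or average all token vectors, as many other sentence-embedding models do.

```python
import math


def cls_pooling(hidden: list[list[float]]) -> list[float]:
    """Use the vector at position 0, where the [CLS] token sits."""
    return list(hidden[0])


def mean_pooling(hidden: list[list[float]]) -> list[float]:
    """Average all token vectors, dimension by dimension."""
    return [sum(col) / len(hidden) for col in zip(*hidden)]


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


# Made-up contextualized output for "[CLS] vector database [SEP]" (4 tokens, 3 dimensions).
Q = [[0.2, 0.9, -0.1], [0.6, 0.3, 0.0], [0.4, 0.5, 0.2], [0.0, 0.1, 0.3]]
print(l2_normalize(cls_pooling(Q)))
print(mean_pooling(Q))
```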
BERT got us dense embeddings, but the goal here is to generate learned sparse embeddings. So now we finally get to the BGE-M3 model.
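BGE-M3 is an advanced machine learning model that takes BERT further by enhancing text representation through multi-functionality, multi-linguality and multi-granularity, the three “M”s in its name. Multi-functionality means a single model produces dense, sparse and multi-vector representations; multi-linguality means it supports more than 100 languages; and multi-granularity means it handles inputs ranging from short sentences to long documents of up to 8,192 tokens. In other words, it does more than create dense embeddings: it also generates learned sparse embeddings that provide the best of both worlds, word meaning and precise word choice.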
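Multi-functionality also shows up at scoring time. The BGE-M3 paper scores multi-vector output ColBERT-style (each query token takes its best match among the document tokens, averaged over query tokens) and combines the dense, lexical and multi-vector scores as a weighted sum. The sketch below assumes made-up, already-normalized vectors, made-up dense and lexical scores, and illustrative weights rather than values taken from the model or its reference implementation.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def multi_vector_score(q_vecs: list[list[float]], d_vecs: list[list[float]]) -> float:
    """Late interaction: best-matching document token per query token, averaged."""
    return sum(max(dot(q, d) for d in d_vecs) for q in q_vecs) / len(q_vecs)


def hybrid_score(s_dense: float, s_lex: float, s_mul: float,
                 weights: tuple[float, float, float] = (1.0, 0.3, 1.0)) -> float:
    """Weighted sum of the three relevance scores; the default weights are illustrative."""
    w1, w2, w3 = weights
    return w1 * s_dense + w2 * s_lex + w3 * s_mul


# Made-up, already-normalized per-token vectors (2 query tokens, 3 document tokens).
q_vecs = [[1.0, 0.0], [0.6, 0.8]]
d_vecs = [[0.8, 0.6], [0.0, 1.0], [1.0, 0.0]]
s_mul = multi_vector_score(q_vecs, d_vecs)
print(s_mul)
# s_dense and s_lex are made-up numbers standing in for the other two scores.
print(hybrid_score(s_dense=0.82, s_lex=0.42, s_mul=s_mul))
```

Weighting the three scores lets a search system lean on exact-term matches or on semantic similarity depending on the workload.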
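Let’s start with the same query we used to understand BERT. Running it through the model generates the same kind of sequence of contextualized embeddings that we saw above. We’ll call this output Q, where Q[i] is the contextualized embedding of the i-th token and Q[0] is the embedding of the [CLS] token.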
The BGE-M3 model goes deeper into these embeddings and attempts to understand the significance of each token on a more granular level. There are several aspects of this.
First, instead of treating the [CLS] token representation Q[0] as the only possible representation, BGE-M3 also evaluates the contextualized embedding of each token Q[i] within the sequence.
Second, it applies a linear layer with a learned weight matrix W_{lex} to each Q[i], projecting it to a single score.
Third, it applies a ReLU activation to the product of W_{lex} and Q[i] to compute the term weight w_{t} for each token, that is, w_{t} = ReLU(W_{lex}^T Q[i]). Using ReLU ensures that the term weight is non-negative, contributing to the sparsity of the embedding.
Applying the BGE-M3 model to real-world use cases can help demonstrate the value of this machine learning model. These are areas where organizations stand to benefit from the model’s ability to understand linguistic nuances across large quantities of textual data.
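You can use BGE-M3 to power the retrieval behind chatbots and virtual assistants, significantly enhancing customer support services. Because it matches questions to relevant knowledge-base content by meaning as well as by exact terms, these assistants can handle a wide range of customer queries, including complex, context-dependent questions, and give instant, well-grounded responses. BGE-M3 itself is an embedding model, so the responses are typically written by a separate language model that uses the retrieved content.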
Benefits:
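You can also use BGE-M3 to support content generation for blogs, social media, advertisements and more. It does not write text itself; instead, it retrieves the source material, reference documents and prior examples that a generative model draws on to produce articles, social media posts and even full-length reports in a desired tone, style and context. The same retrieval step helps when summarizing long documents, creating abstracts and generating product descriptions, by surfacing the passages that matter most.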
Benefits:
Developers in the healthcare sector can use BGE-M3 to search clinical documents and patient records, extract relevant information and feed it into systems that generate comprehensive medical reports. Its embeddings can also help identify trends and insights across vast amounts of medical data, for example by grouping similar records, supporting better patient care and research.
Benefits:
With its versatility and advanced natural language processing capabilities, BGE-M3 has applications across industries and sectors, where it can significantly improve operational efficiency and service quality.