.NET CLI: dotnet add package ElBruno.BM25 --version 0.5.0
Package Manager: NuGet\Install-Package ElBruno.BM25 -Version 0.5.0
PackageReference: <PackageReference Include="ElBruno.BM25" Version="0.5.0" />
Central Package Management, Directory.Packages.props: <PackageVersion Include="ElBruno.BM25" Version="0.5.0" />
Central Package Management, project file: <PackageReference Include="ElBruno.BM25" />
Paket CLI: paket add ElBruno.BM25 --version 0.5.0
Script & Interactive: #r "nuget: ElBruno.BM25, 0.5.0"
File-based apps: #:package ElBruno.BM25@0.5.0
Cake Addin: #addin nuget:?package=ElBruno.BM25&version=0.5.0
Cake Tool: #tool nuget:?package=ElBruno.BM25&version=0.5.0
Production-ready BM25 full-text search library with zero external dependencies. Index millions of documents, search in milliseconds, and integrate seamlessly into RAG pipelines, knowledge bases, and hybrid search systems.
dotnet add package ElBruno.BM25
using ElBruno.BM25;

// 1. Prepare your documents
var documents = new[]
{
    new { Id = 1, Title = "Machine Learning Basics", Content = "Learn ML fundamentals" },
    new { Id = 2, Title = "Deep Learning Guide", Content = "Neural networks and deep learning" },
    new { Id = 3, Title = "NLP Fundamentals", Content = "Natural language processing basics" }
};

// 2. Create an index
var index = new Bm25Index<dynamic>(
    documents,
    doc => doc.Content // Extract searchable text
);

// 3. Search
var results = index.Search("learning", topK: 10);

// 4. Display results
foreach (var (doc, score) in results)
{
    Console.WriteLine($"{doc.Title}: {score:F2}");
}
Output (illustrative; actual scores, and which documents match, depend on the tokenizer and parameters):
Machine Learning Basics: 2.45
Deep Learning Guide: 1.89
NLP Fundamentals: 0.56
var index = new Bm25Index<Article>(
articles,
article => article.Content
);
var results = index.Search("machine learning", topK: 5);
using ElBruno.BM25.Tokenizers;
var index = new Bm25Index<Article>(
articles,
article => article.Content,
tokenizer: new EnglishTokenizer() // Stems: "running" → "run", "authentication" → "authent"
);
var customTokenizer = new CustomTokenizer(text =>
{
// Your domain-specific logic here
return text.ToLower().Split(' ').ToList();
});
var index = new Bm25Index<Article>(articles, a => a.Content, customTokenizer);
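As a rough illustration of what "domain-specific logic" might look like, the sketch below splits on any run of non-letter, non-digit characters and drops a few stop words. The stop-word list is hypothetical, and the delegate type `Func<string, List<string>>` is an assumption inferred from the lambda above rather than a confirmed `CustomTokenizer` signature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stop-word list; a real one would be tuned to the corpus.
var stopWords = new HashSet<string> { "a", "an", "and", "of", "the", "to" };

// Assumed delegate shape: string in, List<string> out (mirrors the lambda above).
Func<string, List<string>> tokenize = text =>
    Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")   // split on non-letter/digit runs
         .Where(token => token.Length > 0 && !stopWords.Contains(token))
         .ToList();

List<string> tokens = tokenize("The basics of C#, .NET, and full-text search!");
Console.WriteLine(string.Join(" | ", tokens));
// prints: basics | c | net | full | text | search
```

Note that this splitting rule reduces "C#" to "c" and ".NET" to "net"; a corpus where such terms matter would need a rule that keeps them intact.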
var tuner = new Bm25Tuner<Article>(index);
var validationQueries = new List<(string query, List<Article> relevant)>
{
("machine learning", relevantArticles1),
("neural networks", relevantArticles2)
};
var optimizedParams = await tuner.TuneAsync(validationQueries, TuningMetric.F1);
index.Parameters = optimizedParams;
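To make the tuning target concrete, here is a minimal sketch of how precision, recall, and F1 can be computed for a single validation query. It is a generic calculation under the textbook definitions, not the tuner's actual implementation; the `Evaluate` helper and the document ids are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Precision, recall and F1 for one query.
// retrieved = ids returned by the search, relevant = ids judged relevant.
static (double Precision, double Recall, double F1) Evaluate(
    IReadOnlyCollection<int> retrieved, IReadOnlyCollection<int> relevant)
{
    int hits = retrieved.Intersect(relevant).Count();
    double precision = retrieved.Count == 0 ? 0.0 : (double)hits / retrieved.Count;
    double recall = relevant.Count == 0 ? 0.0 : (double)hits / relevant.Count;
    double f1 = precision + recall == 0.0
        ? 0.0
        : 2.0 * precision * recall / (precision + recall);
    return (precision, recall, f1);
}

// Hypothetical document ids for a single validation query.
var retrievedIds = new[] { 1, 2, 3, 4 };
var relevantIds = new[] { 2, 4, 7 };

var scores = Evaluate(retrievedIds, relevantIds);
Console.WriteLine($"P={scores.Precision:F3} R={scores.Recall:F3} F1={scores.F1:F3}");
```

Two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3), so F1 is 4/7. A tuner would typically average such a metric over all validation queries for each candidate (k1, b) pair.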
var query = "machine learning";
var doc = articles[0];
// Simple explanation as dictionary
var explanation = index.ExplainScore(doc, query);
Console.WriteLine($"Total Score: {explanation["total_score"]}");
// Detailed breakdown
var detailed = index.ExplainScoreDetailed(doc, query);
Console.WriteLine($"Matched Terms: {detailed.MatchedTermCount}");
foreach (var term in detailed.TermScores)
{
Console.WriteLine($" {term.Key}: IDF={detailed.TermIDFs[term.Key]:F2}, Score={term.Value:F2}");
}
var queries = new[] { "machine learning", "neural networks", "NLP" };
var batchResults = await index.SearchBatch(queries, topK: 5);
foreach (var (query, results) in batchResults)
{
Console.WriteLine($"\nQuery: {query}");
foreach (var (doc, score) in results)
{
Console.WriteLine($" {doc.Title}: {score:F2}");
}
}
// Save index to disk
index.SaveIndex("my_index.json");
// Load it back later
var restoredIndex = Bm25Index<Article>.LoadIndex("my_index.json");
var results = restoredIndex.Search("machine learning");
var index = new Bm25Index<Article>(articles, a => a.Content);
// Add new document
var newArticle = new Article { Title = "New ML Article", Content = "..." };
index.AddDocument(newArticle);
// Remove document
index.RemoveDocument(oldArticle);
// Reindex entire collection
index.Reindex(updatedArticles);
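Conceptually, adding or removing a document means editing an inverted index and the document-length statistics that BM25 relies on. The sketch below shows that idea with plain dictionaries. It is a generic illustration, not the library's internal data structure, and the `Add` and `Remove` helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// term -> (document id -> term frequency): a generic inverted index.
var postings = new Dictionary<string, Dictionary<int, int>>();
var docLengths = new Dictionary<int, int>();

void Add(int docId, string text)
{
    var tokens = text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    docLengths[docId] = tokens.Length;
    foreach (var token in tokens)
    {
        if (!postings.TryGetValue(token, out var docs))
            postings[token] = docs = new Dictionary<int, int>();
        docs[docId] = docs.GetValueOrDefault(docId) + 1;
    }
}

void Remove(int docId)
{
    docLengths.Remove(docId);
    // Copy the keys first so terms can be dropped while iterating.
    foreach (var term in postings.Keys.ToList())
    {
        postings[term].Remove(docId);
        if (postings[term].Count == 0) postings.Remove(term);
    }
}

Add(1, "deep learning guide");
Add(2, "machine learning basics");
Remove(1);

// The average length feeds the |d|/avgdl term, so every update shifts it.
double avgLength = docLengths.Values.Average();
Console.WriteLine($"terms: {postings.Count}, avg length: {avgLength}");
```

After the removal only the three terms of document 2 remain, and both the document frequencies and the average length reflect the smaller collection.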
| Operation | Dataset | Time | Notes |
|---|---|---|---|
| Index | 1M documents | <5s | Tokenization + inverted index |
| Search | 1M documents | <50ms | Single query, topK=10 |
| Batch Search | 1M documents, 100 queries | <5s | 50ms per query average |
| Save to Disk | 1M documents | <1s | JSON format, ~500MB |
| Load from Disk | 1M documents | <2s | Cold start |
Memory Usage:
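ElBruno.BM25 implements BM25 (Best Matching 25), a proven ranking function in information retrieval. The package describes its variant as BM25F (BM25 with Fields); the formula shown here is the standard single-field form, without per-field weights.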
Score Formula:
BM25(q,d) = Σ IDF(q_i) * ((k1 + 1) * TF(q_i,d)) / (TF(q_i,d) + k1 * (1 - b + b * |d| / avgdl))
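The effect of k1 and b is easier to see with numbers. The following sketch evaluates one term's contribution to the score formula above for a few hypothetical inputs. The IDF variant used here (a common, always non-negative form) is an assumption, since the formula above does not spell out how IDF is computed; `Idf` and `TermScore` are illustrative helpers, not library functions.

```csharp
using System;

// Assumed IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
static double Idf(int totalDocs, int docsWithTerm) =>
    Math.Log(1.0 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

// One term's contribution to BM25(q,d), following the score formula above.
static double TermScore(double tf, double docLength, double avgDocLength,
                        double idf, double k1 = 1.5, double b = 0.75)
{
    double lengthNorm = 1.0 - b + b * docLength / avgDocLength;
    return idf * ((k1 + 1.0) * tf) / (tf + k1 * lengthNorm);
}

// Hypothetical corpus statistics, chosen only for illustration.
double termIdf = Idf(totalDocs: 1000, docsWithTerm: 50);

double once = TermScore(tf: 1, docLength: 100, avgDocLength: 100, idf: termIdf);
double tenTimes = TermScore(tf: 10, docLength: 100, avgDocLength: 100, idf: termIdf);
double longDoc = TermScore(tf: 1, docLength: 400, avgDocLength: 100, idf: termIdf);

Console.WriteLine($"tf=1,  average length: {once:F3}");
Console.WriteLine($"tf=10, average length: {tenTimes:F3}");
Console.WriteLine($"tf=1,  4x average length: {longDoc:F3}");
```

With an average-length document and a single occurrence, the fraction equals 1 and the term contributes exactly its IDF. Ten occurrences raise the score but stay below the (k1 + 1) * IDF ceiling, which is the saturation that k1 controls. The same single occurrence in a document four times the average length scores lower, which is the length normalization that b controls.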
Parameters:
Preset Parameters:
- Bm25Parameters.Default: Balanced (k1=1.5, b=0.75)
- Bm25Parameters.Aggressive: For large corpora (k1=2.0, b=1.0)
- Bm25Parameters.Conservative: For small corpora (k1=1.0, b=0.5)

Constructor:
new Bm25Index<T>(
IEnumerable<T> documents,
Func<T, string> contentSelector,
ITokenizer? tokenizer = null, // Defaults to SimpleTokenizer
Bm25Parameters? parameters = null, // Defaults to Default
bool caseInsensitive = true
)
Key Methods:
| Method | Description |
|---|---|
| `Search(query, topK=10, threshold=0)` | Search and return top results |
| `SearchBatch(queries, topK=10)` | Async batch search over multiple queries |
| `AddDocument(doc)` | Add a single document to the index |
| `RemoveDocument(doc)` | Remove a document from the index |
| `Reindex(documents)` | Replace the entire index |
| `SaveIndex(path)` | Persist to disk (JSON) |
| `LoadIndex(path)` | Load from disk (static) |
| `ExplainScore(doc, query)` | Get score breakdown dictionary |
| `ExplainScoreDetailed(doc, query)` | Get detailed ScoreExplanation object |
| `GetTerms()` | List all indexed terms |
| `GetTermDocuments(term)` | Find all documents containing a term |
| `GetDocumentLength(doc)` | Get token count for a document |
| `GetStatistics()` | Index metadata and stats |
Properties:
- DocumentCount: Number of indexed documents
- TermCount: Number of unique terms
- Parameters: Get/set BM25 parameters

ITokenizer Interface:
public interface ITokenizer
{
List<string> Tokenize(string text); // Convert text to terms
string Normalize(string term); // Normalize single term
string Name { get; } // Tokenizer name
}
Built-in Tokenizers:
- SimpleTokenizer: Whitespace split, lowercase, no stemming
- EnglishTokenizer: Includes Porter stemming for English
- CustomTokenizer: User-defined function

var tuner = new Bm25Tuner<T>(index);
var optimized = await tuner.TuneAsync(
validationQueries, // (query, relevantDocs) tuples
metric: TuningMetric.F1, // Metric to optimize
ct: cancellationToken
);
TuningMetric Options:
- Precision: % of retrieved docs that are relevant
- Recall: % of relevant docs that are retrieved
- F1: Harmonic mean of precision and recall (recommended for balanced tuning)
- NDCG: Ranking quality

// 1. Index knowledge base
var kb = LoadKnowledgeBase();
var index = new Bm25Index<KbArticle>(kb, a => a.Content, new EnglishTokenizer());
// 2. Retrieve context for LLM
var query = userQuestion;
var context = index.Search(query, topK: 5)
.Select(r => r.document.Content)
.ToList();
// 3. Pass to LLM
var llmPrompt = $"Context:\n{string.Join("\n", context)}\n\nQuestion: {query}";
var response = await llm.GenerateAsync(llmPrompt);
// BM25 retrieval
var bm25Results = index.Search(query, topK: 20);
// Vector search (your embedding model)
var vectorResults = await vectorStore.SearchAsync(embedding, topK: 20);
// Hybrid ranking (combine scores). BM25 and vector scores are on different
// scales, so normalize each result set before summing.
var hybrid = bm25Results
    .Union(vectorResults)
    .GroupBy(r => r.id)
    .Select(g => new {
        doc = g.Key,
        score = g.Sum(x => x.score) // Combine scores
    })
    .OrderByDescending(x => x.score)
    .Take(10);
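One common way to sidestep the score-scale mismatch between BM25 and vector similarity is to fuse by rank instead of by raw score. The sketch below uses reciprocal rank fusion (RRF), a different technique from the score sum above, offered here as an alternative rather than as the package's method. The `FuseByRank` helper, the document ids, and the constant k = 60 (a conventional default) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// with rank starting at 1 for the best hit.
static List<(string Id, double Score)> FuseByRank(
    IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
{
    var fused = new Dictionary<string, double>();
    foreach (var list in rankedLists)
    {
        for (int i = 0; i < list.Count; i++)
        {
            fused.TryGetValue(list[i], out double current); // 0.0 if not seen yet
            fused[list[i]] = current + 1.0 / (k + i + 1);
        }
    }
    return fused.OrderByDescending(pair => pair.Value)
                .Select(pair => (Id: pair.Key, Score: pair.Value))
                .ToList();
}

// Hypothetical document ids, best first, from each retriever.
var bm25Ranking = new[] { "doc-a", "doc-b", "doc-c" };
var vectorRanking = new[] { "doc-b", "doc-d", "doc-a" };

var fusedRanking = FuseByRank(new IReadOnlyList<string>[] { bm25Ranking, vectorRanking });
foreach (var (id, score) in fusedRanking)
    Console.WriteLine($"{id}: {score:F4}");
```

Here doc-b ranks first because it sits near the top of both lists, followed by doc-a, doc-d, and doc-c. Because only ranks are used, no score normalization is needed.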
public class KnowledgeBaseSearch
{
    private readonly Bm25Index<Article> _index;

    public KnowledgeBaseSearch(List<Article> articles)
    {
        _index = new Bm25Index<Article>(
            articles,
            a => $"{a.Title} {a.Content}",
            new EnglishTokenizer()
        );
    }

    public List<Article> Find(string query, int limit = 5)
    {
        return _index.Search(query, topK: limit)
            .Select(r => r.document)
            .ToList();
    }
}
Empty search results?
- Use ExplainScore() to debug scoring

Slow search on large indexes?
- … topK threshold (retrieve more before filtering)
- Use SearchBatch() for multiple queries
- … Bm25Tuner

Low relevance scores?
- (… Bm25Parameters.Aggressive for large corpora)
- Use EnglishTokenizer instead of SimpleTokenizer

Out of memory?
cd tests/ElBruno.BM25.Tests
dotnet test
Test Coverage:
MIT License. See the project's license file for details.
Made with ❤️ for .NET developers who need fast, lightweight full-text search.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net9.0 and net10.0 were computed, as were the android, browser, ios, maccatalyst, macos, tvos, and windows variants of net8.0, net9.0, and net10.0. |
Showing the top NuGet package that depends on ElBruno.BM25:
| Package | Downloads |
|---|---|
| MemPalace.Search: Semantic and hybrid search for MemPalace.NET with vector similarity, keyword boosting, and optional reranking. | |
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.0 | 452 | 4/29/2026 |