Encamina.Enmarcha.SemanticKernel.Connectors.Document 10.0.5

.NET 10.0

Semantic Kernel - Document Connectors

Document Connectors specializes in reading information from files in various formats and chunking it. The most typical use case, when generating document embeddings, is reading content from a variety of file formats (pdf, docx, pptx, etc.) and splitting it into smaller chunks.

Setup

Nuget package

First, install NuGet. Then, install Encamina.Enmarcha.SemanticKernel.Connectors.Document from the package manager console:

PM> Install-Package Encamina.Enmarcha.SemanticKernel.Connectors.Document

.NET CLI:

First, install .NET CLI. Then, install Encamina.Enmarcha.SemanticKernel.Connectors.Document from the .NET CLI:

dotnet add package Encamina.Enmarcha.SemanticKernel.Connectors.Document

How to use

Starting from a Program.cs or a similar entry point file in your project, add the following code:

// Entry point
var builder = WebApplication.CreateBuilder(new WebApplicationOptions
{
    // ...
});

var services = builder.Services;

// ...

services.AddDefaultDocumentContentExtractor();

This extension method registers the default implementation of the IDocumentContentExtractor interface as a singleton. The default implementation is DefaultDocumentContentExtractor. With this, we can resolve the IDocumentContentExtractor interface and obtain the chunks of a file:

Constructor injection

public class MyClass
{
    private readonly IDocumentContentExtractor documentContentExtractor;

    public MyClass(IDocumentContentExtractor documentContentExtractor)
    {
        this.documentContentExtractor = documentContentExtractor;
    }

    public IEnumerable<string> GetPdfChunks()
    {
        using var file = File.OpenRead("example.pdf");

        var pdfChunks = documentContentExtractor.GetDocumentContent(file, ".pdf");

        return pdfChunks;
    }
}

Service Provider

var serviceProvider = services.BuildServiceProvider();
var documentContentExtractor = serviceProvider.GetRequiredService<IDocumentContentExtractor>();

using var file = File.OpenRead("example.pdf");
var fileChunks = documentContentExtractor.GetDocumentContent(file, ".pdf");

For the above code to be fully functional, some additional services must be configured, specifically the ITextSplitter interface and a Func<string, int> length function.

The previous code, based on the file extension, searches for a suitable IDocumentConnector for the file type, processes the file to extract its text and finally, it uses an ITextSplitter to split the text into chunks.
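The flow just described (pick a reader by file extension, extract the text, split it into chunks) can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: ExtractionPipelineSketch, its Readers table, and the naive fixed-size splitting are assumptions made for illustration, not the library's IDocumentConnector or ITextSplitter implementations.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical illustration only; none of these names belong to the package.
public static class ExtractionPipelineSketch
{
    // Step 1: map a file extension (case-insensitive) to a text reader.
    private static readonly Dictionary<string, Func<Stream, string>> Readers =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".txt"] = ReadUtf8,
            [".md"] = ReadUtf8,
        };

    private static string ReadUtf8(Stream stream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using var reader = new StreamReader(
            stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true);
        return reader.ReadToEnd();
    }

    public static List<string> GetChunks(Stream stream, string fileExtension, int maxChunkLength)
    {
        if (!Readers.TryGetValue(fileExtension, out var read))
        {
            throw new NotSupportedException(fileExtension);
        }

        // Step 2: extract the text.
        var text = read(stream);

        // Step 3: split into chunks (assumption: plain fixed-size slices).
        var chunks = new List<string>();
        for (var i = 0; i < text.Length; i += maxChunkLength)
        {
            chunks.Add(text.Substring(i, Math.Min(maxChunkLength, text.Length - i)));
        }

        return chunks;
    }
}
```

A real text splitter would normally cut on separators such as paragraphs or sentences rather than at fixed offsets; the fixed-size loop only keeps the sketch short.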

Details about the `IDocumentConnector`

The default implementation, DefaultDocumentContentExtractor, uses the following IDocumentConnectors:

- WordDocumentConnector: For .docx files, it extracts the text from the file, adding each paragraph on a new line.
- For .pdf files, it extracts the raw text from the file (with all words separated by spaces) and removes repeated words, typically headers or footers, that appear in at least 25% of the document.
- For .pptx files, it extracts the text from the file, with one line per paragraph found in each slide.
- TxtDocumentConnector: For .txt files, it extracts the raw text from the file using UTF-8 as the character encoding.
- For .md files, it extracts the raw text from the file using UTF-8 as the character encoding.
- For .vtt files, it extracts the text from the subtitles while removing the timestamp marks. It uses UTF-8 as the character encoding.

For other formats, it throws a NotSupportedException.
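To make the .vtt behaviour above more concrete, here is a small sketch of stripping timestamp marks from WebVTT subtitles. VttTextSketch and its regular expression are hypothetical and written only for illustration; the connector shipped in the package may handle more cases (cue identifiers, NOTE blocks, inline tags) and may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not the package's .vtt connector.
public static class VttTextSketch
{
    // Matches cue timing lines such as "00:00:01.000 --> 00:00:04.000" (the hours part is optional).
    private static readonly Regex TimingLine = new(
        @"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2,}:)?\d{2}:\d{2}\.\d{3}",
        RegexOptions.Compiled);

    public static string ExtractText(string vtt)
    {
        var kept = new List<string>();

        foreach (var line in vtt.Replace("\r\n", "\n").Split('\n'))
        {
            var trimmed = line.Trim();

            // Skip blank lines, the "WEBVTT" header, and timing lines.
            if (trimmed.Length == 0
                || trimmed.StartsWith("WEBVTT", StringComparison.Ordinal)
                || TimingLine.IsMatch(trimmed))
            {
                continue;
            }

            kept.Add(trimmed);
        }

        return string.Join("\n", kept);
    }
}
```

This sketch keeps cue identifiers and NOTE blocks if the file contains them; a fuller implementation would filter those as well.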

Other available `IDocumentConnector`s

- For .pptx files, it extracts the text from the file with just one line for each slide found.
- For .pdf files, it extracts the raw text from the file for each page (all words separated by spaces) and adds a line break between the text of each page.
- PdfWithTocDocumentConnector: For .pdf files, it retrieves the Table of Contents and generates, for each Table of Contents item, a text with the section title, a colon (:), and the content text of the section (e.g. Title1: Content of the Title1 section). It adds a line break between each section. The output format of the text is configurable with the TocItemFormat property. Additionally, it removes repeated words, typically headers or footers, that appear in at least 25% of the document.
- For .pdf files, it extracts the text from the file and attempts to preserve the document's formatting, including paragraphs, titles, and other structural elements. Additionally, it removes repeated words, typically headers or footers, that appear in at least 25% of the document, and it excludes non-horizontal text. Note that this process relies on OCR, which is not perfect, so the results may vary depending on the quality of the PDF.
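The header and footer removal mentioned for several PDF connectors can be read in more than one way. The sketch below shows one plausible interpretation: drop any line that recurs on at least 25% of the pages. RepeatedLineFilterSketch, the per-page line input, and the "at least two pages" guard are all assumptions for illustration; the package's actual heuristic may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not the package's implementation.
public static class RepeatedLineFilterSketch
{
    public static IReadOnlyList<string> RemoveRepeatedLines(
        IReadOnlyList<string[]> pages, double threshold = 0.25)
    {
        // Count on how many pages each distinct, non-empty line occurs.
        var pageCounts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var page in pages)
        {
            foreach (var line in page.Select(l => l.Trim()).Where(l => l.Length > 0).Distinct())
            {
                pageCounts[line] = pageCounts.TryGetValue(line, out var n) ? n + 1 : 1;
            }
        }

        // Assumption: a line must appear on at least two pages to count as repeated.
        var minPages = Math.Max(2, (int)Math.Ceiling(pages.Count * threshold));

        // Rebuild each page without the lines that reach the threshold.
        return pages
            .Select(page => string.Join(
                "\n",
                page.Select(l => l.Trim()).Where(l => l.Length > 0 && pageCounts[l] < minPages)))
            .ToList();
    }
}
```

For example, a four-page document whose pages all start with the same company name line would lose that line on every page while keeping the page bodies.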

Use your own `IDocumentConnector`

To use your own IDocumentConnectors, derive from the DocumentContentExtractorBase base class and override the GetDocumentConnector method. This way, you can return your own IDocumentConnector to handle a specific file format based on the file extension.

public class MyCustomDocumentContentExtractor : DocumentContentExtractorBase
{
    public MyCustomDocumentContentExtractor(ITextSplitter textSplitter, Func<string, int> lengthFunction)
        : base(textSplitter, lengthFunction)
    {
    }

    protected override IDocumentConnector GetDocumentConnector(string fileExtension)
    {
        // Normalize to lower case so the extension matches the lower-case patterns below.
        return fileExtension.ToLowerInvariant() switch
        {
            @".rtf" => new MyCustomRtfDocumentConnector(),
            @".pdf" => new PdfWithTocDocumentConnector(),
            @".txt" => new TxtDocumentConnector(Encoding.UTF8),
            _ => throw new NotSupportedException(fileExtension),
        };
    }
}

Don't forget to register it.

// Entry point
var builder = WebApplication.CreateBuilder(new WebApplicationOptions
{
    // ...
});

var services = builder.Services;

// ...

// Now we use our own implementation
// services.AddDefaultDocumentContentExtractor();

services.AddSingleton<IDocumentContentExtractor, MyCustomDocumentContentExtractor>();

With this, you will be able to use the extractor you need for each type of file.

Product	Versions
.NET	net10.0 (compatible); net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows (computed)

net10.0
- BitMiracle.LibTiff.NET (>= 2.4.660)
- CommunityToolkit.Diagnostics (>= 8.4.0)
- Encamina.Enmarcha.AI.Abstractions (>= 10.0.5)
- Encamina.Enmarcha.AI.OpenAI.Abstractions (>= 10.0.5)
- Encamina.Enmarcha.AI.OpenAI.Azure (>= 10.0.5)
- Encamina.Enmarcha.Core (>= 10.0.5)
- Encamina.Enmarcha.DependencyInjection (>= 10.0.5)
- ExcelNumberFormat (>= 1.1.0)
- HtmlAgilityPack (>= 1.12.0)
- Microsoft.Extensions.Http (>= 10.0.1)
- Microsoft.Extensions.Options.ConfigurationExtensions (>= 10.0.1)
- Microsoft.Extensions.Options.DataAnnotations (>= 10.0.1)
- Microsoft.SemanticKernel.Connectors.AzureOpenAI (>= 1.74.0)
- Microsoft.SemanticKernel.Plugins.Document (>= 1.74.0-alpha)
- PdfPig (>= 0.1.10)
- ScratchPad.NPOI.HWPF (>= 2.5.7)
- SixLabors.ImageSharp (>= 3.1.12)
- System.Drawing.Common (>= 10.0.1)
- System.Memory.Data (>= 10.0.1)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
10.0.5	119	6/1/2026
10.0.4	576	4/8/2026
10.0.3	252	4/6/2026
10.0.2	491	12/17/2025
10.0.1	307	12/17/2025
10.0.0	313	12/16/2025
10.0.0-preview-09	431	11/19/2025
10.0.0-preview-08	446	11/18/2025
10.0.0-preview-07	711	10/22/2025
10.0.0-preview-06	321	10/14/2025
10.0.0-preview-05	211	10/8/2025
10.0.0-preview-04	209	10/7/2025
10.0.0-preview-03	343	9/16/2025
10.0.0-preview-02	342	9/16/2025
8.3.0	512	9/10/2025
8.3.0-preview-02	221	9/10/2025
8.3.0-preview-01	223	9/8/2025
8.2.1-preview-08	228	8/18/2025
8.2.1-preview-07	208	8/12/2025

URL: https://www.nuget.org/packages/Encamina.Enmarcha.SemanticKernel.Connectors.Document/
