PDF document loader. Parses PDF files into DoclingDocument structured models via PdfPig. Provides font-size-based heading detection, bullet and numbered list recognition, and spatial paragraph grouping. Supports encrypted PDFs, metadata extraction, and optional page-number headers.
dotnet add package Mythosia.Documents.Pdf
using Mythosia.Documents.Pdf;
var loader = new PdfDocumentLoader();
IReadOnlyList<DoclingDocument> docs = await loader.LoadAsync("docs/manual.pdf");
string markdown = docs[0].ToMarkdown();
var service = new AnthropicService(apiKey, httpClient)
.WithRag(rag => rag
.AddDocuments(new PdfDocumentLoader(), "docs/manual.pdf")
);
// Or auto-select loader by extension:
var service = new AnthropicService(apiKey, httpClient)
.WithRag(rag => rag.AddDocument("docs/manual.pdf"));
The parser analyses font sizes and spatial layout to produce a structured DoclingDocument:
- Lines starting with bullet markers (•, -, *, etc.) or numbered patterns (1., a), iv.) are emitted as list items.
- If GetWords() returns no results but raw page text exists, the text is preserved as a paragraph.

using Mythosia.Documents.Pdf;
var options = new PdfParserOptions
{
Password = null, // For encrypted PDFs
IncludeMetadata = true, // Extract title, author, page count
IncludePageNumbers = false, // Add page number headers
NormalizeWhitespace = true, // Collapse excessive whitespace (preserves newlines)
};
var loader = new PdfDocumentLoader(options: options);
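To make the NormalizeWhitespace comment concrete, here is a minimal sketch of what "collapse excessive whitespace (preserves newlines)" could look like. It is not the library's implementation: `WhitespaceSketch.Normalize` is a hypothetical helper, and the exact rules (trimming spaces around line breaks, capping blank lines at one) are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Mythosia.Documents.Pdf.
public static class WhitespaceSketch
{
    public static string Normalize(string text)
    {
        // Unify Windows and old Mac line endings to '\n'.
        string result = text.Replace("\r\n", "\n").Replace('\r', '\n');

        // Collapse runs of whitespace other than newlines into one space.
        result = Regex.Replace(result, @"[^\S\n]+", " ");

        // Drop the single space left on either side of a line break.
        result = Regex.Replace(result, @" ?\n ?", "\n");

        // Assumption: keep at most one blank line between paragraphs.
        result = Regex.Replace(result, @"\n{3,}", "\n\n");

        return result.Trim();
    }
}
```

Under these assumed rules, line breaks survive so paragraph boundaries stay visible to later stages, while padding from column alignment or justified text is removed.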
Implement IDocumentParser and pass it to the loader:
var loader = new PdfDocumentLoader(parser: new MyCustomPdfParser());
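A custom parser will typically need its own line-level heuristics. As one illustration, the bullet and numbered list recognition described earlier could be approximated with two regular expressions. This is a sketch under stated assumptions, not the package's actual rules: `LineKind`, `ListMarkerSketch`, and the specific patterns are hypothetical, and the IDocumentParser members are deliberately not shown because their signatures are not given here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical types for illustration only; not part of Mythosia.Documents.Pdf.
public enum LineKind { Paragraph, BulletItem, NumberedItem }

public static class ListMarkerSketch
{
    // A bullet glyph (•, -, *) followed by whitespace and some content.
    private static readonly Regex Bullet =
        new Regex(@"^\s*[\u2022\-\*]\s+\S");

    // "1." / "12)" / "a)" / "iv." followed by whitespace and some content.
    // Assumption: numerals up to 3 digits, one letter, or a short roman numeral.
    private static readonly Regex Numbered =
        new Regex(@"^\s*(\d{1,3}|[a-z]|[ivx]{1,4})[.)]\s+\S",
                  RegexOptions.IgnoreCase);

    public static LineKind Classify(string line)
    {
        if (Bullet.IsMatch(line)) return LineKind.BulletItem;
        if (Numbered.IsMatch(line)) return LineKind.NumberedItem;
        return LineKind.Paragraph;
    }
}
```

Requiring whitespace after the marker keeps lines such as "-5 degrees" or "3.14 is pi" classified as paragraphs. The patterns remain heuristics and can still misclassify unusual lines.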
| Package | Description |
|---|---|
| Mythosia.Documents.Abstractions | Core abstractions (DoclingDocument, IDocumentLoader) |
| Mythosia.Documents.Office | Word / Excel / PowerPoint loaders |
| Mythosia.AI.Rag | RAG pipeline |
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | Computed: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows |
| .NET Core | Computed: netcoreapp3.0, netcoreapp3.1 |
| .NET Standard | Compatible: netstandard2.1 |
| MonoAndroid | Computed: monoandroid |
| MonoMac | Computed: monomac |
| MonoTouch | Computed: monotouch |
| Tizen | Computed: tizen60 |
| Xamarin.iOS | Computed: xamarinios |
| Xamarin.Mac | Computed: xamarinmac |
| Xamarin.TVOS | Computed: xamarintvos |
| Xamarin.WatchOS | Computed: xamarinwatchos |
NuGet packages that depend on Mythosia.Documents.Pdf:
| Package | Description |
|---|---|
| Mythosia.AI.Rag | RAG (Retrieval Augmented Generation) orchestration for Mythosia.AI. Implements Mythosia.AI.Rag.Abstractions v6.x. Includes RagPipeline, text splitters, context builder, OpenAI/vLLM embedding providers, hybrid search (BM25 + Vector + RRF), re-ranking (Cohere, LLM, vLLM), Agentic RAG tool registration with per-call RagQueryOptions and structured search traces, search gate, keyword extraction, weighted-blend final selection, progress reporting, DoclingDocument-to-RagDocument conversion, and per-query VectorFilter passthrough (StoreFilter). Depends on Mythosia.AI.Abstractions (IAIService) instead of the full Mythosia.AI implementation. |
This package is not used by any popular GitHub repositories.
v1.1.1: Recompiled against Mythosia.Documents.Abstractions 1.1.0 (pluggable table serialization).
v1.1.0: Structured extraction — font-size heading detection, bullet/numbered list recognition, spatial paragraph grouping. Direct metadata access (reflection removed). NormalizeWhitespace preserves newlines. Fallback for PDFs with no extractable words.