| Tool | Command or reference |
|---|---|
| .NET CLI | `dotnet add package Mythosia.Documents.Abstractions --version 1.2.0` |
| Package Manager | `NuGet\Install-Package Mythosia.Documents.Abstractions -Version 1.2.0` |
| PackageReference | `<PackageReference Include="Mythosia.Documents.Abstractions" Version="1.2.0" />` |
| Central Package Management (Directory.Packages.props) | `<PackageVersion Include="Mythosia.Documents.Abstractions" Version="1.2.0" />` |
| Central Package Management (project file) | `<PackageReference Include="Mythosia.Documents.Abstractions" />` |
| Paket CLI | `paket add Mythosia.Documents.Abstractions --version 1.2.0` |
| Script & Interactive | `#r "nuget: Mythosia.Documents.Abstractions, 1.2.0"` |
| File-based apps | `#:package Mythosia.Documents.Abstractions@1.2.0` |
| Cake (install as a Cake Addin) | `#addin nuget:?package=Mythosia.Documents.Abstractions&version=1.2.0` |
| Cake (install as a Cake Tool) | `#tool nuget:?package=Mythosia.Documents.Abstractions&version=1.2.0` |
Core document abstractions for structured document loading and parsing. Framework-agnostic — usable with any RAG pipeline or document processing system.
dotnet add package Mythosia.Documents.Abstractions
Unified structured document representation following the docling convention. Content items are stored in flat lists; the tree structure is maintained via body/furniture root nodes.
using Mythosia.Documents;
using Mythosia.Documents.Elements;
var doc = new DoclingDocument
{
Name = "report",
Source = "docs/report.pdf",
};
// Builder API
doc.AddTitle("Annual Report");
doc.AddHeading("Revenue", level: 2);
doc.AddParagraph("Total revenue increased by 15%.");
doc.AddCode("var x = 42;", language: "csharp");
// Export to Markdown
string markdown = doc.ToMarkdown();
// Optional: override table rendering strategy
doc.TableSerializer = new SemanticTableSerializer();
string semanticMarkdown = doc.ToMarkdown();
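To picture the layout described at the start of this section (content in flat lists, reading order held by a body root node), here is a minimal sketch. `FlatDocSketch` and `NodeSketch` are hypothetical types written for this illustration; they do not mirror the package's classes, and the `#/texts/N` reference format is borrowed from the docling convention as an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type: holds references to content, not the content itself.
public sealed class NodeSketch
{
    public string Label = "";
    public List<string> Children = new List<string>(); // refs such as "#/texts/0"
}

// Hypothetical document type: one flat list of text items plus a body root node.
public sealed class FlatDocSketch
{
    private const string TextPrefix = "#/texts/";

    public List<string> Texts = new List<string>();
    public NodeSketch Body = new NodeSketch { Label = "body" };

    // Stores the text in the flat list and records its position in the body tree.
    public void AddText(string text)
    {
        Texts.Add(text);
        Body.Children.Add(TextPrefix + (Texts.Count - 1));
    }

    // Resolves the body's references back into content, in reading order.
    public IEnumerable<string> ReadBody() =>
        Body.Children.Select(r => Texts[int.Parse(r.Substring(TextPrefix.Length))]);
}
```

Keeping items in flat lists makes it cheap to enumerate every item of one kind, while the root node alone decides what appears in the body and in which order.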
For plain-text content that should be preserved as-is, use RawContent:
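using System.IO;

// rawText is read from disk here only to make the snippet complete
string rawText = File.ReadAllText("notes.txt");
var doc = new DoclingDocument
{
Name = "notes",
Source = "notes.txt",
RawContent = rawText, // ToMarkdown() returns this directly
};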
DoclingDocument.ToMarkdown() uses MarkdownSerializer to render the body tree. Body text is escaped by default so source text such as *literal*, [brackets], | pipes, and backticks stays literal Markdown content instead of becoming formatting.
using Mythosia.Documents;
using Mythosia.Documents.Elements;
var doc = new DoclingDocument();
doc.AddParagraph("Keep *this* literal and preserve [brackets].");
string safeMarkdown = doc.ToMarkdown();
// Keep \*this\* literal and preserve \[brackets\].
var serializer = new MarkdownSerializer
{
EscapeText = false,
};
string rawMarkdown = serializer.Serialize(doc);
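The default escaping behavior can be approximated with a few lines of standalone code. This is only a sketch: `MarkdownEscapeSketch` is a hypothetical helper, and the set of characters it escapes is an assumption based on the examples in this README, not the actual rule set of `MarkdownSerializer`.

```csharp
using System.Text;

// Hypothetical helper, not part of Mythosia.Documents.Abstractions.
public static class MarkdownEscapeSketch
{
    // Characters treated as Markdown-significant in this sketch (an assumed set).
    private const string Special = "\\`*_[]|";

    // Prefixes each significant character with a backslash so it renders literally.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length + 8);
        foreach (char c in text)
        {
            if (Special.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling `MarkdownEscapeSketch.Escape("Keep *this* literal and preserve [brackets].")` yields the same escaped line as the commented output in the sample above.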
MarkdownSerializer also clamps heading output to Markdown # through ###### and inserts a blank line when a list is followed by another block element, preventing the next paragraph, heading, table, code block, formula, or image placeholder from being absorbed into the list.
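Both rules are easy to illustrate in isolation. The sketch below uses a hypothetical `HeadingAndListSketch` helper; it is not the serializer's code, and it simplifies block joining to a single newline between blocks so that the extra blank line after a list stands out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Mythosia.Documents.Abstractions.
public static class HeadingAndListSketch
{
    // Clamps the requested level into the range Markdown supports (1 to 6).
    public static string Heading(string text, int level)
    {
        int clamped = Math.Clamp(level, 1, 6);
        return new string('#', clamped) + " " + text;
    }

    // Joins pre-rendered blocks with one newline, adding a blank line
    // when a list item is followed by a non-list block.
    public static string Join(IReadOnlyList<(bool IsListItem, string Markdown)> blocks)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < blocks.Count; i++)
        {
            if (i > 0)
            {
                sb.Append('\n');
                if (blocks[i - 1].IsListItem && !blocks[i].IsListItem)
                    sb.Append('\n');
            }
            sb.Append(blocks[i].Markdown);
        }
        return sb.ToString();
    }
}
```

Without the blank line, many Markdown renderers treat the following paragraph as a lazy continuation of the last list item, which is the absorption problem described above.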
Table rendering is pluggable via ITableSerializer. The default is GridTableSerializer (standard Markdown pipe table). Switch to SemanticTableSerializer for form-style documents:
using Mythosia.Documents;
using Mythosia.Documents.Elements;
// Default: pipe table
var doc = new DoclingDocument { Name = "report" };
string md = doc.ToMarkdown(); // uses GridTableSerializer
// Semantic: bold group labels for form-style tables
doc.TableSerializer = new SemanticTableSerializer();
string md2 = doc.ToMarkdown(); // uses SemanticTableSerializer
| Serializer | Output Style |
|---|---|
| GridTableSerializer | Standard Markdown pipe table (default) |
| SemanticTableSerializer | Form-style with **bold labels** and inline data |
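The pluggable design is a plain strategy pattern, which the following standalone sketch illustrates. Everything here is hypothetical: the real `ITableSerializer` signature is not shown in this README, so `ITableRendererSketch` and both implementations are stand-ins, and the form-style output is only a guess at the general idea of bold labels with inline data.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical strategy interface; the real ITableSerializer may look different.
public interface ITableRendererSketch
{
    string Render(IReadOnlyList<string> header, IReadOnlyList<IReadOnlyList<string>> rows);
}

// Renders a standard Markdown pipe table.
public sealed class PipeTableSketch : ITableRendererSketch
{
    public string Render(IReadOnlyList<string> header, IReadOnlyList<IReadOnlyList<string>> rows)
    {
        var lines = new List<string>
        {
            "| " + string.Join(" | ", header) + " |",
            "|" + string.Concat(header.Select(_ => "---|")),
        };
        lines.AddRange(rows.Select(r => "| " + string.Join(" | ", r) + " |"));
        return string.Join("\n", lines);
    }
}

// Renders one line per row as bold label and value pairs (assumed form style).
// Each row is expected to have at least as many cells as the header.
public sealed class FormStyleSketch : ITableRendererSketch
{
    public string Render(IReadOnlyList<string> header, IReadOnlyList<IReadOnlyList<string>> rows)
    {
        return string.Join("\n", rows.Select(r =>
            string.Join(", ", header.Select((h, i) => "**" + h + "**: " + r[i]))));
    }
}
```

Because callers depend only on the interface, swapping the renderer changes the output style without touching the code that walks the document, which is the same benefit the `TableSerializer` property offers per document.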
public interface IDocumentLoader
{
Task<IReadOnlyList<DoclingDocument>> LoadAsync(
string source, CancellationToken cancellationToken = default);
}
public interface IDocumentParser
{
bool CanParse(string source);
Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default);
}
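As an illustration of how these interfaces might be implemented, here is a minimal sketch of a plain-text parser that relies on the `RawContent` bypass described earlier. `PlainTextParser` is hypothetical and not shipped with the package; the namespaces of `IDocumentParser` and `DoclingDocument` are assumptions, so both namespaces used in the samples above are imported.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents;          // assumed namespace of IDocumentParser
using Mythosia.Documents.Elements; // imported as in the samples above

// Hypothetical parser for .txt files; requires the package to compile.
public sealed class PlainTextParser : IDocumentParser
{
    // Accepts any path with a .txt extension, ignoring case.
    public bool CanParse(string source) =>
        string.Equals(Path.GetExtension(source), ".txt", StringComparison.OrdinalIgnoreCase);

    // Reads the file and stores it as RawContent, so ToMarkdown() returns it unchanged.
    public async Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default)
    {
        string rawText = await File.ReadAllTextAsync(source, ct).ConfigureAwait(false);
        return new DoclingDocument
        {
            Name = Path.GetFileNameWithoutExtension(source),
            Source = source,
            RawContent = rawText,
        };
    }
}
```

A loader could call `CanParse` on each registered parser and delegate to the first match, although how the sibling loader packages actually dispatch is not documented here.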
| Type | Description |
|---|---|
| TextItem | Paragraph, generic text |
| TitleItem | Document title rendered as Markdown H1 |
| SectionHeaderItem | Section heading rendered as Markdown H2-H6 for standard heading levels |
| CodeItem | Code block with language |
| DocListItem | List item (ordered/unordered) |
| TableItem / TableData / TableCell | Table structure |
| TableSemanticView | Semantic group/column analysis for table layout |
| PictureItem | Image placeholder |
| GroupItem | Container (chapter, slide, sheet) |
| Package | Description |
|---|---|
| Mythosia.Documents.Hwp | HWP (Korean word processor) loader |
| Mythosia.Documents.Office | Word / Excel / PowerPoint loaders |
| Mythosia.Documents.Pdf | PDF loader (PdfPig) |
| Mythosia.AI.Rag | RAG pipeline that consumes DoclingDocument |
| Product | Versions (compatible and computed target frameworks) |
|---|---|
| .NET | Computed: net5.0, net5.0-windows; net6.0 and net7.0, each also with -android, -ios, -maccatalyst, -macos, -tvos, and -windows; net8.0, net9.0, and net10.0, each also with -android, -browser, -ios, -maccatalyst, -macos, -tvos, and -windows. |
| .NET Core | Computed: netcoreapp3.0, netcoreapp3.1. |
| .NET Standard | Compatible: netstandard2.1. |
| MonoAndroid | Computed: monoandroid. |
| MonoMac | Computed: monomac. |
| MonoTouch | Computed: monotouch. |
| Tizen | Computed: tizen60. |
| Xamarin.iOS | Computed: xamarinios. |
| Xamarin.Mac | Computed: xamarinmac. |
| Xamarin.TVOS | Computed: xamarintvos. |
| Xamarin.WatchOS | Computed: xamarinwatchos. |
Showing the top 3 NuGet packages that depend on Mythosia.Documents.Abstractions:
| Package | Downloads |
|---|---|
| Mythosia.Documents.Pdf: PDF document loader. Parses PDF files into DoclingDocument structured models via PdfPig. Font-size based heading detection, bullet/numbered list recognition, spatial paragraph grouping. Supports encrypted PDFs, metadata extraction, and page number headers. | |
| Mythosia.Documents.Office: Office document loaders for Word (.docx), Excel (.xlsx), and PowerPoint (.pptx). Parses documents into DoclingDocument structured models via OpenXml, preserving heading hierarchy and slide content order. | |
| Mythosia.Documents.Hwp: HWP document loader. Parses Korean word-processor (.hwp) files into DoclingDocument structured models via HwpLibSharp. Section/paragraph text extraction with table support. | |
This package is not used by any popular GitHub repositories.
v1.2.0: MarkdownSerializer now escapes Markdown-significant characters in body text by default via EscapeText, clamps heading output to H1-H6, and inserts blank lines when leaving list blocks. RawContent continues to bypass serialization. Updated System.Text.Json to 10.0.7.
v1.1.0: Added pluggable table serialization. ITableSerializer strategy interface with GridTableSerializer (pipe table, default) and SemanticTableSerializer (form-style group rendering with bold labels). DoclingDocument.TableSerializer property allows per-document override. TableData and TableSemanticView for structural table analysis.
v1.0.0: Initial release as Mythosia.Documents.Abstractions. DoclingDocument structured model with body tree, RawContent bypass, Metadata, Builder API, and Markdown export. IDocumentLoader returns DoclingDocument. Element types in Mythosia.Documents.Elements namespace.