dotnet add package Unhwp --version 0.5.1
NuGet\Install-Package Unhwp -Version 0.5.1
<PackageReference Include="Unhwp" Version="0.5.1" />
<PackageVersion Include="Unhwp" Version="0.5.1" /> (in Directory.Packages.props)
<PackageReference Include="Unhwp" /> (in the project file)
paket add Unhwp --version 0.5.1
#r "nuget: Unhwp, 0.5.1"
#:package Unhwp@0.5.1
#addin nuget:?package=Unhwp&version=0.5.1 (install as a Cake addin)
#tool nuget:?package=Unhwp&version=0.5.1 (install as a Cake tool)
High-performance .NET library for extracting HWP/HWPX Korean word processor documents to Markdown.
dotnet add package Unhwp
Or via NuGet Package Manager:
Install-Package Unhwp
using Unhwp;
// Simple conversion
string markdown = UnhwpConverter.ToMarkdown("document.hwp");
Console.WriteLine(markdown);
// Extract plain text
string text = UnhwpConverter.ExtractText("document.hwp");
// Full parsing with images
using var result = UnhwpConverter.Parse("document.hwp");
Console.WriteLine(result.Markdown);
Console.WriteLine($"Sections: {result.SectionCount}");
Console.WriteLine($"Paragraphs: {result.ParagraphCount}");
// Save images
foreach (var img in result.Images)
{
    img.Save($"output/{img.Name}");
}
- Version - Gets the library version string
- SupportedFormats - Gets a description of supported formats

DetectFormat(string path) -> DocumentFormat
Detect the format of a document file.
var format = UnhwpConverter.DetectFormat("document.hwp");
if (format == DocumentFormat.Hwp5)
    Console.WriteLine("HWP 5.0 format");
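Format detection is most useful as a guard before conversion. The following is a minimal sketch that uses only `DetectFormat`, `ToMarkdown`, and the `DocumentFormat` members listed on this page. The `Describe` helper and the file path are hypothetical, and the sketch assumes `DetectFormat` returns `DocumentFormat.Unknown` for unrecognized files rather than throwing, which this page does not confirm.

```csharp
using System;
using Unhwp;

// Hypothetical helper (not part of Unhwp): maps a detected format to a label.
static string Describe(DocumentFormat format) => format switch
{
    DocumentFormat.Hwp5 => "HWP 5.0 binary",
    DocumentFormat.Hwpx => "HWPX XML",
    DocumentFormat.Hwp3 => "HWP 3.x legacy",
    _ => "unknown"
};

var path = "document.hwp"; // placeholder path

var format = UnhwpConverter.DetectFormat(path);
if (format == DocumentFormat.Unknown)
{
    // Assumption: unrecognized files are reported as Unknown instead of throwing.
    Console.WriteLine($"{path}: not a recognized HWP/HWPX file, skipping");
}
else
{
    Console.WriteLine($"{path}: {Describe(format)}");
    Console.WriteLine(UnhwpConverter.ToMarkdown(path));
}
```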
Parse(string path, RenderOptions? options = null) -> ParseResult
Parse a document with full access to content and images.
using var result = UnhwpConverter.Parse("document.hwp");
Console.WriteLine(result.Markdown);
Console.WriteLine(result.Text);
foreach (var img in result.Images)
    Console.WriteLine($"{img.Name}: {img.Data.Length} bytes");
ParseBytes(byte[] data, RenderOptions? options = null) -> ParseResult
Parse a document from a byte array.
byte[] documentBytes = File.ReadAllBytes("document.hwp");
using var result = UnhwpConverter.ParseBytes(documentBytes);
Console.WriteLine(result.Markdown);
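When the document arrives as a stream (an upload or a network response, for example) rather than a file on disk, it can be buffered into a byte array first. This sketch relies only on `ParseBytes` and `ParagraphCount` from this page plus standard .NET stream APIs. The `ReadAllBytes` helper is hypothetical, and `File.OpenRead` merely stands in for whatever stream the application actually receives.

```csharp
using System;
using System.IO;
using Unhwp;

// Hypothetical helper (not part of Unhwp): buffers a readable stream into memory.
static byte[] ReadAllBytes(Stream input)
{
    using var buffer = new MemoryStream();
    input.CopyTo(buffer);
    return buffer.ToArray();
}

// File.OpenRead is a stand-in for an upload or network stream.
using var stream = File.OpenRead("document.hwp");
using var result = UnhwpConverter.ParseBytes(ReadAllBytes(stream));

Console.WriteLine($"Paragraphs: {result.ParagraphCount}");
```

Buffering keeps the whole document in memory, which is usually acceptable for word processor files but worth reconsidering for very large inputs.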
ToMarkdown(string path) -> string
Convert an HWP/HWPX document to Markdown.
string markdown = UnhwpConverter.ToMarkdown("document.hwp");
ToMarkdownWithCleanup(string path, CleanupOptions? options = null) -> string
Convert with optional cleanup.
string markdown = UnhwpConverter.ToMarkdownWithCleanup(
    "document.hwp",
    CleanupOptions.Aggressive
);
ExtractText(string path) -> string
Extract plain text content.
string text = UnhwpConverter.ExtractText("document.hwp");
ParseResult
Result of parsing a document. Implements IDisposable.

Properties:
- Markdown - Rendered Markdown content
- Text - Plain text content
- RawContent - Content without cleanup
- SectionCount - Number of sections
- ParagraphCount - Number of paragraphs
- ImageCount - Number of images
- Images - List of extracted images

RenderOptions
Options for Markdown rendering.
var opts = new RenderOptions
{
    IncludeFrontmatter = true,
    ImagePathPrefix = "images/",
    TableFallback = TableFallback.Html,
    PreserveLineBreaks = false,
    EscapeSpecialChars = true
};
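Per the `Parse` signature above, a `RenderOptions` instance can be passed as the second argument. The sketch below writes the Markdown and the extracted images to disk so that the image folder matches `ImagePathPrefix`. The output paths are placeholders, and the idea that `ImagePathPrefix` is prepended to image links in the generated Markdown is an assumption inferred from the property name, not something this page states.

```csharp
using System.IO;
using Unhwp;

var options = new RenderOptions
{
    // Assumption: this prefix is prepended to image links in the Markdown output.
    ImagePathPrefix = "images/",
    TableFallback = TableFallback.Html
};

using var result = UnhwpConverter.Parse("document.hwp", options);

// Placeholder layout: output/document.md next to output/images/<name>.
Directory.CreateDirectory(Path.Combine("output", "images"));
File.WriteAllText(Path.Combine("output", "document.md"), result.Markdown);

foreach (var img in result.Images)
{
    img.Save(Path.Combine("output", "images", img.Name));
}
```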
CleanupOptions
Options for output cleanup.
// Presets
var minimal = CleanupOptions.Minimal;
var defaultOpts = CleanupOptions.Default;
var aggressive = CleanupOptions.Aggressive;
var disabled = CleanupOptions.Disabled;
// Custom
var custom = new CleanupOptions
{
    Enabled = true,
    Preset = CleanupPreset.Default,
    DetectMojibake = true,
    PreserveFrontmatter = true
};
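One way to choose a preset is to run the same document through each one and compare the results. This sketch uses only `ToMarkdownWithCleanup` and the four presets shown above. The path is a placeholder, and output length is only a rough proxy for how much each preset removes.

```csharp
using System;
using Unhwp;

var path = "document.hwp"; // placeholder path

var presets = new (string Label, CleanupOptions Options)[]
{
    ("disabled", CleanupOptions.Disabled),
    ("minimal", CleanupOptions.Minimal),
    ("default", CleanupOptions.Default),
    ("aggressive", CleanupOptions.Aggressive),
};

foreach (var (label, options) in presets)
{
    string markdown = UnhwpConverter.ToMarkdownWithCleanup(path, options);
    Console.WriteLine($"{label}: {markdown.Length} characters");
}
```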
UnhwpImage
Represents an extracted image.

Properties:
- Name - Image filename
- Data - Image data as byte array

Methods:
- Save(string path) - Save image to file

DocumentFormat
- Unknown - Unknown format
- Hwp5 - HWP 5.0 binary format
- Hwpx - HWPX XML format
- Hwp3 - HWP 3.x legacy format

TableFallback
- Markdown - Render as Markdown tables
- Html - Render as HTML tables
- Text - Render as plain text

CleanupPreset
- Minimal - Minimal cleanup
- Default - Balanced cleanup
- Aggressive - Maximum cleanup

Licensed under the MIT License.
| Product | Versions (compatible and computed target frameworks) |
|---|---|
| .NET | net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed. |
Showing the top 2 NuGet packages that depend on Unhwp:
| Package | Description |
|---|---|
| FileFlux | Complete document processing SDK optimized for RAG systems. Transform PDF, DOCX, Excel, PowerPoint, Markdown and other formats into high-quality chunks with intelligent semantic boundary detection. Includes advanced chunking strategies, metadata extraction, and performance optimization. |
| FileFlux.Core | Pure document extraction SDK for RAG systems. Zero AI dependencies. Extract text from PDF, DOCX, Excel, PowerPoint, Markdown, HTML, and text files. Provides IDocumentReader interface and implementations. Use FileFlux.Core for extraction-only scenarios. For AI-enhanced extraction (image OCR, captioning), use the FileFlux package. |
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.1 | 102 | 6/5/2026 |
| 0.5.0 | 106 | 5/31/2026 |
| 0.4.0 | 98 | 5/31/2026 |
| 0.3.2 | 332 | 5/12/2026 |
| 0.3.1 | 103 | 5/12/2026 |
| 0.3.0 | 107 | 5/6/2026 |
| 0.2.4 | 132 | 4/14/2026 |
| 0.2.3 | 125 | 3/19/2026 |
| 0.2.2 | 2,765 | 2/23/2026 |
| 0.2.1 | 773 | 2/21/2026 |
| 0.2.0 | 118 | 2/21/2026 |
| 0.1.16 | 260 | 2/5/2026 |
| 0.1.14 | 193 | 1/29/2026 |
| 0.1.13 | 483 | 1/28/2026 |
| 0.1.12 | 121 | 1/27/2026 |
| 0.1.11 | 124 | 1/27/2026 |