Replacement package: FieldCure.DocumentParsers.Ocr 1.0.0
Additional details: this package was renamed to FieldCure.DocumentParsers.Ocr 1.0. Same functionality, new namespace `FieldCure.DocumentParsers.Ocr.*`; `AddPdfOcrSupport()` is now `AddOcrSupport()`. Windows only.
dotnet add package FieldCure.DocumentParsers.Pdf.Ocr --version 1.0.1
NuGet\Install-Package FieldCure.DocumentParsers.Pdf.Ocr -Version 1.0.1
<PackageReference Include="FieldCure.DocumentParsers.Pdf.Ocr" Version="1.0.1" />
Directory.Packages.props:
<PackageVersion Include="FieldCure.DocumentParsers.Pdf.Ocr" Version="1.0.1" />
Project file:
<PackageReference Include="FieldCure.DocumentParsers.Pdf.Ocr" />
paket add FieldCure.DocumentParsers.Pdf.Ocr --version 1.0.1
#r "nuget: FieldCure.DocumentParsers.Pdf.Ocr, 1.0.1"
#:package FieldCure.DocumentParsers.Pdf.Ocr@1.0.1
Install as a Cake Addin:
#addin nuget:?package=FieldCure.DocumentParsers.Pdf.Ocr&version=1.0.1
Install as a Cake Tool:
#tool nuget:?package=FieldCure.DocumentParsers.Pdf.Ocr&version=1.0.1
Tesseract OCR fallback for scanned PDFs in FieldCure.DocumentParsers.Pdf.
```csharp
using FieldCure.DocumentParsers.Pdf.Ocr;

// Register PDF parser with OCR fallback (call once at startup)
using var ocrEngine = DocumentParserFactoryOcrExtensions.AddPdfOcrSupport();

// Use as usual; scanned pages are OCR'd automatically
var parser = DocumentParserFactory.GetParser(".pdf")!;
var text = parser.ExtractText(File.ReadAllBytes("scanned.pdf"));
```
- English (`eng.traineddata`)
- Korean (`kor.traineddata`)

Languages are auto-discovered from embedded traineddata files.
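How the package performs this discovery is not documented here. As a rough sketch only, assuming the traineddata files are available in a directory on disk, discovery could list the `*.traineddata` files and join their base names with `+`, which is the form Tesseract accepts for multiple languages. `TessdataLanguages` and `DiscoverLanguages` are hypothetical names, not part of the package.

```csharp
using System;
using System.IO;
using System.Linq;

static class TessdataLanguages
{
    // Hypothetical helper (not the package's API): builds a Tesseract
    // language string from the *.traineddata files found in tessdataDir.
    public static string DiscoverLanguages(string tessdataDir) =>
        string.Join("+",
            Directory.EnumerateFiles(tessdataDir, "*.traineddata")
                .Select(path => Path.GetFileNameWithoutExtension(path))
                .OrderBy(name => name, StringComparer.Ordinal));
}
```

For a directory holding only `eng.traineddata` and `kor.traineddata`, this helper returns `eng+kor`.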
Uses an engine pool (default size: min(ProcessorCount, 4)) for concurrent OCR processing.
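The release notes describe the pool as a `ConcurrentBag` combined with a `SemaphoreSlim`. The following is a minimal sketch of that pattern, not the package's implementation: `EnginePool<T>` and `RunAsync` are hypothetical names, and `T` stands in for whatever engine type is pooled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical generic pool; T stands in for an OCR engine instance.
sealed class EnginePool<T> : IDisposable where T : class
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly SemaphoreSlim _slots;
    private readonly Func<T> _factory;

    public EnginePool(Func<T> factory, int? size = null)
    {
        _factory = factory;
        // Default mirrors the documented min(ProcessorCount, 4).
        int capacity = size ?? Math.Min(Environment.ProcessorCount, 4);
        _slots = new SemaphoreSlim(capacity, capacity);
    }

    public async Task<TResult> RunAsync<TResult>(
        Func<T, TResult> work, CancellationToken ct = default)
    {
        // Waits asynchronously while every slot is taken.
        await _slots.WaitAsync(ct);
        // Reuse an idle engine if one exists, otherwise create one lazily.
        T engine = _idle.TryTake(out var pooled) ? pooled : _factory();
        try
        {
            return work(engine);
        }
        finally
        {
            // Return the engine before freeing the slot.
            _idle.Add(engine);
            _slots.Release();
        }
    }

    public void Dispose()
    {
        // Dispose pooled engines that are disposable, then the semaphore.
        while (_idle.TryTake(out var engine))
            (engine as IDisposable)?.Dispose();
        _slots.Dispose();
    }
}
```

The semaphore caps how many callers hold an engine at once, and the bag lets finished callers hand engines back without locking. Because engines are created only while a slot is held and the bag is empty, the pool never holds more engines than its capacity.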
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
This package is not used by any NuGet packages.
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated | Status |
|---|---|---|---|
| 1.0.1 | 167 | 4/8/2026 | 1.0.1 is deprecated because it is no longer maintained. |
| 1.0.0 | 138 | 4/8/2026 | 1.0.0 is deprecated because it is no longer maintained. |
# Release Notes — FieldCure.DocumentParsers.Pdf.Ocr
## [1.0.1] - 2026-04-08
### Fixed
- Tesseract native DLLs (`leptonica-1.82.0.dll`, `tesseract50.dll`) now included in NuGet package with `build/` and `buildTransitive/` targets, fixing `DllNotFoundException` in `PackAsTool` consumers (e.g., MCP servers)
## [1.0.0] - 2026-04-08
### Added
- `TesseractOcrEngine` — Tesseract OCR fallback for scanned PDFs with no text layer
- Embedded traineddata (tessdata_fast): English + Korean
- Automatic language discovery from tessdata directory
- Korean post-processing: removes spurious inter-character spaces from Tesseract output
- Engine pool via `ConcurrentBag` + `SemaphoreSlim` for concurrent OCR (default: `min(ProcessorCount, 4)`)
- `DocumentParserFactoryOcrExtensions.AddPdfOcrSupport()` — one-line factory registration with OCR
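
The notes do not say how the Korean post-processing decides which spaces are spurious. As an illustration only, one simple heuristic is to collapse runs of three or more single Hangul syllables separated by single spaces, leaving multi-syllable words alone. `KoreanOcrCleanup` and `CollapseSpacedSyllables` are hypothetical names, and the three-syllable threshold is an assumption.

```csharp
using System.Text.RegularExpressions;

static class KoreanOcrCleanup
{
    // Matches three or more isolated Hangul syllables (U+AC00..U+D7A3)
    // separated by single spaces, e.g. "한 국 어".
    private static readonly Regex SpacedSyllables = new(
        @"(?<![\uAC00-\uD7A3])[\uAC00-\uD7A3](?: [\uAC00-\uD7A3]){2,}(?![\uAC00-\uD7A3])",
        RegexOptions.Compiled);

    // Hypothetical helper: removes the spaces inside each matched run.
    public static string CollapseSpacedSyllables(string text) =>
        SpacedSyllables.Replace(text, m => m.Value.Replace(" ", ""));
}
```

With this heuristic, `"한 국 어 문서"` becomes `"한국어 문서"`, while text whose Korean words are already multi-syllable is left unchanged. It can also merge genuine sequences of one-syllable words, so a production implementation would likely need a stricter rule.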