URL: https://www.nuget.org/packages/TesseractOCR/

TesseractOCR 5.5.2

dotnet add package TesseractOCR --version 5.5.2

NuGet\Install-Package TesseractOCR -Version 5.5.2
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="TesseractOCR" Version="5.5.2" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
Directory.Packages.props:
<PackageVersion Include="TesseractOCR" Version="5.5.2" />
Project file:
<PackageReference Include="TesseractOCR" />

paket add TesseractOCR --version 5.5.2
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: TesseractOCR, 5.5.2"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package TesseractOCR@5.5.2
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:
#addin nuget:?package=TesseractOCR&version=5.5.2
Install as a Cake Tool:
#tool nuget:?package=TesseractOCR&version=5.5.2
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

What is TesseractOCR

TesseractOCR is a .NET wrapper for Tesseract 5.5.0. It was originally copied from Charles Weld's wrapper (https://github.com/charlesw/tesseract) and modified for my own needs.

How to use

You need trained data for each language you want to recognize, placed in a tessdata folder. You can get the files at https://github.com/tesseract-ocr/tessdata or https://github.com/tesseract-ocr/tessdata_fast
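A missing language file is an easy mistake to make, so it can help to check for it before creating the engine. The sketch below is only an illustration and does not use the TesseractOCR API: the helper name TrainedDataPath and the ./tessdata location are assumptions. It relies on the Tesseract convention that language files are named after the language code, for example eng.traineddata for English.

```csharp
using System;
using System.IO;

// Builds the expected file name for a language, e.g. "eng" -> "eng.traineddata".
// TrainedDataPath is a hypothetical helper, not part of TesseractOCR.
static string TrainedDataPath(string tessdataDir, string languageCode) =>
    Path.Combine(tessdataDir, languageCode + ".traineddata");

// Assumed folder layout: adjust to wherever you keep your trained data.
var tessdataDir = "./tessdata";
var path = TrainedDataPath(tessdataDir, "eng");

if (File.Exists(path))
    Console.WriteLine($"Found trained data: {path}");
else
    Console.WriteLine($"Missing trained data: {path}. Download it from one of the tessdata repositories.");
```

If the check fails, download the file for that language and place it in the folder you pass to the Engine constructor.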

Microsoft Visual C++ runtimes

The DLLs Tesseract54.dll (and exe) and leptonica-1.85.0.dll are compiled with Visual Studio 2022, so you need the matching Microsoft Visual C++ runtimes installed on your computer.

How to use in Docker on Linux

See this wiki for more information https://github.com/Sicos1977/TesseractOCR/wiki/How-to-use-in-Docker-on-Linux

OCR a page

using var engine = new Engine(@"./tessdata", Language.English, EngineMode.Default);
using var img = TesseractOCR.Pix.Image.LoadFromFile(testImagePath);
using var page = engine.Process(img);
Console.WriteLine("Mean confidence: {0}", page.MeanConfidence);
Console.WriteLine("Text: \r\n{0}", page.Text);

Iterate through the layout of a page

using var engine = new Engine(@"./tessdata", Language.English, EngineMode.Default);
using var img = Pix.Image.LoadFromFile(testImagePath);
using var page = engine.Process(img);

var result = new StringBuilder();

foreach (var block in page.Layout)
{
    result.AppendLine($"Block confidence: {block.Confidence}");
    if (block.BoundingBox != null)
    {
        var boundingBox = block.BoundingBox.Value;
        result.AppendLine($"Block bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                          $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
    }
    result.AppendLine($"Block text: {block.Text}");

    foreach (var paragraph in block.Paragraphs)
    {
        result.AppendLine($"Paragraph confidence: {paragraph.Confidence}");
        if (paragraph.BoundingBox != null)
        {
            var boundingBox = paragraph.BoundingBox.Value;
            result.AppendLine($"Paragraph bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                              $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
        }
        var info = paragraph.Info;
        result.AppendLine($"Paragraph info justification: {info.Justification}");
        result.AppendLine($"Paragraph info is list item: {info.IsListItem}");
        result.AppendLine($"Paragraph info is crown: {info.IsCrown}");
        result.AppendLine($"Paragraph info first line indent: {info.FirstLineIdent}");
        result.AppendLine($"Paragraph text: {paragraph.Text}");

        foreach (var textLine in paragraph.TextLines)
        {
            if (textLine.BoundingBox != null)
            {
                var boundingBox = textLine.BoundingBox.Value;
                result.AppendLine($"Text line bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                  $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
            }
            result.AppendLine($"Text line confidence: {textLine.Confidence}");
            result.AppendLine($"Text line text: {textLine.Text}");

            foreach (var word in textLine.Words)
            {
                result.AppendLine($"Word confidence: {word.Confidence}");
                if (word.BoundingBox != null)
                {
                    var boundingBox = word.BoundingBox.Value;
                    result.AppendLine($"Word bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                      $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
                }
                result.AppendLine($"Word is from dictionary: {word.IsFromDictionary}");
                result.AppendLine($"Word is numeric: {word.IsNumeric}");
                result.AppendLine($"Word language: {word.Language}");
                result.AppendLine($"Word text: {word.Text}");

                foreach (var symbol in word.Symbols)
                {
                    result.AppendLine($"Symbol confidence: {symbol.Confidence}");
                    if (symbol.BoundingBox != null)
                    {
                        var boundingBox = symbol.BoundingBox.Value;
                        result.AppendLine($"Symbol bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                          $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
                    }
                    result.AppendLine($"Symbol is superscript: {symbol.IsSuperscript}");
                    result.AppendLine($"Symbol is dropcap: {symbol.IsDropcap}");
                    result.AppendLine($"Symbol text: {symbol.Text}");
                }
            }
        }
    }
}
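
A common use of the per-word confidence from the layout loop is to flag words that may need a manual review. The sketch below shows only the filtering step on plain (text, confidence) pairs, so it runs without the library. The helper name LowConfidence, the sample data and the threshold of 60 are assumptions, as is the idea that word.Confidence is a float; check which scale the confidence values use in your version before picking a threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the words whose confidence is below the given threshold.
// LowConfidence is a hypothetical helper, not part of TesseractOCR.
IEnumerable<(string Text, float Confidence)> LowConfidence(
    IEnumerable<(string Text, float Confidence)> words, float threshold) =>
    words.Where(w => w.Confidence < threshold);

// Made-up sample data. With the library, the pairs could be collected in the
// innermost word loop as (word.Text, word.Confidence).
var words = new List<(string Text, float Confidence)>
{
    ("Hello", 96.5f), ("w0rld", 41.2f), ("again", 88.0f)
};

// Assumed threshold; tune it to the scale and quality of your own results.
foreach (var (text, confidence) in LowConfidence(words, 60f))
    Console.WriteLine($"Review '{text}' (confidence {confidence})");
```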

For more examples see https://github.com/Sicos1977/TesseractOCR/wiki/examples.md

Supported input formats

Tesseract uses the Leptonica library to read images with one of these formats:

  • PNG - requires libpng, libz
  • JPEG - requires libjpeg / libjpeg-turbo
  • TIFF - requires libtiff, libz
  • JPEG 2000 - requires libopenjp2
  • GIF - requires libgif (giflib)
  • WebP (including animated WebP) - requires libwebp
  • BMP - no library required*
  • PNM - no library required*

  * Except Leptonica

I have dropped support for System.Drawing.Image, since it only works well on Windows and not on other systems. Reading images through Leptonica should be fine.
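
If you accept files from users, a quick pre-check on the file name can reject obviously unsupported input before it reaches the engine. The sketch below is only an illustration: the helper name LooksSupported and the extension list are assumptions based on the formats listed above, the list is not exhaustive, and a matching extension does not guarantee that the file can actually be read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Extensions commonly used for the formats listed above (an assumption,
// the library itself does not define this list).
var knownExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".jp2", ".gif", ".webp", ".bmp", ".pnm"
};

// Hypothetical helper: true when the file name ends in one of the known extensions.
bool LooksSupported(string fileName) =>
    knownExtensions.Contains(Path.GetExtension(fileName));

Console.WriteLine(LooksSupported("scan.TIFF")); // True
Console.WriteLine(LooksSupported("scan.pdf"));  // False
```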

Logging

TesseractOCR uses the Microsoft ILogger interface (https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.ilogger?view=dotnet-plat-ext-5.0). You can use any logging library that uses this interface.

TesseractOCR has some built-in loggers that can be found in the TesseractOCR.Loggers namespace.

For example

var logger = !string.IsNullOrWhiteSpace(<some logfile>)
 ? new TesseractOCR.Loggers.Stream(File.OpenWrite(<some logfile>))
 : new TesseractOCR.Loggers.Console();

Installing via NuGet

The easiest way to install TesseractOCR is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package TesseractOCR

License Information

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Core Team

Compatible and additional computed target framework versions, by product:

.NET (computed): net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows
.NET Core (computed): netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1
.NET Standard (compatible): netstandard2.0, netstandard2.1
.NET Framework: net (compatible); net461, net462, net463, net47, net471, net472, net48, net481 (computed)
MonoAndroid (computed): monoandroid
MonoMac (computed): monomac
MonoTouch (computed): monotouch
Tizen (computed): tizen40, tizen60
Xamarin.iOS (computed): xamarinios
Xamarin.Mac (computed): xamarinmac
Xamarin.TVOS (computed): xamarintvos
Xamarin.WatchOS (computed): xamarinwatchos

NuGet packages (6)

Showing the top 5 NuGet packages that depend on TesseractOCR:

NCPC.Documents: Useful methods to work with Documents
Siliscrypt.PdfTools: Package Description
BdsCrmOCR: OCR that uses Tesseract JS functionality
Siliscrypt.Pdf: Package Description
Frank.SemanticKernel.Plugins.Ocr: Package Description

GitHub repositories (5)

Showing the top 5 popular GitHub repositories that depend on TesseractOCR:

hanmin0822/MisakaTranslator: Misaka Translator, a real-time multilingual machine translation tool for galgames, text-based games and manga
umlx5h/LLPlayer: The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
josdemmers/Diablo4Companion: A companion app and loot filter for Diablo IV to help you find your perfect gear affixes.
derekshreds/Snacks: Automated transcoding for your video and music libraries
Tentacule/PgsToSrt: PGS to Srt converter

Version Downloads Last Updated
5.5.2 21,620 3/5/2026
5.5.1 62,672 4/25/2025
5.5.0 4,158 4/13/2025
5.4.2 28,394 10/23/2024
5.3.5 203,178 8/15/2023
- Updated to Tesseract 5.5.1
- Updated nuget packages