dotnet add package TesseractOCR --version 5.5.2
NuGet\Install-Package TesseractOCR -Version 5.5.2
<PackageReference Include="TesseractOCR" Version="5.5.2" />
Directory.Packages.props:
<PackageVersion Include="TesseractOCR" Version="5.5.2" />
Project file:
<PackageReference Include="TesseractOCR" />
paket add TesseractOCR --version 5.5.2
#r "nuget: TesseractOCR, 5.5.2"
#:package TesseractOCR@5.5.2
Install as a Cake Addin:
#addin nuget:?package=TesseractOCR&version=5.5.2
Install as a Cake Tool:
#tool nuget:?package=TesseractOCR&version=5.5.2
TesseractOCR is a .NET wrapper for Tesseract 5.5.0. It was originally copied from Charles Weld's wrapper (https://github.com/charlesw/tesseract) and modified for my own needs.
You need trained data in the tessdata folder for every language you want to recognize. You can get it at https://github.com/tesseract-ocr/tessdata or https://github.com/tesseract-ocr/tessdata_fast
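A missing language file is an easy mistake to make, so it can help to check the tessdata folder before creating the engine. The sketch below is not part of TesseractOCR: `TessdataCheck` and `MissingLanguages` are hypothetical names, and it assumes the usual Tesseract naming convention of one `<code>.traineddata` file per language (for example `eng.traineddata`).

```csharp
using System;
using System.IO;

public static class TessdataCheck
{
    // Hypothetical helper (not a TesseractOCR API): returns the language
    // codes for which "<code>.traineddata" is missing from the folder.
    public static string[] MissingLanguages(string tessdataPath, params string[] languageCodes)
    {
        return Array.FindAll(
            languageCodes,
            code => !File.Exists(Path.Combine(tessdataPath, code + ".traineddata")));
    }
}
```

Calling `TessdataCheck.MissingLanguages("./tessdata", "eng")` before constructing the engine returns an empty array when the English data file is present.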
The DLLs Tesseract54.dll (and exe) and leptonica-1.85.0.dll are compiled with Visual Studio 2022, so you need the matching C++ runtimes installed on your computer.
For running the library in Docker on Linux, see this wiki page: https://github.com/Sicos1977/TesseractOCR/wiki/How-to-use-in-Docker-on-Linux
using var engine = new Engine(@"./tessdata", Language.English, EngineMode.Default);
using var img = TesseractOCR.Pix.Image.LoadFromFile(testImagePath);
using var page = engine.Process(img);
Console.WriteLine("Mean confidence: {0}", page.MeanConfidence);
Console.WriteLine("Text: \r\n{0}", page.Text);
using var engine = new Engine(@"./tessdata", Language.English, EngineMode.Default);
using var img = Pix.Image.LoadFromFile(testImagePath);
using var page = engine.Process(img);
var result = new StringBuilder();
foreach (var block in page.Layout)
{
    result.AppendLine($"Block confidence: {block.Confidence}");
    if (block.BoundingBox != null)
    {
        var boundingBox = block.BoundingBox.Value;
        result.AppendLine($"Block bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                          $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
    }
    result.AppendLine($"Block text: {block.Text}");
    foreach (var paragraph in block.Paragraphs)
    {
        result.AppendLine($"Paragraph confidence: {paragraph.Confidence}");
        if (paragraph.BoundingBox != null)
        {
            var boundingBox = paragraph.BoundingBox.Value;
            result.AppendLine($"Paragraph bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                              $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
        }
        var info = paragraph.Info;
        result.AppendLine($"Paragraph info justification: {info.Justification}");
        result.AppendLine($"Paragraph info is list item: {info.IsListItem}");
        result.AppendLine($"Paragraph info is crown: {info.IsCrown}");
        result.AppendLine($"Paragraph info first line ident: {info.FirstLineIdent}");
        result.AppendLine($"Paragraph text: {paragraph.Text}");
        foreach (var textLine in paragraph.TextLines)
        {
            if (textLine.BoundingBox != null)
            {
                var boundingBox = textLine.BoundingBox.Value;
                result.AppendLine($"Text line bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                  $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
            }
            result.AppendLine($"Text line confidence: {textLine.Confidence}");
            result.AppendLine($"Text line text: {textLine.Text}");
            foreach (var word in textLine.Words)
            {
                result.AppendLine($"Word confidence: {word.Confidence}");
                if (word.BoundingBox != null)
                {
                    var boundingBox = word.BoundingBox.Value;
                    result.AppendLine($"Word bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                      $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
                }
                result.AppendLine($"Word is from dictionary: {word.IsFromDictionary}");
                result.AppendLine($"Word is numeric: {word.IsNumeric}");
                result.AppendLine($"Word language: {word.Language}");
                result.AppendLine($"Word text: {word.Text}");
                foreach (var symbol in word.Symbols)
                {
                    result.AppendLine($"Symbol confidence: {symbol.Confidence}");
                    if (symbol.BoundingBox != null)
                    {
                        var boundingBox = symbol.BoundingBox.Value;
                        result.AppendLine($"Symbol bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                          $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
                    }
                    result.AppendLine($"Symbol is superscript: {symbol.IsSuperscript}");
                    result.AppendLine($"Symbol is dropcap: {symbol.IsDropcap}");
                    result.AppendLine($"Symbol text: {symbol.Text}");
                }
            }
        }
    }
}
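The layout example above repeats the same bounding-box format string at every level (block, paragraph, text line, word, symbol). One way to keep those lines in sync is a small formatting helper. This is only a sketch: `BoxFormat.Describe` is a hypothetical name, not part of TesseractOCR, and it assumes that width and height are simply `X2 - X1` and `Y2 - Y1`.

```csharp
public static class BoxFormat
{
    // Hypothetical helper (not a TesseractOCR API): builds one bounding-box
    // line from the four corner coordinates.
    // Assumption: width = x2 - x1 and height = y2 - y1.
    public static string Describe(string label, int x1, int y1, int x2, int y2)
    {
        return $"{label} bounding box X1 '{x1}', Y1 '{y1}', X2 '{x2}', Y2 '{y2}', " +
               $"width '{x2 - x1}', height '{y2 - y1}'";
    }
}
```

Inside the loops it could be called as `result.AppendLine(BoxFormat.Describe("Word", boundingBox.X1, boundingBox.Y1, boundingBox.X2, boundingBox.Y2));`, so every level prints its coordinates through the same code path.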
For more examples see https://github.com/Sicos1977/TesseractOCR/wiki/examples.md
Tesseract uses the Leptonica library to read images with one of these formats:
I have dropped support for System.Drawing.Image since it only works well on Windows and not on other systems. You should be fine with Leptonica.
TesseractOCR uses the Microsoft ILogger interface (https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.ilogger?view=dotnet-plat-ext-5.0). You can use any logging library that implements this interface.
TesseractOCR also ships some built-in loggers, which can be found in the TesseractOCR.Loggers namespace.
For example:
var logger = !string.IsNullOrWhiteSpace(<some logfile>)
? new TesseractOCR.Loggers.Stream(File.OpenWrite(<some logfile>))
: new TesseractOCR.Loggers.Console();
The easiest way to install TesseractOCR is via NuGet.
In Visual Studio's Package Manager Console, simply enter the following command:
Install-Package TesseractOCR
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | Computed: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows |
| .NET Core | Computed: netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1 |
| .NET Standard | Compatible: netstandard2.0, netstandard2.1 |
| .NET Framework | Compatible: net. Computed: net461, net462, net463, net47, net471, net472, net48, net481 |
| MonoAndroid | Computed: monoandroid |
| MonoMac | Computed: monomac |
| MonoTouch | Computed: monotouch |
| Tizen | Computed: tizen40, tizen60 |
| Xamarin.iOS | Computed: xamarinios |
| Xamarin.Mac | Computed: xamarinmac |
| Xamarin.TVOS | Computed: xamarintvos |
| Xamarin.WatchOS | Computed: xamarinwatchos |
Showing the top 5 NuGet packages that depend on TesseractOCR:
| Package | Description |
|---|---|
| NCPC.Documents | Useful methods to work with Documents |
| Siliscrypt.PdfTools | Package Description |
| BdsCrmOCR | OCR that uses Tesseract JS functionality |
| Siliscrypt.Pdf | Package Description |
| Frank.SemanticKernel.Plugins.Ocr | Package Description |
Showing the top 5 popular GitHub repositories that depend on TesseractOCR:
| Repository | Description |
|---|---|
| hanmin0822/MisakaTranslator | Misaka Translator: a multilingual real-time machine translation tool for galgames, text-based games, and manga |
| umlx5h/LLPlayer | The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more! |
| josdemmers/Diablo4Companion | A companion app and loot filter for Diablo IV to help you find your perfect gear affixes. |
| derekshreds/Snacks | Automated transcoding for your video and music libraries |
| Tentacule/PgsToSrt | PGS to Srt converter |
- Updated to Tesseract 5.5.1
- Updated nuget packages