pytesseract 0.1.7
pip install pytesseract==0.1.7
Released:
Python-tesseract is a Python wrapper for Google's Tesseract-OCR.
Meta
- Maintainer: madmaze
- License: GPLv3
- Author: Matthias Lee
- Tags: python-tesseract, OCR, Python
Classifiers
- Programming Language
Project description
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and "read" the text embedded in images.
Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff, and others, whereas tesseract-ocr by default only supports tiff and bmp. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file. Support for confidence estimates and bounding box data is planned for future releases.
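The stand-alone behaviour described above can be sketched as a small command-line script. This is a minimal illustration, not the script shipped with the package: the function names `parse_args` and `main` and the `-l/--lang` option are assumptions made for this example. The only library calls it relies on are `Image.open` and `pytesseract.image_to_string`.

```python
import argparse


def parse_args(argv):
    """Parse the command line (hypothetical interface for this sketch)."""
    parser = argparse.ArgumentParser(
        description="Print the text recognized in an image.")
    parser.add_argument("image", help="path to any image format PIL can open")
    parser.add_argument("-l", "--lang", default=None,
                        help="tesseract language code, e.g. 'fra'")
    return parser.parse_args(argv)


def main(argv):
    """Open the image with PIL and print the recognized text to stdout."""
    args = parse_args(argv)
    # Imported here so that argument parsing works even without the OCR stack.
    from PIL import Image
    import pytesseract
    print(pytesseract.image_to_string(Image.open(args.image), lang=args.lang))
```

Calling `main(sys.argv[1:])` from a script entry point would then print the text of, say, a gif or png file, since PIL opens the image before it is handed to tesseract.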
USAGE
    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
    # Include the above line, if you don't have tesseract executable in your PATH
    # Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

    print(pytesseract.image_to_string(Image.open('test.png')))
    print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
Add the following config if you get a tessdata error like "Error opening data file...":

    tessdata_dir_config = '--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
    # Example config: '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    # It's important to add double quotes around the dir path.

    pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
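Because the double quotes around the directory are easy to forget, the config string can be built by a small helper. The sketch below is only an illustration: `tessdata_config` is a hypothetical name, not part of pytesseract, and the paths in the comments are examples.

```python
def tessdata_config(tessdata_dir):
    """Return a --tessdata-dir option with the path wrapped in double quotes.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    if '"' in tessdata_dir:
        # A quote inside the path would break the quoting added below.
        raise ValueError("tessdata path must not contain double quotes")
    return '--tessdata-dir "{}"'.format(tessdata_dir)


# Example use (assumes `image` is an already opened PIL image):
# pytesseract.image_to_string(image, lang='chi_sim',
#                             config=tessdata_config('/usr/share/tessdata'))
```

Keeping the quoting in one place means paths that contain spaces, such as those under `C:\Program Files (x86)`, are passed to tesseract as a single argument.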
INSTALLATION
Prerequisites:
- Python-tesseract requires Python 2.5+ or Python 3.x.
- You will need the Python Imaging Library (PIL) or the Pillow fork. Under Debian/Ubuntu, this is the package python-imaging or python3-imaging.
- Install Google Tesseract OCR (additional info on how to install the engine on Linux, Mac OS X, and Windows). You must be able to invoke the tesseract command as tesseract. If this isn't the case, for example because tesseract isn't in your PATH, you will have to change the "tesseract_cmd" variable at the top of tesseract.py. Under Debian/Ubuntu you can use the package tesseract-ocr. Mac OS users can install the Homebrew package tesseract.
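One way to check the last prerequisite before running any OCR is to look the command up on PATH. This is a minimal sketch under stated assumptions: `find_tesseract` is a hypothetical helper, and it uses `shutil.which`, which is only available on Python 3.3 and later.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; requires Python 3.3+ for shutil.which.
    """
    return shutil.which(command)


# Example use:
# path = find_tesseract()
# if path is None:
#     print("tesseract is not on PATH; set tesseract_cmd to its full path")
# else:
#     print("tesseract found at", path)
```

If the lookup returns None, pointing `tesseract_cmd` at the full path of the executable is the remedy the prerequisites describe.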
Installing via pip:

    $ (env)> pip install pytesseract

Installing from source:

    $> git clone git@github.com:madmaze/pytesseract.git
    $ (env)> python setup.py install
LICENSE
Python-tesseract is released under the GPL v3.
CONTRIBUTORS
Originally written by Samuel Hoffstaetter
Download files
Source Distribution
File details
Details for the file pytesseract-0.1.7.tar.gz.
File metadata
- Download URL: pytesseract-0.1.7.tar.gz
- Upload date:
- Size: 150.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cf5f088349ec29ac4770658a5c150b48f93cbdbdc7405a51d778ee7e81a1426 |
| MD5 | 764bb4a0c7e9f85c0ee27beffb65065a |
| BLAKE2b-256 | 292f7b73fa55f3c36160d54ec08b6bdddc1d8385fe0aa1acf8dac3ef1e635516 |
