
URL: https://www.analyticsvidhya.com/blog/2022/09/extract-text-from-images-quickly-using-keras-ocr-pipeline/


Extract Text from Images Quickly Using Keras-OCR Pipeline

Devashree Last Updated : 08 Nov, 2024

Although plenty of digital information is available to businesses, employees still have to handle printed invoices, flyers, brochures, and forms, either as hard copies or as images saved in .jpg, .png, or .pdf formats. Handling the data in these files manually is tedious, time-consuming, and error-prone. Such files cannot be edited directly, so they must first be made editable, or a tool must read the content from the image and extract it for further processing. Most of us have used online or offline tools that convert images to editable text to make things easier. This is possible using OCR, or Optical Character Recognition.

This article was published as a part of the Data Science Blogathon.

What is OCR?

The acronym ‘OCR’ stands for Optical Character Recognition. Commonly known as ‘text recognition’, it is a popular technique for extracting text from images. An OCR program extracts and repurposes data from scanned documents, camera images, and image-only PDFs. An OCR system combines hardware, such as optical scanners, with software capable of image processing. To extract text, OCR tools (OCR libraries) employ several machine learning algorithms for pattern recognition to identify the presence and layout of text in an image file.

These tools are trained to identify the shapes of characters and digits in an image in order to recognize its text. They then reconstruct the extracted text in a machine-readable format, so that it can be selected, edited, or copy-pasted like regular text. Put simply, OCR converts text stored in image form into editable, machine-readable text. Many free and commercial tools (offline and online) use OCR technology to extract text from images.

Currently, OCR tools are pretty advanced due to the implementation of techniques such as intelligent character recognition (ICR), which can identify languages, handwriting styles, etc.

In this article, we will discuss OCR, the benefits of OCR, why we need text extraction from documents, OCR libraries available in Python, and an example of text extraction from an image using the Keras-OCR library in Python.

Why do we need to extract Text from Images?

As mentioned in the section above, the primary benefit of OCR technology is that it automates manual, time-consuming data entry tasks: with OCR, we can create digital documents that can be edited and stored as required. An OCR tool processes the image to identify the text and creates a hidden layer of text behind the image. A computer can read this additional layer easily, which makes the content of the image recognizable and searchable. This is crucial for businesses, which deal with media and content daily. OCR also offers the following benefits:

  • Automated, faster processing and conversion of paper-based documents into digital formats that accelerate workflows
  • It saves time and reduces the scope of manual errors
  • Eliminates the requirement for manual data entry
  • Less manual data entry means lower overall costs for the business
  • It saves paper and storage space as more data can be converted to electronic format

A typical example of an OCR application can be seen in medical insurance claim form processing. With OCR, it is easier to compare the insurance claim with the policyholder’s details. OCR-equipped systems can flag any anomalies in the data to the concerned teams and prevent possible fraud.

Even though OCR can extract text from many images easily, it still faces challenges. These arise when the text appears in photographs of natural scenes, is geometrically distorted, sits on a noisy, cluttered, or complex background, or uses unusual fonts. Still, combined with deep learning, OCR technology has increasingly strong potential for building tools that read vehicle license plates, digitize invoices or menus, scan ID cards, compare claim forms, and so on.

Available Python OCR Libraries

Now that we have understood OCR and its uses, let us look at some commonly used open-source Python libraries for text recognition and extraction.

  1. Pytesseract – Also called ‘Python-tesseract’, it is an OCR tool for Python that works as a wrapper for the Tesseract-OCR Engine. It can read all image types supported by the Pillow and Leptonica imaging libraries (.jpeg, .png, .gif, .bmp, .tiff, etc.) and recognize the text in them. Hence, it is commonly used for OCR image-to-text conversion.
  2. EasyOCR – Another popular Python library is EasyOCR. As the name suggests, it is designed to be easy to use, even for beginners. It is a general-purpose OCR module that supports more than 80 languages and can read both natural scene text and dense text in documents. Once it is installed, users need only two calls: one to create a Reader object and one to its readtext() method, which reads the text in the image.
  3. Keras-OCR – This open-source library is as powerful as the two mentioned above. It provides a high-level API and an end-to-end training pipeline for building new OCR models. In the next section, we will walk through a step-by-step tutorial that uses Keras-OCR to extract text from multiple images. The documentation is linked from the project's GitHub repository (github.com/faustomorales/keras-ocr).
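For comparison, the first two libraries in the list can be called roughly as shown in the sketch below. It assumes that pytesseract, Pillow, and easyocr have been installed with pip and that the Tesseract engine itself is present on the system. The function names and the image path are hypothetical and chosen only for illustration; the imports sit inside the functions so that the sketch can be loaded even when a library is missing.

```python
def read_with_pytesseract(image_path):
    """Return the text that Tesseract finds in an image file."""
    import pytesseract      # assumption: installed via `pip install pytesseract`
    from PIL import Image   # assumption: installed via `pip install pillow`

    return pytesseract.image_to_string(Image.open(image_path))


def read_with_easyocr(image_path, languages=("en",)):
    """Return the list of text fragments that EasyOCR finds in an image file."""
    import easyocr          # assumption: installed via `pip install easyocr`

    # Creating the Reader loads the detection and recognition models
    reader = easyocr.Reader(list(languages))
    # readtext() yields (bounding_box, text, confidence) tuples
    return [text for _box, text, _confidence in reader.readtext(image_path)]


# Example usage (hypothetical image path):
# print(read_with_pytesseract("/content/Image1.png"))
# print(read_with_easyocr("/content/Image1.png"))
```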

Keras OCR Demo

In this section, we will build a Keras-OCR pipeline to extract text from a few sample images. I am using Google Colab for this tutorial.

Let’s begin by installing the keras-ocr library (supports Python >= 3.6 and TensorFlow >= 2.0.0) with the following command:

!pip install -q keras-ocr

You can also install the latest development version directly from the master branch of the GitHub repository with the following command (in a Colab cell, prefix it with !).

pip install git+https://github.com/faustomorales/keras-ocr.git#egg=keras-ocr

We import the newly installed keras-ocr library to process the images and extract text from them, and matplotlib to plot the results.

import keras_ocr
import matplotlib.pyplot as plt

Let’s set up a pipeline with keras-ocr. The pipeline bundles a text detector (CRAFT) and a text recognizer (CRNN) and loads pre-trained weights for both; the weights are downloaded automatically the first time the pipeline is created.

pipeline = keras_ocr.pipeline.Pipeline()

We will use two images to test the capabilities of the Keras-ocr library. You can try the same with any other image with text of your choice.

# Read each image file into an image array
image_paths = [
    '/content/Image1.png',
    '/content/Image2.png',
]
images = [keras_ocr.tools.read(path) for path in image_paths]

Here are the two images we used for this tutorial on the keras-ocr library. One is a plain image with text in a handwriting-style font, and the other is an image that contains text.

Now, let us run the pipeline recognizer on images and make predictions about the text in these images.

# generate text predictions from the images
prediction_groups = pipeline.recognize(images)
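The result, prediction_groups, holds one list per input image, and each list contains (word, box) tuples in which box gives the four corner points of the detected word. The sketch below shows one way to inspect that structure by converting each box into a simple bounding rectangle. The sample data is a hypothetical stand-in for real pipeline output, and the helper names box_to_rect and summarize are illustrative, not part of keras-ocr.

```python
def box_to_rect(box):
    """Convert four (x, y) corner points into (x_min, y_min, x_max, y_max)."""
    xs = [float(point[0]) for point in box]
    ys = [float(point[1]) for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def summarize(prediction_groups):
    """Return one list of (word, rect) pairs per image."""
    return [
        [(word, box_to_rect(box)) for word, box in predictions]
        for predictions in prediction_groups
    ]


# Hypothetical stand-in for the output of pipeline.recognize(images)
sample_groups = [
    [("hello", [[10, 5], [60, 5], [60, 25], [10, 25]])],
    [
        ("keras", [[0, 0], [40, 2], [40, 20], [0, 18]]),
        ("ocr", [[50, 0], [80, 0], [80, 20], [50, 20]]),
    ],
]

for index, words in enumerate(summarize(sample_groups)):
    print(f"image {index}: {len(words)} word(s)")
    for word, rect in words:
        print(f"  {word!r} at {rect}")
```

The helpers only index into each corner point, so they should work equally well on the NumPy arrays that the real pipeline returns.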

We can plot the predictions from the model using the following code:

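# Plot the text predictions, one subplot per image
# squeeze=False keeps axs as an array even when there is only one image
fig, axs = plt.subplots(nrows=len(images), squeeze=False, figsize=(10, 20))
for ax, image, predictions in zip(axs.ravel(), images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image,
                                    predictions=predictions,
                                    ax=ax)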

We get the predicted output as:

The Keras-OCR library performed well on both images. It correctly identified the location of the text and extracted the words from the input images.

We can also print the text identified in the images as follows:

# Print the words recognized in the second image (index 1)
predicted_image = prediction_groups[1]
for text, box in predicted_image:
    print(text)
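The words are printed in the order in which the pipeline returns them, which is not necessarily the order in which a person would read them. A minimal sketch of one way to restore reading order is shown below: it groups words into lines by their vertical position and then sorts each line from left to right. The function to_reading_order, its line_tolerance parameter, and the sample data are assumptions made for illustration and are not part of keras-ocr; the approach also assumes roughly horizontal text.

```python
def to_reading_order(predictions, line_tolerance=0.5):
    """Group (word, box) pairs into lines of text, top to bottom, left to right.

    Two words share a line when their vertical centers differ by at most
    line_tolerance times the larger of the two word heights.
    """
    items = []
    for word, box in predictions:
        xs = [float(point[0]) for point in box]
        ys = [float(point[1]) for point in box]
        center_y = (min(ys) + max(ys)) / 2
        items.append((center_y, max(ys) - min(ys), min(xs), word))
    items.sort(key=lambda item: item[0])  # top to bottom

    lines = []
    for center_y, height, left, word in items:
        if lines:
            current = lines[-1]
            limit = line_tolerance * max(height, current["height"])
            if abs(center_y - current["center_y"]) <= limit:
                current["words"].append((left, word))
                continue
        # Start a new line anchored on this word
        lines.append({"center_y": center_y, "height": height,
                      "words": [(left, word)]})

    # Sort the words of each line from left to right and join them
    return [" ".join(word for _left, word in sorted(line["words"]))
            for line in lines]


# Hypothetical stand-in for one entry of prediction_groups
sample_predictions = [
    ("world", [[70, 10], [120, 10], [120, 30], [70, 30]]),
    ("again", [[10, 50], [60, 50], [60, 70], [10, 70]]),
    ("hello", [[10, 12], [60, 12], [60, 32], [10, 32]]),
]

for line in to_reading_order(sample_predictions):
    print(line)
```

For skewed or multi-column layouts, a more careful grouping strategy would be needed.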

If required, the text recognized in these images can be saved in .csv or .txt format for further use.
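One possible way to do this with only the Python standard library is sketched below. The helper names save_as_txt and save_as_csv, the output file names, and the CSV column layout are assumptions chosen for illustration, and the sample data stands in for the real prediction_groups returned by the pipeline.

```python
import csv
import tempfile
from pathlib import Path


def save_as_txt(prediction_groups, path):
    """Write one line per image with its recognized words separated by spaces."""
    lines = [" ".join(word for word, _box in predictions)
             for predictions in prediction_groups]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def save_as_csv(prediction_groups, path):
    """Write one row per word with the image index and the bounding rectangle."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image_index", "word",
                         "x_min", "y_min", "x_max", "y_max"])
        for image_index, predictions in enumerate(prediction_groups):
            for word, box in predictions:
                xs = [float(point[0]) for point in box]
                ys = [float(point[1]) for point in box]
                writer.writerow([image_index, word,
                                 min(xs), min(ys), max(xs), max(ys)])


# Hypothetical stand-in for the output of pipeline.recognize(images)
sample_groups = [
    [("hello", [[10, 5], [60, 5], [60, 25], [10, 25]]),
     ("world", [[70, 5], [120, 5], [120, 25], [70, 25]])],
    [("keras", [[0, 0], [40, 0], [40, 20], [0, 20]]),
     ("ocr", [[50, 0], [80, 0], [80, 20], [50, 20]])],
]

# Write into a temporary folder; replace it with any folder of your choice
output_dir = Path(tempfile.mkdtemp())
save_as_txt(sample_groups, output_dir / "ocr_output.txt")
save_as_csv(sample_groups, output_dir / "ocr_output.csv")
print((output_dir / "ocr_output.txt").read_text(encoding="utf-8"))
```

Keeping the box coordinates in the CSV file makes it possible to map each word back to its position in the image later.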

Conclusion

In this tutorial, we discussed OCR, its advantages to businesses for image processing, and different open-source OCR libraries in Python. Next, we learned how to extract text from multiple images using the Keras-OCR library. Here are a few key takeaways from the article:

  • OCR has made it easier to process images with text and convert them to editable documents.
  • It can reduce manual data entry work, accelerating business workflows.
  • Several open-source and commercial tools employ OCR technology to process images and documents faster.
  • Keras-OCR extracts text with just a few lines of code and performed well on the sample images in this tutorial. It is a good option for open-source image text extraction projects.

That’s it for this tutorial. Try the keras-ocr library to see how accurately it can identify the text in your images.

I hope you enjoyed reading this article and learned about Keras-ocr. The code for this text extraction tutorial is available on my GitHub repository.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Devashree has an M.Eng degree in Information Technology from Germany and a Data Science background. As an Engineer, she enjoys working with numbers and uncovering hidden insights in diverse datasets from different sectors to build beautiful visualizations to try and solve interesting real-world machine learning problems.

In her spare time, she loves to cook, read & write, discover new Python-Machine Learning libraries or participate in coding competitions.
