URL: https://dzone.com/articles/google-cloud-document-ai-basics

Google Cloud Document AI Basics

This simple example shows how to use a custom extractor in Google's Doc AI to process W-2s and use a PDF as part of the context to Gemini.

By Apr. 30, 25 · Tutorial

Google Cloud’s Document AI (Doc AI) helps organizations automate the processing, extraction, and classification of large volumes of documents.

Doc AI has many capabilities and use cases. Here are a few ways it can help organizations. They’re tailored toward the public sector, since those are the customers I work with; however, the same use cases also apply to private companies.

Doc AI Example Use Cases

Processing Applications

  • Automating the extraction of key data from applications such as services/benefits, driver’s licenses, and building permits.

Tax Document Processing

  • Extracting information from tax forms (W-2s, 1040s, etc.) for faster processing and auditing. We’ll focus on this example.

Healthcare Administration

  • Processing medical documents, such as medical records and insurance claims, for faster payment.

Unemployment

  • Streamlining the collection of supporting documents, speeding up adjudication, and reducing the time it takes to process benefits.

Let’s Get Started!

In this blog post, we’ll review how to create a custom document extractor for W-2 forms, use the Doc AI API to extract information from a document, and pass the W-2 PDF to Gemini to summarize the document.

Create a Custom Processor

Rather than going over the steps to create a custom extractor in this blog post, you can reference the Document AI Workbench — Custom Document Extractor Google codelab. The codelab does an excellent job of showing you, step by step, how to easily create, train, test, validate, and deploy a custom processor using the Doc AI Workbench without writing any code.

Here’s what one of the W-2s looks like after you’ve labeled it in Doc AI Workbench. A custom extractor offers three training methods; I chose the one that uses Gemini 1.5 Flash. The Gen AI training method requires about 50 documents for the best results. You can learn more about the training methods here.

Labeled W-2

You can view evaluation metrics and upload a document to test as well.

Evaluation metrics

Application Overview

Our application is very simple. You upload a W-2 PDF, Doc AI extracts the key items, Gemini 2.0 Flash summarizes the PDF, and the results are displayed as shown below. Rather than going through the entire application, I’ll just show the code for document extraction and for summarization with Gemini 2.0 Flash. I plan on sharing the entire code on GitHub soon.
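The upload, extract, and summarize flow described above can be sketched as a small orchestration function. This is an illustration, not the application's actual code: `handle_upload`, `extract`, and `summarize` are hypothetical names, and the two callables stand in for whatever functions wrap Doc AI and Gemini. The sketch reads the upload once and hands each step its own in-memory stream, because a file object that has already been read to the end returns no data on a second `read()`.

```python
import io
from typing import BinaryIO, Callable, Dict


def handle_upload(
    uploaded: BinaryIO,
    extract: Callable[[BinaryIO], Dict[str, str]],
    summarize: Callable[[BinaryIO], str],
) -> dict:
    """Run extraction and summarization on one uploaded PDF.

    `extract` and `summarize` are hypothetical stand-ins for the Doc AI
    and Gemini steps; each receives a fresh stream over the same bytes.
    """
    pdf_bytes = uploaded.read()  # read the upload exactly once
    if not pdf_bytes:
        raise ValueError("Uploaded file is empty")

    fields = extract(io.BytesIO(pdf_bytes))
    summary = summarize(io.BytesIO(pdf_bytes))
    return {"fields": fields, "summary": summary}


if __name__ == "__main__":
    # Stub steps so the sketch runs without any cloud credentials.
    fake_pdf = io.BytesIO(b"%PDF-1.4 sample bytes")
    result = handle_upload(
        fake_pdf,
        extract=lambda f: {"size": str(len(f.read()))},
        summarize=lambda f: f"{len(f.read())} bytes received",
    )
    print(result)
```

Passing the two steps in as arguments keeps the sketch runnable with stubs; in a real application they would be replaced by functions that call the two services.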


Here’s the sample W-2 we’ll upload.

W-2 First Page

W-2 Second Page

Doc AI Code

Here’s the code for Doc AI and an explanation of what it does.

Python
from google.cloud import documentai
import os

def process_document(file):
    try:
        # Initialize Document AI client
        client = documentai.DocumentProcessorServiceClient()

        # Configure processor path
        LOCATION = 'us'  # Format is 'us' or 'eu'
        PROJECT_ID = os.getenv('PROJECT_ID')
        PROCESSOR_ID = os.getenv('PROCESSOR_ID')

        if not PROJECT_ID or not PROCESSOR_ID:
            raise ValueError("PROJECT_ID and PROCESSOR_ID must be set in .env file")

        PROCESSOR_PATH = f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/{PROCESSOR_ID}"
        print(f"Using processor path: {PROCESSOR_PATH}")

        # Read file content
        file_content = file.read()
        print(f"Read file content, size: {len(file_content)} bytes")

        # Configure the process request
        raw_document = documentai.RawDocument(
            content=file_content,
            mime_type="application/pdf"
        )

        # Process the document
        request = documentai.ProcessRequest(
            name=PROCESSOR_PATH,
            raw_document=raw_document
        )

        print("Sending request to Doc AI...")
        result = client.process_document(request=request)
        print("Received response from Doc AI")

        document = result.document

        # Extract entities from the processed document
        extracted_data = {}
        for entity in document.entities:
            extracted_data[entity.type_] = entity.mention_text

        print(f"Extracted {len(extracted_data)} entities")
        return extracted_data

    except Exception as e:
        print(f"Error in process_document: {str(e)}")
        raise


  1. Import libraries: Import the Doc AI client library and os.
  2. Doc AI processor: Build the processor path from the PROJECT_ID and PROCESSOR_ID environment variables. The processor ID comes from the processor you created in the workbench.
  3. Read and configure file: Read the file into the file_content variable, then wrap those bytes in raw_document with the application/pdf MIME type so that Doc AI can process them.
  4. Process document: Send the request to Doc AI and save the returned document to the document variable.
  5. Extract key data: The extracted_data variable is a dictionary that maps each entity type found in the document to its extracted text. The function returns it.
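Step 2 builds the processor resource name by hand. A minimal sketch of that step as two pure helpers follows; `processor_name` and `regional_endpoint` are hypothetical names, not library functions, and the IDs in the demo are placeholders. The host pattern `{location}-documentai.googleapis.com` is the one Google's Python samples pass through `client_options`, and it matters mainly when the processor lives outside `us`.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name that ProcessRequest expects in `name`."""
    if not (project_id and location and processor_id):
        raise ValueError("project_id, location, and processor_id are all required")
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def regional_endpoint(location: str) -> str:
    """Return the Document AI API host for a multi-region such as 'us' or 'eu'."""
    return f"{location}-documentai.googleapis.com"


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(processor_name("my-project", "eu", "abc123"))
    print(regional_endpoint("eu"))

    # With the client library installed, the endpoint would be passed like this:
    #   from google.api_core.client_options import ClientOptions
    #   from google.cloud import documentai
    #   client = documentai.DocumentProcessorServiceClient(
    #       client_options=ClientOptions(api_endpoint=regional_endpoint("eu"))
    #   )
```

The client also provides a `processor_path(project, location, processor)` helper that produces the same resource name, which avoids formatting the string manually.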

Here’s the final output.

Doc AI Output
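The dictionary returned by process_document keeps one string per entity type, so if the processor returns the same type more than once, only the last occurrence survives. A sketch of an alternative that keeps every occurrence along with its confidence score follows. `collect_entities` and `min_confidence` are hypothetical; the code relies only on the `type_`, `mention_text`, and `confidence` fields of each entity, and the demo uses `SimpleNamespace` stand-ins with made-up values rather than a real Doc AI response.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List, Tuple


def collect_entities(
    entities: Iterable, min_confidence: float = 0.0
) -> Dict[str, List[Tuple[str, float]]]:
    """Group entities by type, keeping (text, confidence) for every occurrence."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for entity in entities:
        if entity.confidence < min_confidence:
            continue  # skip low-confidence extractions
        grouped[entity.type_].append((entity.mention_text, entity.confidence))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up values standing in for document.entities.
    sample = [
        SimpleNamespace(type_="employer_name", mention_text="Acme Corp", confidence=0.98),
        SimpleNamespace(type_="state", mention_text="CA", confidence=0.91),
        SimpleNamespace(type_="state", mention_text="NV", confidence=0.42),
    ]
    print(collect_entities(sample, min_confidence=0.5))
```

A confidence threshold like this is one way to route uncertain fields to human review instead of accepting them automatically; the right cutoff depends on the processor's evaluation metrics.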

Summarize PDF Using Gemini

I’m using the Gemini 2.0 Flash model to create a summary of the W-2.

Python
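import google.generativeai as genai
import os

def get_summary(file):
    # Configure the SDK with the API key from the environment
    api_key = os.getenv('GEMINI_API_KEY')
    genai.configure(api_key=api_key)

    # Upload the PDF. "PDF Path" is a placeholder for the location of the
    # W-2 on disk; the file argument is not used in this snippet.
    sample_pdf = genai.upload_file(path="PDF Path", display_name="file")

    model = genai.GenerativeModel(model_name="gemini-2.0-flash")

    # Send the uploaded PDF and the text prompt together
    response = model.generate_content(
        contents=[sample_pdf, "Give me a summary of this pdf file."]
    )
    print(response.text)

    return response.text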


The code is really simple. One of the things I love about Gemini 2.0 is that you can give it a PDF or a TXT directly in the prompt request or even provide multimodal prompts. There’s no need for me to build RAG or do other preprocessing. Simply put the PDF inside the model.generate_content prompt request as shown in the code above.
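To make the idea of giving Gemini a PDF directly in the prompt concrete, here is a sketch that builds the `contents` list from in-memory bytes instead of an uploaded file. `build_pdf_prompt` is a hypothetical helper. It assumes the `google-generativeai` SDK accepts a dictionary with `mime_type` and `data` keys as an inline part; inline data counts against the request size limit, so `upload_file` is likely the safer route for large PDFs. The header check is only a light sanity test, not full PDF validation.

```python
from typing import List, Union


def build_pdf_prompt(pdf_bytes: bytes, instruction: str) -> List[Union[dict, str]]:
    """Return a contents list with the PDF as an inline part plus a text prompt."""
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("Expected PDF data (missing %PDF header)")
    pdf_part = {"mime_type": "application/pdf", "data": pdf_bytes}
    return [pdf_part, instruction]


if __name__ == "__main__":
    contents = build_pdf_prompt(b"%PDF-1.4 ...", "Give me a summary of this pdf file.")
    print(contents[0]["mime_type"], len(contents))

    # With the SDK configured as in get_summary, the list would be passed as:
    #   model = genai.GenerativeModel(model_name="gemini-2.0-flash")
    #   response = model.generate_content(contents=contents)
```

Building the list separately from the API call makes it possible to reuse the bytes already read for Doc AI, without writing the upload to disk first.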

Here are the results from Gemini 2.0 Flash.

Gemini Summarization

