
URL: https://apify.com/calm_necessity/invoice-data-extractor

OCR Data Extractor – Invoice & Receipt OCR API [DEPRECATED] · Apify


πŸ‘ Invoice Data Extractor avatar

Invoice Data Extractor

Deprecated

Pricing

from $40.00 / 1,000 results


AI-powered Actor for extracting structured data from invoices, receipts, and documents. Upload an image to receive clean, structured data including vendor details, invoice numbers, line items, totals, and other key fields.


Rating: 0.0 (0)

Developer: Taher Ali Badnawarwala

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 2
  • Last modified: 4 months ago


OCR Data Extractor Actor

An Apify Actor that extracts structured data from invoices, receipts, and documents using AI-powered OCR technology. Simply upload an image or provide a URL, and the Actor will extract key information like vendor details, invoice numbers, line items, and totals.

What This Tool Does

This Actor connects to the MultipleWords OCR API to extract structured data from document images. It accepts an image file or an image URL, sends the document to the API for OCR and field extraction, and returns structured data including vendor information, customer details, billing information, and line items.

Key Features:

  • 📄 Extract data from invoices, receipts, and documents
  • 🚀 Fast, automated document processing
  • 📦 Structured output with all key fields extracted
  • 🔄 Reliable error handling and validation
  • 📊 Full extraction details with an overall confidence level
  • 🖼️ Multiple input methods (file upload or image URL)

Purpose & Use Cases

This tool is designed to help businesses, accountants, and developers automate document data extraction:

Accounting & Finance

  • Automate invoice data entry into accounting systems
  • Extract receipt information for expense tracking
  • Process bulk invoices for payment processing
  • Digitize paper documents for archival

Business Operations

  • Streamline accounts payable workflows
  • Automate vendor information extraction
  • Process purchase orders and quotes
  • Extract data from shipping documents

E-commerce & Retail

  • Process supplier invoices automatically
  • Extract product details from purchase orders
  • Automate inventory documentation
  • Handle customer receipt processing

Development & Automation

  • Integrate OCR into automated workflows
  • Batch process documents programmatically
  • Create document processing pipelines
  • Build custom accounting integrations

Document Management

  • Digitize paper document archives
  • Extract searchable data from scanned documents
  • Automate document classification and filing
  • Create structured databases from unstructured documents

Input Parameters

The Actor accepts the following input:

file (Optional)

  • Type: String or File Upload
  • Description: Upload an image file (invoice, receipt, document) to extract data from. You can also provide a file path, URL, or base64 string.
  • Supported Formats: JPG, PNG, PDF (image-based)
  • Example: Upload via file picker in Apify Console
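Because the file input accepts only JPG, PNG, and image-based PDF, a client can reject other formats before starting a run. The following is a minimal, hypothetical pre-check (not part of the Actor) that compares the first bytes of a Buffer or base64 string against the standard JPEG, PNG, and PDF file signatures. The name detectFormat is illustrative, and the check cannot tell whether a PDF is image-based or text-based.

```javascript
// Hypothetical client-side pre-check; not part of the Actor itself.
// Standard file signatures ("magic numbers") for the supported formats.
const SIGNATURES = [
  { format: "jpg", bytes: [0xff, 0xd8, 0xff] },
  { format: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { format: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
];

// Returns "jpg", "png", "pdf", or null for an unknown/unsupported format.
// Accepts a Buffer/Uint8Array or a base64 string (optionally a data: URL).
function detectFormat(input) {
  let bytes = input;
  if (typeof input === "string") {
    // Strip a "data:image/png;base64," style prefix if present.
    // Plain base64 never contains a comma, so this only affects data: URLs.
    const base64 = input.includes(",") ? input.slice(input.indexOf(",") + 1) : input;
    bytes = Buffer.from(base64, "base64");
  }
  for (const { format, bytes: sig } of SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return format;
    }
  }
  return null;
}
```

A GIF or WebP file would return null here, so the caller can stop before spending a paid run on an unsupported upload.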

image_url (Optional)

  • Type: String
  • Description: URL of the image to process (alternative to file upload)
  • Example: "https://example.com/invoice.jpg"

Note: Either file or image_url must be provided. The user_id and isPro parameters are handled automatically with default values.
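The input rule above can be enforced on the caller's side before a run is started. This is a sketch with a hypothetical helper name, buildActorInput. It accepts file as a string (a path, URL, or base64 value, as described above) and leaves user_id and isPro unset so the Actor applies its own defaults. What the Actor does when both fields are supplied is not documented, so the helper simply passes both through.

```javascript
// Hypothetical helper: builds Actor input and enforces the documented rule
// that at least one of `file` or `image_url` must be provided.
function buildActorInput({ file, imageUrl } = {}) {
  const hasFile = typeof file === "string" && file.trim() !== "";
  const hasUrl = typeof imageUrl === "string" && imageUrl.trim() !== "";
  if (!hasFile && !hasUrl) {
    throw new Error("Either file or image_url must be provided.");
  }

  const input = {};
  if (hasFile) input.file = file;
  if (hasUrl) {
    // new URL() throws a TypeError for malformed URLs.
    const url = new URL(imageUrl);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`Unsupported URL protocol: ${url.protocol}`);
    }
    input.image_url = url.href;
  }
  // user_id and isPro are deliberately omitted: the Actor fills in defaults.
  return input;
}
```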

Output Structure

The Actor returns structured data containing the extracted document information:

{
  "status": 1,
  "vendor_company": "Acme Corporation",
  "vendor_email": "billing@acme.com",
  "customer_name": "John Smith",
  "customer_email": "john.smith@email.com",
  "invoice_number": "INV-2024-001",
  "issue_date": "2024-01-15",
  "due_date": "2024-02-15",
  "subtotal": "1000.00",
  "total_tax": "100.00",
  "grand_total": "1100.00",
  "currency": "$",
  "line_items_count": 5,
  "document_type": "invoice",
  "extraction_confidence": "high",
  "line_items": [
    {
      "description": "Product A",
      "quantity": "2",
      "unit_price": "250.00",
      "amount": "500.00"
    }
  ],
  "full_details": {
    "vendor_information": { ... },
    "customer_information": { ... },
    "billing_details": { ... },
    "totals_and_taxes": { ... }
  }
}

Output Fields Explained

  • status: Success indicator (1 = success)
  • vendor_company: Name of the vendor/seller company
  • vendor_email: Vendor's email address
  • customer_name: Customer/buyer name
  • customer_email: Customer's email address
  • invoice_number: Invoice or receipt number
  • issue_date: Date when the invoice was issued
  • due_date: Payment due date
  • subtotal: Subtotal amount before tax
  • total_tax: Total tax amount
  • grand_total: Final total amount
  • currency: Currency symbol or code
  • line_items_count: Number of line items/products
  • document_type: Type of document (invoice, receipt, etc.)
  • extraction_confidence: Overall confidence level of extraction
  • line_items: Array of individual items with quantities and prices
  • full_details: Complete raw data from the OCR API
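The monetary fields in the output example are strings, and OCR output is not guaranteed to be internally consistent. A downstream consumer might normalize each dataset item roughly as sketched below. The names normalizeInvoice and toCents are hypothetical, and the parser assumes "." is the decimal separator: an amount written as "1.100,00" would be misparsed.

```javascript
// Hypothetical post-processing of one dataset item. Amounts are parsed into
// integer cents so that totals can be compared without float rounding errors.
function toCents(value) {
  if (value === undefined || value === null || value === "") return null;
  // Drop currency symbols and thousands separators: "$1,100.00" -> "1100.00".
  // ASSUMPTION: "." is the decimal separator.
  const cleaned = String(value).replace(/[^0-9.-]/g, "");
  const n = Number(cleaned);
  return cleaned !== "" && Number.isFinite(n) ? Math.round(n * 100) : null;
}

function normalizeInvoice(item) {
  const subtotal = toCents(item.subtotal);
  const tax = toCents(item.total_tax);
  const grandTotal = toCents(item.grand_total);
  const lineItems = (item.line_items ?? []).map((li) => ({
    description: li.description,
    quantity: Number(li.quantity),
    unitPriceCents: toCents(li.unit_price),
    amountCents: toCents(li.amount),
  }));
  const lineSum = lineItems.reduce((sum, li) => sum + (li.amountCents ?? 0), 0);

  return {
    ok: item.status === 1,
    invoiceNumber: item.invoice_number,
    currency: item.currency,
    subtotal,
    tax,
    grandTotal,
    lineItems,
    // Sanity checks that flag documents worth a manual review.
    totalsConsistent:
      subtotal !== null && tax !== null && grandTotal !== null &&
      subtotal + tax === grandTotal,
    lineItemsMatchSubtotal: subtotal !== null && lineSum === subtotal,
  };
}
```

For the sample output above, totalsConsistent is true (1000.00 + 100.00 = 1100.00), while lineItemsMatchSubtotal is false because the sample lists only one of its five line items.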

How to Use

Running Locally

  1. Install dependencies:

    $ npm install
  2. Create input file at storage/key_value_stores/default/INPUT.json:

    {
      "image_url": "https://example.com/invoice.jpg"
    }
  3. Run the Actor:

    $ apify run
  4. View results in storage/datasets/default/
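To inspect local results from a script instead of opening the files by hand, something like the following could be used. It is a sketch under an assumption about the local storage layout: each dataset item is taken to be stored as its own zero-padded JSON file (for example 000000001.json), with bookkeeping files whose names start with "__" (such as __metadata__.json). readLocalDataset is a hypothetical name.

```javascript
// Hypothetical helper: loads the items a local `apify run` wrote to
// storage/datasets/default/. ASSUMPTION: one zero-padded JSON file per item,
// plus metadata files whose names start with "__".
const fs = require("node:fs");
const path = require("node:path");

function readLocalDataset(projectDir) {
  const dir = path.join(projectDir, "storage", "datasets", "default");
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json") && !name.startsWith("__"))
    .sort() // zero-padded names sort in insertion order
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Usage from the project root after `apify run`:
// console.log(readLocalDataset(process.cwd()));
```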

Deploy to Apify Platform

  1. Log in to Apify:

    $ apify login
  2. Deploy the Actor:

    $ apify push
  3. Run on Apify Console:

    • Go to Actors → My Actors
    • Select your OCR Data Extractor Actor
    • Upload a document image or provide a URL
    • Click "Start" to extract data
    • View results in the Dataset tab

Using via API

Once deployed, you can call the Actor via Apify API:

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~ocr-data-extractor/run-sync" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/invoice.jpg"
  }'
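The same call can be made from Node.js 18+ with the built-in fetch. This sketch makes one substitution: it targets the Apify run-sync-get-dataset-items endpoint, which waits for the run and returns the items of its default dataset (where the Console instructions above say results appear). YOUR_USERNAME and the Actor name remain placeholders, and buildRunRequest and extractInvoice are hypothetical helper names.

```javascript
// Builds the HTTP request for a synchronous Actor run that returns the
// default dataset's items. Apify API paths address Actors as "user~name".
function buildRunRequest(actorId, token, input) {
  const id = actorId.replace("/", "~");
  return {
    url: `https://api.apify.com/v2/acts/${id}/run-sync-get-dataset-items`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(input),
    },
  };
}

// Runs the Actor and resolves to an array of dataset items.
async function extractInvoice(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Apify API error ${res.status}: ${await res.text()}`);
  }
  return res.json();
}

// Usage (token read from an environment variable):
// extractInvoice("YOUR_USERNAME~ocr-data-extractor", process.env.APIFY_TOKEN,
//   { image_url: "https://example.com/invoice.jpg" }).then(console.log);
```

Synchronous run endpoints wait only a limited time, so long-running jobs are better started asynchronously and polled for completion.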

Integration Examples

With Make (Integromat)

  • Connect the Actor to your Make workflows
  • Automatically extract data when invoices are received via email
  • Send extracted data to accounting software or spreadsheets

With Zapier

  • Trigger OCR extraction from file uploads
  • Automatically add extracted data to Google Sheets or Airtable
  • Send notifications via Slack or email with extracted details

With Custom Applications

  • Integrate via Apify API into your web applications
  • Batch process documents for multiple clients
  • Create automated document processing workflows

Technical Details

  • Runtime: Node.js 18+
  • Dependencies: Apify SDK v3.5.2+
  • API Endpoint: http://shorts.multiplewords.com/mwvideos/api/image_data_extractor
  • Request Method: POST
  • Content Type: multipart/form-data
  • Mode: Batch processing (non-standby)
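The details above say the Actor sends a multipart/form-data POST to the MultipleWords endpoint. The form field names are not documented, so this sketch assumes they mirror the Actor's input names (file, image_url, user_id, isPro); the user_id and isPro values are placeholders, not the Actor's real defaults. It illustrates building such a request with Node 18's global FormData and Blob. It is not the Actor's actual source, and buildOcrForm and callOcrApi are hypothetical names.

```javascript
// Endpoint as listed under Technical Details.
const OCR_ENDPOINT =
  "http://shorts.multiplewords.com/mwvideos/api/image_data_extractor";

// ASSUMPTION: field names mirror the Actor input; user_id/isPro values are
// placeholders only.
function buildOcrForm({ fileBytes, fileName = "document.jpg", mimeType = "image/jpeg", imageUrl } = {}) {
  const form = new FormData();
  if (fileBytes) {
    form.append("file", new Blob([fileBytes], { type: mimeType }), fileName);
  } else if (imageUrl) {
    form.append("image_url", imageUrl);
  } else {
    throw new Error("Either file bytes or image_url is required.");
  }
  form.append("user_id", "PLACEHOLDER_USER_ID");
  form.append("isPro", "0");
  return form;
}

async function callOcrApi(form) {
  // fetch sets the multipart Content-Type header (with boundary) itself.
  const res = await fetch(OCR_ENDPOINT, { method: "POST", body: form });
  if (!res.ok) throw new Error(`OCR API returned HTTP ${res.status}`);
  return res.json();
}
```

Letting fetch derive the Content-Type header matters here: setting it by hand would omit the multipart boundary and the server could not parse the body.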

Error Handling

The Actor includes comprehensive error handling:

  • Validates input parameters before processing
  • Handles API errors gracefully with detailed messages
  • Provides informative error logs for debugging
  • Supports multiple input formats with fallback strategies
  • Returns appropriate exit codes for automation workflows
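A caller or wrapper script can mirror this behavior: treat any status other than 1 as a failed extraction and report failure through a non-zero exit code. This is a hypothetical sketch (ExtractionError, checkExtractionResult, and runWithExitCode are illustrative names), not the Actor's internal error handling.

```javascript
// Raised when the OCR response does not indicate success.
class ExtractionError extends Error {}

// Per the output docs, `status: 1` means success; anything else is a failure.
function checkExtractionResult(result) {
  if (result === null || typeof result !== "object") {
    throw new ExtractionError("OCR API returned a non-object response.");
  }
  if (result.status !== 1) {
    throw new ExtractionError(`Extraction failed (status=${JSON.stringify(result.status)}).`);
  }
  return result;
}

// Runs an async task; on failure, logs the error and sets a non-zero exit
// code so schedulers and CI-style automation can detect the failure.
async function runWithExitCode(task) {
  try {
    return await task();
  } catch (err) {
    console.error(`Error: ${err.message}`);
    process.exitCode = 1;
    return undefined;
  }
}
```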

Best Practices

  1. Image Quality: Use high-resolution, clear images for best results
  2. File Formats: JPG and PNG work best; ensure PDFs are image-based
  3. Document Types: Works best with standard invoice/receipt layouts
  4. Batch Processing: For multiple documents, queue multiple Actor runs
  5. Error Recovery: Implement retry logic for failed extractions
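Practices 4 and 5 can be combined on the client side: queue documents with a bounded number of concurrent runs, and retry each failed run with exponential backoff. The sketch below is generic. withRetry and processBatch are hypothetical helpers, and the worker passed in would be whatever function starts one Actor run for one document.

```javascript
// Retries an async operation with exponential backoff: 1x, 2x, 4x baseDelayMs.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Processes items with at most `concurrency` workers in flight. Each result is
// recorded per item, so one failed document does not abort the whole batch.
async function processBatch(items, worker, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }
  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage, with a hypothetical runOneDocument(url) that starts one Actor run:
// processBatch(urls, (url) => withRetry(() => runOneDocument(url)), 3);
```

Keeping concurrency low limits how many paid runs are in flight at once, and the per-item results make it easy to re-queue only the documents that still failed after retrying.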


Support

For issues, questions, or feature requests, please refer to the Apify documentation or community forums.


Built with ❤️ using Apify SDK

You might also like

India MCA Company Data Scraper - CIN, Directors & Charges

haketa/india-mca-scraper

India company data scraper & API (MCA / CIN lookup): enrich any Indian company by CIN and export name, status, incorporation date, RoC, category, authorized & paid-up capital, address, email, directors with DIN & registered charges. India KYC, due-diligence & B2B lead data: fast, no login.

AI Blueprint Analyzer: Floor Plan & Construction Data

ntriqpro/blueprint-intelligence

AI-powered architectural blueprint analyzer. Extract floor plans, rooms, dimensions, materials, walls, doors & structural elements from construction drawings. Built for architects, contractors, real estate pros. Batch up to 10 images. PDF & JPG. Structured JSON.

Indian Company Data - CIN, Directors & Financials

foxlabs/indian-company-data

Look up any Indian company by CIN and get directors, financials, balance sheet, paid-up capital, charges & registry details as clean JSON. MCA-sourced data via Tofler β€” for due diligence, KYC, lead enrichment & investor research.


Kleinanzeigen.de Scraper

haketa/kleinanzeigen-scraper

Kleinanzeigen scraper & API (Germany classifieds): search listings by keyword and category and export title, price, description, location, seller, attributes, photos, date and URL. German marketplace and second-hand market data plus seller lead gen: fast, no login.

NC Licensing Board for General Contractors Scraper

haketa/nc-licensing-board-for-general-contractors-scraper

North Carolina general contractor license scraper & API: search the NC Licensing Board and export license number, status, classification, company name, qualifier, address and limits. Contractor verification, compliance and B2B lead generation: fast, no login.

YallaMotor Scraper | GCC Used & New Cars Marketplace

haketa/yallamotor-scraper

YallaMotor scraper & API (GCC cars): search new and used cars and export make, model, year, price, mileage, specs, seller, location, photos and listing URL. Gulf automotive marketplace data and dealer lead generation: fast, no login.

Dropbox App Center Scraper (n8n Workflow Templates)

crawlerbros/dropbox-app-center-scraper

Scrapes the n8n workflow template library - a publicly accessible automation marketplace with 9,800+ workflow templates. Search by keyword, browse by category, or fetch by ID. Returns template metadata, creator info, node integrations, pricing and view counts.

AI OCR Text Extractor - High Precision Image-to-Text

mikolabs/ai-ocr-text-extractor-high-precision-image-to-text

It's a high-performance solution designed to extract text from images with exceptional accuracy. Powered by industrial-grade deep learning models, it transforms unstructured image data (such as invoices, receipts, screenshots, and handwritten notes) into structured, searchable JSON data in seconds.