
URL: https://apify.com/acme-ai/ocr-tax-document-ai

AI OCR for Tax Documents: Invoices, Balance Sheets & Tables · Apify


๐Ÿ‘ AI OCR for Tax Documents: Invoices, Balance Sheets & Tables avatar

AI OCR for Tax Documents: Invoices, Balance Sheets & Tables

Pricing

from $450.00 / 1,000 extracted tax documents


Extract structured data from invoices, receipts, balance sheets and tabular PDFs with AI. Returns issuer, dates, totals, taxes and tables as JSON. Upload a file or pass URLs; batch or real-time API.



Developer: Acme AI (maintained by Community)


🧾 AI OCR for Tax Documents (Invoices, Balance Sheets & Tables)

Turn invoices, receipts, balance sheets, bank statements and tabular PDFs into clean, structured JSON with AI. Upload a file or pass document URLs, and get back the document type, issuer/recipient, dates, totals, taxes, line-item tables and a summary - ready for your accounting system or spreadsheet.

🎯 Built for tax and accounting teams. This is not a generic text dump: the AI detects the document type and extracts the fields that matter, along with the tables, preserving the meaning of the layout.


What you get (per document)

| Field | Description |
| --- | --- |
| documentType | invoice, receipt, balance_sheet, income_statement, bank_statement, purchase_order, table, other |
| issuerName / issuerTaxId | Vendor/company and tax ID (VAT, CNPJ, EIN...) |
| recipientName / recipientTaxId | Buyer/customer and tax ID |
| documentNumber, issueDate, dueDate | Document identification |
| currency, subtotal, taxAmount, totalAmount | Monetary fields (plain numbers) |
| tables[] | Extracted tables (line items, balances...) with columns + rows |
| keyValues | Any other labelled fields (payment terms, account no., period...) |
| summary | One-line description |
| fileMetadata | type, sizeBytes, pageCount (PDF) |
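
As a rough illustration of the field list above, the sketch below reports which documented top-level fields are absent from a result record. The names `DOCUMENTED_FIELDS` and `missing_fields` are hypothetical helpers, not part of the Actor. Whether every field is present on every record is not stated here (the example output on this page, for instance, has no recipientTaxId), so a check like this may be useful before loading results into an accounting system.

```python
# Top-level fields named in the field list (grouped entries are split
# into individual keys). This tuple is an illustrative assumption.
DOCUMENTED_FIELDS = (
    "documentType", "issuerName", "issuerTaxId",
    "recipientName", "recipientTaxId",
    "documentNumber", "issueDate", "dueDate",
    "currency", "subtotal", "taxAmount", "totalAmount",
    "tables", "keyValues", "summary", "fileMetadata",
)


def missing_fields(record: dict) -> list[str]:
    """Return the documented fields that are absent from one result record."""
    return [name for name in DOCUMENTED_FIELDS if name not in record]


if __name__ == "__main__":
    partial = {"documentType": "receipt", "totalAmount": 12.5}
    print(missing_fields(partial))
```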

How to use

Upload a file in the input, or pass URLs for batch:

{
  "documentUrls": [
    "https://example.com/invoice.pdf",
    "https://example.com/receipt.jpg"
  ]
}

Supports PDF, PNG, JPG and WebP. Up to 50 documents per run (send larger volumes via sequential calls). PDFs are read natively (multi-page); images are auto-optimized before analysis.
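
One way to respect the 50-document limit is to split a longer URL list into consecutive run inputs and submit them one after another. The sketch below only builds the inputs; `build_run_inputs` and `MAX_DOCS_PER_RUN` are hypothetical names chosen for illustration.

```python
import json

MAX_DOCS_PER_RUN = 50  # per-run limit stated above


def build_run_inputs(urls: list[str], batch_size: int = MAX_DOCS_PER_RUN) -> list[dict]:
    """Split URLs into consecutive run inputs of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"documentUrls": urls[start:start + batch_size]}
        for start in range(0, len(urls), batch_size)
    ]


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    urls = [f"https://example.com/doc-{i}.pdf" for i in range(120)]
    for run_input in build_run_inputs(urls):
        print(len(run_input["documentUrls"]), json.dumps(run_input)[:60])
```

Each returned dictionary has the same shape as the input shown above, so it can be sent as the body of one sequential call.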


Pricing

Charged per document successfully extracted (event tax-document-extracted). Documents that fail to download or can't be read are not charged.
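
For budgeting, the charge can be approximated from the results of a run. The sketch below assumes the listed "from $450.00 / 1,000" rate applies to your plan and that a record with `success` set to true corresponds to one charged event; both are assumptions, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Assumption: the listed starting rate is the rate you are billed at.
PRICE_PER_1000 = Decimal("450.00")


def estimate_cost(results: list[dict], price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the charge, counting only records whose success flag is true."""
    charged = sum(1 for record in results if record.get("success") is True)
    return (price_per_1000 * charged / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    sample = [{"success": True}, {"success": True}, {"success": False}]
    print(estimate_cost(sample))
```

At this rate each successfully extracted document costs $0.45, so two successes and one failure come to $0.90.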


Example output

[
  {
    "documentUrl": "https://example.com/invoice.pdf",
    "success": true,
    "documentType": "invoice",
    "issuerName": "ACME Ltda",
    "issuerTaxId": "12345678000190",
    "recipientName": "Globex Inc",
    "documentNumber": "INV-2024-001",
    "issueDate": "2024-03-15",
    "dueDate": "2024-04-15",
    "currency": "USD",
    "subtotal": 1100.0,
    "taxAmount": 150.0,
    "totalAmount": 1250.0,
    "tables": [
      {
        "title": "Line items",
        "columns": ["description", "qty", "unitPrice", "total"],
        "rows": [{"description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100}]
      }
    ],
    "keyValues": {"paymentTerms": "Net 30"},
    "summary": "Invoice from ACME Ltda to Globex Inc, total USD 1250.",
    "fileMetadata": {"type": "pdf", "sizeBytes": 84210, "pageCount": 1},
    "failureReason": null,
    "processedAt": "2026-01-01T12:00:00.000Z",
    "error": null
  }
]
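
A record shaped like the example output can be sanity-checked and exported with a few lines of Python. The sketch below is illustrative only: `totals_are_consistent` and `table_to_csv` are hypothetical helpers, and the rule that subtotal plus tax equals the total is an assumption that may not hold for documents with discounts, shipping or withholdings.

```python
import csv
import io
import math


def totals_are_consistent(record: dict, tolerance: float = 0.01) -> bool:
    """Check that subtotal + taxAmount matches totalAmount within a tolerance."""
    parts = (record.get("subtotal"), record.get("taxAmount"), record.get("totalAmount"))
    if any(value is None for value in parts):
        return False  # cannot verify when an amount is missing
    subtotal, tax, total = parts
    return math.isclose(subtotal + tax, total, abs_tol=tolerance)


def table_to_csv(table: dict) -> str:
    """Render one entry of tables[] as CSV text, using its columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=table["columns"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(table["rows"])
    return buffer.getvalue()


if __name__ == "__main__":
    record = {
        "subtotal": 1100.0, "taxAmount": 150.0, "totalAmount": 1250.0,
        "tables": [{
            "title": "Line items",
            "columns": ["description", "qty", "unitPrice", "total"],
            "rows": [{"description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100}],
        }],
    }
    print(totals_are_consistent(record))
    print(table_to_csv(record["tables"][0]))
```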

FAQ

Which documents work best? Clear digital PDFs and sharp scans/photos. Very low-resolution or handwritten documents may not be readable - the reason is reported in failureReason.

Does it handle multi-page PDFs? Yes. PDFs are read natively, including tables and layout, across pages.

Can I upload a file directly? Yes - use the upload field in the input, or call the API with a document URL.

Can I call it in real time? Yes. The Standby endpoint POST /extract responds synchronously. See below.
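
Because unreadable documents are reported through failureReason rather than silently dropped, it can help to separate successes from failures before further processing. The sketch below is a minimal example; `split_results` is a hypothetical helper, and the fallback to the `error` field and the sample reason string are assumptions.

```python
def split_results(results: list[dict]) -> tuple[list[dict], list[tuple[str, str]]]:
    """Separate extracted records from failures, keeping each failure's reason."""
    extracted, failures = [], []
    for record in results:
        if record.get("success"):
            extracted.append(record)
        else:
            # Prefer failureReason; fall back to error, then to a placeholder.
            reason = record.get("failureReason") or record.get("error") or "unknown"
            failures.append((record.get("documentUrl", ""), reason))
    return extracted, failures


if __name__ == "__main__":
    sample = [
        {"documentUrl": "https://example.com/invoice.pdf", "success": True},
        # The reason text below is made up for illustration.
        {"documentUrl": "https://example.com/receipt.jpg", "success": False,
         "failureReason": "image too blurry"},
    ]
    ok, failed = split_results(sample)
    print(len(ok), failed)
```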


🔌 API integration

Batch run:

curl -X POST "https://api.apify.com/v2/acts/acme-ai~ocr-tax-document-ai/run-sync-get-dataset-items" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'

Standby (POST /extract):

curl -X POST "https://acme-ai--ocr-tax-document-ai.apify.actor/extract" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  --compressed \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'

The token goes in the Authorization: Bearer header, never in the URL.
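
The same Standby call can be made from Python with only the standard library. This is a sketch under stated assumptions: `build_extract_request` and `extract` are hypothetical helpers, the 120-second timeout is an arbitrary choice, and the response is assumed to be JSON shaped like the example output.

```python
import json
import urllib.request

STANDBY_URL = "https://acme-ai--ocr-tax-document-ai.apify.actor/extract"


def build_extract_request(document_urls: list[str], token: str) -> urllib.request.Request:
    """Build the POST /extract request with the token in the Authorization header."""
    body = json.dumps({"documentUrls": document_urls}).encode("utf-8")
    return urllib.request.Request(
        STANDBY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # never appended to the URL
            "Content-Type": "application/json",
        },
    )


def extract(document_urls: list[str], token: str, timeout: float = 120.0):
    """Send the request and decode the JSON response body."""
    request = build_extract_request(document_urls, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real, billable call):
#   results = extract(["https://example.com/invoice.pdf"], "YOUR_APIFY_TOKEN")
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.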


Notes

This Actor analyzes documents you provide. You are responsible for having the right to process them and any personal or financial data they may contain.
