
Text-to-JSON Structured Extractor

Pricing

from $10.00 / 1,000 results


A versatile Apify actor that converts unstructured text and HTML into clean, structured JSON. Supports four extraction modes with auto-detection, URL fetching, and batch processing.



Developer


Jamshaid Arif

Maintained by Community

Last modified: 2 months ago

🎯 What It Does

Mode	Input	Output
Resume	Plain-text resume/CV	Contact info, experience, education, skills, certifications
E-Commerce	Product page HTML	Product name, price, brand, SKU, rating, images, availability
Blog SEO	Blog/article HTML	SEO score (0–100, graded A–F), meta tags, headings, links, content stats, recommendations
Chat Log	Chat exports (WhatsApp, Slack, Discord, IRC)	Messages, participants, topics, shared links, statistics
Auto	Any of the above	Detects the best extractor automatically
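The README does not say how `auto` mode chooses an extractor. Below is a sketch of one plausible heuristic. It assumes that timestamped message lines indicate a chat log, that a Schema.org `Product` type or `og:type` of `product` indicates a product page, that any other HTML is a blog post, and that plain text falls back to the resume extractor. `detect_mode` and all of these rules are hypothetical and are not the actor's code.

```python
import re

# Hypothetical signals; the actor's real auto-detection rules are not published.
WHATSAPP_LINE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4},? \d{1,2}:\d{2}\s*-\s*[^:\n]+:", re.MULTILINE)
IRC_LINE = re.compile(r"^\[\d{1,2}:\d{2}(?::\d{2})?\]\s*<[^>]+>", re.MULTILINE)
HTML_TAG = re.compile(r"<(html|head|body|meta)\b", re.IGNORECASE)
PRODUCT_HINT = re.compile(r'"@type"\s*:\s*"product"|og:type"\s+content="product', re.IGNORECASE)


def detect_mode(text: str) -> str:
    """Guess which extractor fits `text` (illustrative heuristic only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    chat_hits = len(WHATSAPP_LINE.findall(text)) + len(IRC_LINE.findall(text))
    # Chat export: at least half of the non-empty lines look like timestamped messages.
    if lines and chat_hits / len(lines) >= 0.5:
        return "chat_log"
    if HTML_TAG.search(text):
        # Product pages usually declare a Schema.org Product or og:type=product.
        return "ecommerce" if PRODUCT_HINT.search(text) else "blog_seo"
    # Plain text that is not a chat log falls back to the resume extractor (assumption).
    return "resume"
```

Checking the chat signal first means a pasted chat export that happens to contain an email address is not misread as a resume.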

🚀 Quick Start

Minimal Input (uses defaults)

{
  "extractionMode": "auto",
  "inputType": "raw_text",
  "rawInput": "Jane Smith\njane@email.com\n\nExperience\nEngineer at Google..."
}

Fetch from URLs

{
  "extractionMode": "ecommerce",
  "inputType": "urls",
  "urls": [
    "https://example.com/products/widget-pro",
    "https://example.com/products/widget-lite"
  ],
  "outputFormat": "compact",
  "maxConcurrency": 10
}

Blog SEO Audit

{
  "extractionMode": "blog_seo",
  "inputType": "urls",
  "urls": ["https://myblog.com/latest-post"],
  "outputFormat": "full"
}
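Any of the inputs above can also be sent from Python. This is a minimal sketch that assumes the official `apify-client` package and uses the actor ID from this page's Store URL (`moving_beacon-owner1/my-actor-68`). `run_extractor` is a hypothetical helper. It takes the client as a parameter so that it can be exercised without network access.

```python
from typing import Any, Iterator

# Actor ID taken from this page's Store URL.
ACTOR_ID = "moving_beacon-owner1/my-actor-68"


def run_extractor(client: Any, run_input: dict) -> Iterator[dict]:
    """Start the actor, wait for the run to finish, then yield its dataset records.

    `client` is expected to be an `apify_client.ApifyClient`; anything exposing
    `.actor(id).call(run_input=...)` and `.dataset(id).iterate_items()` works.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)  # blocks until the run ends
    if run is None:
        raise RuntimeError("actor run returned no run info")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (needs `pip install apify-client` and an API token in APIFY_TOKEN):
#   import os
#   from apify_client import ApifyClient
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   audit = {"extractionMode": "blog_seo", "inputType": "urls",
#            "urls": ["https://myblog.com/latest-post"], "outputFormat": "full"}
#   for record in run_extractor(client, audit):
#       print(record["source"], record["success"])
```

Because the client is injected, tests can pass in a stub object instead of calling the Apify API.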

📥 Input Schema

Field	Type	Default	Description
`extractionMode`	enum	`"auto"`	`resume`, `ecommerce`, `blog_seo`, `chat_log`, or `auto`
`inputType`	enum	`"raw_text"`	`raw_text`, `urls`, or `key_value_store`
`rawInput`	string	(sample resume)	Direct text/HTML input
`urls`	string[]	`[]`	URLs to fetch content from
`kvStoreKeys`	string[]	`[]`	Record keys to read from the key-value store
`chatLogFormat`	enum	`"auto"`	`auto`, `whatsapp`, `slack`, `discord`, `irc`, `generic`, `simple`
`outputFormat`	enum	`"full"`	`full`, `compact`, or `flat`
`includeSourceText`	boolean	`false`	Include original text in output
`maxConcurrency`	integer	`5`	Maximum number of parallel URL fetches (1–20)
`proxyConfiguration`	object	Apify Proxy	Proxy settings for URL fetching
`requestTimeoutSecs`	integer	`30`	URL fetch timeout in seconds (5–120)
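To catch mistakes before a run starts, a client can mirror the schema table locally. This sketch copies the enums, defaults and ranges from the table. `normalize_input` is hypothetical. The rule that `urls` or `kvStoreKeys` must be non-empty for the matching `inputType` is an assumption, not documented behavior.

```python
# Enum values, defaults and ranges copied from the Input Schema table.
ENUMS = {
    "extractionMode": {"resume", "ecommerce", "blog_seo", "chat_log", "auto"},
    "inputType": {"raw_text", "urls", "key_value_store"},
    "chatLogFormat": {"auto", "whatsapp", "slack", "discord", "irc", "generic", "simple"},
    "outputFormat": {"full", "compact", "flat"},
}
DEFAULTS = {
    "extractionMode": "auto",
    "inputType": "raw_text",
    "urls": [],
    "kvStoreKeys": [],
    "chatLogFormat": "auto",
    "outputFormat": "full",
    "includeSourceText": False,
    "maxConcurrency": 5,
    "requestTimeoutSecs": 30,
}
RANGES = {"maxConcurrency": (1, 20), "requestTimeoutSecs": (5, 120)}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and reject values outside the documented schema."""
    cfg = {**DEFAULTS, **raw}
    for field, allowed in ENUMS.items():
        if cfg[field] not in allowed:
            raise ValueError(f"{field}={cfg[field]!r} not in {sorted(allowed)}")
    for field, (lo, hi) in RANGES.items():
        if not lo <= cfg[field] <= hi:
            raise ValueError(f"{field}={cfg[field]} outside {lo}-{hi}")
    # Assumption: each input type needs its matching source list to be non-empty.
    if cfg["inputType"] == "urls" and not cfg["urls"]:
        raise ValueError("inputType 'urls' needs at least one URL")
    if cfg["inputType"] == "key_value_store" and not cfg["kvStoreKeys"]:
        raise ValueError("inputType 'key_value_store' needs kvStoreKeys")
    return cfg
```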

📤 Output Format

Each dataset record looks like:

{
  "source": "raw_input",
  "extraction_mode": "resume",
  "output_format": "full",
  "success": true,
  "error": null,
  "data": { ... }
}
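Because every record carries `success` and `error`, downstream code can route failures separately. The sketch below assumes that failed records describe the problem as a string in `error`. The `"HTTP 404"` value in the test data is invented for illustration, and `split_records` is hypothetical.

```python
from typing import Iterable


def split_records(records: Iterable[dict]) -> tuple[list[dict], list[tuple[str, str]]]:
    """Return (payloads of successful records, (source, error) pairs for failures)."""
    ok: list[dict] = []
    failed: list[tuple[str, str]] = []
    for rec in records:
        if rec.get("success"):
            ok.append(rec["data"])
        else:
            # Assumption: failed records describe the problem in `error`.
            failed.append((rec.get("source", "?"), rec.get("error") or "unknown error"))
    return ok, failed
```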

Output Modes

full: all extracted fields, deeply nested
compact: key fields only (great for dashboards)
flat: a single-level JSON object with underscore-separated keys (great for spreadsheets)
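The `flat` layout can be approximated on the client side. The hypothetical `flatten` below joins nested keys with underscores. As an assumption, it uses list positions as key segments; the actor may name list entries differently. Empty dicts and lists produce no keys in this version.

```python
def flatten(obj, parent: str = "", sep: str = "_") -> dict:
    """Collapse nested dicts/lists into one level with separator-joined keys."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent}{sep}{key}" if parent else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # Assumption: list items are keyed by their index.
        for i, value in enumerate(obj):
            new_key = f"{parent}{sep}{i}" if parent else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent] = obj  # leaf value
    return items
```

For example, `{"contact": {"email": "johndoe@email.com"}, "skills": ["Python"]}` becomes `{"contact_email": "johndoe@email.com", "skills_0": "Python"}`.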

🔍 Extraction Details

Resume Extractor

Detects and parses: name, email, phone, LinkedIn, GitHub, location, summary, work history with bullet points, education with GPA, categorized skills, projects, certifications, and languages.
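Contact fields like these are commonly extracted with regular expressions. This sketch uses hypothetical patterns, and the phone pattern covers US-style numbers only. It also assumes that the candidate's name is on the first non-empty line. The actor's actual rules are not published.

```python
import re

# Hypothetical patterns; the actor's own rules are not published.
CONTACT_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),  # US-style numbers only
    "linkedin": re.compile(r"(?:https?://)?(?:www\.)?linkedin\.com/in/[\w-]+"),
    "github": re.compile(r"(?:https?://)?(?:www\.)?github\.com/[\w-]+"),
}


def extract_contacts(text: str) -> dict:
    """Return the first match per contact field (None when absent) plus a guessed name."""
    found = {}
    for field, pattern in CONTACT_PATTERNS.items():
        match = pattern.search(text)
        found[field] = match.group(0) if match else None
    # Assumption: the first non-empty line of a plain-text resume is the name.
    found["name"] = next((ln.strip() for ln in text.splitlines() if ln.strip()), None)
    return found
```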

E-Commerce Extractor

Uses a three-tier fallback pipeline: (1) JSON-LD Schema.org data, (2) Open Graph meta tags, (3) class-based HTML parsing. Extracts the product name, description, price with currency, brand, SKU, availability, rating, review count, and images.
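The fallback order can be sketched with only the standard library. This illustrates the idea and is not the actor's parser. `extract_product`, the `price` class name, `<h1>` as the source of the name, and the symbol-to-currency mapping (`$` treated as USD) are all assumptions. Only a few fields are extracted.

```python
import json
import re
from html.parser import HTMLParser

CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}  # assumption: "$" means USD


class ProductPageParser(HTMLParser):
    """Collects JSON-LD blocks, <meta property=...> tags, and <h1>/.price text."""

    def __init__(self):
        super().__init__()
        self.jsonld, self.meta, self.text = [], {}, {}
        self._target, self._buf = None, []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") and a.get("content"):
            self.meta[a["property"]] = a["content"]
        elif self._target is None:
            if tag == "script" and a.get("type") == "application/ld+json":
                self._target = "jsonld"
            elif tag == "h1":
                self._target = "h1"
            elif "price" in (a.get("class") or "").split():
                self._target = "price"

    def handle_data(self, data):
        if self._target:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # Simplification: capture stops at the first closing tag.
        if self._target is None:
            return
        content = "".join(self._buf).strip()
        if self._target == "jsonld":
            self.jsonld.append(content)
        else:
            self.text.setdefault(self._target, content)
        self._target, self._buf = None, []


def extract_product(html: str) -> dict:
    p = ProductPageParser()
    p.feed(html)
    p.close()
    # Tier 1: JSON-LD Schema.org Product.
    for block in p.jsonld:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            offer = data.get("offers") or {}
            if isinstance(offer, list):
                offer = offer[0] if offer else {}
            return {"via": "json-ld", "name": data.get("name"),
                    "price": offer.get("price"), "currency": offer.get("priceCurrency")}
    # Tier 2: Open Graph title plus product:price meta tags.
    if "og:title" in p.meta:
        return {"via": "open-graph", "name": p.meta["og:title"],
                "price": p.meta.get("product:price:amount"),
                "currency": p.meta.get("product:price:currency")}
    # Tier 3: class-based parsing of the visible price text.
    m = re.search(r"([$€£])\s*([\d,]+(?:\.\d+)?)", p.text.get("price", ""))
    return {"via": "html-class", "name": p.text.get("h1"),
            "price": m.group(2).replace(",", "") if m else None,
            "currency": CURRENCY_SYMBOLS[m.group(1)] if m else None}
```

Structured data is tried first because it is machine-readable and usually stable. Class names vary from site to site, so class-based parsing is the least reliable tier.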

Blog SEO Extractor

Produces a complete SEO audit with a score (0–100, grade A–F) based on 14 weighted checks. Analyzes title, meta description, Open Graph, Twitter Card, heading hierarchy, image alt text, internal/external links, structured data, content length, and more.
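A weighted score like this is the weighted pass rate: 100 × (total weight of passed checks) / (total weight of all checks). The check names, weights and grade cut-offs below are hypothetical. The README lists 14 checks but gives neither their weights nor the grade thresholds, and only five checks appear in the example.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    weight: float  # hypothetical relative weight
    passed: bool


# Assumed grade cut-offs; the README does not document the real thresholds.
GRADES = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def seo_score(checks: list[Check]) -> tuple[int, str]:
    """Score = 100 * (weight of passed checks) / (total weight), then a letter grade."""
    total = sum(c.weight for c in checks)
    if total == 0:
        return 0, "F"
    score = round(100 * sum(c.weight for c in checks if c.passed) / total)
    for cutoff, grade in GRADES:
        if score >= cutoff:
            return score, grade
    return score, "F"
```

With weights 10, 10, 8, 6 and 6, where only the check weighted 6 for image alt text fails, the score is 100 × 34 / 40 = 85. Under these assumed cut-offs, that is a B.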

Chat Log Extractor

Auto-detects the format from WhatsApp, Slack, Discord, IRC, and generic patterns. Builds participant profiles (message count, average words per message), extracts shared links with context, identifies topics via keyword frequency, and counts media messages.
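Below is a sketch of format detection and participant profiling for two of the supported formats, WhatsApp-style and IRC-style lines. It assumes the line layouts in the hypothetical `FORMATS` patterns. Real exports vary by locale and app version, and the actor's own patterns are not published. Continuation lines of multi-line messages are skipped.

```python
import re
from collections import defaultdict

# Hypothetical line layouts for two of the supported formats.
FORMATS = {
    "whatsapp": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4},? \d{1,2}:\d{2}\s*-\s*(?P<author>[^:]+):\s(?P<text>.*)$"),
    "irc": re.compile(r"^\[\d{1,2}:\d{2}(?::\d{2})?\]\s*<(?P<author>[^>]+)>\s(?P<text>.*)$"),
}
URL = re.compile(r"https?://\S+")


def detect_format(lines: list[str]) -> str:
    """Pick the format whose pattern matches the most lines ("generic" if none match)."""
    hits = {name: sum(bool(p.match(ln)) for ln in lines) for name, p in FORMATS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] else "generic"


def profile_chat(log: str) -> dict:
    """Return the detected format, per-author stats, and (author, url) pairs."""
    lines = [ln for ln in log.splitlines() if ln.strip()]
    fmt = detect_format(lines)
    word_counts = defaultdict(list)  # author -> words per message
    links = []
    if fmt != "generic":
        for ln in lines:
            m = FORMATS[fmt].match(ln)
            if not m:
                continue  # continuation lines of multi-line messages are skipped here
            author, text = m["author"].strip(), m["text"]
            word_counts[author].append(len(text.split()))
            links.extend((author, url) for url in URL.findall(text))
    participants = {
        author: {"messages": len(counts), "avg_words": round(sum(counts) / len(counts), 2)}
        for author, counts in word_counts.items()
    }
    return {"format": fmt, "participants": participants, "links": links}
```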

🧪 Running Locally

# Install dependencies
pip install -r requirements.txt
# Run with Apify CLI
apify run --input-file=INPUT.json

📋 Example Output (Compact Resume)

{
  "name": "John Doe",
  "email": "johndoe@email.com",
  "phone": "(555) 123-4567",
  "location": "New York, NY",
  "summary": "Full-stack developer with 5+ years...",
  "skills": ["Python", "JavaScript", "TypeScript", "React", "Django", "AWS"],
  "experience_count": 2,
  "education_count": 1,
  "certifications": ["AWS Solutions Architect Associate", "Certified Kubernetes Administrator"],
  "languages": ["English (Native)", "French (Conversational)"]
}


Universal Data Structure Converter

moving_beacon-owner1/my-actor-63

A production-grade Apify actor that converts between HTML, XML, CSV, YAML, and JSON formats. Supports 9+ conversion types with smart auto-detection, nested JSON flattening, HTML table scraping, batch URL processing, and full customization.


Jamshaid Arif


🔥 AI HTML to JSON Extractor (Fast, Free LLM for Data)

autoscaler/ai-html-to-json-extractor

Eliminate messy HTML cleanup and high LLM costs. This Actor uses a high-speed, zero-cost large language model to turn unstructured content (HTML, text, reviews, blog posts) into valid, structured JSON.


Mooo


Data Cleaning & Transformation Toolkit

moving_beacon-owner1/my-actor-66

A powerful, multi-mode Apify actor that transforms messy, unstructured data into clean, structured JSON — ready for APIs, databases, or downstream processing.


Jamshaid Arif


Pdf To Text Scraper

getdataforme/pdf-to-text-scraper

The Pdf To Text Scraper is an Apify Actor that efficiently extracts text from PDFs, preserving structure and supporting batch processing....


GetDataForMe


Duckduckgo Search Scraper

runtime/duckduckgo-search-scraper

Extract search results from DuckDuckGo with multiple search modes (web, images, news, videos). Supports batch processing, auto-pagination, SEO data extraction, and proxy configuration. Clean, structured output format with anti-detection measures.


scraping automation

PDF Text Extractor API - URL to Text, Per-Page, Batch

gratifying_graph/pdf-extract-api

Turn any public PDF URL into clean text and metadata. Per-page output, batch processing, and a synchronous API mode for AI agents. Pay per page extracted, cheaper than the alternatives.


Jimmy A


Image To Text Ai

welcoming_fireplace/image-to-text-ai

A powerful OCR tool that goes beyond standard text extraction. Powered by a Premium Vision AI model, it accurately reads handwriting, preserves table structures, and converts messy receipts or documents into structured JSON or Markdown. Supports batch processing for high-volume workflows.


Richmond Nkrumah


Website Content Text Extractor

smart-digital/website-content-text-extractor

Extract visible text content from websites as structured JSON blocks. Supports multi-URL batch processing, header/footer/cookie exclusion, and optional form extraction. Perfect for content analysis and translation workflows.

My Smart Digital

5.0


HTML to JSON Smart Parser

parseforge/html-to-json-smart-parser

Convert HTML to structured JSON using AI! Uses OpenAI to extract and structure data from HTML into clean JSON format. Perfect for developers and data analysts who need to transform HTML into structured data without manual parsing.


ParseForge

5.0


SmartSchema Extract — Text to JSON with AI

olican/smartschema-extract

Convert any unstructured text into validated JSON using Google Gemini. Define your JSON Schema per request. Perfect for invoice parsing, web scraping, email extraction, and ETL pipelines.


Sergio Calvo

5.0


The definitive guide to text scraping

URL: https://apify.com/moving_beacon-owner1/my-actor-68