URL: https://apify.com/moving_beacon-owner1/my-actor-68

Text-to-JSON Structured Extractor · Apify


Pricing

from $10.00 / 1,000 results

Text-to-JSON Structured Extractor

A versatile Apify actor that converts unstructured text and HTML into clean, structured JSON. Supports four extraction modes with auto-detection, URL fetching, and batch processing.

Rating

0.0

(0)

Developer

๐Ÿ‘ Jamshaid Arif

Jamshaid Arif

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 months ago

🎯 What It Does

| Mode | Input | Output |
| --- | --- | --- |
| Resume | Plain-text resume/CV | Contact info, experience, education, skills, certifications |
| E-Commerce | Product page HTML | Product name, price, brand, SKU, rating, images, availability |
| Blog SEO | Blog/article HTML | SEO score (A–F), meta tags, headings, links, content stats, recommendations |
| Chat Log | Chat exports (WhatsApp, Slack, Discord, IRC) | Messages, participants, topics, shared links, statistics |
| Auto | Any of the above | Detects the best extractor automatically |

🚀 Quick Start

Minimal Input (uses defaults)

{
    "extractionMode": "auto",
    "inputType": "raw_text",
    "rawInput": "Jane Smith\njane@email.com\n\nExperience\nEngineer at Google..."
}

Fetch from URLs

{
    "extractionMode": "ecommerce",
    "inputType": "urls",
    "urls": [
        "https://example.com/products/widget-pro",
        "https://example.com/products/widget-lite"
    ],
    "outputFormat": "compact",
    "maxConcurrency": 10
}

Blog SEO Audit

{
    "extractionMode": "blog_seo",
    "inputType": "urls",
    "urls": ["https://myblog.com/latest-post"],
    "outputFormat": "full"
}

📥 Input Schema

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| extractionMode | enum | "auto" | resume, ecommerce, blog_seo, chat_log, or auto |
| inputType | enum | "raw_text" | raw_text, urls, or key_value_store |
| rawInput | string | (sample resume) | Direct text/HTML input |
| urls | string[] | [] | URLs to fetch content from |
| kvStoreKeys | string[] | [] | Keys to read from KV store |
| chatLogFormat | enum | "auto" | auto, whatsapp, slack, discord, irc, generic, simple |
| outputFormat | enum | "full" | full, compact, or flat |
| includeSourceText | boolean | false | Include original text in output |
| maxConcurrency | integer | 5 | Parallel URL fetches (1–20) |
| proxyConfiguration | object | Apify Proxy | Proxy settings for URL fetching |
| requestTimeoutSecs | integer | 30 | URL fetch timeout in seconds (5–120) |
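
The enums and ranges in the schema can be checked on the client before starting a run. The helper below is a minimal sketch of that idea, not part of the actor: `build_input` is a hypothetical name, the defaults and limits are copied from the table, and the rule that `urls` must be non-empty when `inputType` is `urls` is an assumption.

```python
from typing import Any

# Allowed values copied from the input schema table.
EXTRACTION_MODES = {"auto", "resume", "ecommerce", "blog_seo", "chat_log"}
INPUT_TYPES = {"raw_text", "urls", "key_value_store"}
OUTPUT_FORMATS = {"full", "compact", "flat"}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and check enums and ranges."""
    run_input: dict[str, Any] = {
        "extractionMode": "auto",
        "inputType": "raw_text",
        "outputFormat": "full",
        "includeSourceText": False,
        "maxConcurrency": 5,
        "requestTimeoutSecs": 30,
    }
    run_input.update(overrides)

    if run_input["extractionMode"] not in EXTRACTION_MODES:
        raise ValueError(f"unknown extractionMode: {run_input['extractionMode']!r}")
    if run_input["inputType"] not in INPUT_TYPES:
        raise ValueError(f"unknown inputType: {run_input['inputType']!r}")
    if run_input["outputFormat"] not in OUTPUT_FORMATS:
        raise ValueError(f"unknown outputFormat: {run_input['outputFormat']!r}")
    if not 1 <= run_input["maxConcurrency"] <= 20:
        raise ValueError("maxConcurrency must be between 1 and 20")
    if not 5 <= run_input["requestTimeoutSecs"] <= 120:
        raise ValueError("requestTimeoutSecs must be between 5 and 120")
    # Assumption: a URL run without URLs is treated as a mistake.
    if run_input["inputType"] == "urls" and not run_input.get("urls"):
        raise ValueError("inputType 'urls' needs a non-empty 'urls' list")
    return run_input
```

The returned dict can be saved as INPUT.json for a local run or passed as the run input through whichever client is used.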

📤 Output Format

Each dataset record looks like:

{
    "source": "raw_input",
    "extraction_mode": "resume",
    "output_format": "full",
    "success": true,
    "error": null,
    "data": { ... }
}
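
Because every record carries `success` and `error`, a consumer can separate usable results from failures before further processing. A minimal sketch, assuming records have already been loaded as Python dicts; `split_records` is a hypothetical helper, and the sample error text in the usage note is invented for illustration.

```python
from typing import Any, Iterable


def split_records(
    records: Iterable[dict[str, Any]],
) -> tuple[list[Any], list[tuple[Any, Any]]]:
    """Return (extracted data of successful records, (source, error) of failed ones)."""
    ok: list[Any] = []
    failed: list[tuple[Any, Any]] = []
    for record in records:
        if record.get("success"):
            ok.append(record.get("data"))
        else:
            failed.append((record.get("source"), record.get("error")))
    return ok, failed
```

For example, a batch of two URL records where one fetch failed would yield one item in the first list and one `(source, error)` pair in the second, which can be logged or retried.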

Output Modes

  • full: All extracted fields, deeply nested
  • compact: Key fields only (great for dashboards)
  • flat: Single-level dict with underscore-separated keys (great for spreadsheets)
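
To make the flat mode concrete, here is a rough sketch of how a nested result can be collapsed into underscore-separated keys. It illustrates the general technique only. How the actor itself names keys for list items is not documented here, so the index-based naming (`skills_0`, `skills_1`) is an assumption.

```python
from typing import Any


def flatten(value: Any, prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Collapse nested dicts and lists into a single-level dict."""
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_key = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(child, child_key, sep))
    elif isinstance(value, list):
        # Assumption: list items are keyed by their position.
        for index, child in enumerate(value):
            child_key = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, child_key, sep))
    else:
        flat[prefix] = value
    return flat
```

Note that empty dicts and lists produce no keys in this sketch, so a spreadsheet export would simply lack those columns.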

๐Ÿ” Extraction Details

Resume Extractor

Detects and parses: name, email, phone, LinkedIn, GitHub, location, summary, work history with bullet points, education with GPA, categorized skills, projects, certifications, and languages.
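
As an illustration of the contact-detection step, the sketch below pulls a name, email, and phone number out of plain text with regular expressions. The patterns and the "first non-empty line is the name" heuristic are assumptions for the example, not the actor's actual rules, and the phone pattern only covers North American formats.

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches forms such as (555) 123-4567, 555-123-4567, 555.123.4567.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_contact(text: str) -> dict[str, Optional[str]]:
    """Return name, email, and phone found in a plain-text resume."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        # Heuristic (assumption): resumes usually open with the candidate's name.
        "name": lines[0] if lines else None,
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }
```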

E-Commerce Extractor

Three-priority pipeline: (1) JSON-LD Schema.org, (2) Open Graph meta tags, (3) HTML class-based parsing. Extracts product name, description, price with currency, brand, SKU, availability, rating, review count, and images.
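
The priority order can be sketched with the standard library alone. This is an illustrative approximation under stated assumptions, not the actor's code: `extract_product` and the `matched_by` key are hypothetical names, only top-level JSON-LD `Product` objects are handled (no `@graph`), the Open Graph price is read from the `product:price:amount` and `product:price:currency` tags, and the third stage (class-based parsing) is left out.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class _ProductSignals(HTMLParser):
    """Collects JSON-LD script bodies and <meta property=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld: list[str] = []
        self.meta: dict[str, str] = {}
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and attr.get("property") and attr.get("content"):
            self.meta[attr["property"]] = attr["content"]

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.json_ld.append("".join(self._buffer))


def extract_product(html: str) -> Optional[dict[str, Any]]:
    signals = _ProductSignals()
    signals.feed(html)
    signals.close()

    # Priority 1: Schema.org Product in JSON-LD.
    for raw in signals.json_ld:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                offers = item.get("offers")
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                if not isinstance(offers, dict):
                    offers = {}
                return {
                    "matched_by": "json_ld",
                    "name": item.get("name"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                }

    # Priority 2: Open Graph meta tags.
    if "og:title" in signals.meta:
        return {
            "matched_by": "open_graph",
            "name": signals.meta["og:title"],
            "price": signals.meta.get("product:price:amount"),
            "currency": signals.meta.get("product:price:currency"),
        }

    # Priority 3 (class-based HTML parsing) is omitted from this sketch.
    return None
```

Trying the most structured source first means the looser fallbacks only run when a page publishes no machine-readable product data.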

Blog SEO Extractor

Produces a complete SEO audit with a score (0–100, grade A–F) based on 14 weighted checks. Analyzes title, meta description, Open Graph, Twitter Card, heading hierarchy, image alt text, internal/external links, structured data, content length, and more.
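
A weighted-check score of this kind can be computed as the share of weight earned by passing checks. The sketch below shows the arithmetic only. The check names, weights, and grade thresholds are hypothetical, since the actor's 14 checks and cut-offs are not listed here.

```python
def seo_score(checks: dict[str, bool], weights: dict[str, int]) -> int:
    """Return a 0-100 score: weight of passed checks over total weight."""
    total = sum(weights.values())
    earned = sum(weight for name, weight in weights.items() if checks.get(name))
    return round(100 * earned / total) if total else 0


def grade(score: int) -> str:
    """Map a 0-100 score to a letter grade (thresholds are assumptions)."""
    for threshold, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return letter
    return "F"
```

With hypothetical weights of 3 for the title, 3 for the meta description, 2 for the H1, and 2 for image alt text, a page that fails only the alt-text check earns 8 of 10 points, which gives a score of 80 and a grade of B under these thresholds.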

Chat Log Extractor

Auto-detects format from WhatsApp, Slack, Discord, IRC, and generic patterns. Builds participant profiles (message count, word average), extracts shared links with context, identifies topics via keyword frequency, and counts media messages.
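
The participant and topic statistics can be approximated in a few lines. This sketch assumes a bare `Name: message` line format, a tiny stopword list, and a minimum keyword length of four letters. These are illustrative choices, not the actor's actual patterns for WhatsApp, Slack, Discord, or IRC.

```python
import re
from collections import Counter, defaultdict
from typing import Any

LINE_RE = re.compile(r"^(?P<sender>[^:\n]{1,40}):\s+(?P<text>.+)$")
URL_RE = re.compile(r"https?://\S+")
# Assumption: a very small stopword list, enough for the example.
STOPWORDS = {"will", "again", "that", "this", "with", "have", "from"}


def analyze_chat(log: str) -> dict[str, Any]:
    word_counts = defaultdict(list)  # sender -> word count per message
    links: list[str] = []
    keywords: Counter = Counter()

    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like "Name: message"
        sender, text = match["sender"], match["text"]
        links.extend(URL_RE.findall(text))
        # Drop URLs before tokenizing so they do not pollute the topics.
        words = re.findall(r"[a-z']+", URL_RE.sub(" ", text).lower())
        word_counts[sender].append(len(words))
        keywords.update(w for w in words if len(w) > 3 and w not in STOPWORDS)

    participants = {
        sender: {
            "message_count": len(counts),
            "avg_words": round(sum(counts) / len(counts), 1),
        }
        for sender, counts in word_counts.items()
    }
    topics = [word for word, _ in keywords.most_common(3)]
    return {"participants": participants, "links": links, "topics": topics}
```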


🧪 Running Locally

# Install dependencies
pip install -r requirements.txt
# Run with Apify CLI
apify run --input-file=INPUT.json

📋 Example Output (Compact Resume)

{
    "name": "John Doe",
    "email": "johndoe@email.com",
    "phone": "(555) 123-4567",
    "location": "New York, NY",
    "summary": "Full-stack developer with 5+ years...",
    "skills": ["Python", "JavaScript", "TypeScript", "React", "Django", "AWS"],
    "experience_count": 2,
    "education_count": 1,
    "certifications": ["AWS Solutions Architect Associate", "Certified Kubernetes Administrator"],
    "languages": ["English (Native)", "French (Conversational)"]
}
