URL: https://apify.com/getascraper/sec-edgar-rag-extractor

SEC EDGAR Scraper for RAG: 10-K/10-Q/8-K as JSON

Pricing

from $20.00 / 1,000 extracted SEC filings

Extract SEC EDGAR filings (10-K, 10-Q, 8-K) as fixed-token text chunks of the primary documents, for finance LLMs and compliance RAG. Drop-in for LlamaIndex and LangChain. Skip manual XBRL parsing. $0.02/filing.

Developer

๐Ÿ‘ GetAScraper

GetAScraper

Maintained by Community

Extract SEC EDGAR filings into RAG-ready text chunks for finance and compliance LLMs. Get 10-K annual reports, 10-Q quarterly reports, and 8-K material events pre-chunked as JSON. Drop-in ready for LlamaIndex, LangChain, Pinecone, and Qdrant. Built for AI training data teams, buy-side research assistants, and M&A intelligence platforms. Skip manual HTML/XBRL parsing and messy SEC entity decoding.

What does SEC EDGAR RAG Extractor do?

This Actor fetches corporate filings directly from the SEC EDGAR database. It pulls the HTML primary document, strips out the noise (such as <ix:header> metadata and table styling), extracts the plain text, and slices it into fixed-token chunks with overlap. You get clean, LLM-ready JSON arrays representing the core text of the filing.
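The chunking step described above can be illustrated with a minimal sketch. This is not the Actor's own code: it assumes whitespace-separated words stand in for tokenizer tokens, and the overlap of 64 is a hypothetical value (the listing only states that chunks are fixed-size with overlap). The output keys mirror the idx, text, and tokens fields shown in the Output section.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for idx, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + chunk_size]
        chunks.append({"idx": idx, "text": " ".join(window), "tokens": len(window)})
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def chunk_text(text, chunk_size=512, overlap=64):
    # Assumption: words approximate tokens; a real pipeline would use
    # the tokenizer of the target model instead of str.split().
    return chunk_tokens(text.split(), chunk_size, overlap)
```

With chunk_size=4 and overlap=1, a ten-word text yields three chunks, each starting with the last word of the previous one.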

Try it with Apple's CIK 0000320193 or search for "artificial intelligence risk factors". The Actor runs on the Apify platform with built-in SEC rate limiting.

Why use SEC EDGAR RAG Extractor?

  • Finance AI: Train models on clean corporate disclosures without writing custom HTML parsers.
  • Compliance RAG: Build chatbots that cite specific regulatory filings accurately.
  • M&A Research: Feed target company 10-Ks into your intelligence pipeline.
  • Sell-side Research: Monitor 8-K events and earnings drift automatically.

How to use SEC EDGAR RAG Extractor

  1. Set your User-Agent. The SEC requires a real name and email.
  2. Provide a list of CIKs (e.g., 0000320193 for Apple) or enter a Search Query (e.g., "AI risks").
  3. Select the Form Types you want (10-K, 10-Q, 8-K).
  4. Set your Date Range and Max Filings cap.
  5. Click "Start" and download the chunked JSON.

Input

Provide standard SEC parameters. Here is a JSON example:

{
"cikList":["0000320193"],
"formTypes":["10-K"],
"dateFrom":"2024-01-01",
"dateTo":"2024-12-31",
"maxFilings":5,
"searchQuery":"",
"userAgent":"Jane Smith jane@acme.com"
}
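If you generate this input programmatically, a small helper can catch malformed values before the run starts. The sketch below is an assumption-laden illustration: the field names come from the example above, but build_run_input and normalize_cik are hypothetical helper names, and the checks reflect only what this listing states (10-digit CIKs, three supported form types, ISO dates).

```python
import re
from datetime import date

SUPPORTED_FORMS = {"10-K", "10-Q", "8-K"}  # form types listed for v1


def normalize_cik(cik):
    """Zero-pad a CIK to the 10-digit string form used in the example input."""
    digits = str(cik).strip()
    if not re.fullmatch(r"[0-9]{1,10}", digits):
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def build_run_input(ciks, form_types, date_from, date_to, user_agent,
                    max_filings=5, search_query=""):
    """Assemble the Actor input dict (hypothetical helper, not part of the Actor)."""
    unknown = set(form_types) - SUPPORTED_FORMS
    if unknown:
        raise ValueError(f"unsupported form types: {sorted(unknown)}")
    # fromisoformat raises ValueError on anything that is not YYYY-MM-DD.
    if date.fromisoformat(date_from) > date.fromisoformat(date_to):
        raise ValueError("dateFrom must not be after dateTo")
    return {
        "cikList": [normalize_cik(c) for c in ciks],
        "formTypes": list(form_types),
        "dateFrom": date_from,
        "dateTo": date_to,
        "maxFilings": max_filings,
        "searchQuery": search_query,
        "userAgent": user_agent,
    }
```

Calling build_run_input([320193], ["10-K"], "2024-01-01", "2024-12-31", "Jane Smith jane@acme.com") reproduces the example input above.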

Output

The Actor outputs one record per filing. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

{
"accession_no":"0000320193-24-000123",
"cik":"0000320193",
"company_name":"Apple Inc.",
"ticker":"AAPL",
"form_type":"10-K",
"filing_date":"2024-11-01",
"period_of_report":"2024-09-28",
"filing_url":"https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm",
"source":"full_text",
"chunks":[
{
"idx":0,
"text":"Item 1. Business...",
"tokens":512
}
]
}
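Most vector stores expect one document per chunk rather than one per filing. The following sketch flattens a record of the shape shown above into per-chunk documents with the filing metadata attached. The function name, the "accession_no:idx" ID scheme, and the output layout are illustrative assumptions, not part of the Actor or of any particular framework.

```python
METADATA_KEYS = (
    "accession_no", "cik", "company_name", "ticker", "form_type",
    "filing_date", "period_of_report", "filing_url", "source",
)


def filing_to_documents(record):
    """Turn one filing record into a list of per-chunk documents."""
    # Copy filing-level fields; missing keys (e.g. no ticker) become None.
    metadata = {key: record.get(key) for key in METADATA_KEYS}
    documents = []
    for chunk in record.get("chunks", []):
        documents.append({
            # Hypothetical ID scheme: accession number plus chunk index.
            "id": f"{record['accession_no']}:{chunk['idx']}",
            "text": chunk["text"],
            "metadata": {**metadata, "chunk_idx": chunk["idx"],
                         "tokens": chunk["tokens"]},
        })
    return documents
```

Each resulting dict carries enough metadata to cite the source filing when the chunk is retrieved.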

Data table

Field         Type    Description
accession_no  String  Unique SEC identifier for the filing
cik           String  10-digit Central Index Key
company_name  String  Filer name
ticker        String  Stock ticker (if available)
form_type     String  10-K, 10-Q, or 8-K
filing_date   String  Date submitted to the SEC
source        String  Indicates extraction depth (full_text or exhibits_stripped)
filing_url    String  Link to the SEC Archives primary document
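The cik and accession_no fields are related to filing_url: in the example output, the archive folder is the CIK without leading zeros followed by the accession number without dashes. A short sketch of that relationship follows; filing_folder_url is a hypothetical helper name, and it returns only the folder, since the primary document's file name cannot be derived from these two fields.

```python
def filing_folder_url(cik, accession_no):
    """Build the EDGAR archive folder URL for a filing."""
    cik_number = int(cik)  # drops the leading zeros of the 10-digit CIK
    accession_compact = accession_no.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik_number}/{accession_compact}/"
    )
```

For the example record, this gives the folder that contains aapl-20240928.htm.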

Pricing / Cost estimation

This Actor is priced at $0.02 per filing. For example, a run that extracts 1,000 10-K and 10-Q filings costs $20.00 (1,000 × $0.02). You only pay for successful extractions.
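For budgeting larger runs, the arithmetic can be written out as a small sketch. It assumes the flat $0.02 per-filing price stated above and ignores any Apify platform charges that may apply separately; estimate_cost is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_FILING = Decimal("0.02")  # per-filing price stated in this listing


def estimate_cost(successful_filings, price_per_filing=PRICE_PER_FILING):
    """Return the estimated charge in USD for a number of extracted filings."""
    if successful_filings < 0:
        raise ValueError("filing count cannot be negative")
    # Decimal avoids binary floating-point rounding in currency math.
    return price_per_filing * successful_filings
```

Using Decimal keeps the result exact, so 1,000 filings come out as 20.00 rather than a float approximation.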

User-Agent warning

The SEC strictly requires a valid User-Agent header containing your name and email. The default placeholder will be rejected with a 403 Forbidden error, crashing the run. Please override the userAgent input field with your real contact information before starting.
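A quick pre-flight check can catch an obviously incomplete User-Agent before the run starts. The sketch below is a rough heuristic under stated assumptions: it only tests that the string contains something shaped like an email address plus some other text for the name. It does not reproduce the SEC's actual validation, and looks_like_sec_user_agent is a hypothetical helper name.

```python
import re

# Loose email shape: something@something.something, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_sec_user_agent(user_agent):
    """Return True if the string has an email address and a non-empty name part."""
    match = EMAIL_RE.search(user_agent)
    if match is None:
        return False
    # Whatever remains after removing the email is treated as the name.
    name = (user_agent[:match.start()] + user_agent[match.end():]).strip()
    return bool(name)
```

"Jane Smith jane@acme.com" passes this check, while a bare name or a bare email does not.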

Tips / Advanced options

  • Narrow via search: Use the full-text search query field to build a highly targeted RAG corpus instead of pulling every filing for a CIK.
  • Filing sizes: 10-K annual reports are very long. Expect 20 to 50 text chunks (512 tokens each) per filing.
  • Rate limiting: The Actor automatically paces requests at the SEC ceiling of 10 requests per second for maximum throughput without IP bans.
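The pacing mentioned in the rate-limiting tip can be sketched as a simple interval limiter. This is an illustration of the general technique, not the Actor's internal implementation; the RateLimiter class and its injectable clock and sleep parameters are hypothetical and exist only to make the sketch testable.

```python
import time


class RateLimiter:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling limiter.wait() before each HTTP request keeps a single worker at or below the configured rate; concurrent workers would need to share one limiter behind a lock.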

Legal disclaimer and limitations

SEC EDGAR data is public domain. Please respect the SEC fair-access policy. Limitations:

  • v1 supports 10-K, 10-Q, and 8-K bodies only. No 13F, S-1, or DEF 14A.
  • v1 does not parse inline XBRL tables for numerical extraction.
  • Text-format exhibits are concatenated into the main body text; binary exhibits are skipped.

FAQ

Why do I need a User-Agent? The SEC blocks automated traffic that doesn't identify itself. A valid name and email allow them to contact you if your traffic causes issues.

Why are some filings marked source: exhibits_stripped? If a filing contains complex or binary attachments that fail to parse cleanly, the Actor falls back to extracting just the primary document body to ensure you still get data.

Can I get 13F filings? S-1? DEF 14A? Not in v1. We focused on the core financial disclosures first.

Support

Found a bug or need a feature? Open an issue on our GitHub repository.
