URL: https://apify.com/nexgendata/sec-filings-rag-markdown?fpr=2ayu9b

SEC Filings to Markdown for RAG: EDGAR LLM Dataset · Apify


Pricing

from $40.00 / 1,000 markdown chunks

SEC Filings to Markdown for RAG

Convert SEC EDGAR filings (10-K, 10-Q, 8-K, 13F) into clean, chunked, citation-tagged Markdown for RAG and LLM pipelines. Official data, no login.

Developer: NexGenData (maintained by Community)

📑 SEC Filings to Markdown for RAG · EDGAR → LLM-Ready

Convert SEC EDGAR filings into clean, chunked, citation-tagged Markdown, built for AI engineers feeding financial filings into RAG pipelines and LLM agents.

⚡ What you get

Field | Description
company / cik / ticker | Issuer identity
form | Filing type (10-K, 10-Q, 8-K, 13F, …)
filingDate | Date filed
accessionNumber | SEC accession number (citation)
sourceUrl | Direct link to the source document (citation)
chunkIndex / totalChunks | Position within the filing
markdown | Clean Markdown chunk, ready for embedding

🎯 Use cases

  1. AI engineers building financial-research copilots / RAG over filings
  2. Quant & fundamental analysts loading 10-Ks into a vector store
  3. Compliance teams building searchable filing knowledge bases
  4. Fintech products needing LLM-ready filing text with citations

🚀 Sample inputs

{"companies":["AAPL","MSFT"],"formTypes":["10-K","10-Q"],"maxFilingsPerCompany":2,"chunkWords":800}
{"companies":["NVDA"],"formTypes":["8-K"],"maxFilingsPerCompany":5}
{"companies":["320193"],"formTypes":["13F-HR"]}
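
The inputs above can also be assembled programmatically. The following is a minimal Python sketch, not part of the Actor: `build_run_input` and its validation rules are hypothetical, and only the four field names (`companies`, `formTypes`, `maxFilingsPerCompany`, `chunkWords`) come from the samples.

```python
import json


def build_run_input(companies, form_types, max_filings_per_company=None, chunk_words=None):
    """Assemble an input object shaped like the sample inputs (hypothetical helper)."""
    if not companies:
        raise ValueError("companies must contain at least one ticker or CIK")
    if not form_types:
        raise ValueError("form_types must contain at least one form type")

    run_input = {
        "companies": [str(c).strip() for c in companies],
        "formTypes": [str(f).strip() for f in form_types],
    }
    # Optional fields are only emitted when set, as in the second and third samples.
    if max_filings_per_company is not None:
        if int(max_filings_per_company) < 1:
            raise ValueError("max_filings_per_company must be at least 1")
        run_input["maxFilingsPerCompany"] = int(max_filings_per_company)
    if chunk_words is not None:
        if int(chunk_words) < 1:
            raise ValueError("chunk_words must be at least 1")
        run_input["chunkWords"] = int(chunk_words)
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_run_input(["AAPL", "MSFT"], ["10-K", "10-Q"], 2, 800)))
```

The resulting dictionary matches the first sample and can be supplied as the Actor's run input.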

📦 Sample output

{"company":"Apple Inc.","cik":"0000320193","ticker":"AAPL","form":"10-K",
"filingDate":"2025-11-01","accessionNumber":"0000320193-25-000123",
"sourceUrl":"https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm",
"chunkIndex":0,"totalChunks":42,"markdown":"# Item 1. Business\nThe Company designs..."}
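
A row like the one above can be loaded into a typed record before embedding. This is a minimal sketch: the `FilingChunk` class, `parse_chunk`, and the citation format are illustrative assumptions, while the field names and values are copied from the sample row (including its elided URL).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingChunk:
    company: str
    cik: str
    ticker: str
    form: str
    filing_date: str
    accession_number: str
    source_url: str
    chunk_index: int
    total_chunks: int
    markdown: str


def parse_chunk(row: dict) -> FilingChunk:
    """Map one output row (camelCase keys) onto the dataclass."""
    return FilingChunk(
        company=row["company"],
        cik=row["cik"],
        ticker=row["ticker"],
        form=row["form"],
        filing_date=row["filingDate"],
        accession_number=row["accessionNumber"],
        source_url=row["sourceUrl"],
        chunk_index=int(row["chunkIndex"]),
        total_chunks=int(row["totalChunks"]),
        markdown=row["markdown"],
    )


def citation(chunk: FilingChunk) -> str:
    """Build a human-readable citation; this format is an assumption, not the Actor's."""
    return (
        f"{chunk.company}, Form {chunk.form}, filed {chunk.filing_date}, "
        f"accession {chunk.accession_number}, "
        f"chunk {chunk.chunk_index + 1}/{chunk.total_chunks}"
    )


# Values copied from the sample output; the URL is elided there and left as-is.
SAMPLE_ROW = {
    "company": "Apple Inc.", "cik": "0000320193", "ticker": "AAPL", "form": "10-K",
    "filingDate": "2025-11-01", "accessionNumber": "0000320193-25-000123",
    "sourceUrl": "https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm",
    "chunkIndex": 0, "totalChunks": 42,
    "markdown": "# Item 1. Business\nThe Company designs...",
}

if __name__ == "__main__":
    print(citation(parse_chunk(SAMPLE_ROW)))
```

Storing the citation string alongside each embedding lets a RAG answer point back to the exact filing and chunk.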

🛠 How it works

  1. Source: resolves tickers→CIK and reads filings from the official SEC EDGAR APIs (data.sec.gov, www.sec.gov/Archives).
  2. Parser: strips scripts/styles and converts filing HTML to ATX Markdown.
  3. Chunking: splits the Markdown into chunks of roughly chunkWords words for embedding.
  4. Schema: one row per chunk with full citation fields (accession + source URL).
  5. Fallback: unresolved tickers and failed documents are logged and skipped; the run still succeeds.
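
The chunking and schema steps can be sketched as follows. This is an illustration under stated assumptions, not the Actor's actual implementation: it packs blank-line-separated blocks greedily up to a word budget so headings stay attached to their text, and `chunk_markdown` is a hypothetical name. Only `chunkWords` (default 800), `chunkIndex`, `totalChunks`, and `markdown` come from the listing.

```python
def chunk_markdown(markdown: str, chunk_words: int = 800) -> list:
    """Split Markdown into chunks of at most ~chunk_words words.

    Blocks (separated by blank lines) are never split, so a single block
    longer than chunk_words becomes one oversized chunk.
    """
    if chunk_words < 1:
        raise ValueError("chunk_words must be at least 1")

    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    chunks, current, count = [], [], 0
    for block in blocks:
        n = len(block.split())
        # Flush the current chunk if adding this block would exceed the budget.
        if current and count + n > chunk_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(block)
        count += n
    if current:
        chunks.append("\n\n".join(current))

    # One row per chunk, carrying its position within the filing.
    return [
        {"chunkIndex": i, "totalChunks": len(chunks), "markdown": text}
        for i, text in enumerate(chunks)
    ]


if __name__ == "__main__":
    for row in chunk_markdown("# A\n\none two three\n\nfour five six\n\nseven", chunk_words=4):
        print(row)
```

In a full pipeline each row would also carry the citation fields (accession number and source URL) from the filing it came from.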

💰 Pricing Example

Pay-per-event: $0.005 per run + $0.04 per Markdown chunk (document-record).

Chunks | Cost
100 | ~$4.00
500 | ~$20.00
2,000 | ~$80.00

Apify's $5 free credit covers ~124 chunks.
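
The pricing arithmetic can be checked with a short sketch. The two fees come from the pay-per-event line above; the function names are hypothetical, and the estimate assumes a single run with no other platform charges.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.005")   # per run, from the listing
CHUNK_FEE = Decimal("0.04")  # per Markdown chunk, from the listing


def estimate_cost(chunks: int) -> Decimal:
    """Cost in USD of one run that emits `chunks` chunks."""
    return RUN_FEE + CHUNK_FEE * chunks


def chunks_within_budget(budget) -> int:
    """Largest whole number of chunks that one run can emit within `budget` USD."""
    remaining = Decimal(str(budget)) - RUN_FEE
    if remaining < 0:
        return 0
    return int(remaining // CHUNK_FEE)


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(n, estimate_cost(n))
    print(chunks_within_budget(5))
```

For 100 chunks this gives $4.005, which the table rounds to ~$4.00, and a $5 budget yields 124 chunks after the run fee.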

⚖️ Legal & data sources

Data is from the public SEC EDGAR system (data.sec.gov, www.sec.gov): U.S. government public-domain filings. Requests use an identified, contact-bearing User-Agent per SEC access guidance. You are responsible for your downstream use.
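
For readers calling EDGAR directly, a contact-bearing User-Agent might be attached as in the sketch below. It only builds the request and sends nothing. The submissions URL pattern on data.sec.gov is SEC's public endpoint, but whether the Actor uses it is an assumption; the helper names, the "@" check, and the example contact string are all hypothetical.

```python
import urllib.request


def normalize_cik(cik) -> str:
    """Zero-pad a CIK to 10 digits, e.g. 320193 -> 0000320193."""
    return str(int(cik)).zfill(10)


def build_submissions_request(cik, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request for an issuer's filing index."""
    # Heuristic only: require something that looks like a contact email.
    if "@" not in user_agent:
        raise ValueError("user_agent should include a contact email address")
    url = f"https://data.sec.gov/submissions/CIK{normalize_cik(cik)}.json"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical contact string; replace with your own organization and email.
    req = build_submissions_request("320193", "ExampleCo Research admin@example.com")
    print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch.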

❓ FAQ

Which forms are supported? Any EDGAR form type β€” pass them in formTypes (10-K, 10-Q, 8-K, 13F-HR, DEF 14A, …). Ticker or CIK? Either; tickers are resolved to CIK automatically. Are citations included? Yes β€” every chunk carries the accession number and source URL. How big are chunks? ~chunkWords words (default 800); tune for your embedder. Is the data fresh? Pulled live from EDGAR at run time. Cost control? Use maxFilingsPerCompany and formTypes to bound output.

🆘 Troubleshooting

  • Company not found: check the ticker symbol or pass the CIK directly.
  • 0 chunks: the requested formTypes may not exist in that issuer's recent filings.
  • Huge output: lower maxFilingsPerCompany or narrow formTypes.
  • Markdown noise from exhibits: narrow to the primary form types you need.

🏷️ About NexGenData

NexGenData builds structured public-data tools for analysts, developers, and operators. Full catalog: thenextgennexus.com.
