Pricing
from $40.00 / 1,000 markdown chunks
SEC Filings to Markdown for RAG
Convert SEC EDGAR filings (10-K, 10-Q, 8-K, 13F) into clean, chunked, citation-tagged Markdown for RAG and LLM pipelines. Official data, no login.
SEC Filings to Markdown for RAG · EDGAR → LLM-Ready
Convert SEC EDGAR filings into clean, chunked, citation-tagged Markdown, built for AI engineers feeding financial filings into RAG pipelines and LLM agents.
What you get
| Field | Description |
|---|---|
| company / cik / ticker | Issuer identity |
| form | Filing type (10-K, 10-Q, 8-K, 13F, …) |
| filingDate | Date filed |
| accessionNumber | SEC accession number (citation) |
| sourceUrl | Direct link to the source document (citation) |
| chunkIndex / totalChunks | Position within the filing |
| markdown | Clean Markdown chunk, ready for embedding |
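Because each row carries chunkIndex and totalChunks, a consumer can regroup rows per filing and check that nothing is missing before embedding. The following is a minimal sketch, assuming chunkIndex is zero-based (the sample output on this page shows index 0) and that rows arrive as plain dictionaries; the helper names group_chunks and is_complete are hypothetical and not part of the Actor.

```python
from collections import defaultdict


def group_chunks(rows):
    """Group chunk rows by accessionNumber and sort each group by chunkIndex."""
    by_filing = defaultdict(list)
    for row in rows:
        by_filing[row["accessionNumber"]].append(row)
    for chunks in by_filing.values():
        chunks.sort(key=lambda r: r["chunkIndex"])
    return dict(by_filing)


def is_complete(chunks):
    """Return True when a sorted group holds every index 0..totalChunks-1 once.

    Assumption: chunkIndex is zero-based, as the sample row suggests.
    """
    if not chunks:
        return False
    total = chunks[0]["totalChunks"]
    return [c["chunkIndex"] for c in chunks] == list(range(total))
```

Filings that fail the check can be re-requested or excluded before they reach the vector store.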
Use cases
- AI engineers building financial-research copilots / RAG over filings
- Quant & fundamental analysts loading 10-Ks into a vector store
- Compliance teams building searchable filing knowledge bases
- Fintech products needing LLM-ready filing text with citations
Sample inputs
{"companies":["AAPL","MSFT"],"formTypes":["10-K","10-Q"],"maxFilingsPerCompany":2,"chunkWords":800}
{"companies":["NVDA"],"formTypes":["8-K"],"maxFilingsPerCompany":5}
{"companies":["320193"],"formTypes":["13F-HR"]}
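If you generate inputs programmatically, a small builder keeps the field names consistent with the samples above. This is a sketch only: build_input is a hypothetical helper, the empty-list check is an assumption about what a sensible input looks like, and optional fields are simply omitted so the Actor's own defaults apply.

```python
import json


def build_input(companies, form_types, max_filings_per_company=None, chunk_words=None):
    """Assemble an input object using the field names shown in the samples."""
    if not companies:
        # Assumption: an input with no companies is never useful.
        raise ValueError("companies must contain at least one ticker or CIK")
    run_input = {"companies": list(companies), "formTypes": list(form_types)}
    # Leave optional fields out when unset so the Actor's defaults are used.
    if max_filings_per_company is not None:
        run_input["maxFilingsPerCompany"] = int(max_filings_per_company)
    if chunk_words is not None:
        run_input["chunkWords"] = int(chunk_words)
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_input(["AAPL", "MSFT"], ["10-K", "10-Q"], 2, 800)))
```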
Sample output
{"company":"Apple Inc.","cik":"0000320193","ticker":"AAPL","form":"10-K","filingDate":"2025-11-01","accessionNumber":"0000320193-25-000123","sourceUrl":"https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm","chunkIndex":0,"totalChunks":42,"markdown":"# Item 1. Business\nThe Company designs..."}
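A row like the one above can be loaded into a typed record and turned into a human-readable citation for RAG answers. The sketch below assumes each row contains exactly the fields shown in the sample; FilingChunk, parse_chunk and citation are hypothetical names, and the citation format is only one possible choice.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingChunk:
    company: str
    cik: str
    ticker: str
    form: str
    filingDate: str
    accessionNumber: str
    sourceUrl: str
    chunkIndex: int
    totalChunks: int
    markdown: str


def parse_chunk(line):
    """Parse one JSON row; raises TypeError if fields are missing or unexpected."""
    return FilingChunk(**json.loads(line))


def citation(chunk):
    """Format a short citation string; chunk numbers are shown one-based."""
    return (
        f"{chunk.company}, Form {chunk.form}, filed {chunk.filingDate}, "
        f"accession {chunk.accessionNumber}, "
        f"chunk {chunk.chunkIndex + 1} of {chunk.totalChunks}"
    )
```

Storing the citation string (or the accessionNumber and sourceUrl themselves) as vector-store metadata lets the LLM quote its source alongside each retrieved passage.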
How it works
- Source: resolves tickers → CIK and reads filings from the official SEC EDGAR APIs (data.sec.gov, www.sec.gov/Archives).
- Parser: strips scripts/styles and converts filing HTML to ATX Markdown.
- Chunking: splits into ~chunkWords-word chunks for embedding.
- Schema: one row per chunk with full citation fields (accession + source URL).
- Fallback: unresolved tickers / failed docs are logged and skipped; the run still succeeds.
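The chunking and schema steps above can be illustrated with a short sketch. This is not the Actor's actual implementation: it assumes a greedy strategy that packs blank-line-separated Markdown blocks until the word target would be exceeded, and the names chunk_markdown and to_rows are hypothetical. A single block longer than the target becomes its own oversized chunk.

```python
def chunk_markdown(markdown, chunk_words=800):
    """Greedily pack blank-line-separated blocks into chunks of roughly chunk_words words."""
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    chunks, current, count = [], [], 0
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        n = len(block.split())
        # Close the current chunk before adding a block that would exceed the target.
        if current and count + n > chunk_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(block)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_rows(markdown, citation_fields, chunk_words=800):
    """Emit one row per chunk, copying the citation fields onto every row."""
    chunks = chunk_markdown(markdown, chunk_words)
    return [
        {**citation_fields, "chunkIndex": i, "totalChunks": len(chunks), "markdown": text}
        for i, text in enumerate(chunks)
    ]
```

Splitting on block boundaries rather than raw word offsets keeps headings and paragraphs intact, at the cost of chunk sizes that only approximate the target.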
Related Actors
- SEC EDGAR Scraper: structured filing data
- SEC Form 13F Holdings Tracker: institutional holdings
- RAG Web Browser: web content for retrieval
- Website Content Crawler: full-site Markdown for AI
Pricing Example
Pay-per-event: $0.005 per run + $0.04 per Markdown chunk (document-record).
| Chunks | Cost |
|---|---|
| 100 | ~$4.00 |
| 500 | ~$20.00 |
| 2,000 | ~$80.00 |

Apify's $5 free credit covers ~124 chunks.
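The figures above follow directly from the two pay-per-event rates. A small calculator, sketched here with the rates quoted on this page ($0.005 per run, $0.04 per chunk), reproduces them; estimate_cost and chunks_within_budget are hypothetical helper names, and any platform charges beyond those two events are not modeled.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.005")   # per run, as quoted on this page
CHUNK_FEE = Decimal("0.04")  # per Markdown chunk, as quoted on this page


def estimate_cost(chunks, runs=1):
    """Return the estimated cost in USD for a number of chunks and runs."""
    return RUN_FEE * runs + CHUNK_FEE * chunks


def chunks_within_budget(budget, runs=1):
    """Return how many whole chunks fit in a USD budget after run fees."""
    remaining = Decimal(str(budget)) - RUN_FEE * runs
    if remaining < 0:
        return 0
    return int(remaining // CHUNK_FEE)
```

For a single run, estimate_cost(100) comes to 4.005 USD and chunks_within_budget(5) comes to 124, matching the table and the free-credit note.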
Legal & data sources
Data comes from the public SEC EDGAR system (data.sec.gov, www.sec.gov): U.S. government public-domain filings. Requests use an identified, contact-bearing User-Agent per SEC access guidance. You are responsible for your downstream use.
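If you follow a sourceUrl back to EDGAR from your own code, the same courtesy applies: identify yourself in the User-Agent. The sketch below only builds the request and does not send it; build_sec_request is a hypothetical helper, the contact string is a placeholder, and the "name plus email" shape is an assumption based on the SEC's published access guidance rather than anything this Actor exposes.

```python
import urllib.request


def build_sec_request(url, contact):
    """Create (but do not send) a request carrying an identifying User-Agent.

    Assumption: contact looks like "Company Name admin@example.com".
    """
    if "@" not in contact:
        raise ValueError("contact should include an email address")
    return urllib.request.Request(url, headers={"User-Agent": contact})
```

Pass the resulting object to urllib.request.urlopen when you are ready to fetch, and keep your request rate modest.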
FAQ
Which forms are supported? Any EDGAR form type; pass them in formTypes (10-K, 10-Q, 8-K, 13F-HR, DEF 14A, …).
Ticker or CIK? Either; tickers are resolved to CIK automatically.
Are citations included? Yes, every chunk carries the accession number and source URL.
How big are chunks? ~chunkWords words (default 800); tune for your embedder.
Is the data fresh? Pulled live from EDGAR at run time.
Cost control? Use maxFilingsPerCompany and formTypes to bound output.
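On the ticker-or-CIK question, a caller can normalize identifiers before building the companies list. The sketch below assumes an all-digit string is a CIK and pads it to ten digits, matching the cik format in the sample output; it does not perform the ticker lookup itself, and classify_company is a hypothetical name.

```python
def classify_company(identifier):
    """Return ("cik", zero-padded CIK) for digit strings, else ("ticker", upper-cased)."""
    value = identifier.strip()
    if not value:
        raise ValueError("identifier must not be empty")
    if value.isdigit():
        if len(value) > 10:
            raise ValueError("a CIK has at most 10 digits")
        # Pad to the 10-digit form seen in the sample output's cik field.
        return ("cik", value.zfill(10))
    return ("ticker", value.upper())
```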
Troubleshooting
- Company not found: check the ticker symbol or pass the CIK directly.
- 0 chunks: the requested formTypes may not exist in that issuer's recent filings.
- Huge output: lower maxFilingsPerCompany or narrow formTypes.
- Markdown noise from exhibits: narrow to the primary form types you need.
About NexGenData
NexGenData builds structured public-data tools for analysts, developers, and operators. Full catalog: thenextgennexus.com.
