EPA Demand Scraper, companies in violation, nationwide
Pricing
Pay per usage
Every U.S. company currently in EPA violation across all four programs, pulled into one dataset. The only filter is volume.
EPA Demand Scraper
Every U.S. company currently in EPA violation, across all four programs (Clean Water, Clean Air, Hazardous Waste, Drinking Water), nationwide, pulled into one Apify dataset. The only filter is volume. You slice by state, industry, or penalty yourself in the dataset.
This is a demand-side lead source: companies in trouble with the EPA right now. The buyers for this list are environmental, remediation, compliance, and safety firms.
Why it gets volume (the "why only 200?" fix)
Most EPA pulls die because they stack filters (one program, two states, a tight industry code) on a throttled API. This actor does the opposite:
- All four programs, unioned and deduped by EPA facility ID.
- Nationwide, in-violation, no industry filter.
- Bulk download (one CSV per program) instead of the throttled row-by-row API.
Clean Water Act alone is ~42,800 facilities in violation ($251M in penalties, verified live). Add the other three programs and you're well into six figures.
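The union-and-dedupe step can be sketched as follows. This is a minimal illustration, not the actor's actual code: the `ProgramRow` and `Company` shapes and the `unionByFacility` name are assumptions, and it assumes each program's CSV has already been parsed into rows.

```typescript
// Hypothetical shape of one parsed CSV row from a single program's download.
interface ProgramRow {
  registryId: string; // EPA facility ID, used as the dedupe key
  name: string;
  program: string; // e.g. "Clean Water Act"
}

// Hypothetical merged record: one per facility, listing every program it appears in.
interface Company {
  registryId: string;
  name: string;
  programs: string[];
}

// Union rows from all programs and collapse them to one record per facility ID.
function unionByFacility(rowsPerProgram: ProgramRow[][], maxResults: number): Company[] {
  // Prototype-free lookup table: facility ID -> position in `companies`.
  const positionById: { [id: string]: number } = Object.create(null);
  const companies: Company[] = [];
  for (const rows of rowsPerProgram) {
    for (const row of rows) {
      const position = positionById[row.registryId];
      if (position === undefined) {
        positionById[row.registryId] = companies.length;
        companies.push({ registryId: row.registryId, name: row.name, programs: [row.program] });
      } else if (companies[position].programs.indexOf(row.program) === -1) {
        companies[position].programs.push(row.program);
      }
    }
  }
  // First-seen order is preserved, so the cap keeps the earliest facilities.
  return companies.slice(0, maxResults);
}

// The same facility showing up under two programs becomes one company.
const merged = unionByFacility(
  [
    [{ registryId: "110000871234", name: "Del Valle Processing LLC", program: "Clean Water Act" }],
    [{ registryId: "110000871234", name: "Del Valle Processing LLC", program: "RCRA (Hazardous Waste)" }],
  ],
  5000,
);
console.log(merged.length, merged[0].programs);
```

Deduping before applying the cap means `maxResults` counts companies rather than raw program rows.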
Input
{"maxResults":5000}
- maxResults: how many companies you want (1 to 200,000). The only knob.
- programs (optional, advanced): defaults to all four; you normally leave it.
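A rough sketch of how the input could be normalized before a run. The program keys, the `resolveInput` name, and the choice to clamp out-of-range values (rather than reject them) are all assumptions made for illustration; the actor's real validation may differ.

```typescript
// Hypothetical short keys for the four programs.
const ALL_PROGRAMS: string[] = ["CWA", "CAA", "RCRA", "SDWA"];
const MIN_RESULTS = 1;
const MAX_RESULTS = 200000;

interface ActorInput {
  maxResults: number;
  programs?: string[];
}

interface ResolvedInput {
  maxResults: number;
  programs: string[];
}

// Clamp maxResults into the documented 1 to 200,000 range and default programs to all four.
function resolveInput(input: ActorInput): ResolvedInput {
  const floored = Math.floor(input.maxResults);
  const maxResults = isNaN(floored)
    ? MIN_RESULTS
    : Math.min(MAX_RESULTS, Math.max(MIN_RESULTS, floored));
  const programs =
    input.programs !== undefined && input.programs.length > 0
      ? input.programs.slice()
      : ALL_PROGRAMS.slice();
  return { maxResults, programs };
}

console.log(resolveInput({ maxResults: 5000 }));
```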
Output (one row per company)
{
  "company": "Del Valle Processing LLC",
  "registry_id": "110000871234",
  "programs": ["Clean Water Act", "RCRA (Hazardous Waste)"],
  "violation_status": "Significant Non-Compliance",
  "address": "320 Industrial Blvd",
  "city": "Del Valle",
  "state": "TX",
  "zip": "78617",
  "county": "Travis",
  "naics_code": "311612",
  "naics_label": "Meat Processed from Carcasses",
  "last_inspection_date": "2025-09-30",
  "days_since_inspection": 141,
  "total_penalties_usd": 47500,
  "formal_actions": 2,
  "inspections_last_5yr": 5,
  "latitude": 30.1783,
  "longitude": -97.6042,
  "source_url": "https://echo.epa.gov/detailed-facility-report?fid=110000871234"
}
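Since the actor applies no state or industry filter, slicing happens on your side once the rows are exported. A minimal sketch, assuming the rows have been loaded into memory; the `Slice` type and `sliceRows` helper are hypothetical, and only the field names come from the output above.

```typescript
// Subset of the output row used for slicing; field names follow the sample row.
interface FacilityRow {
  company: string;
  state: string;
  naics_code: string;
  total_penalties_usd: number;
}

// Hypothetical filter description: every field is optional.
interface Slice {
  state?: string;
  naicsPrefix?: string; // e.g. "311" to match every NAICS code starting with 311
  minPenaltyUsd?: number;
}

// Keep only the rows that satisfy every criterion present in the slice.
function sliceRows(rows: FacilityRow[], slice: Slice): FacilityRow[] {
  return rows.filter((row) => {
    if (slice.state !== undefined && row.state !== slice.state) return false;
    if (slice.naicsPrefix !== undefined && row.naics_code.indexOf(slice.naicsPrefix) !== 0) return false;
    if (slice.minPenaltyUsd !== undefined && row.total_penalties_usd < slice.minPenaltyUsd) return false;
    return true;
  });
}

const sampleRows: FacilityRow[] = [
  { company: "Del Valle Processing LLC", state: "TX", naics_code: "311612", total_penalties_usd: 47500 },
  // The two rows below are made up for the example.
  { company: "Example Plating Co", state: "OH", naics_code: "332813", total_penalties_usd: 12000 },
  { company: "Example Foods Inc", state: "TX", naics_code: "311999", total_penalties_usd: 0 },
];

console.log(sliceRows(sampleRows, { state: "TX", naicsPrefix: "311", minPenaltyUsd: 10000 }));
```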
violation_status (SNC = significant non-compliance is the hottest) and
days_since_inspection are your prioritization signals: call the freshest,
worst-hit first.
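One way to turn those two signals into a call order is sketched below. The ordering (SNC first, then fewest days since inspection, then highest penalties as a tie-breaker) is one reading of "freshest, worst-hit first", not a rule the actor enforces. The non-SNC status string in the example is a placeholder.

```typescript
// Subset of the output row used for ranking; field names follow the sample row.
interface Lead {
  company: string;
  violation_status: string;
  days_since_inspection: number;
  total_penalties_usd: number;
}

// 0 for Significant Non-Compliance, 1 for everything else, so SNC sorts first.
function sncRank(lead: Lead): number {
  return lead.violation_status === "Significant Non-Compliance" ? 0 : 1;
}

// Return a new array ordered: SNC first, then most recently inspected, then largest penalties.
function prioritize(leads: Lead[]): Lead[] {
  return leads.slice().sort(
    (a, b) =>
      sncRank(a) - sncRank(b) ||
      a.days_since_inspection - b.days_since_inspection ||
      b.total_penalties_usd - a.total_penalties_usd,
  );
}

const callList = prioritize([
  { company: "Del Valle Processing LLC", violation_status: "Significant Non-Compliance", days_since_inspection: 141, total_penalties_usd: 47500 },
  // Made-up rows; "Violation Identified" is a placeholder status value.
  { company: "Example Plating Co", violation_status: "Violation Identified", days_since_inspection: 10, total_penalties_usd: 0 },
  { company: "Example Foods Inc", violation_status: "Significant Non-Compliance", days_since_inspection: 30, total_penalties_usd: 5000 },
]);
console.log(callList.map((lead) => lead.company));
```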
Note: EPA's export has no single "last violation date" field and no employee-count field, so this actor reports
last_inspection_date / days_since_inspection as the recency proxy and leaves company size to your own enrichment. We don't invent fields the source doesn't have.
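For reference, the recency proxy is plain date arithmetic. The sketch below assumes last_inspection_date is a YYYY-MM-DD string and counts whole UTC days; the helper name and the as-of date are assumptions, and the as-of date was picked only because it reproduces the 141 in the sample row.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole UTC days between a YYYY-MM-DD inspection date and an as-of date.
function daysSinceInspection(lastInspectionDate: string, asOf: Date): number {
  const parts = lastInspectionDate.split("-");
  const inspected = Date.UTC(Number(parts[0]), Number(parts[1]) - 1, Number(parts[2]));
  // Drop the time of day so a partial day never counts.
  const today = Date.UTC(asOf.getUTCFullYear(), asOf.getUTCMonth(), asOf.getUTCDate());
  return Math.floor((today - inspected) / MS_PER_DAY);
}

// Assumed as-of date of 2026-02-18 (months are zero-based in Date.UTC).
console.log(daysSinceInspection("2025-09-30", new Date(Date.UTC(2026, 1, 18)))); // 141
```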
Run locally
npm install
npm run build
apify run
Data source & verification
EPA ECHO REST services (echodata.epa.gov). Flow per program:
get_facilities (in-violation) → QueryID → get_download (full CSV).
CWA is live-verified. CAA / RCRA / SDWA use the same shape but their
violation-filter params were not yet live-confirmed (ECHO throttles at 300/hr and
we hit it during recon). The actor logs each program's facility count before
downloading. If a program logs 0 or UNVERIFIED, fix its violationParams
in src/programs.ts and re-run. Never trust a gov API param silently: it returns
empty or unfiltered without erroring.
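The per-program flow and the count check can be sketched without touching the network. Everything here is an assumption to confirm against the ECHO documentation: the base path, the service name, the `output` and `qid` query parameters, and the `ProgramConfig` shape (the real definitions live in src/programs.ts). The violation parameter in the example is a deliberate placeholder, not a real ECHO parameter.

```typescript
// Assumed base path for ECHO REST services.
const ECHO_BASE = "https://echodata.epa.gov/echo";

// Hypothetical per-program config.
interface ProgramConfig {
  label: string;
  service: string; // assumed service name, e.g. "cwa_rest_services"
  violationParams: { [key: string]: string }; // in-violation filter, must be live-confirmed
  verified: boolean; // false until the filter has been confirmed against live data
}

function toQuery(params: { [key: string]: string }): string {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

// Step 1: get_facilities with the in-violation filter; the response carries a QueryID.
function facilitiesUrl(program: ProgramConfig): string {
  const filter = toQuery(program.violationParams);
  return `${ECHO_BASE}/${program.service}.get_facilities?output=JSON${filter ? "&" + filter : ""}`;
}

// Step 2: get_download with that QueryID returns the full CSV.
function downloadUrl(program: ProgramConfig, queryId: string): string {
  return `${ECHO_BASE}/${program.service}.get_download?qid=${encodeURIComponent(queryId)}`;
}

// Log the count, then download only when the filter is verified and matched something.
function decide(program: ProgramConfig, facilityCount: number): "download" | "fix-params" {
  console.log(`${program.label}: ${facilityCount} facilities${program.verified ? "" : " (UNVERIFIED)"}`);
  return program.verified && facilityCount > 0 ? "download" : "fix-params";
}

const cwa: ProgramConfig = {
  label: "Clean Water Act",
  service: "cwa_rest_services",
  violationParams: { placeholder_violation_flag: "Y" }, // placeholder only
  verified: true,
};
console.log(facilitiesUrl(cwa));
console.log(decide(cwa, 0)); // "fix-params": a zero count means the filter needs checking
```

Treating a zero count as a failure rather than an empty result is the point of the gate: a wrong parameter produces no error, so the count is the only signal.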
