URL: https://apify.com/technicaldost/company-esg-sustainability-extractor

Company ESG & Sustainability Data Extractor

Pricing

from $10.00 / 1,000 esg extractions

Extract ESG and sustainability metrics, carbon commitments, and net-zero targets from public company sustainability pages. Structured JSON output for finance, research, and procurement teams.

Rating

0.0

(0)

Developer

Technical Dost Solutions

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

What this Actor does

Extract ESG and sustainability metrics, carbon commitments, and net-zero targets from public company sustainability and ESG report web pages that you supply.

It processes user-provided public URLs, reads schema.org Organization JSON-LD for the company name, scans visible page text for ESG keywords grouped by metric category (carbon, energy, water, waste, diversity, governance), pairs those keywords with nearby numeric values and units, and optionally captures net-zero and reduction-target commitment sentences. It normalizes useful fields, deduplicates rows, and saves structured records to the Apify dataset.
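The keyword-to-value pairing and deduplication described above can be sketched roughly as follows. This is an illustration only, not the Actor's actual code: the keyword library, the unit list, the 80-character window, and the function name `extract_metrics` are all assumptions made for the example.

```python
import re

# Hypothetical keyword library; the Actor's built-in list is not published.
KEYWORDS = {
    "carbon": ["scope 1 emissions", "scope 2 emissions", "scope 3 emissions"],
    "energy": ["renewable energy", "energy consumption"],
    "water": ["water withdrawal"],
}

# A number (thousands separators and decimals allowed) plus an optional unit.
# The unit list is an assumption for this sketch.
VALUE_RE = re.compile(
    r"(?P<value>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
    r"\s*(?P<unit>%|tCO2e|MWh|GWh|m3)?"
)


def extract_metrics(text: str, window: int = 80) -> list[dict]:
    """Pair each keyword hit with the first number found shortly after it."""
    rows = []
    seen = set()
    for category, names in KEYWORDS.items():
        for name in names:
            for hit in re.finditer(re.escape(name), text, re.IGNORECASE):
                # Only look at the text right after the keyword.
                nearby = text[hit.end(): hit.end() + window]
                found = VALUE_RE.search(nearby)
                if found is None:
                    continue
                value = float(found.group("value").replace(",", ""))
                unit = found.group("unit")
                key = (category, name, value, unit)
                if key in seen:  # drop exact duplicate rows
                    continue
                seen.add(key)
                rows.append({
                    "metricCategory": category,
                    "metricName": name,
                    "metricValue": value,
                    "unit": unit,
                    "extractionMethod": "text_extraction",
                })
    return rows


sample = (
    "In 2024, Scope 1 emissions were 125,000 tCO2e. "
    "Scope 1 emissions were 125,000 tCO2e. "
    "Renewable energy reached 42% of consumption."
)
for row in extract_metrics(sample):
    print(row)
```

Searching only after the keyword keeps a leading year such as "In 2024" from being mistaken for the metric value, but a heuristic like this can still pick the wrong number in dense sentences.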

Why this Actor is useful

Sustainability analysts, investors, and procurement teams pay for this kind of extraction because it converts unstructured ESG narrative reports into clean, comparable datasets. It saves manual reading, creates repeatable monitoring, feeds spreadsheets, dashboards, or scoring models, and turns public ESG pages into API-ready data.

Who this is for

  • ESG and sustainability analysts
  • Investment and ESG research teams
  • Corporate sustainability and procurement teams
  • Data providers and ESG rating builders
  • Journalists and NGOs tracking corporate climate claims
  • B2B teams enriching company sustainability profiles

Common use cases

  • Build comparable ESG metric datasets across many companies
  • Track net-zero and carbon-neutral commitments and target years
  • Monitor reported Scope 1/2/3 emissions over time
  • Enrich company profiles with sustainability data points
  • Feed ESG scoring or screening models

Input

  • startUrls: Public URLs to extract from. Use only pages you are allowed to access without login or bypassing access controls.
  • keywords: Optional additional ESG or sustainability terms to match on top of the built-in keyword library.
  • includeCommitments: Capture net-zero, carbon-neutral, and reduction-target sentences as commitment rows with an extracted target year.
  • maxItems: Maximum number of rows to save.
  • maxConcurrency: Number of pages processed in parallel. The default is intentionally conservative.
  • requestTimeoutSecs: Maximum time to spend on a single page.
  • proxyConfiguration: Optional Apify proxy configuration. Use it only where your review of the source website's rules permits proxy access.
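To make the includeCommitments option more concrete, here is a minimal sketch of how commitment sentences and a target year could be captured. The trigger phrases, the sentence splitter, and the rule "take the latest year from 2020 onward as the target year" are assumptions for illustration; the Actor's real rules are not documented here.

```python
import re

# Hypothetical trigger phrases for commitment sentences.
COMMITMENT_RE = re.compile(
    r"net[- ]zero|carbon[- ]neutral|reduction target", re.IGNORECASE
)
# Four-digit years from 2020 to 2099.
YEAR_RE = re.compile(r"\b(20[2-9]\d)\b")


def extract_commitments(text: str) -> list[dict]:
    """Return one commitment row per sentence that mentions a trigger phrase."""
    rows = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not COMMITMENT_RE.search(sentence):
            continue
        years = [int(y) for y in YEAR_RE.findall(sentence)]
        rows.append({
            "metricCategory": "commitment",
            "commitmentText": sentence.strip(),
            # Assumption: the latest year mentioned is the target year.
            "targetYear": max(years) if years else None,
            "extractionMethod": "commitment_text",
        })
    return rows


text = "We aim to reach net-zero emissions by 2040. Our offices are in Berlin."
print(extract_commitments(text))
```

A sentence without a recognizable year still produces a row in this sketch, with targetYear left as None rather than guessed.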

Output

  • companyName: Company name when exposed in Organization structured data.
  • sourceUrl: URL of the page the data was extracted from.
  • metricCategory: Category such as carbon, energy, water, waste, diversity, governance, commitment, or other.
  • metricName: The matched metric label (for example, Scope 1 emissions).
  • metricValue: The numeric value found near the metric keyword.
  • unit: Detected unit such as %, tCO2e, MWh, or similar.
  • reportingYear: Reporting year detected in the same sentence when available.
  • targetYear: Target year detected for commitment rows.
  • commitmentText: The captured net-zero or reduction-target sentence.
  • framework: Reporting frameworks referenced on the page (GRI, SASB, TCFD, CDP, SDG).
  • extractedAt: Timestamp when this Actor extracted the row.
  • extractionMethod: structured_data, text_extraction, or commitment_text.
  • confidenceScore: Heuristic confidence score (0.9 for structured data, 0.6 to 0.8 for text-derived values).
  • missingFields: Required fields (companyName, metricName, metricValue, reportingYear) not available from the source page.
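As a rough illustration of the missingFields field, the check below lists which of the four required fields are absent from a row. The helper name `find_missing_fields` is hypothetical, and treating None and the empty string as "missing" is an assumption.

```python
# Required fields as listed in the output description above.
REQUIRED_FIELDS = ("companyName", "metricName", "metricValue", "reportingYear")


def find_missing_fields(row: dict) -> list[str]:
    """Return the required fields that are absent, None, or empty in a row."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        # A value of 0 is a real value, so only None and "" count as missing.
        if value is None or value == "":
            missing.append(field)
    return missing


row = {
    "companyName": None,
    "metricName": "Scope 1 emissions",
    "metricValue": 125000,
    "reportingYear": None,
}
print(find_missing_fields(row))
```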

Sample input

{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ],
  "keywords": [],
  "includeCommitments": true,
  "maxItems": 50,
  "maxConcurrency": 3,
  "requestTimeoutSecs": 30
}

Sample output

{
  "companyName": "Example Manufacturing Group",
  "sourceUrl": "https://example.com/",
  "metricCategory": "carbon",
  "metricName": "Scope 1 emissions",
  "metricValue": 125000,
  "unit": "tCO2e",
  "reportingYear": 2024,
  "targetYear": null,
  "commitmentText": null,
  "framework": "GRI",
  "extractedAt": "2026-06-12T00:00:00.000Z",
  "extractionMethod": "structured_data",
  "confidenceScore": 0.9,
  "missingFields": []
}

How to use

Run this Actor on Apify with public URLs, export the dataset as JSON, CSV, Excel, or through the Apify API, then connect the output to Google Sheets, Make, Zapier, a webhook, your CRM, or an internal dashboard. For monitoring, save the input as an Apify task and schedule recurring runs.
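For API-based runs, a minimal sketch with the third-party `apify-client` Python package might look like this. It assumes the package is installed, that you supply your own API token, and that the Actor ID matches the Store URL of this page; the helper names `build_run_input` and `run_actor` are made up for the example.

```python
# Actor ID taken from this page's Store URL (assumed to be the callable ID).
ACTOR_ID = "technicaldost/company-esg-sustainability-extractor"


def build_run_input(urls, keywords=None, include_commitments=True, max_items=50):
    """Build an input object shaped like the sample input above."""
    return {
        "startUrls": [{"url": url} for url in urls],
        "keywords": list(keywords or []),
        "includeCommitments": include_commitments,
        "maxItems": max_items,
        "maxConcurrency": 3,
        "requestTimeoutSecs": 30,
    }


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset rows."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["https://example.com/"])
print(run_input)
# rows = run_actor("<YOUR_APIFY_TOKEN>", run_input)  # requires a real token
```

Keeping the input builder separate from the network call makes it easy to review or version the input before any billable run starts.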

Pricing

This Actor uses a pay-per-event model: $0.01 per extraction. You pay only for the structured rows the Actor produces, which keeps costs predictable and tied directly to delivered data.
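As a quick worked example of the per-row price, the sketch below multiplies a row count by $0.01. It covers only the per-extraction event stated above; the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Price per saved row, as stated in the pricing note above.
PRICE_PER_ROW = Decimal("0.01")


def estimate_cost(rows: int) -> Decimal:
    """Estimate the extraction charge in USD for a given number of rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return PRICE_PER_ROW * rows


print(estimate_cost(50))    # a run capped at maxItems = 50
print(estimate_cost(1000))  # matches the "per 1,000 extractions" headline price
```

Because maxItems caps the number of saved rows, it also caps this charge for a single run.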

Best practices

  • Start with a small set of reviewed public ESG and sustainability report URLs.
  • Prefer the main sustainability or ESG data pages rather than PDF download links.
  • Add domain-specific terms via keywords when a company uses non-standard metric names.
  • Keep includeCommitments enabled to capture net-zero and target language.
  • Keep maxConcurrency low for smaller websites.
  • Review source website rules before scheduling recurring runs.
  • Treat text-derived values as candidates for human review before downstream scoring.
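The last practice above can be sketched as a simple routing step: rows from structured data with a high score pass through, and everything else is queued for a human check. The 0.9 threshold and the function name `split_for_review` are assumptions, not part of the Actor.

```python
def split_for_review(rows, threshold=0.9):
    """Split rows into (accepted, needs_review) by method and confidence."""
    accepted, needs_review = [], []
    for row in rows:
        is_structured = row.get("extractionMethod") == "structured_data"
        score = row.get("confidenceScore") or 0.0
        if is_structured and score >= threshold:
            accepted.append(row)
        else:
            # Text-derived and commitment rows go to a reviewer first.
            needs_review.append(row)
    return accepted, needs_review


rows = [
    {"metricName": "Scope 1 emissions", "extractionMethod": "structured_data", "confidenceScore": 0.9},
    {"metricName": "Water withdrawal", "extractionMethod": "text_extraction", "confidenceScore": 0.7},
]
accepted, needs_review = split_for_review(rows)
print(len(accepted), len(needs_review))
```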

Compliance and responsible use

This Actor is for public data only. It must not be used to bypass logins, paywalls, CAPTCHAs, or security systems, collect private data, gather sensitive personal data, or support spam or abuse. You are responsible for following applicable laws and source website rules.

Limitations

  • Output quality depends on the public ESG content available on the source pages.
  • Text-derived extraction is heuristic. Numeric values and units are matched near keywords and may need human verification before use in scoring.
  • The Actor reads HTML pages and does not parse PDF reports.
  • Some fields may be empty when the source does not publish them, and they are reported in missingFields rather than inferred.
  • The Actor does not claim support for any specific third-party ESG platform.
  • Website markup and access policies can change.

Troubleshooting

  • Empty output usually means the page has no recognizable ESG keywords paired with numeric values.
  • Invalid URL errors mean one or more input URLs are malformed.
  • Slow runs can usually be improved by lowering maxConcurrency.
  • Missing fields reflect gaps in the source data. The Actor reports them in missingFields and does not infer values.
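To catch malformed URLs before a run, a pre-check along these lines may help. It is a sketch using only the Python standard library; the Actor's own validation rules are not documented here, so the criteria (http or https scheme plus a host) are an assumption.

```python
from urllib.parse import urlsplit


def is_valid_start_url(url: str) -> bool:
    """Return True for http(s) URLs that include a host name."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = ["https://example.com/", "example.com/esg", "ftp://example.com/report"]
valid = [url for url in candidates if is_valid_start_url(url)]
print(valid)
```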

Changelog

  • v0.2.0: Production-readiness pass with improved positioning, samples, schema descriptions, and responsible-use notes.
  • v0.1.0: Initial dry-run, factory-generated MVP.
