URL: https://apify.com/parseforge/pubchem-compound-scraper

PubChem Compound Scraper · Apify


Pricing

from $20.00 / 1,000 result items

PubChem Compound Scraper

Export chemical compound data from PubChem, the world's largest open chemistry database with 119M+ compounds. Look up by CID, name, SMILES, or InChIKey. Pull molecular formulas, weights, structures, synonyms, IUPAC names, and properties.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a month ago

PubChem Compound Scraper

Export chemistry data from PubChem in seconds. Look up 119M+ compounds by CID, name, SMILES, or InChIKey. Pull molecular formulas, weights, structures, IUPAC names, synonyms, and up to 23 computed properties.

Last updated: 2026-05-22 · 19 fields per record · 119M+ compounds · NIH official source · 4 lookup modes

The PubChem Compound Scraper taps PubChem, the world's largest open chemistry database, maintained by the NIH National Library of Medicine. The Actor returns 19 structured fields per record, including PubChem CID, IUPAC name, molecular formula and weight, canonical and isomeric SMILES, InChI, InChIKey, computed physicochemical properties, and the full synonym list.

The catalog covers 119 million unique chemical compounds, drawn from hundreds of contributing organizations, including the FDA, EPA, DrugBank, ChEMBL, NIST, and pharma research consortia. This Actor exposes four lookup modes (CID, name, SMILES, InChIKey) and lets you cherry-pick which of 23 PubChem-computed properties to return.

Target audience: Chemists, pharma R&D, cheminformaticians, materials scientists, drug-discovery teams, regulatory analysts, chemistry educators

Primary use cases: Compound lookup and enrichment, SAR/QSAR feature engineering, ADMET screening inputs, regulatory dossiers, synonym normalization, structure-to-property mapping

What the PubChem Compound Scraper does

Four lookup workflows in a single Actor:

  • CID lookup. Numeric PubChem identifiers like 2244 (aspirin), 3672 (ibuprofen).
  • Name lookup. Common names like aspirin, caffeine, paclitaxel.
  • SMILES lookup. Pass a structure string and resolve to the canonical PubChem record.
  • InChIKey lookup. Hash-based exact-match lookup, ideal for deduplication.
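
Why the InChIKey mode suits deduplication can be sketched in a few lines. This is an illustration only, not the Actor's own code: the helper names are hypothetical, and the pattern checks just the 14-10-1 letter layout of a standard InChIKey, not whether the key exists in PubChem.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_inchikey(value: str) -> bool:
    """Return True if the string has the shape of an InChIKey."""
    return bool(INCHIKEY_PATTERN.match(value.strip()))


def dedupe_inchikeys(values):
    """Drop malformed keys and repeats, keeping first-seen order."""
    seen = set()
    unique = []
    for raw in values:
        key = raw.strip().upper()
        if is_inchikey(key) and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


keys = dedupe_inchikeys([
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # aspirin, from the schema example
    "bsynrymutxbxsq-uhfffaoysa-n",  # same key in lower case
    "not-a-key",                    # malformed, dropped
])
print(keys)
```

The cleaned list can then be passed as identifiers with mode set to inchikey.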

Pick from 23 PubChem-computed properties (molecular formula, weight, exact mass, SMILES variants, InChI, IUPAC name, XLogP, TPSA, complexity, charge, H-bond donor/acceptor counts, rotatable bonds, heavy atoms, stereocenters, 3D volume, feature count, and more). Toggle synonym fetching to also pull every common name registered for each compound.

Why it matters: PubChem is the de facto reference for compound metadata in cheminformatics. Building your own client means juggling the PUG REST API, throttling, retries, and per-property batching. This Actor delivers a tidy record per compound, ready for downstream modelling, dashboards, or reports.
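
To give a sense of what a hand-built client deals with, here is a minimal sketch of composing one PUG REST property request for a batch of CIDs. It only builds the URL and sends nothing. The function name is hypothetical, and how the Actor itself forms its requests is not documented here.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_cid_property_url(cids, properties):
    """Build a PUG REST URL asking for computed properties of several CIDs."""
    if not cids or not properties:
        raise ValueError("at least one CID and one property are required")
    # int() rejects anything that is not a plain numeric identifier.
    cid_part = ",".join(str(int(cid)) for cid in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/cid/{cid_part}/property/{prop_part}/JSON"


url = build_cid_property_url(
    [2244, 3672],
    ["MolecularFormula", "MolecularWeight"],
)
print(url)
```

A full client would still need throttling, retries, and separate calls for synonyms on top of this.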


Full Demo

Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000.
mode | enum | "cid" | One of cid, name, smiles, inchikey.
identifiers | string[] | 5 example CIDs | List of identifiers to resolve, in the chosen mode.
properties | string[] | 13 core properties | Subset of 23 PubChem-computed properties.
includeSynonyms | boolean | true | Also fetch the list of common names and synonyms per compound.

Example: lookup 5 common drugs by name with synonyms.

{
  "maxItems": 5,
  "mode": "name",
  "identifiers": ["aspirin", "ibuprofen", "caffeine", "paracetamol", "metformin"],
  "includeSynonyms": true
}

Example: minimal property pull by CID for a screening library.

{
  "maxItems": 1000,
  "mode": "cid",
  "identifiers": ["2244", "3672", "1983", "5793", "2519"],
  "properties": ["MolecularFormula", "MolecularWeight", "CanonicalSMILES", "XLogP", "TPSA"],
  "includeSynonyms": false
}
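
If you assemble inputs in code rather than by hand, a small builder can catch mistakes before a run starts. The sketch below uses the field names and the four mode values from the input table. The function itself is hypothetical, and the Actor may apply further validation of its own.

```python
import json

ALLOWED_MODES = ("cid", "name", "smiles", "inchikey")


def build_run_input(mode, identifiers, max_items=10, properties=None,
                    include_synonyms=True):
    """Assemble an input object and reject obviously invalid values."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    if not identifiers:
        raise ValueError("identifiers must not be empty")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    run_input = {
        "maxItems": max_items,
        "mode": mode,
        "identifiers": [str(item) for item in identifiers],
        "includeSynonyms": include_synonyms,
    }
    # Leaving "properties" out lets the Actor fall back to its default set.
    if properties is not None:
        run_input["properties"] = list(properties)
    return run_input


example = build_run_input(
    "cid", [2244, 3672], max_items=2,
    properties=["MolecularFormula", "XLogP"], include_synonyms=False,
)
print(json.dumps(example, indent=2))
```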

Good to Know: PubChem PUG REST applies rate limits to free public callers. The Actor batches and paces requests automatically so you avoid HTTP 503 (server busy) responses.
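
Batching and pacing of this kind can be sketched as follows. This is not the Actor's implementation: the batch size of 100 is an arbitrary assumption, and the ceiling of 5 requests per second follows PubChem's published usage guidance, which you should check before relying on it.

```python
import time


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


class Pacer:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


cids = list(range(1, 251))          # 250 placeholder identifiers
pacer = Pacer(max_per_second=5)     # assumed ceiling, see lead-in
batch_sizes = []
for batch in batched(cids, 100):    # assumed batch size
    pacer.wait()                    # a real client would send its request here
    batch_sizes.append(len(batch))
print(batch_sizes)
```

The clock and sleep functions are injectable so the pacing logic can be tested without real delays.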


Output

Each record contains 19 fields. Download the dataset as CSV, Excel, JSON, or XML.

Schema

Field | Type | Example
cid | integer | 2244
title | string or null | "Aspirin"
iupacName | string or null | "2-acetyloxybenzoic acid"
molecularFormula | string or null | "C9H8O4"
molecularWeight | string or null | "180.16"
canonicalSMILES | string or null | "CC(=O)OC1=CC=CC=C1C(=O)O"
isomericSMILES | string or null | "CC(=O)OC1=CC=CC=C1C(=O)O"
inchi | string or null | "InChI=1S/C9H8O4/..."
inchiKey | string or null | "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
xLogP | number or null | 1.2
exactMass | string or null | "180.04225873"
tpsa | number or null | 63.6
hBondDonorCount | integer or null | 1
hBondAcceptorCount | integer or null | 4
rotatableBondCount | integer or null | 3
synonyms | string[] or null | ["Aspirin", "Acetylsalicylic acid", "ASA", ...]
properties | object or null | { "Complexity": 212, "HeavyAtomCount": 13, ... }
url | string | "https://pubchem.ncbi.nlm.nih.gov/compound/2244"
scrapedAt | ISO 8601 | "2026-05-22T00:00:00.000Z"
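
Note that molecularWeight and exactMass arrive as strings and that most fields can be null. A minimal sketch of loading one record into a typed structure is shown below. The class and function names are hypothetical, only a subset of the 19 fields is mapped, and the sample values are the ones from the schema table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompoundRecord:
    cid: int
    title: Optional[str] = None
    molecular_formula: Optional[str] = None
    molecular_weight: Optional[float] = None
    exact_mass: Optional[float] = None
    x_log_p: Optional[float] = None
    tpsa: Optional[float] = None
    synonyms: List[str] = field(default_factory=list)


def _to_float(value):
    """Convert a numeric string or number to float, passing None through."""
    return None if value is None else float(value)


def parse_record(item: dict) -> CompoundRecord:
    """Map one dataset item onto a CompoundRecord, tolerating null fields."""
    return CompoundRecord(
        cid=int(item["cid"]),
        title=item.get("title"),
        molecular_formula=item.get("molecularFormula"),
        molecular_weight=_to_float(item.get("molecularWeight")),
        exact_mass=_to_float(item.get("exactMass")),
        x_log_p=_to_float(item.get("xLogP")),
        tpsa=_to_float(item.get("tpsa")),
        synonyms=list(item.get("synonyms") or []),
    )


aspirin = parse_record({
    "cid": 2244,
    "title": "Aspirin",
    "molecularFormula": "C9H8O4",
    "molecularWeight": "180.16",
    "exactMass": "180.04225873",
    "xLogP": 1.2,
    "tpsa": 63.6,
    "synonyms": None,
})
print(aspirin)
```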

Why choose this Actor

  • Massive coverage. 119M+ compounds from the NIH National Library of Medicine.
  • Four lookup modes. CID, name, SMILES, and InChIKey resolve to the same canonical record.
  • 23 computed properties. Pick only the ones your model needs and save downstream cleanup.
  • Synonym lists. Resolve trade names, salts, generics, and historical spellings in one shot.
  • Fast. 100 compounds in under a minute, paced under the public rate limit.
  • Always fresh. Every run hits the live PubChem feed.
  • No API key. Public PubChem REST needs no registration.

PubChem is the most widely cited chemical reference in modern cheminformatics, drug discovery, and materials research.


How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
PubChem Compound Scraper (this Actor) | $5 free credit, then pay-per-use | 119M+ compounds | Live per run | CID, name, SMILES, InChIKey | 2 min
Manual web download from PubChem | Free | Per-compound | Manual | None | Hours
Hand-coded PUG REST client | Free | Full | Per-build | Custom | Days
Commercial cheminformatics suites | $$$$/year | Curated | Vendor schedule | Vendor-defined | Sales cycle

Pick this Actor when you want broad coverage, multi-mode lookup, and zero infrastructure to maintain.


How to use

  1. Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. Open the Actor. Go to the PubChem Compound Scraper page on the Apify Store.
  3. Set input. Pick a lookup mode, paste identifiers, choose which properties to fetch.
  4. Run it. Click Start and let the Actor collect your data.
  5. Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


Business use cases

Pharma R&D

  • Hit triage and library enrichment
  • ADMET property pulls for early screening
  • Synonym normalization across legacy datasets
  • Regulatory dossier reference checks
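
Synonym normalization across legacy datasets can be approached by indexing every synonym back to its CID. The sketch below assumes records shaped like the output schema. The helper names are hypothetical, and the matching is deliberately simple (case and whitespace only).

```python
def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_synonym_index(records):
    """Map each normalized synonym or title to the set of CIDs using it."""
    index = {}
    for record in records:
        names = list(record.get("synonyms") or [])
        if record.get("title"):
            names.append(record["title"])
        for name in names:
            index.setdefault(normalize(name), set()).add(record["cid"])
    return index


records = [
    {"cid": 2244, "title": "Aspirin",
     "synonyms": ["Aspirin", "Acetylsalicylic acid", "ASA"]},
    {"cid": 3672, "title": "Ibuprofen", "synonyms": None},
]
index = build_synonym_index(records)

# Resolve messy legacy labels to CIDs; unknown names map to an empty set.
legacy_labels = ["  acetylsalicylic   ACID ", "ibuprofen", "mystery compound"]
resolved = {label: index.get(normalize(label), set()) for label in legacy_labels}
print(resolved)
```

A name that maps to more than one CID signals an ambiguous synonym that needs manual review.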

Cheminformatics and ML

  • Build SAR/QSAR feature tables
  • Train generative-chemistry models with real properties
  • Standardize SMILES/InChI representations
  • Benchmark predicted vs PubChem-computed properties
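
A feature table for SAR/QSAR work can start from the numeric fields of each record. The following sketch is one possible shaping step, not a recommended descriptor set: the choice of six fields is an assumption, and missing values become NaN so downstream tools can decide how to impute them.

```python
import math

# Numeric fields taken from the output schema; the selection is illustrative.
FEATURE_FIELDS = (
    "molecularWeight",
    "xLogP",
    "tpsa",
    "hBondDonorCount",
    "hBondAcceptorCount",
    "rotatableBondCount",
)


def to_feature_row(record):
    """Return the record's numeric descriptors as floats, NaN where missing."""
    row = []
    for name in FEATURE_FIELDS:
        value = record.get(name)
        row.append(math.nan if value is None else float(value))
    return row


aspirin = {
    "cid": 2244,
    "molecularWeight": "180.16",  # delivered as a string in the dataset
    "xLogP": 1.2,
    "tpsa": 63.6,
    "hBondDonorCount": 1,
    "hBondAcceptorCount": 4,
    "rotatableBondCount": 3,
}
print(to_feature_row(aspirin))
```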

Materials and chemicals

  • Specialty-chemical sourcing reference data
  • Polymer monomer property tables
  • Catalyst and ligand databases
  • Raw-material substitution screens

Regulatory and EHS

  • Synonym matching for hazardous-substance lists
  • Inventory reconciliation across regulatory IDs
  • Safety data sheet (SDS) cross-referencing
  • Tracking ingredient identifiers across jurisdictions

Automating PubChem Compound Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • Node.js. Install the apify-client NPM package.
  • Python. Use the apify-client PyPI package.
  • See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep downstream databases in sync automatically.
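
A programmatic run from Python might look like the sketch below. It assumes the synchronous apify-client interface (actor(...).call(...) and dataset(...).iterate_items()) as documented at the time of writing, and takes the Actor ID from the Store URL. The client is passed in as a parameter so the function can be exercised without network access.

```python
ACTOR_ID = "parseforge/pubchem-compound-scraper"  # from the Store URL


def run_and_collect(client, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run returned no run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Typical wiring (requires `pip install apify-client` and an API token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_and_collect(client, {
#       "maxItems": 5,
#       "mode": "name",
#       "identifiers": ["aspirin", "caffeine"],
#       "includeSynonyms": True,
#   })
#   for item in items:
#       print(item["cid"], item["title"])
```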


Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

Research and academia

  • Course datasets for medicinal-chemistry and cheminformatics classes
  • Reproducible papers with cited, versioned compound pulls
  • Open-science notebooks that ground analyses in PubChem
  • Thesis projects on structure-property relationships

Personal and creative

  • Hobbyist science blogs and explainers
  • Visualization projects on molecular property distributions
  • Educational apps that teach chemistry through real compounds
  • Side projects exploring natural-product chemistry

Non-profit and civic

  • Public-health communication around medicines and toxins
  • Environmental advocacy with chemical-property evidence
  • Citizen-science projects on consumer-product ingredients
  • Educational resources for under-served STEM programs

Experimentation

  • Train property-prediction ML models on real labels
  • Validate generative-chemistry tools against PubChem ground truth
  • Prototype agent pipelines that answer chemistry questions
  • Build LLM-grounded chemistry assistants with cited records

Frequently Asked Questions

How does it work?

Pick a lookup mode, paste your identifiers, choose which PubChem-computed properties to return, and click Start. The Actor calls the public PubChem PUG REST API, paces requests to stay within rate limits, and emits one tidy record per compound.

How accurate is the data?

All numeric properties are PubChem-computed values served live from the NIH source. Synonyms are aggregated from PubChem's depositor network and cover trade names, salts, generics, and historical spellings.

How often is the dataset refreshed?

PubChem updates continuously as depositors submit new compounds and properties. Every Actor run pulls the current state of each compound at run time.

What's the difference between canonical and isomeric SMILES?

Canonical SMILES is a normalized representation of atom connectivity that leaves out stereochemistry and isotope labels. Isomeric SMILES keeps both. Use isomeric SMILES when your modelling depends on exact structure, for example to tell enantiomers apart.
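
A quick way to see the difference is to check whether a SMILES string carries any stereo or isotope markers at all. The function below is a plain string heuristic with a hypothetical name, not a SMILES parser. For real structure handling a cheminformatics toolkit is the better choice.

```python
import re

# '@' marks tetrahedral centres; '/' and '\' mark double-bond geometry.
STEREO_MARKERS = ("@", "/", "\\")
# An isotope label is a number directly after '[', as in [13C] or [2H].
ISOTOPE_PATTERN = re.compile(r"\[\d+[A-Za-z]")


def carries_isomeric_info(smiles: str) -> bool:
    """Return True if the string contains stereo or isotope notation."""
    has_stereo = any(marker in smiles for marker in STEREO_MARKERS)
    has_isotope = bool(ISOTOPE_PATTERN.search(smiles))
    return has_stereo or has_isotope


examples = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",      # achiral, from the schema table
    "L-alanine": "C[C@@H](C(=O)O)N",            # one marked stereocentre
    "trans-1,2-difluoroethene": "F/C=C/F",      # double-bond geometry
    "heavy water": "[2H]O[2H]",                 # isotope labels
}
for name, smiles in examples.items():
    print(name, carries_isomeric_info(smiles))
```

For an achiral, unlabelled compound such as aspirin the check is negative, which matches the schema table where both SMILES fields hold the same string.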

Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval and keep a downstream database in sync.

Is this data legal to use?

PubChem data is in the public domain in the United States. Many international jurisdictions treat it similarly. Review the downstream terms of your specific use case before redistribution.

Can I use this data commercially?

Yes. PubChem's data policy permits commercial use. You are responsible for complying with any downstream regulatory requirements and the terms of contributing depositors.

Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


Integrate with any app

PubChem Compound Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe compound data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh compound data into your product backend, or alert your team in Slack.
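
On the receiving side, a webhook handler mostly needs to find out which dataset to fetch. The sketch below assumes Apify's default webhook payload, with an eventType field and a resource object that holds the run's defaultDatasetId. If you customize the payload template, adjust the field names accordingly. The function name is hypothetical.

```python
import json


def dataset_id_from_webhook(body: str):
    """Return the dataset ID for a succeeded run, or None for other events."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hand-written sample body shaped like the assumed default payload.
sample_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run-123", "defaultDatasetId": "dataset-456"},
})
print(dataset_id_from_webhook(sample_body))
```

The returned ID can then be used to download the run's records or to post a notification.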


Recommended Actors

Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the NIH National Library of Medicine, PubChem, or any government body. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.
