URL: https://apify.com/crawlergang/pubchem-scraper

PubChem Compound Scraper · Apify


Pricing

from $3.00 / 1,000 results

PubChem Compound Scraper

Scrape PubChem - the world's largest free chemistry database with 100M+ compounds. Search by name, CID, SMILES, or full-text. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, synonyms, and more.

Rating

5.0

(11)

Developer

๐Ÿ‘ Crawler Gang

Crawler Gang

Maintained by Community

Actor stats

Bookmarked: 11

Total users: 1

Monthly active users: 0

Last modified: 14 days ago

Scrape PubChem, the world's largest free chemistry database, which holds 100M+ compounds and is maintained by the NCBI. Search by compound name, PubChem CID, SMILES string, or free-text query. Returns molecular identifiers, physicochemical properties, structural data, and synonyms. HTTP-only via the public PubChem REST API. No auth, no proxy required.

What this actor does

  • Four modes: searchByName, searchBySmiles, searchByCid, fullTextSearch
  • Compound lookup: by IUPAC name, common name, CID, or SMILES notation
  • Rich properties: molecular formula, weight, SMILES, InChI, InChIKey, XLogP, H-bond counts, heavy atom count, complexity
  • Synonyms: up to 10 synonyms per compound
  • Empty fields are omitted, so the output contains no nulls
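As a rough illustration of what an HTTP-only lookup against the public PubChem REST API (PUG REST) can look like, the sketch below builds a property URL for a compound name and a synonyms URL for a CID. The helper names and the exact property list are assumptions made for this example, not the actor's actual code. PubChem has also revised its SMILES property names over time, so check the current PUG REST documentation before relying on `CanonicalSMILES` and `IsomericSMILES`.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed property list; the actor's real request may differ.
PROPERTIES = [
    "MolecularFormula", "MolecularWeight", "CanonicalSMILES", "IsomericSMILES",
    "InChI", "InChIKey", "IUPACName", "XLogP", "ExactMass",
    "HBondDonorCount", "HBondAcceptorCount", "HeavyAtomCount",
    "RotatableBondCount", "Complexity",
]


def property_url_by_name(name: str) -> str:
    """Build a PUG REST URL that returns computed properties for a compound name."""
    encoded = quote(name, safe="")  # percent-encode spaces and other reserved characters
    return f"{PUG_REST}/compound/name/{encoded}/property/{','.join(PROPERTIES)}/JSON"


def synonyms_url_by_cid(cid: int) -> str:
    """Build a PUG REST URL that returns the synonym list for a CID."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


if __name__ == "__main__":
    print(property_url_by_name("acetylsalicylic acid"))
    print(synonyms_url_by_cid(2244))
```

The functions only build URLs, so they can be inspected or tested without touching the network.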

Output per compound

| Field | Type | Description |
|---|---|---|
| cid | integer | PubChem Compound ID |
| iupacName | string | IUPAC systematic name |
| molecularFormula | string | Molecular formula (e.g. C9H8O4) |
| molecularWeight | float | Molecular weight in g/mol |
| canonicalSmiles | string | Canonical SMILES notation |
| isomericSmiles | string | Isomeric SMILES (with stereochemistry) |
| inchiKey | string | Standard InChIKey hash |
| inchi | string | Standard InChI string |
| xlogp | float | Computed XLogP3 lipophilicity |
| exactMolecularWeight | float | Exact monoisotopic mass |
| hbondDonorCount | integer | Number of hydrogen bond donors |
| hbondAcceptorCount | integer | Number of hydrogen bond acceptors |
| heavyAtomCount | integer | Number of heavy (non-hydrogen) atoms |
| rotatablebondCount | integer | Number of rotatable bonds |
| synonyms | array | Up to 10 common synonyms |
| sourceUrl | string | PubChem compound page URL |
| recordType | string | Always "compound" |
| scrapedAt | string | ISO 8601 timestamp |
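The sketch below shows one way a raw PubChem property record could be mapped onto the output fields above while dropping empty values. `FIELD_MAP` and `to_output_record` are hypothetical names, and the source keys on the left are assumed PUG REST property names; the actor's internal mapping is not published on this page.

```python
from datetime import datetime, timezone

# Hypothetical mapping: assumed PUG REST property name -> output field name.
FIELD_MAP = {
    "CID": "cid",
    "IUPACName": "iupacName",
    "MolecularFormula": "molecularFormula",
    "MolecularWeight": "molecularWeight",
    "CanonicalSMILES": "canonicalSmiles",
    "IsomericSMILES": "isomericSmiles",
    "InChIKey": "inchiKey",
    "InChI": "inchi",
    "XLogP": "xlogp",
    "ExactMass": "exactMolecularWeight",
    "HBondDonorCount": "hbondDonorCount",
    "HBondAcceptorCount": "hbondAcceptorCount",
    "HeavyAtomCount": "heavyAtomCount",
    "RotatableBondCount": "rotatablebondCount",
}
FLOAT_FIELDS = {"molecularWeight", "xlogp", "exactMolecularWeight"}


def to_output_record(props: dict, synonyms=()) -> dict:
    """Map a raw property dict to the output schema, omitting empty fields."""
    record = {}
    for source_key, field in FIELD_MAP.items():
        value = props.get(source_key)
        if value is None or value == "":
            continue  # empty fields are omitted rather than emitted as null
        # Some numeric properties may arrive as strings, so coerce them.
        record[field] = float(value) if field in FLOAT_FIELDS else value
    if synonyms:
        record["synonyms"] = list(synonyms)[:10]  # cap at 10 synonyms
    if "cid" in record:
        record["sourceUrl"] = f"https://pubchem.ncbi.nlm.nih.gov/compound/{record['cid']}"
    record["recordType"] = "compound"
    record["scrapedAt"] = datetime.now(timezone.utc).isoformat()
    return record


if __name__ == "__main__":
    raw = {"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16", "XLogP": None}
    print(to_output_record(raw, ["aspirin", "acetylsalicylic acid"]))
```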

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchByName | searchByName / searchBySmiles / searchByCid / fullTextSearch |
| compoundNames | array | [] | Compound names to look up (mode=searchByName) |
| smilesList | array | [] | SMILES strings (mode=searchBySmiles) |
| cids | array | [] | PubChem CIDs (mode=searchByCid) |
| searchQuery | string | aspirin | Free-text query (mode=fullTextSearch) |
| maxItems | integer | 10 | Max compounds to return (1 to 1000) |
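If you generate inputs programmatically, a small validator can catch a mismatched mode or an out-of-range maxItems before a run starts. The following is a minimal sketch based only on the table above; `build_input` and `MODE_TO_FIELD` are hypothetical helpers and not part of the actor.

```python
# Which input field each mode reads, per the input table.
MODE_TO_FIELD = {
    "searchByName": "compoundNames",
    "searchBySmiles": "smilesList",
    "searchByCid": "cids",
    "fullTextSearch": "searchQuery",
}


def build_input(mode: str, values, max_items: int = 10) -> dict:
    """Return an input dict for the given mode, validating the documented limits."""
    if mode not in MODE_TO_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if mode == "fullTextSearch":
        # fullTextSearch takes a single query string.
        if not isinstance(values, str) or not values.strip():
            raise ValueError("fullTextSearch needs a non-empty query string")
        payload = values
    else:
        # The other modes take a list of names, SMILES strings, or CIDs.
        if isinstance(values, str):
            raise ValueError(f"{mode} needs a list, not a single string")
        payload = list(values)
        if not payload:
            raise ValueError(f"{mode} needs at least one value")
    return {"mode": mode, MODE_TO_FIELD[mode]: payload, "maxItems": max_items}


if __name__ == "__main__":
    print(build_input("searchByName", ["aspirin", "caffeine"], max_items=2))
```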

Example: look up common drug compounds

{
  "mode": "searchByName",
  "compoundNames": ["aspirin", "caffeine", "ibuprofen", "acetaminophen"],
  "maxItems": 4
}

Example: search by SMILES

{
  "mode": "searchBySmiles",
  "smilesList": ["CC(=O)Oc1ccccc1C(=O)O", "Cn1cnc2c1c(=O)n(c(=O)n2C)C"],
  "maxItems": 2
}

Example: full-text search

{
  "mode": "fullTextSearch",
  "searchQuery": "acetylsalicylic acid",
  "maxItems": 5
}
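Any of the example inputs above can also be submitted from code. The sketch below assumes the third-party `apify-client` Python package is installed and that an API token is available in an `APIFY_TOKEN` environment variable (both are assumptions of this example). The actor ID is taken from this page's URL.

```python
import os

ACTOR_ID = "crawlergang/pubchem-scraper"  # from the store URL of this actor

RUN_INPUT = {
    "mode": "searchByName",
    "compoundNames": ["aspirin", "caffeine", "ibuprofen", "acetaminophen"],
    "maxItems": 4,
}


def run_actor(run_input: dict) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    for item in run_actor(RUN_INPUT):
        print(item.get("cid"), item.get("molecularFormula"))
```

Because empty fields are omitted from the output, read optional fields with `item.get(...)` rather than indexing directly.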

FAQs

Do I need an API key? No. PubChem's REST API is freely accessible with no authentication required.

Are there rate limits? PubChem allows up to 5 requests per second. This actor enforces a 0.2s delay between requests automatically.
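For readers calling PubChem directly, the same spacing can be approximated with a small throttle that keeps at least 0.2 seconds between consecutive requests. This is a minimal sketch, not the actor's implementation; the `Throttle` class is hypothetical, and it enforces a minimum interval rather than a fixed sleep.

```python
import time


class Throttle:
    """Keep at least `min_interval` seconds between consecutive calls to wait()."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(0.2)  # 0.2 s spacing keeps the rate at or below 5 requests/second
    start = time.monotonic()
    for _ in range(3):
        throttle.wait()       # call this right before each HTTP request
    print(f"3 calls took {time.monotonic() - start:.2f}s")
```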

How many compounds can I scrape? Up to 1000 per run. For fullTextSearch, the actor fetches matching CIDs first, then retrieves full data for each.
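The two-step flow for fullTextSearch can be sketched as a generator that takes the two steps as injected callables. `search_cids` and `fetch_compound` are hypothetical stand-ins for the actor's real search and lookup requests, and the de-duplication of CIDs is an assumption of this example.

```python
from typing import Callable, Iterable, Iterator, Optional


def full_text_search(
    query: str,
    search_cids: Callable[[str], Iterable[int]],
    fetch_compound: Callable[[int], Optional[dict]],
    max_items: int = 10,
) -> Iterator[dict]:
    """Step 1: resolve the query to CIDs. Step 2: fetch full data for each CID."""
    seen = set()
    emitted = 0
    for cid in search_cids(query):
        if emitted >= max_items:
            break              # stop once the maxItems cap is reached
        if cid in seen:
            continue           # skip CIDs that were already processed
        seen.add(cid)
        record = fetch_compound(cid)
        if record:             # skip CIDs for which no data came back
            emitted += 1
            yield record


if __name__ == "__main__":
    # Fake callables standing in for real HTTP requests.
    fake_search = lambda q: [2244, 2244, 1, 2, 3]
    fake_fetch = lambda cid: None if cid == 1 else {"cid": cid}
    print(list(full_text_search("aspirin", fake_search, fake_fetch, max_items=2)))
```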

What is the difference between canonical and isomeric SMILES? Canonical SMILES is a standardized representation without stereochemistry. Isomeric SMILES includes stereochemical information (E/Z, R/S).
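To make the distinction concrete, the sketch below uses a simple character check to tell whether a SMILES string carries stereo markers: `@` for tetrahedral centers (R/S) and `/` or `\` for double-bond geometry (E/Z). The `has_stereo_markers` helper is hypothetical and is a heuristic only; it does not parse or validate SMILES.

```python
def has_stereo_markers(smiles: str) -> bool:
    """Heuristic: True if the string contains '@', '/' or '\\' stereo symbols."""
    return any(marker in smiles for marker in ("@", "/", "\\"))


if __name__ == "__main__":
    # Alanine written without stereochemistry, then as the L-enantiomer.
    print(has_stereo_markers("CC(C(=O)O)N"))        # no stereo symbols
    print(has_stereo_markers("C[C@@H](C(=O)O)N"))   # tetrahedral center marked
    # trans-2-butene with double-bond geometry marked.
    print(has_stereo_markers("C/C=C/C"))
```

A check like this can flag records where canonicalSmiles and isomericSmiles are expected to differ.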

Can I search by molecular structure? Yes, use searchBySmiles mode with a valid SMILES string.
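SMILES strings often contain characters such as `/`, `=`, `#`, and `+` that are unsafe in a URL path. When querying PubChem's REST API directly, one option is to pass the SMILES as a URL-encoded query argument, as sketched below. The helper name is hypothetical, and how the actor itself submits SMILES is not documented here.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_url_for_smiles(smiles: str) -> str:
    """Build a PUG REST URL that resolves a SMILES string to matching CIDs."""
    # urlencode percent-encodes reserved characters, including '/' and '+'.
    return f"{PUG_REST}/compound/smiles/cids/JSON?{urlencode({'smiles': smiles})}"


if __name__ == "__main__":
    print(cids_url_for_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
    print(cids_url_for_smiles("C/C=C/C"))                # contains '/' characters
```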

Why are some fields missing from certain compounds? Not all compounds in PubChem have complete property sets. The actor omits any field for which PubChem returns no data.

What is XLogP? XLogP3 is a computed measure of lipophilicity (fat-solubility), which is key for predicting drug absorption, distribution, and bioavailability.
