ChEMBL Molecules Scraper

Pricing

from $28.50 / 1,000 results

Scrape molecules from EBI ChEMBL public API including SMILES, InChI, molecular properties (MW, logP, HBA, HBD, PSA, RTB), max phase, ATC classifications, oral/parenteral/topical flags, first approval, black box warning, prodrug and withdrawn flag. No API key required.

Developer

ParseForge

Maintained by Community

🧪 ChEMBL Bioactive Molecules Scraper

🚀 Export ChEMBL drug discovery data in seconds. Pull 2.5 million+ bioactive molecules with SMILES, InChI, ATC codes, clinical phase, and approval status. No API key, no registration, no manual REST stitching.

🕒 Last updated: 2026-05-13 · 📊 17 fields per record · 💊 2.5M+ molecules · 🧬 9 molecule types · 🌐 EBI public API

The ChEMBL Molecules Scraper queries the EBI ChEMBL public REST API and returns 17 fields per molecule, including the canonical ChEMBL ID, preferred name, molecule type, max clinical phase, full structure descriptors (canonical SMILES, InChI, InChI Key), calculated molecular properties (molecular weight, LogP, hydrogen-bond donors and acceptors, polar surface area, rotatable bonds, Lipinski Rule of Five violations), ATC classifications, route of administration flags, first-approval year, and withdrawn status. ChEMBL is maintained by the European Bioinformatics Institute and is one of the largest manually curated databases of bioactive molecules in drug discovery.

The catalog covers small molecules, antibodies, enzymes, proteins, oligonucleotides, oligosaccharides, cells, genes, and unknowns, totalling more than 2.5 million entries. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. The molecule type filter runs server-side, so antibody-only or small-molecule-only exports are fast.

🎯 Target Audience	💡 Primary Use Cases
Cheminformaticians, drug discovery scientists, computational chemists, pharma data teams, ML researchers, bioinformaticians, academic labs, regulatory analysts	QSAR datasets, virtual screening libraries, ADMET feature tables, ATC mapping, clinical-phase tracking, approved-drug audits, withdrawn-drug watchlists

📋 What the ChEMBL Molecules Scraper does

Two filters and a full-catalog mode, available in a single run:

🔎 Full-text query. Substring match across molecule names and synonyms (e.g. aspirin, imatinib, bevacizumab).
🧬 Type filter. Server-side filter on molecule_type. Pick from small molecule, antibody, enzyme, protein, oligonucleotide, oligosaccharide, cell, gene, or unknown.
📜 Paginated catalog dump. Leave both filters empty to walk the entire ChEMBL catalog by offset.
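
As a rough illustration of how these three modes could translate into requests against the public ChEMBL REST API, the sketch below builds one page URL per mode. The helper name build_molecule_url is hypothetical, and the pref_name__icontains filter is an assumption: it matches preferred names only, not synonyms, and the Actor's internal request logic may differ.

```python
from urllib.parse import urlencode

# Public ChEMBL molecule endpoint named in this README.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/molecule"


def build_molecule_url(query="", molecule_type="", limit=100, offset=0):
    """Build the URL for one page of molecules (hypothetical helper)."""
    params = {"format": "json", "limit": limit, "offset": offset}
    if molecule_type:
        # Server-side filter on the molecule_type field.
        params["molecule_type"] = molecule_type
    if query:
        # Assumption: case-insensitive substring filter on the preferred name.
        params["pref_name__icontains"] = query
    return f"{BASE_URL}?{urlencode(params)}"


# Text query, type filter, and full-catalog page, in that order.
print(build_molecule_url(query="imatinib"))
print(build_molecule_url(molecule_type="Antibody"))
print(build_molecule_url(offset=200))
```

Leaving both filters empty yields a plain offset walk, which corresponds to the catalog-dump mode described above.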

Each record returns the canonical ChEMBL ID, the public explorer URL, the structure block (SMILES, InChI, InChI Key, molfile) when present, the property block (MW, LogP, HBA, HBD, PSA, RTB, full MWT, Rule-of-Five violations), the molecule hierarchy (active / parent / salt), the ATC classifications array, administration route flags (oral, parenteral, topical), the black-box-warning flag, the first-approval year, the withdrawn flag, and the prodrug flag.

💡 Why it matters: ChEMBL underpins most modern drug discovery pipelines. Building your own REST pagination, retry logic, and field selection means a week of plumbing. This Actor returns clean, joined records on every run.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded molecule dataset.

⚙️ Input

Input	Type	Default	Behavior
maxItems	integer	10	Records to return. Free plan caps at 10, paid plan at 1,000,000.
query	string	"aspirin"	Substring text search across molecule names and synonyms. Empty = list all by offset.
moleculeType	string	""	One of 9 ChEMBL molecule types (Small molecule, Antibody, Cell, Enzyme, Gene, Oligonucleotide, Oligosaccharide, Protein, Unknown). Empty = all.
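
When the input is assembled in code rather than in the form, a small validation step can catch typos before a run starts. The sketch below mirrors the three documented inputs; the class name ScraperInput is hypothetical, and the lower bound of 1 for maxItems and the case-sensitive type check are assumptions rather than documented Actor behavior.

```python
from dataclasses import asdict, dataclass

# The nine molecule types listed in the input table.
MOLECULE_TYPES = (
    "Small molecule", "Antibody", "Cell", "Enzyme", "Gene",
    "Oligonucleotide", "Oligosaccharide", "Protein", "Unknown",
)


@dataclass(frozen=True)
class ScraperInput:
    """Hypothetical container mirroring the three documented inputs."""
    maxItems: int = 10
    query: str = "aspirin"
    moleculeType: str = ""

    def __post_init__(self):
        # Assumption: at least one record, at most the paid-plan cap.
        if not 1 <= self.maxItems <= 1_000_000:
            raise ValueError("maxItems must be between 1 and 1,000,000")
        if self.moleculeType and self.moleculeType not in MOLECULE_TYPES:
            raise ValueError(f"unknown moleculeType: {self.moleculeType!r}")

    def to_dict(self):
        """Return the input as a plain dict ready for JSON serialization."""
        return asdict(self)


print(ScraperInput(maxItems=50, query="", moleculeType="Antibody").to_dict())
```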

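Example: 50 antibody records (server-side type filter). The input has no approval filter, so narrow to approved molecules afterwards by keeping records with max_phase = 4.

{
  "maxItems": 50,
  "moleculeType": "Antibody"
}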

Example: text query for every molecule whose name or synonyms contain imatinib.

{
  "maxItems": 25,
  "query": "imatinib"
}

⚠️ Good to Know: antibodies, proteins, and cells are macromolecules, so they have no SMILES or InChI. The molecule_structures and molecule_properties blocks are omitted for these types and the record stays clean. Small molecules return the full property block. ChEMBL max_phase follows the convention 4 = approved, 3 = phase III, 2 = phase II, 1 = phase I, 0.5 = early phase I, -1 = clinical phase unknown, null = preclinical (no clinical-trial evidence recorded).
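
Downstream code usually has to cope with both points in this note: structure blocks that may be absent and a max_phase that may be null. A minimal sketch of two defensive helpers follows; the function names and the EXAMPLE_ identifiers are made up for illustration, and the numeric coercion is only a precaution in case values arrive as strings.

```python
def has_structure(record):
    """True when the record carries a non-empty canonical SMILES."""
    structures = record.get("molecule_structures") or {}
    return bool(structures.get("canonical_smiles"))


def is_approved(record):
    """True when max_phase equals 4; tolerates numbers, numeric strings, and null."""
    phase = record.get("max_phase")
    try:
        return phase is not None and float(phase) == 4.0
    except (TypeError, ValueError):
        return False


# Illustrative records with placeholder identifiers.
records = [
    {"molecule_chembl_id": "EXAMPLE_1", "max_phase": 4,
     "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"}},
    {"molecule_chembl_id": "EXAMPLE_2", "max_phase": 4},  # macromolecule, no structure block
    {"molecule_chembl_id": "EXAMPLE_3", "max_phase": None},
]
approved_with_smiles = [
    r["molecule_chembl_id"] for r in records if is_approved(r) and has_structure(r)
]
print(approved_with_smiles)  # ['EXAMPLE_1']
```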

📊 Output

Each molecule record contains up to 17 data fields plus a scrapedAt timestamp. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🆔 `molecule_chembl_id`	string	`"CHEMBL1201580"`
🔗 `url`	string	`"https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201580"`
🏷️ `pref_name`	string | null	`"ADALIMUMAB"`
🧬 `molecule_type`	string | null	`"Antibody"`
🎯 `max_phase`	number | null	`4`
🧪 `molecule_structures`	object	`{ canonical_smiles, standard_inchi, standard_inchi_key, molfile }`
📐 `molecule_properties`	object	`{ mw_freebase, alogp, hba, hbd, psa, rtb, full_mwt, num_ro5_violations }`
🌳 `molecule_hierarchy`	object | null	`{ active_chembl_id, parent_chembl_id, molecule_chembl_id }`
🏥 `atc_classifications`	string[]	`["L04AB04"]`
💊 `indication_class`	string	`"Antineoplastic"`
👄 `oral`	boolean | null	`false`
💉 `parenteral`	boolean | null	`true`
🧴 `topical`	boolean | null	`false`
⚠️ `black_box_warning`	number | null	`1`
📅 `first_approval`	number | null	`2002`
🚫 `withdrawn_flag`	boolean | null	`false`
🧬 `prodrug`	number | null	`0`
🕒 `scrapedAt`	string (ISO 8601)	`"2026-05-13T22:26:22.480Z"`
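
Because several fields are nested objects, a flat table for CSV or a dataframe needs a small flattening step. The sketch below shows one way to do it using the field names from the schema table above; the helper names and the choice of columns are illustrative, and the example record is assembled only from the example values in that table.

```python
import csv
import io


def flatten_record(record):
    """Pick a few nested values into one flat row (column choice is illustrative)."""
    structures = record.get("molecule_structures") or {}
    properties = record.get("molecule_properties") or {}
    return {
        "molecule_chembl_id": record.get("molecule_chembl_id"),
        "pref_name": record.get("pref_name"),
        "molecule_type": record.get("molecule_type"),
        "max_phase": record.get("max_phase"),
        "canonical_smiles": structures.get("canonical_smiles"),
        "standard_inchi_key": structures.get("standard_inchi_key"),
        "alogp": properties.get("alogp"),
        # Join multiple ATC codes into one cell.
        "atc_classifications": ";".join(record.get("atc_classifications") or []),
    }


def to_csv(records):
    """Serialize flattened records to CSV text; missing values become empty cells."""
    rows = [flatten_record(r) for r in records]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()


# Antibody record without structure or property blocks.
example = {
    "molecule_chembl_id": "CHEMBL1201580",
    "pref_name": "ADALIMUMAB",
    "molecule_type": "Antibody",
    "max_phase": 4,
    "atc_classifications": ["L04AB04"],
}
print(to_csv([example]))
```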

📦 Sample records

✨ Why choose this Actor

	Capability
🧪	Massive coverage. 2.5M+ bioactive molecules curated by EBI scientists.
🎯	Server-side type filter. Antibody-only, small-molecule-only, or protein-only exports run fast at the API level.
🧬	Full structure block. Canonical SMILES, InChI, InChI Key, and molfile in one place.
📐	Calculated properties. MW, LogP, HBA, HBD, PSA, RTB, full MWT, and Rule-of-Five violations precomputed by ChEMBL.
🏥	Clinical context. Max phase, ATC class, route of administration, first-approval year, and withdrawn flag.
⚡	Fast. Paginated REST with retry, returns 100 molecules per request.
🚫	No authentication. Works on the public EBI API. No login or API key.
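
For readers curious what "paginated REST with retry" involves, here is a minimal sketch of offset pagination with exponential backoff. It is not the Actor's source code: the function names are hypothetical, the page fetcher is injected so the example runs without network access, and the retry count and delays are arbitrary.

```python
import time


def fetch_with_retry(fetch_page, offset, limit, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(offset, limit), retrying on OSError with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch_page(offset, limit)
        except OSError:  # urllib.error.URLError and socket timeouts derive from OSError
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def iter_molecules(fetch_page, max_items, limit=100, **retry_options):
    """Yield up to max_items records, advancing the offset one page at a time."""
    offset = 0
    yielded = 0
    while yielded < max_items:
        page = fetch_with_retry(fetch_page, offset, limit, **retry_options)
        if not page:
            return  # an empty page means the catalog is exhausted
        for molecule in page:
            yield molecule
            yielded += 1
            if yielded >= max_items:
                return
        offset += limit


# Stand-in for a real HTTP call: serves slices of an in-memory list.
catalog = [{"molecule_chembl_id": f"EXAMPLE_{i}"} for i in range(250)]
fake_fetch = lambda offset, limit: catalog[offset:offset + limit]
print(len(list(iter_molecules(fake_fetch, max_items=120))))  # 120
```

Injecting fetch_page keeps the pagination logic separate from the HTTP client, so the same loop can be exercised against a fake source or a real one.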

📊 ChEMBL is one of the most cited databases in cheminformatics literature. Accurate molecule metadata drives QSAR models, ADMET pipelines, and clinical-phase analytics.

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Filters	Setup
⭐ ChEMBL Molecules Scraper (this Actor)	$5 free credit, then pay-per-use	2.5M+ molecules	Live per run	text query, molecule type	⚡ 2 min
Hand-rolled REST scripts	Free	Full ChEMBL	Manual	None unless you build them	🐢 Days
DrugBank commercial license	$$$/year	Subset, drug-only	Curated	Many	⏳ Hours
Open Targets GraphQL	Free	Drug-target focus	Live	Many	⏳ Hours

Pick this Actor when you want broad cheminformatics coverage, server-side type filtering, and no pipeline maintenance.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the ChEMBL Bioactive Molecules Scraper page on the Apify Store.
🎯 Set input. Pick a molecule type, enter a text query, and set maxItems.
🚀 Run it. Click Start and let the Actor collect your data.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.

💼 Business use cases

💊 Pharma & Biotech R&D

QSAR and ADMET model training sets
Virtual screening libraries by molecule class
Competitive intelligence on clinical-phase assets
Approved-drug audits for repurposing

🧬 Cheminformatics & Data Science

SMILES libraries for fingerprint pipelines
Lipinski Rule of Five compliance dashboards
Property distribution analyses for lead optimization
Joins with ChEMBL bioactivity tables
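
For a Rule of Five dashboard it can help to recompute the violation count from the raw properties and compare it with the precomputed num_ro5_violations. The sketch below applies the classic Lipinski thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 acceptors). Using mw_freebase as the weight is an assumption, the input values are hypothetical, and the result may not match ChEMBL's own calculation in every case.

```python
def _as_float(value):
    """Coerce numbers or numeric strings to float; return None when not possible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# (property key, threshold): a value above the threshold counts as one violation.
RO5_LIMITS = (("mw_freebase", 500.0), ("alogp", 5.0), ("hbd", 5.0), ("hba", 10.0))


def count_ro5_violations(properties):
    """Count Lipinski violations, skipping properties that are missing."""
    violations = 0
    for key, limit in RO5_LIMITS:
        value = _as_float((properties or {}).get(key))
        if value is not None and value > limit:
            violations += 1
    return violations


# Hypothetical property block: weight and logP exceed their thresholds.
hypothetical = {"mw_freebase": "612.4", "alogp": "5.6", "hba": 7, "hbd": 2}
print(count_ro5_violations(hypothetical))  # 2
```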

🏥 Regulatory & Pharmacovigilance

Withdrawn-drug watchlists with year-of-approval context
ATC classification mapping for therapeutic-area reporting
Black-box-warning audits across portfolios
Route-of-administration filtering for safety review
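
A withdrawn-drug watchlist can be derived from downloaded records with a short filter. The following sketch relies only on the withdrawn_flag and first_approval fields from the schema; the function name and the sample records are hypothetical.

```python
def withdrawn_watchlist(records):
    """Return withdrawn molecules sorted by first-approval year, unknown years last."""
    withdrawn = [r for r in records if r.get("withdrawn_flag") is True]
    return sorted(
        withdrawn,
        key=lambda r: (r.get("first_approval") is None, r.get("first_approval") or 0),
    )


# Made-up records for illustration only.
sample = [
    {"molecule_chembl_id": "EXAMPLE_A", "withdrawn_flag": True, "first_approval": 2004},
    {"molecule_chembl_id": "EXAMPLE_B", "withdrawn_flag": False, "first_approval": 1995},
    {"molecule_chembl_id": "EXAMPLE_C", "withdrawn_flag": True, "first_approval": None},
    {"molecule_chembl_id": "EXAMPLE_D", "withdrawn_flag": True, "first_approval": 1999},
]
print([r["molecule_chembl_id"] for r in withdrawn_watchlist(sample)])
```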

🤖 ML & AI for Drug Discovery

Training sets for generative chemistry models
Feature tables for activity-prediction models
Multi-modal datasets joining structure and clinical metadata
Benchmark suites for new architectures

🔌 Automating ChEMBL Molecules Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep your local cheminformatics warehouse in sync with EBI ChEMBL releases.
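
A programmatic run with the Python client might look like the sketch below. The Actor ID is taken from this page's URL; the helper names and the APIFY_TOKEN environment variable are assumptions for illustration. The apify-client import is deferred so the input-building part can be tried without the package installed.

```python
import os

# Actor ID as it appears in the Apify Store URL for this Actor.
ACTOR_ID = "parseforge/chembl-molecules-scraper"


def build_run_input(max_items, query="", molecule_type=""):
    """Assemble the documented input fields; sends an explicit empty query by default."""
    return {"maxItems": max_items, "query": query, "moleculeType": molecule_type}


def run_actor(run_input, token):
    """Start a run, wait for it to finish, and return the dataset items as a list."""
    from apify_client import ApifyClient  # requires: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # hypothetical variable name
    if token:
        items = run_actor(build_run_input(50, molecule_type="Antibody"), token)
        print(f"Fetched {len(items)} records")
    else:
        print("Set APIFY_TOKEN to run this example against Apify")
```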

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Reproducible cheminformatics studies with versioned dataset pulls
Teaching datasets for QSAR and medicinal chemistry coursework
Open-source ADMET benchmark publications
Cross-database joins with UniProt, PubChem, and PDB

🎨 Personal and creative

Indie chemistry visualization apps
Educational dashboards for science communication
Drug-of-the-week newsletters and content research
Hobbyist molecule explorers

🤝 Non-profit and civic

Neglected-disease pipeline mapping
Open-science drug repurposing initiatives
Public-domain pharmacology references
Civic transparency on approved-drug catalogs

🧪 Experimentation

Train molecular property predictors
Prototype agentic tools that resolve ChEMBL IDs
Benchmark cheminformatics libraries on real data
Generate molecule embeddings at scale

❓ Frequently Asked Questions

🧩 How does it work?

Set a molecule-type filter or a text query in the input form, click Start, and the Actor calls the EBI ChEMBL REST API with server-side pagination. Records are emitted as clean, joined JSON ready for download or piping into a warehouse. No browser automation, no captchas, no setup.

💊 Where does the data come from?

Directly from the EBI ChEMBL public REST API at www.ebi.ac.uk/chembl/api/data/molecule. ChEMBL is maintained by the European Bioinformatics Institute.

🧬 Why are SMILES and InChI missing for some molecules?

Antibodies, proteins, cells, oligonucleotides, and oligosaccharides do not have small-molecule structure descriptors. SMILES and InChI are only meaningful for small molecules, so ChEMBL omits them for macromolecules. Our output reflects that by skipping the molecule_structures block for these types.

🎯 What does `max_phase` mean?

It is the highest clinical development phase a molecule has reached. 4 = approved, 3 = phase III, 2 = phase II, 1 = phase I, 0.5 = early phase I, -1 = clinical phase unknown, null = preclinical (no clinical-trial evidence recorded).

🏥 What is the ATC classification?

The Anatomical Therapeutic Chemical classification system from the World Health Organization. ChEMBL maps approved drugs to their ATC codes. A molecule can carry several ATC codes when it is indicated across therapeutic areas.
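
An ATC code is hierarchical: one letter for the anatomical main group, two digits for the therapeutic subgroup, one letter each for the pharmacological and chemical subgroups, and two digits for the chemical substance. The sketch below splits a full seven-character code into those levels and groups records by therapeutic subgroup; the function names are hypothetical.

```python
from collections import defaultdict


def atc_levels(code):
    """Split a full 7-character ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return {
        "anatomical_main_group": code[:1],
        "therapeutic_subgroup": code[:3],
        "pharmacological_subgroup": code[:4],
        "chemical_subgroup": code[:5],
        "chemical_substance": code,
    }


def group_by_therapeutic_subgroup(records):
    """Map each therapeutic subgroup to the ChEMBL IDs that carry a code in it."""
    groups = defaultdict(list)
    for record in records:
        for code in record.get("atc_classifications") or []:
            subgroup = atc_levels(code)["therapeutic_subgroup"]
            groups[subgroup].append(record["molecule_chembl_id"])
    return dict(groups)


print(atc_levels("L04AB04"))
```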

🔁 How often is ChEMBL updated?

EBI releases new ChEMBL versions roughly every 6 to 12 months. Every run of this Actor hits the live API, so your dataset reflects the current ChEMBL release at run time.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (weekly, monthly) and keep a downstream cheminformatics database in sync.

⚖️ Is this data legal to use?

ChEMBL is released under a Creative Commons Attribution-ShareAlike license. The raw molecule data is publicly accessible. Review the ChEMBL license terms for your specific use case, especially for commercial redistribution.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and larger datasets.

🧪 What if I need bioactivity data?

This Actor returns molecule-level records only. For activities, IC50 values, and target bindings, reach out via the contact form below to request a companion ChEMBL activities scraper.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.

🔌 Integrate with any app

ChEMBL Molecules Scraper connects to any cloud service via Apify integrations:

Make - Automate multi-step workflows
Zapier - Connect with 5,000+ apps
Slack - Get run notifications in your channels
Airbyte - Pipe molecule data into your warehouse
GitHub - Trigger runs from commits and releases
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh molecule batches into your product backend, or alert your team in Slack.
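
On the receiving side, a webhook handler mostly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload shape, with an eventType string and a resource object holding defaultDatasetId; verify this against the Apify webhook documentation before relying on it. The function name and the sample payload values are hypothetical.

```python
def dataset_id_from_webhook(payload):
    """Return the dataset ID for a succeeded run, or None for any other event."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


# Hypothetical payload following the assumed default template.
sample_payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "example-run-id", "defaultDatasetId": "example-dataset-id"},
}
print(dataset_id_from_webhook(sample_payload))
```

The returned ID can then be passed to the dataset download step of whichever client the downstream service uses.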

🔗 Recommended Actors

🏥 FINRA BrokerCheck Scraper - U.S. broker and firm regulatory disclosures
🤗 Hugging Face Model Scraper - Model metadata, downloads, and benchmarks
🏨 Greatschools Scraper - U.S. school ratings and demographics
📈 Smart Apify Actor Scraper - Apify Store actor metadata and quality signals

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ChEMBL, the European Bioinformatics Institute, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available open ChEMBL data is collected.

You might also like

ChEMBL Compounds Scraper (parseforge/chembl-compounds-scraper, by ParseForge) - Browse the ChEMBL bioactive molecule catalogue by max clinical phase from preclinical through approved drugs. Returns molecule identifiers, molecular weight, standard InChI, and structural data. Paginate by molregno. Useful for drug discovery, cheminformatics, and pharma research.
ChEMBL Targets Scraper (parseforge/chembl-targets-scraper, by ParseForge) - Query the ChEMBL target catalog by ID, keyword, organism, or target type. Records include target ChEMBL ID, preferred name, organism, target type, gene symbol, tax ID, components with accession and description, and cross references. Useful for drug discovery research and target review.
PubChem Compound Scraper (crawlerbros/pubchem-scraper, by Crawler Bros) - Scrape PubChem, the world's largest free chemistry database with 100M+ compounds. Search by name, CID, SMILES, or full-text. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, synonyms, and more.
PubChem Compound Scraper (crawlergang/pubchem-scraper, by Crawler Gang)
PubChem Chemical Compound Scraper (crawlerbros/pubchem-chemical-compound-scraper, by Crawler Bros) - Search PubChem, the world's largest free chemistry database with 100M+ compounds. Search by name, get by CID, or fetch synonyms. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, and more. No API key required.
PubChem Compound Scraper - Chemical & Drug Data API (pink_comic/pubchem-compound-search, by Ava Torres) - Scrape NIH PubChem chemical compound data by name, formula, SMILES, or CID. Get molecular weight, IUPAC, InChI, SMILES, XLogP, synonyms, and drug data for pharma, toxicology, and R&D workflows.
PubChem Compound Lookup — Chemistry API for Pharma R&D (azureblue/pubchem-compound-scraper, by azureblue) - Look up chemical compounds in PubChem by name. Returns CID, molecular formula, weight, SMILES, InChI, IUPAC name, physicochemical properties, description and synonyms.
PSA Population Report Lookup (lulzasaur/psa-pop-scraper, by lulz bot) - Look up PSA card cert details and full population report. Returns grade breakdown (Auth through PSA 10) by cert number or spec ID. Uses PSA official API.
ChEMBL Assays Scraper (parseforge/chembl-assays-scraper, by ParseForge) - Query the ChEMBL assay catalog by assay ID, target, keyword, type, organism, or confidence score. Records carry assay ID, description, type, category, organism, strain, tissue, target, document, confidence score, BAO label, and relationship type. Useful for drug discovery research.
MyChem.info Drug Annotation Scraper (parseforge/mychem-drug-annotation-scraper, by ParseForge) - Resolve any drug name or InChIKey into a tidy annotation from MyChem.info. Returns DrugBank name and accession, ChEMBL and PubChem ids, UNII, ATC codes, chemical formula, molecular weight, indications, and mechanism classes. Great for drug reference tables and identifier crosswalks.

URL: https://apify.com/parseforge/chembl-molecules-scraper
