URL: https://apify.com/parseforge/chembl-molecules-scraper

ChEMBL Bioactive Molecules Scraper - Drug Discovery · Apify


Pricing

from $28.50 / 1,000 results

ChEMBL Molecules Scraper

Scrape molecules from EBI ChEMBL public API including SMILES, InChI, molecular properties (MW, logP, HBA, HBD, PSA, RTB), max phase, ATC classifications, oral/parenteral/topical flags, first approval, black box warning, prodrug and withdrawn flag. No API key required.

Developer: ParseForge (maintained by Community)

ChEMBL Bioactive Molecules Scraper

Export ChEMBL drug discovery data in seconds. Pull 2.5 million+ bioactive molecules with SMILES, InChI, ATC codes, clinical phase, and approval status. No API key, no registration, no manual REST stitching.

Last updated: 2026-05-13 · 17 fields per record · 2.5M+ molecules · 9 molecule types · EBI public API

The ChEMBL Molecules Scraper queries the EBI ChEMBL public REST API and returns 17 fields per molecule, including the canonical ChEMBL ID, preferred name, molecule type, max clinical phase, full structure descriptors (canonical SMILES, InChI, InChI Key), calculated molecular properties (molecular weight, LogP, hydrogen-bond donors and acceptors, polar surface area, rotatable bonds, Lipinski Rule of Five violations), ATC classifications, route of administration flags, first-approval year, and withdrawn status. ChEMBL is maintained by the European Bioinformatics Institute and is one of the largest manually curated databases of bioactive molecules in drug discovery.

The catalog covers small molecules, antibodies, enzymes, proteins, oligonucleotides, oligosaccharides, cells, genes, and unknowns, totalling more than 2.5 million entries. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. The molecule type filter runs server-side, so antibody-only or small-molecule-only exports are fast.

Target audience: cheminformaticians, drug discovery scientists, computational chemists, pharma data teams, ML researchers, bioinformaticians, academic labs, regulatory analysts.

Primary use cases: QSAR datasets, virtual screening libraries, ADMET feature tables, ATC mapping, clinical-phase tracking, approved-drug audits, withdrawn-drug watchlists.

What the ChEMBL Molecules Scraper does

Two filters and a full-catalog mode in a single run:

  • Full-text query. Substring match across molecule names and synonyms (e.g. aspirin, imatinib, bevacizumab).
  • Type filter. Server-side filter on molecule_type. Pick from small molecule, antibody, enzyme, protein, oligonucleotide, oligosaccharide, cell, gene, or unknown.
  • Paginated catalog dump. Leave both filters empty to walk the entire ChEMBL catalog by offset.

Each record returns the canonical ChEMBL ID, the public explorer URL, the structure block (SMILES, InChI, InChI Key, molfile) when present, the property block (MW, LogP, HBA, HBD, PSA, RTB, full MWT, Rule-of-Five violations), the molecule hierarchy (active / parent / salt), the ATC classifications array, administration route flags (oral, parenteral, topical), the black-box-warning flag, the first-approval year, the withdrawn flag, and the prodrug flag.

Why it matters: ChEMBL underpins most modern drug discovery pipelines. Building your own REST pagination, retry logic, and field selection means a week of plumbing. This Actor returns clean, joined records on every run.
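To make the three workflows concrete, here is a minimal sketch of how the Actor's inputs might translate into requests against the public ChEMBL molecule endpoint. The Actor's real request logic is not shown on this page, so the function name `build_molecule_url` and the way the filters are combined are assumptions. The sketch only builds URLs and sends nothing.

```python
from urllib.parse import urlencode

# Public ChEMBL molecule resource named in the FAQ of this page.
BASE = "https://www.ebi.ac.uk/chembl/api/data/molecule"


def build_molecule_url(query="", molecule_type="", limit=100, offset=0):
    """Build one page URL (hypothetical helper, not the Actor's own code).

    Assumption: a text query goes to the /search resource with a `q`
    parameter, and the type filter is passed as `molecule_type`.
    """
    params = {"limit": limit, "offset": offset}
    if query:
        endpoint = f"{BASE}/search.json"
        params["q"] = query
    else:
        endpoint = f"{BASE}.json"
    if molecule_type:
        params["molecule_type"] = molecule_type
    return f"{endpoint}?{urlencode(params)}"
```

Calling `build_molecule_url(molecule_type="Antibody")` yields a type-filtered first page, while calling it with no arguments yields the first page of the full catalog dump. Raising `offset` by `limit` walks to the next page.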


Full Demo

Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded molecule dataset.


Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| query | string | "aspirin" | Substring text search across molecule names and synonyms. Empty = list all by offset. |
| moleculeType | string | "" | One of 9 ChEMBL molecule types (Small molecule, Antibody, Cell, Enzyme, Gene, Oligonucleotide, Oligosaccharide, Protein, Unknown). Empty = all. |

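Example: 50 antibodies (server-side type filter). The input has no approval filter, so check max_phase in the output to keep only approved therapies.

{
  "maxItems": 50,
  "moleculeType": "Antibody"
}

Example: text query for every molecule whose name or synonyms contain imatinib.

{
  "maxItems": 25,
  "query": "imatinib"
}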
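When inputs are generated by a script rather than typed into the form, a small validating builder can catch typos in the molecule type before a run is started. The sketch below uses only the three input fields and nine type names listed in the Input table. The helper name `make_input` is hypothetical. It always sends `query` explicitly, on the assumption that an omitted field may fall back to the documented default of "aspirin".

```python
# The nine molecule types listed in the Input table.
MOLECULE_TYPES = {
    "Small molecule", "Antibody", "Cell", "Enzyme", "Gene",
    "Oligonucleotide", "Oligosaccharide", "Protein", "Unknown",
}


def make_input(max_items=10, query="", molecule_type=""):
    """Return an input dict for the Actor (hypothetical helper)."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if molecule_type and molecule_type not in MOLECULE_TYPES:
        raise ValueError(f"unknown moleculeType: {molecule_type!r}")
    # All three keys are set so that no default is applied silently.
    return {
        "maxItems": max_items,
        "query": query,
        "moleculeType": molecule_type,
    }
```

For example, `make_input(50, molecule_type="Antibody")` reproduces the antibody example above with an explicit empty query, and `make_input(5, molecule_type="antibody")` raises because the type names are case-sensitive in this sketch.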

Good to Know: antibodies, proteins, and cells have no SMILES or InChI because they are macromolecules. The molecule_structures and molecule_properties blocks are omitted for these types and the record stays clean. Small molecules return the full property block. In recent ChEMBL releases, max_phase follows the convention 4 = approved, 3 = phase III, 2 = phase II, 1 = phase I, 0.5 = early phase I, -1 = clinical phase unknown, null = no clinical phase recorded (preclinical).
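Because the input form has no clinical-phase filter, phase selection happens on the downloaded records. The following is a minimal sketch of such a client-side filter. The function name `phase_at_least` is hypothetical, and the conversion through `float` is a precaution in case a record carries the phase as a string such as "4.0" rather than a number.

```python
def phase_at_least(records, minimum=4.0):
    """Keep records whose max_phase is at least `minimum`.

    Records with a missing, null, or unparseable max_phase are dropped.
    """
    kept = []
    for record in records:
        raw = record.get("max_phase")
        if raw is None:
            continue
        try:
            phase = float(raw)
        except (TypeError, ValueError):
            continue
        if phase >= minimum:
            kept.append(record)
    return kept
```

With the default `minimum=4.0` this keeps approved molecules only. Passing `minimum=1` keeps everything that has reached at least phase I.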


Output

Each molecule record contains up to 17 data fields plus a scrapedAt timestamp. Download the dataset as CSV, Excel, JSON, or XML.

Schema

| Field | Type | Example |
| --- | --- | --- |
| molecule_chembl_id | string | "CHEMBL1201580" |
| url | string | "https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201580" |
| pref_name | string or null | "ADALIMUMAB" |
| molecule_type | string or null | "Antibody" |
| max_phase | number or null | 4 |
| molecule_structures | object | { canonical_smiles, standard_inchi, standard_inchi_key, molfile } |
| molecule_properties | object | { mw_freebase, alogp, hba, hbd, psa, rtb, full_mwt, num_ro5_violations } |
| molecule_hierarchy | object or null | { active_chembl_id, parent_chembl_id, molecule_chembl_id } |
| atc_classifications | string[] | ["L04AB04"] |
| indication_class | string | "Antineoplastic" |
| oral | boolean or null | false |
| parenteral | boolean or null | true |
| topical | boolean or null | false |
| black_box_warning | number or null | 1 |
| first_approval | number or null | 2002 |
| withdrawn_flag | boolean or null | false |
| prodrug | number or null | 0 |
| scrapedAt | ISO 8601 | "2026-05-13T22:26:22.480Z" |
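Since molecule_structures and molecule_properties are nested objects that can be absent for macromolecules, a JSON export often needs flattening before it fits a flat table. The sketch below shows one way to do that with the field names from the schema. The helper names `flatten` and `to_csv`, the chosen column subset, and the ";" separator for ATC codes are illustrative choices, not part of the Actor.

```python
import csv
import io

COLUMNS = [
    "molecule_chembl_id", "pref_name", "molecule_type", "max_phase",
    "canonical_smiles", "standard_inchi_key", "mw_freebase", "alogp",
    "atc_classifications",
]


def flatten(record):
    """Flatten one record; nested blocks may be missing or null."""
    structures = record.get("molecule_structures") or {}
    properties = record.get("molecule_properties") or {}
    return {
        "molecule_chembl_id": record.get("molecule_chembl_id"),
        "pref_name": record.get("pref_name"),
        "molecule_type": record.get("molecule_type"),
        "max_phase": record.get("max_phase"),
        "canonical_smiles": structures.get("canonical_smiles"),
        "standard_inchi_key": structures.get("standard_inchi_key"),
        "mw_freebase": properties.get("mw_freebase"),
        "alogp": properties.get("alogp"),
        # Join the ATC array so it fits in a single CSV cell.
        "atc_classifications": ";".join(record.get("atc_classifications") or []),
    }


def to_csv(records):
    """Render flattened records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()
```

An antibody record without structure or property blocks flattens to a row with empty cells in those columns instead of raising a KeyError.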


Why choose this Actor

  • Massive coverage. 2.5M+ bioactive molecules curated by EBI scientists.
  • Server-side type filter. Antibody-only, small-molecule-only, or protein-only exports run fast at the API level.
  • Full structure block. Canonical SMILES, InChI, InChI Key, and molfile in one place.
  • Calculated properties. MW, LogP, HBA, HBD, PSA, RTB, full MWT, and Rule-of-Five violations precomputed by ChEMBL.
  • Clinical context. Max phase, ATC class, route of administration, first-approval year, and withdrawn flag.
  • Fast. Paginated REST with retry, fetching 100 molecules per request.
  • No authentication. Works on the public EBI API. No login or API key.

ChEMBL is one of the most cited databases in cheminformatics literature. Accurate molecule metadata drives QSAR models, ADMET pipelines, and clinical-phase analytics.
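The "paginated REST with retry" behavior listed above can be pictured as an offset loop around a page-fetching function. This is a minimal sketch of that pattern, not the Actor's source. `fetch_page` is a hypothetical callable you supply that takes `(offset, limit)` and returns a list of records, and the retry count and backoff values are arbitrary.

```python
import time


def iter_molecules(fetch_page, max_items, page_size=100, retries=3, backoff=0.5):
    """Yield up to max_items records by walking pages with offset/limit.

    Transient OSError failures are retried with exponential backoff.
    """
    offset = 0
    yielded = 0
    while yielded < max_items:
        for attempt in range(retries):
            try:
                page = fetch_page(offset, page_size)
                break
            except OSError:
                if attempt == retries - 1:
                    raise  # give up after the last attempt
                time.sleep(backoff * 2 ** attempt)
        if not page:
            return  # an empty page means the catalog is exhausted
        for item in page:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        offset += page_size
```

Injecting `fetch_page` keeps the loop testable without network access. In a real script it would wrap an HTTP GET against the ChEMBL molecule endpoint.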


How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ChEMBL Molecules Scraper (this Actor) | $5 free credit, then pay-per-use | 2.5M+ molecules | Live per run | Text query, molecule type | 2 min |
| Hand-rolled REST scripts | Free | Full ChEMBL | Manual | None unless you build them | Days |
| DrugBank commercial license | $$$/year | Subset, drug-only | Curated | Many | Hours |
| Open Targets GraphQL | Free | Drug-target focus | Live | Many | Hours |

Pick this Actor when you want broad cheminformatics coverage, server-side type filtering, and no pipeline maintenance.


How to use

  1. Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. Open the Actor. Go to the ChEMBL Bioactive Molecules Scraper page on the Apify Store.
  3. Set input. Pick a molecule type, enter a text query, and set maxItems.
  4. Run it. Click Start and let the Actor collect your data.
  5. Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


Business use cases

Pharma & Biotech R&D

  • QSAR and ADMET model training sets
  • Virtual screening libraries by molecule class
  • Competitive intelligence on clinical-phase assets
  • Approved-drug audits for repurposing

Cheminformatics & Data Science

  • SMILES libraries for fingerprint pipelines
  • Lipinski Rule of Five compliance dashboards
  • Property distribution analyses for lead optimization
  • Joins with ChEMBL bioactivity tables
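For the Rule of Five dashboards mentioned in this list, the violation count can also be recomputed from the property block, for instance to cross-check the precomputed num_ro5_violations value. The sketch below applies the standard Lipinski thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors) to the mw_freebase, alogp, hbd, and hba fields. The function name is hypothetical, and the result may differ from ChEMBL's own figure if ChEMBL derives it from different descriptors.

```python
def count_ro5_violations(properties):
    """Count Lipinski Rule of Five violations in a molecule_properties block.

    Values may arrive as numbers or numeric strings; missing values are
    skipped rather than counted as violations.
    """
    def number(key):
        value = properties.get(key)
        return None if value is None else float(value)

    checks = [
        (number("mw_freebase"), 500.0),  # molecular weight <= 500
        (number("alogp"), 5.0),          # logP <= 5
        (number("hbd"), 5.0),            # H-bond donors <= 5
        (number("hba"), 10.0),           # H-bond acceptors <= 10
    ]
    return sum(1 for value, limit in checks if value is not None and value > limit)
```

A small, aspirin-like property block produces zero violations, while a block that breaks all four thresholds produces four.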

Regulatory & Pharmacovigilance

  • Withdrawn-drug watchlists with year-of-approval context
  • ATC classification mapping for therapeutic-area reporting
  • Black-box-warning audits across portfolios
  • Route-of-administration filtering for safety review

ML & AI for Drug Discovery

  • Training sets for generative chemistry models
  • Feature tables for activity-prediction models
  • Multi-modal datasets joining structure and clinical metadata
  • Benchmark suites for new architectures

Automating ChEMBL Molecules Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • Node.js. Install the apify-client NPM package.
  • Python. Use the apify-client PyPI package.
  • See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep your local cheminformatics warehouse in sync with EBI ChEMBL releases.
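As an illustration of the Python route, the following is a minimal sketch that starts a run through the apify-client package and collects the resulting dataset items. The Actor ID is taken from this page's URL. Reading the token from an `APIFY_TOKEN` environment variable and the wrapper name `run_scraper` are assumptions made for the example, and the import is placed inside the function so the module loads even where apify-client is not installed.

```python
import os

# Actor ID as it appears in the Apify Store URL for this page.
ACTOR_ID = "parseforge/chembl-molecules-scraper"


def run_scraper(run_input, token=None):
    """Run the Actor once and return its dataset items as a list."""
    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token lives in the APIFY_TOKEN environment variable.
    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

A call such as `run_scraper({"maxItems": 25, "query": "imatinib"})` blocks until the run finishes and then returns the molecule records as dictionaries.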


Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

Research and academia

  • Reproducible cheminformatics studies with versioned dataset pulls
  • Teaching datasets for QSAR and medicinal chemistry coursework
  • Open-source ADMET benchmark publications
  • Cross-database joins with UniProt, PubChem, and PDB

Personal and creative

  • Indie chemistry visualization apps
  • Educational dashboards for science communication
  • Drug-of-the-week newsletters and content research
  • Hobbyist molecule explorers

Non-profit and civic

  • Neglected-disease pipeline mapping
  • Open-science drug repurposing initiatives
  • Public-domain pharmacology references
  • Civic transparency on approved-drug catalogs

Experimentation

  • Train molecular property predictors
  • Prototype agentic tools that resolve ChEMBL IDs
  • Benchmark cheminformatics libraries on real data
  • Generate molecule embeddings at scale

Frequently Asked Questions

How does it work?

Set a molecule-type filter or a text query in the input form, click Start, and the Actor calls the EBI ChEMBL REST API with server-side pagination. Records are emitted as clean, joined JSON ready for download or piping into a warehouse. No browser automation, no captchas, no setup.

Where does the data come from?

Directly from the EBI ChEMBL public REST API at www.ebi.ac.uk/chembl/api/data/molecule. ChEMBL is maintained by the European Bioinformatics Institute.

Why are SMILES and InChI missing for some molecules?

Antibodies, proteins, cells, oligonucleotides, and oligosaccharides do not have small-molecule structure descriptors. SMILES and InChI are only meaningful for small molecules, so ChEMBL omits them for macromolecules. Our output reflects that by skipping the molecule_structures block for these types.

What does max_phase mean?

It is the highest clinical development phase a molecule has reached. In recent ChEMBL releases: 4 = approved, 3 = phase III, 2 = phase II, 1 = phase I, 0.5 = early phase I, -1 = clinical phase unknown, null = no clinical phase recorded (preclinical).

What is the ATC classification?

The Anatomical Therapeutic Chemical classification system from the World Health Organization. ChEMBL maps approved drugs to their ATC codes. A molecule can carry several ATC codes when it is indicated across therapeutic areas.
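An ATC code encodes five nested levels in its seven characters, which is useful when rolling substances up to a therapeutic area. The sketch below splits a full code such as the "L04AB04" example from the schema into those levels. The function name and the dictionary keys are illustrative labels chosen for this example.

```python
import re

# Full (level 5) ATC code: letter, two digits, two letters, two digits.
ATC_PATTERN = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")


def atc_levels(code):
    """Split a full ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a full ATC code: {code!r}")
    return {
        "anatomical_main_group": code[:1],
        "therapeutic_subgroup": code[:3],
        "pharmacological_subgroup": code[:4],
        "chemical_subgroup": code[:5],
        "chemical_substance": code,
    }
```

Grouping records by `therapeutic_subgroup` or `anatomical_main_group` then gives a simple therapeutic-area breakdown of an export.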

How often is ChEMBL updated?

EBI releases new ChEMBL versions roughly every 6 to 12 months. Every run of this Actor hits the live API, so your dataset reflects the current ChEMBL release at run time.

Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (weekly, monthly) and keep a downstream cheminformatics database in sync.

Is this data legal to use?

ChEMBL is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license. The raw molecule data is publicly accessible. Review the ChEMBL license terms for your specific use case, especially for commercial redistribution.

Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and larger datasets.

What if I need bioactivity data?

This Actor returns molecule-level records only. For activities, IC50 values, and target bindings, reach out via the contact form below to request a companion ChEMBL activities scraper.

What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


Integrate with any app

ChEMBL Molecules Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe molecule data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh molecule batches into your product backend, or alert your team in Slack.


Recommended Actors

Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ChEMBL, the European Bioinformatics Institute, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available open ChEMBL data is collected.
