
URL: https://apify.com/themineworks/pubmed-ncbi-scraper

PubMed NCBI Scraper - Biomedical Articles & MeSH (No Key) · Apify


Pricing

$2.00 / 1,000 records

Scrape 36M+ PubMed/NCBI biomedical articles: title, abstract, authors, journal, PMID, DOI, MeSH terms. No API key needed. Build literature reviews & AI training corpora. Works in Claude, ChatGPT & any MCP agent.

Rating

0.0

(0)

Developer

The Mine Works

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 1
Monthly active users: 1
Last modified: a day ago

PubMed / NCBI: Biomedical Literature Search

Search the world's largest biomedical literature database from Apify. Access 36 million+ peer-reviewed articles from PubMed and MEDLINE (titles, abstracts, authors, journals, PMIDs, DOIs, and MeSH controlled vocabulary terms) without any API key. An optional free NCBI API key unlocks higher throughput.

Why This Actor?

PubMed is the definitive source of record for biomedical and life sciences research, maintained by the National Library of Medicine (NLM) at NIH. With 36 million+ articles spanning decades of research across medicine, pharmacology, genomics, neuroscience, oncology, and every other life science domain, it is the first stop for:

  • Pharma and biotech researchers conducting competitive intelligence, target identification, or systematic literature reviews
  • AI and machine learning teams building training corpora, biomedical NLP models, or question-answering systems that require structured scientific text
  • Systematic review authors collecting all studies matching a clinical PICO question for meta-analysis
  • Biotech investors and analysts tracking publication volume, author networks, and research momentum in a therapeutic area
  • Academic departments monitoring publications from collaborating institutions or specific research groups

This actor wraps the official NCBI E-utilities API (esearch + efetch endpoints at eutils.ncbi.nlm.nih.gov) and delivers clean structured JSON, one article per dataset row, with full MeSH term arrays for downstream semantic analysis.

PubMed Query Syntax

The actor supports PubMed's full advanced query language. Use field tags in brackets to target specific fields:

Tag | Field | Example
[ti] | Article title | CRISPR[ti]
[au] | Author name | Smith J[au]
[ta] | Journal abbreviation | Nature[ta]
[mh] | MeSH heading | Neoplasms[mh]
[dp] | Publication date | 2023:2024[dp]
[pt] | Publication type | Review[pt]

Combine with boolean operators: GLP-1 receptor agonist[ti] AND diabetes[mh] AND 2022:2024[dp]
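As a minimal sketch of composing such queries in code, the snippet below builds the example query from tagged clauses. The helper names `tagged` and `combine` are hypothetical and are not part of the actor; the result is a plain string to pass as the `query` input.

```javascript
// Hypothetical helper (not part of the actor): append a PubMed field tag to a term.
function tagged(term, tag) {
  return `${term}[${tag}]`;
}

// Hypothetical helper: join clauses with a boolean operator (AND, OR, NOT).
// Parentheses are added only when there is more than one clause.
function combine(operator, clauses) {
  if (clauses.length === 1) return clauses[0];
  const separator = ` ${operator} `;
  return `(${clauses.join(separator)})`;
}

// Title AND MeSH heading AND publication date range, as in the example above.
const query = [
  tagged('GLP-1 receptor agonist', 'ti'),
  tagged('diabetes', 'mh'),
  tagged('2022:2024', 'dp'),
].join(' AND ');

// An OR group that can be embedded in a larger query.
const reviewsOrTrials = combine('OR', [
  tagged('Review', 'pt'),
  tagged('Clinical Trial', 'pt'),
]);

console.log(query);
console.log(`${query} AND ${reviewsOrTrials}`);
```

Grouping OR clauses in parentheses keeps them from binding unexpectedly when they are joined to the rest of the query with AND.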

Inputs

Field | Type | Description | Default
query | string | PubMed search query with optional field tags | GLP-1 receptor agonist diabetes
dateFrom | string | Published from date (YYYY/MM/DD) | 2020/01/01
dateTo | string | Published to date (YYYY/MM/DD) | (none)
ncbiApiKey | string | Optional free NCBI key (raises the rate limit from 3 to 10 req/sec) | (none)
maxResults | integer | Maximum articles to return (1 to 10,000) | 100
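A small sketch of preparing an input object before a run, assuming you want to catch out-of-range values locally. `buildInput` and `DEFAULTS` are hypothetical names; the defaults mirror the table, and whether the actor applies the same defaults or validation on its side is not stated here.

```javascript
// Defaults copied from the inputs table; dateTo and ncbiApiKey have no default.
const DEFAULTS = {
  query: 'GLP-1 receptor agonist diabetes',
  dateFrom: '2020/01/01',
  maxResults: 100,
};

// Hypothetical helper: merge overrides into the defaults and check basic constraints.
function buildInput(overrides = {}) {
  const input = { ...DEFAULTS, ...overrides };

  // maxResults must be an integer from 1 to 10,000 according to the table.
  if (!Number.isInteger(input.maxResults) || input.maxResults < 1 || input.maxResults > 10000) {
    throw new RangeError('maxResults must be an integer from 1 to 10,000');
  }

  // Dates use the YYYY/MM/DD format (format check only, not calendar validity).
  const datePattern = /^\d{4}\/\d{2}\/\d{2}$/;
  for (const key of ['dateFrom', 'dateTo']) {
    if (input[key] !== undefined && !datePattern.test(input[key])) {
      throw new TypeError(`${key} must use the YYYY/MM/DD format`);
    }
  }
  return input;
}

const input = buildInput({
  query: 'CRISPR[ti] AND Review[pt]',
  dateTo: '2024/12/31',
  maxResults: 500,
});
console.log(input);
```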

Output Format

Each article is stored as one item in the Apify dataset:

{
  "pmid": "38234567",
  "title": "Efficacy of GLP-1 receptor agonists in type 2 diabetes: a systematic review",
  "abstract": "Background: GLP-1 receptor agonists have emerged as...",
  "authors": ["Smith Jane A", "Chen Robert B", "Patel Anita K"],
  "journal": "The Lancet Diabetes & Endocrinology",
  "issn": "2213-8587",
  "year": "2024",
  "doi": "10.1016/S2213-8587(24)00123-4",
  "mesh_terms": ["Glucagon-Like Peptide-1 Receptor", "Diabetes Mellitus, Type 2", "Hypoglycemic Agents"],
  "url": "https://pubmed.ncbi.nlm.nih.gov/38234567/",
  "scraped_at": "2024-11-15T09:22:11.000Z"
}

A summary record is appended at the end with total article count and run timestamp.

MeSH Terms

Medical Subject Headings (MeSH) are NLM's controlled vocabulary for indexing biomedical literature. Articles indexed for MEDLINE are tagged with MeSH descriptors by NLM; PubMed records that have not been indexed for MEDLINE carry no MeSH terms. The mesh_terms array in each output record contains these structured tags, which are ideal for:

  • Semantic clustering of articles by disease area
  • Building ontology-aligned training data for biomedical NLP models
  • Identifying related concepts the author did not use in the title or abstract
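As a rough illustration of the first use case, the sketch below counts how often each MeSH term occurs across dataset items, which is a common first step before clustering. `meshFrequencies` is a hypothetical helper, the sample items are made up, and the shape of the summary record is an assumption (the code simply skips any row without a `mesh_terms` array).

```javascript
// Hypothetical helper: count MeSH term occurrences across dataset items.
function meshFrequencies(items) {
  const counts = new Map();
  for (const item of items) {
    // Skip rows without a mesh_terms array, e.g. the trailing summary record.
    if (!Array.isArray(item.mesh_terms)) continue;
    for (const term of item.mesh_terms) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
  }
  // Return [term, count] pairs, most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample rows in the documented output shape.
const sampleItems = [
  { pmid: '1', mesh_terms: ['Diabetes Mellitus, Type 2', 'Hypoglycemic Agents'] },
  { pmid: '2', mesh_terms: ['Diabetes Mellitus, Type 2'] },
  { total: 2 }, // stand-in for the summary record; its real fields are not documented here
];

const ranked = meshFrequencies(sampleItems);
console.log(ranked);
```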

Pricing

First 25 results are free on every Apify account; there is no charge until you exceed the free tier.

After the free tier: $4 per 1,000 articles (Pay-Per-Event billing). A 1,000-article run costs $4.00. A 10,000-article run costs $40.00. You are charged only for articles actually delivered.
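The arithmetic can be sketched as a small estimator. `estimateCostUsd` is a hypothetical helper; how the 25 free results interact with the per-article rate is not spelled out, so both values are parameters. The run costs quoted above ($4.00 and $40.00) correspond to `freeResults` set to 0.

```javascript
// Hypothetical estimator: billable articles times the per-1,000 rate.
// ratePer1000 and freeResults are parameters because the exact billing
// rules are an assumption here, not something this sketch can confirm.
function estimateCostUsd(articles, { ratePer1000 = 4, freeResults = 0 } = {}) {
  const billable = Math.max(0, articles - freeResults);
  return (billable * ratePer1000) / 1000;
}

const cost1k = estimateCostUsd(1000);
const cost10k = estimateCostUsd(10000);
const cost1kWithFreeTier = estimateCostUsd(1000, { freeResults: 25 });

console.log(cost1k.toFixed(2), cost10k.toFixed(2), cost1kWithFreeTier.toFixed(2));
```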

Frequently Asked Questions

Q: Do I need an NCBI API key? No. The E-utilities API is freely accessible without authentication. However, without a key you are limited to 3 requests per second. A free NCBI API key (available at ncbi.nlm.nih.gov/account/settings/) increases this to 10 requests per second, making large runs significantly faster. Enter your key in the ncbiApiKey input field.

Q: How current is the data? PubMed is updated daily with new articles, corrections, and MeSH annotations. Articles typically appear in PubMed within days to weeks of publication, depending on the journal's submission practices.

Q: Can I retrieve full text? The actor retrieves titles, abstracts, and metadata. Full text is not available via the E-utilities API for most articles; full text access depends on publisher agreements. For open-access articles, the DOI in the output can be used to retrieve full text via PubMed Central (PMC) or the publisher.

Q: What is the maximum number of articles I can retrieve? The actor supports up to 10,000 articles per run. For larger literature sets, use date range filters (dateFrom/dateTo) or narrower query terms to partition your retrieval across multiple runs.
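One way to partition a large retrieval is to issue one run per calendar year. The sketch below only generates the per-run input objects; `yearlyWindows` is a hypothetical helper, and a year that still matches more than 10,000 articles would need narrower windows or query terms.

```javascript
// Hypothetical helper: one { dateFrom, dateTo } window per year, inclusive.
function yearlyWindows(startYear, endYear) {
  const windows = [];
  for (let year = startYear; year <= endYear; year += 1) {
    windows.push({ dateFrom: `${year}/01/01`, dateTo: `${year}/12/31` });
  }
  return windows;
}

// Build one actor input per year for the same query.
const runs = yearlyWindows(2020, 2022).map((range) => ({
  query: 'Neoplasms[mh]',
  maxResults: 10000,
  ...range,
}));

console.log(runs);
```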

Q: Can I search by specific author or institution? Yes. Use the [au] tag for author names (e.g. Smith JA[au]) and the [ad] affiliation tag for institutions (e.g. Harvard[ad]). Multiple authors or institutions can be combined with OR: (Smith JA[au] OR Chen RB[au]).

Q: How does the actor handle rate limits? The actor automatically respects NCBI's rate limits by inserting delays between requests: 340ms without an API key (staying safely under 3 req/sec) and 110ms with an API key. Automatic retry with exponential backoff handles transient 429 and 5xx errors.
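The following is an illustrative sketch of that pattern, not the actor's actual source. The two delays come from the answer above (340ms gives about 2.9 req/sec, 110ms about 9.1 req/sec); the backoff base, cap, and attempt count are assumed values, and `doRequest` stands for any async function that resolves to an object with a `status` field.

```javascript
// Delay between consecutive requests, using the figures quoted above.
function requestDelayMs(hasApiKey) {
  return hasApiKey ? 110 : 340;
}

// Exponential backoff: base * 2^attempt, capped. Base and cap are assumptions.
function backoffMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Only 429 and 5xx responses are treated as transient.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a request while it keeps returning a transient status.
async function withRetry(doRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await doRequest();
    if (!isRetryable(response.status)) return response;
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts - 1) await sleep(backoffMs(attempt));
  }
  throw new Error(`Request still failing after ${maxAttempts} attempts`);
}

console.log(requestDelayMs(false), requestDelayMs(true), backoffMs(0), backoffMs(3));
```

Capping the backoff keeps a long outage from producing multi-minute waits between attempts.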

Q: Are preprints included in PubMed results? PubMed indexes peer-reviewed articles from MEDLINE-indexed journals. Preprints on bioRxiv/medRxiv are generally not indexed in PubMed. Use the arXiv or bioRxiv scrapers for preprint coverage.

Use in Claude, ChatGPT & any MCP agent

This actor is also a Model Context Protocol (MCP) server tool: call it directly from Claude, ChatGPT, Cursor, Windsurf, or any MCP-compatible AI agent. The agent only pays for results delivered (same pay-per-result model).

  • Per-actor MCP endpoint: https://mcp.apify.com/?tools=themineworks/pubmed-ncbi-scraper
  • Full Mine Works MCP server (all tools): https://the-mine-works-mcp.hatchable.site/api/mcp

// Call this actor as a tool via apify-client (Node)
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
// Start the actor and wait for the run to finish
const run = await client.actor('themineworks/pubmed-ncbi-scraper').call({ /* input from the table above */ });
// Read the articles from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
