Pricing
from $0.01 / 1,000 PubMed articles extracted
PubMed Search Scraper
Search PubMed via the official NCBI API and extract article metadata, abstracts, DOI, authors, journals, MeSH terms, and keywords.
Extract PubMed article metadata, abstracts, DOI, authors, journals, MeSH terms, keywords, and publication dates from the official NCBI E-utilities API.
Use this actor when you need a repeatable PubMed literature-monitoring pipeline for biomedical research, pharma intelligence, systematic reviews, clinical-trial landscaping, academic discovery, or RAG dataset preparation.
What does PubMed Search Scraper do?
PubMed Search Scraper turns one or more PubMed search queries into a clean Apify dataset.
It uses NCBI ESearch, ESummary, and EFetch XML endpoints.
It does not scrape PubMed HTML pages.
It does not require login cookies.
It does not require browser automation.
It can run with no NCBI API key for normal public access.
Add an optional NCBI API key only when you need higher throughput.
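As a rough illustration of the three-step request flow described above, the sketch below builds (but does not send) ESearch, ESummary, and EFetch URLs with Python's standard library. It is not the actor's code. The parameter choices (`retmode=json` for ESearch and ESummary, `retmode=xml` for EFetch) are assumptions based on the public E-utilities interface, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Public base URL for NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def _url(endpoint, params, api_key=None):
    """Attach the optional API key and URL-encode the query string."""
    if api_key:
        params = {**params, "api_key": api_key}
    return f"{EUTILS}/{endpoint}?{urlencode(params)}"


def esearch_url(term, retmax=20, sort="pub_date", api_key=None):
    """Step 1: ESearch turns a PubMed query into a list of PMIDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax,
              "sort": sort, "retmode": "json"}
    return _url("esearch.fcgi", params, api_key)


def esummary_url(pmids, api_key=None):
    """Step 2: ESummary returns document summaries for a batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return _url("esummary.fcgi", params, api_key)


def efetch_url(pmids, api_key=None):
    """Step 3: EFetch returns full PubMed XML (abstracts, MeSH, keywords)."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return _url("efetch.fcgi", params, api_key)


print(esearch_url("CRISPR[Title]", retmax=25))
```

The API key is only appended when one is supplied, matching the optional-key behavior described above.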
Who is it for?
- Medical researchers tracking new papers for a topic.
- Pharma and biotech analysts monitoring drug, disease, biomarker, and target literature.
- Clinical evidence teams building review queues.
- Academic labs collecting citation metadata for literature reviews.
- AI and RAG teams preparing biomedical document indexes.
- Competitive-intelligence teams watching publications by disease area, journal, or author keyword.
- Systematic-review teams exporting article metadata before screening.
Why use this actor?
PubMed search results are easy to inspect manually but hard to operationalize at scale.
This actor gives you structured rows with stable identifiers and metadata that are ready for export.
You can schedule it daily or weekly to monitor new papers.
You can send the dataset to Google Sheets, S3, Make, Zapier, or your own database.
You can use PubMed query syntax directly, including field tags such as [Title] or [MeSH Terms].
Data you can extract
| Field | Description |
|---|---|
| pmid | PubMed identifier |
| title | Article title |
| abstract | Abstract text when available |
| journal | Journal name |
| journalIssn | ISSN from PubMed XML when available |
| publicationDate | Publication date |
| epubDate | Electronic publication date from ESummary |
| authors | Structured author objects with affiliations when available |
| authorNames | Flat author-name list |
| doi | Digital Object Identifier |
| articleTypes | Publication types such as Review or Clinical Trial |
| meshTerms | MeSH descriptor terms |
| keywords | Author keywords |
| language | PubMed language code |
| url | PubMed article URL |
| query | Input query that produced the article |
| rank | Result rank within the query |
| totalResultsForQuery | Total PubMed matches reported by ESearch |
How much does it cost to scrape PubMed search results?
This actor uses pay-per-event pricing.
You pay a small start fee plus a per-result fee for each PubMed article saved to the dataset.
The default input is intentionally small so your first run is cheap.
Large literature reviews should increase maxResultsPerQuery after you confirm the query is correct.
The actor uses the public NCBI API and no proxies, so platform costs are kept low.
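A back-of-the-envelope estimate can help size a run before you start it. This sketch assumes the listed "from" rate of $0.01 per 1,000 saved articles as the per-result price; the actual rate may be higher than the "from" price. The start fee is not stated on this page, so `start_fee` is a hypothetical parameter you should fill in from the actor's pricing tab.

```python
def estimate_cost(num_queries, max_results_per_query, start_fee=0.0,
                  price_per_1000=0.01):
    """Upper-bound cost in USD for one run (sketch).

    Assumes every query returns at least max_results_per_query articles,
    so this is a ceiling; queries with fewer matches cost less.
    start_fee is a hypothetical placeholder: check the actor's pricing.
    """
    max_articles = num_queries * max_results_per_query
    return start_fee + max_articles * price_per_1000 / 1000


# Two queries at 100 results each: at most 200 saved articles.
print(f"${estimate_cost(2, 100):.4f}")
```

Because you are charged per article saved, the estimate is a ceiling: a query with fewer PubMed matches than `maxResultsPerQuery` costs less.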
How to use PubMed Search Scraper
- Open automation-lab/pubmed-search-scraper on Apify.
- Enter one or more PubMed queries.
- Choose how many articles to save per query.
- Optionally set a date range.
- Optionally restrict by article type or journal.
- Decide whether to include abstracts, MeSH terms, and keywords.
- Run the actor.
- Export the dataset as JSON, CSV, Excel, XML, RSS, or through the Apify API.
Input example
{
  "queries": ["cancer immunotherapy", "machine learning radiology"],
  "maxResultsPerQuery": 100,
  "sort": "pub_date",
  "minDate": "2024/01/01",
  "articleTypes": ["Review"],
  "includeAbstract": true,
  "includeMeshTerms": true,
  "requestsPerSecond": 3
}
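Because the input is plain JSON, it can also be generated in code. The helper below is a hypothetical sketch, not part of the actor. It assembles an input object using the field names from the example above and checks two constraints stated elsewhere on this page: dates in `YYYY/MM/DD` form and a `requestsPerSecond` ceiling of 3 without an NCBI API key or 10 with one. How the actor itself receives an API key is not shown here, so `has_api_key` is only a flag for this local check.

```python
import json
import re

# minDate format used in the example input above.
DATE_RE = re.compile(r"^\d{4}/\d{2}/\d{2}$")


def build_input(queries, max_results=25, min_date=None, article_types=None,
                requests_per_second=3, has_api_key=False):
    """Assemble an actor input dict (hypothetical helper)."""
    if not queries:
        raise ValueError("at least one query is required")
    # Rate ceilings quoted in the rate-limit section of this README.
    cap = 10 if has_api_key else 3
    if not 0 < requests_per_second <= cap:
        raise ValueError(f"requestsPerSecond must be > 0 and <= {cap}")
    run_input = {
        "queries": list(queries),
        "maxResultsPerQuery": max_results,
        "sort": "pub_date",
        "includeAbstract": True,
        "includeMeshTerms": True,
        "requestsPerSecond": requests_per_second,
    }
    if min_date is not None:
        if not DATE_RE.match(min_date):
            raise ValueError("minDate must look like YYYY/MM/DD")
        run_input["minDate"] = min_date
    if article_types:
        run_input["articleTypes"] = list(article_types)
    return run_input


print(json.dumps(build_input(["cancer immunotherapy"], 100,
                             "2024/01/01", ["Review"]), indent=2))
```

Rejecting an out-of-range rate locally is a design choice for this sketch; the actor itself caps the rate rather than failing.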
Output example
{
  "pmid": "42345602",
  "title": "Early neutrophil infiltration promotes TRIMELVax-induced antitumor immunity...",
  "abstract": "Enhancing innate-adaptive immune crosstalk is key...",
  "journal": "Oncoimmunology",
  "publicationDate": "2026-Dec-31",
  "authors": [{"name": "Amarilis Pérez-Baños"}],
  "doi": "10.1080/2162402X.2026.2680766",
  "articleTypes": ["Journal Article"],
  "meshTerms": ["Animals", "Neutrophils"],
  "keywords": ["Immunotherapy", "cancer vaccine"],
  "url": "https://pubmed.ncbi.nlm.nih.gov/42345602/",
  "query": "cancer immunotherapy",
  "rank": 1,
  "source": "PubMed"
}
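If you consume the dataset from Python, a small typed wrapper makes the optional fields explicit. This sketch relies only on field names from the output example and field table on this page; `Article`, `from_item`, and `doi_url` are hypothetical names, and `https://doi.org/` is the standard DOI resolver.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    pmid: str
    title: str
    journal: Optional[str] = None
    doi: Optional[str] = None
    abstract: Optional[str] = None
    author_names: List[str] = field(default_factory=list)
    mesh_terms: List[str] = field(default_factory=list)

    def doi_url(self) -> Optional[str]:
        """Resolvable DOI link, or None when the record has no DOI."""
        return f"https://doi.org/{self.doi}" if self.doi else None


def from_item(item: dict) -> Article:
    """Map one dataset item to an Article, tolerating missing fields."""
    names = item.get("authorNames")
    if names is None:
        # Fall back to the structured author objects.
        names = [a["name"] for a in item.get("authors", []) if "name" in a]
    return Article(
        pmid=item["pmid"],
        title=item.get("title", ""),
        journal=item.get("journal"),
        doi=item.get("doi"),
        abstract=item.get("abstract") or None,  # treat "" the same as missing
        author_names=list(names),
        mesh_terms=list(item.get("meshTerms", [])),
    )
```

Treating an empty abstract the same as a missing one covers both behaviors described in the troubleshooting section.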
PubMed query tips
Use normal PubMed query syntax.
Examples:
cancer immunotherapy
CRISPR[Title]
"machine learning"[MeSH Terms]
diabetes AND metformin
Nature Medicine[Journal] AND oncology
COVID-19 vaccine AND randomized controlled trial
Keep your first run small.
Check the output.
Then increase maxResultsPerQuery for production use.
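Queries like the examples above can also be assembled in code before they go into the `queries` input. These helpers are hypothetical and only build strings; they do not check that a field tag is valid in PubMed. Quoting a multi-word term makes PubMed run it as a phrase search, which is stricter than its default automatic term mapping, so drop the quotes if you want broader matching.

```python
def term(text, tag=None):
    """Quote multi-word terms and append an optional PubMed field tag."""
    t = f'"{text}"' if " " in text else text
    return f"{t}[{tag}]" if tag else t


def any_of(*clauses):
    """Combine clauses with OR."""
    return " OR ".join(clauses)


def all_of(*clauses):
    """Combine clauses with AND, parenthesising any clause that contains OR."""
    parts = [f"({c})" if " OR " in c else c for c in clauses]
    return " AND ".join(parts)


print(all_of(any_of(term("CRISPR", "Title"), term("base editing", "Title")),
             term("machine learning", "MeSH Terms")))
```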
Date and article-type filtering
Use minDate and maxDate to monitor new papers.
Use dateType to decide which PubMed date field is filtered.
Use articleTypes for publication types such as:
- Review
- Clinical Trial
- Randomized Controlled Trial
- Meta-Analysis
- Systematic Review
- Case Reports
Use journals to restrict results to specific journals.
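For comparison, the same filters can be expressed in PubMed's own search syntax and E-utilities parameters. The sketch below is one plausible mapping, not a description of how the actor applies `minDate`, `maxDate`, `dateType`, `articleTypes`, or `journals` internally. It uses the ESearch `mindate`, `maxdate`, and `datetype` parameters (`pdat` for publication date, `edat` for Entrez date, `mdat` for modification date) together with the `[Publication Type]` and `[Journal]` field tags. The wide default dates used to fill an open-ended range are an assumption.

```python
def filter_params(term, min_date=None, max_date=None, date_type="pdat",
                  article_types=(), journals=()):
    """Return ESearch parameters for a query plus optional filters (sketch)."""
    clauses = [f"({term})"]
    if article_types:
        types = " OR ".join(f'"{t}"[Publication Type]' for t in article_types)
        clauses.append(f"({types})")
    if journals:
        names = " OR ".join(f'"{j}"[Journal]' for j in journals)
        clauses.append(f"({names})")
    params = {"db": "pubmed", "term": " AND ".join(clauses)}
    if min_date or max_date:
        # E-utilities expects mindate and maxdate together, so fill the
        # missing end with a wide default (an assumption of this sketch).
        params.update(datetype=date_type,
                      mindate=min_date or "1800/01/01",
                      maxdate=max_date or "3000/12/31")
    return params


print(filter_params("cancer immunotherapy", min_date="2024/01/01",
                    article_types=["Review", "Meta-Analysis"]))
```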
Integrations
This actor works well in automated research workflows.
- Schedule daily searches for new biomedical papers.
- Export CSV for review-screening tools.
- Send article metadata to Google Sheets.
- Feed abstracts and MeSH terms into RAG pipelines.
- Store PMIDs and DOI values in a data warehouse.
- Trigger alerts when new papers match high-value disease or drug queries.
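Apify can export CSV directly, but if you post-process items yourself before handing them to a screening tool, a standard-library sketch like this one works for items shaped like the output example. List fields are joined with `; ` so each article stays on one row. The column selection and the `items_to_csv` name are illustrative assumptions.

```python
import csv
import io

# Illustrative column choice; any field from the table above can be added.
COLUMNS = ["pmid", "title", "journal", "publicationDate", "doi",
           "authorNames", "meshTerms", "url"]


def items_to_csv(items):
    """Serialise dataset items to CSV text, one article per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for item in items:
        row = {}
        for col in COLUMNS:
            value = item.get(col, "")
            # Flatten list fields (authors, MeSH terms) into one cell.
            row[col] = "; ".join(value) if isinstance(value, list) else value
        writer.writerow(row)
    return buf.getvalue()
```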
API usage with Node.js
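// Save as an ES module (for example run.mjs) so top-level await works.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/pubmed-search-scraper').call({
    queries: ['cancer immunotherapy'],
    maxResultsPerQuery: 50,
    sort: 'pub_date',
    includeAbstract: true,
});

// Read the saved articles from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);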
API usage with Python
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

run = client.actor('automation-lab/pubmed-search-scraper').call(run_input={
    'queries': ['machine learning radiology'],
    'maxResultsPerQuery': 50,
    'sort': 'pub_date',
    'includeAbstract': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
API usage with cURL
curl -X POST 'https://api.apify.com/v2/acts/automation-lab~pubmed-search-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"queries":["CRISPR[Title]"],"maxResultsPerQuery":25,"includeAbstract":true}'
MCP: use PubMed Search Scraper from Claude
You can call this actor through Apify MCP from Claude Code or Claude Desktop.
MCP server URL:
https://mcp.apify.com/?tools=automation-lab/pubmed-search-scraper
Claude Code setup:
$ claude mcp add --transport http apify-pubmed-search 'https://mcp.apify.com/?tools=automation-lab/pubmed-search-scraper'
Claude Desktop JSON config:
{
  "mcpServers": {
    "apify-pubmed-search": {
      "url": "https://mcp.apify.com/?tools=automation-lab/pubmed-search-scraper"
    }
  }
}
Example prompts:
- "Search PubMed for 50 recent review articles about CAR-T adverse events and summarize the journal distribution."
- "Run the PubMed scraper for machine learning radiology since 2024 and return DOI, title, journal, and abstracts."
- "Find recent PubMed papers about GLP-1 cardiovascular outcomes and prepare a screening table."
NCBI API key and rate limits
NCBI E-utilities works without an API key for normal public use.
Without an API key, the actor caps requests at a conservative 3 requests per second.
With an API key, you can set a higher requestsPerSecond value up to 10.
The actor batches PMIDs when calling ESummary (summaries) and EFetch (detail XML), which reduces the total number of requests.
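The two ideas in this section can be sketched as follows. PMIDs are split into fixed-size batches so that one ESummary or EFetch request covers many articles, and calls are spaced so they never exceed the chosen rate. The batch size of 200 is a hypothetical choice, not the actor's setting. The 3 and 10 requests-per-second ceilings are the limits quoted above.

```python
import time


def batches(pmids, size=200):
    """Split a PMID list into consecutive chunks of at most `size` IDs."""
    return [pmids[i:i + size] for i in range(0, len(pmids), size)]


class Throttle:
    """Space out calls so they never exceed `rps` requests per second."""

    def __init__(self, rps, has_api_key=False):
        cap = 10 if has_api_key else 3  # NCBI ceilings quoted above
        self.interval = 1.0 / min(rps, cap)
        self._next = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = time.monotonic()
        if now < self._next:
            time.sleep(self._next - now)
            now = self._next
        self._next = now + self.interval
```

Call `wait()` immediately before each HTTP request. Spacing requests evenly is simpler than a token bucket and never produces short bursts above the limit.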
FAQ
Does this PubMed scraper require an API key?
No. It works with the public NCBI E-utilities API. Add an optional API key only if you need higher request throughput.
Does it download full-text articles?
No. It extracts PubMed citation metadata and abstracts available through PubMed XML. It does not bypass publisher paywalls.
Troubleshooting
Why did I get zero results?
Your query may be too narrow, the date range may exclude all records, or a publication type/journal filter may not match PubMed indexing.
Try the query in PubMed directly, remove filters, and rerun with a small limit.
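To compare against the PubMed website, open the same query in a browser. This one-function sketch builds a public PubMed search URL with the query passed as the `term` parameter, which is how the PubMed web interface encodes searches in its own URLs. The function name is hypothetical.

```python
from urllib.parse import urlencode


def pubmed_search_link(query):
    """Return a PubMed web search URL for checking a query by hand."""
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})


print(pubmed_search_link('"machine learning"[MeSH Terms]'))
```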
Why is an abstract missing?
Not every PubMed record has an abstract in the XML response.
If PubMed does not provide an abstract, the abstract field is omitted or empty.
Why are some MeSH terms missing?
Fresh records may not have MeSH indexing yet.
PubMed indexing can lag behind publication.
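Because indexing lags, one practical pattern is to flag records that still lack an abstract or MeSH terms and check them again in a later run. Some records may never receive an abstract, so treat the pending list as a review queue, not a guarantee. This hypothetical sketch inspects only the `abstract` and `meshTerms` fields.

```python
def split_for_recheck(items):
    """Separate items that still lack an abstract or MeSH terms."""
    complete, pending = [], []
    for item in items:
        has_abstract = bool(item.get("abstract"))  # missing or "" counts as absent
        has_mesh = bool(item.get("meshTerms"))     # missing or [] counts as absent
        (complete if has_abstract and has_mesh else pending).append(item)
    return complete, pending
```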
Legality and responsible use
This actor uses NCBI's public E-utilities API.
It does not bypass login, paywalls, or private systems.
Respect NCBI usage guidelines and keep request rates reasonable.
If you run large scheduled workflows, provide an NCBI API key and contact email.
Related scrapers
Explore related Automation Lab actors:
- https://apify.com/automation-lab/arxiv-search-scraper
- https://apify.com/automation-lab/article-content-extractor
- https://apify.com/automation-lab/google-scholar-scraper
- https://apify.com/automation-lab/website-content-crawler
Best practices
Start with one query.
Use maxResultsPerQuery around 25 for validation.
Export the dataset and inspect fields.
Then increase volume or add more queries.
Use PubMed field tags when you need precision.
Use scheduled runs for monitoring new publications.
Store PMIDs so downstream systems can deduplicate records.
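A minimal way to act on that advice, assuming you persist the PMIDs you have already processed in a JSON file (the file location is a hypothetical choice): keep only items with unseen PMIDs, then save the updated set. Duplicates within one batch are dropped too, which matters when several queries match the same article.

```python
import json
from pathlib import Path


def new_items(items, state_file):
    """Return items whose PMID has not been seen in earlier runs.

    Seen PMIDs are stored as a JSON list in `state_file` (hypothetical
    location) and updated on every call.
    """
    path = Path(state_file)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    fresh = []
    for item in items:
        pmid = item["pmid"]
        if pmid not in seen:
            seen.add(pmid)
            fresh.append(item)
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

In a scheduled workflow, run this on each new dataset and forward only the returned items to alerts or your warehouse.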
Support
If a run fails, include the run ID, input JSON, and a short description of what you expected.
For query-quality questions, include the exact PubMed query and the date filters you used.
