URL: https://apify.com/parseforge/crossref-scraper

Crossref DOI Metadata Scraper · Apify


Pricing: Pay per event

Crossref DOI Metadata Scraper

Export citation metadata for 155M+ DOIs from the Crossref Works API. Every published research paper, book chapter, conference proceeding, and dataset with a DOI. Search by query, filter by publisher, funder, type, or year range.

Rating: 5.0 (1)

Developer: ParseForge

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 5
Monthly active users: 0
Last modified: 22 days ago

Crossref DOI Metadata Scraper

Extract citation metadata for 155M+ DOIs from Crossref in seconds. Search by query, filter by title, author, or DOI. No coding, no API keys required.

Last updated: 2026-04-23 · 30+ fields · 155M+ DOIs indexed · Title, author, and free-text search

Pull structured records from Crossref DOI metadata, with clean fields ready as CSV, JSON, JSONL, Excel, or XML for downstream pipelines.

Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this actor.

parseforge/crossref-scraper on Apify. Call: ApifyClient("TOKEN").actor("parseforge/crossref-scraper").call(run_input={...}), then client.dataset(run["defaultDatasetId"]).list_items().items for results. Key inputs: maxItems (integer, default 10), query (string, default "attention is all you need"), queryTitle (string), queryAuthor (string), filter (string), doi (string). Full actor spec: fetch build via GET https://api.apify.com/v2/acts/parseforge~crossref-scraper (Bearer TOKEN). Get token: https://console.apify.com/account/integrations

Crossref is the largest DOI registration agency, indexing over 155 million research papers, book chapters, conference proceedings, datasets, and preprints. This scraper connects to the Crossref Works API and returns structured citation metadata including titles, authors, publication dates, journals, DOIs, citation counts, abstracts, license information, and funding details. Whether you need metadata for a single DOI or want to search across the entire Crossref database, the scraper handles pagination and rate limiting automatically.
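For readers curious what "handles pagination" usually involves when talking to the Crossref Works API directly, the following is a minimal sketch of cursor-based deep paging. It is not the actor's actual implementation. The `fetch_page` callable and the `fake_fetch` stand-in are hypothetical and exist only so the example runs offline; the `cursor`, `rows`, `next-cursor`, and `items` names follow the public Crossref REST API.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iterate_works(fetch_page: Callable[[Dict[str, str]], Page],
                  query: str, max_items: int, rows: int = 100) -> Iterator[dict]:
    """Yield up to max_items works by following Crossref-style cursors.

    fetch_page is a hypothetical callable that takes query parameters and
    returns the "message" object of a /works response.
    """
    cursor = "*"  # Crossref uses "*" to open a new deep-paging cursor
    yielded = 0
    while yielded < max_items:
        params = {"query": query, "rows": str(rows), "cursor": cursor}
        message = fetch_page(params)
        items = message.get("items", [])
        if not items:
            break  # an empty page means the result set is exhausted
        for item in items:
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        cursor = message.get("next-cursor", "")
        if not cursor:
            break


def fake_fetch(params: Dict[str, str]) -> Page:
    """Offline stand-in for an HTTP call: serves 5 made-up works.

    Real Crossref cursors are opaque strings; this fake uses an integer
    offset purely for illustration.
    """
    data = [{"DOI": f"10.0000/example.{i}"} for i in range(5)]
    start = 0 if params["cursor"] == "*" else int(params["cursor"])
    rows = int(params["rows"])
    return {"items": data[start:start + rows], "next-cursor": str(start + rows)}


works = list(iterate_works(fake_fetch, "attention", max_items=4, rows=2))
print([w["DOI"] for w in works])
```

Passing the fetch function in as a parameter keeps the paging logic separate from HTTP concerns such as retries and rate limiting, which a real client would add around the request.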

Researchers, librarians, and data analysts use this actor to build citation databases, verify publication records, analyze research trends, and enrich existing datasets with DOI metadata. Instead of querying the Crossref API manually and parsing JSON responses, you get clean, structured data exported as JSON, CSV, or Excel. Every record includes the full title, all authors with ORCID IDs when available, journal name, volume, issue, pages, publication date, license, funder information, and reference lists.

Target Audience | Use Cases
Academic researchers | Build citation databases for literature reviews
University librarians | Verify and enrich publication records
Bibliometric analysts | Analyze citation patterns and research impact
Data scientists | Enrich datasets with DOI metadata
Publishers | Track citations and references across journals
Grant managers | Verify publication records from funded research

What the Crossref Scraper does

  • Free-text search across titles, authors, and container titles in the 155M+ DOI database
  • Title-specific search to find publications matching exact title keywords
  • Author search to find all works by a specific researcher
  • Single DOI lookup to fetch full metadata for a specific publication
  • Filter strings to narrow results by type, date, ORCID, publisher, and more
  • Polite pool access by providing an email for faster Crossref response times

The scraper queries the Crossref Works API, retrieves matching records, and extracts full citation metadata for each item. Results include the publication title, all authors (with ORCID IDs), journal or container title, volume, issue, pages, publication dates, DOI, license info, funder details, reference count, citation count, and direct links. Each record is timestamped and includes the content type (journal-article, book-chapter, etc.).

Why it matters: Crossref's API returns complex nested JSON that requires parsing. This scraper flattens and normalizes the data, delivering clean records ready for spreadsheets, databases, or analysis tools. Add your email to get routed to Crossref's faster "polite pool."
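To make "flattens and normalizes" concrete, here is a minimal sketch of turning one nested Crossref work into a flat record. The input keys (`title`, `author`, `container-title`, `issued`, `is-referenced-by-count`, `references-count`) are standard Crossref Works fields; which of them feed each output field of this actor is an assumption, and the sample record is made up.

```python
from typing import Any, Dict, List, Optional


def first(values: Optional[List[str]]) -> str:
    """Crossref wraps titles in lists; return the first entry or ''."""
    return values[0] if values else ""


def date_from_parts(field: Optional[Dict[str, Any]]) -> str:
    """Turn {"date-parts": [[2017, 6, 12]]} into "2017-06-12"."""
    parts = (field or {}).get("date-parts") or [[]]
    nums = [p for p in parts[0] if p is not None]
    if not nums:
        return ""
    return "-".join([f"{nums[0]:04d}"] + [f"{n:02d}" for n in nums[1:]])


def flatten_work(work: Dict[str, Any]) -> Dict[str, Any]:
    """Map a nested Crossref work onto flat, spreadsheet-friendly keys.

    The output key names mirror this actor's schema, but the mapping
    itself is an illustrative assumption.
    """
    authors = [
        {
            "name": " ".join(x for x in (a.get("given"), a.get("family")) if x),
            "orcid": a.get("ORCID"),
        }
        for a in work.get("author", [])
    ]
    return {
        "title": first(work.get("title")),
        "authors": authors,
        "publishedDate": date_from_parts(work.get("issued")),
        "containerTitle": first(work.get("container-title")),
        "doi": work.get("DOI", ""),
        "citationCount": work.get("is-referenced-by-count", 0),
        "referenceCount": work.get("references-count", 0),
    }


# Hypothetical sample record, not real Crossref data
raw = {
    "DOI": "10.0000/example.1",
    "title": ["An Example Title"],
    "author": [{"given": "Ada", "family": "Lovelace"}, {"family": "Consortium"}],
    "container-title": ["Journal of Examples"],
    "issued": {"date-parts": [[2024, 3]]},
    "is-referenced-by-count": 7,
    "references-count": 12,
}
record = flatten_work(raw)
print(record["title"], record["publishedDate"])
```

Partial dates are kept partial rather than padded with a fictitious day, since Crossref often records only a year or a year and month.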


Full Demo

Coming soon...


Input

Field | Type | Required | Description
maxItems | integer | No | Max records to collect. Free: up to 10. Paid: up to 1,000,000
query | string | No | Free text search across titles, authors, and journals
queryTitle | string | No | Match only within publication titles
queryAuthor | string | No | Match by author name
filter | string | No | Crossref filter string (e.g., "type:journal-article,from-pub-date:2024")
doi | string | No | Fetch metadata for a single DOI (overrides query)
email | string | No | Your email for Crossref's faster "polite pool"

Example 1: Free-text search

{
  "query": "attention is all you need",
  "maxItems": 10
}

Example 2: Filtered author search with date range

{
  "queryAuthor": "Hinton, Geoffrey",
  "filter": "type:journal-article,from-pub-date:2020",
  "email": "your@email.com",
  "maxItems": 100
}

Good to Know: Providing an email address routes your requests to Crossref's "polite pool," which has faster response times and higher rate limits. The filter field accepts Crossref filter syntax. See Crossref API docs for all available filter options.
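The filter string is a comma-separated list of `name:value` pairs, so it can be convenient to assemble the actor input programmatically. The sketch below is one possible helper, not part of the actor: `build_filter` and `build_run_input` are hypothetical names, while the input keys are the ones from the table above.

```python
from typing import Any, Dict, Optional


def build_filter(filters: Dict[str, str]) -> str:
    """Join name/value pairs into Crossref filter syntax: "a:1,b:2"."""
    return ",".join(f"{name}:{value}" for name, value in filters.items())


def build_run_input(query: Optional[str] = None, *,
                    query_title: Optional[str] = None,
                    query_author: Optional[str] = None,
                    doi: Optional[str] = None,
                    filters: Optional[Dict[str, str]] = None,
                    email: Optional[str] = None,
                    max_items: int = 10) -> Dict[str, Any]:
    """Build the actor's input dict, leaving out fields that were not set."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    candidates = {
        "query": query,
        "queryTitle": query_title,
        "queryAuthor": query_author,
        "doi": doi,
        "filter": build_filter(filters) if filters else None,
        "email": email,  # optional; routes requests to the polite pool
    }
    run_input = {key: value for key, value in candidates.items() if value}
    run_input["maxItems"] = max_items
    return run_input


# Reproduces Example 2 from the input section
run_input = build_run_input(
    query_author="Hinton, Geoffrey",
    filters={"type": "journal-article", "from-pub-date": "2020"},
    email="your@email.com",
    max_items=100,
)
print(run_input)
```

The resulting dictionary can be passed as `run_input` to the Apify client call shown in the automation section.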


Output

Schema

Field | Type | Description
title | string | Full publication title
authors | array | Author names with ORCID IDs when available
publishedDate | string | Publication date
containerTitle | string | Journal or book title
doi | string | Digital Object Identifier
url | string | Direct URL to the publication
volume | string | Journal volume
issue | string | Journal issue
pages | string | Page range
publisher | string | Publisher name
type | string | Content type (journal-article, book-chapter, etc.)
citationCount | number | Number of times cited
referenceCount | number | Number of references in the work
abstract | string | Abstract text (when available)
issn | array | ISSN identifiers
isbn | array | ISBN identifiers
license | array | License information and URLs
funder | array | Funding organizations and grant numbers
subject | array | Subject classifications
depositedDate | string | Date deposited in Crossref
indexedDate | string | Date indexed by Crossref
references | array | List of referenced DOIs
scrapedAt | string | Collection timestamp
error | string | Error message if processing failed
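Because several schema fields are arrays, exporting a subset of columns to CSV needs a small amount of shaping. The following sketch shows one way to do it with the standard library. The two records are hypothetical, and treating a non-empty `error` field as "skip this row" is an assumption about how failures are reported.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, List

COLUMNS = ["doi", "title", "publishedDate", "citationCount", "issn"]


def to_cell(value: Any) -> str:
    """Render one field as a CSV cell; arrays become "; "-joined text."""
    if value is None:
        return ""
    if isinstance(value, list):
        return "; ".join(v if isinstance(v, str) else json.dumps(v) for v in value)
    return str(value)


def records_to_csv(records: Iterable[Dict[str, Any]],
                   columns: List[str] = COLUMNS) -> str:
    """Write the chosen columns to CSV text, skipping failed records."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        if record.get("error"):
            continue  # assumption: a non-empty error marks a failed lookup
        writer.writerow([to_cell(record.get(column)) for column in columns])
    return buffer.getvalue()


# Hypothetical records shaped like the schema above
records = [
    {"doi": "10.0000/example.1", "title": "First Example",
     "publishedDate": "2024-03-01", "citationCount": 7,
     "issn": ["0000-0000", "1111-1111"]},
    {"doi": "10.0000/example.2", "error": "lookup failed"},
]
csv_text = records_to_csv(records)
print(csv_text)
```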

Sample records


Why choose this Actor

Feature | Details
155M+ DOIs | Access the full Crossref database of research publications
Multi-field search | Query by title, author, free text, or specific DOI
Citation counts | Track how many times each work has been cited
Funding data | Identify funders and grant numbers for each publication
ORCID support | Author identifiers included when available
License info | Know the access rights for each publication
Polite pool | Faster responses when you provide an email address

Search across 155M+ DOIs and collect up to 1,000,000 records per run with full citation metadata.


How it compares to alternatives

Feature | This Actor | Manual API Calls | Generic Scrapers
Automatic pagination | Yes | Manual | No
Polite pool routing | Yes | Manual | No
Citation count included | Yes | Yes | No
Funding data extraction | Yes | Yes | No
Structured JSON/CSV output | Yes | JSON only | Varies
Bulk collection (1M+ records) | Yes | Manual | No
Scheduled recurring runs | Yes | No | No

Get structured citation metadata at scale without writing API code or managing pagination.


How to use

  1. Create an Apify account - Sign up free with $5 credit
  2. Open the Crossref DOI Metadata Scraper - Navigate to the actor page on Apify
  3. Enter your search query - Type keywords, an author name, or a specific DOI
  4. Add optional filters - Set date range, publication type, or provide your email for faster responses
  5. Click Start - The actor collects matching records and delivers structured citation data

โฑ๏ธ A typical run with 10 records completes in under 30 seconds.


Business use cases

Academic Research

  • Build citation databases for systematic reviews
  • Track publication records for tenure evaluations
  • Analyze citation patterns across research fields
  • Verify DOI metadata for bibliographies

Bibliometric Analysis

  • Measure research impact by citation count
  • Map collaboration networks through co-authorship
  • Track publication trends by subject area
  • Compare publisher output across disciplines

Library Services

  • Enrich catalog records with DOI metadata
  • Verify publication details for acquisitions
  • Build subject-specific reference collections
  • Track open access availability by license type

Research Funding

  • Verify publication records of grant applicants
  • Track outputs from funded research programs
  • Identify high-impact journals for publication strategies
  • Monitor open access compliance by funder


Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

โ“ Frequently Asked Questions

Automating Crossref Scraper

Integrate the Crossref Scraper into your workflow using the Apify API or client libraries.

Node.js:

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor("parseforge/crossref-scraper").call({
    query: "attention is all you need",
    maxItems: 50,
    email: "your@email.com"
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python:

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("parseforge/crossref-scraper").call(run_input={
    "query": "attention is all you need",
    "maxItems": 50,
    "email": "your@email.com",
})

# Read the results from the run's default dataset
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)

Schedules: Set up recurring runs to track new publications matching your query, monitor citation count changes, or build growing bibliometric datasets. Configure daily, weekly, or monthly schedules from the Apify Console.
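As an illustration of what a scheduled workflow might do with two consecutive runs, the sketch below compares dataset snapshots by DOI to report citation count changes and newly seen works. The helper names and the sample data are hypothetical; the `doi` and `citationCount` keys come from the output schema. DOIs are compared case-insensitively, as DOI names are not case-sensitive.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def index_by_doi(items: Iterable[Record]) -> Dict[str, Record]:
    """Key records by lower-cased DOI, ignoring records without one."""
    return {item["doi"].lower(): item for item in items if item.get("doi")}


def citation_changes(previous: Iterable[Record],
                     current: Iterable[Record]) -> List[Tuple[str, int, int]]:
    """Return (doi, before, after) for works whose citation count changed."""
    old = index_by_doi(previous)
    changes = []
    for doi, item in index_by_doi(current).items():
        before = old.get(doi, {}).get("citationCount")
        after = item.get("citationCount")
        if before is not None and after is not None and after != before:
            changes.append((doi, before, after))
    # Largest increase first
    return sorted(changes, key=lambda c: c[2] - c[1], reverse=True)


def new_dois(previous: Iterable[Record], current: Iterable[Record]) -> List[str]:
    """Return DOIs present in the current snapshot but not the previous one."""
    return sorted(set(index_by_doi(current)) - set(index_by_doi(previous)))


# Hypothetical snapshots from two scheduled runs
last_week = [
    {"doi": "10.0000/a", "citationCount": 3},
    {"doi": "10.0000/b", "citationCount": 10},
]
this_week = [
    {"doi": "10.0000/A", "citationCount": 5},
    {"doi": "10.0000/b", "citationCount": 10},
    {"doi": "10.0000/c", "citationCount": 0},
]
print(citation_changes(last_week, this_week))
print(new_dois(last_week, this_week))
```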

Integrate with any app

  • Make (Integromat) - Connect citation data to Google Sheets, Notion, or any of 1,500+ apps
  • Zapier - Trigger workflows when new citation records are collected
  • Slack - Get notified when a Crossref data run completes
  • Airbyte - Stream citation metadata into your data warehouse
  • GitHub - Store citation datasets in repositories for version control
  • Google Drive - Automatically save CSV exports to shared folders

Recommended Actors

Actor | Description
PubMed Citation Scraper | Extract publication metadata from PubMed for biomedical research
OpenCitations Scraper | Collect citation networks and bibliographic metadata
Open Library Scraper | Search and download book data from the Internet Archive
NASA Reports Scraper | Collect technical reports from NASA's NTRS database
ROR Scraper | Collect research organization data from the Research Organization Registry

Pro Tip: Combine the Crossref Scraper with the PubMed Scraper to get both citation metadata and full biomedical abstracts for the same publications.


Need Help? Open our contact form and we will get back to you within 24 hours. We are happy to help with custom setups, integrations, or feature requests.


Disclaimer: This actor is not affiliated with, endorsed by, or connected to Crossref. It accesses publicly available data through the Crossref Works API. Use responsibly and in accordance with Crossref's Metadata Terms of Use.
