
URL: https://apify.com/crawlerbros/osf-preprints-scraper

OSF Preprints Scraper · Apify


Pricing

from $3.00 / 1,000 results



Developer

Crawler Bros

Maintained by Community


Scrape preprints from the Open Science Framework (OSF) using its public REST API. No authentication or proxy is required.

What It Does

This actor extracts preprint metadata from OSF's preprint archive, which hosts over 190,000 open-access scholarly works across disciplines including psychology, medicine, social sciences, engineering, and more. It supports filtering by tags, subjects, and provider, as well as direct ID-based lookup.

Key Features

  • No authentication required: uses the public OSF API
  • Two modes: search/browse preprints or fetch specific ones by ID
  • Filter by tags, subjects, or provider (e.g., PsyArXiv, SocArXiv, MedArXiv)
  • Pagination handled automatically: retrieves up to 1,000 records per run
  • Clean structured output with camelCase field names
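
The automatic pagination listed above can be pictured as repeatedly following the "next" link that each page of results carries. The sketch below assumes JSON:API-style pages of the form `{"data": [...], "links": {"next": ...}}`; the names `collect_preprints` and `fetch_page` are hypothetical, and this is an illustration rather than the actor's actual code.

```python
from typing import Callable, Optional


def collect_preprints(
    first_url: str,
    fetch_page: Callable[[str], dict],
    max_items: int = 50,
) -> list[dict]:
    """Follow `links.next` until max_items records are collected or pages run out."""
    # The 1-1000 range mirrors the documented maxItems limits.
    if not 1 <= max_items <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    items: list[dict] = []
    url: Optional[str] = first_url
    while url and len(items) < max_items:
        page = fetch_page(url)  # assumed to return one decoded JSON page
        items.extend(page.get("data", []))
        url = (page.get("links") or {}).get("next")
    # The last page may overshoot the limit, so trim to max_items.
    return items[:max_items]
```

Passing the fetcher in as a callable keeps the loop testable without network access; in practice it might be something like `lambda url: requests.get(url, timeout=30).json()`.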

Input Fields

Field | Type | Description
--- | --- | ---
mode | Select | searchPreprints (default) or getById
searchQuery | String | Filter preprints by tag (e.g. machine learning)
subjectFilter | String | Filter by subject text (e.g. Medicine and Health Sciences)
provider | String | Filter by provider (e.g. psyarxiv, socarxiv, osf)
preprintIds | Array | List of OSF preprint IDs (for getById mode)
maxItems | Integer | Max number of results (1 to 1000, default 50)
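
As a rough illustration of how these fields fit together, the sketch below builds a run input and starts the actor with the third-party `apify-client` package. The helper names `build_run_input` and `run_actor` are hypothetical, the actor ID is taken from this page's URL, and the rule that getById needs at least one ID is an assumption rather than documented behaviour.

```python
from typing import Optional


def build_run_input(
    mode: str = "searchPreprints",
    search_query: Optional[str] = None,
    subject_filter: Optional[str] = None,
    provider: Optional[str] = None,
    preprint_ids: Optional[list[str]] = None,
    max_items: int = 50,
) -> dict:
    """Assemble the actor input, using the field names from the table above."""
    if mode not in ("searchPreprints", "getById"):
        raise ValueError("mode must be 'searchPreprints' or 'getById'")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    # Assumption: getById is only meaningful with at least one ID.
    if mode == "getById" and not preprint_ids:
        raise ValueError("getById mode needs at least one preprint ID")
    run_input: dict = {"mode": mode, "maxItems": max_items}
    optional = {
        "searchQuery": search_query,
        "subjectFilter": subject_filter,
        "provider": provider,
        "preprintIds": preprint_ids,
    }
    # Leave out unset filters instead of sending nulls.
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; third-party dependency

    client = ApifyClient(token)
    run = client.actor("crawlerbros/osf-preprints-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `build_run_input(search_query="machine learning", provider="psyarxiv", max_items=100)` yields an input that searches PsyArXiv by tag and caps the run at 100 records.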

Provider Examples

Popular OSF preprint providers you can filter by:

Provider ID | Description
--- | ---
osf | General OSF preprints
psyarxiv | Psychology
socarxiv | Social sciences
medarxiv | Medicine
eartharxiv | Earth sciences
engrxiv | Engineering
biorxiv | Biology
ecsarxiv | Electrochemical Society

Output Fields

Each item in the dataset contains:

Field | Type | Description
--- | --- | ---
preprintId | String | Unique OSF preprint ID (e.g. abc12_v2)
title | String | Title of the preprint
description | String | Abstract or summary
doi | String | Digital Object Identifier
datePublished | String | Publication date (ISO 8601)
dateCreated | String | Creation date (ISO 8601)
dateModified | String | Last modified date (ISO 8601)
tags | Array | Author-assigned tags
isPublished | Boolean | Whether the preprint is publicly published
provider | String | Provider ID (e.g. psyarxiv)
subjects | Array | Subject classifications
license | String | License name (e.g. CC-By Attribution 4.0)
sourceUrl | String | Direct URL to the preprint on OSF
recordType | String | Always "preprint"
scrapedAt | String | Timestamp when the record was scraped

Example Output

{
  "preprintId": "snveb_v2",
  "title": "Beyond the Resume: Comparing the Predictive Power of Personality Assessments",
  "description": "This study examines employee turnover prediction using machine learning...",
  "doi": "10.31234/osf.io/snveb_v2",
  "datePublished": "2026-05-26T13:58:36.783000Z",
  "dateCreated": "2026-05-25T09:31:34.214181Z",
  "dateModified": "2026-05-26T13:58:36.814700Z",
  "tags": ["Machine learning", "Employee turnover", "Explainable AI"],
  "isPublished": true,
  "provider": "psyarxiv",
  "subjects": ["Industrial and Organizational Psychology", "Quantitative Methods"],
  "sourceUrl": "https://osf.io/preprints/psyarxiv/snveb_v2/",
  "recordType": "preprint",
  "scrapedAt": "2026-05-30T10:00:00.000000+00:00"
}
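
The example record above does not include the `license` field listed under Output Fields, so downstream code may want to treat documented fields as possibly absent. A minimal sketch of such a check follows; the helper name `missing_fields` is hypothetical, and whether the actor omits other fields in practice is not stated on this page.

```python
# Field names copied from the Output Fields table.
DOCUMENTED_FIELDS = frozenset({
    "preprintId", "title", "description", "doi",
    "datePublished", "dateCreated", "dateModified",
    "tags", "isPublished", "provider", "subjects",
    "license", "sourceUrl", "recordType", "scrapedAt",
})


def missing_fields(item: dict) -> set[str]:
    """Return the documented field names that are absent from a dataset item."""
    return set(DOCUMENTED_FIELDS.difference(item))
```

Running this over a whole dataset before analysis gives a quick picture of which columns need a default value.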

Use Cases

  • Academic research: Track preprints in specific fields
  • Literature reviews: Collect papers by subject or tag for systematic reviews
  • Trend analysis: Monitor publication rates by subject over time
  • Citation tracking: Gather DOIs for downstream citation analysis
  • Content aggregation: Build databases of open-access scholarly works
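
For the trend-analysis use case, one simple approach is to count records per month and subject once a run has finished. The sketch below assumes items shaped like the example output, with an ISO 8601 `datePublished` string and a `subjects` list; the function name `monthly_subject_counts` is hypothetical.

```python
from collections import Counter


def monthly_subject_counts(items: list[dict]) -> Counter:
    """Count preprints per (YYYY-MM, subject) pair."""
    counts: Counter = Counter()
    for item in items:
        published = item.get("datePublished")
        if not published:
            continue  # skip records without a publication date
        month = published[:7]  # ISO 8601 strings start with "YYYY-MM"
        for subject in item.get("subjects") or []:
            counts[(month, subject)] += 1
    return counts
```

A preprint with several subjects is counted once under each of them, so the totals describe subject activity rather than distinct papers.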

FAQs

Q: Does this require an API key? A: No. The OSF public API is freely accessible without authentication.

Q: How many results can I get? A: Up to 1,000 per run. OSF has 190,000+ preprints total.

Q: Can I filter by date? A: Not directly via this actor's inputs. You can filter by tag and subject, then sort results by datePublished in post-processing.

Q: What's the difference between providers? A: Different academic communities host preprint servers on OSF (e.g., PsyArXiv for psychology). Using the provider filter restricts results to that community.

Q: Are all preprints peer-reviewed? A: No. Preprints are posted before peer review. The isPublished field indicates acceptance by the OSF preprint server, not journal peer review.

Q: How current is the data? A: The OSF API returns live data. New preprints appear within hours of submission.
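
For the date-filtering question above, the post-processing step could look roughly like the following. It assumes `datePublished` is an ISO 8601 string as in the example output and that `start` and `end` are timezone-aware datetimes; the names `parse_iso` and `published_between` are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def parse_iso(value: str) -> datetime:
    # Map a trailing "Z" to "+00:00" so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def published_between(
    items: Iterable[dict], start: datetime, end: datetime
) -> list[dict]:
    """Keep items with start <= datePublished < end, oldest first."""
    dated = [
        (parse_iso(item["datePublished"]), item)
        for item in items
        if item.get("datePublished")  # skip records without a date
    ]
    kept = [(ts, item) for ts, item in dated if start <= ts < end]
    kept.sort(key=lambda pair: pair[0])
    return [item for _, item in kept]
```

Comparing parsed datetimes rather than raw strings avoids surprises when timestamps differ in precision or offset notation.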
