Arxiv Paper Scraper

Pricing

$1.00 / 1,000 papers


Developer

Technical Dost Solutions

Maintained by Community

Last modified

4 months ago

Scrape single-page in JavaScript template

A template for scraping data from a single web page in JavaScript (Node.js). The URL of the web page is passed in via input, which is defined by the input schema. The template uses the Axios client to get the HTML of the page and the Cheerio library to parse the data from it. The data are then stored in a dataset where you can easily access them.

In this template the scraped data are the page's headings, but you can edit the code to scrape whatever you want from the page.
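As one illustration of such an edit, the sketch below collects links instead of headings. It assumes `$` is the function returned by `cheerio.load(html)`. The `collectLinks` name and the `{ href, text }` output shape are hypothetical and not part of the template.

```javascript
// Hypothetical helper: gathers every link on the page instead of headings.
// `$` is assumed to be the Cheerio instance returned by cheerio.load(html).
function collectLinks($) {
    const links = [];
    // Select only anchors that actually carry an href attribute.
    $('a[href]').each((_i, element) => {
        links.push({
            href: $(element).attr('href'),
            text: $(element).text().trim(),
        });
    });
    return links;
}
```

The resulting array could then be passed to `Actor.pushData()` in place of the headings.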

Included features

Apify SDK - toolkit for building Actors
Input schema - define and easily validate a schema for your Actor's input
Dataset - store structured data where each object stored has the same attributes
Axios client - promise-based HTTP Client for Node.js and the browser
Cheerio - library for parsing and manipulating HTML and XML

How it works

Actor.getInput() gets the input where the page URL is defined
axios.get(url) fetches the page
cheerio.load(response.data) loads the page data and enables parsing the headings

The following call selects the headings on the page. Edit the selector and the callback to parse whatever you need from the page:

$("h1, h2, h3, h4, h5, h6").each((_i, element)=>{...});

Actor.pushData(headings) stores the headings in the dataset
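The steps above can be put together roughly as follows. This is a minimal sketch, not the template's actual source: the `scrapeHeadings` name, the `url` input field, and the `{ level, text }` record shape are assumptions. The three libraries are passed in as arguments (also an assumption of this sketch) so the flow can be tried with stand-ins and without network access.

```javascript
// Sketch of the flow: read input, fetch the page, parse headings, store them.
// `Actor`, `axios` and `cheerio` are injected rather than imported here.
async function scrapeHeadings({ Actor, axios, cheerio }) {
    // 1. Read the input; `url` is the field the input schema is assumed to define.
    const input = await Actor.getInput();
    const { url } = input ?? {};
    if (!url) throw new Error('Input must contain a "url" field.');

    // 2. Fetch the page; axios exposes the response body as `response.data`.
    const response = await axios.get(url);

    // 3. Load the HTML into Cheerio to get a jQuery-like `$`.
    const $ = cheerio.load(response.data);

    // 4. Collect every heading as { level, text }.
    const headings = [];
    $('h1, h2, h3, h4, h5, h6').each((_i, element) => {
        headings.push({
            level: $(element).prop('tagName').toLowerCase(),
            text: $(element).text().trim(),
        });
    });

    // 5. Store the results in the default dataset.
    await Actor.pushData(headings);
    return headings;
}

// Possible wiring in a real project (needs apify, axios and cheerio installed):
//
//   import { Actor } from 'apify';
//   import axios from 'axios';
//   import * as cheerio from 'cheerio';
//
//   await Actor.main(() => scrapeHeadings({ Actor, axios, cheerio }));
```

`Actor.main()` takes care of initializing the Actor before the function runs and exiting afterwards, so the function itself only contains the scraping steps.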

Resources

Web scraping in Node.js with Axios and Cheerio
Web scraping with Cheerio in 2023
Video tutorial on building a scraper using CheerioCrawler
Written tutorial on building a scraper using CheerioCrawler
Integration with Zapier, Make, Google Drive, and others
Video guide on getting data using Apify API
A short guide on how to build web scrapers using code templates

Getting started

For complete information, see this article. To run the Actor, use the following command:

apify run

Deploy to Apify

Connect Git repository to Apify

If you've created a Git repository for the project, you can connect it to Apify:

Go to the Actor creation page
Click the Link Git Repository button

Push project on your local machine to Apify

You can also deploy the project from your local machine to Apify without a Git repository.

Log in to Apify. You will need to provide your Apify API Token to complete this action.
```
apify login
```
Deploy your Actor. This command will deploy and build the Actor on the Apify Platform. You can find your newly created Actor under Actors -> My Actors.
```
apify push
```

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:

URL: https://apify.com/technicaldost/arxiv-paper-scraper
