URL: https://apify.com/ellustar/my-actor-56

Python Scrapy template · Apify


Pricing

from $0.01 / 1,000 results

Python Scrapy template

A ready-to-use Actor template for building fast, scalable data extraction workflows.

Rating

0.0

(0)

Developer

๐Ÿ‘ Ellustar

Ellustar

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 months ago

Python Scrapy template

A template example built with Scrapy to scrape page titles from URLs defined in the input parameter. It shows how to use Apify SDK for Python and Scrapy pipelines to save results.

Included features

  • Apify SDK for Python - a toolkit for building Apify Actors and scrapers in Python
  • Input schema - define and easily validate a schema for your Actor's input
  • Request queue - queues into which you can put the URLs you want to scrape
  • Dataset - store structured data where each object stored has the same attributes
  • Scrapy - a fast high-level web scraping framework

How it works

This code is a Python script that uses Scrapy to scrape web pages and extract data from them. Here's a brief overview of how it works:

  • The script reads the input data from the Actor instance, which is expected to contain a start_urls key with a list of URLs to scrape.
  • The script then creates a Scrapy spider (class TitleSpider) that visits each URL and extracts the page's URL and title.
  • A Scrapy pipeline saves the results to the default dataset associated with the Actor run, using the push_data method of the Actor instance.
  • The script catches any exceptions that occur during the web scraping process and logs an error message using the Actor.log.exception method.
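
The flow described above can be sketched with the standard library alone. This is not the template's code: the template uses Scrapy and the Apify SDK, while this sketch only mirrors the same four steps so it can run without either installed. The names read_start_urls, extract_title, run, and the injected fetch callable are hypothetical. A plain list stands in for the Actor's default dataset, and logging.exception stands in for Actor.log.exception. The sketch also assumes that each start_urls entry is either a URL string or an object with a "url" key.

```python
import logging
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True
            self.title = "".join(self._parts).strip()


def read_start_urls(actor_input):
    """Step 1: pull URL strings out of the input's start_urls key."""
    entries = (actor_input or {}).get("start_urls", [])
    urls = []
    for entry in entries:
        # Assumption: an entry is a URL string or an object with a "url" key.
        url = entry.get("url") if isinstance(entry, dict) else entry
        if url:
            urls.append(url)
    return urls


def extract_title(html):
    """Step 2: what the spider extracts from each page (None if no title)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def run(actor_input, fetch, dataset):
    """Steps 3 and 4: store one record per page and log any failure."""
    for url in read_start_urls(actor_input):
        try:
            html = fetch(url)  # hypothetical callable: URL -> HTML string
            dataset.append({"url": url, "title": extract_title(html)})
        except Exception:
            # Stand-in for Actor.log.exception in the real Actor.
            logging.exception("Failed to scrape %s", url)
```

Passing fetch in as a parameter keeps the sketch free of network access, so it can be exercised with a dictionary of canned pages. In the real Actor, Scrapy performs the downloading and the pipeline pushes each item to the dataset.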

Resources

Getting started

For complete information see this article. In short, you will:

  1. Build the Actor
  2. Run the Actor

Pull the Actor for local development

If you would like to develop locally, you can pull the existing Actor from the Apify Console using the Apify CLI:

  1. Install apify-cli

    Using Homebrew

    $ brew install apify-cli

    Using NPM

    $ npm -g install apify-cli
  2. Pull the Actor by its unique <ActorId>, which is one of the following:

    • unique name of the Actor to pull (e.g. "apify/hello-world")
    • or ID of the Actor to pull (e.g. "E2jjCZBezvAZnX8Rb")

    You can find both by clicking on the Actor title at the top of the page, which will open a modal containing both Actor unique name and Actor ID.

    This command will copy the Actor into the current directory on your local machine.

    $ apify pull <ActorId>

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:
