maikelvanmaurik/schrapert

Package info

github.com/schrapert/framework

pkg:composer/maikelvanmaurik/schrapert

v0.0.3 2020-04-01 11:41 UTC

MIT 7238230cc34325a693aed66ef305d38e46e3c131

  • Maikel van Maurik <maikelvanmaurik.woop@gmail.com>

This package is auto-updated.

Last update: 2026-06-16 10:54:36 UTC


README

Schrapert is a scraping and crawling library for PHP, inspired by Scrapy. It uses React (ReactPHP) for asynchronous operations such as downloading requests and writing files.

Example of a simple spider:

<?php

namespace Crawl;

use DOMDocument;
use DOMElement;
use DOMXPath;
use Schrapert\Crawl\ResponseInterface;
use Schrapert\Http\Request as HttpRequest;
use Schrapert\Http\ResponseInterface as HttpResponse;
use Schrapert\Spider;

class BlogSpider extends Spider
{
    public function parse(ResponseInterface $response)
    {
        // Only HTTP responses carry an HTML body to parse.
        if (!$response instanceof HttpResponse) {
            return;
        }

        $doc = new DOMDocument('1.0');
        $doc->loadHTML((string) $response->getBody());
        $xpath = new DOMXPath($doc);

        // Select only anchors that actually have an href attribute.
        $nodes = $xpath->query('//a[@href]');
        foreach ($nodes as $node) {
            /** @var DOMElement $node */
            // Resolve the (possibly relative) href against the URI of the current page.
            $uri = $this->uri->join($node->getAttribute('href'), $response->getUri());
            // Each yielded request is a follow-up page to crawl.
            yield new HttpRequest($uri);
        }
    }
}
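
The link-extraction step inside parse() can be tried on its own, without the framework. The following is a minimal sketch using only PHP's bundled DOM extension; the function name extractLinks and the sample HTML are hypothetical and not part of Schrapert. It also shows one way to silence the parser warnings that loadHTML tends to raise on real-world markup, which the spider above does not do.

```php
<?php
declare(strict_types=1);

/**
 * Yield the raw href value of every <a href="..."> element in an HTML string.
 * Hypothetical helper for illustration only; not part of Schrapert.
 */
function extractLinks(string $html): Generator
{
    // An empty document has no links (and loadHTML rejects empty input).
    if ($html === '') {
        return;
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $node) {
        /** @var DOMElement $node */
        yield $node->getAttribute('href');
    }
}

// Sample input: two real links and one anchor without an href.
$html = '<html><body>'
    . '<a href="/post/1">One</a>'
    . '<a name="x">No link</a>'
    . '<a href="https://example.com/">External</a>'
    . '</body></html>';

$links = iterator_to_array(extractLinks($html), false);
print_r($links);
```

The sketch yields the href values unchanged. In the spider, resolving them against the page URI is left to $this->uri->join(), so relative links such as /post/1 become absolute before a new request is created.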