mnb/scraperkit
MNB ScraperKit V1.0.3 - enterprise-ready PHP crawling and data extraction framework with AI crawl intelligence, search discovery, authorized mail/webmail extraction connectors, publisher metadata workflows, extraction recipes, provenance, quality reports, datasets, queues, dashboards, and compliance
Requires
- php: >=8.2
- ext-dom: *
- ext-json: *
- ext-mbstring: *
- symfony/console: ^6.4 || ^7.0
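The hard requirements above can be checked against a running PHP installation before installing. The following is a minimal sketch using only PHP built-ins; the function name `missingScraperKitRequirements` is hypothetical and is not part of ScraperKit, and the check does not cover the `symfony/console` constraint, which Composer resolves itself.

```php
<?php
declare(strict_types=1);

/**
 * Returns the hard requirements from the list above that the current
 * runtime does not satisfy. Hypothetical helper, not a ScraperKit API.
 *
 * @return string[]
 */
function missingScraperKitRequirements(): array
{
    $missing = [];

    // php: >=8.2
    if (version_compare(PHP_VERSION, '8.2.0', '<')) {
        $missing[] = 'php >=8.2 (found ' . PHP_VERSION . ')';
    }

    // ext-dom, ext-json, ext-mbstring (any version)
    foreach (['dom', 'json', 'mbstring'] as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = 'ext-' . $ext;
        }
    }

    return $missing;
}

$missing = missingScraperKitRequirements();
echo $missing === []
    ? "All hard requirements are met.\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```

Running the script with the same PHP binary that will run the crawler matters, since CLI and web server configurations can load different extensions.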
Requires (Dev)
None
Suggests
- ext-curl: Recommended for the cURL HTTP engine. The stream/file_get_contents engine works without cURL when allow_url_fopen is enabled.
- ext-openssl: Recommended for secure HTTPS transport and checksum/signature workflows.
- ext-pdo: Needed only for database storage features.
- ext-redis: Optional Redis extension for distributed multi-worker queue mode. File-based distributed queue fallback works without Redis.
- chrome/chromium: Needed only when using browser-assisted crawling through Panther/Chrome.
- pdo_mysql: Optional MySQL/MariaDB driver for server database storage.
- pdo_sqlite: Optional SQLite driver for local database storage.
- php-ai/php-ml: Optional machine-learning toolkit for future model training/inference. V1.0.3 includes deterministic ML-ready features, dataset annotations, evaluation reports, rule-builder assisted profile creation, training-ready exports, and export connector delivery manifests without requiring this dependency.
- symfony/panther: Optional browser-assisted crawling adapter for JavaScript-rendered pages. Normal PHP HTTP crawling works without it.
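Because every entry in the Suggests list is optional, it can help to see which optional features a given machine could support. The sketch below relies only on PHP built-ins plus a class-existence check for Panther; the function name and the array keys are assumptions made for illustration and do not come from ScraperKit itself.

```php
<?php
declare(strict_types=1);

/**
 * Reports which optional capabilities from the Suggests list appear to be
 * available. Hypothetical helper; the key names are illustrative only.
 *
 * @return array<string, bool>
 */
function detectOptionalCapabilities(): array
{
    // PDO drivers can only be listed when the PDO extension is loaded.
    $pdoDrivers = extension_loaded('pdo') ? PDO::getAvailableDrivers() : [];

    return [
        // cURL HTTP engine
        'curl_engine'    => extension_loaded('curl'),
        // stream/file_get_contents engine needs allow_url_fopen
        'stream_engine'  => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        'openssl'        => extension_loaded('openssl'),
        // Redis-backed distributed queue; the file-based fallback needs nothing
        'redis_queue'    => extension_loaded('redis'),
        'mysql_storage'  => in_array('mysql', $pdoDrivers, true),
        'sqlite_storage' => in_array('sqlite', $pdoDrivers, true),
        // Browser-assisted crawling; also requires a Chrome/Chromium binary,
        // which this check does not verify.
        'panther'        => class_exists('Symfony\\Component\\Panther\\Client'),
    ];
}

foreach (detectOptionalCapabilities() as $name => $available) {
    printf("%-15s %s\n", $name, $available ? 'available' : 'not available');
}
```

A `false` entry only means the corresponding optional feature is unavailable; according to the descriptions above, normal PHP HTTP crawling works without any of them as long as at least one HTTP engine is usable.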
Provides
None
Conflicts
None
Replaces
None
MIT cb7e33819f745b435106ac1aa060388f55c38fda
Keywords
annotations, queue, php, csv, api, cli, plugins, mysql, sqlite, cron, Sitemap, annotation, monitoring, pdo, Xpath, JSON-LD, benchmarking, rss, scraper, crawler, opengraph, imap, automation, dashboard, pipeline, chrome, reports, dataset, rbac, workspace, scheduling, doi, scheduler, workers, mariadb, enterprise, ISSN, ml, webhooks, access-control, intelligence, job-queue, Evaluation, rest-api, admin-dashboard, JSON-API, workspaces, project-template, operations, datasets, multi-user, machine-learning, plugin-system, symfony-console, redis-queue, Panther, php-ml, task-scheduler, web-scraping, feature-extraction, data-extraction, health-check, compliance, exports, audit-log, database-storage, css-selectors, rule-builder, browser-sessions, dashboard-ui, role-based-access, persistent-storage, headless-browser, jsonl, security-audit, xml-export, javascript-rendering, data-quality, gmail-api, source-connectors, html-report, zip-bundle, profile-schemas, extractor-rules, schema-driven-extraction, retry-queue, browser-assisted-crawling, fallback-crawling, spa-crawling, crawl-history, advanced-retry, retry-policy, safe-retry, local-schedules, stale-locks, config-plugins, profile-addons, extensible-scraping, plugin-manifest, webhook-notifications, local-dashboard, automation-api, health-check-api, operations-dashboard, queue-dashboard, scheduler-dashboard, html-dashboard, page-classification, selector-suggestions, quality-prediction, url-priority, ml-ready, dataset-versioning, training-data, dataset-diff, dataset-evaluation, field-quality, selector-evaluation, profile-benchmark, annotation-coverage, training-readiness, training-data-quality, training-ready-export, auto-profile, profile-assistant, rule-generation, selector-testing, profile-scaffold, extraction-rules-assistant, authorized-crawling, manual-login-assist, session-profiles, cookie-session, authenticated-crawl, domain-guarded-sessions, distributed-workers, distributed-queue, redis-workers, job-leasing, worker-heartbeats, multi-worker, distributed-crawling, export-connectors, export-delivery, delivery-manifest, checksum-manifest, webhook-export, artifact-delivery, downstream-automation, project-templates, preset-packs, workflow-templates, starter-projects, scraping-templates, template-scaffolding, profile-packs, secret-scanning, responsible-crawling, release-hygiene, enterprise-dashboard, project-workspaces, audit-events, team-workflows, academic-publishers, publisher-metadata, article-metadata, journal-metadata, scholarly-metadata, mail-extraction, webmail-export, email-to-seeds, authorized-mail
