mnb/scraperkit

MNB ScraperKit V1.0.3 - enterprise-ready PHP crawling and data extraction framework with AI crawl intelligence, search discovery, authorized mail/webmail extraction connectors, publisher metadata workflows, extraction recipes, provenance, quality reports, datasets, queues, dashboards, and compliance

Maintainers

mnagendrababu23

Package info

github.com/mnagendrababu23/mnb-scraperkit

pkg:composer/mnb/scraperkit

Statistics

Installs: 4

Dependents: 0

Suggesters: 0

Stars: 0

Open Issues: 0

v1.0.8 2026-06-12 22:40 UTC

Requires

Requires (Dev)

None

Suggests

  • ext-curl: Recommended for the cURL HTTP engine. The stream/file_get_contents engine works without cURL when allow_url_fopen is enabled.
  • ext-openssl: Recommended for secure HTTPS transport and checksum/signature workflows.
  • ext-pdo: Needed only for database storage features.
  • ext-redis: Optional Redis extension for distributed multi-worker queue mode. File-based distributed queue fallback works without Redis.
  • chrome/chromium: Needed only when using browser-assisted crawling through Panther/Chrome.
  • pdo_mysql: Optional MySQL/MariaDB driver for server database storage.
  • pdo_sqlite: Optional SQLite driver for local database storage.
  • php-ai/php-ml: Optional machine-learning toolkit for future model training/inference. V1.0.3 includes deterministic ML-ready features, dataset annotations, evaluation reports, rule-builder assisted profile creation, training-ready exports, and export connector delivery manifests without requiring this dependency.
  • symfony/panther: Optional browser-assisted crawling adapter for JavaScript-rendered pages. Normal PHP HTTP crawling works without it.
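Because every entry above is optional, it can help to check which of them the current PHP runtime actually provides before choosing an HTTP engine, storage driver, or queue mode. The following is a minimal sketch using only built-in PHP functions. The function name `detectOptionalCapabilities()` and the array keys are hypothetical and are not part of the ScraperKit API, and the cURL-first preference order is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Report which optional capabilities from the Suggests list are usable
 * in the current PHP runtime. The function name and array keys are
 * illustrative only; they are not part of the ScraperKit API.
 */
function detectOptionalCapabilities(): array
{
    $hasCurl = extension_loaded('curl');

    // The stream/file_get_contents engine needs allow_url_fopen enabled.
    $hasStreams = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    return [
        // Assumption: prefer cURL, fall back to streams when it is missing.
        'http_engine'   => $hasCurl ? 'curl' : ($hasStreams ? 'stream' : 'none'),
        'https'         => extension_loaded('openssl'),
        'sqlite'        => extension_loaded('pdo_sqlite'),
        'mysql'         => extension_loaded('pdo_mysql'),
        // Without ext-redis, the listing says a file-based queue still works.
        'queue_backend' => extension_loaded('redis') ? 'redis' : 'file',
    ];
}

// Print one capability per line, e.g. for a quick pre-install check.
foreach (detectOptionalCapabilities() as $name => $value) {
    printf("%-14s %s\n", $name, var_export($value, true));
}
```

Running the script on a target host shows at a glance whether the cURL engine, the database drivers, and the Redis queue mode are available, or whether the fallbacks described above would apply.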

Provides

None

Conflicts

None

Replaces

None

MIT cb7e33819f745b435106ac1aa060388f55c38fda

annotations, queue, php, csv, api, cli, plugins, mysql, sqlite, cron, Sitemap, annotation, monitoring, pdo, Xpath, JSON-LD, benchmarking, rss, scraper, crawler, opengraph, imap, automation, dashboard, pipeline, chrome, reports, dataset, rbac, workspace, scheduling, doi, scheduler, workers, mariadb, enterprise, ISSN, ml, webhooks, access-control, intelligence, job-queue, Evaluation, rest-api, admin-dashboard, JSON-API, workspaces, project-template, operations, datasets, multi-user, machine-learning, plugin-system, symfony-console, redis-queue, Panther, php-ml, task-scheduler, web-scraping, feature-extraction, data-extraction, health-check, compliance, exports, audit-log, database-storage, css-selectors, rule-builder, browser-sessions, dashboard-ui, role-based-access, persistent-storage, headless-browser, jsonl, security-audit, xml-export, javascript-rendering, data-quality, gmail-api, source-connectors, html-report, zip-bundle, profile-schemas, extractor-rules, schema-driven-extraction, retry-queue, browser-assisted-crawling, fallback-crawling, spa-crawling, crawl-history, advanced-retry, retry-policy, safe-retry, local-schedules, stale-locks, config-plugins, profile-addons, extensible-scraping, plugin-manifest, webhook-notifications, local-dashboard, automation-api, health-check-api, operations-dashboard, queue-dashboard, scheduler-dashboard, html-dashboard, page-classification, selector-suggestions, quality-prediction, url-priority, ml-ready, dataset-versioning, training-data, dataset-diff, dataset-evaluation, field-quality, selector-evaluation, profile-benchmark, annotation-coverage, training-readiness, training-data-quality, training-ready-export, auto-profile, profile-assistant, rule-generation, selector-testing, profile-scaffold, extraction-rules-assistant, authorized-crawling, manual-login-assist, session-profiles, cookie-session, authenticated-crawl, domain-guarded-sessions, distributed-workers, distributed-queue, redis-workers, job-leasing, worker-heartbeats, multi-worker, distributed-crawling, export-connectors, export-delivery, delivery-manifest, checksum-manifest, webhook-export, artifact-delivery, downstream-automation, project-templates, preset-packs, workflow-templates, starter-projects, scraping-templates, template-scaffolding, profile-packs, secret-scanning, responsible-crawling, release-hygiene, enterprise-dashboard, project-workspaces, audit-events, team-workflows, academic-publishers, publisher-metadata, article-metadata, journal-metadata, scholarly-metadata, mail-extraction, webmail-export, email-to-seeds, authorized-mail

This package is auto-updated.

Last update: 2026-06-12 22:41:08 UTC