VOOZH about

URL: https://apify.com/zentrafoundry/zentra-prompt-injection-quarantine

⇱ Scan Scraped Datasets for Prompt Injection Before RAG or AI Β· Apify


πŸ‘ Prompt Injection Dataset Scanner avatar

Prompt Injection Dataset Scanner

Pricing

Pay per event

Go to Apify Store

Prompt Injection Dataset Scanner

Scan and sanitize dataset records before they enter LLM, RAG, or agent pipelines.

Pricing

Pay per event

Rating

0.0

(0)

Developer

πŸ‘ Zentra

Zentra

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

39 minutes ago

Last modified

Share

Inspect prompt injection dataset scanner workflows and return policy decisions, risk flags, cost notes, trace evidence, and recommended actions.

Who this is for

AI operations, security, governance, platform, and automation teams use this actor when they need focused prompt injection dataset scanner output instead of a broad generic scraper or manual checking.

Buyer outcomes

  • Inspect prompt injection dataset scanner behavior before it creates avoidable cost, safety, or trust issues.
  • Prioritize review with policy decisions, risk levels, budget impact, trace evidence, and recommended actions.
  • Route blocked, approved, or review-required events into audit logs and operational workflows.

Sources monitored

Inputs

  • sourceMode: use sample for a safe smoke run, or configured modes for trace/tool-call inputs.
  • startUrls: optional public actor, policy, MCP manifest, or evidence URLs when the workflow uses URL-backed review.
  • sourceIds: approved policy, dataset, manifest, or trace source identifiers.
  • maxItems: bounded number of decisions, findings, or reports to emit.
  • watchlistTerms: policy names, tools, vendors, domains, or risk terms to prioritize.
  • webhookUrl: optional destination for review-required decisions or audit reports.
  • outputMode: use sample records for Store validation or production output for normal runs.

How it transforms the input

  • Input: agent trace, tool-call request, MCP manifest, actor metadata, policy rule, or run evidence.
  • Transformation: apply policy, risk, budget, permission, side-effect, or audit checks.
  • Output: allow/block/review decisions, matched policy, risk score, budget impact, reason, and recommended next action.

Outputs

The actor returns structured AgentOps records for tool-call decisions, policy results, budget/cost signals, prompt-injection review, repair diagnosis, trace evidence, or audit reports.

Family-specific fields to expect:

  • agentGoal: What the agent or workflow was trying to accomplish.

  • toolCall: Requested tool name, arguments, and execution context.

  • policyDecision: Allow, block, review, or escalation decision.

  • riskLevel: Risk level assigned to the action or workflow.

  • budgetImpact: Estimated or observed cost impact.

  • sideEffectRisk: Potential external, write, payment, or account side-effect risk.

  • recommendedAction: Operational next step for the reviewer or automation.

  • auditEvidence: Trace, policy, manifest, or run evidence used in the decision.

  • recordId: Stable record ID for exports, dedupe, and downstream joins.

  • title: Human-readable record title for review and export.

  • sourceName: Source identifier used to trace where the record came from.

  • sourceUrl: Direct source URL for review and audit.

  • dedupeKey: Stable key used for delta mode and duplicate suppression.

  • retrievedAt: Timestamp showing when the actor retrieved or generated this record.

  • score: Normalized field for filtering, routing, or downstream review.

  • scoreReasons: Buyer-readable explanation for the score or match.

  • confidence: Normalized field for filtering, routing, or downstream review.

  • errors: Normalized field for filtering, routing, or downstream review.

  • runSummary: Run-level summary for counts, filters, charges, and next actions.

Pricing

This actor uses Apify pay-per-event pricing. Current public listing guidance: $29-$49 / 1,000 launch validation records until public data proof is complete. Charges are tied to buyer-visible value events such as record-scanned, record-quarantined, quarantine-report, dataset-processed, record-saved, enriched-record. Small validation runs are supported so you can inspect output before scaling a schedule.

  • record-scanned: Charge after producing one scanned record. Typical price: $0.002. A run that produces 10 matching records charges only for the matched buyer-value events and remains capped by the run limit.
  • record-quarantined: Charge after producing one quarantined risky record. Typical price: $0.020. A run that produces 10 matching records charges only for the matched buyer-value events and remains capped by the run limit.
  • quarantine-report: Charge after producing one quarantine report. Typical price: $0.120. A run that produces 10 matching records charges only for the matched buyer-value events and remains capped by the run limit.
  • dataset-processed: Base charge when Prompt Injection Dataset Scanner writes a non-empty default dataset. Typical price: $0.011. A run that produces 10 matching records charges only for the matched buyer-value events and remains capped by the run limit.
  • first-run-cap: Recommended first run budget cap. Typical price: $2.000. Start with the default small run, inspect the dataset, then raise maxItems or schedule recurring runs.

API example

curl-X POST "https://api.apify.com/v2/actors/zentrafoundry~zentra-prompt-injection-quarantine/runs"\
+ -H"Authorization: Bearer $APIFY_TOKEN"\
+ -H"Content-Type: application/json"\
+ -d'{"maxItems":10,"sourceIds":["OWASP-LLM01","APIFY-DATASETS","APIFY-MCP"],"includeSourceUrls":true,"includeMatchReasons":true,"outputMode":"buyer-ready-records"}'

Recommended first run

{
"maxItems":10,
"sourceIds":[
"OWASP-LLM01",
"APIFY-DATASETS",
"APIFY-MCP"
],
"includeSourceUrls":true,
"includeMatchReasons":true,
"outputMode":"buyer-ready-records"
}

Sample output

Sample status: sample_unavailable at https://zentra.nimblique.studio/external/actor-review/samples/zentra-prompt-injection-quarantine.json. No fake sample is published; run a bounded real sample refresh before using examples in promotion.

Recommended public tasks

[
{
"name":"Review 10 agent/tool decisions",
"description":"Low-cost validation run for checking policy, risk, cost, and action fields.",
"input":{
"maxItems":10,
"sourceIds":[
"OWASP-LLM01",
"APIFY-DATASETS",
"APIFY-MCP"
],
"includeSourceUrls":true,
"includeMatchReasons":true,
"outputMode":"buyer-ready-records",
"actorSlug":"zentra-prompt-injection-quarantine"
}
},
{
"name":"Daily AgentOps decision review",
"description":"Recurring review batch for tool-call risk, cost guardrails, and audit evidence.",
"schedule":"Daily during local business hours",
"input":{
"maxItems":25,
"sourceIds":[
"OWASP-LLM01",
"APIFY-DATASETS",
"APIFY-MCP"
],
"includeSourceUrls":true,
"includeMatchReasons":true,
"outputMode":"buyer-ready-records",
"actorSlug":"zentra-prompt-injection-quarantine"
}
}
]

Use cases

  • Review prompt injection dataset scanner decisions before high-risk tool calls execute.
  • Route policy violations, cost guardrails, and prompt-injection findings into audit logs or review queues.
  • Compare agent runs by risk, confidence, budget impact, and recommended next action.
  • Create customer-facing evidence for safer AI-agent operations.

Trust and compliance

  • Uses Owasp Llm01, Apify datasets/storage, Apify MCP server.
  • Keeps source URLs and source identifiers in output records for auditability.
  • Does not require private credentials unless a source is explicitly configured for approved authenticated access.
  • AgentOps outputs should be logged and reviewed before enforcing high-impact production decisions.

Limitations

  • Results depend on public-source availability, source uptime, and source update cadence.
  • Public sources can revise records after publication; rerun scheduled tasks for fresh evidence.
  • Scores and match reasons are decision-support signals, not legal, financial, procurement, medical, safety, or regulatory advice.
  • Large production runs can cost more than the default smoke run; start small, inspect output, then scale schedules.

FAQ

Can I run this without URLs? Yes. The default sample mode is designed to succeed without user-supplied URLs, and URL-backed runs can use startUrls when needed.

Can I schedule it? Yes. Use sinceLastRun, watchlistTerms, and optional webhookUrl to turn the actor into a recurring alert or report workflow.

How do I verify value before scaling? Run the recommended first-run input, review the sample output fields, then increase maxItems or schedule recurring runs after the dataset matches your use case.

You might also like

Data.gov.uk Scraper - Cheap πŸŒπŸ“ŠπŸ‡¬πŸ‡§

scrapestorm/data-gov-uk-scraper---cheap

πŸ”Ž Easily collect dataset listings from data.gov.uk Provide one or multiple search URLs and extract dataset information such as πŸ“„ Dataset Title 🏒 Published By πŸ•’ Last Updated πŸ“ Description πŸ”— Dataset URL & more Perfect for open data research, government data monitoring & dataset discovery πŸ“ŠπŸš€

1

5.0

LLM Dataset Processor

dusan.vystrcil/llm-dataset-processor

Allows you to process output of other actors or stored dataset with single LLM prompt. It's useful if you need to enrich data, summarize content, extract specific information, or manipulate data in a structured way using AI.

πŸ‘ User avatar

Duőan Vystrčil

153

Reddit RAG Dataset β€” LLM Training Data from Posts & Comments

blackfalcondata/reddit-rag-dataset

Build clean LLM and RAG datasets from Reddit. Export posts with full comment threads as ready-to-chunk text, HTML and Markdown β€” only text-bearing records with parent/child thread structure. No login or developer token needed.

πŸ‘ User avatar

Black Falcon Data

3

AI Training Data Scraper - LLM and RAG-Ready

george.the.developer/ai-training-data-scraper

Extract web content formatted for LLM fine-tuning and RAG pipelines. Output in OpenAI JSONL, Claude JSONL, Markdown, or raw text.

Dataset Download

idiatech/apify-Dataset-Download

Download any dataset from the Apify platform automatically and in any format you want. Use this actor along with a Dataset toolbox automation tool.