URL: https://apify.com/maximedupre/unicode-text-inspector

Unicode Text Inspector for Hidden Characters · Apify


Pricing

from $0.40 / 1,000 text inspections

Unicode Text Inspector

Inspect pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. Get risk levels, issue evidence, category counts, cleaned text, and batch summaries.

Rating

0.0

(0)

Developer

Maxime Dupré

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

🔎 Unicode text inspector for hidden characters

Unicode Text Inspector checks pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs, then reports Unicode category counts, a risk level, and cleaned text. Paste one string or a batch of strings, and you get one output item per submitted text.

Use it when you need to audit product titles, domains, email subjects, CRM fields, usernames, form submissions, code snippets, search keywords, or imported text before it enters another system. The Actor analyzes text locally. It does not fetch URLs, use cookies, require accounts, call an external Unicode API, or send your submitted text to a third-party service.

For a quick first run, keep the prefilled examples. They include a zero-width character, a Cyrillic homoglyph in a domain-like string, and a clean text sample so you can see suspicious and clean output in the same dataset.

✅ What this Actor checks

  • Zero-width and invisible format characters such as U+200B, U+200C, U+200D, and U+FEFF.
  • Bidirectional controls used in Trojan Source-style display-order attacks, including overrides, embeddings, isolates, and marks.
  • ASCII and C1 control characters such as null bytes, escape characters, tabs, line feeds, and delete.
  • Practical homoglyphs and confusables across common Cyrillic, Greek, fullwidth Latin, mathematical digit, and typography cases.
  • Unicode category composition, including letters, numbers, punctuation, symbols, marks, separators, controls, format characters, private use, and unassigned codepoints.
  • Deterministic risk levels from none to critical.
  • Mechanical cleaned text that removes flagged invisible, control, and bidi characters without rewriting user language.

The Actor keeps all checks enabled by default. There are no strictness sliders or per-check toggles because these checks are local, useful, and do not change the price per inspected text.
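As a rough illustration of the invisible, bidi, and control checks listed above, the sketch below classifies each code point with Python's standard `unicodedata` module. The code point sets, the function names, and the choice to treat every `Cc` character (including tab and line feed) as removable are assumptions made for this example, not the Actor's actual implementation.

```python
import unicodedata
from typing import Dict, List, Optional

# Illustrative code point sets (assumed for this sketch, not the Actor's real lists).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_CONTROLS = (
    {0x200E, 0x200F, 0x061C}        # directional marks
    | set(range(0x202A, 0x202F))    # embeddings and overrides: U+202A..U+202E
    | set(range(0x2066, 0x206A))    # isolates: U+2066..U+2069
)


def classify(ch: str) -> Optional[str]:
    """Return an issue type for one character, or None when nothing is flagged."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    if cp in BIDI_CONTROLS:
        return "bidi_control"
    if cp in ZERO_WIDTH or category == "Cf":
        return "invisible_format"
    if category == "Cc":  # assumption: every control character is flagged
        return "control_character"
    return None


def find_hidden_characters(text: str) -> List[Dict[str, object]]:
    """List flagged characters with their position, code point, and Unicode name."""
    issues = []
    for position, ch in enumerate(text):
        issue_type = classify(ch)
        if issue_type is not None:
            issues.append({
                "type": issue_type,
                "position": position,
                "codePoint": f"U+{ord(ch):04X}",
                "unicodeName": unicodedata.name(ch, "<unnamed>"),
            })
    return issues


def clean(text: str) -> str:
    """Drop every flagged character and leave all other characters untouched."""
    return "".join(ch for ch in text if classify(ch) is None)


print(find_hidden_characters("Hello\u200b World"))
print(clean("Hello\u200b World"))  # Hello World
```

Because this sketch removes all `Cc` characters, it would also strip tabs and line breaks from multi-line text. A production cleaner would likely need an allow-list for those.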

📊 What data you get

Each output item represents one submitted text string. Rows can include:

| Field | Description |
| --- | --- |
| inputIndex | Position of the text in your submitted list. |
| originalText | Exact text submitted for inspection. |
| textPreview | Short visible preview after removable hidden/control characters are stripped. |
| cleanedText | Full mechanically cleaned text when suspicious invisible, control, or bidi characters can be removed safely. |
| characterCount, codePointCount, codeUnitLength | Text length counts for Unicode-aware audits. |
| issueCount, suspiciousContent, riskLevel | Triage fields for filtering clean, low-risk, and high-risk text. |
| issues | Exact issue evidence with type, severity, position, codepoint, decimal value, Unicode name, category, context, description, and recommendation. |
| issueTypeCounts | Per-text counts for invisible/format, bidi, control, and homoglyph issues. |
| unicodeCategoryCounts | Unicode category counts for the inspected text. |
| batchSummary | Run-level totals repeated with each row for large batch triage. |
| analyzedAt | UTC timestamp when the text was inspected. |

The output is designed for JSON, CSV, Excel, API, webhook, scheduled audit, spreadsheet, search-index QA, moderation, and security-review workflows.
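The length fields can differ from each other once text contains characters outside the Basic Multilingual Plane. The sketch below shows one plausible reading, assuming `codePointCount` counts Unicode code points and `codeUnitLength` counts UTF-16 code units. That interpretation and the helper name `length_counts` are assumptions, and `characterCount` is left out because its exact definition is not stated here.

```python
from typing import Dict


def length_counts(text: str) -> Dict[str, int]:
    """Count code points and UTF-16 code units (assumed meaning of the fields)."""
    return {
        # A Python str is a sequence of code points, so len() counts them directly.
        "codePointCount": len(text),
        # Each UTF-16 code unit is two bytes; astral characters take two units.
        "codeUnitLength": len(text.encode("utf-16-le")) // 2,
    }


print(length_counts("Hello\u200b World"))  # {'codePointCount': 12, 'codeUnitLength': 12}
print(length_counts("\U0001F600"))         # {'codePointCount': 1, 'codeUnitLength': 2}
```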

🚀 How to run it

  1. Open the Actor input.
  2. Paste text strings into the "Texts to inspect" field. Use one string per line.
  3. Start the Actor.
  4. Open the dataset and filter by riskLevel, suspiciousContent, issueCount, or issueTypeCounts.

You can submit plain text strings from the Apify Console, API, or integrations. The Actor preserves input order with inputIndex, so you can map each output item back to your submitted batch.
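Once the dataset is downloaded, filtering by `riskLevel` could look like the sketch below. The ordering list is an assumption: the levels run from none to critical, but the intermediate name `high` is a guess, and `SAMPLE_ROWS` is hypothetical data shaped like the output items.

```python
from typing import Dict, List

# Assumed ordering; "high" is a guess for the level between medium and critical.
RISK_ORDER = ["none", "low", "medium", "high", "critical"]

# Hypothetical rows with only the fields this sketch needs.
SAMPLE_ROWS = [
    {"inputIndex": 1, "riskLevel": "low", "issueCount": 1},
    {"inputIndex": 2, "riskLevel": "medium", "issueCount": 1},
    {"inputIndex": 3, "riskLevel": "none", "issueCount": 0},
]


def at_least(rows: List[Dict], minimum: str) -> List[Dict]:
    """Keep rows at or above a risk level, riskiest first, then by input order."""
    threshold = RISK_ORDER.index(minimum)
    flagged = [r for r in rows if RISK_ORDER.index(r["riskLevel"]) >= threshold]
    return sorted(
        flagged,
        key=lambda r: (-RISK_ORDER.index(r["riskLevel"]), r["inputIndex"]),
    )


for row in at_least(SAMPLE_ROWS, "low"):
    print(row["inputIndex"], row["riskLevel"])
```

Sorting on `inputIndex` as the tie-breaker keeps rows in the order of the submitted batch, so each one can still be mapped back to its source text.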

🧾 Input example

{
"texts":[
"Hello​ World",
"pΠ°ypal.com",
"Normal clean text"
]
}

📤 Output example

{
  "inputIndex": 1,
  "originalText": "Hello\u200B World",
  "textPreview": "Hello World",
  "cleanedText": "Hello World",
  "characterCount": 12,
  "codePointCount": 12,
  "codeUnitLength": 12,
  "issueCount": 1,
  "suspiciousContent": true,
  "riskLevel": "low",
  "issues": [
    {
      "type": "invisible_format",
      "severity": "low",
      "position": 5,
      "codeUnitIndex": 5,
      "character": "\u200B",
      "codePoint": "U+200B",
      "decimalCodePoint": 8203,
      "unicodeName": "ZERO WIDTH SPACE",
      "unicodeCategory": "Cf",
      "unicodeCategoryName": "Format character",
      "description": "Invisible or format character can affect matching, searching, copy-paste, or display.",
      "recommendation": "Remove when this text should be plain visible text.",
      "context": {
        "before": "Hello",
        "after": " World"
      }
    }
  ],
  "issueTypeCounts": {
    "invisible_format": 1,
    "bidi_control": 0,
    "control_character": 0,
    "homoglyph_confusable": 0
  },
  "unicodeCategoryCounts": {
    "Lu": 2,
    "Ll": 8,
    "Cf": 1,
    "Zs": 1
  },
  "batchSummary": {
    "totalTexts": 3,
    "suspiciousTexts": 2,
    "cleanTexts": 1,
    "totalIssues": 2,
    "highestRiskLevel": "medium",
    "issueTypeCounts": {
      "invisible_format": 1,
      "bidi_control": 0,
      "control_character": 0,
      "homoglyph_confusable": 1
    }
  },
  "analyzedAt": "2026-06-15T00:00:00.000Z"
}
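The `unicodeCategoryCounts` values in the example above can be reproduced with the standard library. This is only a sketch of how such a tally might be computed, and the function name is made up for illustration.

```python
import unicodedata
from collections import Counter
from typing import Dict


def unicode_category_counts(text: str) -> Dict[str, int]:
    """Tally the two-letter Unicode general category of every code point."""
    return dict(Counter(unicodedata.category(ch) for ch in text))


# "Hello", a zero-width space (U+200B), a regular space, then "World".
print(unicode_category_counts("Hello\u200b World"))
# {'Lu': 2, 'Ll': 8, 'Cf': 1, 'Zs': 1}
```

The single `Cf` entry is the zero-width space, which is why the example row reports one `invisible_format` issue.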

🎯 Common use cases

  • Find hidden copy-paste characters in product titles, slugs, names, and search keywords.
  • Catch bidi controls before text enters source code, review queues, support tools, or documentation.
  • Detect homoglyphs in domain-like strings, usernames, brand terms, and moderation inputs.
  • Clean text before importing it into a CRM, database, spreadsheet, search index, or analytics pipeline.
  • Build a scheduled Unicode quality gate for user-generated text, scraped text, or submitted forms.
  • Export issue evidence for security review, data QA, or moderation workflows.

💳 Pricing

This Actor uses pay-per-event pricing. You are charged once per submitted text string that is inspected and saved as an output item.

The current event prices are:

  • FREE: $0.60 per 1,000 inspected texts
  • BRONZE: $0.55 per 1,000 inspected texts
  • SILVER: $0.45 per 1,000 inspected texts
  • GOLD: $0.40 per 1,000 inspected texts
  • PLATINUM: $0.30 per 1,000 inspected texts
  • DIAMOND: $0.20 per 1,000 inspected texts

Runs that stop before saving any inspected text items do not create text-inspection charges.
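To estimate the cost of a batch from the tier prices listed above, the arithmetic is the number of inspected texts times the per-1,000 price, divided by 1,000. The sketch below uses `Decimal` to avoid floating-point rounding. The function name is hypothetical, and the prices are copied from the list above and may change.

```python
from decimal import Decimal

# USD per 1,000 inspected texts, copied from the tier list above.
PRICE_PER_1000 = {
    "FREE": Decimal("0.60"),
    "BRONZE": Decimal("0.55"),
    "SILVER": Decimal("0.45"),
    "GOLD": Decimal("0.40"),
    "PLATINUM": Decimal("0.30"),
    "DIAMOND": Decimal("0.20"),
}


def inspection_cost(text_count: int, tier: str) -> Decimal:
    """Estimated charge in USD for a number of inspected texts on one tier."""
    return PRICE_PER_1000[tier] * text_count / 1000


print(inspection_cost(25_000, "FREE"))     # cost of 25,000 texts on the FREE tier
print(inspection_cost(25_000, "DIAMOND"))  # same batch on the DIAMOND tier
```

For 25,000 texts this comes to $15 on FREE and $5 on DIAMOND.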

⚠️ Limits and notes

Unicode Text Inspector is deterministic. It does not use AI, infer malicious intent, score phishing risk, decide whether a brand is impersonated, rewrite language, or claim complete Unicode TR39 coverage across every script.

Homoglyph detection focuses on practical Latin-lookalike cases that are useful for text QA and security review. Cleaned text removes hidden, control, and bidi characters when that cleanup is mechanical. It does not replace homoglyphs with guessed intended characters.
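A lookup-table approach is one simple way to flag Latin lookalikes without rewriting them. The table below is a tiny hypothetical subset chosen for illustration, and the `looksLike` field is invented for this sketch. The Actor's real confusable list and output fields may differ.

```python
import unicodedata
from typing import Dict, List

# Tiny illustrative subset of Latin lookalikes (hypothetical, not the Actor's table).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\uff41": "a",  # FULLWIDTH LATIN SMALL LETTER A
}


def find_homoglyphs(text: str) -> List[Dict[str, object]]:
    """Report lookalike characters without replacing them."""
    findings = []
    for position, ch in enumerate(text):
        if ch in CONFUSABLES:
            findings.append({
                "type": "homoglyph_confusable",
                "position": position,
                "codePoint": f"U+{ord(ch):04X}",
                "unicodeName": unicodedata.name(ch),
                "looksLike": CONFUSABLES[ch],  # invented field for this sketch
            })
    return findings


# The second letter is Cyrillic U+0430, not Latin "a".
for finding in find_homoglyphs("p\u0430ypal.com"):
    print(finding)
```

A table like this would also flag legitimate Cyrillic or Greek text, so a practical check probably needs extra context, such as whether the surrounding characters are Latin.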

❓ FAQ

🌐 Does this Actor scrape websites?

No. It only inspects text strings that you provide. It does not fetch URLs, crawl pages, use a proxy, or call external APIs.

🔌 Can I use it from the Apify API?

Yes. Submit texts as an array of strings and read one output item per inspected text from the dataset.
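A minimal sketch of such a call with the `apify-client` Python package follows. The Actor ID is taken from the Store URL, the `APIFY_TOKEN` environment variable and the helper names are assumptions, and the sketch assumes a client version where the run object returned by `call` is a dict. The network call sits inside a function so nothing runs until you invoke it with a valid token.

```python
import os
from typing import Dict, Iterable, List, Optional

# Actor ID as it appears in the Store URL.
ACTOR_ID = "maximedupre/unicode-text-inspector"


def build_run_input(texts: Iterable[str]) -> Dict[str, List[str]]:
    """Shape the Actor input: a "texts" array of plain strings."""
    return {"texts": [str(t) for t in texts]}


def inspect_texts(texts: Iterable[str], token: Optional[str] = None) -> List[dict]:
    """Run the Actor and return one dataset item per submitted text."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(texts))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    for item in inspect_texts(["Hello\u200b World", "Normal clean text"]):
        print(item["inputIndex"], item["riskLevel"], item["issueCount"])
```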

🧹 Does cleaned text change what I wrote?

Cleaned text removes flagged invisible, control, and bidi characters when that can be done mechanically. It does not rewrite words, translate text, or replace homoglyphs with guessed characters.

✅ Why are there no detection toggles?

All detection checks are local and useful. Keeping them on gives a more complete audit without changing the price per inspected text.

📝 Changelog

  • 0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Made with ❤️ by Maxime Dupré
