URL: https://apify.com/shoebill-dev27/jp-text-normalizer

Japanese Text Normalizer: NFKC, kana, whitespace, sentences


πŸ‘ Japanese Text Normalizer β€” NFKC, kana, whitespace, sentences avatar

Japanese Text Normalizer β€” NFKC, kana, whitespace, sentences

Pricing

Pay per usage


Normalize Japanese text for data pipelines: Unicode NFKC (full/half-width unification), wave-dash unification, whitespace cleanup, hiragana/katakana conversion, Japanese-aware sentence splitting, and per-script character stats.

Developer

πŸ‘ Shinobu Otani

Shinobu Otani

Maintained by Community

Japanese Text Normalizer

Clean and normalize Japanese text for search indexes, datasets, and LLM pipelines: deterministic, instant, no LLM cost.

What it does

  • Unicode NFKC: full-width alphanumerics → ASCII (Ｃｌａｕｄｅ → Claude), half-width katakana → full-width (ｶﾞｲﾄﾞ → ガイド)
  • Wave-dash unification: ～ (U+FF5E) → 〜 (U+301C), without touching real ASCII tildes in paths/URLs
  • Whitespace cleanup: collapses space runs (including ideographic spaces), trims line ends, collapses 3+ blank lines, normalizes CRLF
  • Kana conversion: hiragana ↔ katakana (optional)
  • Sentence segmentation: Japanese-aware (。！？ with closing-quote handling) plus Latin punctuation
  • Character statistics: per-script counts (hiragana / katakana / kanji / ASCII / digits) before and after
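
The normalization steps listed above can be approximated locally with the Python standard library. The sketch below is not the Actor's source code: the function name normalize, the kana values "hiragana" and "katakana", and the choice to reduce long runs of blank lines to two are all assumptions made for illustration. It swaps the full-width tilde for a wave dash before applying NFKC, because NFKC would otherwise fold U+FF5E into an ASCII tilde.

```python
import re
import unicodedata

FULLWIDTH_TILDE = "\uff5e"  # ～
WAVE_DASH = "\u301c"        # 〜

# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 sit 0x60 apart.
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}
KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def normalize(text: str, kana: str = "none") -> str:
    """Illustrative normalizer; option names other than "none" are hypothetical."""
    # Wave-dash unification first, so NFKC never sees U+FF5E.
    # ASCII tildes (paths, URLs) are not matched by this replacement.
    text = text.replace(FULLWIDTH_TILDE, WAVE_DASH)

    # NFKC: full-width alphanumerics to ASCII, half-width katakana to
    # full-width, ideographic space (U+3000) to a plain space.
    text = unicodedata.normalize("NFKC", text)

    # Whitespace cleanup.
    text = text.replace("\r\n", "\n").replace("\r", "\n")       # normalize CRLF
    text = re.sub(r"[ \t\u3000]+", " ", text)                   # collapse space runs
    text = "\n".join(line.rstrip() for line in text.split("\n"))  # trim line ends
    text = re.sub(r"\n{4,}", "\n\n\n", text)  # assumption: 3+ blank lines become 2

    # Optional kana conversion.
    if kana == "hiragana":
        text = text.translate(KATA_TO_HIRA)
    elif kana == "katakana":
        text = text.translate(HIRA_TO_KATA)
    elif kana != "none":
        raise ValueError(f"unknown kana option: {kana!r}")
    return text
```

Calling normalize("Ｃｌａｕｄｅ　Ｃｏｄｅで開発する。") yields "Claude Codeで開発する。", and normalize("ｶﾞｲﾄﾞ") yields "ガイド", matching the examples in the feature list.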

Input

{
"texts":["Ｃｌａｕｄｅ　Ｃｏｄｅで開発する。「すごい」と思った。"],
"kana":"none",
"split_sentences":true
}
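
One way to submit this input from Python is through the apify-client package. The sketch below assumes that package is installed and that you have an Apify API token; the helper names build_input and run_normalizer are hypothetical, and the Actor ID is taken from the store URL. Only the three input fields shown above are used.

```python
ACTOR_ID = "shoebill-dev27/jp-text-normalizer"  # from the store URL


def build_input(texts, kana="none", split_sentences=True):
    """Build a run input with the three fields shown in the example above."""
    if not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("texts must be a non-empty list of strings")
    return {"texts": list(texts), "kana": kana, "split_sentences": split_sentences}


def run_normalizer(token, texts, **options):
    """Run the Actor and return its dataset items (one per input text)."""
    # Imported lazily so build_input stays usable without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_input(texts, **options))
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, run_normalizer("<YOUR_APIFY_TOKEN>", ["Ｃｌａｕｄｅ　Ｃｏｄｅで開発する。"]) would start a run and collect the resulting items; the token string here is a placeholder.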

Output (one dataset item per text)

{
"text":"Claude Codeで開発する。「すごい」と思った。",
"changed":true,
"sentences":["Claude Codeで開発する。","「すごい」と思った。"],
"sentence_count":2,
"stats_before":{"hiragana":8,"katakana":0,"kanji":4,"...":"..."},
"stats_after":{"...":"..."}
}
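
To make the sentences and stats fields more concrete, here is a rough local approximation of sentence splitting and per-script counting. It is a sketch under stated assumptions, not the Actor's algorithm: the closing-quote rule (a closing quote directly after a terminator stays with that sentence), the Unicode ranges, and the function names are all guesses, and the counts it produces need not match the Actor's. Only the three stat keys visible in the output above are counted.

```python
import re

# Assumed code point ranges; the Actor's actual boundaries are not documented here.
_RANGES = {
    "hiragana": (0x3041, 0x3096),
    "katakana": (0x30A1, 0x30FA),
    "kanji": (0x4E00, 0x9FFF),
}

_SENTENCE = re.compile(
    r".+?(?:[。！？]+[」』）]*"      # Japanese terminators, then closing quotes
    r"|[.!?]+[\"')\]]*(?=\s|$)"      # Latin terminators followed by space or end
    r"|$)",                          # trailing text with no terminator
    re.DOTALL,
)


def split_sentences(text: str) -> list[str]:
    """Split on Japanese and Latin sentence terminators (illustrative rules)."""
    parts = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in parts if p]


def script_stats(text: str) -> dict[str, int]:
    """Count characters per script using the assumed ranges above."""
    counts = {name: 0 for name in _RANGES}
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in _RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break
    return counts
```

With these rules, split_sentences applied to the normalized example text returns the same two sentences shown in the output above, and a Latin period inside a number such as 3.14 does not end a sentence.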

Typical uses

  • Preprocessing scraped Japanese text before indexing or embedding
  • Unifying mixed full-width/half-width product data
  • Sentence-level dataset construction from raw Japanese prose
