tecnickcom/tc-lib-unicode

PHP library containing Unicode methods

Maintainers

nicolaasuni

Package info

github.com/tecnickcom/tc-lib-unicode

Homepage

pkg:composer/tecnickcom/tc-lib-unicode

Fund package maintenance!

tecnickcom

Statistics

Installs: 723 013

Dependents: 2

Suggesters: 1

Stars: 10

Open Issues: 0

2.9.0 2026-06-18 13:04 UTC

Requires

Requires (Dev)

Suggests

None

Provides

None

Conflicts

None

Replaces

None

LGPL-3.0-or-later bc9d610c50ad8017fd3b50a3ac303d32a596b25e

  • Nicola Asuni <info.woop@tecnick.com>

pdf, utf-8, font, unicode, tc-lib-unicode

This package is auto-updated.

Last update: 2026-06-18 13:25:45 UTC


README

UTF-8 and Unicode processing utilities, including bidirectional text handling.

If this project is useful to you, please consider supporting development via GitHub Sponsors.

Overview

tc-lib-unicode provides Unicode conversion helpers and bidirectional algorithm support for robust multilingual text processing.

It is built to handle multilingual text paths where normalization, code-point handling, and bidirectional ordering directly affect rendering quality. By isolating Unicode-heavy operations, dependent libraries can keep text processing accurate and easier to audit.

Namespace: \Com\Tecnick\Unicode
Author: Nicola Asuni <info@tecnick.com>
License: GNU LGPL v3 (see LICENSE)
API docs: https://tcpdf.org/docs/srcdoc/tc-lib-unicode
Packagist: https://packagist.org/packages/tecnickcom/tc-lib-unicode

Features

Unicode Utilities

  • UTF-8 character and ordinal conversion helpers
  • String/character array transformations
  • Integration-ready conversion methods for document engines
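
To make the idea of character and ordinal conversion concrete, here is a minimal sketch that uses only the mbstring extension. The function names `utf8ToOrdinals()` and `ordinalsToUtf8()` are hypothetical and are not part of this library's API; see the API docs for the library's own conversion methods.

```php
<?php

/**
 * Split a UTF-8 string into an array of Unicode code points (ordinals).
 * Hypothetical helper for illustration; assumes valid UTF-8 input.
 */
function utf8ToOrdinals(string $text): array
{
    return array_map(
        static fn (string $char): int => mb_ord($char, 'UTF-8'),
        mb_str_split($text, 1, 'UTF-8')
    );
}

/**
 * Join an array of Unicode code points back into a UTF-8 string.
 * Hypothetical helper for illustration; assumes valid code points.
 */
function ordinalsToUtf8(array $ords): string
{
    return implode('', array_map(
        static fn (int $ord): string => mb_chr($ord, 'UTF-8'),
        $ords
    ));
}

// Latin A, Thai KO KAI, and one precomposed Hangul syllable.
$ords = utf8ToOrdinals("A\u{0E01}\u{AC01}");
// $ords === [0x41, 0x0E01, 0xAC01]
```

An array of ordinals in this shape is what Substitution::replaceChars() accepts as input.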

Bidirectional Support

  • Unicode Bidirectional Algorithm implementation
  • Right-to-left and mixed-direction text processing
  • Supporting shaping/step logic for complex scripts
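
One small piece of bidirectional processing is choosing the base direction of a paragraph from its first strong character (rules P2 and P3 of the Unicode Bidirectional Algorithm). The sketch below is a rough approximation, not this library's implementation: `guessBaseDirection()` is a hypothetical name, and it treats Hebrew and Arabic script letters as strong right-to-left and every other letter as strong left-to-right, instead of consulting the real Bidi_Class property.

```php
<?php

/**
 * Guess the base direction ('L' or 'R') from the first strong character.
 * Hypothetical helper; approximates Bidi_Class with script properties.
 */
function guessBaseDirection(string $text): string
{
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // Skip digits, punctuation, spaces, and marks: none of them are letters.
        if (preg_match('/^\p{L}$/u', $char) !== 1) {
            continue;
        }
        // The first letter decides: Hebrew or Arabic means right-to-left.
        if (preg_match('/^[\p{Hebrew}\p{Arabic}]$/u', $char) === 1) {
            return 'R';
        }
        return 'L';
    }

    // No letter found: fall back to left-to-right.
    return 'L';
}
```

A full implementation also has to resolve embedding levels, weak and neutral types, and run reordering, which is what the Bidi class is for.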

Character Substitution

  • Context-sensitive codepoint-level substitution via Substitution::replaceChars()
  • Thai: repositions leading vowels (Sara E/AE/O/AI, U+0E40–U+0E44, U+0E4D) to follow their base consonant, matching PDF visual-order glyph streams
  • Devanagari: moves left-positional matras (U+093F) to precede their base consonant cluster, including conjuncts joined by Virama (U+094D)
  • Hangul: composes Hangul Jamo sequences (U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF) into precomposed syllables (U+AC00–U+D7A3) per Unicode Standard §3.12
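
The Devanagari rule can be sketched in a few lines to show what "base consonant cluster" means in practice. This is an illustrative approximation under stated assumptions, not the library's code: `moveLeftMatras()` and `isDevanagariConsonant()` are hypothetical names, only U+093F is treated as a left matra, and Nukta (U+093C) and other combining marks are not handled.

```php
<?php

/** Hypothetical helper: true for Devanagari consonants (U+0915–U+0939, U+0958–U+095F). */
function isDevanagariConsonant(int $cp): bool
{
    return ($cp >= 0x0915 && $cp <= 0x0939) || ($cp >= 0x0958 && $cp <= 0x095F);
}

/**
 * Move each VOWEL SIGN I (U+093F) in front of the consonant cluster it follows.
 * A cluster is a consonant optionally preceded by (consonant + Virama) pairs.
 */
function moveLeftMatras(array $ords): array
{
    $out = [];
    foreach ($ords as $cp) {
        if ($cp !== 0x093F) {
            $out[] = $cp;
            continue;
        }
        $start = count($out) - 1;
        if ($start < 0 || !isDevanagariConsonant($out[$start])) {
            $out[] = $cp; // no base consonant: leave the matra where it is
            continue;
        }
        // Walk back over "consonant + Virama" pairs that form a conjunct.
        while (
            $start >= 2
            && $out[$start - 1] === 0x094D
            && isDevanagariConsonant($out[$start - 2])
        ) {
            $start -= 2;
        }
        array_splice($out, $start, 0, [$cp]);
    }

    return $out;
}

// KA + I: the matra moves in front of the single consonant.
$simple = moveLeftMatras([0x0915, 0x093F]);
// SA + Virama + THA + I: the matra moves in front of the whole conjunct.
$conjunct = moveLeftMatras([0x0938, 0x094D, 0x0925, 0x093F]);
```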

Requirements

  • PHP 8.2 or later
  • Extensions: mbstring, pcre
  • Composer

Installation

composer require tecnickcom/tc-lib-unicode

Quick Start

<?php

require_once __DIR__ . '/vendor/autoload.php';

$bidi = new \Com\Tecnick\Unicode\Bidi('hello ', null, null, 'R', false);
echo $bidi->getString();

Character substitution

Substitution::replaceChars() takes an array of Unicode codepoints and returns a transformed array with script-specific substitutions applied. It is a pure codepoint-level transform with no font or PDF dependency.

<?php

require_once __DIR__ . '/vendor/autoload.php';

$sub = new \Com\Tecnick\Unicode\Substitution();

// Thai: leading vowel repositioned after its base consonant
// Logical order: [U+0E40 SARA E, U+0E01 KO KAI]
// Visual order: [U+0E01 KO KAI, U+0E40 SARA E]
$result = $sub->replaceChars([0x0E40, 0x0E01]);
// $result === [0x0E01, 0x0E40]

// Devanagari: left matra repositioned before its base consonant cluster
// Logical order: [U+0915 KA, U+093F VOWEL SIGN I]
// Visual order: [U+093F VOWEL SIGN I, U+0915 KA]
$result = $sub->replaceChars([0x0915, 0x093F]);
// $result === [0x093F, 0x0915]

// Hangul: Jamo composed into a precomposed syllable
// [U+1100 KIYEOK, U+1161 JUNGSEONG A, U+11A8 JONGSEONG KIYEOK] → [U+AC01 각]
$result = $sub->replaceChars([0x1100, 0x1161, 0x11A8]);
// $result === [0xAC01]

Supported scripts and Unicode ranges

| Script | Unicode range(s) | Transformation |
| --- | --- | --- |
| Thai | U+0E00–U+0E7F | Leading vowels repositioned after base consonant |
| Devanagari | U+0900–U+097F | Left matras repositioned before consonant cluster |
| Hangul Jamo | U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF | Jamo composed to precomposed syllables (U+AC00–U+D7A3) |

Codepoints belonging to unsupported scripts are passed through unchanged.
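
For reference, the arithmetic behind Hangul syllable composition in the Unicode Standard can be sketched as follows. This is a standalone illustration with a hypothetical function name, `composeHangulSyllable()`, not the library's implementation. It covers only the modern conjoining Jamo (leading U+1100–U+1112, vowel U+1161–U+1175, trailing U+11A8–U+11C2) and returns null for anything else.

```php
<?php

/**
 * Compose a leading consonant, a vowel, and an optional trailing consonant
 * into one precomposed Hangul syllable, or return null if out of range.
 */
function composeHangulSyllable(int $lead, int $vowel, ?int $tail = null): ?int
{
    $lIndex = $lead - 0x1100;                       // 19 leading consonants
    $vIndex = $vowel - 0x1161;                      // 21 vowels
    $tIndex = $tail === null ? 0 : $tail - 0x11A7;  // 27 trailing consonants, 0 = none

    if ($lIndex < 0 || $lIndex >= 19 || $vIndex < 0 || $vIndex >= 21) {
        return null;
    }
    if ($tail !== null && ($tIndex < 1 || $tIndex >= 28)) {
        return null;
    }

    // Syllables are laid out as lead-major, then vowel, then tail, from U+AC00.
    return 0xAC00 + ($lIndex * 21 + $vIndex) * 28 + $tIndex;
}

$syllable = composeHangulSyllable(0x1100, 0x1161, 0x11A8);
// $syllable === 0xAC01
```

The first and last syllables of the block follow from the same formula: U+1100 with U+1161 gives U+AC00, and U+1112 with U+1175 and U+11C2 gives U+D7A3.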

Development

make deps
make help
make qa
make server

make server starts the local PHP development server for the example/ directory on http://localhost:8000. Use a custom port with make server PORT=8080.

Packaging

make rpm
make deb

For system packages, bootstrap with:

require_once '/usr/share/php/Com/Tecnick/Unicode/autoload.php';
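
If the same script has to run against both a Composer install and a system package, one option is to try each autoloader path in turn. The sketch below is an assumption about how you might wire this up, not something the project prescribes; `firstExistingFile()` is a hypothetical helper, and the two paths are the ones shown in this README.

```php
<?php

/**
 * Return the first path in the list that exists as a regular file, or null.
 * Hypothetical helper for illustration.
 */
function firstExistingFile(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path)) {
            return $path;
        }
    }

    return null;
}

// Prefer the Composer autoloader, then fall back to the system package.
$autoload = firstExistingFile([
    __DIR__ . '/vendor/autoload.php',
    '/usr/share/php/Com/Tecnick/Unicode/autoload.php',
]);

if ($autoload !== null) {
    require_once $autoload;
}
```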

Contributing

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.