class PhpTransliteration extends \Drupal\Component\Transliteration\PhpTransliteration

Enhances PhpTransliteration with an alter hook.

Properties

protected string $dataDirectory

Directory where data for transliteration resides.

from PhpTransliteration
protected array $languageOverrides

Associative array of language-specific character transliteration tables.

from PhpTransliteration
protected array $genericMap

Non-language-specific transliteration tables.

from PhpTransliteration
protected $fixTransliterateForRemoveDiacritics

Special characters for ::removeDiacritics().

from PhpTransliteration
protected ModuleHandlerInterface $moduleHandler

The module handler to execute the transliteration_overrides alter hook.

Methods

__construct(string $data_directory, ModuleHandlerInterface $module_handler)

Constructs a PhpTransliteration object.

string
removeDiacritics(string $string)

Removes diacritics (accents) from certain letters.

string
transliterate(string $string, string $langcode = 'en', string $unknown_character = '?', int $max_length = NULL)

Transliterates text from Unicode to US-ASCII.

static int
ordUTF8(string $character)

Finds the character code for a UTF-8 character: like ord() but for UTF-8.

string
replace(int $code, string $langcode, string $unknown_character)

Replaces a single Unicode character using the transliteration database.

string
lookupReplacement($code, string $unknown_character = '?')

Looks up the generic replacement for a UTF-8 character code.

readLanguageOverrides($langcode)

Overrides \Drupal\Component\Transliteration\PhpTransliteration::readLanguageOverrides().

readGenericData($bank)

Reads in generic transliteration data for a bank of characters.

Details

__construct(string $data_directory, ModuleHandlerInterface $module_handler)

Constructs a PhpTransliteration object.

Parameters

string $data_directory

The directory where data files reside. If NULL, defaults to the 'data' subdirectory underneath the directory where the class's PHP file resides.

ModuleHandlerInterface $module_handler

The module handler to execute the transliteration_overrides alter hook.

string removeDiacritics(string $string)

Removes diacritics (accents) from certain letters.

This only applies to certain letters: accented Latin characters such as a-with-acute-accent, in the Unicode code point ranges 0xE0 to 0xE6 and 0x01CD to 0x024F. Replacements that would change the length of the string are excluded, as are characters that are not accented US-ASCII letters.

Parameters

string $string

The string holding diacritics.

Return Value

string

$string with accented letters replaced by their unaccented equivalents.
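The length-preserving behavior described above can be illustrated with a minimal standalone sketch. The function name and the four-entry table are hypothetical and are not the class's actual data; the sketch assumes only that each accented letter maps to exactly one unaccented letter and that anything else is left untouched.

```php
<?php
/**
 * Illustrative sketch only: strips accents using a tiny, hypothetical table.
 */
function sketch_remove_diacritics(string $string): string {
  // Hypothetical subset of one-to-one replacements (one letter in, one out).
  $map = [
    "\u{E0}" => 'a',  // a with grave accent
    "\u{E1}" => 'a',  // a with acute accent
    "\u{E4}" => 'a',  // a with diaeresis
    "\u{1CE}" => 'a', // a with caron
  ];
  // strtr() replaces each listed character and leaves everything else alone,
  // so a character such as U+00DF (sharp s), whose usual replacement "ss"
  // would lengthen the string, is deliberately absent and stays unchanged.
  return strtr($string, $map);
}

echo sketch_remove_diacritics("voil\u{E0} Stra\u{DF}e"), "\n";
```

Because every replacement is a single letter, the character count of the result matches that of the input.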

string transliterate(string $string, string $langcode = 'en', string $unknown_character = '?', int $max_length = NULL)

Transliterates text from Unicode to US-ASCII.

Parameters

string $string

The string to transliterate.

string $langcode

(optional) The language code of the language the string is in. Defaults to 'en' if not provided. Warning: this can be unfiltered user input.

string $unknown_character

(optional) The character to substitute for characters in $string without transliterated equivalents. Defaults to '?'.

int $max_length

(optional) If provided, return at most this many characters, ensuring that the transliteration does not split in the middle of an input character's transliteration.

Return Value

string

$string with non-US-ASCII characters transliterated to US-ASCII characters, and unknown characters replaced with $unknown_character.
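The interaction between $unknown_character and $max_length may be easier to see in a small standalone sketch. The function name, the $map parameter, and its contents are hypothetical; the class reads its tables from data files rather than taking them as an argument, and its real loop may differ.

```php
<?php
/**
 * Illustrative sketch only: transliterates with a caller-supplied table.
 */
function sketch_transliterate(string $string, array $map, string $unknown_character = '?', ?int $max_length = NULL): string {
  $result = '';
  // Split into UTF-8 characters; invalid UTF-8 yields FALSE, treated as empty.
  $characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) ?: [];
  foreach ($characters as $character) {
    // Single-byte characters are ASCII and pass through unchanged.
    $replacement = strlen($character) === 1
      ? $character
      : ($map[$character] ?? $unknown_character);
    // Stop before a replacement that would exceed the limit, so that a
    // multi-character replacement is never cut in half.
    if ($max_length !== NULL && strlen($result) + strlen($replacement) > $max_length) {
      break;
    }
    $result .= $replacement;
  }
  return $result;
}

// Hypothetical table: A with diaeresis and o with diaeresis.
$map = ["\u{C4}" => 'Ae', "\u{F6}" => 'oe'];
echo sketch_transliterate("\u{C4}\u{F6}", $map), "\n";
echo sketch_transliterate("\u{C4}\u{F6}", $map, '?', 3), "\n";
```

With a limit of 3, the second call returns only the first replacement, because appending the second one would produce four characters.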

static protected int ordUTF8(string $character)

Finds the character code for a UTF-8 character: like ord() but for UTF-8.

Parameters

string $character

A single UTF-8 character.

Return Value

int

The character code, or -1 if an illegal character is found.
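For readers unfamiliar with UTF-8 decoding, the following standalone sketch shows one way such a lookup can work. It is not the class's implementation: the function name is hypothetical, and the sketch checks only the lead byte and continuation bytes, without rejecting overlong encodings.

```php
<?php
/**
 * Illustrative sketch only: decodes one UTF-8 character to its code point.
 */
function sketch_ord_utf8(string $character): int {
  if ($character === '') {
    return -1;
  }
  $first = ord($character[0]);
  // Single-byte (ASCII) character.
  if (($first & 0x80) === 0) {
    return $first;
  }
  // The lead byte determines the sequence length and the initial bits.
  if (($first & 0xE0) === 0xC0) {
    $length = 2;
    $code = $first & 0x1F;
  }
  elseif (($first & 0xF0) === 0xE0) {
    $length = 3;
    $code = $first & 0x0F;
  }
  elseif (($first & 0xF8) === 0xF0) {
    $length = 4;
    $code = $first & 0x07;
  }
  else {
    return -1;
  }
  if (strlen($character) < $length) {
    return -1;
  }
  // Each continuation byte must look like 10xxxxxx and adds six bits.
  for ($i = 1; $i < $length; $i++) {
    $byte = ord($character[$i]);
    if (($byte & 0xC0) !== 0x80) {
      return -1;
    }
    $code = ($code << 6) | ($byte & 0x3F);
  }
  return $code;
}

printf("0x%X\n", sketch_ord_utf8("\u{20AC}"));
```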

protected string replace(int $code, string $langcode, string $unknown_character)

Replaces a single Unicode character using the transliteration database.

Parameters

int $code

The character code of a Unicode character.

string $langcode

The language code of the language the character is in.

string $unknown_character

The character to substitute for characters without transliterated equivalents.

Return Value

string

The US-ASCII replacement for the character. If the character has a mapping, that mapping is returned; otherwise, $unknown_character is returned. The replacement can contain multiple characters.
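A plausible reading of how the language-specific and generic tables combine is sketched below. The lookup order (language override first, then the generic table, then $unknown_character) is an assumption, and the function name, the extra table parameters, and the table contents are hypothetical.

```php
<?php
/**
 * Illustrative sketch only: looks up one code point in two tables.
 */
function sketch_replace(int $code, string $langcode, string $unknown_character, array $language_overrides, array $generic_map): string {
  // Assumed order: a language-specific entry wins over the generic one.
  if (isset($language_overrides[$langcode][$code])) {
    return $language_overrides[$langcode][$code];
  }
  // Otherwise use the generic table, or the placeholder if nothing matches.
  return $generic_map[$code] ?? $unknown_character;
}

// Hypothetical tables keyed by code point.
$language_overrides = ['de' => [0xC4 => 'Ae']];
$generic_map = [0xC4 => 'A', 0xE9 => 'e'];

echo sketch_replace(0xC4, 'de', '?', $language_overrides, $generic_map), "\n";
echo sketch_replace(0xC4, 'en', '?', $language_overrides, $generic_map), "\n";
echo sketch_replace(0x65E5, 'en', '?', $language_overrides, $generic_map), "\n";
```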

protected string lookupReplacement($code, string $unknown_character = '?')

Looks up the generic replacement for a UTF-8 character code.

Parameters

$code

The UTF-8 character code.

string $unknown_character

(optional) The character to substitute for characters without entries in the replacement tables.

Return Value

string

The US-ASCII replacement characters. If the character code has a mapping, that mapping is returned; otherwise, $unknown_character is returned. The replacement can contain multiple characters.

protected readLanguageOverrides($langcode)

Overrides \Drupal\Component\Transliteration\PhpTransliteration::readLanguageOverrides().

Allows modules to alter the language-specific $overrides array by invoking hook_transliteration_overrides_alter().

Parameters

$langcode

Code for the language to read.
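As a rough illustration of the alter hook, the sketch below shows what a module's implementation might look like. The module name "mymodule" is hypothetical, the parameter list (the overrides array by reference, then the language code) is assumed rather than stated on this page, and the direct function call at the end merely stands in for the module handler's invocation.

```php
<?php
/**
 * Sketch of an implementation of hook_transliteration_overrides_alter().
 *
 * "mymodule" is a hypothetical module name.
 */
function mymodule_transliteration_overrides_alter(array &$overrides, string $langcode): void {
  // Only adjust the table for German.
  if ($langcode === 'de') {
    // Map A with diaeresis (U+00C4) to a plain "A".
    $overrides[0xC4] = 'A';
  }
}

// Simulated invocation with a hypothetical starting table; in a real site
// the module handler would call the hook instead.
$overrides = [0xC4 => 'Ae'];
mymodule_transliteration_overrides_alter($overrides, 'de');
echo $overrides[0xC4], "\n";
```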

protected readGenericData($bank)

Reads in generic transliteration data for a bank of characters.

The data is read in from a file named "x$bank.php" (with $bank in hexadecimal notation) in PhpTransliteration::$dataDirectory. These files should set up a variable $bank containing an array whose numerical indices are the remaining two bytes of the character code, and whose values are the transliterations of these characters into US-ASCII. Note that the maximum Unicode character that can be encoded in this way is 4 bytes.

Parameters

$bank

First two bytes of the Unicode character, or 0 for the ASCII range.
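The relationship between a character code, its bank, and the data file name can be sketched as follows. This is a guess at the arithmetic, not a statement of what the class does: the sketch assumes the code is split into a bank (everything above the low eight bits) and an index (the low eight bits), and that the bank is formatted as at least two lowercase hexadecimal digits. The function name is hypothetical.

```php
<?php
/**
 * Illustrative sketch only: derives bank, index, and file name from a code.
 */
function sketch_bank_info(int $code): array {
  // Assumed split: high bits select the bank, the low byte is the index.
  $bank = $code >> 8;
  $index = $code & 0xFF;
  // File name pattern "x$bank.php" with $bank in hexadecimal notation.
  $file = sprintf('x%02x.php', $bank);
  return ['bank' => $bank, 'index' => $index, 'file' => $file];
}

// U+00E9 falls in bank 0; U+4E2D falls in bank 0x4E.
print_r(sketch_bank_info(0x00E9));
print_r(sketch_bank_info(0x4E2D));
```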