Drupal\search\SearchTextProcessorInterface

interface SearchTextProcessorInterface

Processes search text for indexing.

Constants

PREG_CLASS_NUMBERS

Matches all 'N' Unicode character classes (numbers).

PREG_CLASS_PUNCTUATION

Matches all 'P' Unicode character classes (punctuation).

PREG_CLASS_CJK

Matches CJK (Chinese, Japanese, Korean) letter-like characters.

This list is derived from the "East Asian Scripts" section of http://www.unicode.org/charts/index.html, as well as a comment on http://unicode.org/reports/tr11/tr11-11.html listing some character ranges that are reserved for additional CJK ideographs.

The character ranges do not include numbers, punctuation, or symbols, since these are handled separately in search. Note that radicals and strokes are considered symbols. (See http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt)
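As a rough illustration of how character-class constants like these are typically used, the sketch below places a class body inside square brackets in a UTF-8 pattern. It assumes the constants are meant to be embedded that way, and it substitutes PCRE's `\p{N}` and `\p{P}` property escapes for the real constant values, which are explicit character ranges and are not reproduced here. The variable names are illustrative only.

```php
<?php
// Stand-ins for PREG_CLASS_NUMBERS and PREG_CLASS_PUNCTUATION (assumption:
// the real constants are class bodies that go inside [...]).
$numbers = '\p{N}';
$punctuation = '\p{P}';

$text = 'Version 2.0, released in 2021!';

// Replace every run of punctuation characters with a single space.
$no_punctuation = preg_replace('/[' . $punctuation . ']+/u', ' ', $text);

// Collect every run of number characters.
preg_match_all('/[' . $numbers . ']+/u', $text, $matches);

var_dump($no_punctuation, $matches[0]);
```

The `u` modifier matters here: without it the pattern and subject are treated as bytes rather than UTF-8, and the Unicode property escapes would not behave as intended on non-ASCII text.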

Methods

array

process(string $text, string|null $langcode = NULL)

Processes text into words for indexing.

string

analyze(string $text, string|null $langcode = NULL)

Runs the text through character analyzers in preparation for indexing.

Details

at line 79
`array process(string $text, string|null $langcode = NULL)`

Processes text into words for indexing.

Parameters

string	$text	Text to process.
string|null	$langcode	Language code for the language of $text, if known.

Return Value

array

Array of words in the simplified, preprocessed text.
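To make the shape of the return value concrete, here is a minimal stand-alone sketch of a function with the same signature. `sketch_process()` is a hypothetical name, and its body is a simplification that only lower-cases, strips punctuation, and splits on whitespace; it is not Drupal's implementation.

```php
<?php
// Hypothetical stand-in with the same parameters and return type as process().
// $langcode is accepted but unused in this simplified version.
function sketch_process(string $text, ?string $langcode = NULL): array {
  // Lower-case, then turn runs of punctuation into spaces.
  $simplified = preg_replace('/[\p{P}]+/u', ' ', mb_strtolower($text));
  // Split on whitespace and drop empty pieces.
  return preg_split('/\s+/u', $simplified, -1, PREG_SPLIT_NO_EMPTY);
}

$words = sketch_process('Hello, World! Hello again.', 'en');
var_dump($words);
```

Note that repeated words are kept in the returned array; whether a real implementation deduplicates is not stated in this reference.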

at line 104
`string analyze(string $text, string|null $langcode = NULL)`

Runs the text through character analyzers in preparation for indexing.

Processing steps:

1. Entities are decoded.
2. Text is lower-cased and diacritics (accents) are removed.
3. hook_search_preprocess() is invoked.
4. CJK (Chinese, Japanese, Korean) characters are processed, depending on the search settings.
5. Punctuation is processed (removed or replaced with spaces, depending on where it is; see code for details).
6. Words are truncated to 50 characters maximum.
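The steps above can be approximated in a short stand-alone sketch. `sketch_analyze()` is a hypothetical function, not the real method: it skips diacritic removal, the hook_search_preprocess() invocation, and CJK handling, and it replaces all punctuation with spaces rather than applying position-dependent rules.

```php
<?php
// Hypothetical, simplified approximation of the analyze() pipeline.
// $langcode is accepted but unused here.
function sketch_analyze(string $text, ?string $langcode = NULL): string {
  // Decode HTML entities such as &eacute; into their characters.
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // Lower-case the text. Diacritic removal is omitted in this sketch.
  $text = mb_strtolower($text);
  // hook_search_preprocess() and CJK processing are skipped entirely.
  // Replace each run of punctuation with a space (a simplification).
  $text = preg_replace('/[\p{P}]+/u', ' ', $text);
  // Split into words and truncate each one to at most 50 characters.
  $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
  $words = array_map(fn($word) => mb_substr($word, 0, 50), $words);
  return implode(' ', $words);
}

$result = sketch_analyze('R&eacute;sum&eacute;, UPDATED!');
$long = sketch_analyze(str_repeat('a', 60));
var_dump($result, strlen($long));
```

Because diacritic removal is left out, the accented characters survive in this sketch's output, whereas the documented steps say accents are removed.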

Parameters

string	$text	Text to simplify.
string|null	$langcode	(optional) Language code for the language of $text, if known.

Return Value

string

Simplified and processed text.
