Alnum analyzer

The Alnum analyzer is useful if you want to index words that contain numbers or special characters, such as hyphens and underscores.

The Alnum analyzer processes text in the following ways:

  • Indexes numbers as part of the word.
  • Does not index stopwords.
  • Converts alphabetic characters to lowercase.
  • Treats all non-alphanumeric characters as white space unless they are included in the characters list. Non-alphanumeric characters include: #, %, $, @, &, :, ', (, ), -, _, \, and /.
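
The four rules above can be approximated with a short Python sketch. It is not the analyzer's implementation: the function name alnum_tokenize, the extra_chars parameter, and the stopword list are hypothetical, and the sketch assumes ASCII letters and digits only.

```python
import re

# Hypothetical stopword list; the analyzer's actual list is not given here.
STOPWORDS = {"a", "an", "and", "the", "of"}

def alnum_tokenize(text, extra_chars="", stopwords=STOPWORDS):
    """Approximate the Alnum rules on one input string (ASCII only)."""
    # Letters, digits, and any characters in extra_chars stay inside words;
    # every other character separates words, like white space.
    word_pattern = "[A-Za-z0-9" + re.escape(extra_chars) + "]+"
    tokens = []
    for word in re.findall(word_pattern, text):
        word = word.lower()          # convert alphabetic characters to lowercase
        if word not in stopwords:    # do not index stopwords
            tokens.append(word)
    return tokens

print(alnum_tokenize("1002A 3234 abc123 xyz-abc lmn_opq"))
print(alnum_tokenize("1002A 3234 abc123 xyz-abc lmn_opq", extra_chars="_-"))
```

With an empty extra_chars value, the hyphen and underscore split words; passing "_-" keeps them inside words.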

Include a list of characters to index as part of words by using the alnum+characters syntax. List characters without spaces. The maximum length of the character list is 128 bytes.
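
As an illustration of the alnum+characters syntax, the following Python sketch splits an analyzer value into its character list and checks the constraints described above. The function parse_alnum_spec is hypothetical, and measuring the 128-byte limit in UTF-8 is an assumption; the actual encoding depends on the database locale.

```python
def parse_alnum_spec(spec):
    """Return the character list from a value such as 'alnum+_-'."""
    # Everything after the first plus sign is the character list.
    name, _, chars = spec.partition("+")
    if name != "alnum":
        raise ValueError("not an Alnum analyzer value: " + repr(spec))
    if " " in chars:
        raise ValueError("list characters without spaces")
    # Assumption: the limit is measured on the UTF-8 encoding of the list.
    if len(chars.encode("utf-8")) > 128:
        raise ValueError("character list is longer than 128 bytes")
    return chars

print(repr(parse_alnum_spec("alnum")))     # no extra characters
print(repr(parse_alnum_spec("alnum+_-")))  # underscore and hyphen
```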

Examples

In these examples, the input string is shown on the first line and the resulting tokens are shown on the second line, each surrounded by square brackets.

In the following example, words that contain both numbers and letters are indexed as single tokens, and special characters are treated as white space:

1002A 3234 abc123 xyz-abc lmn_opq
[1002a] [3234] [abc123] [xyz] [abc] [lmn] [opq]

In the following example, the analyzer index parameter is set to alnum+_- so that the hyphen and underscore characters are indexed as part of words:

1002A 3234 abc123 xyz-abc lmn_opq
[1002a] [3234] [abc123] [xyz-abc] [lmn_opq]