Simple analyzer

The Simple analyzer converts text to tokens that contain only alphabetic characters.

The Simple analyzer is useful if you want to index every word and ignore non-alphabetic characters.

The Simple analyzer processes text characters in the following ways:

  • Each word is processed into a separate token.
  • Alphabetic characters are converted to lowercase.
  • Numeric and special characters are treated as white space.
  • Stopword lists are ignored. All words are indexed.

Because the Simple analyzer does not support stopwords, omit the word TO from range queries.

Examples

In these examples, the input string is shown on the first line and the resulting tokens are shown on the second line, each surrounded by square brackets.

In the following example, every word is converted to a lowercase token:

The Quick Brown Fox Jumped Over The Lazy Dog
[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]

In the following example, the @ symbol and period are treated as white space:

xyz@example.com
[xyz] [example] [com]

In the following example, numbers are not included in the tokens, so all four input words produce the same token:

1abc 12abc abc1 abc12
[abc] [abc] [abc] [abc]
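
The rules and examples above can be approximated with a short Python sketch. This is an illustration only, not the analyzer's actual implementation: the function name simple_analyze is hypothetical, and the sketch assumes that "alphabetic" means any character for which Python's str.isalpha() returns true, which may differ from how the analyzer classifies characters.

```python
def simple_analyze(text):
    """Hypothetical sketch of the Simple analyzer rules.

    Assumption: a character is alphabetic if str.isalpha() is true.
    """
    tokens = []
    current = []
    for ch in text:
        if ch.isalpha():
            # Alphabetic characters are kept and converted to lowercase.
            current.append(ch.lower())
        elif current:
            # Any other character acts like white space and ends the token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    # No stopword filtering: every word is returned.
    return tokens


print(simple_analyze("The Quick Brown Fox Jumped Over The Lazy Dog"))
# ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
print(simple_analyze("xyz@example.com"))
# ['xyz', 'example', 'com']
print(simple_analyze("1abc 12abc abc1 abc12"))
# ['abc', 'abc', 'abc', 'abc']
```

Because the sketch applies no stopword list, common words such as "the" appear in the output exactly like any other word.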