Filtering

To avoid indexing binary data, which is not useful in etx searches, filter your documents before they are indexed. Filtering is the process of stripping all proprietary formatting information from a document so that only its text content remains, in ASCII format.

For example, Microsoft™ Word documents usually contain formatting information that describes the fonts, paragraph styles, character styles, and layout of the text. Although this information can be indexed, it is of no use to users who want to search the content of the document, and including it in an etx index can significantly increase the size of the index and degrade the performance of text searches. Filtering removes all of this information and leaves only standard ASCII text.

To create an index on the filtered text of a column, specify the FILTER index parameter when you create your etx index. For example, the following statement creates an etx index on the abstract column of the my_table table and specifies that the documents in the abstract column are filtered before they are added to the index:

CREATE INDEX abstract_index ON my_table (abstract etx_clob_ops)
    USING etx (FILTER = 'STOP_ON_ERROR');
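
Filtering affects only what is stored in the index; queries against the column are written as usual. As a minimal sketch, assuming the etx_contains() search operator and a hypothetical search term 'multimedia' (neither is part of the statement above), a query that the filtered abstract_index index could serve might look like this:

SELECT abstract
    FROM my_table
    -- 'multimedia' is a hypothetical search term chosen for illustration
    WHERE etx_contains(abstract, 'multimedia');

Because the formatting information was stripped before indexing, a search like this is matched against the documents' filtered text rather than against their raw formatting data.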