Estimate the size of an etx index

This section describes the factors that can make an etx index larger than necessary and, as a result, make your queries run slower than they should.

The size of an etx index can vary widely, depending on factors that include:
  • The nature of the data to be indexed

    If your documents contain a large amount of numeric data, such as the data found in financial reports, and you specify a character set that indexes numeric characters, each string of numbers is indexed as if it were a word. This can increase the number of unique words in the index and therefore its size.

  • The index parameters that you specify

    If you specify PHRASE_SUPPORT=MEDIUM or PHRASE_SUPPORT=MAXIMUM, the index will be two to four times larger than if you specify PHRASE_SUPPORT=NONE.

  • Stopword lists

    An etx index built with a stopword list (specified by STOPWORD_LIST='my_stopwordlist') is generally slightly smaller than an index that contains all the words in a document. However, if you also specify INCLUDE_STOPWORDS='TRUE', the index is approximately 50% larger than it is with the stopword list alone.

Predicting the size of an etx index is further complicated because the size is not directly proportional to the number of documents or to the number of words in each document. The following table shows etx index sizes for various combinations of index parameters.
Total index size, in disk pages, for an etx index containing 100,000 documents:

Word support  Phrase support  Stopword list  Include stopwords  Total index size
------------  --------------  -------------  -----------------  ----------------
Exact         None            No             False              29 KB
Exact         None            Yes            False              28 KB
Exact         Medium          No             False              69 KB
Exact         Maximum         No             False              87 KB
Exact         Maximum         Yes            False              62 KB
Exact         Maximum         Yes            True               87 KB
Pattern       None            No             False              34 KB
Pattern       None            Yes            True               34 KB
Pattern       Medium          No             False              73 KB
Pattern       Maximum         No             False              92 KB
Pattern       Maximum         Yes            False              67 KB
Pattern       Maximum         Yes            True               92 KB

The average size of the documents in this table is 2 KB.
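
To compare the parameter combinations, it can help to express each size as a multiple of the smallest configuration for the same word support. The following Python sketch is illustrative only and is not part of the product. It copies the figures from the table and divides each one by the size of the index built with no phrase support and no stopword list. The names SIZES and ratio_to_baseline are hypothetical, and the ratios describe only this 100,000-document sample, so they might not carry over to your own data.

```python
# Index sizes copied from the table in this section, in the units the table uses.
# Key: (word support, phrase support, stopword list used?, include stopwords?)
SIZES = {
    ("Exact", "None", False, False): 29,
    ("Exact", "None", True, False): 28,
    ("Exact", "Medium", False, False): 69,
    ("Exact", "Maximum", False, False): 87,
    ("Exact", "Maximum", True, False): 62,
    ("Exact", "Maximum", True, True): 87,
    ("Pattern", "None", False, False): 34,
    ("Pattern", "None", True, True): 34,
    ("Pattern", "Medium", False, False): 73,
    ("Pattern", "Maximum", False, False): 92,
    ("Pattern", "Maximum", True, False): 67,
    ("Pattern", "Maximum", True, True): 92,
}


def ratio_to_baseline(word, phrase, stopword_list=False, include_stopwords=False):
    """Return the index size relative to the baseline for the same word support.

    The baseline (an assumption of this sketch) is the index built with
    no phrase support and no stopword list.
    """
    baseline = SIZES[(word, "None", False, False)]
    return SIZES[(word, phrase, stopword_list, include_stopwords)] / baseline


if __name__ == "__main__":
    # Print every table row together with its ratio to the baseline.
    for (word, phrase, stop, include), size in SIZES.items():
        ratio = ratio_to_baseline(word, phrase, stop, include)
        print(f"{word:8} {phrase:8} stopwords={stop!s:5} "
              f"include={include!s:5} size={size:3} ratio={ratio:.2f}")
```

For this sample, the medium and maximum phrase support rows without a stopword list all fall between two and four times the baseline, which agrees with the range given for PHRASE_SUPPORT earlier in this section.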
Tip: The size of a document that you can index is limited by the amount of virtual memory on your machine. For example, if you have 1 GB of virtual memory, you can index only documents that are smaller than 1 GB.