Attachment full-text indexing with conversion filters

When conversion filters are used for attachment full-text indexing, the Domino® server and Notes® standard client use the open-source Apache Tika 2.4.1 conversion filters to extract text for full-text searches of attachments.

Tika replaces the KeyView conversion filter used prior to Domino 10. The Tika implementation can:
  • Filter a wide range of formats.
  • Filter ASCII text files that contain UTF-8 encoding.
Note: On IBM i, the title and author of full-text-indexed file attachments, if present, are not indexed and are not searchable.

Tika runs as a Java process when you start the Notes® standard client or Domino®. The process calls tika-server.jar, which starts an HTTP server and listens for text extraction requests on port 9998 by default. If you upgrade to the Notes® standard client or Domino® 10 or later, full-text indexes that previously used KeyView filters to extract text are rebuilt using the Tika filters.
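
The following Python sketch illustrates what a text extraction request over HTTP can look like. It is not taken from the Domino or Notes implementation. It assumes that the Tika process accepts the standard Apache Tika Server request (an HTTP PUT to the /tika resource with an Accept: text/plain header) and that it is reachable on localhost. The function names port_is_listening, build_extraction_request, and extract_text are hypothetical and used only for illustration.

```python
import socket
import urllib.request

TIKA_HOST = "localhost"  # assumption: Tika is queried from the same machine
TIKA_PORT = 9998         # default port described above


def port_is_listening(host: str = TIKA_HOST, port: int = TIKA_PORT,
                      timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


def build_extraction_request(data: bytes, host: str = TIKA_HOST,
                             port: int = TIKA_PORT) -> urllib.request.Request:
    """Build a PUT request for the standard Tika Server /tika resource.

    Assumption: the Tika process started by Notes or Domino exposes the
    same /tika resource as a stand-alone Apache Tika Server.
    """
    return urllib.request.Request(
        url=f"http://{host}:{port}/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},  # ask for plain extracted text
    )


def extract_text(data: bytes, host: str = TIKA_HOST,
                 port: int = TIKA_PORT) -> str:
    """Send the attachment bytes and return the extracted text."""
    request = build_extraction_request(data, host, port)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

A call such as port_is_listening() only shows that some process is listening on the port, not that the process is Tika. extract_text() requires a running server, so try it against a test environment rather than a production server.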

For the list of file formats supported by Tika 2.4.1, see the Apache Tika web site.

Full-text searches do not always return the expected results for documents with PDF attachments. The search results might contain false negatives or false positives. For a workaround, see the article Full Text Index: some PDFs are not tokenized correctly using Tika default settings on the HCL Support site.

Note: If port 9998 is already in use by another application, use the following notes.ini setting to change the Tika port to 9997:

The Notes® basic client does not use Tika filters to filter attachments in local databases. Notes® basic client users can choose to index attachments for local databases, but only ASCII text attachments are indexed and searchable.