Attachment full-text indexing

The Domino® server and Notes® standard client use Apache Tika 1.18 open source conversion filters to extract text for full-text searches of attachments. Tika replaces the KeyView conversion filter used in previous releases.

The Tika implementation lets you:
  • Search a wide range of formats, including container files such as .zip and .tar files.
  • Search plain-text files that use UTF-8 encoding.
  • Customize which attachment types can be full-text indexed and the maximum attachment size allowed for full-text indexing.

Tika runs as a Java™ process when you start the Notes standard client or Domino. The process runs tika-server.jar, which starts an HTTP server that listens for text extraction requests on port 9998 by default. If you upgrade to Notes or Domino 10, full-text indexes that previously used KeyView filters to extract text are rebuilt using the Tika filters.
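To illustrate what a text extraction request looks like, the following Python sketch builds the kind of HTTP request that a stock Apache Tika Server accepts: a PUT of the raw file bytes to the /tika endpoint, with plain text requested in return. This topic does not state whether the Tika server bundled with Notes or Domino accepts requests from other programs. The host name, the /tika path, and the function names build_extract_request and extract_text are therefore assumptions for illustration only.

```python
import urllib.request

TIKA_HOST = "localhost"  # assumption: Tika runs on the same machine
TIKA_PORT = 9998         # default port named in this topic


def build_extract_request(data: bytes,
                          host: str = TIKA_HOST,
                          port: int = TIKA_PORT) -> urllib.request.Request:
    """Build a PUT request asking a Tika server for the plain text of `data`."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/tika",  # /tika: stock Apache Tika Server endpoint
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str) -> str:
    """Send the file at `path` to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_extract_request(f.read())
    # Requires a Tika server that is running and reachable at host:port.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling extract_text on an attachment saved to disk would return the text that Tika extracts from it, provided a Tika server is reachable at that address.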

For the list of file formats supported by Tika 1.18, see the Apache Tika web site.

Note: The tika-server.jar starts an HTTP server and listens for text extraction requests on port 9998. If this port is already in use by another application, use the following notes.ini setting to change the Tika port to 9997:

The Notes basic client does not use Tika filters for attachment searches in local databases. (This limitation does not apply to the Notes standard client or to searches of server-based databases.) Notes basic client users can choose to index attachments, but only ASCII text attachments are searched.
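Regarding the port conflict described in the note above: before changing the Tika port, it can help to confirm whether another application is already listening on port 9998. The following Python sketch is one way to check, not a Notes or Domino tool. The function name port_in_use and the choice of the loopback address are assumptions for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

Run port_in_use(9998) while Notes and Domino are stopped. A result of True suggests that another application holds the default Tika port.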