Troubleshooting: Excessive "Deleting documents" log file entries

When building the unstructured index, you might see a large number of entries in the search server log that read "Deleting documents from Solr with query." These entries are caused by the deletion of temporary text files associated with product attachments. The behavior is generally benign and can be ignored. If the log shows that these deletions take an excessive amount of time, and you do not use product attachments, you can apply a simple workaround to improve indexing performance.

Problem

While the unstructured index is being built, a delete by query is executed for every catalog entry found in the TI_CATENTRY_0 table. This query deletes any attachment files that might be stored in the temp directory under the unstructured directory.

Product attachments are indexed in two stages. First, a record of the attachment is stored in a .txt file, with the catalog entry ID as the file name (for instance, 10001.txt). Then, the record is indexed in the Catalog Entry index. The delete by query is necessary to delete the .txt files so that they are not indexed again. When there are many such records, many similar entries appear in the server log, as in the following example:
[8/15/16 0:53:37:152 EDT] 00000b72 SolrWriter    I org.apache.solr.handler.dataimport.SolrWriter deleteByQuery 
        Deleting documents from Solr with query: catentry_id:6061726
...
[8/15/16 1:48:54:873 EDT] 00000b72 SolrWriter    I org.apache.solr.handler.dataimport.SolrWriter deleteByQuery 
        Deleting documents from Solr with query: catentry_id:6436126  

Solution

These entries can be safely ignored. If you observe that the delete-by-query operations add noticeable overhead to the indexing time, you can modify the indexing configuration file to remove this logic.
Note: Modifying the indexing file is only an option if your catalog does not make use of product attachments, and you will not be using product attachments in the future.

Procedure

  1. Edit the following file: Search_home/solr/home/MC_10001/en_US/CatalogEntry/unstructured/conf/wc-data-config.xml.
  2. Locate the following entity within the file, and comment it out.
    <entity name="deleteByCatentryId" pk="CATENTRY_ID" transformer="TemplateTransformer"
            query="select DISTINCT CATENTRY_ID, -2 as attachmentrel_id, -2 as attachment_id from TI_CATENTRY_0_A"
            deltaImportQuery="select DISTINCT CATENTRY_ID, -2 as attachmentrel_id, -2 as attachment_id from TI_CATENTRY_0_A"
            deltaQuery="SELECT DISTINCT CATENTRY_ID FROM TI_CATENTRY_0_A FETCH FIRST 1 ROWS ONLY">
        <field column="$deleteDocByQuery"
            template="catentry_id:${deleteByCatentryId.CATENTRY_ID}"/>
        <field column="$skipDoc" template="true"/>
    </entity>
  3. Save and close the file.
  4. Restart the search server.
Note: As this is a customization change, you must be prepared to re-apply it after installing any Interim Fixes.
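
As a sketch of step 2, the entity could be wrapped in a single XML comment, as in the following fragment. This leaves the original definition in the file so that it is easy to restore. The explanatory comment on the first line is an addition for illustration only; the entity itself is copied unchanged from the procedure. The approach assumes that the entity contains no double hyphens, which are not permitted inside an XML comment.

```xml
<!-- Disabled: product attachments are not used in this catalog. Remove the comment markers below to restore. -->
<!--
    <entity name="deleteByCatentryId" pk="CATENTRY_ID" transformer="TemplateTransformer"
            query="select DISTINCT CATENTRY_ID, -2 as attachmentrel_id, -2 as attachment_id from TI_CATENTRY_0_A"
            deltaImportQuery="select DISTINCT CATENTRY_ID, -2 as attachmentrel_id, -2 as attachment_id from TI_CATENTRY_0_A"
            deltaQuery="SELECT DISTINCT CATENTRY_ID FROM TI_CATENTRY_0_A FETCH FIRST 1 ROWS ONLY">
        <field column="$deleteDocByQuery"
            template="catentry_id:${deleteByCatentryId.CATENTRY_ID}"/>
        <field column="$skipDoc" template="true"/>
    </entity>
-->
```

Keeping a backup copy of the unmodified wc-data-config.xml file before editing makes it easier to compare and reapply the change later.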