Customizing the stopwords.txt file

In this lesson, you edit configuration files to influence the behavior of the Solr 7.3.1 search engine. The particular example is customization of the stopwords.txt file.

The stopwords.txt file is a configuration file that lists the words used by the Solr stop filter. In HCL Commerce Version 9, you can change the behavior of the stop filter by pointing the engine at your own stopwords.txt file.

In the following tutorial, you customize the English stopwords.txt file and verify that you have successfully changed the behavior of the Solr search engine.

Before you begin

  1. Ensure that you are working on the correct version of the stopwords.txt file. The default file is solrhome/v3/CatalogEntry/conf/stopwords.txt, but it may have been extended, as described in Limiting search terms and characters from the search query. Locate the extended file, or create a new one to work on.
    To ensure that your system is referring to the default stopwords.txt file or its extended counterpart:
    1. Check how the name field is defined in either the default solrhome/v3-index/CatalogGroup/conf/schema.xml file or the extended solrhome/v3-index-ext/CatalogGroup/x-schema.xml file. Look for a definition similar to the following:
      <field name="name" type="wc_text_${lang:en}" indexed="true" stored="true" multiValued="false"/>
    2. In this example, the en language code has been assigned to name. This language code is used as part of the reference to the stopwords.txt file, making its name stopwords_en. You can find the path that this name is associated with by looking in the solrhome/v3/common/schema-field-types.xml file. Look for the target of the solr.StopFilterFactory filter. It will resemble the following:
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="${stopwords_en:../../common/stopwords.txt}"/>
      In this case, the stopwords_en name has been associated with ../../common/stopwords.txt. If stopwords_en is not otherwise specified in the CONFIG column of the SRCHCONFEXT table, this is the default file.
  2. Add the parameter stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the path to your customized stopwords.txt file. In the container environment, use an SQL command similar to the following:
    update SRCHCONFEXT set CONFIG='stopwords=/opt/WebSphere/Liberty/usr/servers/default/resources/search/index/managed-solr/config/v3-index-ext/common/stopwords.txt, original_config' 
    where indextype='CatalogEntry' and indexscope=masterCatalogId and indexsubtype='Structured';
    
    Here, original_config is the original CONFIG value for the record, and masterCatalogId must be replaced with your own master catalog ID.
  3. You can add stop words for specific languages. To make a stopwords.txt file language-specific, add the parameter stopwords_lang=stopwords_lang_file_path to the CONFIG column of the SRCHCONFEXT table, where lang is the language code. For example, to add your own French stop words, add stopwords_fr=stopwords_fr_file_path to the CONFIG column, where stopwords_fr_file_path is the path to the French stop words file.
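
The placeholder in the filter definition, ${stopwords_en:../../common/stopwords.txt}, names a property and gives a default after the colon. The following Python sketch models that fallback so you can see which path would be used with and without an override. It is only an illustration: the parse_config and resolve helpers, the comma-separated key=value format, and the /custom/stopwords_en.txt path are assumptions made for this example, not part of HCL Commerce or Solr.

```python
import re

# Matches placeholders of the form ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def parse_config(config: str) -> dict[str, str]:
    """Split a comma-separated 'key=value' string into a dictionary.

    Assumption: entries are separated by commas and use a single '='.
    """
    props = {}
    for entry in config.split(","):
        key, sep, value = entry.strip().partition("=")
        if sep:  # ignore fragments that are not key=value pairs
            props[key.strip()] = value.strip()
    return props


def resolve(template: str, props: dict[str, str]) -> str:
    """Replace each placeholder with props[name], else with its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name!r}")

    return _PLACEHOLDER.sub(substitute, template)


WORDS_ATTR = "${stopwords_en:../../common/stopwords.txt}"

# Without an override, the default after the colon is used.
default_path = resolve(WORDS_ATTR, {})

# Hypothetical override supplied through a CONFIG-style string.
override = parse_config("stopwords_en=/custom/stopwords_en.txt, other=value")
custom_path = resolve(WORDS_ATTR, override)

print(default_path)
print(custom_path)
```

The same fallback pattern applies to the field type wc_text_${lang:en}, where en is used unless a lang value is supplied.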

Procedure

  1. In the storefront, search for the string "can". The search should return matching products, because "can" is not yet a stop word.
  2. Copy the solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt file to the directory workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common. Open the file in an editor.
  3. The file contains words such as "will" and "was" that help filter out unhelpful clauses in search queries. As an example that is easy to test, add the word "can" at the bottom of the file. If you have copied the default file, the result should look something like the following:
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # a couple of test stopwords to test that the words are really being
    # configured from this file:
    stopworda
    stopwordb
    
    # Standard english stop words taken from Lucene's StopAnalyzer
    a
    an
    and
    are
    as
    at
    be
    but
    by
    for
    if
    in
    into
    is
    it
    no
    not
    of
    on
    or
    such
    that
    the
    their
    then
    there
    these
    they
    this
    to
    was
    will
    with
    can
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
  4. Add the value stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the relative path to the file discoverable in the container. The following SQL command updates the record:
    update SRCHCONFEXT set CONFIG='stopwords=workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common\stopwords.txt, original_config'
    where srchconfext_id=1;
  5. Restart the HCL Commerce Search server.
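
The file edit in step 3 can also be scripted. The following Python sketch appends a word to stopwords file content only when it is not already listed, skipping comment lines that start with #. The add_stop_word helper is hypothetical, and the lowercase comparison is an assumption based on the ignoreCase="true" attribute shown in the filter definition.

```python
def add_stop_word(text: str, word: str) -> str:
    """Return stopwords file text with `word` appended if it is not listed."""
    word = word.strip().lower()
    # Collect existing entries, ignoring blank lines and '#' comments.
    existing = {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if word in existing:
        return text
    # Make sure the new word starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + word + "\n"


sample = "# Standard english stop words\nwas\nwill\nwith\n"
updated = add_stop_word(sample, "can")
unchanged = add_stop_word(updated, "Can")  # already present, no change

print(updated)
```

To apply this to a real file, read its content, pass it through add_stop_word, and write the result back before you restart the search server.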

Results

Search again for the string "can" in the storefront. The search should return no results.
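
To see why the result changes, it can help to model what a stop filter does to a query. The following Python sketch is a simplified illustration, not Solr's implementation: load_stop_words and stop_filter are hypothetical helpers, and whitespace splitting stands in for the real tokenizer. Once "can" is in the list, a query that consists only of that word has no terms left to match.

```python
def load_stop_words(text: str) -> set[str]:
    """Read stop words from file content, skipping blanks and '#' comments."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def stop_filter(tokens: list[str], stop_words: set[str],
                ignore_case: bool = True) -> list[str]:
    """Drop tokens that appear in stop_words, keeping the original order."""
    kept = []
    for token in tokens:
        key = token.lower() if ignore_case else token
        if key not in stop_words:
            kept.append(token)
    return kept


STOPWORDS_TEXT = "# sample stop words\nwas\nwill\nwith\ncan\n"
stop_words = load_stop_words(STOPWORDS_TEXT)

only_stop_word = stop_filter("can".split(), stop_words)
mixed_query = stop_filter("Can opener".split(), stop_words)

print(only_stop_word)
print(mixed_query)
```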