Optimizing Search Results in Hebrew Language Stores

For optimal catalog search results when your catalog includes Hebrew-language data, generate a catalog-specific dictionary. The default dictionary is not optimized for Hebrew-language catalog items. You can use the createTokens.db2.sh script to build a catalog dictionary that is optimized for Hebrew storefronts. If your catalog does not include Hebrew data, running this script does not affect the quality of search results.

Before you begin

  • Run this script at least once after you initially create the catalog and upload it to the system.
  • For optimal results, run the script again each time that the catalog changes significantly. For instance, run it after you add many catalog products, update them, or change their descriptions.

Procedure

  1. Save the following shell code into a file with the name createTokens.db2.sh in the /opt/IBM/WebSphere/CommerceServer/bin directory:
    #!/bin/sh
    
    #-----------------------------------------------------------------
    # Licensed Materials - Property of IBM
    #
    # WebSphere Commerce
    #
    # (C) Copyright IBM Corp. 2006, 2016 All Rights Reserved.
    #
    # US Government Users Restricted Rights - Use, duplication or
    # disclosure restricted by GSA ADP Schedule Contract with
    # IBM Corp.
    #-----------------------------------------------------------------
    
    #
    #
    
    # show usage of the command
    showusage()
    {
    echo "Usage:"
    echo "------"
    echo "createTokens.db2.sh dbname userId password langid"
    echo "where dbname is the name of the database to be populated"
    echo "where userId is the userid of the user who owns the database"
    echo "where password is the password of the user"
    echo "where langid is the Hebrew language id in the database"
    
    exit 1
    }
    
    # end with failure
    endfailure()
    {
    echo "Error connecting to database.  Check log for details."
    exit 1
    }
    
    CURDIR=`pwd`
    BINDIR=`dirname $0`
    
    # change current directory to bin
    cd $BINDIR
    
    # Set up environment variables needed for DB2 connection
    if [ -f $BINDIR/config_env.db2.sh ]; then
       . $BINDIR/config_env.db2.sh         
    else
       . $CURDIR/config_env.db2.sh         
    fi
    
    if [ $# -eq 4 ]; then
    	
    	DATABASE=$1
    	USER=$2
    	PASSWORD=$3
    	LANGID=$4
    	LOG=$WCLOGDIR/createTokens.db2.log
    	TOKENS=$WCLOGDIR/Tokens.txt
    	TOKENSZIP=$WCLOGDIR/Tokens.zip
    	
    	# every argument must be non-empty
    	if [ "$DATABASE" = "" ]; then
    	     showusage
    	fi
    	if [ "$USER" = "" ]; then
    	     showusage
    	fi
    	if [ "$PASSWORD" = "" ]; then
    	     showusage
    	fi
    	if [ "$LANGID" = "" ]; then
    	     showusage
    	fi
    
    	SUBDIR=`dirname $LOG`
    	if [ -f $LOG ]; then
    		mv $LOG $LOG.orig
    	elif [ ! -d $SUBDIR ]; then
    		mkdir -p $SUBDIR
    		touch $LOG
    	else
    		touch $LOG
    	fi
    		
    	cat /dev/null > $TOKENS
    	rm -f $TOKENSZIP
    	db2 -v connect to $DATABASE user $USER using $PASSWORD >> $LOG 2>&1
    	if [ $? -ne 0 ]
    	then
    		endfailure  
    	fi
    	# collect every word that contains a non-ASCII byte, using the language id passed as the fourth argument
    	db2 -v "SELECT NAME, SHORTDESCRIPTION, LONGDESCRIPTION FROM CATGRPDESC WHERE LANGUAGE_ID=$LANGID" | while read line
    	do
    		for word in $line
    		do
    			echo "$word" | grep -P "[\x80-\xFF]" >> $TOKENS
    		done
    	done
    	db2 -v "SELECT NAME, SHORTDESCRIPTION, LONGDESCRIPTION FROM CATENTDESC WHERE LANGUAGE_ID=$LANGID" | while read line
    	do
    		for word in $line
    		do
    			echo "$word" | grep -P "[\x80-\xFF]" >> $TOKENS
    		done
    	done
    	db2 -v "SELECT STRINGVALUE FROM ATTRVALDESC WHERE LANGUAGE_ID=$LANGID" | while read line
    	do
    		for word in $line
    		do
    			echo "$word" | grep -P "[\x80-\xFF]" >> $TOKENS
    		done
    	done
    	zip $TOKENSZIP $TOKENS
           
    else
    	showusage
    fi
    
  2. Run the script as follows.
    ./createTokens.db2.sh <DBName> <dbUser> <pwd> <langID>
    Where:
    DBName
    The name of the WebSphere Commerce database. For instance, MALL.
    dbUser
    The database user name. For instance, dbinst1.
    pwd
    The password for this user.
    langID
    The Hebrew language ID that is used by your database. For instance, -11.
  3. Wait for the script to complete, which might take 5 to 10 minutes. The script writes the file Tokens.zip to the /opt/IBM/WebSphere/CommerceServer/logs directory.
  4. Replace the Tokens.zip entry, which is located in org/apache/lucene/analysis/he inside the lucene-analyzers-common-xxx.jar file, with the Tokens.zip file that the script generated. The JAR file must be updated in both of the following locations: /opt/IBM/WebSphere/AppServer/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/lib/lucene-analyzers-common-xxx.jar and /opt/IBM/WebSphere/AppServer/profiles/demo_solr/installedApps/demo_search_cell/Search_demo.ear/lib/lucene-analyzers-common-xxx.jar.
  5. Rebuild the index with the di-buildindex utility (di-buildindex.sh on Linux and AIX, di-buildindex.bat on Windows). See Building the WebSphere Commerce Search index.
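
The JAR update in step 4 can be done with any archive tool, because a JAR file is a zip archive. The following is a minimal sketch, not an IBM-supplied utility: update_tokens_entry is a hypothetical helper name, it assumes that both arguments are absolute paths, and it assumes that the zip utility already used by createTokens.db2.sh is on the PATH. It keeps a .bak copy of the JAR file before it replaces the entry.

```shell
#!/bin/sh
# Hypothetical helper: replace the Tokens.zip entry inside one JAR file.
# Assumptions: jar_file and tokens_zip are absolute paths, and the zip
# utility is available on the PATH.
update_tokens_entry() {
    jar_file=$1       # path to lucene-analyzers-common-xxx.jar
    tokens_zip=$2     # path to the generated Tokens.zip
    entry_dir=org/apache/lucene/analysis/he

    # Both input files must exist before anything is changed.
    [ -f "$jar_file" ] && [ -f "$tokens_zip" ] || return 1

    # Recreate the entry's directory layout in a scratch directory so that
    # zip stores the file under the expected path inside the JAR.
    work_dir=$(mktemp -d) || return 1
    mkdir -p "$work_dir/$entry_dir" &&
        cp "$tokens_zip" "$work_dir/$entry_dir/Tokens.zip" &&
        cp -p "$jar_file" "$jar_file.bak" &&
        ( cd "$work_dir" && zip -q "$jar_file" "$entry_dir/Tokens.zip" )
    status=$?

    rm -rf "$work_dir"
    return $status
}
```

Call the function once for each of the two JAR locations that are listed in step 4, passing the full path of the JAR file and the full path of the generated Tokens.zip file.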

Example

./createTokens.db2.sh MALL dbinst1 guest -11
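
After a run such as this one, it can be useful to preview which words the script collected. The script keeps every whitespace-separated word that contains a non-ASCII byte, which is how UTF-8 Hebrew text appears. The following sketch applies a similar filter to arbitrary text on standard input. extract_tokens is a hypothetical name, and the de-duplication with sort -u is an addition that the original script does not perform.

```shell
#!/bin/sh
# Hypothetical helper (not part of createTokens.db2.sh): reads text on stdin
# and prints each distinct whitespace-separated word that contains at least
# one byte outside printable ASCII.
extract_tokens() {
    # Split on whitespace, one word per line; LC_ALL=C forces byte-wise matching.
    LC_ALL=C tr -s '[:space:]' '\n' |
        # Keep only words with a byte outside the range space..tilde.
        LC_ALL=C grep -a '[^ -~]' |
        # Remove duplicates (an addition; the original script keeps them).
        LC_ALL=C sort -u
}
```

For instance, assuming that the log directory is the one named in step 3, `extract_tokens < /opt/IBM/WebSphere/CommerceServer/logs/Tokens.txt | head` lists the first few distinct tokens. An empty result suggests that no Hebrew data was found for the language ID that was supplied.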