crawler utility

You can use the crawler utility to crawl HTML and other site files from WebSphere Commerce starter stores to help populate the site content search index.

Since your store catalog is structured data, it can be indexed directly from your database. Your store, however, can have site-wide pages that are not necessarily associated with your catalog. This utility can produce two sets of artifacts:
  • HTML pages that are compiled copies of the crawled pages.
  • A manifest file that acts as a directory of the compiled pages (a hypothetical sketch follows this list).
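Both artifacts are consumed together during indexing. As a purely illustrative sketch (the actual manifest layout is determined by your crawler configuration, not documented on this page), the manifest can be thought of as a plain listing that maps each compiled page back to the URL it was crawled from:

  # Hypothetical manifest sketch only; all paths and URLs are placeholders
  /opt/solrhome/crawled/page0001.html    http://myStoreHost/webapp/wcs/stores/servlet/AboutUsView
  /opt/solrhome/crawled/page0002.html    http://myStoreHost/webapp/wcs/stores/servlet/ContactUsView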

The utility output is used by the indexer. For more information, see Indexing site content with WebSphere Commerce Search.

Syntax diagram for crawler utility

Parameter values

cfg
The location of the site content crawler configuration file. For example, solrhome/droidConfig.xml.
instance
The name of the WebSphere Commerce instance with which you are working (for example, demo).
dbtype
Optional: The database type. For example, cloudscape, db2, or oracle.
dbname
Optional: The name of the database to connect to.
dbhost
Optional: The database host to connect to.
dbport
Optional: The database port to connect to.
dbuser
Optional: For DB2, the name of the user that is connecting to the database. For Oracle, the user ID that is connecting to the database.
dbuserpwd
Optional: The password for the user that is connecting to the database.
If the dbuser and dbuserpwd values are not specified, the crawler can run successfully, but cannot update the database.
searchuser
Optional: The user name for the search server.
searchuserpwd
Optional: The password for the search server user.

Example

From the following directory:
  • Windows: WC_installdir\bin
  • Linux, AIX, and IBM i OS: WC_installdir/bin
  • WebSphere Commerce Developer: WCDE_installdir\bin
Run the following command:
  • Windows: crawler.bat -cfg cfg -instance instance_name [-dbtype dbtype] [-dbname dbname] [-dbhost dbhost] [-dbport dbport] [-dbuser db_user] [-dbuserpwd db_password] [-searchuser searchuser] [-searchuserpwd searchuserpwd]
  • Linux, AIX, and IBM i OS: crawler.sh -cfg cfg -instance instance_name [-dbtype dbtype] [-dbname dbname] [-dbhost dbhost] [-dbport dbport] [-dbuser db_user] [-dbuserpwd db_password] [-searchuser searchuser] [-searchuserpwd searchuserpwd]
  • WebSphere Commerce Developer (DB2 or Oracle): crawler.bat -cfg cfg -instance instance_name [-dbtype dbtype] [-dbname dbname] [-dbhost dbhost] [-dbport dbport] [-dbuser db_user] [-dbuserpwd db_password] [-searchuser searchuser] [-searchuserpwd searchuserpwd]
  • WebSphere Commerce Developer (Apache Derby): crawler.bat -cfg cfg [-searchuser searchuser] [-searchuserpwd searchuserpwd]
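For example, a filled-in invocation on Windows against a DB2 database might look like the following. Every value other than the flag names is a placeholder: the configuration path, instance name, database coordinates, and credentials here are hypothetical and must be replaced with your own.

  crawler.bat -cfg C:\WebSphere\solrhome\droidConfig.xml -instance demo -dbtype db2 -dbname mall -dbhost localhost -dbport 50000 -dbuser wcsadmin -dbuserpwd myPassword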

Running the utility using a URL

You can run the utility by using a URL on the WebSphere Commerce Search server.

http://solrHost:port/solr/crawler?action=actionValue&cfg=pathOfdroidConfig&
Where action is the action that the crawler should perform. The possible values are:
start
Starts the crawler.
status
Shows the crawler status.
stop
Stops the crawler.
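For example, assuming a search server that is reachable at solrHost on port 3737 and a configuration file at /opt/solrhome/droidConfig.xml (the host, port, and path are all placeholders), you could drive the crawler with a tool such as curl. Quote the URL so the shell does not interpret the ampersand:

  curl "http://solrHost:3737/solr/crawler?action=start&cfg=/opt/solrhome/droidConfig.xml"
  curl "http://solrHost:3737/solr/crawler?action=status&cfg=/opt/solrhome/droidConfig.xml"
  curl "http://solrHost:3737/solr/crawler?action=stop&cfg=/opt/solrhome/droidConfig.xml"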
The utility generates a log file. You can use this log file to refine your search parameters or diagnose failures. By default, the log file is named crawler.log and is written into the logs directory.
Note: You can change the log file's name and location by editing the crawler-logging.properties file, which by default is located in the directory %WCTOOLKIT%\workspace\WC\xml\config\dataimport. The file path and file name for the log file are defined by the java.util.logging.FileHandler.pattern entry in this file.
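For instance, redirecting the log to a different file is a one-line change to that entry. The property name is the one cited in the note above; the path value in this sketch is a hypothetical example:

  # Excerpt from crawler-logging.properties; the path value is a hypothetical example
  java.util.logging.FileHandler.pattern = logs/crawler.log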