Crawling site content remotely

You must consider extra factors when you use the site content crawler to build the search index for unstructured content in a remote search configuration.
When you deploy WebSphere Commerce Search remotely and set up the search index on both servers:
  • The droidConfig.xml and filters.txt files are copied to the Solr home directory on the remote search server. These files are also required on the WebSphere Commerce server, so you must copy them to your WebSphere Commerce server.
  • The base path in the SRCHCONFEXT table points to the store directory on the WebSphere Commerce server.
When you run the build index utility from the remote server, the WebContent index is not built because the manifest.txt file is not found at the specified path. To resolve this issue:
  • On the remote server, create a mapped network drive to the root directory on the WebSphere Commerce server. For example, map the entire C:\ drive of the WebSphere Commerce server to Z:\ on the remote server.
  • Update the base path to point to the mapped drive. For example, in the preceding mapping, change BasePath=C:\rest_of_base_path to BasePath=Z:\rest_of_base_path.
  • Run the build index utility again. The WebContent index is built successfully.
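The base path change in the preceding steps can be sketched as a small helper. This is only an illustration in Python, not part of any WebSphere Commerce utility: the function name remap_base_path and its parameters are hypothetical, and the drive letters follow the C:\ to Z:\ example above.

```python
from pathlib import PureWindowsPath


def remap_base_path(setting: str,
                    local_root: str = "C:\\",
                    mapped_root: str = "Z:\\") -> str:
    """Rewrite a 'BasePath=<path>' setting so that it points to the mapped drive.

    Hypothetical helper: local_root is the drive root on the WebSphere
    Commerce server, mapped_root is the network drive on the remote server.
    """
    key, sep, value = setting.partition("=")
    if not sep or key.strip() != "BasePath":
        raise ValueError(f"not a BasePath setting: {setting!r}")

    # relative_to raises ValueError if the path is not under local_root,
    # which guards against rewriting a path on an unmapped drive.
    relative = PureWindowsPath(value).relative_to(PureWindowsPath(local_root))
    return f"BasePath={PureWindowsPath(mapped_root) / relative}"


if __name__ == "__main__":
    print(remap_base_path("BasePath=C:\\rest_of_base_path"))
```

The helper only computes the new string. Storing the result in the SRCHCONFEXT table is still a separate step that you perform with your own database tooling.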
When you run the crawler, use one of the following approaches:
  • Run the crawler on the WebSphere Commerce server so that it updates the database but does not index automatically. This action updates the database with a location on C:\ instead of the mapped drive, so you must then update the database to use the mapped Z:\ drive.
  • Run the crawler on the WebSphere Commerce server without updating the database, but with automatic indexing. This action builds the index with the most up-to-date manifest.txt file, which is stored either locally or on the mapped drive.
Additional notes for working with the crawler:
  • The basePath parameter is passed to the di-buildindex utility in WebSphere Commerce Developer. The production environment can read the value from the SRCHCONFEXT table.
  • The basePath value is initially set into the SRCHCONFEXT table by the setupSearchIndex utility, relative to the WebSphere Commerce server.
  • The basePath value is updated each time the crawler runs, but only if the database information is passed to the crawler utility.
  • In remote configurations, the location of the manifest.txt file and the generated files must be mounted on the remote search server, and the basePath value must be updated to match the new network drive.
  • If automatic indexing is enabled in the droidConfig.xml file, data can be indexed directly without looking up the basePath parameter from the database.
  • The crawler is a WebSphere Commerce utility. Therefore, it must be run on the WebSphere Commerce server.
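Before you run the build index utility in a remote configuration, it can help to confirm that the remote search server can see the manifest.txt file through the mapped drive. The following Python sketch is an illustration only: the function name manifest_visible is hypothetical, and it assumes that manifest.txt sits directly in the directory that basePath points to, which might differ in your environment.

```python
from pathlib import Path


def manifest_visible(base_path: str, manifest_name: str = "manifest.txt") -> bool:
    """Return True if the manifest file is readable as a regular file
    under base_path.

    Assumption: the manifest is located directly in base_path.
    """
    return (Path(base_path) / manifest_name).is_file()


if __name__ == "__main__":
    # Example value taken from the mapped drive scenario; adjust as needed.
    candidate = r"Z:\rest_of_base_path"
    if manifest_visible(candidate):
        print(f"manifest.txt found under {candidate}")
    else:
        print(f"manifest.txt not found under {candidate}; check the mapped drive and basePath")
```

Run the check on the remote search server, because that is where the mapped drive must resolve.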