Merging indexes by shard using a single JVM

When additional storage devices are available and indexing takes too long to finish, consider distributing the indexing workload across multiple devices. You can do this by adding index shards to your existing Search indexing server.

Before you begin

To distribute an index, you divide it into price shards, which are index cores within the same Solr instance. Shards are used only for indexing and merging. After all of the index shards are successfully created, they can be merged into one optimized index, which your storefront uses for sorting, faceting, and filtering.

To perform index sharding with the Search server, first set up your shard environment using the following guidelines.
  • Determine the number of shards to use, based on the physical storage devices available on your server. Assign separate storage devices for read and write operations, so that read performance is not affected by write operations.
  • Ensure that you have run the SetupSearchIndex command to create the Price index, using the indexSubType option. For detailed instructions on setting up the Price index, see step 2 of Indexing contract prices using Index Load.
  • In your authoring environment, create additional shards on your existing Master Search indexing server using the following command.
    http://hostname:3737/solr/admin/cores?action=create&name=MC_catalogId_CatalogEntry_PriceN_generic&instanceDir=MC_catalogId/generic/CatalogEntry/Price&dataDir=shardN
    where
    hostname
    The host name of the Master indexing server.
    catalogId
    The master catalog ID of the index.
    N
    The shard number.
    Note: If you use index replication, do not configure any of your index shards to participate in the index replication network. The shards are used only for index building. The final version of the index should reside on the Master server, from which it is replicated to the Repeater and then to the Subordinates.
  • Allocate enough heap memory for each of your index shards. Refer to WebSphere Commerce Search performance tuning for recommendations on how to configure the solrconfig.xml configuration file.
  • Edit the solrconfig.xml configuration file for the Price core. Set the lockType parameter to "single". In addition, to speed up the merge operation, consider increasing the ramBufferSizeMB value in the same configuration file. The default value is 64 (that is, 64 megabytes), and the maximum value is 2048. After you make these changes, restart the Master Search server to activate them.
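
The core creation request in the guidelines above has to be issued once per shard, so it can help to generate the URLs rather than type them. The following is a minimal sketch in Python, assuming shards are numbered from 1; the function name shard_core_urls and the sample host name and catalog ID are hypothetical placeholders, and the script only prints the URLs without sending any request.

```python
from urllib.parse import urlencode


def shard_core_urls(hostname, catalog_id, shard_count, port=3737):
    """Return one core creation URL per shard, following the template above.

    Assumption: shard numbers run from 1 to shard_count.
    """
    urls = []
    for n in range(1, shard_count + 1):
        params = {
            "action": "create",
            "name": f"MC_{catalog_id}_CatalogEntry_Price{n}_generic",
            "instanceDir": f"MC_{catalog_id}/generic/CatalogEntry/Price",
            "dataDir": f"shard{n}",
        }
        # safe="/" leaves the slashes in instanceDir unescaped, as in the documented URL.
        query = urlencode(params, safe="/")
        urls.append(f"http://{hostname}:{port}/solr/admin/cores?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical host name and master catalog ID; replace with your own values.
    for url in shard_core_urls("search.example.com", 10001, 3):
        print(url)
```

Each printed URL can then be opened in a browser or passed to an HTTP client of your choice on the Master Search indexing server.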

After you set up your shard environment, you can index data into each shard using Index Load. The following diagram shows the two stages involved in sharding and rebuilding the index.

Diagram showing the process of splitting data into shards and then merging them.

Procedure

  1. In the first stage, you prepare the data from the existing indexes.
    1. Split your business data into equal catentry_id ranges for use with your index shards.
    2. Set up Index Load configuration files for each of your shards. For detailed instructions, see Index Load configuration files for indexing from database.
    3. Use Index Load to index your data into each index shard. For more information, see Index Load.
  2. In the second stage, you merge the indexes using Index Load. You will need merge configuration files that specify your source and target directories. To create these files, follow the instructions in Index Load configuration files for merging indexes.
    1. Run Index Load Merge against all shard index data directories. Index Load Merge processes your data in two steps: an index merge step and an optimization step.
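
The range split in the first stage can be sketched as follows. This is an illustrative Python helper, not part of Index Load: the function name split_id_range and the sample ID values are hypothetical, and the sketch assumes you already know the lowest and highest catentry_id in your data. Equal ID ranges hold equal numbers of catalog entries only when the IDs are evenly populated, so treat the output as a starting point.

```python
def split_id_range(min_id, max_id, shard_count):
    """Split the inclusive range [min_id, max_id] into shard_count contiguous ranges.

    Range sizes differ by at most one; any remainder goes to the first ranges.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    total = max_id - min_id + 1
    if total < shard_count:
        raise ValueError("fewer IDs than shards")

    base, extra = divmod(total, shard_count)
    ranges = []
    start = min_id
    for i in range(shard_count):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Hypothetical catentry_id bounds; one (start, end) pair per shard.
    for shard, (start, end) in enumerate(split_id_range(10001, 10010, 3), start=1):
        print(f"shard {shard}: catentry_id {start} to {end}")
```

Each resulting pair gives the lower and upper catentry_id bounds to use when you set up the Index Load configuration for the corresponding shard.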

Results

After the merge operation completes, the merged index is online and immediately available for use.