You can use a SearchService command to perform a background
crawl of the Search seedlists without creating a Search index.
About this task
The SearchService.startBackgroundCrawl command allows you
to crawl the application seedlists and save those seedlists to a specified
location. You might want to use this command if you are experiencing
issues with crawling and you want to verify that the crawling process
is completing successfully.
Procedure
To perform a background crawl of the Search seedlists,
complete the following steps.
- Start the wsadmin client from one of the following directories on the system on which you
installed the Deployment Manager:
Linux: app_server_root/profiles/dm_profile_root/bin
Windows:
app_server_root\profiles\dm_profile_root\bin
where
app_server_root is the WebSphere®
Application Server installation directory and
dm_profile_root is the Deployment
Manager profile directory, typically dmgr01.
You must start the client from this directory; otherwise,
subsequent commands that you enter do not execute correctly.
- After the wsadmin command environment
has initialized, enter the following command to initialize the Search
environment and start the Search script interpreter:
execfile("searchAdmin.py")
If prompted to specify a service to connect to, type 1 to pick
the first node in the list. Most commands can run on any node. If
the command writes or reads information to or from a file using a
local file path, you must pick the node where the file is stored.
When
the command runs successfully, the following message is displayed:
Search Administration initialized
- Enter the following command:
- SearchService.startBackgroundCrawl(String persistenceLocation,
String components)
Crawls the seedlists for the specified applications and then
saves the seedlists to the specified location. This command does not
build an index.
The command takes the following parameters:
- persistenceLocation
- A string that specifies the path to which the seedlists are to
be saved.
- components
- A string that specifies the applications whose seedlists are to be crawled. The following values
are valid:
- activities
- all_configured
- blogs
- calendar
- communities
- dogear
- ecm_files
- files
- forums
- people_finder
- profiles
- status_updates
- wikis
Use all_configured instead of listing all indexable services when you want to crawl all the
applications.
For example:
SearchService.startBackgroundCrawl("/opt/IBM/Connections/backgroundCrawl",
"activities, forums, communities, wikis")
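Because the purpose of a background crawl is often to verify that crawling completes successfully, you might want to confirm afterward that seedlists were actually written to the persistence location. The following is a minimal Python sketch of such a check; the directory layout and file names used here are assumptions for illustration, not the actual structure that Search writes, which depends on your deployment.

```python
import os

def summarize_seedlists(persistence_location):
    """Count persisted files under each top-level component directory."""
    summary = {}
    for root, _dirs, files in os.walk(persistence_location):
        rel = os.path.relpath(root, persistence_location)
        if rel == ".":
            continue  # skip the persistence root itself
        component = rel.split(os.sep)[0]
        summary[component] = summary.get(component, 0) + len(files)
    return summary

# Demonstration with a mock layout; in a real check you would point this
# at the persistenceLocation you passed to startBackgroundCrawl.
base = "/tmp/backgroundCrawl_demo"
for comp in ("activities", "forums"):
    os.makedirs(os.path.join(base, comp), exist_ok=True)
    open(os.path.join(base, comp, "seedlist_0.xml"), "w").close()

print(summarize_seedlists(base))  # e.g. {'activities': 1, 'forums': 1}
```

An empty or missing component entry in the summary would suggest that the crawl for that application did not persist any seedlists and warrants a closer look at the Search logs.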
What to do next
After completing a background crawl, perform one of the following
options:
- Extract file content. For more information, see Extracting
file content.
- Create a background index. For more information, see Creating
a background index.
- Create a foreground index. For more information, see Recreating
the Search index.
If you want to create a foreground index,
copy the persisted seedlists from the persistence location that you
specified when you used the startBackgroundCrawl command to the CRAWLER_PAGE_PERSISTENCE_DIR
directory on the node that is doing the indexing.
In a multi-node
system, you might want to copy the seedlists to the CRAWLER_PAGE_PERSISTENCE_DIR
directory on all nodes. Alternatively, you can set the CRAWLER_PAGE_PERSISTENCE_DIR
variable to a network location and copy the persisted seedlists from
the persistence location you specified to that location.
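The copy step described above can be sketched as follows. This is a minimal Python illustration using mock paths; substitute the persistence location you specified and the actual value of the CRAWLER_PAGE_PERSISTENCE_DIR variable on each node, and note that in a multi-node system you would repeat the copy (for example, over scp or rsync) for each node.

```python
import os
import shutil

# Example paths only; replace with your persistenceLocation and the
# node's CRAWLER_PAGE_PERSISTENCE_DIR value.
persisted = "/tmp/demo_persisted"
page_dir = "/tmp/demo_crawler_page_persistence"

# Set up a mock persisted-seedlist tree for demonstration.
os.makedirs(os.path.join(persisted, "forums"), exist_ok=True)
with open(os.path.join(persisted, "forums", "seedlist_0.xml"), "w") as f:
    f.write("<seedlist/>")

# Copy the whole persisted tree into the crawler page persistence
# directory, preserving the directory structure.
shutil.copytree(persisted, page_dir, dirs_exist_ok=True)

print(os.listdir(os.path.join(page_dir, "forums")))  # ['seedlist_0.xml']
```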