You can use a SearchService command to perform a background
crawl of the Search seedlists without creating a Search index.
About this task
The SearchService.startBackgroundCrawl command allows you
to crawl the application seedlists and save those seedlists to a specified
location. You might want to use this command if you are experiencing
issues with crawling and you want to verify that the crawling process
is completing successfully.
Procedure
To perform a background crawl of the Search seedlists,
complete the following steps.
- Start the wsadmin client from one of the following directories on the system on which you
installed the Deployment Manager:
Linux: app_server_root/profiles/dm_profile_root/bin
Windows:
app_server_root\profiles\dm_profile_root\bin
where
app_server_root is the WebSphere®
Application Server installation directory and
dm_profile_root is the Deployment
Manager profile directory, typically dmgr01.
You must start the client from this directory; otherwise,
subsequent commands that you enter do not execute correctly.
- After the wsadmin command environment
has initialized, enter the following command to initialize the Search
environment and start the Search script interpreter:
execfile("searchAdmin.py")
If prompted to specify a service to connect to, type 1 to pick
the first node in the list. Most commands can run on any node. If
the command writes or reads information to or from a file using a
local file path, you must pick the node where the file is stored.
When
the command runs successfully, the following message is displayed:
Search Administration initialized
- Enter the following command:
- SearchService.startBackgroundCrawl(String persistenceLocation,
String components)
Crawls the seedlists for the specified applications and then
saves the seedlists to the specified location. This command does not
build an index.
The command takes the following parameters:
- persistenceLocation
- A string that specifies the path to which the seedlists are to
be saved.
- components
- A string that specifies the applications whose seedlists are to be crawled. The following values
are valid:
- activities
- all_configured
- blogs
- calendar
- communities
- dogear
- ecm_files
- files
- forums
- people_finder
- profiles
- status_updates
- wikis
Use all_configured instead of listing all indexable services when you want to crawl all the
applications.
For example:
SearchService.startBackgroundCrawl("/opt/IBM/Connections/backgroundCrawl",
"activities, forums, communities, wikis")
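Because the purpose of a background crawl is often to verify that crawling completes successfully, you might want to confirm afterward that seedlists were actually written to the persistence location. The following is a minimal Python sketch of such a check; the directory layout and file names used here are assumptions for illustration, not the actual structure that Search writes, which depends on your deployment.

```python
import os

def summarize_seedlists(persistence_location):
    """Count persisted files under each top-level component directory."""
    summary = {}
    for root, _dirs, files in os.walk(persistence_location):
        rel = os.path.relpath(root, persistence_location)
        if rel == ".":
            continue  # skip the persistence root itself
        component = rel.split(os.sep)[0]
        summary[component] = summary.get(component, 0) + len(files)
    return summary

# Demonstration with a mock layout; in a real check you would point this
# at the persistenceLocation you passed to startBackgroundCrawl.
base = "/tmp/backgroundCrawl_demo"
for comp in ("activities", "forums"):
    os.makedirs(os.path.join(base, comp), exist_ok=True)
    open(os.path.join(base, comp, "seedlist_0.xml"), "w").close()

print(summarize_seedlists(base))  # e.g. {'activities': 1, 'forums': 1}
```

An empty or missing component entry in the summary would suggest that the crawl for that application did not persist any seedlists and warrants a closer look at the Search logs.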
What to do next
After completing a background crawl, perform one of the following
options:
- Extract file content. For more information, see Extracting
file content.
- Create a background index. For more information, see Creating
a background index.
- Create a foreground index. For more information, see Recreating
the Search index.
If you want to create a foreground index,
copy the persisted seedlists from the persistence location that you
specified when you used the startBackgroundCrawl command to the CRAWLER_PAGE_PERSISTENCE_DIR
directory on the node that is doing the indexing.
In a multi-node
system, you might want to copy the seedlists to the CRAWLER_PAGE_PERSISTENCE_DIR
directory on all nodes. Alternatively, you can set the CRAWLER_PAGE_PERSISTENCE_DIR
variable to a network location and copy the persisted seedlists from
the persistence location you specified to that location.
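The copy step described above can be sketched as follows. This is a minimal Python illustration using mock paths; substitute the persistence location you specified and the actual value of the CRAWLER_PAGE_PERSISTENCE_DIR variable on each node, and note that in a multi-node system you would repeat the copy (for example, over scp or rsync) for each node.

```python
import os
import shutil

# Example paths only; replace with your persistenceLocation and the
# node's CRAWLER_PAGE_PERSISTENCE_DIR value.
persisted = "/tmp/demo_persisted"
page_dir = "/tmp/demo_crawler_page_persistence"

# Set up a mock persisted-seedlist tree for demonstration.
os.makedirs(os.path.join(persisted, "forums"), exist_ok=True)
with open(os.path.join(persisted, "forums", "seedlist_0.xml"), "w") as f:
    f.write("<seedlist/>")

# Copy the whole persisted tree into the crawler page persistence
# directory, preserving the directory structure.
shutil.copytree(persisted, page_dir, dirs_exist_ok=True)

print(os.listdir(os.path.join(page_dir, "forums")))  # ['seedlist_0.xml']
```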