Applying filter rules

Portal Search lets you apply filter rules to the crawler process. The crawler filters control how the crawler progresses and which types of documents are indexed and cataloged.

You can define filter rules only when creating a content source of type Web site. You define the filters under the Filters tab. You can combine the following filtering options in any way to form a filter rule:
Table 1. Filter rule options
Filter option        Possible settings
Apply rule while:    Collecting documents, Indexing documents
Rule type:           Exclude, Include
Rule basis:          URL text, File type
Depending on the options that you choose, the filter rules select documents or pages as follows:
Table 2. Document selection behavior by filter rules
Option for applying the rule: Apply rule while Collecting documents
    Selected rule type option: Exclude. The page or document is excluded, and links on the page are not explored.
    Selected rule type option: Include. Only pages or documents that meet the criteria, and that are linked from a parent page that also meets the criteria, are included, starting with the initial site.
Option for applying the rule: Apply rule while Adding documents to index
    Selected rule type option: Exclude. The page or document is excluded, but links on the page are still explored.
    Selected rule type option: Include. The entire site is searched, and pages or documents that meet the filtering criteria are included.
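The four behaviors in Table 2 can be illustrated with a small simulation. This is only a sketch of the documented behavior, not the Portal Search implementation: the function crawl, the SITE link graph, and the example.com URLs are hypothetical, and the * wildcard is approximated with Python's fnmatchcase.

```python
from collections import deque
from fnmatch import fnmatchcase

def crawl(start, links, pattern, apply_while, rule_type):
    """Return the sorted URLs that end up in the index for a toy link graph.

    apply_while: "collecting" or "indexing"; rule_type: "exclude" or "include".
    Hypothetical model of Table 2, not the actual crawler.
    """
    def allowed(url):
        hit = fnmatchcase(url, pattern)  # assumed wildcard semantics
        return hit if rule_type == "include" else not hit

    indexed, seen, queue = [], {start}, deque([start])
    while queue:
        url = queue.popleft()
        ok = allowed(url)
        if ok:
            indexed.append(url)
        # A collecting-time rule prunes the crawl at a rejected page;
        # an indexing-time rule still follows the links on that page.
        if apply_while == "collecting" and not ok:
            continue
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(indexed)

# Hypothetical site: the start page links to /docs/a and /tmp/b,
# and /tmp/b is the only page that links to /docs/c.
SITE = {
    "http://example.com/": ["http://example.com/docs/a", "http://example.com/tmp/b"],
    "http://example.com/tmp/b": ["http://example.com/docs/c"],
}
START = "http://example.com/"

# Exclude while collecting: /tmp/b is dropped and /docs/c is never reached.
print(crawl(START, SITE, "*/tmp/*", "collecting", "exclude"))
# Exclude while indexing: /tmp/b is dropped, but /docs/c is still found.
print(crawl(START, SITE, "*/tmp/*", "indexing", "exclude"))
# Include while collecting: the start URL does not fit the rule, so nothing is collected.
print(crawl(START, SITE, "*/docs/*", "collecting", "include"))
# Include while indexing: the whole site is searched and both /docs/ pages are kept.
print(crawl(START, SITE, "*/docs/*", "indexing", "include"))
```

In this model the only difference between the two "Apply rule while" settings is whether a rejected page stops the crawler from following its links.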
Note: When you use the option Apply rule while Collecting documents with Rule type: Include, make sure that the URL in the field Collect documents linked from this URL: fits the specified rule; otherwise, no documents are collected. For instance, crawling the URL http://www.ibm.com/products with the URL filter */products/* does not give any results, because the rule has a trailing slash but the URL does not. Either of the following combinations works: crawling http://www.ibm.com/products/ with the URL filter */products/* (both with a trailing slash), or crawling http://www.ibm.com/products with the URL filter */products* (neither with a trailing slash).
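The trailing-slash cases in the note can be checked before you configure the content source. The following is a minimal sketch that assumes the * wildcard matches any run of characters and approximates it with Python's fnmatchcase; the helper name start_url_fits is hypothetical, and the actual matching in Portal Search may differ in details.

```python
from fnmatch import fnmatchcase

def start_url_fits(start_url, url_filter):
    """Return True if the start URL fits the Include filter (assumed semantics)."""
    return fnmatchcase(start_url, url_filter)

# Rule has a trailing slash, URL does not: no match, so nothing would be collected.
print(start_url_fits("http://www.ibm.com/products", "*/products/*"))   # False
# Both with a trailing slash: match.
print(start_url_fits("http://www.ibm.com/products/", "*/products/*"))  # True
# Neither with a trailing slash: match.
print(start_url_fits("http://www.ibm.com/products", "*/products*"))    # True
```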

For more details about filter rules and how to apply them, refer to the Manage Search portlet and its help.