Filtering similar pages based on structure (DOM)

Page Structure (DOM) Filtering can greatly reduce scan time by identifying pages that are similar enough to pages already scanned that they can safely be ignored. AppScan compares each new page with the pages already scanned for structural (DOM) similarity. A close match indicates that the new page contains no new links or content that requires more testing. For example, a commercial site might have a catalog with individual pages for a thousand different items that are identical in all other ways. There is usually no need to scan all of those pages.
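The idea of structural similarity can be illustrated with a minimal sketch. This is not AppScan's actual algorithm, which is not described here. It assumes a simplified rule in which two pages count as similar only when their sequences of HTML tags are identical, ignoring text and attribute values. The names `dom_fingerprint` and `filter_similar` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records the sequence of opening and closing tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")


def dom_fingerprint(html):
    """Return a hash of the page's tag sequence (an assumed, simplified similarity measure)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("".join(collector.tags).encode("utf-8")).hexdigest()


def filter_similar(pages):
    """Split (url, html) pairs into URLs to scan and URLs skipped as structurally similar."""
    seen = set()
    kept, skipped = [], []
    for url, html in pages:
        fingerprint = dom_fingerprint(html)
        if fingerprint in seen:
            skipped.append(url)  # same structure as a page already kept
        else:
            seen.add(fingerprint)
            kept.append(url)
    return kept, skipped
```

In this sketch, two catalog item pages that differ only in product name and price produce the same fingerprint, so only the first is kept. A real crawler would likely tolerate small structural differences rather than require an exact match.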

About this task

User role: Product Administrator, Security Analyst

Procedure

  1. For an existing scan in the Scans view, go to the scan's Explore Options page and make sure the Filter similar pages based on structure (DOM) check box is enabled.
  2. For a new scan in the Monitor view:
    1. Launch the AppScan Dynamic Analysis Client and go to the Explore Options screen. By default, both DOM filtering check boxes are selected. If you find that non-duplicate pages were filtered out of the scan, try clearing the second check box, or disable the feature by clearing the first check box.
    2. Continue configuring the scan, and then click Create Job.
    3. In AppScan Enterprise, click Done.
  3. After the scan runs, examine the Progress tab of the scan to see whether unique requests were mistakenly filtered out of the scan. Look for “CRWAD03010I: Skipping URL (Similar DOM was already scanned)” in the scan log. This message indicates that a page was filtered from the scan because its structure (DOM) is similar to that of a previously explored page, so it probably contains no new elements to test.
  4. You can also check the Filtered Links report on the Results page of the scan. Look for “Similar content limit exceeded” as the reason. If it appears, try the Filter less pages option of the Filter similar pages based on structure (DOM) check box in the AppScan Dynamic Analysis Client, which maintains a steady, lower level of filtering, or disable DOM filtering altogether. You can also try clearing the Filter pages that are likely to be similar based on structure (DOM) check box, in case unique requests were mistakenly filtered out of the scan.
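
If the scan log is available as plain text, the check in step 3 can be sketched as a simple search for the message ID. This is an illustration only. It assumes the log has been copied or exported to text with one entry per line, which this topic does not confirm, and the function name `find_dom_filtered_entries` is hypothetical.

```python
# Message ID quoted in step 3 of the procedure.
SKIP_MESSAGE_ID = "CRWAD03010I"


def find_dom_filtered_entries(log_lines):
    """Return the log entries that report a page skipped because of DOM similarity.

    log_lines is any iterable of strings, for example an open text file
    (assumed format: one log entry per line).
    """
    return [line.strip() for line in log_lines if SKIP_MESSAGE_ID in line]
```

Passing an open text file to the function returns the matching entries, which can then be reviewed by hand to decide whether any of the skipped pages were in fact unique.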