Workflow for scanning a website

Websites today use many different kinds of technologies and configurations, so the workflow for scanning each site must be adapted to that site.

Before you begin

To scan your site successfully, you must ensure that:
  • As much of your site as possible is scanned.
  • Duplicate or irrelevant content is not scanned.
  • The report results are useful and accurate.
Achieving a scan that meets these conditions requires an iterative approach and a sound understanding of the site's structure and technology.
  • It is best to limit the number of starting URLs on your first scan so that the scan progresses more quickly.
  • You can add additional starting URLs to scan pages that are outside those found in and below your typical starting URL.
  • It is a good idea to know the exact names of the URLs to be scanned. Each URL must be valid, or the job cannot scan it. When the job encounters an invalid URL while scanning, it moves on to the next URL in the Existing Starting URLs table. However, additional starting URLs can be used to scan directories that are not included under the first starting URL. (A sketch of checking candidate starting URLs before adding them follows this list.)
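The following is a minimal sketch, in Python, of one way to confirm that candidate starting URLs are valid and reachable before you add them to a job. The URL list and the use of the requests library are illustrative assumptions and are not part of the product.

  # Illustrative only: verify that each candidate starting URL responds
  # before it is added to the job's Existing Starting URLs table.
  # The URLs below are placeholders.
  import requests

  candidate_urls = [
      "https://www.example.com/",
      "https://www.example.com/store/",
      "https://www.example.com/no-such-page/",
  ]

  for url in candidate_urls:
      try:
          # HEAD keeps the check lightweight; some servers only answer GET.
          response = requests.head(url, allow_redirects=True, timeout=10)
          status = "OK" if response.status_code < 400 else "HTTP %d" % response.status_code
      except requests.RequestException as exc:
          status = "unreachable (%s)" % exc.__class__.__name__
      print(url, status)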

Procedure

  1. Make a list of URLs, ports and domains to be included in the scan and then add these to the jobs that will scan your site.
  2. Browse the site, looking for items that might interfere with your scan. Some examples include:
    • Login pages
    • Links to exclude, such as "add to shopping cart", "print this page", and column-heading sorting links (an exclusion-pattern sketch follows this procedure)
    • Session IDs and other parameters
    • Custom error pages
    • Forms that might require values
    • Flash or JavaScript™
  3. Create a folder for the job, its reports, and its dashboards, or place each of them in its own folder.
  4. Configure a scan and run it.
  5. Refine and expand the scan. Your test scan report results should indicate how the scan needs to be refined.
    1. Did the scan end prematurely? If so, a logout page might have caused the scan to stop.
    2. Check the reports for false positives.
    3. Determine whether you must remove identical pages or identical forms from your reports. Pages and forms can have parameters (query string or POST data) or cookies that make them appear different to the scan when they are in fact identical. You must normalize the URLs and forms so that parameters, such as session IDs, are removed from them; the scan then recognizes the pages and forms as identical (a normalization sketch follows this procedure). Normalization is performed at the Server and Domain level by editing a global domain, or at the individual scan job level. All URLs and forms scanned in a particular domain can have the same rules applied to them.
    4. If some of the application's URLs were not discovered, use Manual Explore to manually explore the site and add URLs to the scan.
    5. Consult the Website Architecture report to determine if there are more domains or directories that require scanning. Domains that are identified as external can be added to the What to Scan page so that they are included in the scan. If there are any domains listed on this page that have not been scanned, login pages might have prevented them from being reached.
    6. Reevaluate how the properties are configured based on the results. You might need to configure additional properties or you might need to change the settings of existing properties.
  6. Repeat steps 4 and 5 until you are certain that your entire website or application has been scanned and analyzed properly (no false positives, and so on).
  7. Run the job and its report packs again. If the report results are not as expected, see Best practices for performance. If the report results are as expected, turn on the security tests and rerun the job.
  8. Distribute the report results to your audience:
    1. Email the URL of the report packs or dashboards to Report Consumers. The users must be given permission to see the report packs and dashboards.
    2. Export the reports to Excel, PDF, XML, or CSV.
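The exclusions mentioned in step 2 are typically expressed as URL patterns. The following Python sketch shows how regular-expression rules can filter "add to shopping cart", "print this page", and column-sorting links out of a list of discovered URLs. The patterns and URLs are assumptions for illustration and do not reflect the product's configuration syntax.

  # Illustrative only: drop cart, print, and sort links before scanning.
  import re

  exclusion_patterns = [
      re.compile(r"/cart/add", re.IGNORECASE),
      re.compile(r"[?&]print=", re.IGNORECASE),
      re.compile(r"[?&]sort(_by)?=", re.IGNORECASE),
  ]

  discovered_urls = [
      "https://www.example.com/catalog/item42",
      "https://www.example.com/cart/add?item=42",
      "https://www.example.com/catalog/item42?print=true",
      "https://www.example.com/catalog?sort_by=price",
  ]

  urls_to_scan = [
      url for url in discovered_urls
      if not any(pattern.search(url) for pattern in exclusion_patterns)
  ]
  print(urls_to_scan)  # only the plain catalog item URL remains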
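To illustrate the normalization described in step 5.3, the following Python sketch strips session-ID parameters from query strings so that two URLs differing only in session state compare as equal. The parameter names and URLs are assumptions; in the product, the equivalent rules are configured at the Server and Domain level or at the individual job level.

  # Illustrative only: remove session parameters so that duplicate pages
  # normalize to the same URL and are recognized as identical.
  from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

  SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}  # assumed names

  def normalize(url):
      parts = urlsplit(url)
      kept = [(name, value)
              for name, value in parse_qsl(parts.query, keep_blank_values=True)
              if name.lower() not in SESSION_PARAMS]
      return urlunsplit((parts.scheme, parts.netloc, parts.path,
                         urlencode(kept), ""))

  a = "https://www.example.com/account?view=orders&sessionid=A1B2C3"
  b = "https://www.example.com/account?view=orders&sessionid=Z9Y8X7"
  print(normalize(a) == normalize(b))  # True: treated as the same page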