Modifying the training set for phase 2

Phase 2 uses supervised machine learning techniques on all remaining findings to determine if a finding is actionable.

The machine learning employed during phase 2 of IFA uses a training set to build a classification model. The model then predicts one of four classifications for each finding:
    • high actionable
    • medium actionable
    • low actionable
    • not interesting

IFA phase 2 assigns each classification a prediction value between 0 and 1 that represents the probability that the classification is correct, based on this training set. For each finding, the system then compares the probabilities of the four classifications and selects the classification with the highest probability.
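
The selection step can be illustrated with a minimal Python sketch. It is not IFA's implementation: the function name, the dictionary layout, and the sample probabilities are assumptions made for illustration, and the tie-breaking behavior (first label wins) is a property of Python's max, not something this topic documents.

```python
# Illustrative sketch only; not IFA's actual code.
# The four labels come from the classification list in this topic.
LABELS = (
    "high actionable",
    "medium actionable",
    "low actionable",
    "not interesting",
)


def choose_classification(probabilities):
    """Return the label whose predicted probability is highest.

    `probabilities` is a hypothetical dict mapping each label to a
    value between 0 and 1.
    """
    for label, value in probabilities.items():
        if label not in LABELS:
            raise ValueError(f"unknown classification: {label!r}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"probability out of range for {label!r}: {value}")
    # max() returns the first label encountered when two values tie.
    return max(probabilities, key=probabilities.get)


# Made-up values for a single finding.
example = {
    "high actionable": 0.10,
    "medium actionable": 0.65,
    "low actionable": 0.20,
    "not interesting": 0.05,
}
print(choose_classification(example))  # prints "medium actionable"
```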

The training set “learns” when you manually classify the findings in new assessments. IFA reads the classification from the assessment files by checking the following fields in order of preference and using the first one that supplies a value:
  • Notes
    • 1=High
    • 2=Medium
    • 3=Low
    • 4=Not Interesting
  • Modified Severity
    • High=High
    • Medium=Medium
    • Low=Low
    • Info=Not Interesting
  • Severity
    • High=High
    • Medium=Medium
    • Low=Low
    • Info=Not Interesting
  • Excluded Findings=Not Interesting
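
The order of preference above can be sketched in Python as follows. This is an illustration under stated assumptions, not IFA's code: the dictionary keys (notes, modified_severity, severity, excluded) are hypothetical names, and the real assessment file format is not described in this topic. The sketch follows the list order literally, so the excluded flag is consulted last; how IFA treats an excluded finding that also carries a severity is not stated here.

```python
# Illustrative sketch only; field names are hypothetical.
NOT_INTERESTING = "Not Interesting"

# Mappings copied from the list in this topic.
NOTES_MAP = {"1": "High", "2": "Medium", "3": "Low", "4": NOT_INTERESTING}
SEVERITY_MAP = {
    "High": "High",
    "Medium": "Medium",
    "Low": "Low",
    "Info": NOT_INTERESTING,
}


def training_label(finding):
    """Return the training classification for one finding, or None.

    `finding` is a hypothetical dict. Fields are checked in the documented
    order: Notes, Modified Severity, Severity, then Excluded Findings.
    """
    notes = str(finding.get("notes", "")).strip()
    if notes in NOTES_MAP:
        return NOTES_MAP[notes]

    for field in ("modified_severity", "severity"):
        value = finding.get(field)
        if value in SEVERITY_MAP:
            return SEVERITY_MAP[value]

    if finding.get("excluded"):
        return NOT_INTERESTING

    # No field supplied a recognized value.
    return None
```

For example, a finding with a note of 2 and a severity of High would be labeled Medium under this sketch, because the note takes precedence over the severity.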

The easiest way to classify a finding is to adjust its severity to match the desired classification. Any findings that you consider not interesting can be excluded instead; excluded findings are added to the training set as not interesting.

The training assessments are all stored at:

<data_dir>\ml\spark\train
where <data_dir> is the location of your AppScan® Source program data, as described in Installation and user data file locations.

Any modification to the factory set or addition of a new assessment triggers IFA to rebuild the models for prediction. Restart the server to take advantage of the updated training files.