Normalization :
-
All numeric attribute values were rescaled to lie in the interval [0, 1].
-
Both data sets, training and test, were normalized together
so that attribute values would be scaled consistently
across the two sets.
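
As a rough illustration of normalizing both sets together, here is a minimal sketch. It assumes min-max scaling, which the text does not name explicitly; the function name `min_max_normalize` and the sample rows are hypothetical.

```python
def min_max_normalize(train, test):
    """Rescale each numeric column to [0, 1].

    Assumption: min-max scaling, with the minimum and maximum of each
    column taken over the training and test rows combined.
    """
    combined = train + test
    n_cols = len(combined[0])
    lows = [min(row[j] for row in combined) for j in range(n_cols)]
    highs = [max(row[j] for row in combined) for j in range(n_cols)]

    def scale(row):
        # A constant column has no range; map it to 0.0 to avoid dividing by zero.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]

    return [scale(r) for r in train], [scale(r) for r in test]


# Hypothetical data: two numeric attributes per example.
train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 50.0]]
norm_train, norm_test = min_max_normalize(train, test)
```

Because the column ranges come from both sets, every test value is guaranteed to fall inside [0, 1] as well.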
Discretization :
-
The same method used for
Project 2: Decision Trees
was used here. The only difference is that continuous attributes
could be split into more than two bins.
Missing Values :
-
These were simply skipped when generating the probabilities
for the classifier. This is equivalent to making each
missing value contribute a factor of 1 to the probability,
which in effect changes nothing.
-
The same approach was used when classifying the test examples.
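
The skipping rule can be sketched as follows. This is only an illustration under stated assumptions: a count-based classifier over discrete attribute values, `None` as the missing-value marker, and no smoothing. The names `train_counts` and `score` and the sample rows are hypothetical, not taken from the project code.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing attribute value


def train_counts(rows, labels):
    """Collect the counts needed for the probabilities, skipping missing values."""
    class_counts = defaultdict(int)   # label -> number of examples
    value_counts = defaultdict(int)   # (attribute index, value, label) -> count
    seen_counts = defaultdict(int)    # (attribute index, label) -> non-missing count
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, v in enumerate(row):
            if v is MISSING:
                continue  # a missing value adds nothing to any count
            value_counts[(j, v, label)] += 1
            seen_counts[(j, label)] += 1
    return class_counts, value_counts, seen_counts


def score(row, label, class_counts, value_counts, seen_counts):
    """Unnormalized probability of `label` given `row` (no smoothing)."""
    p = class_counts[label] / sum(class_counts.values())
    for j, v in enumerate(row):
        if v is MISSING:
            continue  # same as multiplying by a factor of 1
        denom = seen_counts[(j, label)]
        p *= value_counts[(j, v, label)] / denom if denom else 0.0
    return p


# Hypothetical data: the second training example has a missing value.
rows = [["sunny", "hot"], ["sunny", None], ["rainy", "cool"]]
labels = ["no", "no", "yes"]
counts = train_counts(rows, labels)
p_no = score(["sunny", None], "no", *counts)
p_yes = score(["sunny", None], "yes", *counts)
```

In `score`, the missing second attribute leaves the running product untouched, so the result depends only on the class prior and the attributes that are actually present.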