
Experiments


Experiment Results

All experiments here were trained on the full training data set and tested on the full test data set. For the details of each experiment, please see the test.txt file for that test.

Normalization :

All numeric attribute values were scaled to lie in the interval [0, 1].
The training and test data sets were normalized together so that the scaling would be consistent between them.
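A minimal sketch of this kind of joint normalization follows. It assumes plain min-max scaling per attribute, which the report does not state explicitly, and the function name and sample values are hypothetical rather than taken from the project code.

```python
def min_max_normalize(train, test):
    """Scale every numeric column to [0, 1], taking each column's
    minimum and maximum over the training and test rows together."""
    combined = train + test
    n_cols = len(combined[0])
    lows = [min(row[j] for row in combined) for j in range(n_cols)]
    highs = [max(row[j] for row in combined) for j in range(n_cols)]

    def scale(row):
        # A constant column has hi == lo; map it to 0.0 to avoid dividing by zero.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]

    return [scale(r) for r in train], [scale(r) for r in test]


# Hypothetical two-attribute data, for illustration only.
train = [[2.0, 10.0], [4.0, 30.0]]
test = [[6.0, 20.0]]
norm_train, norm_test = min_max_normalize(train, test)
```

Because the minimum and maximum are taken over both sets, a given raw value maps to the same normalized value whether it appears in the training data or the test data.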

Discretization :

The same method used for Project 2: Decision Trees was used here. The only difference is that continuous attributes could be split into more than two bins.
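How the split points were chosen is described in the Project 2 report and is not reproduced here. The sketch below only illustrates the multi-bin part: given a sorted list of cut points, however they were obtained, a continuous value maps to one of len(cut_points) + 1 bins. The function name and the cut points are hypothetical.

```python
from bisect import bisect_right


def to_bin(value, cut_points):
    """Return the index of the bin that value falls into.

    cut_points must be sorted in ascending order. Bin 0 holds values
    below the first cut point; a value equal to a cut point goes into
    the bin above it.
    """
    return bisect_right(cut_points, value)


# Hypothetical cut points giving four bins for one continuous attribute.
cut_points = [25.0, 40.0, 60.0]
bins = [to_bin(v, cut_points) for v in (18.0, 25.0, 47.5, 90.0)]
```

With a single cut point this reduces to the two-bin split of a binary decision tree test.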

Missing Values :

These were simply skipped when generating the probabilities for the classifier. This is equivalent to making each missing value contribute a factor of 1 to the probability, in effect changing nothing.
The same approach was used when classifying the test examples.
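The skipping described above can be sketched for discrete attributes as follows. This is an illustration under assumptions, not the project's code: missing values are marked with None, probabilities are plain frequency estimates with no smoothing, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # hypothetical marker for a missing attribute value


def train_nb(rows, labels):
    """Count classes and per-class attribute values, skipping missing values."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, label)][value] = number of training rows
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is MISSING:
                continue  # a missing value adds nothing to any count
            value_counts[(j, label)][value] += 1
    return class_counts, value_counts


def classify(row, class_counts, value_counts):
    """Return the label with the largest prior times product of likelihoods."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # class prior
        for j, value in enumerate(row):
            if value is MISSING:
                continue  # same as multiplying the score by 1
            seen = value_counts[(j, label)]
            known = sum(seen.values())
            # No smoothing: an unseen value gives a likelihood of 0.
            score *= seen[value] / known if known else 1.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data, for illustration only.
rows = [["sunny", "high"], ["sunny", None], ["rainy", "high"], ["rainy", "low"]]
labels = ["no", "no", "yes", "yes"]
class_counts, value_counts = train_nb(rows, labels)
prediction_a = classify(["sunny", None], class_counts, value_counts)
prediction_b = classify([None, "low"], class_counts, value_counts)
```

Note that the likelihood for an attribute is estimated only from the training rows where that attribute is present, so a row with one missing value still contributes to the counts of its other attributes.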

The best results from this set of experiments came from the tests using the discretized data sets, with an accuracy of 84.27%.

Normalization did not appear to affect the accuracy at all, achieving the same rate as the previous default test, 82.9924%. This seems to be because normalization is a linear rescaling: each value of a continuous attribute keeps the same position relative to the attribute's mean, measured in standard deviations, as in the original data. No big surprise there.

Out of curiosity, I repeated these experiments with the more complex Weka Naive Bayes Classifier. The results were the same for the discretized data sets and slightly worse for the normalized data sets, with an accuracy of 82.974%.

Summary of these results:

Test	Accuracy (%)
Normalized	82.9924
Discretized	84.27
Normalized - Discretized	84.27
Normalized (Complex Weka)	82.974
Discretized (Complex Weka)	84.27
Normalized - Discretized (Complex Weka)	84.27

If time allows, different techniques for discretizing should be explored. Alternative methods for handling missing values could also yield interesting results.

by: Keith A. Pray
Last Modified: July 4, 2004 8:59 AM