Keith A. Pray - Professional and Academic Site
Other Experiments
Other Experiment Results

All of these experiments use all of the available training and test data. The aim is to produce trees whose performance is better than that seen in the control experiments. Pruning is always used unless specified otherwise. Detailed results, including the decision tree for each test, can be found in the corresponding test-*.txt file.

The first test used the entire training data set to build the tree and the entire test data set to evaluate it. No pruning was used, so that a valid comparison could be made between this set of experiments and those done in the control set.
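For reference, the "Correctly Classified" percentages in the result listings below are simply correct / total. A quick check in Python, using the counts from the first listing:

```python
def accuracy_percent(correct, total):
    """Percentage of test instances classified correctly."""
    return 100.0 * correct / total

# 13724 of 16281 test instances were classified correctly in the first test:
print(f"{accuracy_percent(13724, 16281):.4f} %")  # 84.2946 %
```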
Correctly Classified Instances      13724    84.2946 %
Incorrectly Classified Instances     2557    15.7054 %
Mean absolute error                  0.1841
Root mean squared error              0.3491
Relative absolute error             50.6803 %
Root relative squared error         82.1735 %
Total Number of Instances           16281

While using all the training data to build the tree improved accuracy, the tree appears to be overfit. With 6812 leaves in this tree, this is clearly the case.

All data used

Here is the first example of pruning in this experiment set. The pruning method is recursive: each sub-tree is considered whenever the current node is not a leaf. Three error estimates are computed: the error of the largest branch, the estimated error of the current sub-tree if it were a leaf, and the error of the entire sub-tree. Using these estimates, it is decided whether it is best for this sub-tree to become a leaf. If not, the largest branch of the sub-tree is considered for pruning.
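The recursive pruning decision described above can be sketched as follows. This is a minimal sketch, not the assignment's implementation: the pessimistic error formula and the z value (roughly C4.5's default 25% confidence level) are assumptions, and only the leaf-versus-sub-tree comparison is shown, omitting the largest-branch check for brevity.

```python
import math

def pessimistic_error(errors, n, z=0.69):
    """Upper confidence bound on the error rate seen in the training data.
    The formula and z = 0.69 are assumptions modeled on C4.5's estimate."""
    if n == 0:
        return 0.5
    f = errors / n
    return ((f + z * z / (2 * n)
             + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n)))
            / (1 + z * z / n))

class Node:
    """Minimal tree node: 'errors' counts the training instances the node's
    majority class misclassifies, 'n' the instances reaching the node."""
    def __init__(self, errors, n, children=None):
        self.errors = errors
        self.n = n
        self.children = children or []

def prune(node):
    """Recursively prune: collapse a sub-tree into a leaf when its estimated
    error as a leaf is no worse than the combined estimate of its branches."""
    if not node.children:
        return node
    node.children = [prune(c) for c in node.children]
    as_leaf = pessimistic_error(node.errors, node.n) * node.n
    as_tree = sum(pessimistic_error(c.errors, c.n) * c.n
                  for c in node.children)
    if as_leaf <= as_tree:
        node.children = []  # the sub-tree becomes a leaf
    return node
```

Note how the confidence bound penalizes small samples: two branches with the same error rate as their parent still lose to the single leaf, which is exactly what removes overfit structure.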
Correctly Classified Instances      13995    85.9591 %
Incorrectly Classified Instances     2286    14.0409 %
Mean absolute error                  0.1944
Root mean squared error              0.3209
Relative absolute error             53.5048 %
Root relative squared error         75.5484 %
Total Number of Instances           16281

Here we see an improvement in accuracy over all of the control experiments and over the first test in this series. The tree also has only 546 leaves, compared to 6812 above, which seems much more reasonable.

No subtree raising

This generally results in a more efficient decision tree. Since subtree raising goes beyond the scope of this assignment, the algorithm and implementation details have been excluded; the experiment was performed out of curiosity and the results are presented here out of frugality.
Correctly Classified Instances      13972    85.8178 %
Incorrectly Classified Instances     2309    14.1822 %
Mean absolute error                  0.1972
Root mean squared error              0.3257
Relative absolute error             54.2801 %
Root relative squared error         76.674 %
Total Number of Instances           16281

This method alone does not seem to bear any great benefit over the basic pruning method. It did result in a tree with only 1019 leaves, an improvement over the unpruned tree from the first test of this set, without compromising accuracy. Unfortunately, time does not allow this to be investigated further in this assignment.

Reduced error pruning

This method is very similar to the regular pruning method. The difference is that some of the training data is held aside while the tree is learned and is then used to estimate error during the pruning phase.
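The held-aside data and the resulting pruning criterion can be sketched as below. The holdout fraction and seed are illustrative assumptions (Weka's J48 controls the split with its -N folds option), but the key idea is from the text: the pruning decision compares errors measured on data the tree never saw during growing.

```python
import random

def split_for_rep(instances, holdout_fraction=0.33, seed=1):
    """Hold part of the training data aside for reduced-error pruning.
    holdout_fraction and seed are illustrative, not the assignment's values."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    n_holdout = int(round(len(data) * holdout_fraction))
    return data[n_holdout:], data[:n_holdout]  # (grow set, prune set)

def should_collapse(leaf_errors, subtree_errors):
    """Reduced-error criterion: compare errors *measured* on the held-out
    set, rather than estimates derived from the growing data."""
    return leaf_errors <= subtree_errors
```

Because the error is measured rather than estimated, this method needs enough held-out data to be reliable, which matches the observation below that a larger training set could make it outperform regular pruning.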
Correctly Classified Instances      13920    85.4984 %
Incorrectly Classified Instances     2361    14.5016 %
Mean absolute error                  0.1928
Root mean squared error              0.33
Relative absolute error             53.0843 %
Root relative squared error         77.6828 %
Total Number of Instances           16281

This did not yield a more accurate tree than the regular pruning method. This may be due to the default value for the number of instances set aside. With a larger training set, this method could well lead to better results than regular pruning.

Min. 15 instances per leaf

This requires that, for a leaf to exist in the tree, at least 15 instances in the training data must be classified by that leaf. This helps reduce the effect of noise.
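The minimum-instances rule amounts to rejecting candidate splits that would create leaves that are too small. A minimal sketch of that check, under a simplified reading of the rule (Weka's J48 -M option applies a slightly looser test, requiring only that at least two branches reach the minimum):

```python
def split_allowed(branch_sizes, min_per_leaf=15):
    """Reject a candidate split if any branch would receive fewer than
    min_per_leaf training instances. Simplified versus Weka's -M option,
    which requires only two branches to reach the minimum."""
    return all(size >= min_per_leaf for size in branch_sizes)

# A split that sends only 8 instances down one branch is rejected, so a
# small, possibly noisy pocket of data cannot become its own leaf.
```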
Correctly Classified Instances      14016    86.0881 %
Incorrectly Classified Instances     2265    13.9119 %
Mean absolute error                  0.1981
Root mean squared error              0.3186
Relative absolute error             54.5218 %
Root relative squared error         75.006 %
Total Number of Instances           16281

Here we see the highest accuracy yet, and the number of leaves is a greatly reduced 142. This further illustrates the earlier overfitting problem. Below are further results from varying the number of instances required; unfortunately, none performed better than 15.

Min. 20 instances per leaf

Variations on a theme. Since the minimum-instances method was the most beneficial found in an initial round of testing, varying the number of instances required for a leaf was tried, starting with 20. The 15-instance test above is the most successful of the attempts made. The results from the other trials are presented below for completeness.
Correctly Classified Instances      14010    86.0512 %
Incorrectly Classified Instances     2271    13.9488 %
Mean absolute error                  0.1985
Root mean squared error              0.3183
Relative absolute error             54.6294 %
Root relative squared error         74.9424 %
Total Number of Instances           16281

Min. 25 instances per leaf
Correctly Classified Instances      14009    86.0451 %
Incorrectly Classified Instances     2272    13.9549 %
Mean absolute error                  0.1985
Root mean squared error              0.3185
Relative absolute error             54.6544 %
Root relative squared error         74.9772 %
Total Number of Instances           16281

Min. 30 instances per leaf
Correctly Classified Instances      13981    85.8731 %
Incorrectly Classified Instances     2300    14.1269 %
Mean absolute error                  0.1992
Root mean squared error             0.3188
Relative absolute error             54.8475 %
Root relative squared error         75.0523 %
Total Number of Instances           16281

by: Keith A. Pray
Last Modified: July 4, 2004 8:58 AM