
Other Experiment Results

      All these experiments use all the training data and test data available. Attempts were made to produce trees whose performance is better than that seen in the control experiments. Pruning is always used unless specified otherwise. Detailed results, including the decision tree for each test, can be found in the corresponding test-*.txt file.




No pruning, all data used

      This test used the entire training data set to build the tree and the entire test data set to test it. No pruning was used. This was done so that a valid comparison could be made between this set of experiments and those done in the control set.

      Correctly Classified Instances       13724               84.2946 %
      Incorrectly Classified Instances      2557               15.7054 %
      Mean absolute error                      0.1841
      Root mean squared error                  0.3491
      Relative absolute error                 50.6803 %
      Root relative squared error             82.1735 %
      Total Number of Instances            16281
    

      While using all the training data to build the tree improved accuracy, the tree appears to be overfitted. With 6812 leaves in the tree, this is clearly the case.
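
      For reference, the error measures in these summaries appear to follow the usual conventions: errors are taken over predicted class probabilities, and the relative measures compare against a baseline that always predicts the class prior. Below is a minimal sketch of those definitions for a two-class problem; it illustrates the conventional formulas only and is not the evaluation code that produced these numbers.

      # Hedged sketch: conventional definitions of the reported error
      # measures for a two-class problem. Illustration only.
      import math

      def error_measures(y_true, p_pred):
          """y_true: 0/1 labels; p_pred: predicted probability of class 1."""
          n = len(y_true)
          prior = sum(y_true) / n            # baseline: always predict the prior
          abs_err  = sum(abs(y - p) for y, p in zip(y_true, p_pred))
          sq_err   = sum((y - p) ** 2 for y, p in zip(y_true, p_pred))
          abs_base = sum(abs(y - prior) for y in y_true)
          sq_base  = sum((y - prior) ** 2 for y in y_true)
          return {
              "Mean absolute error":           abs_err / n,
              "Root mean squared error":       math.sqrt(sq_err / n),
              "Relative absolute error %":     100 * abs_err / abs_base,
              "Root relative squared error %": 100 * math.sqrt(sq_err / sq_base),
          }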


Pruning, all data used

      Here is the first example of pruning in this experiment set. The pruning method is recursive: each sub-tree is considered if the current node is not a leaf. Three error estimates are computed: the error of the largest branch, the estimated error of the current sub-tree as if it were a single leaf, and the error of the entire sub-tree as it stands. Using these estimates, it is decided whether it is best to replace this sub-tree with a leaf. If not, the largest branch of the sub-tree is considered for promotion in its place, as sketched below.
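
      The sketch assumes a simple node type; Node, the fixed pessimistic correction, and the helper names are hypothetical stand-ins, not the code used in these experiments.

      from dataclasses import dataclass, field

      @dataclass
      class Node:
          label: str     # majority class among training instances here
          n: int         # number of training instances reaching this node
          errors: int    # instances the majority label misclassifies
          children: list = field(default_factory=list)

      def leaf_error(node):
          # Stand-in pessimistic estimate: observed errors plus a fixed
          # correction. (C4.5 proper uses an upper confidence bound.)
          return node.errors + 0.5

      def subtree_error(node):
          if not node.children:
              return leaf_error(node)
          return sum(subtree_error(c) for c in node.children)

      def largest_branch(node):
          return max(node.children, key=lambda c: c.n)

      def prune(node):
          """Bottom-up: at each internal node keep the cheapest of
          (sub-tree as-is, collapse to a leaf, promote the largest branch)."""
          if not node.children:
              return node
          node.children = [prune(c) for c in node.children]
          keep   = subtree_error(node)
          leaf   = leaf_error(node)
          branch = subtree_error(largest_branch(node))
          if leaf <= keep and leaf <= branch:
              return Node(node.label, node.n, node.errors)  # replace with a leaf
          if branch < keep:
              return prune(largest_branch(node))            # graft the largest branch
          return node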

      Correctly Classified Instances       13995               85.9591 %
      Incorrectly Classified Instances      2286               14.0409 %
      Mean absolute error                      0.1944
      Root mean squared error                  0.3209
      Relative absolute error                 53.5048 %
      Root relative squared error             75.5484 %
      Total Number of Instances            16281  
    

Here we see an improvement in accuracy over all the control experiments and over the first test in this series. The tree also has only 546 leaves, compared to 6812 above, which seems much more reasonable.


No subtree raising

      This generally results in a more efficient decision tree. Since subtree raising seems to go beyond the scope of this assignment, the algorithm and implementation details have been excluded. The experiment was performed out of curiosity, and the results are presented here out of frugality.

      Correctly Classified Instances       13972               85.8178 %
      Incorrectly Classified Instances      2309               14.1822 %
      Mean absolute error                      0.1972
      Root mean squared error                  0.3257
      Relative absolute error                 54.2801 %
      Root relative squared error             76.674  %
      Total Number of Instances            16281
    

      This method alone does not seem to bear any great benefit over the basic pruning method. It did result in a tree with only 1019 leaves, which is an improvement over the unpruned tree from the first test in this set without compromising accuracy. Unfortunately, time does not allow this to be investigated further in this assignment.


Reduced error pruning

      This method is very similar to the regular pruning method, the difference being that some of the training data is held aside while the tree is learned and is then used to estimate error during the pruning phase, as sketched below.
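
      The sketch reuses the hypothetical Node type from the pruning sketch above; the holdout fraction, the (instance, label) row format, and the classify helper are assumptions, not details taken from this report.

      import random

      def split_for_pruning(instances, fraction=0.25, seed=0):
          # Hold a fraction of the training data aside for pruning decisions.
          rng = random.Random(seed)
          rows = list(instances)
          rng.shuffle(rows)
          cut = int(len(rows) * fraction)
          return rows[cut:], rows[:cut]        # (grow set, prune set)

      def holdout_errors(root, prune_set, classify):
          return sum(classify(root, x) != label for x, label in prune_set)

      def reduced_error_prune(root, node, prune_set, classify):
          """Bottom-up: collapse a sub-tree to a leaf whenever doing so
          does not increase error on the held-out pruning set."""
          for child in node.children:
              reduced_error_prune(root, child, prune_set, classify)
          if not node.children:
              return
          before = holdout_errors(root, prune_set, classify)
          saved, node.children = node.children, []   # tentatively make a leaf
          if holdout_errors(root, prune_set, classify) > before:
              node.children = saved                  # worse: restore the sub-tree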

      Correctly Classified Instances       13920               85.4984 %
      Incorrectly Classified Instances      2361               14.5016 %
      Mean absolute error                      0.1928
      Root mean squared error                  0.33  
      Relative absolute error                 53.0843 %
      Root relative squared error             77.6828 %
      Total Number of Instances            16281
    

      This did not yield a more accurate tree than the regular pruning method. This may be due to the default value for the number of instances set aside. With a larger training set, this method could very well lead to better results than regular pruning.


Min. 15 instances per leaf

      This requires that, for a leaf to exist in the tree, a minimum of 15 instances in the training data must be classified by that leaf. This helps reduce the effect of noise. A sketch of where this constraint applies during tree growth appears below.
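
      The sketch shows one way the constraint can be enforced while growing the tree, again reusing the hypothetical Node type; best_split is an assumed helper, and real implementations phrase the condition slightly differently.

      from collections import Counter

      def majority_label(rows):
          # rows: (instance, label) pairs
          return Counter(label for _, label in rows).most_common(1)[0][0]

      def grow(rows, best_split, min_per_leaf=15):
          """best_split(rows) -> list of row subsets for the best test,
          or None if no test is worth making (an assumed helper)."""
          maj = majority_label(rows)
          errs = sum(label != maj for _, label in rows)
          parts = best_split(rows)
          # Reject any split that would leave a branch with fewer than
          # min_per_leaf training instances; this damps the effect of noise.
          if parts is None or any(len(p) < min_per_leaf for p in parts):
              return Node(maj, len(rows), errs)      # force a leaf
          node = Node(maj, len(rows), errs)
          node.children = [grow(p, best_split, min_per_leaf) for p in parts]
          return node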

      Correctly Classified Instances       14016               86.0881 %
      Incorrectly Classified Instances      2265               13.9119 %
      Mean absolute error                      0.1981
      Root mean squared error                  0.3186
      Relative absolute error                 54.5218 %
      Root relative squared error             75.006  %
      Total Number of Instances            16281
    

      Here we see the highest accuracy yet, and the number of leaves is greatly reduced, to 142. This further illustrates the earlier overfitting problem. Below are further results from varying the minimum number of instances required; unfortunately, none performed better than a minimum of 15.


Min. 20 instances per leaf

      Variations on a theme: since the minimum-instances constraint was the most beneficial method found in an initial round of testing, the number of instances required for a leaf was varied. The 20-instance test was the first tried; the 15-instance test above was the most successful of the attempts made. The results from the other trials are presented below for completeness.

      Correctly Classified Instances       14010               86.0512 %
      Incorrectly Classified Instances      2271               13.9488 %
      Mean absolute error                      0.1985
      Root mean squared error                  0.3183
      Relative absolute error                 54.6294 %
      Root relative squared error             74.9424 %
      Total Number of Instances            16281
    


Min. 25 instances per leaf

      Correctly Classified Instances       14009               86.0451 %
      Incorrectly Classified Instances      2272               13.9549 %
      Mean absolute error                      0.1985
      Root mean squared error                  0.3185
      Relative absolute error                 54.6544 %
      Root relative squared error             74.9772 %
      Total Number of Instances            16281
    


Min. 30 instances per leaf

      Correctly Classified Instances       13981               85.8731 %
      Incorrectly Classified Instances      2300               14.1269 %
      Mean absolute error                      0.1992
      Root mean squared error                  0.3188
      Relative absolute error                 54.8475 %
      Root relative squared error             75.0523 %
      Total Number of Instances            16281
    
