Project 2 Report
       
 
      Summary of Results
       The most accurate decision tree, as measured on the test data, was the one built with a minimum of 15 training instances per leaf. The tree and its results can be found at that link. (See the attached sheet if this is a hard copy.) The main weakness of this system seems to be its tendency to create large, complicated trees with many leaves. Because no good reference point for accuracy was available, it is difficult to judge the results as good or bad in absolute terms. On the whole, smaller trees handled the test data set better than larger trees.

       The Weka package can pre-process continuous data, binning value ranges either according to specified parameters or according to optimal values estimated at run time. The time frame of this project did not allow this to be explored. Binary splitting of non-discrete attribute value ranges also appears to be supported by Weka, but attempting it resulted in run-time errors. These may have been caused by memory restrictions imposed by the Java Virtual Machine in use, combined with the large training data set. Since the focus of this project was to explore and understand decision tree learning, not the inner workings of the JVM, this path was not pursued further.

       In closing, the results obtained seem reasonable, but there is much room for improvement. Exploring the various pruning methods, value-ranging schemes, confidence level settings, and other options available in decision tree learning algorithms should yield better results.
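
       To make the idea of binning continuous values concrete, here is a minimal sketch of equal-width binning in plain Java. It is not Weka's implementation and was not part of this project; the class name EqualWidthBinner, its methods, and the sample values are hypothetical, chosen only for illustration. It assumes the bin count is supplied by the user rather than estimated at run time.

```java
import java.util.Arrays;

// Hypothetical illustration of equal-width binning; not part of Weka.
public class EqualWidthBinner {

    /** Returns the bin index in [0, bins - 1] for a value within the range [min, max]. */
    public static int binIndex(double value, double min, double max, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        if (max <= min) {
            return 0; // degenerate range: every value falls in a single bin
        }
        double width = (max - min) / bins;
        int idx = (int) Math.floor((value - min) / width);
        // Clamp so that value == max (or an out-of-range value) still maps to a valid bin.
        return Math.max(0, Math.min(bins - 1, idx));
    }

    /** Replaces each continuous value with the index of its equal-width bin. */
    public static int[] discretize(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = binIndex(values[i], min, max, bins);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample values for a continuous attribute.
        double[] ages = {18, 22, 35, 41, 58, 60};
        System.out.println(Arrays.toString(discretize(ages, 3))); // prints [0, 0, 1, 1, 2, 2]
    }
}
```

       Replacing a continuous attribute with a small number of bin indices turns it into a discrete attribute, which limits the number of candidate split points the tree learner has to consider.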