Report


Code Description


Code Reference

      The original version, documented version and ported versions of the code used can be found through the links below.

      The file and data handling code used in this project is from the Weka project (www.cs.waikato.ac.nz/ml/weka/).

      Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka is also well suited to developing new machine learning schemes. Weka is open source software released under the GNU General Public License.

      The Weka code used in this project handles file input and various data manipulation tasks. Weka was also used to collect statistics on the performance of the implemented Naive Bayes classifier. The Naive Bayes classifier itself was adapted from the Weka package, as noted in the code comments, so that it matches the basic algorithm blocks implemented by the code supplied with the book.
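      To make the idea of "basic algorithm blocks" concrete, here is a minimal sketch of a Naive Bayes classifier for nominal attributes, split into the usual blocks: counting during training, estimating probabilities, choosing the most probable class, and computing a simple accuracy statistic. This is not the Weka code or the book's C code. The class name SimpleNaiveBayes, its methods, the encoding of attribute values and class labels as small integers, and the add-one (Laplace) smoothing are all assumptions made for illustration; the estimators used by Weka and by the book may differ.

```java
/**
 * Hypothetical sketch of a Naive Bayes classifier for nominal attributes.
 * Not the Weka or book implementation. Attribute values and class labels
 * are assumed to be encoded as small non-negative integers.
 */
public class SimpleNaiveBayes {
    private final int numClasses;
    private final int[] numValues;       // number of distinct values per attribute
    private final int[] classCounts;     // training examples seen per class
    private final int[][][] valueCounts; // counts indexed [attribute][class][value]
    private int numExamples;

    public SimpleNaiveBayes(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        this.classCounts = new int[numClasses];
        this.valueCounts = new int[numValues.length][numClasses][];
        for (int a = 0; a < numValues.length; a++) {
            for (int c = 0; c < numClasses; c++) {
                valueCounts[a][c] = new int[numValues[a]];
            }
        }
    }

    /** Block 1: accumulate counts from one labelled training example. */
    public void train(int[] values, int label) {
        classCounts[label]++;
        numExamples++;
        for (int a = 0; a < values.length; a++) {
            valueCounts[a][label][values[a]]++;
        }
    }

    /** Block 2: log P(c) + sum over attributes of log P(value | c), add-one smoothed (assumption). */
    public double logScore(int[] values, int c) {
        double score = Math.log((classCounts[c] + 1.0) / (numExamples + numClasses));
        for (int a = 0; a < values.length; a++) {
            double p = (valueCounts[a][c][values[a]] + 1.0) / (classCounts[c] + numValues[a]);
            score += Math.log(p);
        }
        return score;
    }

    /** Block 3: return the class with the highest log score (first one wins ties). */
    public int classify(int[] values) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            double s = logScore(values, c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    /** Simple performance statistic: fraction of examples classified correctly. */
    public double accuracy(int[][] data, int[] labels) {
        int correct = 0;
        for (int i = 0; i < data.length; i++) {
            if (classify(data[i]) == labels[i]) {
                correct++;
            }
        }
        return data.length == 0 ? 0.0 : (double) correct / data.length;
    }

    public static void main(String[] args) {
        // Toy data (hypothetical): attribute 0 = outlook (0 sunny, 1 rainy),
        // attribute 1 = windy (0 no, 1 yes); class 0 = stay in, 1 = play.
        int[][] data = { {0, 0}, {0, 1}, {1, 1}, {1, 0}, {0, 0} };
        int[] labels = { 1, 1, 0, 0, 1 };
        SimpleNaiveBayes nb = new SimpleNaiveBayes(2, new int[] {2, 2});
        for (int i = 0; i < data.length; i++) {
            nb.train(data[i], labels[i]);
        }
        System.out.println(nb.classify(new int[] {0, 1})); // prints 1
        System.out.println(nb.accuracy(data, labels));     // prints 1.0
    }
}
```

      Working in log space keeps the product of many small probabilities from underflowing, and keeping training, scoring, and classification in separate methods mirrors the block structure that the adaptation aimed to match.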


Code Adaptation

      Code from Weka was adapted to match the basic blocks used in the code from the book. There were two reasons for this. First, the supplied C code accepted input only in the form of English text, which severely limited its use for general learning problems. Second, I wanted to reuse the data handling from the previous projects.

There were several options:

  1. Rip apart the C code provided and glue together a more general Bayes classifier written in C, and either:
    • port the file and data handling functionality from Java to C, or
    • call native C functions from Java.
  2. Rip apart the C code provided and port the pieces to Java.
  3. Adapt the Weka Simple Bayes Classifier to match the methods (where applicable) used in the provided C code.
The last option was the most attractive because it required a thorough understanding of two different implementations of a Bayes classifier. It was also the option most likely to produce usable code in the time available.



by: Keith A. Pray
Last Modified: July 4, 2004 8:59 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.