wpi.associations
Class AprioriSetsAndSequences

java.lang.Object
  extended by weka.associations.Associator
      extended by wpi.associations.Associator
          extended by wpi.associations.AprioriSetsAndSequences
All Implemented Interfaces:
java.lang.Cloneable, weka.core.OptionHandler, java.io.Serializable

public class AprioriSetsAndSequences
extends Associator
implements weka.core.OptionHandler

Class implementing an Apriori-type algorithm that can handle set-valued attributes. Sets must be loaded into WEKA as strings and are then parsed using ^ or & as item delimiters. Items are split into individual attribute-value pairs for the item-based ARMiner algorithm. The algorithm iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence.
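The delimiter rule described above can be illustrated with a minimal sketch. The class and method names here (SetItemParser, parseItems) are hypothetical and are not part of this API; trimming whitespace and dropping empty pieces are assumptions, since the page does not say how the class treats them.

```java
import java.util.ArrayList;
import java.util.List;

class SetItemParser {

    /** Splits a set-valued string on '^' or '&' and drops empty pieces (assumed behaviour). */
    public static List<String> parseItems(String setValue) {
        List<String> items = new ArrayList<>();
        // Inside a character class, '^' is literal when it is not the first character.
        for (String piece : setValue.split("[&^]")) {
            String item = piece.trim();
            if (!item.isEmpty()) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parseItems("bread^milk&eggs")); // prints [bread, milk, eggs]
    }
}
```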

This class implements the Apriori algorithm for finding large itemsets (see "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center, 1994). Valid options are:

-N required number of rules
The required number of rules (default: 10).

-C minimum confidence score of a rule
The minimum confidence of a rule (default: 0.9).

-D delta for minimum support
The delta by which the minimum support is decreased in each iteration (default: 0.05).

-U upper bound for minimum support
The upper bound for minimum support. Don't explicitly look for rules with more than this level of support.

-M lower bound for minimum support
The lower bound for the minimum support (default: 0.1).

-A
The range of required attributes for the antecedents.

-B
The maximum number of antecedent items.

-W
Maximum number of event items of the same type allowed in a rule.

-Y
The range of required attributes for the consequents.

-Z
The minimum number of consequent items.

-F
The cache of frequent itemsets to use.

-L
Write statistics to a log file.
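How -N, -D, -U and -M interact may be easier to see in code. The following is a minimal sketch of a support-lowering loop, not the class's actual implementation: SupportSchedule, Result, search and the fake miner are hypothetical names, the starting point (the upper bound itself) is an assumption, and the upper bound of 1.0 is an illustrative value because this page lists no default for -U.

```java
import java.util.function.DoubleToIntFunction;

class SupportSchedule {

    /** Outcome of the search: last threshold tried, cycles used, rules found there. */
    static final class Result {
        final double minSupport;
        final int cycles;
        final int rulesFound;

        Result(double minSupport, int cycles, int rulesFound) {
            this.minSupport = minSupport;
            this.cycles = cycles;
            this.rulesFound = rulesFound;
        }
    }

    /**
     * Lowers the support threshold from upper towards lower in steps of delta
     * until countRulesAt reports at least numRules rules or the lower bound is passed.
     */
    public static Result search(int numRules, double upper, double lower, double delta,
                                DoubleToIntFunction countRulesAt) {
        if (delta <= 0) {
            throw new IllegalArgumentException("delta must be positive");
        }
        final double eps = 1e-9; // tolerance for floating-point comparisons
        double minSupport = upper;
        int cycles = 0;
        while (true) {
            cycles++;
            int found = countRulesAt.applyAsInt(minSupport);
            // Recompute from the upper bound to avoid accumulating rounding error.
            double next = upper - cycles * delta;
            if (found >= numRules || next < lower - eps) {
                return new Result(minSupport, cycles, found);
            }
            minSupport = next;
        }
    }

    public static void main(String[] args) {
        // Hypothetical miner: pretends 10 rules appear once support drops to 0.8.
        DoubleToIntFunction fakeMiner = s -> s <= 0.8 + 1e-9 ? 10 : 0;
        Result r = search(10, 1.0, 0.1, 0.05, fakeMiner);
        System.out.println(r.cycles + " cycles, rules found: " + r.rulesFound);
    }
}
```

In this sketch the cycle counter plays the role that the m_cycles field appears to play in the class.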

Version:
1.11
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Zack Stoecker-Sylvia (zss@wpi.edu), Keith A. Pray (kap@wpi.edu)
See Also:
Serialized Form

Field Summary
protected  java.lang.String cacheFileName
          File name to load cached frequent itemsets from
private static int debug
          Specifies the debug info level: 0 = no debug info; 1 = input to methods; 2 = also output from methods; 3 = all other debug output as well.
protected  boolean logStats
          Specifies whether statistics should be written to a log file.
protected  java.util.Hashtable m_attributeHash
          A hash table mapping Integer keys to attribute-value pair Strings
protected  int m_cycles
          Number of cycles used before the required number of rules was found.
protected  double m_delta
          Delta by which m_minSupport is decreased in each iteration.
protected  weka.core.Instances m_instances
          The instances (transactions) to be used for generating the association rules.
protected  double m_lowerBoundMinSupport
          The lower bound for the minimum support.
protected  int m_maxAntecedent
          The maximum number of antecedents in a rule
protected  int m_maxConsequent
          The maximum number of consequents in a rule
protected  int m_maxRulesOutput
          The maximum number of rules that are output.
protected  int m_minAntecedent
          The minimum number of antecedents in a rule
protected  double m_minConfidence
          The minimum confidence score.
protected  int m_minConsequent
          The minimum number of consequents in a rule
protected  double m_minSupport
          The minimum support.
protected  weka.core.Range m_requiredAntecedentAttributes
          The attributes required to be in the antecedents of the rules
protected  weka.core.Range m_requiredConsequentAttributes
          The attributes required to be in the consequents of the rules
protected  java.util.Vector m_rulesFound
          The vector containing all found AssociationRules
protected  double m_upperBoundMinSupport
          The upper bound on the support
protected  int maxEvents
          The maximum number of events of the same kind allowed in a rule.
 
Fields inherited from class wpi.associations.Associator
logger, logStatsFileName
 
Constructor Summary
AprioriSetsAndSequences()
          Constructor that sets default values for the minimum confidence and the maximum number of rules.
 
Method Summary
 java.lang.String cacheFileNameTipText()
           
 java.lang.String deltaTipText()
           
private  java.util.Vector eventConfidence(java.util.Vector rules, DBReader instanceReader, float minConfidence)
          Calculates the confidence of rules containing event items in both the antecedent and consequent.
 java.lang.String getCacheFileName()
           
 double getDelta()
           
 boolean getLogStats()
           
 java.lang.String getLogStatsFileName()
           
 double getLowerBoundMinSupport()
           
 int getMaxAntecedent()
           
 int getMaxConsequent()
           
 int getMaxEvents()
           
 int getMinAntecedent()
           
 double getMinConfidence()
           
 int getMinConsequent()
           
 int getNumRules()
           
 java.lang.String[] getOptions()
          Gets the current settings of the Apriori object.
 java.lang.String getRequiredAntecedents()
           
 java.lang.String getRequiredConsequents()
           
 double getUpperBoundMinSupport()
           
 java.lang.String globalInfo()
          Returns a string describing this associator
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
 java.lang.String logStatsFileNameTipText()
           
 java.lang.String logStatsTipText()
           
 java.lang.String lowerBoundMinSupportTipText()
           
static void main(java.lang.String[] options)
          Main method for testing this class.
 java.lang.String maxAntecedentTipText()
           
 java.lang.String maxConsequentTipText()
           
 java.lang.String maxEventsTipText()
           
 java.lang.String minAntecedentTipText()
           
 java.lang.String minConfidenceTipText()
           
 java.lang.String minConsequentTipText()
           
 void mineAssociations(weka.core.Instances instances)
          Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.
 java.lang.String numRulesTipText()
           
 java.lang.String requiredAntecedentsTipText()
           
 java.lang.String requiredConsequentsTipText()
           
 void resetOptions()
          Resets the options to the default values.
 void setCacheFileName(java.lang.String v)
           
 void setDelta(double v)
           
 void setLogStats(boolean l)
           
 void setLogStatsFileName(java.lang.String v)
           
 void setLowerBoundMinSupport(double v)
           
 void setMaxAntecedent(int i)
           
 void setMaxConsequent(int i)
           
 void setMaxEvents(int i)
           
 void setMinAntecedent(int i)
           
 void setMinConfidence(double v)
           
 void setMinConsequent(int i)
           
 void setNumRules(int v)
           
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRequiredAntecedents(java.lang.String s)
           
 void setRequiredConsequents(java.lang.String s)
           
 void setUpperBoundMinSupport(double v)
           
 java.lang.String toString()
          Outputs the association rules and their confidences and supports.
 java.lang.String upperBoundMinSupportTipText()
           
 
Methods inherited from class wpi.associations.Associator
buildAssociations
 
Methods inherited from class weka.associations.Associator
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_minSupport

protected double m_minSupport
The minimum support.


m_upperBoundMinSupport

protected double m_upperBoundMinSupport
The upper bound on the support


m_lowerBoundMinSupport

protected double m_lowerBoundMinSupport
The lower bound for the minimum support.


m_minConfidence

protected double m_minConfidence
The minimum confidence score.


m_maxRulesOutput

protected int m_maxRulesOutput
The maximum number of rules that are output.


m_delta

protected double m_delta
Delta by which m_minSupport is decreased in each iteration.


m_cycles

protected int m_cycles
Number of cycles used before the required number of rules was found.


m_instances

protected weka.core.Instances m_instances
The instances (transactions) to be used for generating the association rules.


m_minAntecedent

protected int m_minAntecedent
The minimum number of antecedents in a rule


m_maxAntecedent

protected int m_maxAntecedent
The maximum number of antecedents in a rule


m_minConsequent

protected int m_minConsequent
The minimum number of consequents in a rule


m_maxConsequent

protected int m_maxConsequent
The maximum number of consequents in a rule


maxEvents

protected int maxEvents
The maximum number of events of the same kind allowed in a rule.


m_rulesFound

protected java.util.Vector m_rulesFound
The vector containing all found AssociationRules


m_requiredAntecedentAttributes

protected weka.core.Range m_requiredAntecedentAttributes
The attributes required to be in the antecedents of the rules


m_requiredConsequentAttributes

protected weka.core.Range m_requiredConsequentAttributes
The attributes required to be in the consequents of the rules


m_attributeHash

protected java.util.Hashtable m_attributeHash
A hash table mapping Integer keys to attribute-value pair Strings


cacheFileName

protected java.lang.String cacheFileName
File name to load cached frequent itemsets from


logStats

protected boolean logStats
Specifies whether statistics should be written to a log file.


debug

private static final int debug
Specifies the debug info level: 0 = no debug info; 1 = input to methods; 2 = also output from methods; 3 = all other debug output as well.

See Also:
Constant Field Values
Constructor Detail

AprioriSetsAndSequences

public AprioriSetsAndSequences()
Constructor that sets default values for the minimum confidence and the maximum number of rules.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this associator

Returns:
a description of the associator suitable for displaying in the explorer/experimenter GUI

resetOptions

public void resetOptions()
Resets the options to the default values.


mineAssociations

public void mineAssociations(weka.core.Instances instances)
                      throws java.lang.Exception
Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.

Specified by:
mineAssociations in class Associator
Parameters:
instances - the instances to be used for generating the associations
Throws:
java.lang.Exception - if rules can't be built successfully
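The two measures this method relies on, support and confidence, can be shown on a toy data set. This is a minimal sketch using the standard definitions and plain Java collections; it does not use weka.core.Instances, and the names SupportConfidenceDemo, support, confidence and setOf are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SupportConfidenceDemo {

    /** Fraction of transactions that contain every item of the itemset. */
    public static double support(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    /** confidence(A => C) = support(A union C) / support(A); 0 if A never occurs. */
    public static double confidence(List<Set<String>> transactions,
                                    Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        double antecedentSupport = support(transactions, antecedent);
        return antecedentSupport == 0.0 ? 0.0 : support(transactions, both) / antecedentSupport;
    }

    /** Small helper to build a set from literals. */
    public static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        List<Set<String>> tx = Arrays.asList(
                setOf("bread", "milk"),
                setOf("bread", "milk", "eggs"),
                setOf("bread"),
                setOf("milk"));
        // {bread, milk} occurs in 2 of 4 transactions.
        System.out.println(support(tx, setOf("bread", "milk"))); // prints 0.5
        // bread occurs in 3 transactions, 2 of which also contain milk.
        System.out.println(confidence(tx, setOf("bread"), setOf("milk")));
    }
}
```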

eventConfidence

private java.util.Vector eventConfidence(java.util.Vector rules,
                                         DBReader instanceReader,
                                         float minConfidence)
Calculates the confidence of rules containing event items in both the antecedent and consequent.

Parameters:
rules - association rules already found
instanceReader - data set from which to calculate confidence
minConfidence - minimum confidence, rules which do not meet this are not included in the list of rules returned
Returns:
the new list of rules
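The return contract described above (rules below the threshold are left out of the returned list) can be sketched as follows. How the class actually computes confidence for event items is not documented on this page, so the sketch assumes the confidence is already known; ScoredRule, ConfidenceFilter and keepConfident are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConfidenceFilter {

    /** Hypothetical stand-in for an association rule with a precomputed confidence. */
    static final class ScoredRule {
        final String text;
        final double confidence;

        ScoredRule(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Returns a new list holding only the rules whose confidence is at least minConfidence. */
    public static List<ScoredRule> keepConfident(List<ScoredRule> rules, float minConfidence) {
        List<ScoredRule> kept = new ArrayList<>();
        for (ScoredRule rule : rules) {
            if (rule.confidence >= minConfidence) {
                kept.add(rule);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredRule> rules = Arrays.asList(
                new ScoredRule("A => B", 0.95),
                new ScoredRule("A => C", 0.50));
        for (ScoredRule rule : keepConfident(rules, 0.9f)) {
            System.out.println(rule.text); // prints A => B
        }
    }
}
```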

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface weka.core.OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-N required number of rules
The required number of rules (default: 10).

-C minimum confidence score of a rule
The minimum confidence of a rule (default: 0.9).

-D delta for minimum support
The delta by which the minimum support is decreased in each iteration (default: 0.05).

-U upper bound for minimum support
The upper bound for minimum support. Don't explicitly look for rules with more than this level of support.

-M lower bound for minimum support
The lower bound for the minimum support (default: 0.1).

-V
If set then progress is reported iteratively during execution.

-A
The range of required attributes for the antecedents.

-B
The maximum number of antecedent items.

-W
Maximum number of event items of the same type allowed in a rule.

-Y
The range of required attributes for the consequents.

-Z
The minimum number of consequent items.

-F
The cache of frequent itemsets to use.

-L
Write statistics to a log file.

Specified by:
setOptions in interface weka.core.OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
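A minimal sketch of assembling an option array for this method. It uses only flags from the list above and assumes each flag's value follows it as a separate array element, which is the usual WEKA convention; OptionArrayExample and buildOptions are hypothetical names. The call to setOptions is left as a comment because it needs this class and WEKA on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class OptionArrayExample {

    /** Builds an option array from illustrative values for -N, -C, -D and -M. */
    public static String[] buildOptions(int numRules, double minConfidence,
                                        double delta, double lowerBound) {
        List<String> opts = new ArrayList<>();
        opts.add("-N");
        opts.add(Integer.toString(numRules));
        opts.add("-C");
        opts.add(Double.toString(minConfidence));
        opts.add("-D");
        opts.add(Double.toString(delta));
        opts.add("-M");
        opts.add(Double.toString(lowerBound));
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(10, 0.9, 0.05, 0.1);
        System.out.println(String.join(" ", options)); // prints -N 10 -C 0.9 -D 0.05 -M 0.1

        // With the class available, the array would be handed over like this:
        // AprioriSetsAndSequences miner = new AprioriSetsAndSequences();
        // miner.setOptions(options);
    }
}
```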

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Apriori object.

Specified by:
getOptions in interface weka.core.OptionHandler
Returns:
an array of strings suitable for passing to setOptions

toString

public java.lang.String toString()
Outputs the association rules and their confidences and supports.


requiredAntecedentsTipText

public java.lang.String requiredAntecedentsTipText()

setRequiredAntecedents

public void setRequiredAntecedents(java.lang.String s)

getRequiredAntecedents

public java.lang.String getRequiredAntecedents()

requiredConsequentsTipText

public java.lang.String requiredConsequentsTipText()

setRequiredConsequents

public void setRequiredConsequents(java.lang.String s)

getRequiredConsequents

public java.lang.String getRequiredConsequents()

minAntecedentTipText

public java.lang.String minAntecedentTipText()

setMinAntecedent

public void setMinAntecedent(int i)

getMinAntecedent

public int getMinAntecedent()

maxAntecedentTipText

public java.lang.String maxAntecedentTipText()

setMaxAntecedent

public void setMaxAntecedent(int i)

getMaxAntecedent

public int getMaxAntecedent()

minConsequentTipText

public java.lang.String minConsequentTipText()

setMinConsequent

public void setMinConsequent(int i)

getMinConsequent

public int getMinConsequent()

maxConsequentTipText

public java.lang.String maxConsequentTipText()

setMaxConsequent

public void setMaxConsequent(int i)

getMaxConsequent

public int getMaxConsequent()

maxEventsTipText

public java.lang.String maxEventsTipText()

setMaxEvents

public void setMaxEvents(int i)

getMaxEvents

public int getMaxEvents()

upperBoundMinSupportTipText

public java.lang.String upperBoundMinSupportTipText()

getUpperBoundMinSupport

public double getUpperBoundMinSupport()

setUpperBoundMinSupport

public void setUpperBoundMinSupport(double v)

lowerBoundMinSupportTipText

public java.lang.String lowerBoundMinSupportTipText()

getLowerBoundMinSupport

public double getLowerBoundMinSupport()

setLowerBoundMinSupport

public void setLowerBoundMinSupport(double v)

minConfidenceTipText

public java.lang.String minConfidenceTipText()

getMinConfidence

public double getMinConfidence()

setMinConfidence

public void setMinConfidence(double v)

numRulesTipText

public java.lang.String numRulesTipText()

getNumRules

public int getNumRules()

setNumRules

public void setNumRules(int v)

deltaTipText

public java.lang.String deltaTipText()

getDelta

public double getDelta()

setDelta

public void setDelta(double v)

cacheFileNameTipText

public java.lang.String cacheFileNameTipText()

getCacheFileName

public java.lang.String getCacheFileName()

setCacheFileName

public void setCacheFileName(java.lang.String v)

setLogStatsFileName

public void setLogStatsFileName(java.lang.String v)

logStatsFileNameTipText

public java.lang.String logStatsFileNameTipText()

getLogStatsFileName

public java.lang.String getLogStatsFileName()

logStatsTipText

public java.lang.String logStatsTipText()

getLogStats

public boolean getLogStats()

setLogStats

public void setLogStats(boolean l)

main

public static void main(java.lang.String[] options)
Main method for testing this class.