wpi.associations.arminerSequence
Class AprioriRules

java.lang.Object
  extended by wpi.associations.arminerSequence.AprioriRules
All Implemented Interfaces:
AssociationsFinder

public class AprioriRules
extends java.lang.Object
implements AssociationsFinder

AprioriRules.java

This class implements the Apriori algorithm for finding association rules (see "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center, 1994).
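The two thresholds passed to findAssociations, min_Support and min_Confidence, refer to the standard Apriori measures. The support of an itemset is the fraction of transactions that contain all of its items, and the confidence of a rule X => Y is support(X ∪ Y) / support(X). The minimal sketch below computes both over an in-memory list of transactions encoded as integer item sets. AprioriMeasures and its methods are hypothetical illustrations, not part of this package, which reads its data through a DBCacheReader instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of wpi.associations) illustrating support and confidence. */
final class AprioriMeasures {
    private AprioriMeasures() {}

    /** Fraction of transactions that contain every item of the itemset. */
    static double support(List<Set<Integer>> transactions, Set<Integer> itemset) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long count = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) count / transactions.size();
    }

    /** Confidence of antecedent => consequent, i.e. support(A u C) / support(A). */
    static double confidence(List<Set<Integer>> transactions,
                             Set<Integer> antecedent, Set<Integer> consequent) {
        double antecedentSupport = support(transactions, antecedent);
        if (antecedentSupport == 0.0) {
            return 0.0; // convention chosen for this sketch: an antecedent that never occurs gives 0
        }
        Set<Integer> union = new HashSet<>(antecedent);
        union.addAll(consequent);
        return support(transactions, union) / antecedentSupport;
    }
}
```

With the transactions {1, 2, 3}, {1, 2}, {1, 3} and {2, 3}, support({1, 2}) is 2/4 = 0.5 and confidence({1} => {2}) is 0.5 / 0.75, about 0.67.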

Author:
Keith A. Pray (kap@wpi.edu)

Field Summary
private static int debug
          Specifies the debug info level: 0 = no debug info; 1 = method inputs; 2 = method inputs and outputs; 3 = everything at level 2 plus miscellaneous additional detail.
private  ARMinerItemset is_ignored
           
private  ARMinerItemset is_in_antecedent
           
private  ARMinerItemset is_in_consequent
           
private  int maxAntecedent
           
private  int maxConsequent
           
private  int minAntecedent
           
private  float minConfidence
           
private  int minConsequent
           
private  float minSupport
           
private  java.util.Hashtable numberHash
          Used to look up the actual value of an item using its integer representation.
private  int numInstances
          Number of instances in the data set.
private  java.util.Vector requiredAntecedents
          Required antecedents for generated rules.
private  java.util.Vector requiredConsequents
          Required consequents for generated rules.
private  java.util.Vector rules
           
private  SET supports
           
 
Constructor Summary
AprioriRules()
           
 
Method Summary
private  void ap_genrules_constraint(ARMinerItemset is_frequent, java.util.Vector consequents)
          This is the ap-genrules procedure that generates rules out of a frequent itemset.
private  void ap_genrules_constraint(ARMinerItemset is_frequent, java.util.Vector consequents, java.util.Vector antecedentAttributes, java.util.Vector consequentAttributes)
          This is the ap-genrules procedure that generates rules out of a frequent itemset.
private  void ap_genrules(ARMinerItemset is_frequent, java.util.Vector consequents)
          This is the ap-genrules procedure that generates rules out of a frequent itemset.
private  java.util.Vector apriori_gen(java.util.Vector itemsets)
          This is the apriori_gen procedure, which generates a new collection of (k+1)-itemsets from a collection of k-itemsets.
private  void buildRule(ARMinerItemset itemset, int[] antIndexes, int[] conIndexes)
          Builds a rule based on the specified itemset indexes and adds it to the rules to be returned.
 java.util.Vector findAssociations(DBCacheReader cacheReader, float min_Support, float min_Confidence)
          Find association rules in a database, given the set of frequent itemsets.
 java.util.Vector findAssociations(DBCacheReader cacheReader, float min_Support, float min_Confidence, ARMinerItemset inAntecedent, ARMinerItemset inConsequent, ARMinerItemset ignored, int max_Antecedent, int min_Consequent)
          Find association rules in a database, given the set of frequent itemsets and a set of restrictions.
 java.util.Vector findAssociations(DBCacheReader cacheReader, float min_Support, float min_Confidence, java.util.Vector antecedentAttributes, java.util.Vector consequentAttributes, int max_Antecedent, int min_Consequent)
          WEKA version. Find association rules in a database, given the set of frequent itemsets and a set of restrictions.
 java.util.Vector findAssociations(DBCacheReader cacheReader, float min_Support, float min_Confidence, java.util.Vector antecedentAttributes, java.util.Vector consequentAttributes, int min_Antecedent, int max_Antecedent, int min_Consequent, int max_Consequent)
          Another WEKA version. Find association rules in a database, given the set of frequent itemsets and a set of restrictions.
private  void generateFromMaximal(ARMinerItemset itemset)
          Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.
private  void generateFromMaximal(ARMinerItemset itemset, int[] antIndexes, int[] conIndexes, int conLevel)
          Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.
private  void generateFromMaximal(ARMinerItemset itemset, int ant, int con)
          Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.
private  void generateFromMaximal(ARMinerItemset itemset, int con, int[] antIndexes, int antLevel)
          Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.
 java.util.Hashtable getNumberHash()
          Returns the number hash used to look up the actual value of an item from its integer representation.
 int getNumInstances()
          Returns the number of instances contained in the data set from which the frequent itemsets were mined.
private  void initializeSupports(DBCacheReader cacheReader)
          Stores all frequent itemsets whose support is greater than the minimum support in a SET, for faster access.
private  int removeDuplicateRules()
          Removes duplicate rules from the rules list.
private  int removeDuplicateRules(int index)
          Removes duplicate rules from the rules list.
 void setNumberHash(java.util.Hashtable h)
          Sets the number hash used to look up the actual value of an item from its integer representation.
 void setNumInstances(int n)
          Sets the number of instances contained in the data set from which the frequent itemsets were mined.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

supports

private SET supports

rules

private java.util.Vector rules

minSupport

private float minSupport

minConfidence

private float minConfidence

is_in_antecedent

private ARMinerItemset is_in_antecedent

is_in_consequent

private ARMinerItemset is_in_consequent

is_ignored

private ARMinerItemset is_ignored

minAntecedent

private int minAntecedent

maxAntecedent

private int maxAntecedent

minConsequent

private int minConsequent

maxConsequent

private int maxConsequent

requiredAntecedents

private java.util.Vector requiredAntecedents
Required antecedents for generated rules.


requiredConsequents

private java.util.Vector requiredConsequents
Required consequents for generated rules.


numInstances

private int numInstances
Number of instances in the data set. Used for computing statistical measures.


numberHash

private java.util.Hashtable numberHash
Used to look up the actual value of an item using its integer representation.


debug

private static final int debug
Specifies the debug info level: 0 = no debug info; 1 = method inputs; 2 = method inputs and outputs; 3 = everything at level 2 plus miscellaneous additional detail.

See Also:
Constant Field Values

Constructor Detail

AprioriRules

public AprioriRules()

Method Detail

setNumberHash

public void setNumberHash(java.util.Hashtable h)
Sets the number hash used to look up the actual value of an item from its integer representation.

Parameters:
h - the number hash

getNumberHash

public java.util.Hashtable getNumberHash()
Returns the number hash used to look up the actual value of an item from its integer representation.

Returns:
the number hash

setNumInstances

public void setNumInstances(int n)
Sets the number of instances contained in the data set from which the frequent itemsets were mined.

Parameters:
n - the number of instances or rows

getNumInstances

public int getNumInstances()
Returns the number of instances contained in the data set from which the frequent itemsets were mined.

Returns:
the number of instances or rows

initializeSupports

private void initializeSupports(DBCacheReader cacheReader)
Stores all frequent itemsets whose support is greater than the minimum support in a SET, for faster access.
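Caching frequent itemsets lets rule generation look up a subset's support without rescanning the data. The SET type used by the class is not documented here, so the minimal sketch below substitutes a java.util.HashMap keyed by the sorted item list; SupportTable and its methods are hypothetical names. Sorting the items first makes a lookup independent of the order in which the items are given.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the SET of supports: frequent itemsets keyed by their sorted items. */
final class SupportTable {
    private final Map<List<Integer>, Float> supports = new HashMap<>();

    /** Canonical key: the items sorted ascending, so {3, 1} and {1, 3} map to the same entry. */
    private static List<Integer> key(int[] items) {
        int[] sorted = items.clone();
        Arrays.sort(sorted);
        return Arrays.stream(sorted).boxed().collect(Collectors.toList());
    }

    /** Stores the itemset only if its support exceeds minSupport; returns whether it was stored. */
    boolean putIfFrequent(int[] items, float support, float minSupport) {
        if (support <= minSupport) {
            return false;
        }
        supports.put(key(items), support);
        return true;
    }

    /** Returns the cached support, or -1 if the itemset was never stored as frequent. */
    float supportOf(int[] items) {
        return supports.getOrDefault(key(items), -1f);
    }
}
```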


findAssociations

public java.util.Vector findAssociations(DBCacheReader cacheReader,
                                         float min_Support,
                                         float min_Confidence)
Find association rules in a database, given the set of frequent itemsets.

Specified by:
findAssociations in interface AssociationsFinder
Parameters:
cacheReader - the object used to read from the cache
min_Support - the minimum support
min_Confidence - the minimum confidence
Returns:
a Vector containing all association rules found

ap_genrules

private void ap_genrules(ARMinerItemset is_frequent,
                         java.util.Vector consequents)
This is the ap-genrules procedure that generates rules out of a frequent itemset.
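In the Agrawal and Srikant formulation, ap-genrules takes a frequent itemset l and the m-item consequents that already produced confident rules, joins those consequents into (m+1)-item candidates, and keeps a candidate h only if support(l) / support(l - h) meets the minimum confidence. The minimal sketch below assumes sorted Integer itemsets and a Map holding the support of every non-empty subset of l; ApGenRules and rulesFor are hypothetical names, not the private signature documented here. It folds the initial 1-item consequents into the same loop and performs only the join step when growing consequents. Skipping the prune step does not change the output: moving more items into a failed consequent can only lower confidence further, so any candidate containing a failed consequent also fails its own check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of ap-genrules; not the signature used by AprioriRules. */
final class ApGenRules {
    private ApGenRules() {}

    /**
     * Generates "antecedent => consequent" rules from one frequent itemset.
     * Assumes itemset is sorted ascending and support holds an entry for every non-empty subset.
     */
    static List<String> rulesFor(List<Integer> itemset, Map<Set<Integer>, Double> support,
                                 double minConfidence) {
        double itemsetSupport = support.get(new HashSet<>(itemset));
        List<String> rules = new ArrayList<>();
        // Level 1: every single-item consequent (the paper handles these before calling ap-genrules).
        List<List<Integer>> consequents = new ArrayList<>();
        for (int item : itemset) {
            consequents.add(List.of(item));
        }
        // Stop once a consequent would leave the antecedent empty.
        while (!consequents.isEmpty() && consequents.get(0).size() < itemset.size()) {
            List<List<Integer>> confident = new ArrayList<>();
            for (List<Integer> h : consequents) {
                List<Integer> antecedent = new ArrayList<>(itemset);
                antecedent.removeAll(h); // stays sorted because itemset is sorted
                double confidence = itemsetSupport / support.get(new HashSet<>(antecedent));
                if (confidence >= minConfidence) {
                    rules.add(antecedent + " => " + h);
                    confident.add(h);
                }
            }
            consequents = join(confident);
        }
        return rules;
    }

    /** Join step of apriori-gen: merge consequents that agree on all but their last item. */
    private static List<List<Integer>> join(List<List<Integer>> hs) {
        List<List<Integer>> next = new ArrayList<>();
        for (int i = 0; i < hs.size(); i++) {
            for (int j = i + 1; j < hs.size(); j++) {
                List<Integer> a = hs.get(i);
                List<Integer> b = hs.get(j);
                int m = a.size();
                if (a.subList(0, m - 1).equals(b.subList(0, m - 1))) {
                    List<Integer> merged = new ArrayList<>(a);
                    merged.add(b.get(m - 1));
                    next.add(merged);
                }
            }
        }
        return next;
    }
}
```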


apriori_gen

private java.util.Vector apriori_gen(java.util.Vector itemsets)
This is the apriori_gen procedure, which generates a new collection of (k+1)-itemsets from a collection of k-itemsets.
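In the original paper, apriori-gen has two parts: a join, which merges two frequent k-itemsets that share their first k-1 items, and a prune, which discards any candidate with a k-item subset that is not frequent. The minimal sketch below assumes itemsets are sorted, duplicate-free List&lt;Integer&gt; values; AprioriGen is a hypothetical class, whereas the documented method takes and returns a java.util.Vector.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of apriori-gen over sorted, duplicate-free itemsets. */
final class AprioriGen {
    private AprioriGen() {}

    /** Generates candidate (k+1)-itemsets from the frequent k-itemsets. */
    static List<List<Integer>> aprioriGen(List<List<Integer>> frequentK) {
        Set<List<Integer>> frequent = new HashSet<>(frequentK);
        List<List<Integer>> candidates = new ArrayList<>();
        for (int i = 0; i < frequentK.size(); i++) {
            for (int j = i + 1; j < frequentK.size(); j++) {
                List<Integer> a = frequentK.get(i);
                List<Integer> b = frequentK.get(j);
                int k = a.size();
                // Join step: both itemsets must share their first k-1 items.
                if (!a.subList(0, k - 1).equals(b.subList(0, k - 1))) {
                    continue;
                }
                int lastA = a.get(k - 1);
                int lastB = b.get(k - 1);
                if (lastA == lastB) {
                    continue; // identical itemsets; nothing to join
                }
                List<Integer> candidate = new ArrayList<>(a.subList(0, k - 1));
                candidate.add(Math.min(lastA, lastB));
                candidate.add(Math.max(lastA, lastB));
                // Prune step: every k-item subset of the candidate must itself be frequent.
                if (allSubsetsFrequent(candidate, frequent)) {
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    private static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int drop = 0; drop < candidate.size(); drop++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(drop); // remove(int index): drop one position
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the frequent 2-itemsets {1, 2}, {1, 3}, {1, 4} and {2, 3} join into {1, 2, 3}, {1, 2, 4} and {1, 3, 4}, but only {1, 2, 3} survives pruning, because {2, 4} and {3, 4} are not frequent.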


findAssociations

public java.util.Vector findAssociations(DBCacheReader cacheReader,
                                         float min_Support,
                                         float min_Confidence,
                                         ARMinerItemset inAntecedent,
                                         ARMinerItemset inConsequent,
                                         ARMinerItemset ignored,
                                         int max_Antecedent,
                                         int min_Consequent)
Find association rules in a database, given the set of frequent itemsets and a set of restrictions.

Specified by:
findAssociations in interface AssociationsFinder
Parameters:
cacheReader - the object used to read from the cache
min_Support - the minimum support
min_Confidence - the minimum confidence
inAntecedent - the items that must appear in the antecedent of each rule, if null then this constraint is ignored
inConsequent - the items that must appear in the consequent of each rule, if null then this constraint is ignored
ignored - the items that should be ignored, if null then this constraint is ignored
max_Antecedent - the maximum number of items that can appear in the antecedent of each rule, if 0 then this constraint is ignored
min_Consequent - the minimum number of items that should appear in the consequent of each rule, if 0 then this constraint is ignored
Returns:
a Vector containing all association rules found
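The restrictions accepted by this overload can be read as a predicate on each candidate rule. The minimal sketch below applies them as a post-filter over sets of integer item codes, following the parameter descriptions above: null disables an item constraint and 0 disables a size constraint. RuleConstraints is a hypothetical name, the documentation does not say whether the class applies these checks during generation or afterwards, and treating an ignored item as disqualifying on either side of the rule is an assumption.

```java
import java.util.Set;

/** Hypothetical post-filter mirroring the documented restrictions of findAssociations. */
final class RuleConstraints {
    private RuleConstraints() {}

    static boolean satisfies(Set<Integer> antecedent, Set<Integer> consequent,
                             Set<Integer> inAntecedent, Set<Integer> inConsequent,
                             Set<Integer> ignored, int maxAntecedent, int minConsequent) {
        // Required items: a null set disables the constraint.
        if (inAntecedent != null && !antecedent.containsAll(inAntecedent)) {
            return false;
        }
        if (inConsequent != null && !consequent.containsAll(inConsequent)) {
            return false;
        }
        // Assumption: a rule that uses an ignored item on either side is rejected.
        if (ignored != null) {
            for (Integer item : ignored) {
                if (antecedent.contains(item) || consequent.contains(item)) {
                    return false;
                }
            }
        }
        // Size limits: 0 disables the constraint.
        if (maxAntecedent != 0 && antecedent.size() > maxAntecedent) {
            return false;
        }
        return minConsequent == 0 || consequent.size() >= minConsequent;
    }
}
```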

findAssociations

public java.util.Vector findAssociations(DBCacheReader cacheReader,
                                         float min_Support,
                                         float min_Confidence,
                                         java.util.Vector antecedentAttributes,
                                         java.util.Vector consequentAttributes,
                                         int max_Antecedent,
                                         int min_Consequent)
WEKA version. Find association rules in a database, given the set of frequent itemsets and a set of restrictions.

Parameters:
cacheReader - the object used to read from the cache
min_Support - the minimum support
min_Confidence - the minimum confidence
antecedentAttributes - a Vector of ARMinerItemsets, each containing only items of a single attribute, used to restrict which attributes must appear in the antecedent; ignored if null
consequentAttributes - a Vector of ARMinerItemsets, each containing only items of a single attribute, used to restrict which attributes must appear in the consequent; ignored if null
max_Antecedent - the maximum number of items that can appear in the antecedent of each rule, if 0 then this constraint is ignored
min_Consequent - the minimum number of items that should appear in the consequent of each rule, if 0 then this constraint is ignored
Returns:
a Vector containing all association rules found
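The attribute-based restrictions are described only loosely above. One plausible reading, assumed in the minimal sketch below, is that every listed attribute must contribute at least one of its items to the corresponding side of the rule. AttributeConstraints and coversAttributes are hypothetical names, and each java.util.Set&lt;Integer&gt; stands in for an ARMinerItemset holding the item codes of one attribute.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical check for the attribute-based restrictions; the semantics are an assumption. */
final class AttributeConstraints {
    private AttributeConstraints() {}

    /**
     * Assumed reading: each group holds the item codes of one attribute, and every group must
     * contribute at least one item to this side of the rule. A null list disables the check.
     */
    static boolean coversAttributes(Set<Integer> side, List<Set<Integer>> attributeGroups) {
        if (attributeGroups == null) {
            return true;
        }
        for (Set<Integer> group : attributeGroups) {
            if (Collections.disjoint(side, group)) {
                return false; // this attribute has no item on this side
            }
        }
        return true;
    }
}
```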

findAssociations

public java.util.Vector findAssociations(DBCacheReader cacheReader,
                                         float min_Support,
                                         float min_Confidence,
                                         java.util.Vector antecedentAttributes,
                                         java.util.Vector consequentAttributes,
                                         int min_Antecedent,
                                         int max_Antecedent,
                                         int min_Consequent,
                                         int max_Consequent)
Another WEKA version. Find association rules in a database, given the set of frequent itemsets and a set of restrictions.

Parameters:
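cacheReader - the object used to read from the cache
min_Support - the minimum support
min_Confidence - the minimum confidence
antecedentAttributes - a Vector of ARMinerItemsets, each containing only items of a single attribute, used to restrict which attributes must appear in the antecedent; ignored if null
consequentAttributes - a Vector of ARMinerItemsets, each containing only items of a single attribute, used to restrict which attributes must appear in the consequent; ignored if null
min_Antecedent - the minimum number of items that must appear in the antecedent of each rule
max_Antecedent - the maximum number of items that can appear in the antecedent of each rule
min_Consequent - the minimum number of items that must appear in the consequent of each rule
max_Consequent - the maximum number of items that can appear in the consequent of each rule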
Returns:
a Vector containing all association rules found

ap_genrules_constraint

private void ap_genrules_constraint(ARMinerItemset is_frequent,
                                    java.util.Vector consequents,
                                    java.util.Vector antecedentAttributes,
                                    java.util.Vector consequentAttributes)
This is the ap-genrules procedure that generates rules out of a frequent itemset.


ap_genrules_constraint

private void ap_genrules_constraint(ARMinerItemset is_frequent,
                                    java.util.Vector consequents)
This is the ap-genrules procedure that generates rules out of a frequent itemset.


generateFromMaximal

private void generateFromMaximal(ARMinerItemset itemset)
Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.

Parameters:
itemset - a maximal frequent itemset from which to build rules

generateFromMaximal

private void generateFromMaximal(ARMinerItemset itemset,
                                 int ant,
                                 int con)
Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.

Parameters:
itemset - a maximal frequent itemset from which to build rules
ant - the size to make the antecedent
con - the size to make the consequent
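The generateFromMaximal overloads walk index arrays into the itemset, incrementing one position ("level") at a time. The same enumeration can be sketched recursively: choose ant increasing indexes for the antecedent, then con increasing indexes from the remaining positions for the consequent, giving C(n, ant) * C(n - ant, con) splits for an itemset of n items. MaximalSplits is a hypothetical class, and unlike the documented methods, which only add rules that meet the class's criteria, this sketch does no filtering.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerate disjoint antecedent/consequent index sets of fixed sizes. */
final class MaximalSplits {
    private MaximalSplits() {}

    /** Returns {antIndexes, conIndexes} pairs for an itemset with n items. */
    static List<int[][]> splits(int n, int ant, int con) {
        List<int[][]> out = new ArrayList<>();
        for (int[] antIdx : combinations(n, ant, new boolean[n])) {
            boolean[] used = new boolean[n];
            for (int i : antIdx) {
                used[i] = true; // antecedent positions are unavailable to the consequent
            }
            for (int[] conIdx : combinations(n, con, used)) {
                out.add(new int[][] {antIdx, conIdx});
            }
        }
        return out;
    }

    /** All increasing index arrays of length k over [0, n) that avoid positions marked used. */
    private static List<int[]> combinations(int n, int k, boolean[] used) {
        List<int[]> result = new ArrayList<>();
        fill(n, k, used, new int[k], 0, 0, result);
        return result;
    }

    /** Sets position `level` of current, then recurses to fill the next level. */
    private static void fill(int n, int k, boolean[] used, int[] current, int level, int start,
                             List<int[]> result) {
        if (level == k) {
            result.add(current.clone());
            return;
        }
        for (int i = start; i < n; i++) {
            if (used[i]) {
                continue;
            }
            current[level] = i;
            fill(n, k, used, current, level + 1, i + 1, result);
        }
    }
}
```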

generateFromMaximal

private void generateFromMaximal(ARMinerItemset itemset,
                                 int con,
                                 int[] antIndexes,
                                 int antLevel)
Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.

Parameters:
itemset - a maximal frequent itemset from which to build rules
con - the size to make the consequent
antIndexes - indexes into itemset to use for the antecedent
antLevel - the antecedent index to increment

generateFromMaximal

private void generateFromMaximal(ARMinerItemset itemset,
                                 int[] antIndexes,
                                 int[] conIndexes,
                                 int conLevel)
Generates all the possible rules from a maximal frequent itemset that meet all our criteria and adds them to the rules.

Parameters:
itemset - a maximal frequent itemset from which to build rules
antIndexes - the antecedent indexes
conIndexes - the consequent indexes
conLevel - the consequent index to increment

buildRule

private void buildRule(ARMinerItemset itemset,
                       int[] antIndexes,
                       int[] conIndexes)
Builds a rule based on the specified itemset indexes and adds it to the rules to be returned. Note: confidence is not calculated for rules whose antecedent and consequent contain event items. The confidence of these rules is the responsibility of the calling method.

Parameters:
itemset - a maximal frequent itemset from which to build rules
antIndexes - the antecedent indexes
conIndexes - the consequent indexes

removeDuplicateRules

private int removeDuplicateRules()
Removes duplicate rules from the rules list. The number of rules removed is returned.

Returns:
the number of duplicate rules found and removed

removeDuplicateRules

private int removeDuplicateRules(int index)
Removes duplicate rules from the rules list. The rule at the specified index is compared to all rules appearing later in the list of rules. The number of rules removed is returned.

Parameters:
index - the index of the rule list where comparisons should start
Returns:
the number of duplicate rules found and removed
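The two removeDuplicateRules methods describe a pairwise scan: each rule is compared with every rule after it, and matches are removed. The minimal sketch below does this generically over a java.util.List, assuming rule equality is given by equals(); the rule class used here is not documented, and DuplicateRules is a hypothetical name. Scanning backwards from the end of the list keeps each removal from shifting elements that have not yet been compared.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of the pairwise duplicate removal described above. */
final class DuplicateRules {
    private DuplicateRules() {}

    /** Removes every later occurrence of an earlier rule; returns how many were removed. */
    static <T> int removeDuplicates(List<T> rules) {
        int removed = 0;
        for (int index = 0; index < rules.size(); index++) {
            removed += removeDuplicatesOf(rules, index);
        }
        return removed;
    }

    /** Compares the rule at index with all later rules, mirroring removeDuplicateRules(int). */
    static <T> int removeDuplicatesOf(List<T> rules, int index) {
        T rule = rules.get(index);
        int removed = 0;
        // Iterate backwards so a removal never shifts an element that is still to be compared.
        for (int j = rules.size() - 1; j > index; j--) {
            if (Objects.equals(rule, rules.get(j))) {
                rules.remove(j);
                removed++;
            }
        }
        return removed;
    }
}
```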