java.lang.Object
  wpi.associations.arminerSequence.ARMinerApriori
Implements the AprioriSetsAndSequences version of the Apriori algorithm for finding frequent itemsets from datasets containing time sequence attributes.
Field Summary | |
private java.util.Vector |
allCandidates
The list of all candidates, including those with duplicate event items. |
private DBCacheWriter |
cache_writer
Writes itemsets to a cache file |
private java.util.Vector |
candidates
The list of candidate itemsets to count support for. |
private DBReader |
db_reader
Reads rows from a database/dataset file |
private static int |
debug
Specifies the debug info level: 0 = no debug info; 1 = input to methods; 2 = also output from methods; 3 = all remaining detail. |
private java.util.Hashtable |
duplicateHash
The map between duplicate itemsets (containing duplicate event items) and the first itemset. |
private int |
haveEvents
Indicates if events are present in the data set. |
private HashTree |
ht_candidates
The hashtree of candidate itemsets |
private HashTree |
ht_k_frequent
The hashtree of frequent itemsets of size k |
private static int |
INITIAL_CAPACITY
Size to initialize data structures holding itemsets |
private java.util.Vector |
k_frequent
The list of frequent itemsets of size k |
protected Logger |
logger
The object responsible for logging. |
private int |
maxEvents
Maximum number of events from the same attribute allowed in a rule |
private long |
min_weight
Minimum number of rows/instances needed for an itemset to be considered frequent |
private static int |
NO
Don't use a feature |
private long |
num_rows
Number of rows in the dataset |
private int |
numPruned
Tracks how many itemsets are pruned using checkSubsets or hash tree's count subsets |
private int |
pass_num
This remembers the number of passes and also indicates the current cardinality of the candidates. |
private java.util.Vector |
possibleDuplicates
Itemsets that could possibly have duplicates later on. |
private ItemsetPrefixTree |
possibleDuplicatesTree
Itemsets that could possibly have duplicates later on. |
private java.util.Hashtable |
previousDuplicateHash
The map between duplicate itemsets (containing duplicate event items) and the first itemset that were found to be frequent. |
private static int |
useItemsetPrefixTree
Specify use of itemset prefix tree instead of vector for looking up possible duplicate itemsets during generation. |
private static int |
usePostGenerationGC
Specify post candidate generation garbage collection. |
private static int |
usePreviousDuplicateHash
Specify use of previous duplicate hash for testing possibility of an itemset being a duplicate or first of a set of duplicate itemsets. |
private static int |
usePruneCandidateGeneration
Specify use of pruneCandidateGeneration to save calls to getCandidate |
private static int |
YES
Use a feature |
Constructor Summary | |
ARMinerApriori()
|
Method Summary | |
private boolean |
checkSubsets(ARMinerItemset itemset)
Checks to see if all the subsets of the specified itemset are frequent. |
private void |
evaluateCandidates()
This procedure checks to see which itemsets are frequent |
int |
findFrequentItemsets(DBReader dbReader,
DBCacheWriter cacheWriter,
float minSupport)
Find the frequent itemsets in a database |
private void |
generateCandidates()
Generates new candidates out of itemsets that are frequent. |
private boolean |
getCandidate(ARMinerItemset is_i,
ARMinerItemset is_j)
This procedure tries to combine itemsets i and j; it returns true if successful, false if they cannot be combined. |
private boolean |
getCandidate(int i,
int j)
This procedure tries to combine itemsets i and j; it returns true if successful, false if they cannot be combined. |
int |
getMaxEvents()
Returns the number of event items of the same kind allowed in an itemset. |
private boolean |
pruneCandidateGeneration(ARMinerItemset isi,
ARMinerItemset isj)
This method is used to prune candidate generation when the dataset contains events. |
boolean |
setLogger(Logger log)
Sets the logging mechanism. |
void |
setMaxEvents(int num)
Sets the number of event items of the same kind allowed in an itemset. |
private void |
weighCandidates()
This procedure scans the database and computes the weight of each candidate. |
private void |
weighItemset(ARMinerItemset itemset)
This procedure scans the database and computes the weight of the itemset. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
private static final int INITIAL_CAPACITY
private static final int YES
private static final int NO
private static int usePruneCandidateGeneration
private static int useItemsetPrefixTree
private static int usePostGenerationGC
private static int usePreviousDuplicateHash
private java.util.Vector allCandidates
private java.util.Vector candidates
private java.util.Vector k_frequent
private HashTree ht_candidates
private HashTree ht_k_frequent
private java.util.Hashtable duplicateHash
private java.util.Hashtable previousDuplicateHash
private java.util.Vector possibleDuplicates
private ItemsetPrefixTree possibleDuplicatesTree
private int pass_num
private DBReader db_reader
private DBCacheWriter cache_writer
private long num_rows
private long min_weight
protected Logger logger
private int maxEvents
private int haveEvents
private int numPruned
private static final int debug
Constructor Detail |
public ARMinerApriori()
Method Detail |
public int findFrequentItemsets(DBReader dbReader, DBCacheWriter cacheWriter, float minSupport)
Specified by:
findFrequentItemsets in interface FrequentItemsetsFinder
Parameters:
dbReader - the object used to read from the database
cacheWriter - the object used to write to the cache; if this is null, nothing will be saved, which is useful for benchmarking
minSupport - the minimum support
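The page does not say how minSupport relates to the min_weight field ("minimum number of rows/instances needed for an itemset to be considered frequent"). A minimal sketch of one plausible conversion, assuming minSupport is a fraction in [0, 1] and the threshold is rounded up; the class and method names here are hypothetical and not part of ARMinerApriori:

```java
// Hypothetical helper, not part of the documented class.
// Assumption: minSupport is a fraction of the dataset, not a percentage.
final class SupportThresholdSketch {

    /** Smallest row count that is at least minSupport * numRows. */
    static long minWeight(long numRows, float minSupport) {
        if (numRows < 0 || minSupport < 0f || minSupport > 1f) {
            throw new IllegalArgumentException("numRows must be >= 0 and minSupport in [0, 1]");
        }
        // Widen to double before multiplying, then round up so that an
        // itemset must reach the full fraction to count as frequent.
        return (long) Math.ceil((double) numRows * minSupport);
    }
}
```

Note that float values such as 0.1f are not exactly representable, so rounding up can yield a threshold one row higher than expected for some inputs.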
private void weighCandidates()
private void weighItemset(ARMinerItemset itemset)
private void evaluateCandidates()
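The weighing and evaluation steps declared above can be illustrated with a small in-memory sketch: count the rows that contain every item of a candidate, then keep the candidates whose count reaches the minimum weight. This is an assumption-laden simplification; it uses plain sets of integers in place of ARMinerItemset, a list in place of DBReader, and hypothetical names throughout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the documented class reads rows through DBReader
// and stores candidates in a HashTree, neither of which is modelled here.
final class SupportCountSketch {

    /** Number of rows that contain every item of the itemset. */
    static long weigh(Set<Integer> itemset, List<Set<Integer>> rows) {
        long weight = 0;
        for (Set<Integer> row : rows) {
            if (row.containsAll(itemset)) {
                weight++;
            }
        }
        return weight;
    }

    /** Candidates whose weight is at least minWeight, in their original order. */
    static List<Set<Integer>> keepFrequent(List<Set<Integer>> candidates,
                                           List<Set<Integer>> rows,
                                           long minWeight) {
        List<Set<Integer>> frequent = new ArrayList<>();
        for (Set<Integer> candidate : candidates) {
            if (weigh(candidate, rows) >= minWeight) {
                frequent.add(candidate);
            }
        }
        return frequent;
    }
}
```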
private void generateCandidates()
private boolean getCandidate(int i, int j)
private boolean getCandidate(ARMinerItemset is_i, ARMinerItemset is_j)
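The page does not describe how getCandidate combines two itemsets. As a point of reference, the join step of the textbook Apriori algorithm merges two sorted k-itemsets that share their first k-1 items. The sketch below shows that textbook join only; it ignores the duplicate event item handling this class documents, and its names are hypothetical.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical illustration of the classic Apriori join; not the documented method.
final class CandidateJoinSketch {

    /**
     * Joins two sorted itemsets of equal size k into one of size k + 1.
     * Returns empty when they do not share the same first k - 1 items,
     * or when the last item of a is not smaller than the last item of b.
     */
    static Optional<int[]> join(int[] a, int[] b) {
        if (a.length != b.length || a.length == 0) {
            return Optional.empty();
        }
        int k = a.length;
        for (int i = 0; i < k - 1; i++) {
            if (a[i] != b[i]) {
                return Optional.empty(); // prefixes differ, cannot combine
            }
        }
        if (a[k - 1] >= b[k - 1]) {
            return Optional.empty(); // keeps the result sorted and avoids generating a pair twice
        }
        int[] joined = Arrays.copyOf(a, k + 1);
        joined[k] = b[k - 1];
        return Optional.of(joined);
    }
}
```

Requiring the last item of the first itemset to be smaller means each candidate is produced from exactly one ordered pair.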
private boolean checkSubsets(ARMinerItemset itemset)
Parameters:
itemset - the itemset to check the subsets of
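The subset check rests on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, so it is enough to test every subset obtained by dropping one item. A minimal sketch under the assumption that the frequent itemsets of the previous pass are available as a set of sorted lists; the documented class uses a HashTree instead, and the names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of Apriori subset pruning; not the documented method.
final class SubsetCheckSketch {

    /** True if every subset formed by removing one item is in the frequent set. */
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // int argument: removes by index, not by value
            if (!frequent.contains(subset)) {
                return false; // one infrequent subset is enough to prune the candidate
            }
        }
        return true;
    }
}
```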
private boolean pruneCandidateGeneration(ARMinerItemset isi, ARMinerItemset isj)
Parameters:
isi - the first itemset to compare
isj - the itemset to compare to the first
public int getMaxEvents()
public void setMaxEvents(int num)
Parameters:
num - the maximum number of event items
public boolean setLogger(Logger log)
Parameters:
log - the logging mechanism; null turns logging off