Apriori Sets And Sequences - Keith's MS Thesis
Apriori Sets And Sequences
About
Code
Performance Data Collection
·
Data Sets
Documents
Results
·
                                          
Printer Friendly Version
performance-improvements

Intro ] [ check-subsets ] [ post-candidate-generation-garbage-collection ] [ prune-candidate-generation ] [ remove-candidates ] [ template ]

Itemset Prefix Tree ] [ Previous Duplicate Hash ] [ Up: Results ]

Description of feature:

This feature attempts to save time and memory during support
counting. 

For a candidate
itemset to be considered frequent its support must meet a user
specified minimum. A minimum number of instances in the dataset must
contain the itemset. Since the number of instances in the data set is
known the number of instances needed for the itemset to be frequent
can be calculated. When the number of instances needed becomes lower
than the number of instances left to compare to the candidate itemset
list the candidate itemset will not be frequent. compare this itemset
to the rest of the instances will be in vain. This feature removes the
itemset from the candidate list so no further time or memory will be
used to counts its support.

History of this feature:

With event items the time it takes to test if an
instance from the dataset supports a candidate itemset becomes
nontrivial. Event items cannot be simply compared using the integer
representation for that particular item as can be done with regular
items. The event items in the candidate itemset must be mapped to
events in the instance. Once a valid mapping is found it can be said
that the instance supports the itemset. Furthermore, there may be more
than one valid mapping of candidate event items to instance event
items. Each valid mapping must be counted. This count is used to
calculate confidence when a rule has event items in both the
antecedent and consequent of the rule. All this required improvements
to how support is counted.

Testing this feature:

The time difference using and not using this feature can be
measured. The number of comparisons between candidates and instances
saved is also counted. 

Tasks:

1. Choose main testing dataset.

2. Run Experiments

3. Put results into spreadsheet
   
4. Answer original question, what is the time difference and how many
   comparisons are saved?


by: Keith A. Pray
Last Modified: July 4, 2004 8:03 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.

Current Theme: