Apriori Sets And Sequences - Keith's MS Thesis
Itemset Prefix Tree

Using the itemset prefix tree affects the time it takes to generate
candidates.

How does it affect the time?

The itemset prefix tree indexes itemsets. It is used to look up
duplicate candidate itemsets during generation, which is faster than
searching through a simple list. Finding a duplicate itemset, if one
exists, takes at most as many item comparisons as there are items in
the itemset. Searching a list, by contrast, can require a very large
number of itemset comparisons, each of which compares up to the number
of items in an itemset.
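
The lookup described above can be sketched as a small trie over items.
This is only an illustration, not the thesis implementation: the names
ItemsetPrefixTree, add_if_new and unique_candidates are hypothetical,
items are assumed to be sortable, and children are stored in a
dictionary so each level of the tree costs one lookup.

```python
class ItemsetPrefixTree:
    """Hypothetical trie: each root-to-node path spells a sorted itemset."""

    def __init__(self):
        self.children = {}       # item -> child ItemsetPrefixTree
        self.is_itemset = False  # True if the path to this node was stored

    def add_if_new(self, itemset):
        """Store itemset; return True if it was new, False if a duplicate."""
        node = self
        # Sorting gives every itemset one canonical path through the tree
        # (an assumption; a generator that emits sorted itemsets can skip it).
        for item in sorted(itemset):
            child = node.children.get(item)
            if child is None:
                child = ItemsetPrefixTree()
                node.children[item] = child
            node = child
        if node.is_itemset:
            return False  # reached in at most len(itemset) steps
        node.is_itemset = True
        return True


def unique_candidates(candidates):
    """Keep the first occurrence of each candidate itemset, in order."""
    tree = ItemsetPrefixTree()
    return [c for c in candidates if tree.add_if_new(c)]
```

For example, unique_candidates([("a", "b", "c"), ("b", "a", "c")]) keeps
only the first tuple, because both follow the same path a, b, c.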

History of this feature:

The alternative to discovering duplicate itemsets is to simply count
support for all the itemsets generated. Preliminary testing during
implementation showed that counting support for all the candidate
itemsets became prohibitively slow at an earlier level than
identifying duplicate itemsets did.

The performance improvement gained by using the itemset prefix tree
will be measured by how many itemsets are saved from support
counting. The time it takes to identify the duplicate itemsets with
the tree will be compared to the time it takes to identify them using
a simple list search.
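
A minimal sketch of such a measurement follows. It is an assumption
about how the comparison could be set up, not the thesis test harness:
the function names are hypothetical, the tree is a plain nested
dictionary, and timings from a short script like this say nothing
about the results of the actual tests.

```python
import time


def dedupe_with_list(candidates):
    """Find duplicates by searching a simple list of itemsets seen so far."""
    seen = []
    for c in candidates:
        key = tuple(sorted(c))
        if key not in seen:  # compares against stored itemsets one by one
            seen.append(key)
    return seen


def dedupe_with_tree(candidates):
    """Find duplicates by walking a nested-dict prefix tree, one item per level."""
    end = object()  # sentinel key marking "an itemset ends at this node"
    root = {}
    kept = []
    for c in candidates:
        key = tuple(sorted(c))
        node = root
        for item in key:
            node = node.setdefault(item, {})
        if end not in node:
            node[end] = True
            kept.append(key)
    return kept


def compare(candidates):
    """Report itemsets saved from support counting and elapsed time per method."""
    results = {}
    for name, dedupe in (("list", dedupe_with_list), ("tree", dedupe_with_tree)):
        start = time.perf_counter()
        kept = dedupe(candidates)
        elapsed = time.perf_counter() - start
        results[name] = {
            "saved": len(candidates) - len(kept),  # duplicates never counted
            "seconds": elapsed,
        }
    return results
```

Both methods should report the same number of saved itemsets. Only
the elapsed time is expected to differ between them.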

Testing the feature:

The same data set and test set from the previous duplicate hash
testing will be used.

This feature can be used in conjunction with the previous duplicate
hash feature. The test set will be run twice, once with the previous
duplicate hash feature off and once with it on.

by: Keith A. Pray
Last Modified: July 4, 2004 8:03 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.
