Next: Level 2 Candidate Generation Up: Apriori Sets And Sequences Previous: Level 1 Candidate Generation   Contents

Level 1 Counting Support

The support of an itemset is the percentage of instances in the data set that contain that itemset. The weight of an itemset is the number of instances that contain that itemset. It is more efficient to count weight than to calculate support during this stage, since support can be derived from weight afterwards by dividing by the number of instances. Apriori simply scans all the instances in the data set, comparing each instance to each candidate itemset. If the instance contains the itemset, the weight for that itemset is incremented.
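The scan described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes instances and candidate itemsets are plain sets of regular items, and the names `count_weights` and `support` are hypothetical.

```python
def count_weights(instances, candidates):
    """Count, for each candidate itemset, how many instances contain it."""
    weights = {candidate: 0 for candidate in candidates}
    for instance in instances:
        for candidate in candidates:
            # An instance contains an itemset when the itemset is a subset of it.
            if candidate <= instance:
                weights[candidate] += 1
    return weights


def support(weight, n_instances):
    """Derive support (as a fraction) from a weight once counting is done."""
    return weight / n_instances


# Hypothetical toy data: four instances, three level 1 candidates.
instances = [
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
    frozenset({"a", "c"}),
]
candidates = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]

weights = count_weights(instances, candidates)
support_a = support(weights[frozenset({"a"})], len(instances))
```

Counting integer weights inside the loop and dividing only once at the end avoids a division per instance and per candidate.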

In Apriori Sets And Sequences, an instance counts towards the support of an itemset when it contains all the regular items in that itemset. Additionally, the instance must contain a set of event items that can be mapped to the event items in the itemset. A valid mapping means that there exists a set of event items in the instance that have the same event types as those in the itemset and whose relative begin and end times, when considered by themselves apart from any other event items in the instance, are the same as those found in the itemset.
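One way to picture the mapping test is sketched below. It rests on assumptions that are not stated in this section: event items are modeled as `(type, begin, end)` tuples with `begin < end`, and relative times are taken to be the rank of each begin or end time among the distinct time points of the selected events only. The function names are hypothetical.

```python
from itertools import combinations


def relative_times(events):
    """Replace absolute times with their rank among these events' time points."""
    points = sorted({t for _, begin, end in events for t in (begin, end)})
    rank = {t: i for i, t in enumerate(points)}
    return sorted((etype, rank[begin], rank[end]) for etype, begin, end in events)


def find_mappings(itemset_events, instance_events):
    """Return every group of instance events that maps onto the itemset's events."""
    target = relative_times(itemset_events)
    size = len(itemset_events)
    return [
        group
        for group in combinations(instance_events, size)
        # Each group is ranked on its own, ignoring the instance's other events.
        if relative_times(group) == target
    ]


# Hypothetical itemset: an A event that overlaps the start of a B event.
itemset_events = [("A", 0, 2), ("B", 1, 3)]
# Hypothetical instance: only the first B overlaps A; C is unrelated.
instance_events = [("A", 10, 20), ("B", 15, 30), ("B", 25, 40), ("C", 12, 13)]

mappings = find_mappings(itemset_events, instance_events)
```

Because each group of instance events is ranked in isolation, the unrelated C event between times 12 and 13 does not disturb the match between the A event and the first B event.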

Mapping event items is trivial when the itemset contains a single event item, because its relative begin and end times are always 0 and 1, respectively. As long as the instance contains an event item of the same type, a mapping is guaranteed.

In Apriori Sets And Sequences, counting the weight of an itemset is not sufficient. The number of different valid mappings that can be made between an itemset and all the instances in the data set must also be counted. This count is the event weight. For an itemset containing only one event item, it is simply the number of times that event type appears in the data set. Event weight is needed to correctly calculate the confidence of a rule, as discussed in Calculating Confidence.
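The difference between weight and event weight for a single-event itemset can be illustrated with a short sketch. It assumes, for illustration only, that each instance is a list of `(type, begin, end)` event tuples; the function name `weight_and_event_weight` is hypothetical.

```python
def weight_and_event_weight(instances, event_type):
    """Count instances containing the event type, and total occurrences of it."""
    weight = 0
    event_weight = 0
    for events in instances:
        occurrences = sum(1 for etype, _, _ in events if etype == event_type)
        if occurrences > 0:
            weight += 1  # the instance counts once, however many matches it has
        event_weight += occurrences  # every occurrence is a separate valid mapping
    return weight, event_weight


# Hypothetical data: the first instance contains two A events.
instances = [
    [("A", 0, 5), ("A", 7, 9)],
    [("B", 1, 2)],
    [("A", 3, 4), ("B", 0, 8)],
]

weight, event_weight = weight_and_event_weight(instances, "A")
```

Here two instances contain an A event, but A appears three times in total, so the event weight exceeds the weight.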

In Apriori Sets And Sequences, only the candidate itemsets in the candidate list are compared to the instances in the data set. There is an additional list, named all candidates, that contains all the itemsets in the candidate list as well as all the duplicate itemsets that were generated. Time is saved by not counting the weight of the duplicate itemsets. After the support counting process is done, each duplicate itemset is assigned the weight of the itemset in the candidate list that it is a duplicate of. The mapping between the all candidates itemsets and the candidate itemsets is stored in a hash table created during candidate generation.
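The propagation of weights to duplicates might look like the sketch below. It assumes the hash table maps each duplicate itemset to the counted candidate it duplicates; the string identifiers and the name `assign_duplicate_weights` are placeholders, since the actual itemset representation is not described here.

```python
def assign_duplicate_weights(counted_weights, duplicate_of):
    """Give each duplicate the weight of the counted candidate it duplicates."""
    all_weights = dict(counted_weights)
    for duplicate, original in duplicate_of.items():
        # One hash table lookup replaces a second scan of the data set.
        all_weights[duplicate] = counted_weights[original]
    return all_weights


# Hypothetical weights produced by the counting pass over the candidate list.
counted_weights = {"c1": 4, "c2": 2}
# Hypothetical hash table built during candidate generation.
duplicate_of = {"d1": "c1", "d2": "c1", "d3": "c2"}

all_weights = assign_duplicate_weights(counted_weights, duplicate_of)
```

With this layout, the cost of handling duplicates is proportional to the number of duplicates rather than to the size of the data set.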


Keith A. Pray 2003-06-17