next up previous contents
Next: Level 3 (and up) Up: Apriori Sets And Sequences Previous: Level 2 Candidate Generation   Contents

Level 2 (and up) Counting Support

Candidate itemset weight is counted in Apriori in the same manner as described in Level 1 Support Counting. For Apriori Sets And Sequences the process is much different. Each instance in the data set is still compared to each candidate itemset in the candidate list. First, the instance must contain the regular items in the itemset. If they are all present the event items are checked. If the itemset contains only one event item the checking is trivial. An event item of the same type must be located in the instance. If it is the weight of the itemset is incremented. The number of event items of that type in that instance is counted. The event weight of the itemset is increased by this value.

If the candidate itemset contains two event items (or more in subsequent levels) then a mapping must be made from event items in the itemset and event items in the instance. The relative begin and end times of event items are stored for quick reference in an array. Each index of the array represents that relative time along the time line. In each index is a list of begin and end labels for each event with that corresponding begin and end time. A mapping between relative times in the itemset and in the instance represents a partial possible mapping between the begin and end labels of the event items, and hence a mapping between the event items themselves.

A mapping is made between an itemset's relative times and an instance's relative times by starting at time 0 on the itemset's time line. The list of event item begin labels is retrieved. Starting at time 0 in the instance, the earliest time index containing a list of begin labels for the same type of events as those in the itemset is found. Once this corresponding time index is found it is noted as a possible map for that relative time in the itemset. The remaining time indexes in the itemset are mapped in the same way, always starting at the next relative time in the instance.

Once a possible map is created it is tested using the end time labels. Begin and end time labels are paired together, belonging to a single event item in the itemset. The corresponding begin and end labels in the instance must belong to a single event item as well. Furthermore, an event item in the instance can only be mapped to a single event item in the itemset as the begin and end labels must. Once the mapping of relative times is validated by creating a valid map of event begin and end labels the mapping is added to the list of valid maps. If the map is found to be invalid the last relative time used in the instance is skipped and another possible map of relative times is created and tested. This continues until there are no more relative times left to try in the instance.

This process repeats for each possible relative time in the instance that can be mapped to the itemset's relative time 0. The number of successful mappings found is the event weight contributed by the instance to the itemset. If the number of maps found is one or more, the weight of the itemset is increased.

After all the itemsets in the candidate list have been tested for inclusion in all the instances in the data set, the weights and event weights of the duplicate itemsets in the all candidates list is updated.


next up previous contents
Next: Level 3 (and up) Up: Apriori Sets And Sequences Previous: Level 2 Candidate Generation   Contents
Keith A. Pray 2003-06-17