
Input

Apriori Sets And Sequences takes as input a data set consisting of instances. Each instance has a set of attribute-value pairs. Attributes can be of type nominal, set, or event set. If temporal associations are to be mined, the data set must contain event sets. An event set is simply a set whose items are events. An event consists of three kinds of information: the time sequence attribute in which the event occurred, the name of the event, and the period of time during which the event happened. The time sequence attribute name and the event name together define the event item type. Time sequence attributes can also be contained in the data set, but they are not used directly during association rule mining.
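As an illustration of the input described above, the following is a minimal sketch of how an event and an instance might be represented. The class name, field names, attribute names, and the choice to store the period as integer begin and end times are all assumptions made for this example; the text does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    """One event: where it occurred, what it is called, and when it happened."""
    sequence_attribute: str  # time sequence attribute in which the event occurred
    name: str                # name of the event
    begin: int               # start of the period (assumed integer time index)
    end: int                 # end of the period (assumed integer time index)

    def item_type(self) -> Tuple[str, str]:
        # The sequence attribute name and the event name define the event item type.
        return (self.sequence_attribute, self.name)


# A hypothetical instance with one attribute of each kind.
instance = {
    "sector": "technology",                      # nominal attribute
    "products": frozenset({"laptop", "phone"}),  # set attribute
    "price_events": frozenset({                  # event set attribute
        Event("price", "increase", 3, 7),
        Event("price", "decrease", 8, 11),
        Event("price", "increase", 12, 15),
    }),
}

# Two of the three events share a type, so only two event item types occur.
event_types = {event.item_type() for event in instance["price_events"]}
```

Note that two events with the same item type can still differ in their time periods, which is why the type alone does not identify an event.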

Because the requirements of an event are so simple, Apriori Sets And Sequences can be applied to numeric and symbolic sequences in any domain. There has been much work on identifying interesting patterns in sequential data and on locating those patterns in sequences. This work serves as an excellent source of event detection tools for many domains.

The attribute-value format of the data set is common among data mining tools. In order to mine association rules from this data, it is translated into an item representation. Each attribute-value pair that exists in the data set is defined as an item and given an integer called the item number. In Apriori Sets And Sequences there are two kinds of items. Regular items are formed from attribute-value pairs where the value is not a sequence or an event set. Regular items are uniquely identified by their item number. Event items are formed from attributes whose values are event sets. Although event items are also given an item number, it is still necessary to interpret the item during the association rule mining process.
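The translation into items could be sketched as follows. This is only an illustrative assumption about how item numbers might be assigned: the function name `build_item_table`, the tuple encoding of events as (sequence attribute, name, begin, end), and the decision to give each member of a set-valued attribute its own item are all hypothetical and not taken from the text.

```python
def build_item_table(instances):
    """Assign an integer item number to every attribute-value pair seen.

    Returns (item_numbers, event_items):
      item_numbers maps (attribute, value) -> item number
      event_items  maps item number -> event tuple, kept so that an
                   event item can still be interpreted during mining
    """
    item_numbers = {}
    event_items = {}
    for instance in instances:
        for attribute in sorted(instance):
            value = instance[attribute]
            # Assumption: each member of a set or event set becomes its own item.
            if isinstance(value, frozenset):
                members = sorted(value, key=repr)
            else:
                members = [value]
            for member in members:
                key = (attribute, member)
                if key not in item_numbers:
                    item_numbers[key] = len(item_numbers)
                    if isinstance(member, tuple):  # events are tuples here
                        event_items[item_numbers[key]] = member
    return item_numbers, event_items


# Hypothetical data: one nominal attribute and one event set attribute.
instances = [
    {"sector": "technology",
     "price_events": frozenset({("price", "increase", 3, 7),
                                ("price", "decrease", 8, 11)})},
    {"sector": "retail",
     "price_events": frozenset({("price", "increase", 3, 7)})},
]

item_numbers, event_items = build_item_table(instances)
```

In this sketch a regular item is fully described by its number, while an event item's number is only a handle: the stored event tuple is what the mining step would inspect.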

Apriori Sets And Sequences takes parameters similar to those accepted by regular Apriori. In addition to minimum support and minimum confidence, it takes the maximum number of event items of the same type allowed in a rule. Normally, all the items present in the data set are candidates for appearing in a resulting association rule. This is not the case for event items in Apriori Sets And Sequences. Since event items are interpreted rather than treated as literals, a representative event item for each event item type is used in the itemsets during mining. These representative event items are the only event items that can appear in a rule. If the user wants to allow more than one event of the same type in a rule, there must be more than one representative event of that type among the items used during mining. The user selects the maximum number of these event items to be used. The more that are allowed, the more event items there are to count support for and to use in generating new candidate itemsets, so performance degrades as the maximum is raised.
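To make the effect of the third parameter concrete, the sketch below generates representative event items for each event item type. The names `MiningParameters`, `max_events_per_type`, and `representative_event_items`, and the encoding of a representative as a (type, index) pair, are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

EventType = Tuple[str, str]  # (time sequence attribute name, event name)


@dataclass(frozen=True)
class MiningParameters:
    min_support: float        # as in regular Apriori
    min_confidence: float     # as in regular Apriori
    max_events_per_type: int  # max event items of the same type in a rule


def representative_event_items(
    event_types: Set[EventType], params: MiningParameters
) -> List[Tuple[EventType, int]]:
    """Create max_events_per_type representatives for every event item type."""
    return [
        (event_type, index)
        for event_type in sorted(event_types)
        for index in range(params.max_events_per_type)
    ]


types = {("price", "increase"), ("price", "decrease")}
one = representative_event_items(types, MiningParameters(0.1, 0.8, 1))
three = representative_event_items(types, MiningParameters(0.1, 0.8, 3))
```

With two event item types, raising the maximum from 1 to 3 triples the number of event items that take part in support counting and candidate generation, which is the source of the performance cost noted above.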


Keith A. Pray 2003-06-17