Apriori Sets And Sequences - Keith's MS Thesis

Thoughts on:

	 "Data Mining, Introduction and Advanced Topics",
	 Chapter 7, Web Mining, pp. 195-219
	 Chapter 9, Temporal Mining, pp. 245-273
	 Margaret H. Dunham,
	 Southern Methodist University,
	 Pearson Education, Inc. 2003

There's no electronic copy available.

The term "time series" is defined as a set of attribute values over a
period of time. So a single attribute's value over time is represented
in a single time series. The author mentions that some common
interpretations include that the values are numeric and the times are
spaced evenly between each value in the series.

The author's general definition describes my time sequence attribute.

Data mining applications to time series mentioned include predicting
future values of a time series, clustering or identifying similar time
series, classifying time series, and identifying patterns in a single
time series.

An important note is that these applications work with a single time
series or compare time series. The goal usually concerns a single time
series or a single type of series.

The term "sequence" is defined as an ordered list of itemsets. So each
itemset represents all the attributes present at a single position in
the sequence.

This representation was used in 

      "Mining Inter-Transaction Associations with Templates",
      Ling Feng, Hongjun Lu, Jeffery Xu Yu, Jiawei Han
      Hong Kong Polytechnic University, Hong Kong University of
      Science and Technology, Australian National University, Simon
      Fraser University, 
      CIKM 1999 

---------
Chapter 7
---------

AprioriAll is described. It is an Apriori-like algorithm for finding
frequent sequences of items. First it finds frequent itemsets. Second it
replaces the original data set with the frequent itemsets. Third it
finds sequential patterns.

A single sequence here is similar to an event in Apriori Sets and Sequences.

The term "episode" is introduced. An episode is a set of events that
occur within a specified window of time. The events are instantaneous,
occurring at a single point in time. There is no notion of duration.
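
A minimal sketch of the window idea, assuming events are stored as
(time, type) pairs and an episode is treated as an unordered set of
event types. The function name occurs_in_window and the half-open window
[t, t + width) are illustrative assumptions, not the book's notation.

```python
def occurs_in_window(events, episode, width):
    """Return True if every event type in `episode` occurs inside some
    window [t, t + width) that starts at one of the event times."""
    ordered = sorted(events)
    needed = set(episode)
    for i, (start, _) in enumerate(ordered):
        seen = set()
        for time, kind in ordered[i:]:
            if time >= start + width:
                break  # this event falls outside the current window
            if kind in needed:
                seen.add(kind)
        if seen == needed:
            return True
    return False


# Hypothetical event stream: instantaneous events, no duration.
events = [(1, "A"), (2, "C"), (6, "B"), (7, "A")]
wide = occurs_in_window(events, {"A", "B"}, 2)    # True: B at 6, A at 7
narrow = occurs_in_window(events, {"A", "B"}, 1)  # False: never together
```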



---------
Chapter 9
---------

An entire data set represented in the manner described above would
constitute a single instance in my data set. 

Apriori Sets And Sequences can be used on both time series and
sequences as defined here. The transformation between them and my data
set representation is trivial.

The author briefly explains the notion of a frequent sequence. In a
data set where each instance is a transaction of items bought by a
particular customer at a specific time, the support of a sequence is
the percentage of all customers whose transactions contain the sequence.

This is the same as considering all the transactions of a single
customer as a single instance.

The idea of accounting for sequences that appear more than once
for a certain customer is not addressed. If a sequence S = ({a}, {b})
and we have an instance I that contains ({a, c, e}, {b, d}, {a}) then
even though the item a appears without a following b, instance I will
still be counted as wholly supporting sequence S.
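
The counting behaviour described above can be sketched as follows,
assuming each customer's transactions are already grouped into one
ordered list of itemsets. The names contains and support are
illustrative. An instance is counted once no matter how many times the
sequence occurs in it.

```python
def contains(instance, sequence):
    """True if each itemset of `sequence` is a subset of a distinct
    itemset of `instance`, with the order preserved."""
    pos = 0
    for itemset in sequence:
        # Advance to the next itemset of the instance that covers it.
        while pos < len(instance) and not set(itemset) <= set(instance[pos]):
            pos += 1
        if pos == len(instance):
            return False
        pos += 1
    return True


def support(instances, sequence):
    """Fraction of instances (customers) that contain the sequence."""
    return sum(contains(i, sequence) for i in instances) / len(instances)


S = [{"a"}, {"b"}]
I = [{"a", "c", "e"}, {"b", "d"}, {"a"}]   # the instance from the notes
other = [{"b"}, {"a"}]                     # b never follows a here

supports_s = contains(I, S)                # True, despite the trailing {a}
fraction = support([I, other], S)          # 0.5
```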

Association rules built from these sequences do not relate the
occurrence of items in time between the sequences. Given sequences S
and T, the rule S => T says nothing about the order of S relative to
T. Only the items contained in a single sequence have a known order
with respect to the other items in that sequence.

AprioriAll and SPADE are described briefly. 

Generalizations mentioned include a maximum time between elements in a
sequence constraint, a sliding window, a minimum and maximum time between
elements, and a concept hierarchy for grouping items. 

All of these can easily be applied to Apriori Sets And Sequences.
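
As an illustration of the minimum and maximum time between elements,
here is a minimal sketch that assumes each instance is a time-sorted
list of (time, itemset) pairs. The name contains_with_gaps and its
parameters are assumptions made for this example. The sliding window
and concept hierarchy generalizations are not covered.

```python
def contains_with_gaps(instance, sequence, min_gap=0, max_gap=float("inf")):
    """True if `sequence` occurs in `instance` such that the time between
    consecutive matched elements lies in [min_gap, max_gap].
    `instance` must be sorted by time."""

    def match(seq_idx, start, prev_time):
        if seq_idx == len(sequence):
            return True
        for pos in range(start, len(instance)):
            time, itemset = instance[pos]
            if prev_time is not None:
                gap = time - prev_time
                if gap > max_gap:
                    break      # later elements are even further away
                if gap < min_gap:
                    continue   # too close to the previous match
            if set(sequence[seq_idx]) <= set(itemset):
                # Backtrack if the rest cannot be matched from here.
                if match(seq_idx + 1, pos + 1, time):
                    return True
        return False

    return match(0, 0, None)


instance = [(1, {"a"}), (2, {"c"}), (10, {"a"}), (12, {"b"})]
sequence = [{"a"}, {"b"}]

unconstrained = contains_with_gaps(instance, sequence)        # True
within_five = contains_with_gaps(instance, sequence, 0, 5)    # True (10, 12)
within_one = contains_with_gaps(instance, sequence, 0, 1)     # False
```

A greedy left-to-right scan is not enough once a maximum gap is imposed,
which is why the sketch backtracks over earlier matches.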

Candidate generation is mentioned. The author discusses the necessity
of generating more than one candidate from a single pair of itemsets,
sequences in this case. With sequences ({A}) and ({B}) frequent, the
following candidates would be generated:
	  ({A, B}), ({A}, {B}), ({B}, {A})

This resembles the candidate generation in Apriori Sets and
Sequences. 
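
The three candidates listed above can be produced with a small helper.
This is only a sketch for the case of two distinct single-item
sequences; the name join_single_items is hypothetical and the general
join over longer sequences is not shown.

```python
def join_single_items(a, b):
    """Candidates built from frequent single-item sequences ({a}) and
    ({b}), where a and b are different items."""
    return [
        (frozenset({a, b}),),               # both items in one itemset
        (frozenset({a}), frozenset({b})),   # a occurs before b
        (frozenset({b}), frozenset({a})),   # b occurs before a
    ]


candidates = join_single_items("A", "B")
```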

Feature extraction can be used to make analysis of the sequences
easier. 

All pattern matching, feature extraction, frequent sequence mining, and
other such techniques can be used to identify events. Apriori Sets And
Sequences takes the events as input.

I didn't think much about it before but it seems Apriori Sets And
Sequences could be used to find patterns in sequences. If each value
occurring in a sequence was considered an event we could mine the
frequent sub-sequences. It should work for both symbolic and numeric
data. Numeric data could be discretized or rounded to generalize the
values if needed... I wonder how that would go.
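
A rough sketch of that idea, assuming fixed-width bins for the
discretization and contiguous sub-sequences of a fixed length. Both
choices, and the function names, are simplifications made for this
example. This is not Apriori Sets And Sequences itself, only a stand-in
to show the discretize-then-count step.

```python
from collections import Counter


def discretize(values, bin_width):
    """Map each numeric value to the index of its fixed-width bin."""
    return [int(v // bin_width) for v in values]


def frequent_subsequences(series_list, length, min_support):
    """Contiguous sub-sequences of `length` values that appear in at
    least `min_support` (a fraction) of the series."""
    counts = Counter()
    for series in series_list:
        # A set, so each series counts a sub-sequence at most once.
        windows = {
            tuple(series[i:i + length])
            for i in range(len(series) - length + 1)
        }
        counts.update(windows)
    threshold = min_support * len(series_list)
    return {w for w, c in counts.items() if c >= threshold}


raw = [
    [1.2, 2.7, 3.1, 2.9],
    [1.9, 2.1, 3.8, 0.5],
    [0.3, 1.1, 2.2, 3.3],
]
binned = [discretize(series, 1.0) for series in raw]
frequent = frequent_subsequences(binned, 2, 1.0)   # {(1, 2), (2, 3)}
```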

Inter-transaction rules are mentioned. Basically a time stamp for each
transaction is given and a sliding window is placed over the ordered
transactions for a particular customer. 
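
One way to picture the sliding window, sketched under the assumption
that time stamps are integers and that items are tagged with their
offset from the window start. The name windowed_transactions is
hypothetical, and this follows my reading of the idea rather than the
book's exact formulation.

```python
def windowed_transactions(transactions, width):
    """For one customer's (time, items) transactions, build one extended
    transaction per window start. Each item is tagged with its offset
    from the start of the window."""
    ordered = sorted(transactions, key=lambda t: t[0])
    result = []
    for i, (start, _) in enumerate(ordered):
        extended = set()
        for time, items in ordered[i:]:
            if time - start >= width:
                break  # outside the window that begins at `start`
            extended.update((time - start, item) for item in items)
        result.append(extended)
    return result


customer = [(1, {"a"}), (2, {"b"}), (4, {"a", "c"})]
windows = windowed_transactions(customer, 2)
# windows[0] == {(0, "a"), (1, "b")}: a, then b one time unit later
```

Ordinary itemset mining over these extended transactions then yields
rules that span transactions.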

This work, cited as TLHF99, is discussed in my notes for:

     "Mining Inter-Transaction Associations with Templates",
     Ling Feng, Hongjun Lu, Jeffery Xu Yu, Jiawei Han
     Hong Kong Polytechnic University, Hong Kong University of Science
     and Technology, Australian National University, Simon Fraser
     University, 
     CIKM 1999 

Episode rules are described. These take the form A => B where B is a
sub-episode of A. 

We've seen Apriori Sets And Sequences produce similar rules in the
stock data domain where one event is contained in another.

Trend dependencies are discussed. This is mainly a rule of the form
A => B where B is a trend such as an attribute's value increasing. This
implies that for two database states, or ordered transactions, where the
first contains A, the attribute referenced in B will increase in the
subsequent transaction.
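
A minimal sketch of how the confidence of such a rule could be measured,
assuming each database state is an (items, value) pair where value is
the numeric attribute referenced in B. The representation and the name
trend_confidence are assumptions made for this example.

```python
def trend_confidence(states, antecedent):
    """Among consecutive state pairs whose first state contains
    `antecedent`, the fraction where the value increases next."""
    pairs = [
        (cur, nxt)
        for cur, nxt in zip(states, states[1:])
        if antecedent <= cur[0]
    ]
    if not pairs:
        return 0.0
    increased = sum(1 for cur, nxt in pairs if nxt[1] > cur[1])
    return increased / len(pairs)


states = [
    ({"A"}, 10),
    ({"A", "B"}, 12),
    ({"B"}, 11),
    ({"A"}, 11),
    ({"C"}, 15),
]
# Three states containing A have a successor; two are followed by a rise.
confidence = trend_confidence(states, {"A"})
```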

If a sub-sequence, or event as I prefer to call it, describes
a trend in a time sequence attribute, Apriori Sets and Sequences does
find rules resembling trend dependencies.

Sequence Association Rules are discussed. They follow the description
given earlier for frequent sequences.

Calendric Association Rules are discussed. These are rules that, for a
set of transactions occurring inside a defined window and during
specified periods of time, meet the minimum support and minimum
confidence requirements. So this deals with identifying periods of
time as defined in a calendar and granularities of time (time unit:
minute, hour, day, etc.) for which an association rule is valid.

The way calendric association rules are described each instance
belongs to a particular part of the calendar. The calendar bins the
instances according to the time unit specified. If month is chosen for
the unit we might expect more instances to belong to any time unit than
if the unit was day. If the month, day, and other possible time units
an instance could belong to were included as attributes of the
instance, our rules would naturally contain this information if those
rules met the minimum support and minimum confidence requirements.
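
The calendar binning described above might look like the following
sketch, with month chosen as the time unit. The function name
rule_stats_by_month and the (date, items) representation are assumptions
for illustration only.

```python
from collections import defaultdict
from datetime import date


def rule_stats_by_month(transactions, antecedent, consequent):
    """Bin (date, items) transactions by calendar month and return
    {(year, month): (support, confidence)} for antecedent => consequent."""
    bins = defaultdict(list)
    for day, items in transactions:
        bins[(day.year, day.month)].append(set(items))

    stats = {}
    for key, group in bins.items():
        with_a = [t for t in group if antecedent <= t]
        with_both = [t for t in with_a if consequent <= t]
        support = len(with_both) / len(group)
        confidence = len(with_both) / len(with_a) if with_a else 0.0
        stats[key] = (support, confidence)
    return stats


transactions = [
    (date(2004, 1, 5), {"a", "b"}),
    (date(2004, 1, 9), {"a", "b"}),
    (date(2004, 1, 20), {"c"}),
    (date(2004, 2, 3), {"a"}),
    (date(2004, 2, 14), {"a", "b"}),
]
stats = rule_stats_by_month(transactions, {"a"}, {"b"})
# January: confidence 1.0. February: support 0.5, confidence 0.5.
```

A rule would then be reported only for the months where both values meet
the chosen minimums.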

I have to make notes on a paper, "Discovering Calendar-based Temporal
Association Rules", not referenced by the author of the book. I don't
know if it is related or not.
  
I am quite encouraged after reading this. It would seem Apriori Sets
And Sequences either has the capability to produce equivalent rules
using a single general algorithm or does not duplicate the
functionality, such as with calendric rules, at all.


by: Keith A. Pray
Last Modified: July 4, 2004 7:20 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.
