Apriori Sets And Sequences
Performance Data Collection
Preliminary Notes
To design such an algorithm, we plan to follow
the steps below.
-
June 19th
Learn about association rule mining.
We'll start by reading the following two papers:
-
R. Agrawal, T. Imielinski, A. Swami.
``Mining Associations between Sets of Items in Massive
Databases'', Proc. of the ACM SIGMOD Int'l Conference on
Management of Data, Washington D.C., May 1993, 207-216.
-
R. Agrawal, R. Srikant: ``Fast Algorithms for Mining
Association Rules'', Proc. of the 20th Int'l Conference on
Very Large Databases, Santiago, Chile, Sept. 1994.
Expanded version available as IBM Research Report RJ9839,
June 1994.
-
June 19th
Learn about sequence mining.
We'll start by reading:
-
R. Srikant, R. Agrawal: ``Mining Sequential Patterns:
Generalizations and Performance Improvements'', Proc. of
the Fifth Int'l Conference on Extending Database Technology (EDBT),
Avignon, France, March 1996.
Expanded version available as IBM Research Report RJ 9994,
December 1995.
-
June 26th
Look at Chris Shoemaker's approach to mining association
rules over set-valued data. To some extent, the problem
he solved is similar to ours, except that he considers
attributes whose values can be sets. In our work those
sets can be sequences. That is, the order in which the
elements appear in the set matters.
For this part we'll read Chris Shoemaker's MS thesis.
-
July 3rd
Learn about the ARMiner.
-
July 13th
Learn about mining temporal databases: look for papers
and survey the existing approaches.
-
July 31st
Write an algorithm to mine over "sequence-based attributes".
-
Aug. 10th
Implement the algorithm, reusing parts of the ARMiner and
Chris' implementation of his algorithm.
-
Aug. 20th
Experimental evaluation of the algorithm (perhaps using
"Keith's dataset"). Find relevant datasets and define
an experimental protocol.
-
Aug. 20th
Write a paper summarizing our work in this course
(to be developed over the course of the whole term).
We'll use LaTeX to write the paper. See the very nice LaTeX
template that Christos Faloutsos uses for his class projects
at the following URL:
www.cs.cmu.edu/~christos/courses/826.S01/proj.html
It should serve as a good guide.
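
The difference between set-valued and sequence-valued attributes noted in the June 26th item can be illustrated with a small Python sketch. The function names and the example values are hypothetical, chosen only for illustration; they are not taken from the ARMiner or from Chris Shoemaker's implementation.

```python
def contains_as_set(pattern, value):
    """True if every element of pattern occurs in value, in any order."""
    return set(pattern) <= set(value)


def contains_as_sequence(pattern, value):
    """True if pattern occurs in value as a subsequence (not necessarily
    contiguous), i.e. its elements appear in the same relative order."""
    remaining = iter(value)
    # 'item in remaining' advances the iterator past the first match,
    # so each later item must be found after the earlier ones.
    return all(item in remaining for item in pattern)


# Hypothetical attribute value: an ordered list of observed levels.
value = ["low", "high", "medium"]

print(contains_as_set(["medium", "high"], value))       # order is ignored
print(contains_as_sequence(["medium", "high"], value))  # order is respected
print(contains_as_sequence(["low", "medium"], value))   # gaps are allowed
```

Under the set view the pattern ["medium", "high"] matches the value, while under the sequence view it does not, because "high" never appears after "medium".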
-----
Schedule/Topics:
- For June 19th: written summary of the 3 papers.
- Describe the type of patterns that we want to
mine from Keith's dataset (system performance
data). Those patterns will
most likely correlate static parameters with
dynamic (time series) parameters. Perhaps one
of the uses of those association rules will be
to recommend settings/machine-configurations
that would be appropriate for specific tasks.
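
As a rough sketch of what such a pattern could look like, the Python code below computes the support and confidence of a rule whose left-hand side is a static parameter setting and whose right-hand side is a subsequence of a discretized time series. The record layout, the attribute names ("memory", "load"), and all values are made up for illustration and say nothing about the actual contents of Keith's dataset.

```python
def is_subsequence(pattern, sequence):
    """True if pattern appears in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def rule_stats(records, static_condition, pattern):
    """Return (support, confidence) of the rule
    static_condition => the 'load' series contains pattern."""
    matches_static = [
        r for r in records
        if all(r["static"].get(k) == v for k, v in static_condition.items())
    ]
    matches_both = [
        r for r in matches_static if is_subsequence(pattern, r["load"])
    ]
    # Support: fraction of all records satisfying both sides of the rule.
    support = len(matches_both) / len(records) if records else 0.0
    # Confidence: fraction of records matching the left-hand side
    # that also match the right-hand side.
    confidence = (
        len(matches_both) / len(matches_static) if matches_static else 0.0
    )
    return support, confidence


# Hypothetical records: one static parameter plus a discretized time series.
records = [
    {"static": {"memory": "small"}, "load": ["low", "high", "high"]},
    {"static": {"memory": "small"}, "load": ["low", "low", "high"]},
    {"static": {"memory": "large"}, "load": ["low", "low", "low"]},
    {"static": {"memory": "small"}, "load": ["high", "low", "low"]},
]

support, confidence = rule_stats(records, {"memory": "small"}, ["low", "high"])
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule with high confidence in this sense is the kind of evidence that might eventually back a configuration recommendation.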