Apriori Sets And Sequences - Keith's MS Thesis
Preliminary Notes


In order to design such an algorithm, we plan to follow the steps below.

  1. June 19th
    Learn about association rule mining.

    We'll start by reading the following two papers:

    • R. Agrawal, T. Imielinski, A. Swami. ``Mining Associations between Sets of Items in Massive Databases'', Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216.

    • R. Agrawal, R. Srikant: ``Fast Algorithms for Mining Association Rules'', Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. Expanded version available as IBM Research Report RJ9839, June 1994.

  2. June 19th
    Learn about sequence mining.

    We'll start by reading:

    • R. Srikant, R. Agrawal: ``Mining Sequential Patterns: Generalizations and Performance Improvements'', Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996. Expanded version available as IBM Research Report RJ 9994, December 1995.

  3. June 26th
    Look at Chris Shoemaker's approach to mining association rules over set-valued data. To some extent, the problem he solved is similar to ours, except that he considers attributes whose values can be sets. In our work those values can be sequences. That is, the order in which the elements appear matters.

    For this part we'll read Chris Shoemaker's MS thesis.

  4. July 3rd
    Learn about the ARMiner.

  5. July 13th
    Learn about mining temporal databases - look for papers and find out approaches.

  6. July 31st
    Write an algorithm to mine over "sequence-based attributes".

  7. Aug. 10th
    Implement the algorithm, reusing parts of the ARMiner and Chris' implementation of his algorithm.

  8. Aug. 20th
    Experimental evaluation of the algorithm (perhaps using "Keith's dataset"). Find relevant datasets and define an experimental protocol.

  9. Aug. 20th
    Write a paper summarizing our work in this course (to be developed over the course of the whole term).
    We'll use LaTeX to write the paper. Christos Faloutsos uses a very nice LaTeX template for his class projects, available at the following URL: www.cs.cmu.edu/~christos/courses/826.S01/proj.html It should serve as a good guide.
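
The difference between set-valued and sequence-valued attributes (steps 3 and 6) comes down to how support is counted. Below is a minimal sketch, assuming each record holds a single set or a single sequence of items. The function names and the toy records are hypothetical; they are not taken from the ARMiner or from Chris Shoemaker's implementation.

```python
def contains_set(record, pattern):
    """True if every item of pattern occurs in record; order is ignored."""
    return set(pattern) <= set(record)


def contains_sequence(record, pattern):
    """True if pattern occurs in record as a subsequence.

    Items must appear in the same order but need not be adjacent.
    """
    remaining = iter(record)
    # `item in remaining` advances the iterator up to the first match,
    # so each later item is only searched for after the previous one.
    return all(item in remaining for item in pattern)


def support(records, pattern, contains):
    """Fraction of records that contain pattern under the given test."""
    if not records:
        return 0.0
    return sum(1 for r in records if contains(r, pattern)) / len(records)


# Hypothetical toy data: four records over the items a, b, c.
records = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c"],
    ["c", "b", "a"],
]

set_support = support(records, ["a", "b"], contains_set)
seq_support = support(records, ["a", "b"], contains_sequence)
print(set_support, seq_support)
```

On this toy data the pattern (a, b) is supported by three of the four records when treated as a set, but by only one record when the order a-before-b is required. A pattern that is frequent as a set can therefore be infrequent as a sequence.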
-----
Schedule/Topics:

- For June 19th: written summary of the 3 papers.

- Describe the type of patterns that we want to mine from Keith's dataset (system performance data). Those patterns will most likely correlate static parameters with dynamic (time series) parameters. Perhaps one of the uses of those association rules will be to recommend which settings/machine configurations would be appropriate for specific tasks.
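
As an illustration of the kind of pattern described above, the sketch below computes support and confidence for a rule whose left-hand side is a static setting and whose right-hand side is an ordered pattern in a discretized time series. Everything here is an assumption made for illustration: the record layout, the `ram_mb` attribute, the "low"/"high" symbols and the function name are hypothetical and do not describe the actual dataset.

```python
def rule_stats(runs, antecedent, consequent):
    """Return (support, confidence) of `antecedent => consequent`.

    antecedent: dict of static attribute values that must all match.
    consequent: symbols that must occur, in this order, in the run's
                discretized time series (gaps are allowed).
    """
    def matches_static(run):
        return all(run["static"].get(k) == v for k, v in antecedent.items())

    def matches_dynamic(run):
        remaining = iter(run["series"])
        return all(symbol in remaining for symbol in consequent)

    n_antecedent = sum(1 for r in runs if matches_static(r))
    n_both = sum(1 for r in runs if matches_static(r) and matches_dynamic(r))
    support = n_both / len(runs) if runs else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical runs: one static setting plus a discretized load series each.
runs = [
    {"static": {"ram_mb": 256}, "series": ["low", "high", "high"]},
    {"static": {"ram_mb": 256}, "series": ["low", "low", "high"]},
    {"static": {"ram_mb": 512}, "series": ["low", "low", "low"]},
    {"static": {"ram_mb": 256}, "series": ["high", "low", "low"]},
]

# Rule: ram_mb = 256  =>  load goes from "low" to "high" at some point.
support, confidence = rule_stats(runs, {"ram_mb": 256}, ["low", "high"])
print(support, confidence)
```

Here two of the four runs satisfy both sides of the rule, and two of the three runs with `ram_mb = 256` show the low-then-high pattern. A rule with high confidence of this form is what could later back a configuration recommendation.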

by: Keith A. Pray
Last Modified: July 4, 2004 7:01 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.
