Apriori Sets And Sequences - Keith's MS Thesis

-------
Topics:
-------

1. Running Weka

   1.1 Adding to the Weka GUI

   1.2 Using the GUI

   1.3 Using the Command Line

2. Data Sets Containing Time Sequence Attributes

   2.1 Time Sequence Attributes

   2.2 Filtering For Events

   2.3 Event Weight

   2.4 Tips and Tricks

3. Detailed Documentation

---------------
1. Running Weka
---------------

   --------------------------
   1.1 Adding to the Weka GUI
   --------------------------

   To include a new classifier, filter, association algorithm, etc. in
   the Weka GUI, follow these steps:

      1. If there is no GenericObjectEditor.props file in the same
         directory as this Readme.txt file, copy the one found in
         weka/gui/.

      2. Add lines for the new component to the appropriate section of
         this file. An example of such lines can be found in the
         AddToGenericObjectEditor.props file, also located in the same
         directory as this Readme.txt file.

      3. If you are adding the WPI Weka add-on package to Weka, add
         the lines found in AddToGenericObjectEditor.props to the
         specified sections of the GenericObjectEditor.props file you
         copied to this directory.

   -----------------
   1.2 Using the GUI
   -----------------

   To run Weka, use the following command line from the directory
   containing this Readme.txt file:

      java -classpath . weka/gui/GUIChooser

   If you run into "out of memory" problems, use the -Xmx flag as follows:

      java -Xmx1024m -classpath . weka/gui/GUIChooser

   The previous line sets the maximum heap size of the Java virtual
   machine to 1024 MB. Choose the amount that works best for you.

   To use the Apriori Sets And Sequences algorithm, first select the Weka
   Explorer interface from the GUI Chooser window.

   Under the Preprocess tab in the Explorer interface, select "Open File"
   and browse to the data set you wish to use.

       Note: it is possible to use data sources other than a file,
       but I haven't used them, so you'd be on your own.

   Switch to the Associate tab and click on the Associator algorithm
   shown. This will pop up a window showing the current algorithm and
   its user-specified parameters.

   Choose AprioriSetsAndSequences from the drop down list.
   By clicking the "More" button or hovering the mouse pointer
   over a parameter, you can get an explanation of each option.

   Press "Start" and the system will begin mining for association
   rules. Status messages about the mining progress appear in the
   command line window from which you started Weka.

   --------------------------
   1.3 Using the Command Line
   --------------------------

   You can also run AprioriSetsAndSequences directly from the command
   line. An example is shown below:

      java -Xmx1024m -classpath . wpi/associations/AprioriSetsAndSequences -t perfdata-with-increase-decrease-events.arff

   The full list of command line options is below. You can get a list
   of these options by simply running AprioriSetsAndSequences with no
   arguments.

   Apriori Sets And Sequences options:

   -t 
        The name of the training file.
   -N 
        The required number of rules. (default = 10)
   -C 
        The minimum confidence of a rule. (default = 0.9)
   -D 
        The delta by which the minimum support is decreased in
        each iteration. (default = 0.05)
   -U 
        The upper bound for the minimum support. (default = 1.0)
   -M 
        The lower bound for the minimum support. (default = 0.1)
   -A 
        The set of attributes required to be in the antecedent.
   -Y 
        The set of attributes required to be in the consequent.
   -B 
        The maximum number of attributes in the antecedent. (0 = no maximum)
   -E 
        The minimum number of attributes in the antecedent. (0 = no minimum)
   -W 
        The maximum number of event items of the same type allowed in
        a rule. (default = 2)
   -X 
        The maximum number of attributes in the consequent. (0 = no maximum)
   -Z 
        The minimum number of attributes in the consequent. (0 = no minimum)
   -F 
        The file from which to load frequent item sets.
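
   The command line above can also be assembled from a script, which
   is handy when trying several settings. The following is only a
   sketch: the helper name build_asas_command and its parameters are
   made up for illustration, and it assumes that -N and -C each take
   their value as the next argument.

```python
import shlex


def build_asas_command(training_file, num_rules=10, min_confidence=0.9,
                       max_heap="1024m"):
    """Assemble the argument list for one AprioriSetsAndSequences run.

    Hypothetical helper: only the java invocation and the -t, -N and -C
    flags come from this Readme; everything else is illustrative.
    """
    return [
        "java", f"-Xmx{max_heap}", "-classpath", ".",
        "wpi/associations/AprioriSetsAndSequences",
        "-t", training_file,
        "-N", str(num_rules),
        "-C", str(min_confidence),
    ]


command = build_asas_command("perfdata-with-increase-decrease-events.arff",
                             num_rules=20, min_confidence=0.8)
# Print the command so it can be checked before it is run.
print(shlex.join(command))
# To launch it from Python: import subprocess; subprocess.run(command, check=True)
```

   Run the script from the directory containing this Readme.txt file
   so that "-classpath ." resolves the same way as in the example
   above.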

------------------------------------------------
2. Data Sets Containing Time Sequence Attributes
------------------------------------------------

This section includes useful information for building and using
data sets containing time sequence attributes.

Please Note: 

   AprioriSetsAndSequences will run with normal data sets and data sets 
   containing time sequence events. 

   It will NOT handle data sets that contain numeric attributes.

   It will NOT recognize an event attribute if the attribute name
   contains a "=".

   ----------------------------
   2.1 Time Sequence Attributes
   ----------------------------

   In the ARFF file format that Weka uses, declare a time sequence
   attribute as follows:

      @attribute 'my-time-sequence-attribute' string   

      Note: "=" can NOT be used in the attribute name.

   Values in a sequence are separated by colons ":". An example
   value of a time sequence attribute is shown below:

      0:1:2:3:4:5:6:7:8:9

   The values should be numeric in most cases.
   While it is possible to have symbolic sequences, the current system
   provides filters that recognize events in numeric sequences only.
   If you wish to use data consisting of symbolic sequences, you must
   also provide the events or filters that recognize symbolic events.
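
   If you need to inspect or generate sequence values outside of Weka,
   the colon-separated format is easy to handle. A minimal sketch,
   assuming numeric values; the function name parse_time_sequence is
   made up for illustration and is not part of the WPI package:

```python
def parse_time_sequence(value):
    """Split a colon-separated time sequence value into a list of floats.

    Assumes every value is numeric; surrounding single quotes, if any,
    are dropped first.
    """
    return [float(v) for v in value.strip().strip("'").split(":")]


sequence = parse_time_sequence("0:1:2:3:4:5:6:7:8:9")
print(len(sequence), sequence[0], sequence[-1])
```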

   You must transform time sequences into Events in order to use the
   Apriori Sets And Sequences algorithm. More information on Events 
   can be found in the next section. 

   ------------------------
   2.2 Filtering For Events
   ------------------------

   The WPI package that currently accompanies the Weka package
   provides the utilities and functionality needed to work with
   time sequences.

   This includes a set of filters that can be applied to data sets
   containing numeric time sequence attributes. These filters detect
   events (Increase, Decrease, and Sustain) which are simply
   predetermined sequence patterns.
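
   To give a feel for what such a pattern looks like, here is a rough
   sketch of finding Increase events. It is not the EventFilter code.
   It assumes that an Increase event is a maximal run of strictly
   increasing values, that positions are counted from zero, and that
   min_values plays the role of the -N filter option. The tolerance
   option (-T) is left out because its exact meaning is not described
   in this file.

```python
def find_increase_events(values, min_values=2):
    """Return (begin, end) positions of maximal strictly increasing runs.

    Illustrative only: the real EventFilter may define events differently.
    """
    events = []
    begin = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the sequence or where the next
        # value fails to increase.
        if i == len(values) or values[i] <= values[i - 1]:
            if i - begin >= min_values:
                events.append((begin, i - 1))
            begin = i
    return events


print(find_increase_events([1, 2, 3, 3, 2, 4, 5, 1]))
```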

   A new attribute is created for each event type and time sequence
   attribute. 

   A time sequence event attribute is declared in arff as follows:

      @attribute 'my-time-sequence-attribute-my-events' string        

   The begin time and end time of each event are recorded,
   separated by a colon ":", and put into a set. Events in the set
   are separated by a caret "^". An example follows:

      '{1:3^4:6^7:8}'

   The event attribute value above contains three events of the
   "my-events" type. The events begin and end at time 1 and 3, 4 and
   6, and 7 and 8, respectively.
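
   If you provide your own event attributes, you will need to read and
   write this format yourself. A minimal sketch, assuming integer
   begin and end times as in the example above; the function names are
   made up for illustration, and treating '{}' as an empty set is an
   assumption:

```python
def parse_event_set(value):
    """Parse a value such as '{1:3^4:6^7:8}' into (begin, end) pairs."""
    body = value.strip().strip("'").strip("{}")
    if not body:
        # Assumption: an empty set of events is written as '{}'.
        return []
    events = []
    for item in body.split("^"):
        begin, end = item.split(":")
        events.append((int(begin), int(end)))
    return events


def format_event_set(events):
    """Write (begin, end) pairs back in the quoted event set format."""
    return "'{" + "^".join(f"{b}:{e}" for b, e in events) + "}'"


events = parse_event_set("'{1:3^4:6^7:8}'")
print(events)
print(format_event_set(events))
```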

   The EventFilter is available from the Weka GUI and can be used
   like any other Weka filter. Open the data file you wish to filter
   in the Weka GUI, select the EventFilter from the filter drop down
   list, specify the options you want to use, and add the filter to
   the list of filters Weka will apply to the data set. Press "Apply
   Filters".

   To apply a filter to your data set using the command line:

      java -classpath . wpi/filters/EventFilter -i perfdata.arff -o perfdata-increase-decrease-events.arff -I -D

   The above command specifies the input file as perfdata.arff and the
   output file as perfdata-increase-decrease-events.arff. -I and -D
   specify the events to search for. A complete list of options is below.

   Filter options:

   -A 
        Specify the attributes to find events in.
   -I
        Find increase events.
   -D
        Find decrease events.
   -S
        Find sustain events.
   -T 
        Specifies the tolerance for including values in events.
   -N 
        Specifies the minimum required number of values in events.

   General options:

   -h
        Get help on available options.
        (use -b -h for help on batch mode.)
   -i 
        The name of the file containing input instances.
        If not supplied then instances will be read from stdin.
   -o 
        The name of the file output instances will be written to.
        If not supplied then instances will be written to stdout.
   -c 
        The number of the attribute to use as the class.
        "first" and "last" are also valid entries.
        If not supplied then no class is assigned.
      
   If your time sequence attributes are not numeric or the existing
   types of events do not fit your needs, you can create your own
   filters or simply provide the event attributes in the data set you
   use with Weka. If you provide your own filter, you can add it to the
   Weka GUI by modifying the GenericObjectEditor.props file contained
   in the same directory as this Readme.txt file. Also add your entry
   to the AddToGenericObjectEditor.props file to make upgrading to new
   Weka versions easier.

   ----------------
   2.3 Event Weight
   ----------------

   In addition to support and confidence, a new rule metric has been
   added. Event weight is the number of times all the events in a rule
   appear in the data set. This can be more than the number of
   instances in the data set. If both the antecedent and the consequent
   of the rule have events, then the event weight is used to calculate
   confidence.
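
   The reason the count can exceed the number of instances is that one
   instance can hold several events of the same type. The sketch below
   only illustrates that point for a single event attribute. It is not
   how AprioriSetsAndSequences computes event weight, and the three
   instance values are made up.

```python
def count_events(value):
    """Count the events in one event attribute value, e.g. '{1:3^4:6}'."""
    body = value.strip().strip("'").strip("{}")
    return len(body.split("^")) if body else 0


# Hypothetical values of one event attribute in three instances.
instances = ["'{1:3^4:6^7:8}'", "'{2:5}'", "'{}'"]
total_events = sum(count_events(v) for v in instances)
print(len(instances), "instances,", total_events, "events")
```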

   -------------------
   2.4 Tips and Tricks
   -------------------

   -Remove sequences from the data set before mining for association
    rules.

       Since the events are used for the actual mining process and
       there is very little chance of entire sequences being used in a
       rule, it saves a lot of mining time to simply remove the
       sequences from the data set before mining.

   -The events you use are very specific to your domain.

      If the events you use have no meaning in your data set's domain,
      then the rules will have just as little meaning.

-------------------------
3. Detailed Documentation
-------------------------

Detailed documentation on all the WPI Weka code can be found
in the docs directory.

-------------------------------------------------------------------------
This Readme.txt file brought to you by Keith A. Pray. Feel free to
send questions to kap@wpi.edu if you encounter any problems running
AprioriSetsAndSequences or find this Readme.txt file lacking.

Good luck,
Keith

by: Keith A. Pray
Last Modified: July 4, 2004 7:35 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.
