-------
Topics:
-------
1. Running Weka
1.1 Adding to the Weka GUI
1.2 Using the GUI
1.3 Using the Command Line
2. Data Sets Containing Time Sequence Attributes
2.1 Time Sequence Attributes
2.2 Filtering For Events
2.3 Event Weight
2.4 Tips and Tricks
3. Detailed Documentation
---------------
1. Running Weka
---------------
--------------------------
1.1 Adding to the Weka GUI
--------------------------
To include a new classifier, filter, association algorithm, etc. in
the Weka GUI, follow these steps.
1. If there is no GenericObjectEditor.props file in the same
directory as this Readme.txt file, copy the one found in
weka/gui/.
2. Add lines to the appropriate sections of this file. Examples
of such lines can be found in the
AddToGenericObjectEditor.props file, also located in the same
directory as this Readme.txt file.
3. If you are adding the WPI Weka add-on package to Weka, add
the lines found in AddToGenericObjectEditor.props to the
specified sections in the GenericObjectEditor.props file you
copied to this directory.
-----------------
1.2 Using the GUI
-----------------
To run Weka use the following command line from the directory containing
this Readme.txt file:
java -classpath . weka/gui/GUIChooser
If you run into "out of memory" problems, use the -Xmx flag as follows:
java -Xmx1024m -classpath . weka/gui/GUIChooser
The previous line sets the maximum heap size of the Java virtual
machine to 1024 MB. Choose the amount that works best for you.
To use the Apriori Sets And Sequences algorithm, first select the Weka
Explorer interface from the GUI Chooser window.
Under the Preprocess tab in the Explorer interface, select "Open File"
and browse to the data set you wish to use.
Note: it is possible to use data sources other than a file,
but I haven't used them, so you'd be on your own.
Switch to the Associate tab and click on the Associator algorithm
shown. This pops up a window showing the current algorithm and its
user-specified parameters.
Choose AprioriSetsAndSequences from the drop-down list.
Click the "More" button or hover the mouse pointer over a parameter
to get an explanation of each option.
Press "Start" and the system will begin mining for association
rules. The command line window from which you started Weka shows
status messages about the mining progress.
--------------------------
1.3 Using the Command Line
--------------------------
You can also run AprioriSetsAndSequences directly from the command
line. An example is shown below:
java -Xmx1024m -classpath . wpi/associations/AprioriSetsAndSequences -t perfdata-with-increase-decrease-events.arff
The full list of command line options is below. You can get a list
of these options by simply running AprioriSetsAndSequences with no
arguments.
Apriori Sets And Sequences options:
-t
The name of the training file.
-N
The required number of rules. (default = 10)
-C
The minimum confidence of a rule. (default = 0.9)
-D
The delta by which the minimum support is decreased in
each iteration. (default = 0.05)
-U
The upper bound for the minimum support. (default = 1.0)
-M
The lower bound for the minimum support. (default = 0.1)
-A
The set of attributes required to be in the antecedent.
-Y
The set of attributes required to be in the consequent.
-B
The maximum number of attributes in the antecedent. (0 = no maximum)
-E
The minimum number of attributes in the antecedent. (0 = no minimum)
-W
The maximum number of event items of the same type allowed in
a rule. (default = 2)
-X
The maximum number of attributes in the consequent. (0 = no maximum)
-Z
The minimum number of attributes in the consequent. (0 = no minimum)
-F
The file from which to load frequent item sets.
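When many runs with different settings are needed, the command line can be assembled by a small script. The following is a minimal sketch in Python, not part of the WPI package. The function name build_apriori_command and its parameters are hypothetical, and it assumes option values follow their flags as separate arguments, as is usual for Weka command lines. Only flags listed above are used.

```python
def build_apriori_command(training_file, num_rules=10, min_confidence=0.9,
                          heap="1024m", classpath="."):
    """Return the argument list for one AprioriSetsAndSequences run.

    Hypothetical helper: the defaults mirror the defaults listed in this
    Readme, and each value is passed as a separate argument after its flag
    (an assumption based on the usual Weka option style).
    """
    return [
        "java", f"-Xmx{heap}",
        "-classpath", classpath,
        "wpi/associations/AprioriSetsAndSequences",
        "-t", training_file,
        "-N", str(num_rules),
        "-C", str(min_confidence),
    ]


command = build_apriori_command(
    "perfdata-with-increase-decrease-events.arff",
    num_rules=20,
    min_confidence=0.8,
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) from the directory containing this Readme.txt file.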
------------------------------------------------
2. Data Sets Containing Time Sequence Attributes
------------------------------------------------
This section includes useful information for building and using
data sets containing time sequence attributes.
Please Note:
AprioriSetsAndSequences will run with normal data sets and data sets
containing time sequence events.
It will NOT handle data sets that contain numeric attributes.
It will NOT recognize an event attribute if the attribute name
contains a "=".
----------------------------
2.1 Time Sequence Attributes
----------------------------
In the ARFF file format that Weka uses, declare a time sequence
attribute as follows:
@attribute 'my-time-sequence-attribute' string
Note: "=" can NOT be used in the attribute name.
The values in a sequence are separated by colons ":". An example
value of a time sequence attribute is shown below:
0:1:2:3:4:5:6:7:8:9
The values should be numeric in most cases.
While it is possible to have symbolic sequences the current system
provides filters that recognize events for numeric sequences only.
If you wish to use data consisting of symbolic sequences you must
also provide the events or filters that recognize symbolic events.
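To check a data file before filtering, it can help to read a sequence value back into numbers. The following is a minimal sketch in Python, not part of the WPI package. The function name parse_time_sequence is hypothetical, and it assumes every value in the sequence is numeric.

```python
def parse_time_sequence(value):
    """Split a colon-separated sequence value into a list of floats.

    Hypothetical helper. Surrounding quotes, as used for ARFF string
    values, are dropped first. A symbolic (non-numeric) value raises
    ValueError.
    """
    text = value.strip().strip("'\"")
    if not text:
        return []
    return [float(item) for item in text.split(":")]


print(parse_time_sequence("0:1:2:3:4:5:6:7:8:9"))
```

A ValueError from this function is a quick sign that a sequence is symbolic and will not be handled by the numeric event filters.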
You must transform time sequences into Events in order to use the
Apriori Sets And Sequences algorithm. More information on Events
can be found in the next section.
------------------------
2.2 Filtering For Events
------------------------
The WPI package that currently accompanies the Weka package
provides the utilities and functionality needed to work with
time sequences.
These include a set of filters that can be applied to data sets
containing numeric time sequence attributes. These filters detect
events (Increase, Decrease, and Sustain), which are simply
predetermined sequence patterns.
A new attribute is created for each combination of event type and
time sequence attribute.
A time sequence event attribute is declared in arff as follows:
@attribute 'my-time-sequence-attribute-my-events' string
The begin time and end time of each event are noted,
separated by a colon ":", and put into a set. The events in the set
are separated by carets "^". An example follows:
'{1:3^4:6^7:8}'
The event attribute value above contains three events of the
"my-events" type. The events begin and end at time 1 and 3, 4 and
6, and 7 and 8, respectively.
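The event set format described above can be read back with a few lines of code. The following is a minimal sketch in Python, not part of the WPI package. The function name parse_event_set is hypothetical, and it assumes begin and end times are integers, as in the example value. Treating "{}" as an empty set is also an assumption.

```python
def parse_event_set(value):
    """Turn a value such as '{1:3^4:6^7:8}' into a list of (begin, end) pairs.

    Hypothetical helper. Assumes integer begin and end times and accepts
    "{}" as an empty set.
    """
    text = value.strip().strip("'\"").strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not an event set: {value!r}")
    body = text[1:-1]
    if not body:
        return []
    events = []
    for item in body.split("^"):
        begin, end = item.split(":")
        events.append((int(begin), int(end)))
    return events


print(parse_event_set("'{1:3^4:6^7:8}'"))  # [(1, 3), (4, 6), (7, 8)]
```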
The EventFilter is available from the Weka GUI and can be used
like any other Weka filter. Open the data file you wish to filter
in the Weka GUI, select the EventFilter from the filter drop-down
list, specify the options you want to use, and add the filter to
the list of filters Weka will apply to the data set. Then press
"Apply Filters".
To apply a filter to your data set using the command line:
java -classpath . wpi/filters/EventFilter -i perfdata.arff -o perfdata-increase-decrease-events.arff -I -D
The above command specifies the input file as perfdata.arff and the
output file as perfdata-increase-decrease-events.arff. -I and -D
specify the events to search for. A complete list of options is below.
Filter options:
-A
Specify the attributes to find events in.
-I
Find increase events.
-D
Find decrease events.
-S
Find sustain events.
-T
Specifies the tolerance for including values in events.
-N
Specifies the minimum required number of values in events.
General options:
-h
Get help on available options.
(use -b -h for help on batch mode.)
-i
The name of the file containing input instances.
If not supplied then instances will be read from stdin.
-o
The name of the file output instances will be written to.
If not supplied then instances will be written to stdout.
-c
The number of the attribute to use as the class.
"first" and "last" are also valid entries.
If not supplied then no class is assigned.
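To give a feel for what an increase event might look like, here is a minimal sketch in Python. It is not the EventFilter implementation, and the details are assumptions: an increase event is taken to be a maximal run of strictly rising consecutive values, begin and end times are zero-based positions in the sequence, and min_values plays the role of the -N option. The -T tolerance is not modeled because this Readme does not define it. The function names are hypothetical.

```python
def find_increase_events(values, min_values=2):
    """Return (begin, end) positions of maximal strictly increasing runs.

    Assumed definition, for illustration only: positions are zero-based
    and runs shorter than min_values are dropped.
    """
    events = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the sequence or when the value stops rising.
        if i == len(values) or values[i] <= values[i - 1]:
            if i - start >= min_values:
                events.append((start, i - 1))
            start = i
    return events


def format_event_set(events):
    """Write (begin, end) pairs in the begin:end^begin:end set notation."""
    return "{" + "^".join(f"{begin}:{end}" for begin, end in events) + "}"


events = find_increase_events([1, 2, 3, 2, 4, 5, 5, 1])
print(format_event_set(events))  # {0:2^3:5}
```

Decrease and sustain events could be sketched the same way by changing the comparison between neighboring values.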
If your time sequence attributes are not numeric, or the existing
types of events do not fit your needs, you can create your own
filters or provide the event attributes directly in the data set
you use with Weka. If you provide your own filter, you can add it to
the Weka GUI by modifying the GenericObjectEditor.props file contained
in the same directory as this Readme.txt file. Also add your entry
to the AddToGenericObjectEditor.props file to make upgrading to new
Weka versions easier.
----------------
2.3 Event Weight
----------------
In addition to support and confidence, a new rule metric has been
added. Event weight is the number of times all the events in a rule
appear in the data set. This can be more than the number of instances
in the data set, since a single instance can contain several events
of the same type. If both the antecedent and the consequent of the
rule have events, then the event weight is used to calculate confidence.
-------------------
2.4 Tips and Tricks
-------------------
-Remove sequences from the data set before mining for association
rules.
Since the events are used for the actual mining process and
there is very little chance of entire sequences being used in a
rule it saves a lot of mining time to simply remove the
sequences from the data set before mining.
-The events you use are very specific to your domain.
If the events you use have no meaning in your data set's domain
then the rules will have just as much meaning.
-------------------------
3. Detailed Documentation
-------------------------
Detailed documentation on all the WPI Weka code can be found
in the docs directory.
-------------------------------------------------------------------------
This Readme.txt file brought to you by Keith A. Pray. Feel free to
send questions to kap@wpi.edu if you encounter any problems running
AprioriSetsAndSequences or find this Readme.txt file lacking.
Good luck,
Keith