wpi.filters
Class AttributeSetFilter

java.lang.Object
  extended by weka.filters.Filter
      extended by wpi.filters.AttributeSetFilter
All Implemented Interfaces:
weka.core.OptionHandler, java.io.Serializable

public class AttributeSetFilter
extends weka.filters.Filter
implements weka.core.OptionHandler

This filter converts a string attribute into a set of yes/no nominal attributes. One new attribute is created for every unique entry found between delimiters in the string, and each new attribute is named 'attributeName-entryValue'.
(Note: A string attribute with no delimiter in any of its instances is not affected.)
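The conversion described above can be sketched in plain Java without Weka. This is an illustration only, not the filter's source: the class name SetExpansionSketch, the method names, and the use of lists of strings in place of weka.core.Instances are all assumptions, and entries are not trimmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical stand-in for the filter; it works on plain strings, not Weka instances.
class SetExpansionSketch {

    /** Splits one set-valued string; every character of delimiters acts as a delimiter. */
    static Set<String> entries(String value, String delimiters) {
        Set<String> result = new LinkedHashSet<>();
        StringTokenizer tokens = new StringTokenizer(value, delimiters);
        while (tokens.hasMoreTokens()) {
            result.add(tokens.nextToken());
        }
        return result;
    }

    /** Expands a column of set-valued strings into one yes/no column per unique entry. */
    static Map<String, List<String>> expand(String attributeName,
                                            List<String> values,
                                            String delimiters) {
        // First pass: collect the unique entries in order of first appearance.
        Set<String> unique = new LinkedHashSet<>();
        for (String value : values) {
            unique.addAll(entries(value, delimiters));
        }
        // Second pass: build one column per entry, named 'attributeName-entryValue'.
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String entry : unique) {
            List<String> column = new ArrayList<>();
            for (String value : values) {
                column.add(entries(value, delimiters).contains(entry) ? "yes" : "no");
            }
            columns.put(attributeName + "-" + entry, column);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red|blue", "blue", "green|red");
        System.out.println(expand("color", colors, "|"));
    }
}
```

The first pass has to see every value before any column can be built, which is consistent with the filter needing all instances before it produces output.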

Valid options are:

-D delimiters Specify the delimiters of the set. (default "|")
-N nominal_values Specify the nominal values used to classify an attribute. Valid values are "yesno", "truefalse", and "onezero". (default "yesno")
-R range Specify the range of filtering. e.g. 'first-last', or '1-2,6,8-last'. (default "all")
-I Use this flag to invert the range.

Version:
$Revision: 0.1 $
Author:
Takeshi Kawato (takeshi@wpi.edu)
See Also:
Serialized Form

Field Summary
protected  java.lang.String m_delimiters
          Stores the delimiter(s) used for separating entries in a set.
protected  int m_hashCapacity
          Stores the initial hash capacity for a hash used internally.
protected  weka.core.Instances m_InputFormat
          Stores the input format.
protected  java.lang.String m_nom_no
          Stores the nominal value for an instance without an entry.
protected  int m_nom_values
          The integer defining current nominal values.
protected  java.lang.String m_nom_yes
          Stores the nominal value for an instance with an entry.
protected  weka.core.Instances m_OutputFormat
          Stores the output format.
protected  java.lang.String m_range
          Stores the range of filtering as a string.
protected  boolean m_rangeInverse
          Stores whether the range of filtering should be inverted.
protected  weka.core.FastVector m_SetFormat
          Stores the set format, e.g. [SET][entry][entry][entry][NONSET][SET][entry].
protected  java.lang.String NONSET
          The string defining a non-set.
static int ONEZERO
          The static integer used to define nominal values of 1 and 0.
protected  java.lang.String SET
          The string defining a set.
static weka.core.Tag[] TAGS_NOMINAL_VALUES
          Define possible nominal values.
static int TRUEFALSE
          The static integer used to define nominal values of true and false.
static int YESNO
          The static integer used to define nominal values of yes and no.
 
Fields inherited from class weka.filters.Filter
m_NewBatch
 
Constructor Summary
AttributeSetFilter()
           
 
Method Summary
 boolean batchFinished()
          Signify that this batch of input to the filter is finished.
protected  void convertInstance(weka.core.Instance instance)
          Convert an instance to the output format.
protected  void convertInstanceFormat(weka.core.Instances instanceFormat)
          Convert an instance format to output instance format.
protected  java.util.Hashtable createHash(weka.core.Attribute attribute)
          Creates a hash of unique entries in a set attribute.
 java.lang.String delimitersTipText()
          Returns the tip text for this property.
protected  boolean entryIsInSet(java.lang.String entry, java.lang.String set)
          Tests whether the entry is in the given set.
 java.lang.String getDelimiters()
          Get the current delimiter(s).
 java.lang.String getInsideBracket(char startBracket, char endBracket, java.lang.String text)
          Returns the string inside matching brackets.
 weka.core.SelectedTag getNominalValues()
          Get the current nominal values used to classify an attribute.
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 java.lang.String getRange()
          Get the current range of filtering.
 boolean getRangeInverse()
          Get whether the current range of filtering is to be inversed or not.
 java.lang.String globalInfo()
          Returns a string describing this filter
 boolean input(weka.core.Instance instance)
          Input an instance for filtering.
protected  boolean isAttributeSet(weka.core.Attribute attribute)
          Tests whether the attribute is a set.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String nominalValuesTipText()
          Returns the tip text for this property.
 java.lang.String rangeInverseTipText()
          Returns the tip text for this property.
 java.lang.String rangeTipText()
          Returns the tip text for this property.
 void setDelimiters(java.lang.String delimiters)
          Set the delimiters of the set.
 boolean setInputFormat(weka.core.Instances instanceInfo)
          Sets the format of the input instances.
 void setNominalValues(weka.core.SelectedTag nominal_values)
          Set the nominal values used to classify an attribute.
 void setOptions(java.lang.String[] options)
          Parses a given list of options controlling the behaviour of this object.
 void setRange(java.lang.String range)
          Set the range of filtering.
          Note: This filter ignores any non-set attribute (a numeric or nominal attribute, or a string attribute with no delimiter in any of its instances).
 void setRangeInverse(boolean rangeInverse)
          Inverts the range of filtering.
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YESNO

public static int YESNO
The static integer used to define nominal values of yes and no.


TRUEFALSE

public static int TRUEFALSE
The static integer used to define nominal values of true and false.


ONEZERO

public static int ONEZERO
The static integer used to define nominal values of 1 and 0.


m_nom_values

protected int m_nom_values
The integer defining current nominal values.


SET

protected java.lang.String SET
The string defining a set.


NONSET

protected java.lang.String NONSET
The string defining a non-set.


m_InputFormat

protected weka.core.Instances m_InputFormat
Stores the input format.


m_OutputFormat

protected weka.core.Instances m_OutputFormat
Stores the output format.


m_SetFormat

protected weka.core.FastVector m_SetFormat
Stores the set format.
e.g. [SET][entry][entry][entry][NONSET][SET][entry]
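One way to read the example above is as a flat vector in which each attribute contributes either a SET marker followed by its entries, or a single NONSET marker. The sketch below builds such a list under that reading. The marker strings, the class name SetFormatSketch, and the use of an empty list to mean "non-set attribute" are assumptions; the actual values of SET and NONSET are not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the flat [SET][entry]...[NONSET] layout.
class SetFormatSketch {
    static final String SET = "[SET]";       // assumed marker value
    static final String NONSET = "[NONSET]"; // assumed marker value

    /** An empty entry list stands for a non-set attribute in this sketch. */
    static List<String> buildSetFormat(List<List<String>> entriesPerAttribute) {
        List<String> format = new ArrayList<>();
        for (List<String> entries : entriesPerAttribute) {
            if (entries.isEmpty()) {
                format.add(NONSET);          // attribute is passed through unchanged
            } else {
                format.add(SET);             // marker, then the attribute's unique entries
                format.addAll(entries);
            }
        }
        return format;
    }

    public static void main(String[] args) {
        List<List<String>> attributes = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Collections.<String>emptyList(),
                Arrays.asList("d"));
        System.out.println(buildSetFormat(attributes));
    }
}
```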


m_nom_yes

protected java.lang.String m_nom_yes
Stores the nominal value for an instance with an entry.


m_nom_no

protected java.lang.String m_nom_no
Stores the nominal value for an instance without an entry.


m_delimiters

protected java.lang.String m_delimiters
Stores the delimiter(s) used for separating entries in a set.


m_range

protected java.lang.String m_range
Stores the range of filtering as a string.


m_rangeInverse

protected boolean m_rangeInverse
Stores whether the range of filtering should be inverted.


m_hashCapacity

protected int m_hashCapacity
Stores the initial hash capacity for a hash used internally.


TAGS_NOMINAL_VALUES

public static final weka.core.Tag[] TAGS_NOMINAL_VALUES
Define possible nominal values.

Constructor Detail

AttributeSetFilter

public AttributeSetFilter()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter

Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui.

setInputFormat

public boolean setInputFormat(weka.core.Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.

Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately.
Throws:
java.lang.Exception - if the inputFormat can't be set successfully.

input

public boolean input(weka.core.Instance instance)
              throws java.lang.Exception
Input an instance for filtering. This filter requires all instances be read before producing output.

Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.IllegalStateException - if no input format has been defined.
java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.

batchFinished

public boolean batchFinished()
                      throws java.lang.Exception
Signify that this batch of input to the filter is finished. Because this filter requires all instances before filtering, output() may now be called to retrieve the filtered instances.

Returns:
true if there are instances pending output.
Throws:
java.lang.NullPointerException - if no input structure has been defined.
java.lang.Exception - if there was a problem finishing the batch.

delimitersTipText

public java.lang.String delimitersTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setDelimiters

public void setDelimiters(java.lang.String delimiters)
                   throws java.lang.Exception
Set the delimiters of the set. Each character in the string is treated as a separate delimiter.

Throws:
java.lang.Exception - if an invalid delimiter is supplied.
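The "each character is a delimiter" rule matches how java.util.StringTokenizer treats its delimiter string, so a small sketch can show the effect. The class name DelimiterSketch is hypothetical, and whether the filter itself uses StringTokenizer is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper showing per-character delimiters.
class DelimiterSketch {

    /** Splits a set string; any single character of delimiters ends an entry. */
    static List<String> split(String set, String delimiters) {
        List<String> entries = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(set, delimiters);
        while (tokens.hasMoreTokens()) {
            entries.add(tokens.nextToken());
        }
        return entries;
    }

    public static void main(String[] args) {
        // ",;" means "split on comma OR semicolon", not on the two-character sequence.
        System.out.println(split("a,b;c", ",;")); // prints [a, b, c]
    }
}
```

Note that StringTokenizer skips empty tokens, so "a,,b" yields two entries in this sketch.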

getDelimiters

public java.lang.String getDelimiters()
Get the current delimiter(s).

Returns:
a string containing the current delimiter(s).

nominalValuesTipText

public java.lang.String nominalValuesTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setNominalValues

public void setNominalValues(weka.core.SelectedTag nominal_values)
Set the nominal values used to classify an attribute.

Parameters:
nominal_values - the nominal values used to classify an attribute.
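The three documented modes ("yesno", "truefalse", "onezero") each imply a pair of labels for "entry present" and "entry absent". The sketch below maps the option names to such pairs. The class name NominalLabelsSketch and the exact label spellings are assumptions; the real method takes a weka.core.SelectedTag rather than a string.

```java
// Hypothetical mapping from the -N option value to a {present, absent} label pair.
class NominalLabelsSketch {

    /** Returns {labelWhenEntryPresent, labelWhenEntryAbsent} for a mode name. */
    static String[] labelsFor(String mode) {
        switch (mode) {
            case "yesno":
                return new String[] {"yes", "no"};
            case "truefalse":
                return new String[] {"true", "false"};
            case "onezero":
                return new String[] {"1", "0"};
            default:
                throw new IllegalArgumentException("Unknown nominal values: " + mode);
        }
    }

    public static void main(String[] args) {
        String[] labels = labelsFor("truefalse");
        System.out.println(labels[0] + " / " + labels[1]);
    }
}
```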

getNominalValues

public weka.core.SelectedTag getNominalValues()
Get the current nominal values used to classify an attribute.

Returns:
the selected nominal values.

rangeTipText

public java.lang.String rangeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setRange

public void setRange(java.lang.String range)
              throws java.lang.Exception
Set the range of filtering.
Note: This filter ignores any non-set attribute (a numeric or nominal attribute, or a string attribute with no delimiter in any of its instances). A specific range is therefore needed only in unusual cases, such as a non-set string attribute in which some instances happen to contain a delimiter.

Parameters:
range - a string representing the range. Since the string will typically come from a user, attributes are indexed from 1.
e.g. 'first-last', or '1-2,6,8-last'.
Throws:
java.lang.Exception - if an invalid range is supplied
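To make the range syntax concrete, here is a minimal parser for strings such as '1-2,6,8-last', with an optional inversion step mirroring the -I flag. It is a sketch, not the filter's parsing code: the class name RangeSketch, the method names, and the choice to return 1-based indices are assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical range parser; indices are 1-based, as in the user-facing syntax.
class RangeSketch {

    /** Resolves 'first', 'last', or a number to a 1-based attribute index. */
    static int index(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) return 1;
        if (t.equals("last")) return numAttributes;
        int i = Integer.parseInt(t);
        if (i < 1 || i > numAttributes) {
            throw new IllegalArgumentException("Index out of bounds: " + t);
        }
        return i;
    }

    /** Returns the selected 1-based indices, or their complement if invert is true. */
    static Set<Integer> parse(String range, int numAttributes, boolean invert) {
        Set<Integer> selected = new TreeSet<>();
        for (String part : range.split(",")) {
            int dash = part.indexOf('-');
            int from = index(dash < 0 ? part : part.substring(0, dash), numAttributes);
            int to = index(dash < 0 ? part : part.substring(dash + 1), numAttributes);
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = from; i <= to; i++) {
                selected.add(i);
            }
        }
        if (!invert) {
            return selected;
        }
        // Inversion: keep every attribute index that was NOT selected.
        Set<Integer> complement = new TreeSet<>();
        for (int i = 1; i <= numAttributes; i++) {
            if (!selected.contains(i)) complement.add(i);
        }
        return complement;
    }

    public static void main(String[] args) {
        System.out.println(parse("1-2,6,8-last", 10, false)); // prints [1, 2, 6, 8, 9, 10]
        System.out.println(parse("1-2,6,8-last", 10, true));  // prints [3, 4, 5, 7]
    }
}
```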

getRange

public java.lang.String getRange()
Get the current range of filtering.

Returns:
a string representing the current range.

rangeInverseTipText

public java.lang.String rangeInverseTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setRangeInverse

public void setRangeInverse(boolean rangeInverse)
Inverts the range of filtering.

Parameters:
rangeInverse - true if the range is to be inverted, false otherwise.

getRangeInverse

public boolean getRangeInverse()
Get whether the current range of filtering is to be inverted.

Returns:
true if the range is to be inverted, false otherwise.

convertInstanceFormat

protected void convertInstanceFormat(weka.core.Instances instanceFormat)
Convert an instance format to output instance format.

Parameters:
instanceFormat - the instance format to convert.

convertInstance

protected void convertInstance(weka.core.Instance instance)
Convert an instance to the output format. The converted instance is added to the end of the output queue.

Parameters:
instance - the instance to convert.

isAttributeSet

protected boolean isAttributeSet(weka.core.Attribute attribute)
Tests whether the attribute is a set.

Parameters:
attribute - the attribute to test.
Returns:
true, if the attribute is a set. false, otherwise.

createHash

protected java.util.Hashtable createHash(weka.core.Attribute attribute)
Creates a hash of unique entries in a set attribute.

Parameters:
attribute - the attribute to create hash from.
Returns:
a Hashtable holding unique entries (key as an attribute name for the unique entry and value as an entry value itself).
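The key/value layout described above (generated attribute name as key, entry value as value) can be sketched as follows. The real method takes a weka.core.Attribute; this illustration assumes the attribute name and its string values are passed in directly, and the class name EntryHashSketch is hypothetical.

```java
import java.util.Hashtable;
import java.util.StringTokenizer;

// Hypothetical stand-in for createHash that works on plain strings.
class EntryHashSketch {

    /** Maps 'attributeName-entryValue' to entryValue for every unique entry. */
    static Hashtable<String, String> createHash(String attributeName,
                                                String[] values,
                                                String delimiters) {
        Hashtable<String, String> hash = new Hashtable<>();
        for (String value : values) {
            StringTokenizer tokens = new StringTokenizer(value, delimiters);
            while (tokens.hasMoreTokens()) {
                String entry = tokens.nextToken();
                // Repeated entries overwrite the same key, so each entry appears once.
                hash.put(attributeName + "-" + entry, entry);
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        Hashtable<String, String> hash =
                createHash("color", new String[] {"red|blue", "blue"}, "|");
        System.out.println(hash.size() + " unique entries");
    }
}
```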

entryIsInSet

protected boolean entryIsInSet(java.lang.String entry,
                               java.lang.String set)
Tests whether the entry is in the given set.

Parameters:
entry - the entry to test.
set - the set to test.
Returns:
true, if the entry is in the set. false, otherwise.
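A membership test of this kind has to compare whole entries rather than substrings, otherwise "red" would match inside "darkred". The sketch below shows one way to do that. It assumes whole-entry matching and takes the delimiters as a third parameter, whereas the real method has only two parameters; the class name MembershipSketch is hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical whole-entry membership test.
class MembershipSketch {

    /** Returns true only if entry equals one complete delimited token of set. */
    static boolean entryIsInSet(String entry, String set, String delimiters) {
        StringTokenizer tokens = new StringTokenizer(set, delimiters);
        while (tokens.hasMoreTokens()) {
            if (tokens.nextToken().equals(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(entryIsInSet("red", "red|blue", "|"));     // prints true
        System.out.println(entryIsInSet("red", "darkred|blue", "|")); // prints false
    }
}
```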

getInsideBracket

public java.lang.String getInsideBracket(char startBracket,
                                         char endBracket,
                                         java.lang.String text)
Returns the string inside matching brackets. This method assumes that the string starts with a starting bracket.

Parameters:
startBracket - the starting bracket
endBracket - the ending bracket
text - the text to examine
Returns:
the string inside matching brackets.
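A depth counter is one common way to find the bracket that matches the first one. The sketch below follows the stated assumption that the text starts with the starting bracket. Support for nested brackets, the requirement that the two bracket characters differ, and the exceptions thrown on malformed input are assumptions, as is the class name BracketSketch.

```java
// Hypothetical bracket matcher; startBracket and endBracket must be different characters.
class BracketSketch {

    /** Returns the text between the first bracket and its matching end bracket. */
    static String getInsideBracket(char startBracket, char endBracket, String text) {
        if (text.isEmpty() || text.charAt(0) != startBracket) {
            throw new IllegalArgumentException("Text must start with " + startBracket);
        }
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == startBracket) {
                depth++;                          // entering a (possibly nested) bracket
            } else if (c == endBracket) {
                depth--;                          // leaving one level
                if (depth == 0) {
                    return text.substring(1, i);  // matching bracket for position 0
                }
            }
        }
        throw new IllegalArgumentException("No matching " + endBracket);
    }

    public static void main(String[] args) {
        System.out.println(getInsideBracket('(', ')', "(a(b)c)d")); // prints a(b)c
    }
}
```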

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface weka.core.OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options controlling the behaviour of this object. Valid options are:

-D delimiters Specify the delimiters of the set. (default "|")
-N nominal_values Specify the nominal values used to classify an attribute. Valid values are "yesno", "truefalse", and "onezero". (default "yesno")
-R range Specify the range of filtering. e.g. 'first-last', or '1-2,6,8-last'. (default "all")
-I Use this flag to invert the range.

Specified by:
setOptions in interface weka.core.OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
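The option array passed to setOptions is a flat list of flags and values. The helper below assembles one from the four documented options. The class name OptionsSketch and the method buildOptions are hypothetical and not part of the filter; only the flag names -D, -N, -R and -I come from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles an option array in the documented flag format.
class OptionsSketch {

    static String[] buildOptions(String delimiters, String nominalValues,
                                 String range, boolean invert) {
        List<String> options = new ArrayList<>();
        options.add("-D");
        options.add(delimiters);     // e.g. "|"
        options.add("-N");
        options.add(nominalValues);  // "yesno", "truefalse", or "onezero"
        options.add("-R");
        options.add(range);          // e.g. "1-2,6,8-last"
        if (invert) {
            options.add("-I");       // flag only, no value
        }
        return options.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("|", "truefalse", "first-last", true);
        // With Weka and this filter on the classpath, the array would be handed to
        // AttributeSetFilter.setOptions(options).
        System.out.println(Arrays.toString(options));
    }
}
```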

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface weka.core.OptionHandler
Returns:
an array of strings suitable for passing to setOptions

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain arguments to the filter: use -h for help