2007-01-29
----------
Notes on Page 71 (first page of paper)
Overall it seems we should try this (DPDR) on our data.
Add to TODO list:
Find (or write) an implementation for Weka.
Ensure the SVM in Weka uses an RBF kernel
(or another kernel that relies only on Euclidean distance
or cosine similarity).
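Sketch to remind myself why the kernel choice matters: the RBF
kernel depends on two instances only through the Euclidean distance
between them, so any transformation that preserves distances leaves
the kernel value unchanged. The 2-D rotation below is only a
stand-in for a distance-preserving map (it is not DPDR itself), and
the function names and the gamma value are my own assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2); gamma=0.5 is an arbitrary choice."""
    return math.exp(-gamma * euclidean(a, b) ** 2)

def rotate(p, theta):
    """Rotate a 2-D point; used only as a simple distance-preserving map."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Two made-up instances and their rotated counterparts.
a, b = (1.0, 2.0), (3.0, -1.0)
ra, rb = rotate(a, 0.7), rotate(b, 0.7)

k_original = rbf_kernel(a, b)
k_rotated = rbf_kernel(ra, rb)
print(math.isclose(k_original, k_rotated, rel_tol=1e-9))
```

A kernel that does not reduce to distances or cosine similarities
would not come with this guarantee, which is why the TODO item asks
to check the Weka SVM settings.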
This paper deals only with feature extraction methods as a
way to reduce the dimensionality of data. It is not about
attribute selection.
Notes on Page 72
The authors refer to "powerful" classifiers such as SVMs but
do not define "powerful".
Notes on Page 73
The data used in this study is largely similar to our own
pancreatic patient data. Both have very few instances and a
large number of attributes. In fact, the data used in this
study has far more attributes per instance than our own.
One thing to note when viewing the results is that these data
sets deal with gene expression levels, which are known to be
strong indicating factors for the target attribute. I am
currently not as certain that our data is complete with respect
to capturing all the factors determining our target attributes.
One interesting point is that for all the feature extraction
methods used, the target dimensionality is a parameter, with the
exception of DPDR. All of them except MDS have the number of
instances as one of the values tried, even though this certainly
looks possible for MDS as well. I wonder why the authors chose
not to have a common point of reference. I also did not see any
mention of using the number of dimensions that DPDR produces as
a value for this parameter.
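If we rerun this comparison on our own data, one way to get a
common point of reference would be to give every method the same
list of target dimensionalities. This is only a sketch of that
idea: the helper name, the powers-of-two spacing, and the example
sizes (40 instances, 5000 attributes, 39 as a stand-in for whatever
dimensionality DPDR produces) are all my own assumptions, not
values from the paper.

```python
def shared_dimension_grid(n_instances, n_attributes, extra=()):
    """Return a sorted list of target dimensionalities to try for every method.

    Hypothetical helper: powers of two below min(n_instances, n_attributes),
    that upper bound itself, and any extra values (for example the
    dimensionality a parameter-free method happened to produce).
    """
    upper = min(n_instances, n_attributes)
    grid = set()
    d = 2
    while d < upper:
        grid.add(d)
        d *= 2
    grid.add(upper)
    # Keep only extra values that are valid dimensionalities.
    grid.update(e for e in extra if 1 <= e <= n_attributes)
    return sorted(grid)

# Made-up sizes, roughly "few instances, many attributes".
grid = shared_dimension_grid(n_instances=40, n_attributes=5000, extra=[39])
print(grid)
```

Every method that takes a target dimensionality would then be run
once per value in the list, so the results line up column by column.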
Notes on Page 74
The results shown in Table 3 are quite good. See the previous
note (page 73) on my confidence in our own pancreatic patient
data.