Notes On Page 114
Overfitting is mentioned here.
This made me consider the nature of the data.
Mainly, we have a very small data set.
So how do we take advantage of as much information as possible?
Isn't this exactly what overfitting does? The term overfitting
describes a model that fits its training data so closely that
its accuracy on unseen data declines. What if we could control
overfitting by concentrating on the differences between patients
rather than the similarities?
Notes On Page 151
The author does not appear to cite any work on algorithms that
detect outliers or exceptions. Since the goal is often to find
the few patients who could benefit from a surgical procedure,
this might be worth looking at.
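To make the outlier idea concrete for myself, here is a minimal
sketch of one simple detector, the modified z-score based on the
median absolute deviation. This is my own choice of technique,
not something from the text. The attribute name and the values
are made up, and the 3.5 cutoff is just a common convention.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that a few extreme
    # values cannot inflate the way they inflate a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical measurements of one numeric attribute across patients.
tumor_size_cm = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 7.8, 2.1]
flagged = mad_outliers(tumor_size_cm)
print(flagged)
```

This only looks at one attribute at a time. Finding patients who
are unusual across several attributes at once would need a
multivariate method.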
"Analysis of the associated confusion matrices show that
prediction dominates for the majority classes and 'IPMN - Benign
or CiS' while under-predicting the remaining values."
I wonder if accuracy could be increased if the change in these
attributes over time were accounted for. It would involve a lot
of data collection. It would seem there may not be enough
pre-diagnosis data for pancreatic cancer patients. Could data
from other cancer patients be used to make up for this lack?
Notes On Page 176
For predicting survival, naive Bayes and Bayesian nets performed
better than logistic regression on data sets A and B according
to t-tests, which the author claims is a notable result.
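To remind myself how such a comparison might work, here is a
rough sketch of a paired t-test over per-fold accuracies. I am
assuming a paired test across ten cross-validation folds; the
notes above do not say which t-test variant was used. The
accuracy numbers are invented for illustration and are not the
author's results.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b of equal length.

    Raises ZeroDivisionError if all pairwise differences are equal.
    """
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error

# Hypothetical per-fold accuracies (10 folds), not taken from the source.
naive_bayes = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.67, 0.75, 0.70]
logistic = [0.66, 0.65, 0.70, 0.69, 0.64, 0.70, 0.69, 0.66, 0.70, 0.67]

t = paired_t_statistic(naive_bayes, logistic)

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262
significant = abs(t) > T_CRITICAL_DF9
print(round(t, 2), significant)
```

Pairing by fold matters because both models are scored on the
same test folds, so the per-fold differences are what carry the
signal.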
Notes On Page 242
Instance clustering and mining association rules were not
considered. Let's do so now.
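As a first step on association rules, here is a small sketch of
the two basic measures, support and confidence, for a rule of
the form "antecedent implies consequent". The patient records
and attribute names below are hypothetical placeholders, not
fields from the actual database.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical patients, each described as a set of present attributes.
patients = [
    {"jaundice", "weight_loss", "malignant"},
    {"jaundice", "malignant"},
    {"weight_loss"},
    {"jaundice", "weight_loss", "malignant"},
    {"jaundice"},
]

# Rule under consideration: jaundice -> malignant
rule_support = support(patients, {"jaundice", "malignant"})
rule_confidence = confidence(patients, {"jaundice"}, {"malignant"})
print(rule_support, rule_confidence)
```

A full miner such as Apriori would enumerate candidate item sets
and keep only rules above chosen support and confidence
thresholds. This sketch only scores one rule picked by hand.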
I expect to be adding more patients to the database.
The author suggests using over-sampling techniques to emphasize
the importance of correctly representing minority classes. What
if we built models that each predicted only one value of an
attribute? That is, the input would be the entire instance, but
the model would predict just yes or no for a single value of the
attribute rather than which value the instance has.
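The single-value idea resembles what is usually called
one-vs-rest classification. Here is a minimal sketch of the
label transformation only, with no classifier attached. The
function name is my own, and apart from "IPMN - Benign or CiS",
which appears in the quote on page 151, the class values are
made up.

```python
def one_vs_rest_targets(labels):
    """Map each distinct class value to a yes/no target per instance."""
    return {value: [label == value for label in labels]
            for value in sorted(set(labels))}

# Hypothetical diagnosis column; the instances themselves stay unchanged.
diagnoses = [
    "Adenocarcinoma",
    "IPMN - Benign or CiS",
    "Adenocarcinoma",
    "Other",
    "Adenocarcinoma",
]

targets = one_vs_rest_targets(diagnoses)
for value, yes_no in targets.items():
    print(value, yes_no)
```

Each yes/no list would then serve as the target for its own
binary model trained on the full instances. Rare values produce
heavily imbalanced targets, so over-sampling may still be needed
for each binary problem.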
Notes On Page 245
Here starts a nice glossary of terms I expect to reference
often until I understand the domain's nomenclature.