Hayward Thesis Notes

Keith's Ph.D. Research

Notes On Page 114 

The thesis mentions overfitting here. 
This made me consider the nature of the data. 
Mainly, we have a very small data set. 
So how do we take advantage of as much information as possible? 
Isn't this exactly what overfitting does? Overfitting usually 
describes a model that fits the training data so closely that 
its accuracy on new data declines. What if we could control 
overfitting by concentrating on the differences between patients 
rather than the similarities?

Notes On Page 151

The thesis does not appear to cite any work on algorithms that 
detect outliers, or exceptions. Since the goal is often to find 
the few patients who could benefit from a surgical procedure, 
this might be worth looking at.
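
As a starting point for the outlier idea, here is a minimal sketch 
of one simple detector, the modified z-score based on the median 
absolute deviation. It is not a method from the thesis. The 
function name, the 3.5 threshold, and the sample values are all my 
own assumptions for illustration.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    The threshold of 3.5 is a commonly quoted default, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Made-up values for one numeric attribute across eight patients.
measurements = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 6.8, 2.1]
print(mad_outliers(measurements))
```

The median and MAD are used instead of the mean and standard 
deviation because, with a very small data set, a single extreme 
value can distort the mean enough to hide itself.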

"Analysis of the associated confusion matrices show that 
prediction dominates for the majority classes and 'IPMN - Benign 
or CiS' while under-predicting the remaining values."

I wonder if accuracy could be increased if the change in these 
attributes over time were accounted for. It would involve a lot 
of data collection. It would seem there may not be enough 
pre-diagnosis data for pancreatic cancer patients. Could data 
from other cancer patients be used to make up for this lack?
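
The confusion matrix observation quoted above can be checked with 
a few lines of code. This is a rough sketch only: the class names 
and counts below are invented placeholders, not figures from the 
thesis, and rows are assumed to be actual classes with columns as 
predicted classes.

```python
def per_class_recall(matrix, labels):
    """Fraction of each actual class that was predicted correctly."""
    recalls = {}
    for i, label in enumerate(labels):
        actual_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / actual_total if actual_total else 0.0
    return recalls

def prediction_bias(matrix, labels):
    """Predicted count minus actual count per class.

    Positive means the class is over-predicted, negative under-predicted.
    """
    bias = {}
    for j, label in enumerate(labels):
        predicted_total = sum(row[j] for row in matrix)
        actual_total = sum(matrix[j])
        bias[label] = predicted_total - actual_total
    return bias

# Hypothetical three-class example (rows = actual, columns = predicted).
labels = ["majority", "minority_1", "minority_2"]
matrix = [
    [40, 2, 1],
    [8, 3, 1],
    [6, 1, 2],
]
print(per_class_recall(matrix, labels))
print(prediction_bias(matrix, labels))
```

Overall accuracy would hide this pattern, which is why looking at 
each class separately seems worthwhile.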

Notes On Page 176

For predicting survival, naive Bayes and Bayesian nets performed 
better than logistic regression on data sets A and B, as judged 
by t-testing, which the author claims is a notable result.
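
To make the comparison concrete for myself, here is a minimal 
sketch of a paired t-statistic over matched accuracy scores. I am 
assuming a paired test over matched runs (for example 
cross-validation folds); these notes do not record which t-test 
variant the thesis used. The scores are invented.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t-statistic for the mean difference between paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation, needs at least 2 pairs
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))

# Invented accuracies for two classifiers on the same five folds.
classifier_a = [0.70, 0.72, 0.68, 0.74, 0.71]
classifier_b = [0.66, 0.70, 0.67, 0.69, 0.70]
t = paired_t_statistic(classifier_a, classifier_b)
print(round(t, 3))
```

The statistic is then compared against a t-distribution with 
n - 1 degrees of freedom, where n is the number of pairs.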

Notes On Page 242

Instance clustering and mining association rules were not 
considered. Let's do so now.

I expect to be adding more patients to the database.

The author suggests using over-sampling techniques to emphasize 
the importance of correctly representing minority classes. What 
if we built models that each predicted only one value of an 
attribute? That is, the input would be the entire instance, but 
the model would predict just yes or no for a single value of an 
attribute rather than predicting which value the instance has.
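
The one-value-per-model idea amounts to relabeling the data once 
per attribute value. Below is a minimal sketch of that relabeling 
step only, with no classifier attached. The function names, the 
attribute names, and the instance values are hypothetical, apart 
from 'IPMN - Benign or CiS', which is a class value quoted from 
the thesis. I also assume the target attribute is dropped from the 
input so the answer does not leak into the features.

```python
def binarize_target(instances, target, positive_value):
    """Return (features, label) pairs; label is True for positive_value."""
    pairs = []
    for inst in instances:
        # Everything except the target attribute becomes the model input.
        features = {k: v for k, v in inst.items() if k != target}
        pairs.append((features, inst[target] == positive_value))
    return pairs

def one_dataset_per_value(instances, target):
    """Build one yes/no data set for each distinct value of the target."""
    values = sorted({inst[target] for inst in instances})
    return {v: binarize_target(instances, target, v) for v in values}

# Hypothetical instances; "age" and "diagnosis" are placeholder names.
instances = [
    {"age": 61, "diagnosis": "IPMN - Benign or CiS"},
    {"age": 70, "diagnosis": "other_a"},
    {"age": 55, "diagnosis": "other_b"},
    {"age": 66, "diagnosis": "IPMN - Benign or CiS"},
]
datasets = one_dataset_per_value(instances, "diagnosis")
for value, pairs in datasets.items():
    print(value, [label for _, label in pairs])
```

Each resulting data set could then be given to a separate binary 
classifier, and over-sampling could be applied per data set to the 
"yes" instances when they are rare.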

Notes On Page 245

Here starts a nice glossary of terms I expect to reference 
often until I understand the domain's nomenclature.
