Preliminary Notes
The dataset we will be working with comes from the domain of computer system performance. In addition to providing the type of data we are interested in mining, this domain has several advantages.

Data is easily collected on any computer system.

Much of the observed behaviour is well understood and can be readily expressed as association rules. Example: as workload rises, CPU utilization rises (tentative; we do not yet have a description of the rules). Many simple and intuitive associations provide good base test cases for our algorithm, and more subtle relationships between attributes are easily tested directly. (Maybe an example here.)

All the time series data for a particular instance follows the same time line. This allows mining across multiple time series attributes at once. Just as the values in a single time series attribute might be related to each other, the values of several attributes at a particular time or time interval might be correlated. Example: as the number of processes running increases, throughput decreases. Notice that both attributes are time series, and the association between them may not be accurately described if each time series were generalized: comparing an average or other such summary of each series might not find the rule. These attributes must be compared relative to their common time line.

This common time line also simplifies comparing time series attributes within an instance (reference: Berndt & Clifford, Finding Patterns in Time Series: A Dynamic Programming Approach). Comparing time series attributes across instances, however, requires dealing effectively with time lines of different lengths.
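
The point about summaries versus the common time line can be illustrated with a small sketch. The data below is invented purely for illustration (it is not from our dataset), and the names instance_a, instance_b and pearson are hypothetical. Pearson correlation is used here only as a simple stand-in for an association measure; it is not the mining algorithm itself.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two series sampled at the same time points."""
    if len(xs) != len(ys):
        raise ValueError("both series must share the same time line")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no variation to correlate with.
        return 0.0
    return cov / (var_x * var_y) ** 0.5


# Two made-up instances, each with two attributes on a common time line.
# In instance_a throughput falls as the process count rises;
# in instance_b it rises instead.
instance_a = {"processes": [10, 20, 30, 40, 50],
              "throughput": [90, 80, 70, 60, 50]}
instance_b = {"processes": [10, 20, 30, 40, 50],
              "throughput": [50, 60, 70, 80, 90]}

for name, inst in (("a", instance_a), ("b", instance_b)):
    # Per-series summaries: identical for both instances.
    summary = (mean(inst["processes"]), mean(inst["throughput"]))
    # Point-by-point comparison along the shared time line.
    r = pearson(inst["processes"], inst["throughput"])
    print(name, summary, round(r, 3))
```

Both instances have the same averages for each attribute, so the summaries cannot tell them apart, while the comparison along the common time line yields correlations of opposite sign. This sketch assumes equal-length series; comparing across instances with different time line lengths would need an alignment step first.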