Calculating Confidence

Traditionally, confidence is defined for a rule A $\Rightarrow$ B as the percentage of instances containing A that also contain B. It is usually calculated as S(AB) / S(A), where S is support.
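As a minimal sketch of the support-based calculation, the following Python computes S(AB) / S(A) over a small made-up data set. The transactions and the function names support and confidence are hypothetical, chosen only for illustration, and support is treated here as a raw count of matching instances.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Support-based confidence S(AB) / S(A) for antecedent => consequent."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        raise ValueError("antecedent never occurs in the data set")
    s_ab = support(transactions, set(antecedent) | set(consequent))
    return s_ab / s_a


# Hypothetical data set: four instances, not taken from the source.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"bread", "milk", "eggs"},
    {"milk"},
]

# Three instances contain bread and two of those also contain milk.
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_confidence)
```

Because each instance is counted at most once, this measure cannot see how often a pattern occurs inside a single instance.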

Assume a data set that contains a single time sequence: {a:b:a:a:a:a}. Consider a rule A $\Rightarrow$ B where A and B contain one event item each, a(0:1) and b(2:3) respectively. The event item a begins at time 0 and ends at time 1. The event item b begins at time 2 and ends at time 3. Since the data set has one instance and that instance contains the itemset {A, B} as described, the support of each of the itemsets {A}, {B}, and {A, B} is 1. If support were used to calculate the confidence of the rule A $\Rightarrow$ B, the result would be 1. This implies that, in the data set from which the rule was mined, b follows a 100 percent of the time a appears. Looking at the time sequence, however, a appears five times and is followed by b only once, which is 20 percent of the time.

Instead of support, use event weight. The confidence of a rule that contains event items in both the antecedent and the consequent is defined here as EV(AB) / EV(A), where EV is event weight. The event weight of the itemset {A} is 5, of {B} is 1, and of {A, B} is 1, so the confidence for A $\Rightarrow$ B is 1/5 = 0.2, or 20 percent. This represents the data set more accurately than the support-based value of 100 percent.
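The two calculations can be compared on the example sequence with a short Python sketch. It rests on simplifying assumptions that are not in the text: event weight is taken to be the number of occurrences of a pattern within the sequence, and "a followed by b" is taken to mean that b is the very next element. The begin and end times a(0:1) and b(2:3) are not modeled, and all function names are hypothetical.

```python
# The single time sequence from the example.
sequence = ["a", "b", "a", "a", "a", "a"]
dataset = [sequence]


def occurrences(seq, item):
    """Assumed event weight of a single event item: its occurrence count."""
    return sum(1 for x in seq if x == item)


def occurrences_followed(seq, first, second):
    """Assumed event weight of the pair: times first is directly followed by second."""
    return sum(1 for x, y in zip(seq, seq[1:]) if x == first and y == second)


# Event weights: EV(A), EV(B), EV(AB).
ev_a = occurrences(sequence, "a")
ev_b = occurrences(sequence, "b")
ev_ab = occurrences_followed(sequence, "a", "b")
event_confidence = ev_ab / ev_a

# Supports: each sequence counts at most once, however often the pattern occurs.
s_a = sum(1 for s in dataset if occurrences(s, "a") > 0)
s_ab = sum(1 for s in dataset if occurrences_followed(s, "a", "b") > 0)
support_confidence = s_ab / s_a

print("event weights:", ev_a, ev_b, ev_ab)
print("confidence from event weight:", event_confidence)
print("confidence from support:", support_confidence)
```

Under these assumptions the event weights match the values given above (5, 1, and 1), and the two confidence values differ only because support counts the sequence once while event weight counts each occurrence.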

Keith A. Pray 2003-06-17