Project 1: Concept Learning, Ordering

Exercise 2.2

- Give the sequence of S and G boundary sets computed by the
  Candidate-Elimination algorithm if it is given the sequence of
  training examples from Table 2.1 in reverse order.
Table 2.1: Training Examples

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
--------|-------|---------|----------|--------|-------|----------|-----------
   1    | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
   2    | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
   3    | Rainy | Cold    | High     | Strong | Warm  | Change   | No
   4    | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
Initial State
S0: {< ∅, ∅, ∅, ∅, ∅, ∅ >}
G0: {< ?, ?, ?, ?, ?, ? >}

Training Example #4 (positive)
S1: {< Sunny, Warm, High, Strong, Cool, Change >}
G1: {< ?, ?, ?, ?, ?, ? >}

Training Example #3 (negative)
S2: {< Sunny, Warm, High, Strong, Cool, Change >}
G2: {< Sunny, ?, ?, ?, ?, ? >
     < ?, Warm, ?, ?, ?, ? >
     < ?, ?, ?, ?, Cool, ? >}

Training Example #2 (positive)
S3: {< Sunny, Warm, High, Strong, ?, ? >}
G3: {< Sunny, ?, ?, ?, ?, ? >
     < ?, Warm, ?, ?, ?, ? >}

Training Example #1 (positive)
S4: {< Sunny, Warm, ?, Strong, ?, ? >}
G4: {< Sunny, ?, ?, ?, ?, ? >
     < ?, Warm, ?, ?, ?, ? >}
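The trace can be reproduced with the following minimal Python sketch
(not part of the original exercise; the helper names such as
candidate_elimination and generalize are our own). The attribute domains
are assumed to contain only the values appearing in Table 2.1; Mitchell's
space also allows Sky = Cloudy, but that changes nothing here, because
such specializations are pruned against S.

    # Candidate-Elimination for the EnjoySport hypothesis space.
    # A hypothesis is a 6-tuple whose entries are an attribute value,
    # '?' (any value acceptable), or None (no value acceptable).

    def matches(h, x):
        """True if hypothesis h covers instance x."""
        return all(a == '?' or a == b for a, b in zip(h, x))

    def more_general(h1, h2):
        """True if h1 is more general than or equal to h2."""
        return all(a == '?' or a == b for a, b in zip(h1, h2))

    def generalize(s, x):
        """Minimally generalize s to cover the positive instance x."""
        return tuple(b if a is None else (a if a == b else '?')
                     for a, b in zip(s, x))

    def specializations(g, x, domains):
        """Minimal specializations of g that exclude negative instance x."""
        return [g[:i] + (v,) + g[i + 1:]
                for i, a in enumerate(g) if a == '?'
                for v in domains[i] if v != x[i]]

    def candidate_elimination(examples, domains):
        """Return the (S, G) boundary sets at each step of processing."""
        n = len(domains)
        S, G = [(None,) * n], [('?',) * n]
        trace = [(S, G)]
        for x, positive in examples:
            if positive:
                G = [g for g in G if matches(g, x)]          # prune G
                S = [h for h in (generalize(s, x) for s in S)
                     if any(more_general(g, h) for g in G)]  # generalize S
            else:
                S = [s for s in S if not matches(s, x)]      # prune S
                G = [h for g in G                            # specialize G
                     for h in ([g] if not matches(g, x) else
                               [sp for sp in specializations(g, x, domains)
                                if any(more_general(sp, s) for s in S)])]
                G = [g for g in G                  # keep only maximal members
                     if not any(h != g and more_general(h, g) for h in G)]
            trace.append((S, G))
        return trace

    # Table 2.1 in its original order; label True = Yes, False = No.
    examples = [
        (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
        (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
        (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
        (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
    ]
    domains = [sorted({x[i] for x, _ in examples}) for i in range(6)]

    # Present the examples in reverse order, as in the trace above.
    for step, (S, G) in enumerate(candidate_elimination(examples[::-1],
                                                        domains)):
        print(f'S{step}: {S}')
        print(f'G{step}: {G}')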
- Although the final version space will be the same regardless of the
  sequence of examples (why?), the S and G sets computed at intermediate
  stages will, of course, depend on this sequence.

The final version space is the same because, by definition, it contains
exactly those hypotheses consistent with the set of training examples
used. Consistency with a set of examples does not depend on the order in
which they are presented, so there is only one set of hypotheses
consistent with any given set of training examples, and the algorithm
must arrive at that one set regardless of the order.
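Reusing the candidate_elimination sketch above, this can be checked
directly: the forward and reversed presentations end with identical
boundary sets, and hence the same version space.

    S_fwd, G_fwd = candidate_elimination(examples, domains)[-1]
    S_rev, G_rev = candidate_elimination(examples[::-1], domains)[-1]
    assert set(S_fwd) == set(S_rev)   # same final S boundary
    assert set(G_fwd) == set(G_rev)   # same final G boundary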
- Can you come up with ideas for ordering the training examples to
  minimize the sum of the sizes of these intermediate S and G sets for
  the H used in the EnjoySport example?

The sum for the reverse order traced above is 14, counting the initial
state. Since every state contributes at least one hypothesis to S and one
to G, the minimum this sum can be for any set of training examples is
2 * (#examples + 1). For our four examples, the minimum would be
2 * 5 = 10.
Since the hypothesis representation in use can only generalize an
attribute to ?, there can be only one hypothesis in S at any one time.
Hypotheses are never added to S, because the hypothesis space is
restricted to conjunctive expressions; in other words, S cannot be
generalized with an or condition. Consider an attribute A with possible
values b, c, and d. If b and c were consistent with all positive
training examples but d was not, the hypothesis in S must represent A as
? instead of splitting into two hypotheses, one specifying b and the
other c. This is the inductive bias of the representation, which is what
allows it to classify instances not previously encountered.
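A tiny illustration of this collapse, using the generalize helper from
the sketch above on a hypothetical one-attribute space with values b, c,
and d:

    # After positives 'b' and 'c', S must hold ('?',), which necessarily
    # also covers the unseen value 'd'.
    s = (None,)
    for x in [('b',), ('c',)]:
        s = generalize(s, x)
    print(s)   # -> ('?',)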
This leads to considering how to minimize G. Hypotheses are only ever
added to G when a negative example is presented, as stated in the
algorithm [1]; a positive example can only remove members of G.
Therefore, a general strategy is to present all positive training
examples before any negative ones.
Following this strategy (for instance, presenting examples 1, 2, and 4
before example 3), we arrive at a sum of 11. This is the minimum for
this set of training examples, since in the final state G contains 2
hypotheses, one more than the absolute minimum calculated above.
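Again reusing the sketch above, the two orderings can be compared
directly:

    def boundary_sum(order):
        return sum(len(S) + len(G)
                   for S, G in candidate_elimination(order, domains))

    print(boundary_sum(examples[::-1]))                       # -> 14
    print(boundary_sum([examples[i] for i in (0, 1, 3, 2)]))  # -> 11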
[1] Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997, Table 2.5,
p. 33.