Project 1: Concept Learning, Ordering


Exercise 2.2


  1. Give the sequence of S and G boundary sets computed by the Candidate-Elimination algorithm if it is given the sequence of training examples from Table 2.1 in reverse order.

    Table 2.1: Training Examples

    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
    2        Sunny  Warm     High      Strong  Warm   Same      Yes
    3        Rainy  Cold     High      Strong  Warm   Change    No
    4        Sunny  Warm     High      Strong  Cool   Change    Yes

    Initial State

    S0:
    {< Ø, Ø, Ø, Ø, Ø, Ø >}
    G0:
    {< ?, ?, ?, ?, ?, ? >}

    Training Example #4 (positive)

    S1:
    {< Sunny, Warm, High, Strong, Cool, Change >}
    G1:
    {< ?, ?, ?, ?, ?, ? >}

    Training Example #3 (negative)

    S2:
    {< Sunny, Warm, High, Strong, Cool, Change >}
    G2:
    {< Sunny, ?, ?, ?, ?, ? >, < ?, Warm, ?, ?, ?, ? >, < ?, ?, ?, ?, Cool, ? >}

    Training Example #2 (positive)

    S3:
    {< Sunny, Warm, High, Strong, ?, ? >}
    G3:
    {< Sunny, ?, ?, ?, ?, ? >, < ?, Warm, ?, ?, ?, ? >}

    Training Example #1 (positive)

    S4:
    {< Sunny, Warm, ?, Strong, ?, ? >}
    G4:
    {< Sunny, ?, ?, ?, ?, ? >, < ?, Warm, ?, ?, ?, ? >}
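
    The trace above can be reproduced mechanically. Below is a minimal
    Python sketch of the Candidate-Elimination algorithm written for this
    exercise; the function and variable names are my own, not from the
    text. It assumes the conjunctive hypothesis representation used
    above, encoding a hypothesis as a tuple whose entries are a specific
    attribute value, "?" (any value), or None (no acceptable value, i.e.
    the empty hypothesis Ø). It also omits the pruning of G members that
    are less general than other G members, since this data never
    triggers it.

        # Table 2.1 in reverse order (examples 4, 3, 2, 1).
        EXAMPLES = [
            (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
            (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
            (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
            (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        ]
        N = 6  # number of attributes

        def matches(h, x):
            """True if hypothesis h classifies instance x as positive."""
            return all(hv in ("?", xv) for hv, xv in zip(h, x))

        def generalize(s, x):
            """Minimal generalization of s covering positive instance x."""
            if all(v is None for v in s):  # the empty hypothesis
                return tuple(x)
            return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

        def specializations(g, x, s):
            """Minimal specializations of g that exclude negative instance
            x while staying more general than s, the single member of S."""
            return [g[:i] + (s[i],) + g[i + 1:]
                    for i in range(N)
                    if g[i] == "?" and s[i] not in ("?", x[i])]

        def candidate_elimination(examples):
            """Return the list of (S, G) boundary sets after each example,
            starting with the initial state."""
            S = [(None,) * N]
            G = [("?",) * N]
            trace = [(list(S), list(G))]
            for x, positive in examples:
                if positive:
                    G = [g for g in G if matches(g, x)]  # drop inconsistent g
                    S = [generalize(S[0], x)]
                else:
                    S = [s for s in S if not matches(s, x)]
                    G = [h for g in G
                         for h in ([g] if not matches(g, x)
                                   else specializations(g, x, S[0]))]
                trace.append((list(S), list(G)))
            return trace

        for S, G in candidate_elimination(EXAMPLES):
            print("S:", S, " G:", G)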


  2. Although the final version space will be the same regardless of the sequence of examples (why?), the sets S and G computed at intermediate stages will, of course, depend on this sequence.

    The final version space is the same because it contains exactly those hypotheses in H that are consistent with the full set of training examples. Since a given set of training examples determines exactly one such set of consistent hypotheses, the algorithm must arrive at that set regardless of the order in which the examples are presented.
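
    This can be checked directly with the Python sketch from part 1
    (reusing the candidate_elimination function and EXAMPLES list defined
    there): the final boundary sets agree for both presentation orders,
    even though the intermediate ones differ.

        # EXAMPLES is Table 2.1 in reverse order, so reversed(EXAMPLES)
        # restores the original order from the text.
        S_rev, G_rev = candidate_elimination(EXAMPLES)[-1]
        S_fwd, G_fwd = candidate_elimination(list(reversed(EXAMPLES)))[-1]
        assert set(S_rev) == set(S_fwd)  # same final S boundary
        assert set(G_rev) == set(G_fwd)  # same final G boundary
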
  3. Can you come up with ideas for ordering the training examples to minimize the sum of the sizes of these intermediate S and G sets for the H used in the EnjoySport example?

    Summing |S| + |G| over every stage of the trace above, including the initial state, gives 14. Since S and G must each contain at least one hypothesis at every stage, and there are ( # examples + 1 ) stages counting the initial state, the minimum this sum can be for any set of training examples is:

    2 * ( # examples + 1 )

    For our four training examples, the minimum would be 10. Since the hypothesis representation in use can only generalize an attribute with ?, there can be only one hypothesis in S at any one time. Hypotheses never get added to S because the representation is restricted to conjunctions of attribute constraints; S cannot be generalized by introducing an or condition. Consider an attribute A with possible values b, c, and d. If b and c were each consistent with all positive training examples but d was not, the single hypothesis in S would have to represent A as ? rather than splitting into two hypotheses, one specifying b and the other c. This is the inductive bias of the representation, and it is what allows classification of instances not previously encountered.

    This leaves minimizing G. Hypotheses are added to G only when a negative training example is presented, as stated in the algorithm.[1] A general strategy, therefore, is to present all positive training examples before any negative ones.

    Following this strategy, we arrive at a sum of 11, as the computation below confirms. This is the minimum for this particular set of training examples: the final version space is order-independent (part 2), its G boundary contains two hypotheses, and so the last stage must contribute at least 3 rather than 2, one more than assumed in the absolute minimum calculated above.
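
    Both sums can be confirmed with the sketch from part 1, again reusing
    the candidate_elimination function and EXAMPLES list assumed there.

        def size_sum(examples):
            """Sum |S| + |G| over all stages, including the initial state."""
            return sum(len(S) + len(G)
                       for S, G in candidate_elimination(examples))

        # Stable sort putting positive examples first, otherwise keeping
        # the original order.
        positives_first = sorted(EXAMPLES, key=lambda ex: not ex[1])

        print(size_sum(EXAMPLES))          # 14 (reverse order used above)
        print(size_sum(positives_first))   # 11 (positives before the negative)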

[1] Table 2.5, p. 33. Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.

 

by: Keith A. Pray