|
|
Project 1: Concept Learning, Ordering
Exercise 2.5
-
Consider the following sequence of positive and negative training
examples describing the concept "pairs of people who live in the
same house." Each training example describes an ordered
pair of people, with each person described by their sex, hair
color (black, brown, or blonde), height (tall, medium, or short), and
nationality (US, French, German, Irish, Indian, Japanese, or
Portuguese).
Training Examples
|
1
|
+
|
<
< male brown tall US >
< female black short US >
>
|
2
|
+
|
<
< male brown short French >
< female black short US >
>
|
3
|
-
|
<
< female brown tall German >
< female black short Indian >
>
|
4
|
+
|
<
< male brown tall Irish >
< female brown short Irish >
>
|
Consider a hypothesis space defined over these instances, in which
each hypothesis is represented by a pair of 4-tuples, and where
each attribute constraint may be a specific value,
"?," or "Ø," just as in the EnjoySport hypothesis
representation. For example, the hypothesis
<
< male ? tall ? >
< female ? ? Japanese >
>
represents the set of all pairs of people where the first is a
tall male (of any nationality and hair color), and the second
is a Japanese female (of any hair color and height).
-
Provide a hand trace of the
Candidate-Elimination
algorithm learning from the above training examples and
hypothesis language. In particular, show the specific and
general boundaries of the version space after it has processed
the first training example, then the second training
example, etc.
Initial State
S0:
{
<
< Ø Ø Ø Ø >
< Ø Ø Ø Ø >
>
}
|
G0:
{
<
< ? ? ? ? >
< ? ? ? ? >
>
}
|
Training Example #1
S1:
{
<
< male brown tall US >
< female black short US >
>
}
|
G1:
{
<
< ? ? ? ? >
< ? ? ? ? >
>
}
|
Training Example #2
S2:
{
<
< male brown ? ? >
< female black short US >
>
}
|
G2:
{
<
< ? ? ? ? >
< ? ? ? ? >
>
}
|
Training Example #3
S3:
{
<
< male brown ? ? >
< female black short US >
>
}
|
G3:
{
<
< male ? ? ? >
< ? ? ? ? >
>
<
< ? ? ? ? >
< ? ? ? US >
>
}
|
Training Example #4
S4:
{
<
< male brown ? ? >
< female ? short ? >
>
}
|
G4:
{
<
< male ? ? ? >
< ? ? ? ? >
>
}
|
-
How many distinct hypotheses from the given hypothesis space are
consistent with the following single positive training example?
+ << male black short Portuguese >
< female blonde tall Indian >>
Each hypothesis consistent with the above example can have either
the specified value seen above or "?" for each attribute.
That makes 2 per attribute. There are 8 attributes.
2 8 = 256
-
Assume the learner has encountered only the positive example from part
(b), and that it is now allowed to query the trainer by
generating any instance and asking the trainer to classify it.
-
Give a specific sequence of queries that assures the learner
will converge to the single correct hypothesis, whatever it may
be (assuming that the target concept is describable within the given
hypothesis language).
Give the shortest sequence of queries you can find.
Let's concentrate on S. S is currently:
S 1:
<
< male black short Portuguese >
< female blonde tall Indian >
>
Query 2 (considering the first example as # 1)
<
< female brown medium US >
< male black medium US >
>
S 2:
<
< (? or male) (? or black) (? or short) (? or Portuguese) >
< (? or female) (? or blonde) (? or tall) (? or Indian) >
>
Please note that the sex attribute in the hypothesis is determined.
There are no more values it could assume to force S to
generalize that attribute, if it hasn't already.
Query 3
<
< done blonde tall French >
< done brown short French >
>
S 3:
<
< done (? or black) (? or short) (? or Portuguese) >
< done (? or blonde) (? or tall) (? or Indian) >
>
Please note that the hair color and height attributes in the
hypothesis are determined.
There are no more values they could assume to force S to
generalize those attributes, if they haven't already.
Query 4
<
< done done done German >
< done done done German >
>
This goes on until all the nationalities for each person has
been tried (3 more queries). This makes for total of 7 queries.
-
How does the length of this sequence relate to your answer to
question (b)?
For each query the hypothesis space was reduced by half. That is
except for the second query where both the hair color attribute
and the height attribute were guaranteed convergence. This
is consistent with the 2 8 calculation for the number
of total hypotheses consistent with the original training example,
or any positive training example for that matter.
Please note that while G was not examined closely, since
the attributes were generalized to "?" and converged with G
or took on the value from the first training example and the
inconsistent hypotheses in G were removed and replaced by
those more specific. This eventually leads to S and
G converging.
-
Note that this hypothesis language cannot express all concepts that
can be defined over the instances (i.e., we can define sets of
positive and negative examples for which there is no corresponding
describable hypothesis). If we were to enrich the language so
that it could express all concepts that can be defined
over the instance language, then how would you answer to
(c) change?
Instead of checking just 2 possible values for attributes in the
hypotheses we would have to check for every combination of values
over all the possible values in the instance space. There would be
a lot more work, that's for sure...
Note: I should really put exactly how many here.
|
|