Project 1: Concept Learning, Ordering
Exercise 2.4
-
Consider the instance space consisting of integer points in
the x, y plane and the set of hypotheses H consisting
of rectangles.
More precisely, hypotheses are of the form
a <= x <= b,
c <= y <= d,
where a, b, c, and d can be any integers.
-
Consider the version space with respect to the set of
positive ( + ) and negative ( - ) training examples shown below.
What is the S boundary of the version space in this case?
Write out the hypotheses and draw them in on the diagram.
Please see attached sheet for drawing.
S:
{< 4 <= x <= 6,
3 <= y <= 5 >}
This assumes that a rectangle is at minimum 1 x 1.
It also assumes that a hypothesis is generalized by decreasing
a or c, or by increasing b or d, and specialized by the reverse.
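For illustration, S can be computed directly: it is the tightest
rectangle enclosing the positive examples. Below is a minimal Python
sketch; since the actual training examples are on the attached sheet,
the points used here are hypothetical ones chosen to be consistent
with the S boundary above.

    def s_boundary(positives):
        # Tightest rectangle a <= x <= b, c <= y <= d covering all positives.
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        return (min(xs), max(xs), min(ys), max(ys))

    # Hypothetical positive examples consistent with the S boundary above.
    print(s_boundary([(4, 3), (6, 5)]))  # -> (4, 6, 3, 5)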
-
What is the G boundary of this version space? Write out
the hypotheses and draw them in.
Please see attached sheet for drawing.
G:
{< 3 <= x <= 8,
2 <= y <= 7 > }
Note: my original answer was missing
{ < 2 <= x <= 8,
2 <= y <= 5 > }
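Because hypotheses are integer rectangles, the version space can also
be enumerated by brute force on a small grid, with S and G recovered
as its minimal and maximal members under the more-general-than
ordering. A sketch, again with hypothetical example points (the real
ones are on the attached sheet); the grid bounds stand in for "any
integer":

    from itertools import product

    def contains(h, p):
        a, b, c, d = h
        return a <= p[0] <= b and c <= p[1] <= d

    def consistent(h, positives, negatives):
        return (all(contains(h, p) for p in positives)
                and not any(contains(h, p) for p in negatives))

    def more_general(h1, h2):
        # h1's rectangle contains h2's rectangle.
        return (h1[0] <= h2[0] and h1[1] >= h2[1]
                and h1[2] <= h2[2] and h1[3] >= h2[3])

    def version_space(positives, negatives, lo=0, hi=10):
        vs = [h for h in product(range(lo, hi + 1), repeat=4)
              if h[0] <= h[1] and h[2] <= h[3]
              and consistent(h, positives, negatives)]
        # S: most specific members; G: most general members.
        s = [h for h in vs if not any(h2 != h and more_general(h, h2) for h2 in vs)]
        g = [h for h in vs if not any(h2 != h and more_general(h2, h) for h2 in vs)]
        return s, g

    # Hypothetical examples; substitute the points from the attached sheet.
    s, g = version_space([(4, 3), (6, 5)], [(2, 1), (9, 8)])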
-
Suppose the learner may now suggest a new x, y instance
and ask
the trainer for its classification.
Suggest a query guaranteed to reduce the size of the version space,
regardless of how the trainer classifies it. Suggest one that
will not.
The learner could request ( 7, 4 ) for classification. In fact,
any point in ( 4 <= x <= 7, y = 6 ) or ( x = 7, 3 <= y <= 6 )
will work, because the points along these two lines lie
between the version space bounds identified by S and G.
Since S and G must eventually converge on one hypothesis,
such a query forces S to generalize (if the trainer labels it
positive) or G to specialize (if negative).
Selecting ( 5, 4 ), ( 4, 5 ), ( 5, 5 ), ( 6, 3 ), or ( 6, 4 )
will not reduce the version space.
Since these points are already included by S, every hypothesis
in the version space classifies them as positive, so there will
be no change in the space. This is a result of the bias imposed
by the hypothesis representation.
Note: this only works with my original answer to the previous problem.
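This test can also be written down directly: a query is guaranteed
to shrink the version space exactly when the hypotheses in it
disagree on the point, i.e., the point lies outside some member of S
but inside some member of G. A small Python sketch, using the S and
G from my original answers above:

    def contains(h, p):
        a, b, c, d = h
        return a <= p[0] <= b and c <= p[1] <= d

    def informative(point, S, G):
        # All hypotheses say + iff the point is inside every S member;
        # all say - iff it is outside every G member. Disagreement
        # guarantees some hypotheses are eliminated either way.
        all_pos = all(contains(h, point) for h in S)
        all_neg = not any(contains(h, point) for h in G)
        return not (all_pos or all_neg)

    S = [(4, 6, 3, 5)]
    G = [(3, 8, 2, 7)]
    print(informative((7, 4), S, G))  # True: outside S but inside G
    print(informative((5, 4), S, G))  # False: already covered by S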
-
Now assume you are a teacher, attempting to teach a particular
target concept
(e.g., 3 <= x <= 5, 2 <= y <= 9 ).
What is the smallest number of training examples you can provide so
that the
Candidate-Elimination algorithm will perfectly
learn the target concept?
For the target concept to be learned exactly, G and S
must converge, hence both positive and negative examples must be given.
One pair of opposite corners of the concept rectangle should be given
as positive examples, forcing S to generalize to exactly the
target rectangle
(e.g., ( 3, 9 ) and ( 5, 2 ) ).
Negative examples must then specialize G until it equals S.
A corner point just outside the rectangle, such as ( 2, 1 ), is
not enough: it can be excluded by tightening either the x bound
or the y bound, so G branches into two minimal specializations
instead of converging. Each negative example can pin down only
one of the four bounds, so one negative is needed just beyond
each side of the rectangle, placed so that only a single minimal
specialization of G excludes it
(e.g., ( 2, 5 ), ( 6, 5 ), ( 4, 1 ), and ( 4, 10 ) ).
This requires that 6 training examples be given: 2 positive
and 4 negative.
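This convergence can be checked with the brute-force version_space()
sketch from the earlier part (the function and the 0..12 grid bound
are that sketch's assumptions): the two corner positives plus one
mid-side negative per edge collapse S and G to the single target
hypothesis.

    # Reuses version_space() from the brute-force sketch above.
    s, g = version_space([(3, 9), (5, 2)],
                         [(2, 5), (6, 5), (4, 1), (4, 10)],
                         lo=0, hi=12)
    assert s == g == [(3, 5, 2, 9)]  # converged on the target concept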