When we chose to undertake this Major Qualifying Project (MQP), we knew there would be many large hurdles to overcome, and we believe we have cleared most of them in the first term. One of the larger hurdles was obtaining the materials needed to complete the project. In this paper we describe the project, the tasks we have completed to date, and what we plan to do in the future.


      First, before beginning work on our project, we had to have a clear design for what our MQP would do. (This requirements specification was available in outline form on the WWW at www.acm.wpi.edu/~cue, but that page no longer exists.) Our goal is to write a piece of software with an easy-to-use interface that is still powerful enough to meet the needs of the user. The interface will look something like this:

      This layout keeps the program easy to use. The user selects a ball and a pocket for the current shot on the computer model; we then wait for the shot to be taken and critique it. The interface also provides an area where the user can see the exact video input the computer is seeing, which makes setup and troubleshooting easy. Leaving a space open gives us the ability to add more features as we, or the test users, find necessary. For example, we could provide a replay of the shot the user just took, a motivational tool, or a distraction tool for when the process of examining the shot takes longer than we would like.

      The other major requirement is that the user provide a video image of a pool table. Not everyone can place a video camera directly above a pool table, so we cannot assume that angle. The software has to be able to calibrate itself so that the computer model matches the actual pool table seen in the video image. By using distinguishing marks that we can assume appear on every pool table, we can calibrate the image successfully. The marks we could possibly use are those on the bumpers of the table and the two black dots placed on the table's surface. The only requirement on the video image is that the entire table be in the frame.
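      As a sketch of what that calibration gives us, the following fragment (ours, not project code) shows how a 3x3 perspective transformation matrix H, once recovered from the table's marks, maps an image pixel into table coordinates; the function name and the assumption of a ready-made H are ours.

```cpp
// Maps an image point (x, y) into table coordinates through a 3x3
// perspective transformation (homography) matrix H: in homogeneous
// coordinates, [tx, ty, w] = H * [x, y, 1], followed by the divide by w.
// Computing H from the calibration marks is omitted here.
void ImageToTable(const double H[3][3], double x, double y,
                  double& tableX, double& tableY)
{
    double tx = H[0][0] * x + H[0][1] * y + H[0][2];
    double ty = H[1][0] * x + H[1][1] * y + H[1][2];
    double w  = H[2][0] * x + H[2][1] * y + H[2][2];
    tableX = tx / w;   // the perspective divide straightens the image
    tableY = ty / w;
}
```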

      Next we had to develop a functional specification for the software package. Because there are five people on this team, and because this software has many integral parts, it is imperative that we all work from a clear functional specification. The decision was made to develop the software on a Windows NT® platform under Visual C++, with the front end using some Java.

      As mentioned above in the discussion of camera calibration, we cannot assume a specific camera placement, but we can assume that the camera remains in the same place throughout a game. Because of this, in the best case we only have to calibrate the image once per game. Calibration requires that the diamonds on the bumpers of the table be visible. From these diamonds and the two black points on the table, we can determine the perspective of the image and create a transformation matrix for it. After that, we can determine which parts of the image we need throughout the game and which parts are extraneous. Because we know the position of the balls before the start of a game (when calibration would be done), we can determine the color of the felt and create an initial color space.

      We also have to be able to detect each ball and the other objects on the table. Because we know the color space of the felt, we can mask it out, leaving just an image of the balls, pockets, and bumpers. We can identify a specific ball using the nearest-neighbor method, based on the location of each ball's color in the RGB color space. At that point we have a system that knows what every object on the table is. The next task is to detect any events that happen, that is, changes in the image. For example, whenever any ball on the table moves, that can be considered an event. We have to detect exactly what happens during an event and relay those changes to the computer model; knowing what happened lets us critique the shot if the desired result was not achieved. The software design for handling this mass of information can be seen in the following flow charts.

[Figure: software design flow charts]
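      To make the nearest-neighbor step concrete, here is a minimal sketch of the kind of classifier described above, assuming one reference RGB color per ball measured at calibration time; the names and the squared-Euclidean distance metric are our choices, not code from the project.

```cpp
// A minimal sketch of nearest-neighbor ball identification in RGB space.
struct RGB { double r, g, b; };

// One reference point per ball, measured at calibration time when the
// balls sit in known positions.
const int kNumBalls = 16;
RGB referenceColor[kNumBalls];   // filled in during calibration

// Squared Euclidean distance in RGB space (no square root is needed
// just to compare distances).
double Dist2(const RGB& a, const RGB& b) {
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Classifies an observed average color as the ball whose reference
// color is nearest in RGB space.
int ClassifyBall(const RGB& observed) {
    int best = 0;
    double bestDist = Dist2(observed, referenceColor[0]);
    for (int i = 1; i < kNumBalls; ++i) {
        double d = Dist2(observed, referenceColor[i]);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```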

      Throughout the first term of the project, we overcame many physical problems of dealing with a "real" system, and we also created some working code that manipulates images to serve our purpose. One of the first problems in tackling a project like this is having a pool table at our disposal for testing. When the project was in its idea stages last year, we knew this might be a problem. Our options included asking local billiards establishments to donate a used table they might be replacing, or to donate time in their establishment for us to do testing. There were logistical problems with these ideas: we would have to find a very local establishment that would let us use its facilities at a time convenient both to them and to five WPI students. That could have been an impossible task, so we decided to try to get a pool table on campus. Ideally we wanted a donated table, but we also explored having another campus organization purchase a table that we would use for a year and they would own after our project.

      We decided to explore the free option first, and to approach a local company, which might have a vested interest in students on campus, rather than a chain that couldn't care less. The only local company we could think of was Spencer Billiards. This looked like an ideal solution: they actually manufacture the tables, so their cost would be lower, and they are a family-owned local company. After attempts to find a connection between WPI and Spencer Billiards, time was of the essence, and we simply called them. We couldn't have been luckier. The owners of Spencer Billiards have a child who just started at WPI, and they love to help good causes. They generously donated a brand new pool table, which we placed in a lab on the third floor of Fuller Labs, giving us a permanent site for testing.

      This turned out to be an ideal situation. Within the group we have exclusive access to a video camera, and the CS department donated a machine for our exclusive use. Finally, we needed a device to get the video images into the computer. After looking on the web for the hardware we would need, we found a frame grabber made by Data Translation. This frame grabber takes an NTSC video signal and dumps frames of the video into memory. It looked promising because the package includes a software development kit, which would save us the time of dealing with the raw data as it came in off the frame grabber and instead simply present us with a stream of data. The frame grabber is part of a gift that WPI is looking to receive from Data Translation; the whereabouts of this gift are unknown at this time, and we hope to have the frame grabber soon.

      Now that we had almost everything we needed for the project, we began work on image manipulation. From previous CS classes, members of the group had code that worked with the PCX file format, and because of the format's ease of use, we chose it for our testing purposes. For this initial image manipulation code we used the on-campus UNIX systems for development, since we lacked a computer of our own at that point. Once we could handle file I/O on PCX images, we obtained some shots of a pool game we had videotaped and converted some frames of the video to PCX files. With still frames, we began to test our image manipulations.
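      For reference, a PCX file begins with the standard 128-byte ZSoft header; the sketch below shows a plausible way to read it in C++ and is not our project's actual code.

```cpp
// A sketch of the standard 128-byte PCX header and a reader for it.
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)            // the header is a packed on-disk layout
struct PCXHeader {
    uint8_t  manufacturer;       // always 0x0A for PCX
    uint8_t  version;
    uint8_t  encoding;           // 1 = run-length encoding
    uint8_t  bitsPerPixel;       // bits per pixel per plane
    uint16_t xMin, yMin, xMax, yMax;   // image window in pixels
    uint16_t hDpi, vDpi;
    uint8_t  egaPalette[48];
    uint8_t  reserved;
    uint8_t  numPlanes;          // 3 for 24-bit RGB
    uint16_t bytesPerLine;       // bytes per scanline per plane
    uint16_t paletteInfo;
    uint8_t  filler[58];         // pads the header to 128 bytes
};
#pragma pack(pop)

// Reads the header and reports the image dimensions.
bool ReadPCXHeader(FILE* fp, PCXHeader& hdr, int& width, int& height) {
    if (std::fread(&hdr, sizeof(hdr), 1, fp) != 1) return false;
    if (hdr.manufacturer != 0x0A) return false;   // not a PCX file
    width  = hdr.xMax - hdr.xMin + 1;
    height = hdr.yMax - hdr.yMin + 1;
    return true;
}
```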

[Figure: difference of two similar frames, showing noise]

      First we were curious about the amount of noise in the image that we would have to deal with. To see it, we captured two similar frames (using Adobe Premiere®): the same shot of the pool table, taken at different times. We then took the difference of those two frames; the image above shows the result. Black denotes no noise, and color denotes noise in that color channel. The result showed us that the images were not very noisy, so we will not concern ourselves with noise at this point.
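      A minimal sketch of this noise check, assuming interleaved 8-bit RGB frames of equal size, might look like the following; the function name is ours.

```cpp
// Subtracts two frames of the same still scene channel by channel and
// keeps the absolute difference, so identical pixels come out black and
// noisy channels show up in that channel's color.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

std::vector<uint8_t> FrameDifference(const std::vector<uint8_t>& a,
                                     const std::vector<uint8_t>& b)
{
    std::vector<uint8_t> diff(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Per-channel absolute difference of the two frames.
        diff[i] = static_cast<uint8_t>(std::abs(int(a[i]) - int(b[i])));
    }
    return diff;
}
```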

      Next we wrote code to perform manipulations on the actual images. One manipulation rotates an image 90 degrees: we read in a pixel of the original image and use a formula to determine where that pixel belongs in the rotated image. By creating this new data stream and changing the PCX header information, a new image is created. Next, we needed to detect the edges in an image. To determine the edges, the program originally divided the image into three color bands: red, green, and blue. To each color band we applied the two matrices

[the two 3x3 convolution masks for horizontal and vertical edges]

to find the horizontal and vertical edges in each band. Originally, the program then squared each horizontal and vertical edge value for every pixel in the three color bands, summed them, and took the square root, returning a single-band image showing where the edges are. This technique was later modified to return a color image of the edges. Unfortunately, it took roughly 35 seconds for a single 640 x 480 image on a lightly loaded Pentium 133, which is far too long for a single shot, so we found ways to dramatically speed up the process. The first step was to reduce the size of the image by taking advantage of the fact that each image is actually two fields; using only one field leaves effectively half as many pixels to process, halving the detection time. This was still too slow to be useful. Sacrificing generality for the sake of speed, the masks were hard-coded as additions instead of floating-point multiplications. In addition, rather than handling the borders of the image specially, which required multiple comparisons per pixel, the image was padded with 0's on the top and bottom. Several values that had been recalculated for every pixel were moved out of the loop and computed ahead of time. The series of four nested loops was replaced by one loop, which made the code slightly less readable but gave a speed increase sufficient to justify the loss. Finally, the squaring and square-rooting was replaced by simply taking the absolute value. This method is considerably quicker, and the data loss appears to be negligible. Current times with gcc running on the Pentium 133 are as follows:

 

Type of Image                      Time with Absolute Value   Time with Square Root
1 Frame (640 x 480 x 3)            4-5 seconds                6-8 seconds
1 Field (640 x 320 x 3)            2-3 seconds                3-4 seconds
Current Version (640 x 480 x 3)    1-2 seconds                N/A
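      To make the optimized detector concrete, here is a hedged sketch assuming common Sobel-style masks (the report's actual masks are not reproduced above) and interleaved 8-bit RGB data. For readability it keeps nested loops and skips the one-pixel border rather than using the padding and single fused loop described above; the x2 center taps are written as repeated additions, matching the multiplication-free rewrite.

```cpp
// Absolute-value edge detection over an interleaved 8-bit RGB buffer.
#include <cstdint>
#include <cstdlib>
#include <vector>

// Writes an edge-magnitude image (same layout as src) into dst.
void EdgeDetect(const std::vector<uint8_t>& src,
                std::vector<uint8_t>& dst,
                int width, int height)
{
    const int stride = width * 3;   // bytes per row
    dst.assign(src.size(), 0);

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            for (int c = 0; c < 3; ++c) {
                const int i = y * stride + x * 3 + c;
                // 3x3 neighborhood around the pixel, per channel:
                //   a b cc
                //   d .  f
                //   g h  k
                int a = src[i - stride - 3], b = src[i - stride], cc = src[i - stride + 3];
                int d = src[i - 3],                               f  = src[i + 3];
                int g = src[i + stride - 3], h = src[i + stride], k  = src[i + stride + 3];

                // Horizontal and vertical gradients using additions only.
                int gx = (cc + f + f + k) - (a + d + d + g);
                int gy = (g + h + h + k) - (a + b + b + cc);

                // Absolute values replace sqrt(gx^2 + gy^2).
                int mag = std::abs(gx) + std::abs(gy);
                dst[i] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
            }
        }
    }
}
```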

      Ideally, Visual C++ on Windows NT® may be able to optimize these times slightly; that, coupled with the faster 200 MHz processor, should reduce them a bit further. Even without any more performance improvements in the code, it is not unreasonable to expect one frame to take roughly three seconds to find all the edges using absolute values, which should be close enough to real time. An example of our edge detection is shown below.

[Figure: edge detection output]

      The image clearly shows the edges of the four balls, the pool table, the pockets, and the mark where the break is made. We also made test images of shots in motion, and while there is some motion blur, the edges are still visible.

      Finally, we wanted to be able to create our own kind of movie for replaying frames. Not only did we define what our movie would be, we also wanted to be able to change almost every aspect of the movie at any time. A movie standard was developed for our project: a movie consists of a movie header and a list of frames containing the sequential image data. The movie header holds information that spans all images in the movie (the image width, height, and color depth), along with pointers to the current, first, and last frames in the movie. Each frame contains the data for one image, a timestamp, and the frame number.

      The movie class has many methods for creating, manipulating, and deleting movies. There are methods for moving to the first, last, next, and previous frame in the movie, for returning the current frame as an image, and for adding a frame between two frames. There are also methods for returning the frame at time n or at frame number m. The idea is that the movie is completely independent: all changes to a movie are made by code within the movie. To return a frame, the method GetFrame can be used, which simply returns the current frame as an image. To walk through the frames, the methods GotoFirstFrame, GotoLastFrame, GotoNextFrame, and GotoPreviousFrame can be used; these simply set the current frame to the first, last, current+1, or current-1 frame. Since each frame contains a timestamp, a method exists to return the frame at timestamp N; this is done by stepping through the movie until the timestamp of the frame is just below N. Similarly, a method exists that returns the image at frame number M. To modify a movie, methods exist for inserting and deleting frames: InsertFrame inserts a frame between two other frames, and DeleteFrame removes a frame.
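      The following is an interface sketch of such a movie class. The method names GetFrame, GotoFirstFrame, GotoLastFrame, GotoNextFrame, GotoPreviousFrame, InsertFrame, and DeleteFrame come from the description above; the Frame and Image types, the lookup names GetFrameAtTime and GetFrameNumber, and the doubly-linked-list layout are our assumptions, not the project's actual code.

```cpp
struct Image;   // pixel data; details (and ownership) omitted in this sketch

struct Frame {
    Image* image;      // the image data for this frame
    long   timestamp;  // when this frame was captured
    int    number;     // sequential frame number
    Frame* prev;       // doubly linked so we can walk both directions
    Frame* next;
};

class Movie {
public:
    // Header information spanning all images in the movie.
    int width, height, colorDepth;

    // Navigation: each method simply re-points the current frame.
    void GotoFirstFrame()    { currentFrame = firstFrame; }
    void GotoLastFrame()     { currentFrame = lastFrame; }
    void GotoNextFrame()     { if (currentFrame) currentFrame = currentFrame->next; }
    void GotoPreviousFrame() { if (currentFrame) currentFrame = currentFrame->prev; }

    // Returns the current frame as an image.
    Image* GetFrame() const { return currentFrame ? currentFrame->image : nullptr; }

    // Steps through the movie until the frame's timestamp is just below n.
    Image* GetFrameAtTime(long n) {
        Frame* f = firstFrame;
        while (f && f->next && f->next->timestamp <= n) f = f->next;
        return f ? f->image : nullptr;
    }

    // Returns the image at frame number m.
    Image* GetFrameNumber(int m) {
        for (Frame* f = firstFrame; f; f = f->next)
            if (f->number == m) return f->image;
        return nullptr;
    }

    // Splices newFrame in immediately after 'after'.
    void InsertFrame(Frame* after, Frame* newFrame) {
        newFrame->prev = after;
        newFrame->next = after->next;
        if (after->next) after->next->prev = newFrame;
        else             lastFrame = newFrame;
        after->next = newFrame;
    }

    // Unlinks and destroys a frame.
    void DeleteFrame(Frame* f) {
        if (f->prev) f->prev->next = f->next; else firstFrame = f->next;
        if (f->next) f->next->prev = f->prev; else lastFrame  = f->prev;
        if (currentFrame == f) currentFrame = f->next ? f->next : f->prev;
        delete f;
    }

private:
    Frame* firstFrame   = nullptr;
    Frame* lastFrame    = nullptr;
    Frame* currentFrame = nullptr;
};
```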

      We are currently porting our code to the Windows NT® platform under Visual C++; once this is done, all development will take place in that environment. All of our image manipulation code has been ported to Visual C++ and compiles without error, and a testing application is being developed that will be completed after A-Term break.

      Our next term will be busy. Throughout the second term of the project we will work on code to detect all of the distinct objects on the pool table and to detect the events that happen. Much of this depends on when we receive the frame grabber; if we experience hardware delays, we will focus mainly on refining and implementing the user interface. This next term will be very important.