U.S. Pat. No. 7,121,946
REAL-TIME HEAD TRACKING SYSTEM FOR COMPUTER GAMES AND OTHER APPLICATIONS
Assignee: Cybernet Systems Corporation
Issue Date: June 29, 2001
Abstract
A real-time computer vision system tracks the head of a computer user to implement real-time control of games or other applications. The imaging hardware includes a color camera, frame grabber, and processor. The software consists of the low-level image grabbing software and a tracking algorithm. The system tracks objects based on the color, motion and/or shape of the object in the image. A color matching function is used to compute three measures of the target's probable location based on the target color, shape and motion. The method then computes the most probable location of the target using a weighting technique. Once the system is running, a graphical user interface displays the live image from the color camera on the computer screen. The operator can then use the mouse to select a target for tracking. The system will then keep track of the moving target in the scene in real-time.
Description
DETAILED DESCRIPTION OF THE INVENTION
A schematic of the system is shown in FIG. 1. The imaging hardware includes a color camera 102 and a digitizer. The sequence of images of the scene is then fed to a computer 104 which runs tracking software according to the invention. The tracking algorithm is independent of the imaging system hardware. The tracking system has a graphical user interface (GUI) to initialize the target and show the tracking result on the screen 106.
The GUI for the real-time object tracking system (ROTS) displays a live color image from the camera on the computer screen. The user can initialize the target manually or automatically. Once initialized, the ROTS tracks the target in real time.
The flow chart of the tracking algorithm is shown in FIG. 2. The program captures live images from the camera and displays them on the screen. It then allows the user to select the target manually using the mouse or automatically by moving the target to a predetermined position in the scene. At the point of initialization, the color, the shape and location of the target are computed and stored. Once the target is initialized, we compute an estimate of the target location using target dynamics. We then compute the actual location using the color, shape and motion information with respect to a region centered at the estimated location.
The input to the ROTS is a sequence of color images, preferably in the standard RGB24 format. Hence, the hardware can be a camera with an image-grabbing board or a USB camera connected to the USB port of the computer. A preferred GUI is shown in FIG. 3.
Tracking using Color, Shape and Motion
Once the user clicks on the target in the image, we compute the median color of a small region around this point in the image. This will be the color of the target region being tracked in the scene until it is reinitialized. We also store the shape of the target by segmenting the object using its color. Once tracking begins, we compute the center of the target region in the image using a combination of three aspects of the target. The three aspects are the color, the shape and the motion. This results in a very robust tracking system which can withstand a variety of noise, occlusion and rapid motion.
Color Matching
The color of a pixel in a color image is determined by the values of the Red, Green and Blue bytes corresponding to the pixel in the image buffer. This color value forms a point in the three-dimensional RGB color space. When we compute the color of the target, we assume that the target is fairly evenly colored and that the illumination stays relatively constant. The color of the target is then the median RGB value of a sample set of pixels constituting the target. When the target moves and the illumination changes, the color of the target is likely to change. We use a computationally efficient color matching function which allows us to compute whether a pixel color matches the target color within limits.
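The median target color described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `median_color`, the patch size `half`, and the choice to take the median independently on each channel are assumptions, since the text does not say how the sample set or the median of RGB triples is defined.

```python
from statistics import median


def median_color(image, row, col, half=2):
    """Per-channel median RGB of a square patch centered on (row, col).

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    The patch is (2 * half + 1) pixels wide and is clipped at the border.
    """
    rows, cols = len(image), len(image[0])
    patch = [
        image[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    # Assumption: the median is taken separately for R, G and B.
    return tuple(median(p[ch] for p in patch) for ch in range(3))
```

Using a median rather than a mean keeps a few stray pixels (a highlight, a background pixel inside the patch) from shifting the stored target color.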
When the illumination on the target changes, the intensity of the color will change. This will appear as a movement along the RGB color vector as shown in FIG. 5. In order to account for slight variations in the color, we further allow the point in color space to lie within a small truncated cone as shown in FIG. 5. Two thresholds determine the shape of the matching color cone: a threshold on the angle of the color cone and a threshold on the minimum length of the color vector together define the matching color space. Thus, any pixel whose color lies within the truncated cone in color space is considered to have the same color as the target.
Given a colored pixel, we quantitatively define the match between it and a reference color pixel as follows. Let (R, G, B) be the values of the RGB vector of the first pixel. Let (Rr, Gr, Br) be the RGB vector for the reference color.
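$$d = R R_r + G G_r + B B_r$$

$$m_r = \sqrt{R_r^2 + G_r^2 + B_r^2}$$

$$m = \sqrt{R^2 + G^2 + B^2}$$

$$d_m = \frac{d}{m_r}$$

$$d_a = \frac{d}{m_r\, m}$$

$$\mathrm{ColorMatch}(R,G,B) = \begin{cases} d_m\, d_a & \text{if } (d_{ml} < d_m < d_{mh}) \text{ and } (d_{al} < d_a) \\ 0 & \text{otherwise} \end{cases}$$

$$S(i,j,t) = P(i,j,t)\, M(i,j,t-1)$$

$$\mathrm{Center}_{shape} = \begin{bmatrix} r_s \\ c_s \end{bmatrix} = \begin{bmatrix} \dfrac{\sum_{1}^{I \cdot J} S(i,j,t)\, i}{\sum_{1}^{I \cdot J} S(i,j,t)} \\[2ex] \dfrac{\sum_{1}^{I \cdot J} S(i,j,t)\, j}{\sum_{1}^{I \cdot J} S(i,j,t)} \end{bmatrix}$$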
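The color match between a pixel and the reference color can be sketched as below. This is an illustrative reading of the definitions, assuming that m_r and m are the lengths of the two RGB vectors, so that d_m is the projection of the pixel color onto the reference color and d_a is the cosine of the angle between them. The parameter names `dm_lo`, `dm_hi` and `da_lo` stand for the thresholds and are hypothetical, as are the threshold values in the usage note.

```python
import math


def color_match(pixel, ref, dm_lo, dm_hi, da_lo):
    """Return d_m * d_a if `pixel` lies in the truncated cone around `ref`, else 0."""
    r, g, b = pixel
    rr, gr, br = ref
    d = r * rr + g * gr + b * br                      # dot product of the two colors
    m_r = math.sqrt(rr * rr + gr * gr + br * br)      # length of the reference vector
    m = math.sqrt(r * r + g * g + b * b)              # length of the pixel vector
    if m == 0 or m_r == 0:
        return 0.0                                    # black has no direction to compare
    d_m = d / m_r                                     # projection onto the reference
    d_a = d / (m_r * m)                               # cosine of the angle between them
    if dm_lo < d_m < dm_hi and da_lo < d_a:
        return d_m * d_a
    return 0.0
```

For example, with a reference color of (200, 100, 50) and thresholds `dm_lo=100`, `dm_hi=400`, `da_lo=0.98`, a dimmer pixel of the same hue such as (100, 50, 25) still matches, while (50, 100, 200) does not because its direction in RGB space differs too much.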
The closeness of the shape is a summation of the product of the pixel color match P(i,j) with the target template M(i,j). Note again that the color matching measure is used to weight the shape measure. This makes our algorithm robust to creep. Once the region S is obtained, we can compute the centroid of S. This is the probable location of the target based solely on the shape of the target.
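The shape-based center can be sketched as a weighted centroid. This is a minimal illustration, assuming `P` is a grid of color match values for the current frame and `M_prev` is the binary target template from the previous frame, both covering the same region; the function names are hypothetical.

```python
def weighted_centroid(weights):
    """Centroid (row, col) of a 2-D grid of non-negative weights, or None if all zero."""
    total = sum(sum(row) for row in weights)
    if total == 0:
        return None
    r = sum(i * w for i, row in enumerate(weights) for w in row) / total
    c = sum(j * w for row in weights for j, w in enumerate(row)) / total
    return r, c


def shape_center(P, M_prev):
    """Centroid of S = P * M_prev, the color match masked by the previous template."""
    S = [[p * m for p, m in zip(p_row, m_row)] for p_row, m_row in zip(P, M_prev)]
    return weighted_centroid(S)
```

Because each pixel contributes in proportion to its color match, pixels that only weakly resemble the target pull the centroid less than pixels that match well.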
Motion Detection
The algorithm checks for motion in a region near the estimated target position using a motion detecting function. This function computes the difference between the current image and the previous image, which is stored in memory. If motion has occurred, there will be sufficient change in the intensities in the region. The motion detection function will trigger if a sufficient number of pixels change intensity by a certain threshold value. This detection phase eliminates unnecessary computation when the object is stationary.
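A motion trigger of this kind can be sketched as follows. This is an illustration under stated assumptions: the inputs are grayscale intensity grids for the region near the estimated target position, and the names `intensity_thresh` and `min_pixels` are hypothetical stand-ins for the two thresholds mentioned above.

```python
def motion_detected(curr, prev, intensity_thresh, min_pixels):
    """True if at least `min_pixels` pixels changed by more than `intensity_thresh`."""
    changed = sum(
        1
        for c_row, p_row in zip(curr, prev)
        for c, p in zip(c_row, p_row)
        if abs(c - p) > intensity_thresh
    )
    return changed >= min_pixels
```

When the function returns False, the tracker can keep the previous target position and skip the more expensive localization steps for that frame.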
Position Using Motion
If the motion detection function detects motion, the next step is to locate the target. This is done using the difference image and the target color. When an object moves between frames against a relatively stationary background, the color of the pixels near the target changes between frames (unless the target and the background are of the same color). We compute the color change between frames for pixels near the target location. The pixels whose color changes beyond a threshold make up the difference image. Note that the difference image will contain two complementary areas: the pixels where the object used to be complement those where the object is now. If we separate these pixels using the color of the target, we can compute the new location of the target. The set of pixels in the difference image that have the color of the target in the new image corresponds to the leading edge of the target in the new image. If we assume that the shape of the target changes negligibly between frames, we can use the shape of the target from the previous image to compute the position of the center of the target from this difference image.
Let D be the difference sub-image between the previous target and the estimated target location in the new image. If we threshold the difference image, we end up with a binary image. If we intersect this binary image D with the shape of the target in the new image, M, we get the moving edge of the target as the region V. We then weight this region by the color matching measure P.
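$$D(i,j,t) = \begin{cases} 1 & \text{if } P(i,j,t) - P(i,j,t-1) > \tau_m \\ 0 & \text{otherwise} \end{cases}$$

$$M(i,j,t) = \begin{cases} 1 & \text{if } P(i,j,t) > \tau_c \\ 0 & \text{otherwise} \end{cases}$$

$$V(i,j,t) = \bigl(D(i,j,t) \cap M(i,j,t)\bigr) \cdot P(i,j,t)$$

$$\mathrm{Center}_{motion} = \begin{bmatrix} r_m \\ c_m \end{bmatrix} = \begin{bmatrix} \dfrac{\sum_{1}^{I \cdot J} V(i,j,t)\, i}{\sum_{1}^{I \cdot J} V(i,j,t)} \\[2ex] \dfrac{\sum_{1}^{I \cdot J} V(i,j,t)\, j}{\sum_{1}^{I \cdot J} V(i,j,t)} \end{bmatrix}$$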
The centroid of the region V is then computed as the probable location of the target based on motion alone. This weighting of the intersection region by the color matching measure makes our tracking less prone to jitter.
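The motion-based center can be sketched in the same style. This is an illustration, assuming `P_now` and `P_prev` are grids of color match values for the current and previous frames over the same region, and that `tau_m` and `tau_c` are the motion and color thresholds. The difference test follows the inequality as printed, so only pixels whose color match increased are counted; the function name is hypothetical.

```python
def motion_center(P_now, P_prev, tau_m, tau_c):
    """Centroid of V = (D and M) * P, or None if no moving target-colored pixel exists."""
    total = r_sum = c_sum = 0.0
    for i, (now_row, prev_row) in enumerate(zip(P_now, P_prev)):
        for j, (p, q) in enumerate(zip(now_row, prev_row)):
            d = 1 if p - q > tau_m else 0   # D: color match rose by more than tau_m
            m = 1 if p > tau_c else 0       # M: pixel has the target color now
            v = d * m * p                   # V: intersection weighted by color match
            total += v
            r_sum += v * i
            c_sum += v * j
    if total == 0:
        return None
    return r_sum / total, c_sum / total
```

For a target that shifts one column to the right, only the newly covered column survives in V, so the returned center sits on the leading edge of the target rather than on its middle.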
In a physically implemented system, the image capture board provides a 480×640-pixel color image at 30 frames per second. Processing such a large image in full would slow down the program. Fortunately, the nature of the tracking task is such that only a fraction of the image is of interest. This region, called the window of interest, lies around the estimated position of the target in the new image. We can compute the location of the target in the new image from the location of the target in the previous image and its dynamics. We have used prediction based on velocity computation between frames. This technique is able to keep track of the target even when the target moves rapidly. We have found that the window of interest is typically one one-hundredth the area of the original image. This speeds up the computation of the new target location considerably.
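The prediction and the window of interest can be sketched as follows. This is a minimal illustration, assuming a constant-velocity prediction in pixels per frame and a rectangular window clipped to the image; the function names and the half-size of (24, 32) in the usage note are hypothetical.

```python
def predict_position(prev_pos, velocity):
    """Estimated (row, col) in the new frame from the previous position and velocity."""
    return prev_pos[0] + velocity[0], prev_pos[1] + velocity[1]


def window_of_interest(center, half_size, image_shape):
    """Row and column bounds (r0, r1, c0, c1) of the search window, end-exclusive."""
    (rc, cc), (rs, cs), (rows, cols) = center, half_size, image_shape
    rc, cc = int(round(rc)), int(round(cc))
    r0, r1 = max(0, rc - rs), min(rows, rc + rs + 1)
    c0, c1 = max(0, cc - cs), min(cols, cc + cs + 1)
    return r0, r1, c0, c1
```

With a 480×640 image and a half-size of (24, 32), the window covers 49 × 65 = 3185 pixels, which is roughly one one-hundredth of the 307,200 pixels in the full frame.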
Tracking Algorithm
If we are given an estimated target location as (rc, cc) in the new image and the size of the area to be searched is given by (rs, cs), then the algorithm can be written in pseudo code as shown in FIG. 6.
Note that the color matching weight c is used to weight all three centers. This weighting makes the algorithm smoother and more robust. The velocity computed at the end of the tracking algorithm is used to compute the estimated position of the target in the next frame.
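One way to combine the three centers and update the velocity is sketched below. The pseudo code of FIG. 6 is not reproduced in this text, so the combination rule here is an assumption: each center is paired with a non-negative weight (for example, the total color match of its region), and the final position is their weighted average. The function names are hypothetical.

```python
def combine_centers(estimates):
    """Weighted average of (center, weight) pairs; centers that are None are skipped."""
    usable = [(c, w) for c, w in estimates if c is not None and w > 0]
    total = sum(w for _, w in usable)
    if total == 0:
        return None                      # no usable estimate in this frame
    row = sum(c[0] * w for c, w in usable) / total
    col = sum(c[1] * w for c, w in usable) / total
    return row, col


def update_velocity(new_pos, prev_pos):
    """Frame-to-frame velocity, used to predict the position in the next frame."""
    return new_pos[0] - prev_pos[0], new_pos[1] - prev_pos[1]
```

Skipping missing centers lets the tracker fall back on color and shape alone when the target is stationary and no motion-based center is available.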
Extensions of the system are possible in accordance with the described algorithm herein. One is a tracking system which can track multiple targets in the same image. Another uses the tracking in two stereo images to track the target in 3D.
As a further extension, the system may be used to track a person's head for use in video games or other real-time computer vision applications. As with the other embodiments described herein, such tracking may be based on color, motion and/or shape of the object in the image. In this particular implementation, the system preferably tracks a person's head, such that when the person moves their head slightly to the left or right, the game responds accordingly (such as by looking to the left or right around a corner).
If the person moves their head to a greater degree, a preferred response would be that the person followed by the tracker “jogs” quickly to the left or right. Similar responses would occur if the person moves their head up or down, or looks in a specific direction.
As with the other embodiments described herein, the real-time object tracking system (ROTS) in this case preferably uses the color of the object, its shape and its motion to localize it in the scene. The hardware also includes a color camera, a frame grabber, and a suitable computer or workstation. The software includes the low-level image grabbing software and the tracking algorithm.
The computer game (or games) and camera are installed on the computer, and the camera is oriented to face the person as shown in FIG. 7. The software is installed on the computer and started. An interface appears, which includes a GUI (graphical user interface) and an image from the camera.
Once the ROTS is running, the graphical user interface displays the live image from the color camera on the computer screen. The operator can then use the mouse to click on the head in the image to select a target for tracking. The system will then keep track of the moving target in the scene in real-time. The system is also capable of automatically locking onto the target if it moves in a periodic motion. This eliminates the need for manual clicking on the target. Since the algorithm uses the color, shape and motion of the target, the tracker is robust to noise and rapid motion of the target.
The GUI allows the user to perform the following operations:
a. To initiate head tracking by having the person click the mouse on their own head in the image. Note that they can click on any object in the camera's field of view, and that object will be tracked. This object could be a head, hand, coffee mug, or other object;
b. To adjust the left, right, up, down motion boundaries;
c. To adjust the velocity threshold for motion;
d. To designate the keystrokes to be initiated by head motions (left, right, up, down); and
e. To initiate the playing of the game.
During game play, if the velocity of the user's head motion exceeds the velocity threshold, the system will initiate the corresponding keystroke or set of keystrokes as commands to the game. The user can turn the initiation of head motion commands off and on, for example by hitting the num-lock key or another non-game-related key. The system can also emulate joystick inputs, or any other type of input, in the same manner that keystrokes are emulated.
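The mapping from head motion to keystrokes can be sketched as below. This is one possible reading, not the described implementation: it looks only at the tracked head velocity in image coordinates (rows grow downward, columns grow to the right) and leaves out the adjustable motion boundaries. The function name, the `keymap` dictionary and the `enabled` toggle are hypothetical, and whether image-left corresponds to the user's left depends on camera mirroring.

```python
def head_commands(velocity, velocity_threshold, keymap, enabled=True):
    """Return the key names to emit for one frame of head motion.

    `velocity` is (row_velocity, col_velocity) in pixels per frame.
    `keymap` maps "left", "right", "up", "down" to user-chosen keys.
    """
    if not enabled:                       # commands toggled off by the user
        return []
    v_row, v_col = velocity
    keys = []
    if v_col < -velocity_threshold:
        keys.append(keymap["left"])
    elif v_col > velocity_threshold:
        keys.append(keymap["right"])
    if v_row < -velocity_threshold:
        keys.append(keymap["up"])
    elif v_row > velocity_threshold:
        keys.append(keymap["down"])
    return keys
```

The returned key names would then be passed to whatever input-emulation layer the host system provides, which is outside the scope of this sketch.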
Claims
1. A method of controlling a computer game, comprising the steps of: imaging a sequence of scenes including the head of a user of the computer; comparing visual characteristics from a portion of a scene to a center of said portion of a scene to determine movement of the user's head within the scene wherein at least one of the visual characteristics is color; providing a weighted average of color to compute the location of the user's head based upon color alone; and controlling the game in accordance with the movements.
2. The method of claim 1, wherein the visual characteristics include shape or location.
3. The method of claim 2, wherein the visual characteristics include a combination of static and dynamic characteristics.
4. The method of claim 3, further including the step of modeling of the dynamic characteristics to yield an estimate of head position.
5. The method of claim 1, further including the step of initiating head tracking through a graphical user interface.
6. The method of claim 5, wherein the graphical user interface provides a bounding box displayed in a screen to assist in targeting the user's head.
7. The method of claim 2, further comprising the step of enabling a match in color despite the differences arising from lighting and shadows.
8. The method of claim 2, further comprising the step of enabling a match in color within a threshold of hue.
9. The method of claim 1, wherein the step of comparing the visual characteristics includes a comparison of pixels from scene to scene.
10. The method of claim 1, further including the step of determining if the user's head was moved outside of the scene.
11. The method of claim 1, further including the step of segmenting a region defined by a predetermined closeness of color as an estimate of target shape.
12. The method of claim 1, further including the step of continuing to track the user's head when moving in front of or behind a similarly colored object in the scene.