In November 2010, Microsoft released the Xbox Kinect, a device capable of capturing and interpreting human gestures. Various open source drivers have since been developed that give developers access to the Kinect's capabilities from a standard PC. To improve user interaction with its software, CTI conducted a prototyping experiment to explore the Kinect's gesture capabilities with regard to the Transport and Logistics Industry.
The objectives of the experiment were:
The Kinect hardware consists of three main components:
The infrared transmitter emits many signals across its field of view which are then picked up by the receiver. The data received is then built into a depth map which allows bodies to be tracked. The image illustrates these infrared signals with white dots.
The Kinect also includes an array of microphones capable of identifying speech and the position of its source. However, this capability was neither explored nor implemented in this prototyping experiment.
A test application was developed and targeted for the following platform:
The Kinect connects to the PC via USB and must also be connected to a power source.
A prototype was designed by CTI to visualize aircraft tracking data. The GUI is navigated purely with gestures as input from the Kinect.
In summary, the final architectural platform is illustrated in the following diagram.
Firstly, it was observed that the OpenNI drivers include sample programs and code that demonstrate the capabilities of the driver. The sample programs visualised the Kinect's depth and user tracking capabilities. A powerful additional feature was skeletal modeling. However, the library does not interpret skeletal movements as gestures; this task is left to application developers.
User skeletons are made up of various joints, each with its own depth information. With this information, the developers were able to build on top of OpenNI's framework and provide code that observed changes in the depth of specific joints. Gestures are then identified based on specific changes in these joints' positions.
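To illustrate the idea of recognizing a gesture from changes in a joint's depth, the following is a minimal Python sketch, not CTI's implementation. The `Joint` structure, the `ReachDetector` class, the 150 mm travel threshold and the 10-frame window are all assumptions made for the example; the joint positions would come from the skeleton data supplied by the driver.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Joint:
    """A tracked joint position in millimetres (hypothetical structure)."""
    x: float
    y: float
    z: float  # distance from the sensor


class ReachDetector:
    """Flags a reach when a hand has moved towards the sensor by at least
    `min_travel_mm` within the last `window` frames (assumed values)."""

    def __init__(self, min_travel_mm: float = 150.0, window: int = 10):
        self.min_travel_mm = min_travel_mm
        self.depths = deque(maxlen=window)  # recent hand depths, oldest first

    def update(self, hand: Joint) -> bool:
        """Feed one frame of hand data; return True while a reach is seen."""
        self.depths.append(hand.z)
        # Depth decreases as the hand approaches the sensor, so the travel is
        # the largest recent depth minus the current depth.
        travel = max(self.depths) - self.depths[-1]
        return travel >= self.min_travel_mm
```

Comparing against a window of recent frames, rather than only the previous frame, means small frame-to-frame jitter in the depth reading does not register as a gesture.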
However, the Kinect's hardware limitations significantly reduce the number of trackable joints that can be observed. The Kinect has a low resolution of 640x480, which prevents recognition of small, detailed joints such as those of the fingers. As a result, it was decided that gestures would be designed around the movement of the user's hands, with fingers ignored.
Below is a picture showing the Kinect's interpretation of a hand at a distance of 1-2m. The hand in reality had its fingers spread apart. However, the space between some fingers is not recognized accurately.
Download: Kinect Demonstration Video
On startup, the application idles until it tracks a user. Once tracked, the user is required to strike a specified pose to begin the calibration process. The pose consists of holding both arms up at right angles to the body, as illustrated in Figures 8A and 8B.
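The start-up sequence can be pictured as a small state machine. The sketch below is an illustration only: the state names, the event strings and the `user_lost` and `calibration_failed` transitions are assumptions, not part of the prototype or of the OpenNI API.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()           # no user tracked yet
    AWAITING_POSE = auto()  # user tracked, waiting for the calibration pose
    CALIBRATING = auto()    # pose struck, calibration in progress
    ACTIVE = auto()         # calibration done, gestures enabled


# Allowed transitions, keyed by (current state, event name).
# Event names are hypothetical labels chosen for this example.
TRANSITIONS = {
    (State.IDLE, "user_found"): State.AWAITING_POSE,
    (State.AWAITING_POSE, "pose_detected"): State.CALIBRATING,
    (State.CALIBRATING, "calibration_done"): State.ACTIVE,
    (State.CALIBRATING, "calibration_failed"): State.AWAITING_POSE,
}


def next_state(state: State, event: str) -> State:
    """Return the state that follows `state` when `event` occurs."""
    if event == "user_lost":
        # Assumption: losing the user always returns the application to idle.
        return State.IDLE
    # Events that do not apply in the current state are ignored.
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it easy to see that gestures can only become active after a user has been tracked, the pose struck and calibration completed, in that order.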
When the calibration process is completed, the user's hands are visually isolated to indicate that gesture capabilities are now enabled. A Cursor-wheel is assigned to each hand and indicates the current position of that hand relative to the screen. It was found that this visual feedback was necessary for the user.
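One way to place a cursor for each hand is to map the hand's position inside a fixed region in front of the user onto screen pixels. The following Python sketch assumes this approach; the function name `hand_to_cursor`, the interaction box and its dimensions are hypothetical and are not taken from the prototype.

```python
def hand_to_cursor(hand_x, hand_y, box_centre, box_size, screen_size):
    """Map a hand position (sensor coordinates, y pointing up) to a pixel.

    box_centre  -- (x, y) centre of the assumed interaction box
    box_size    -- (width, height) of the box, same units as the hand position
    screen_size -- (width, height) of the screen in pixels
    """
    cx, cy = box_centre
    bw, bh = box_size
    sw, sh = screen_size

    # Normalise to 0..1 across the box. Screen y grows downwards, so the top
    # edge of the box corresponds to v = 0.
    u = (hand_x - (cx - bw / 2)) / bw
    v = ((cy + bh / 2) - hand_y) / bh

    # Clamp so the cursor stays on screen when the hand leaves the box.
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)

    return int(u * (sw - 1)), int(v * (sh - 1))
```

For example, with a 600 x 400 box centred on the origin and a 1920 x 1080 screen, the top-left corner of the box maps to pixel (0, 0) and the bottom-right corner to pixel (1919, 1079).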
A subsequent reach gesture towards the TPAC™ icon triggers a selection to enter the application.
Inside the application, the user is able to perform all functions with gestures with no other input devices required.
User interaction is initiated with a reach motion towards the camera. When the reach is recognized, the Cursor-wheel of that hand turns green. Requiring this reach ensures that the user does not accidentally trigger unwanted interaction. Once the reach is recognized, the following functions are available based on movements of one or both hands:
Selection is done with a simple reach towards the screen. The object currently under the Cursor-wheel is selected. Scrolling (see Figure 10A) is performed by moving only one of the user's reached hands. Zooming in or out (see Figure 10B) is performed by expanding or reducing (respectively) the distance between both hands.
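The scroll and zoom gestures described above reduce to simple arithmetic on hand positions. The Python sketch below is one possible formulation under stated assumptions: hand positions are 2D screen coordinates, and the function names `scroll_delta` and `zoom_factor` are hypothetical.

```python
import math


def scroll_delta(prev_hand, curr_hand):
    """Scroll offset (dx, dy) from the movement of a single reached hand."""
    return (curr_hand[0] - prev_hand[0], curr_hand[1] - prev_hand[1])


def zoom_factor(start_left, start_right, curr_left, curr_right):
    """Ratio of the current hand separation to the separation at the moment
    both hands reached. Above 1 means zoom in, below 1 means zoom out."""
    start = math.dist(start_left, start_right)
    curr = math.dist(curr_left, curr_right)
    if start == 0:
        return 1.0  # hands started at the same point; leave the zoom unchanged
    return curr / start
```

For instance, if the hands start 100 pixels apart and move to 200 pixels apart, `zoom_factor` returns 2.0, which an application might apply as a doubling of the map scale.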
Using these gestures, the user is able to navigate through the interface and display various types of information that would be viewable within a real time tracking system.
Legend:
The user may switch between view modes using the leftmost icon: a globe with altitude pins (see Figure 12A) and a flat map view (see Figure 12B).
Aircraft can be filtered by equipment type by selecting the circular arrow icon.
This experiment with the Kinect increased CTI's understanding of modern Human-Computer Interaction. It was found that operating the user interface through gestures was feasible, but with inherent limitations. Selection gestures need further improvement to be recognized more reliably, while scrolling and zooming felt natural and responsive.
This initial prototype has suggested further improvements that could be made to enhance the system. However, additional functionality would need to be implemented if the current application were to be deployed on site.
The Kinect has a slight delay, yet is still responsive. Resolution was high enough for detailed interaction with the screen. However, finger recognition was poor, so more advanced gestures similar to sign language were infeasible. It is expected that the Kinect and similar hardware will improve over time, and that current limitations such as resolution will ease enough to allow accurate finger tracking.
As gesture-based computing evolves, common functions within applications will develop their own widely accepted gestures. CTI must be careful not to design its gestures to differ significantly from these standards. Although speech recognition was not explored in this experiment, it is likely to be tested in the future to determine its feasibility within the application.