Kinect Prototype Project

In November 2010, Microsoft released the Kinect for Xbox 360, a device capable of capturing and interpreting human gestures. Various open source drivers have since been developed that give developers access to the Kinect's capabilities from a standard PC. To improve user interaction with its software, CTI conducted a prototyping experiment to explore the Kinect's gesture capabilities for the Transport and Logistics industry.

The objectives of the experiment were:

  • Determine the feasibility of controlling a Graphical User Interface (GUI) with gestures using the Kinect hardware
  • Make all functionality available through gestures alone, without the need for other input devices
  • Explore modern gesture-based human-computer interaction

Hardware and Software Platforms

Kinect 1
Figure 1.

The Kinect hardware consists of three main components:

  1. Standard RGB camera
  2. Infrared transmitter
  3. Infrared receiver

The infrared transmitter projects a dense pattern of infrared dots across its field of view, which the infrared receiver then captures. From this data the Kinect builds a depth map that allows bodies to be tracked. The image below shows these infrared signals as white dots.
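The Kinect computes depth in its own hardware, so the prototype never performed this step itself. The sketch below is only an illustration of the underlying principle. It uses the simplified projector/camera triangulation relation Z = f·b/d, where d is the shift (disparity) of a projected dot in pixels. The focal length (about 580 px) and baseline (about 7.5 cm) are assumed approximate values, not measurements from this experiment, and the device's real calibration is more involved.

```python
from typing import List, Optional


def depth_from_disparity(disparity_px: float, focal_px: float = 580.0,
                         baseline_m: float = 0.075) -> float:
    """Depth in metres of one projected dot, using Z = f * b / d.

    focal_px and baseline_m are assumed, approximate values for illustration.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_map(disparities: List[List[Optional[float]]]) -> List[List[Optional[float]]]:
    """Convert a grid of dot disparities into depths; None marks dots that were not found."""
    return [[None if d is None or d <= 0 else depth_from_disparity(d) for d in row]
            for row in disparities]


# A dot shifted by 29 px lies at roughly 580 * 0.075 / 29, i.e. about 1.5 m.
print(depth_from_disparity(29.0))
```

Because depth is inversely proportional to disparity, a one-pixel error matters much more for distant dots than for close ones. This is one reason fine detail degrades with distance.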

Kinect 2
Figure 2.

The Kinect also includes a microphone array capable of capturing speech and identifying the direction of its source. However, this capability was neither explored nor implemented in this prototyping experiment.

A test application was developed for the following platform:

  • Ubuntu 10.10 Operating System
  • Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
  • NVIDIA GeForce 9400 GT GPU
  • OpenNI's open source Kinect drivers

The Kinect connects to the PC via USB and also requires its own power supply.

Kinect 3
Figure 3.

CTI designed a prototype to visualize aircraft tracking data. The GUI is navigated purely with gestures captured by the Kinect.

Kinect 4
Figure 4.

The final architecture is summarized in the following diagram.

Kinect 5
Figure 5.

Results

It was first observed that OpenNI's drivers included sample programs and code demonstrating the driver's capabilities. The sample programs visualized the Kinect's depth data and user tracking. A powerful additional feature was skeletal modeling. However, the library did not interpret skeletal movements as gestures; this task is left to application developers.

Kinect 6
Figure 6.

User skeletons are made up of joints, each with its own depth information. Using this information, the developers built on top of OpenNI's framework, writing code that observed changes in the depth of specific joints. Gestures are then identified from specific changes in these joints' positions.
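The sketch below shows one way of turning a joint's depth history into a gesture event. It is a minimal illustration and not CTI's actual code. It assumes the application receives one depth value per frame for a single joint, in millimetres, with z increasing away from the sensor. The class name, window length and threshold are hypothetical tuning choices.

```python
from collections import deque
from typing import Optional


class JointDepthTracker:
    """Reports when one joint's depth changes sharply over a short window of frames.

    Hypothetical helper: window and threshold_mm are illustrative tuning values;
    z is assumed to be in millimetres, increasing away from the sensor.
    """

    def __init__(self, window: int = 10, threshold_mm: float = 150.0):
        self.history: deque = deque(maxlen=window)
        self.threshold_mm = threshold_mm

    def update(self, z_mm: float) -> Optional[str]:
        """Feed one frame's joint depth; return a movement label or None."""
        self.history.append(z_mm)
        if len(self.history) < self.history.maxlen:
            return None  # not enough frames collected yet
        delta = self.history[-1] - self.history[0]
        if delta <= -self.threshold_mm:
            self.history.clear()  # start afresh so one motion fires only once
            return "toward_sensor"
        if delta >= self.threshold_mm:
            self.history.clear()
            return "away_from_sensor"
        return None


# A hand joint moving about 110 mm towards the sensor over five frames.
tracker = JointDepthTracker(window=5, threshold_mm=100.0)
events = [tracker.update(z) for z in (1500, 1480, 1450, 1420, 1390)]
print(events)
```

Comparing the oldest and newest sample in a sliding window reacts to deliberate motion while ignoring small frame-to-frame jitter.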

However, the Kinect's hardware limitations significantly reduce the number of joints that can be tracked. Its relatively low resolution of 640x480 prevents recognition of small joints, such as those in the fingers. As a result, gestures were designed around the movement of the user's hands, with fingers ignored.

The picture below shows the Kinect's interpretation of a hand at a distance of 1 to 2 m. In reality the fingers were spread apart, but the gaps between some of them were not recognized accurately.

Kinect 7
Figure 7.

Prototype Application Walkthrough


Download: Kinect Demonstration Video

Upon start-up, the application idles until it tracks a user. Once a user is tracked, they must strike a specific pose to begin the calibration process. The pose is holding both arms up at right angles to the body, as illustrated in Figures 8A and 8B.
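The article does not say how the calibration pose was detected, and OpenNI-based middleware can provide pose detection itself. Purely as an illustration, the sketch below checks a pose like the one in Figures 8A and 8B directly from joint positions. It assumes each joint is an (x, y, z) tuple in millimetres, with y pointing up. The joint names and tolerance are hypothetical.

```python
from typing import Dict, Tuple

# (x, y, z) in millimetres; y is assumed to point up, x to the user's side.
Joint = Tuple[float, float, float]


def is_calibration_pose(joints: Dict[str, Joint], tol_mm: float = 100.0) -> bool:
    """True if, on both sides, the upper arm is level and the forearm points up.

    Joint names such as "left_elbow" and the tolerance are hypothetical.
    """
    for side in ("left", "right"):
        sx, sy, _ = joints[f"{side}_shoulder"]
        ex, ey, _ = joints[f"{side}_elbow"]
        hx, hy, _ = joints[f"{side}_hand"]
        elbow_out = abs(ex - sx) > tol_mm          # elbow held away from the body
        upper_arm_level = abs(ey - sy) <= tol_mm   # elbow at about shoulder height
        forearm_up = hy - ey > tol_mm and abs(hx - ex) <= tol_mm  # hand above elbow
        if not (elbow_out and upper_arm_level and forearm_up):
            return False
    return True
```

An application would typically require the pose to hold for a number of consecutive frames before starting calibration, so that a passing arm movement does not trigger it.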

Kinect 8a
Figure 8A.

Kinect 8b
Figure 8B.

When calibration is complete, the user's hands are visually isolated to indicate that gesture control is now active. A Cursor-wheel is assigned to each hand and accurately indicates the hand's current position relative to the screen. This visual feedback was found to be necessary for users.
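One simple way to place a Cursor-wheel is to map the hand's position within an imaginary "interaction box" onto the screen. The sketch below is a hypothetical approach, not the prototype's documented method. The box size, the choice of a reference point such as the shoulder, and the screen size are all assumptions.

```python
from typing import Tuple


def hand_to_screen(
    hand_xy: Tuple[float, float],
    center_xy: Tuple[float, float],
    box_w_mm: float = 600.0,
    box_h_mm: float = 400.0,
    screen_w: int = 1920,
    screen_h: int = 1080,
) -> Tuple[int, int]:
    """Map a hand position (mm) to the pixel position of its Cursor-wheel.

    A box_w_mm x box_h_mm interaction box centred on center_xy (e.g. the
    user's shoulder) covers the whole screen; positions outside the box are
    clamped to the screen edge. All sizes are hypothetical.
    """
    nx = (hand_xy[0] - center_xy[0]) / box_w_mm + 0.5
    ny = 0.5 - (hand_xy[1] - center_xy[1]) / box_h_mm  # screen y grows downward
    nx = min(max(nx, 0.0), 1.0)
    ny = min(max(ny, 0.0), 1.0)
    return round(nx * (screen_w - 1)), round(ny * (screen_h - 1))


print(hand_to_screen((300.0, 1600.0), (0.0, 1400.0)))  # top-right corner: (1919, 0)
```

Anchoring the box to the user's body, rather than to fixed sensor coordinates, lets the cursor behave the same wherever the user stands.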

A subsequent reach gesture towards the TPAC™ icon selects it and enters the application.

Kinect 9a
Figure 9A.

Kinect 9b
Figure 9B.

Inside the application, the user can perform all functions using gestures alone; no other input devices are required.

User interaction is initiated with a reach motion towards the camera. A recognized reach is indicated by the hand's Cursor-wheel turning green. Requiring this reach ensures that the user does not accidentally trigger unwanted interactions. Once the reach is recognized, the following functions are provided based on movements of one or both hands:

  • Standard selection
  • Scrolling/Panning
  • Zooming
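The reach recognition described above has to avoid flickering when a hand hovers near the trigger distance. The sketch below handles this with hysteresis, a common technique for this kind of problem; the article does not say whether the prototype used it. It assumes depths in millimetres, with the reach measured as the hand's distance in front of the shoulder. Both thresholds and the "default" colour name are hypothetical.

```python
class ReachEngagement:
    """Tracks whether one hand is 'reached' towards the camera.

    Hysteresis: the hand must move engage_mm in front of the shoulder to
    engage, and must drop back below release_mm to disengage, so jitter near
    a single threshold cannot toggle the state. Thresholds are hypothetical.
    """

    def __init__(self, engage_mm: float = 400.0, release_mm: float = 300.0):
        if release_mm >= engage_mm:
            raise ValueError("release_mm must be smaller than engage_mm")
        self.engage_mm = engage_mm
        self.release_mm = release_mm
        self.engaged = False

    def update(self, shoulder_z_mm: float, hand_z_mm: float) -> bool:
        reach = shoulder_z_mm - hand_z_mm  # how far the hand is in front of the shoulder
        if not self.engaged and reach >= self.engage_mm:
            self.engaged = True
        elif self.engaged and reach < self.release_mm:
            self.engaged = False
        return self.engaged

    @property
    def cursor_color(self) -> str:
        # Green signals a recognized reach; "default" stands in for the normal colour.
        return "green" if self.engaged else "default"


hand = ReachEngagement()
states = [hand.update(2000.0, z) for z in (1800.0, 1590.0, 1650.0, 1750.0)]
print(states)  # [False, True, True, False]
```

In the third frame the hand has pulled back to 350 mm, below the engage threshold. It still stays engaged because it has not fallen under the 300 mm release threshold.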

Selection is performed with a simple reach towards the screen: the object currently underneath the Cursor-wheel is selected. Scrolling (see Figure 10A) is performed by moving only one of the user's reached hands. Zooming in or out (see Figure 10B) is performed by increasing or decreasing, respectively, the distance between both hands.
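The sketch below shows how the scroll and zoom rules above could be expressed; selection is handled by the reach itself and is not repeated here. It assumes that, on each frame, the application has the screen positions of only those hands whose reach is currently recognized. The dictionary format and the function name are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def interpret_hands(prev: Dict[str, Point],
                    curr: Dict[str, Point]) -> Tuple[str, Optional[object]]:
    """Turn two consecutive frames of reached-hand positions into an action.

    prev and curr map a hand name ("left"/"right") to its screen position and
    contain only hands whose reach is currently recognized (hypothetical format).
    """
    if prev.keys() != curr.keys():
        return ("none", None)  # a hand engaged or released this frame
    if len(curr) == 2:
        d_prev = math.dist(*prev.values())
        d_curr = math.dist(*curr.values())
        if d_prev > 0:
            # > 1: hands moved apart (zoom in); < 1: hands moved together (zoom out)
            return ("zoom", d_curr / d_prev)
    elif len(curr) == 1:
        (hand,) = curr
        (px, py), (cx, cy) = prev[hand], curr[hand]
        return ("pan", (cx - px, cy - py))  # scroll by the hand's movement
    return ("none", None)
```

For example, moving both hands from 200 px apart to 300 px apart yields a zoom factor of 1.5.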

Kinect 10a
Figure 10A.

Kinect 10b
Figure 10B.

Using these gestures, the user can navigate the interface and display the various types of information that would be available in a real-time tracking system.

Kinect 11
Figure 11.

Legend:

  • Yellow dots: Ports within the current geographical region (e.g. Australia)
  • Green dots: Ports outside the current geographical region
  • Blue flashing dot: Current port nearest to the Cursor-wheel
  • Red dots: Aircraft in flight from the origin port to the destination port
  • Lines: Flight path
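The blue flashing dot requires finding the port nearest to the Cursor-wheel. A minimal sketch is shown below, assuming ports are already projected to screen coordinates. The tuple layout and the port codes in the example are illustrative. A linear scan like this is adequate for a few hundred ports; a spatial index such as a k-d tree would suit much larger sets.

```python
from typing import Iterable, Optional, Tuple

# (port code, screen x, screen y); a hypothetical in-memory representation.
Port = Tuple[str, float, float]


def nearest_port(cursor: Tuple[float, float], ports: Iterable[Port]) -> Optional[str]:
    """Return the code of the port closest to the cursor, or None if there are no ports."""
    best: Optional[str] = None
    best_d2 = float("inf")
    for code, x, y in ports:
        # Squared distance is enough for comparison and avoids a square root.
        d2 = (x - cursor[0]) ** 2 + (y - cursor[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = code, d2
    return best


ports = [("SYD", 100.0, 100.0), ("MEL", 90.0, 200.0), ("BNE", 300.0, 50.0)]
print(nearest_port((95.0, 180.0), ports))  # MEL
```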

The user may switch between two view modes using the leftmost icon: a globe with altitude pins (see Figure 12A) and a flat map (see Figure 12B).
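The two view modes need two different mappings from latitude and longitude to the display. The article does not state which projections the prototype used. The sketch below assumes a unit-radius globe for the 3D view and the simple equirectangular projection for the flat map. The altitude exaggeration factor is hypothetical: a cruising altitude of around 10 km is under 0.2% of the Earth's radius and would be invisible at true scale.

```python
import math
from typing import Tuple


def to_globe(lat_deg: float, lon_deg: float, alt_km: float = 0.0,
             earth_radius_km: float = 6371.0,
             exaggeration: float = 1.0) -> Tuple[float, float, float]:
    """Cartesian point on a unit-radius globe, raised by (optionally exaggerated) altitude."""
    r = 1.0 + exaggeration * alt_km / earth_radius_km
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def to_flat_map(lat_deg: float, lon_deg: float,
                width: float, height: float) -> Tuple[float, float]:
    """Equirectangular projection to pixel coordinates (y grows downward)."""
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y
```

An altitude pin can then be drawn as a line from `to_globe(lat, lon)` to `to_globe(lat, lon, alt_km, exaggeration=...)`.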

Kinect 12a
Figure 12A.

Kinect 12b
Figure 12B.

Aircraft can be filtered by equipment type by selecting the circular arrow icon.
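The article does not describe how the filter behaves. The sketch below assumes, hypothetically, that each selection of the circular arrow icon advances to the next equipment type and eventually wraps back to showing all aircraft. The class names and the type codes in the example are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Aircraft:
    callsign: str
    equipment: str  # aircraft type code, e.g. "B738" (illustrative values)


class EquipmentFilter:
    """Cycles through 'all' and each known equipment type, one step per icon selection."""

    def __init__(self, equipment_types: Iterable[str]):
        # None means "show all aircraft"; types are sorted for a stable order.
        self.options: List[Optional[str]] = [None] + sorted(set(equipment_types))
        self.index = 0

    def cycle(self) -> Optional[str]:
        """Advance to the next filter (wrapping around) and return it."""
        self.index = (self.index + 1) % len(self.options)
        return self.options[self.index]

    def apply(self, aircraft: Iterable[Aircraft]) -> List[Aircraft]:
        selected = self.options[self.index]
        return [a for a in aircraft if selected is None or a.equipment == selected]


fleet = [Aircraft("QF1", "A388"), Aircraft("VA2", "B738"), Aircraft("QF3", "B738")]
flt = EquipmentFilter(a.equipment for a in fleet)
flt.cycle()  # now showing only "A388"
print([a.callsign for a in flt.apply(fleet)])  # ['QF1']
```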

Kinect 13
Figure 13.

Conclusion

This experiment with the Kinect increased CTI's understanding of modern human-computer interaction. Operating the user interface through gestures was found to be feasible, but with inherent limitations. Selection gestures needed further improvement to be recognized reliably, whereas scrolling and zooming felt natural and responsive.

This initial prototype suggested further improvements that could enhance the system. However, additional functionality would need to be implemented before the application could be deployed on site.

The Kinect exhibits a slight delay but remains responsive. Its resolution was high enough for detailed interaction with the screen. However, finger recognition was poor, making more advanced gestures, such as those resembling sign language, infeasible. The Kinect and similar hardware are expected to improve over time. Current limitations such as resolution are likely to ease enough to allow accurate finger tracking.

As gesture-based computing evolves, common application functions will develop their own widely accepted gestures. CTI must be careful not to design gestures that differ significantly from these emerging standards. Although speech recognition was not explored in this experiment, it is likely to be tested in the future to determine its feasibility within the application.
