I recently bought a Kinect controller for my Xbox and is really loving it. Of course I had to see if I couldn’t use that for something, and after some other people were nice enough to create and release free Windows drivers and libraries, I had to try it!
I started out with Code Laboratories “CL NUI Platform” which is a simple installer that installs the drivers and some libraries. It also comes with some samples showing how to connect to the sensor.
I quickly got that going, with a small app that displays the RGB and Depth image from the sensor.
“Unfortunately” this is also all you get out of the box. If you expect to automatically get an event when you wave your hand or doing some other gesture, this is not for you. However if have a little love for some simple Image Analysis algorithms, you should be all over this too!
I wanted to make a simple gesture API that could tell me when a user drags, pinch/stretch or flicks something, very similar to what you do with a Touch Screen. So basically an API that mimics an invisible touch screen hanging suspended in the air. Luckily the algorithms for this turned out to be pretty simple.
The first thing you do is go through the depth image pixel by pixel. If you find a pixel that is too close or too far away, replace it with Color.Empty so we can ignore those. That we we will only “see” things in the picture that are within a certain distance from the sensor. Like for instance two hands stretched out. This leaves us with a set of pixels that could potential be some stretched out hands.
We need to associate these pixels with each other. Let’s say all pixels that are next to each other that are not transparent, belong to the same hand. We call these sets of grouped pixels for “Blobs” and there’s a nice simple algorithm for detecting all the blobs in an image, known as “Connected Component Labelling”. Instead of explaining how it works here, all you need to know is here on wikipedia. If you register their center and size, you can easily use these as “touch input” and discard too large or small blobs. This is what this looks like, using a circle for center and size of each blob:
30 times a second you will be calculating a new set of blobs. Compare these to the previous blobs, and use the center and size to see which blobs are the same and has moved, which are new, and which disappeared. If I only have one blob and it moves, I consider this a “drag”. If I have more than 1, I calculate the bounding box of all the blobs, measure its diagonal and compare it to the size of the previous diagonal - The fraction between these two diagonals is the scale in a pinch or stretch.
Now all we need to do is take these “gesture events” and hook them up to something. Like for instance a map. So here’s the end-result: (note : It looks a little jerky because of the screen capture software, but it runs a lot smoother when not recording and flicks works consistently).
It worked surprising well for a first go.
Download source: Sorry no source available at this time, due to some licensing restrictions.