Amir, thanks for writing this article. Very informative! Just a few weeks ago I tried out ZigFu and really liked it. I'm sorry to hear the team has left, but I hope ZigFu keeps being successful, even if not in the VC sense.
In my humble opinion, one of the biggest problems, if not the biggest, is that the Kinect is just not good enough yet. My experience is that the Kinect works well for the T-pose and anything that resembles it, but poorly or not at all once your arm is pointing towards the Kinect.
Here [1] is a recent paper with the most detailed validation of the MS Kinect I have seen so far. The authors report a maximum error of 46.9 degrees in the elbow angle estimate for reaching movements tracked with OpenNI.
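For anyone who wants to check their own tracker against numbers like that, here is a minimal sketch of how an elbow angle can be computed from three tracked joint positions. The function name and the sample coordinates are my own illustration, not anything taken from the paper or from the OpenNI API; it only assumes the tracker gives you 3-D positions for shoulder, elbow and wrist.

```python
import math

def elbow_angle_deg(shoulder, elbow, wrist):
    """Angle at the elbow, in degrees, between upper arm and forearm.

    Each argument is an (x, y, z) position. The name and signature are
    hypothetical; they do not correspond to any OpenNI call.
    """
    # Vectors pointing from the elbow to the two neighbouring joints.
    upper = [s - e for s, e in zip(shoulder, elbow)]
    fore = [w - e for w, e in zip(wrist, elbow)]

    dot = sum(u * f for u, f in zip(upper, fore))
    norm = math.sqrt(sum(u * u for u in upper)) * math.sqrt(sum(f * f for f in fore))
    if norm == 0.0:
        raise ValueError("coincident joints: angle is undefined")

    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Made-up positions in metres: forearm bent at a right angle, then a straight arm.
bent = elbow_angle_deg((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, 0.25, 0.0))
straight = elbow_angle_deg((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0))
```

Comparing this value frame by frame against a marker-based reference system is one way to arrive at an error figure of the kind quoted above, though I am not claiming this is the exact procedure used in [1].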
Another problem is lag. From what I hear, the main problem here is the USB 2.0 bandwidth. Hopefully the next generation, Kinect 2 and co., will improve a lot.
I agree that the sensors are a limiting factor for current applications, and of course Kinect is just the beginning of this technology in consumer markets. The 3-D integrated computer vision processor I describe, attached to a high-pixel-count CMOS array, will address the latency, power and frame-rate issues. The algorithms, and the automated mapping of computer vision onto hardware, still need to be solved.
Personally, I have mixed feelings about integrating the skeleton trackers into the hardware. While this might drive down cost for specific use cases, it limits the capabilities of these natural interface devices. The skeleton trackers are based on many assumptions, such as approximate camera angle and position, suitable clothing, etc.
Once any of these assumptions no longer holds, for instance with a camera mounted above, beside or behind the user, an object held by the user, or uncommon clothing like baggy pants or skirts, you have to write your own skeleton tracker in software.
[1] S. Choppin and J. Wheat, 2012, "Marker-less tracking of human movement using Microsoft Kinect", http://w4.ub.uni-konstanz.de/cpa/article/view/5271