Google Home is coming out today. It is super exciting to see Google bringing its cutting-edge technology to the smart home, joining Amazon Echo and Samsung ARTIK. What is interesting about Google Home is that it opens up many opportunities beyond just voice commands.
While security is surely one big concern with these smart home devices (or, more precisely, in IoT development), one thing in particular came to my mind: sensing. The idea is not limited to Google Home, but I will use it as the example here.
To give a sense of what I mean by sensing, here is an example. Imagine that you have a Google Home at home, and you also have light bulbs that allow fine-grained brightness control. Voice commands are good but somewhat limited. You can certainly say "Okay Google, turn on/off the light", but it would be tricky to say "Okay Google, turn on the light at brightness level x". While that would work, the levels are pre-defined and can be very hard to remember. Typically, people fine-tune such controls with a small knob. That leads me to another Google project: Project Soli.
A brief introduction to Project Soli: it is a (hopefully still on-going) project at Google that enables fine-grained control on smartwatches despite their small size. Its goal is to recognize gestures at the finger level, so that user interaction, the way we control the device, is not limited to touching and swiping the screen. Micro-gestures open up abundant user-interface possibilities, since they extend the control area to the whole space around the smartwatch. To recognize micro-gestures, Soli uses mmWave radar (e.g., in the unlicensed 60GHz band). Thanks to mmWave's small wavelength and highly predictable signals, a small chip inside a smartwatch can "see" fingers in the air and infer the micro-gestures.
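To get a feel for why the small wavelength matters, here is my own back-of-the-envelope sketch (not Soli's actual design; the 60GHz carrier and the 5cm/s finger speed are just illustrative assumptions):

```python
# Why 60 GHz can "see" tiny finger motions: a slow-moving reflector
# still produces an easily measurable Doppler shift at mm wavelengths.

C = 3e8                 # speed of light, m/s
FREQ = 60e9             # 60 GHz unlicensed band
WAVELENGTH = C / FREQ   # ~5 mm

def doppler_shift_hz(radial_velocity_mps: float) -> float:
    """Two-way Doppler shift of a reflector moving toward/away
    from the radar: f_d = 2 * v / lambda."""
    return 2 * radial_velocity_mps / WAVELENGTH

# A finger rubbing at ~5 cm/s:
print(f"wavelength: {WAVELENGTH * 1e3:.1f} mm")          # -> 5.0 mm
print(f"Doppler at 5 cm/s: {doppler_shift_hz(0.05):.0f} Hz")  # -> 20 Hz
```

At 2.4GHz Wi-Fi frequencies the same motion would only shift the signal by well under 1Hz, which is one intuition for why mmWave is the band of choice here.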
While this sounds promising, using mmWave for micro-gestures is very challenging because it requires the device to be essentially stationary. Any small change in the device's position (unless tracked at mm-level accuracy) corrupts the phase measurements, and any phase distortion leads to misreading the fingers in the air and breaks the results (i.e., wrong gesture recognition).
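A quick calculation shows how unforgiving the phase is at 60GHz. The sketch below assumes a simple monostatic radar, where the round-trip path changes by twice the device displacement; the displacement values are illustrative:

```python
import math

C = 3e8
FREQ = 60e9
WAVELENGTH = C / FREQ   # ~5 mm at 60 GHz

def two_way_phase_error_deg(displacement_m: float) -> float:
    """Phase error (degrees) caused by the radar itself moving by
    `displacement_m` along the line of sight. The round-trip path
    changes by 2x the displacement, so delta_phi = 4*pi*d / lambda."""
    return math.degrees(4 * math.pi * displacement_m / WAVELENGTH)

# Even sub-millimeter wobble shifts the phase by tens of degrees:
for d in (1e-4, 5e-4, 1e-3):
    print(f"{d * 1e3:.1f} mm motion -> "
          f"{two_way_phase_error_deg(d):.0f} deg phase error")
```

A 0.1mm wobble already costs about 14 degrees of phase, and a full millimeter costs 144 degrees, which explains why a handheld or wrist-worn device needs mm-level motion tracking to keep the phase usable.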
Thus micro-gesture recognition seems to be a perfect fit for a stationary device, e.g., Google Home.
An example scenario is as follows. Instead of relying on voice, which only offers pre-defined, coarse control over brightness, why not use micro-gestures? It is natural for people to control brightness by simply moving their fingers in certain gestures. We could say "Okay Google, turn on the light", and the microphones would determine the direction of the sound source. Using this direction (hopefully accurate), the mmWave chip would beamform toward the speaker. The light is on now, but it is too dark, so immediately after saying "turn on the light", we hold out a hand and make a simple gesture, like twisting a small string (as in Project Soli's promotional videos), and the light gets brighter and brighter!
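How the microphones would estimate that direction is an open design choice; one classic approach (my assumption here, not a documented Google Home feature) is time-difference-of-arrival (TDOA) between a pair of microphones. A minimal sketch, assuming a 10cm mic spacing:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at ~20 C
MIC_SPACING = 0.10      # assumed 10 cm between two microphones

def doa_degrees(time_delay_s: float) -> float:
    """Direction of arrival from the arrival-time difference between
    two microphones: sin(theta) = c * dt / d.
    0 deg = broadside (in front), +/-90 deg = along the mic axis."""
    sin_theta = SPEED_OF_SOUND * time_delay_s / MIC_SPACING
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp numerical noise
    return math.degrees(math.asin(sin_theta))

# A 150 microsecond lag between the mics maps to roughly 31 degrees
# off broadside:
print(f"{doa_degrees(150e-6):.1f} deg")
```

With more than two microphones (as real smart speakers have), the same idea generalizes to 2D localization, and the estimated angle could then seed the mmWave beamforming.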
A problem here is whether we can achieve such fine-grained control at a far distance. The fundamental radar laws tell us that at a distance of several meters, a small chip can hardly resolve anything below 10cm. Maybe we can coarsely localize the hand, but it will be almost certainly impossible to see each individual finger. Also, since mmWave is highly directional and cannot really penetrate walls (unless with very high power), it won't be very practical if there is a wall between us and the Google Home.
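To see why distance hurts, here is a rough diffraction-limit sketch. The cross-range resolution of an aperture scales as R * lambda / D; the ~1cm chip aperture is my assumption for illustration:

```python
C = 3e8
FREQ = 60e9
WAVELENGTH = C / FREQ   # ~5 mm at 60 GHz
APERTURE = 0.01         # assumed ~1 cm antenna aperture on a small chip

def cross_range_resolution_m(distance_m: float) -> float:
    """Approximate cross-range resolution at range R: R * lambda / D.
    Two reflectors closer together than this (perpendicular to the
    beam) blur into one."""
    return distance_m * WAVELENGTH / APERTURE

for r in (0.05, 1.0, 3.0):
    print(f"at {r} m: {cross_range_resolution_m(r) * 100:.1f} cm")
```

At wrist range (~5cm) the resolution cell is a couple of centimeters, small enough to separate fingers; at 3m it balloons to about 1.5m, so the whole hand collapses into a single blob unless the aperture grows dramatically.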
Is there anything we in academia can research to overcome these two issues, whether in system design, software implementation, or perhaps hardware improvements?
Beyond micro-gestures, other possibilities exist too. For example, can we leverage the microphones on Google Home to enable second-screen services, which push notifications (ads) for a product to your phone (the second screen) while the TV is advertising it?