Combining Gestures and Speech for High Productivity in Multitasking Environments

4 min read · Apr 8, 2018

The concept of multitasking is not new to us, especially in this fast-paced world. In fact, it is so innate that we naturally keep switching between tasks in both the physical and virtual worlds: viewing Instagram or doodling while a lecture is going on, or scanning Safari loaded with relevant webpages with Word open for taking notes, all while having one eye and your whole heart focused on your crush. Although that last bit might be the reason for my staying up late to complete this article, it goes to affirm that we humans are naturals at multitasking.

But what happens when one is put into a slightly more stressful environment than idly sitting in class, such as a kitchen? A lot of other factors now come into play: keeping an eye on the various vessels that are always just a second away from burning the dish you have painstakingly made, or the amount of walking one has to do between the fridge, the stove, the microwave and the counter. Keeping tabs on all that is happening and coming out triumphant with dishes that are even just edible is reason enough to nominate one for your country’s highest honour.

Enter digital voice-based assistants. These melodious but slightly robotic-sounding voices, embodied by small cylindrical pieces of metal with some wire mesh, have become man’s companion in the war zone that we call the kitchen. The Google Assistant whips out some fancy recipes, for you have only one chance at impressing your crush, while Alexa keeps tabs on how much each dish has cooked. This lack of burden now allows you to flirt with Siri while you catch up on what your friends have been up to on Facebook, until Alexa reminds you to turn the gas down to ensure your pasta remains more than just an attempt.

As the previous example shows, we are increasingly moving away from traditional input devices like the mouse and, to some extent, even touch, and entering a brave new era of contactless interfaces, heralded by rapid advances in voice-based technology. With the onset of IoT, we will find ourselves communicating with our refrigerators, microwaves and toasters, and probably even interacting with kitchen counters. Although the current interface for this interaction is a GUI controlled by touch, we are slowly but surely making progress towards a voice-based future.

But at times, we find speech alone to be rather limiting. As a species, we have evolved to use our bodies to help communicate whatever it is we are saying. These gestures, whilst appearing insignificant, make up a large part of what we perceive of the person talking to us. This suggests that the brain collects a lot more information about what a person is saying through their body language than through their voice alone. Therefore, it only makes sense that we use gestures along with voice to communicate with the devices of the future, especially in environments that require multitasking.

Using tools like Leap Motion to track hand movements, we can remotely access devices and use them without the need for any complex interface in between. We can control devices through simple intuitive gestures, requiring almost zero learning. This will help bridge the gap between computers and humans by ensuring computers recede into the background, and our environment becomes more humane.

There are multiple use cases for such an interaction. Let’s say you want the flame on the stove to increase. Instead of asking Alexa to turn the stove up to 60%, we could just ask her to turn the stove up by “this much” and gesture turning a knob to show how much we’d want the flame to go up by. This, coupled with a visual indicator, would let us continue going about our tasks without the stress that comes with having to leave a task incomplete to tend to something like this. It could also be used by elderly people living independently, for whom moving about a lot is detrimental to health.
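To make the “this much” idea a little more concrete, here is a rough Python sketch of how a spoken command and a knob-turning gesture might be fused into a single instruction. Everything in it is an assumption for illustration: the `Stove` class, the `knob_delta` and `handle_command` functions, and the choice that a 270-degree turn of the hand covers the whole 0 to 100% range. It also assumes some hand tracker has already given us the hand’s roll angle at the start and end of the gesture; it does not call any real Leap Motion or Alexa API.

```python
from dataclasses import dataclass

# Assumption: a 270-degree hand rotation sweeps the full 0-100% flame range,
# roughly like a physical stove knob.
FULL_TURN_DEGREES = 270.0


@dataclass
class Stove:
    """Hypothetical connected stove with a flame level in percent."""
    level: float = 0.0

    def adjust(self, delta: float) -> float:
        # Clamp so the flame level always stays between 0% and 100%.
        self.level = max(0.0, min(100.0, self.level + delta))
        return self.level


def knob_delta(start_roll_deg: float, end_roll_deg: float) -> float:
    """Convert a hand rotation (in degrees) into a change in flame percentage.

    A positive rotation turns the flame up, a negative one turns it down.
    """
    return (end_roll_deg - start_roll_deg) / FULL_TURN_DEGREES * 100.0


def handle_command(utterance: str, start_roll_deg: float,
                   end_roll_deg: float, stove: Stove) -> float:
    """Fuse speech and gesture: speech says *what*, the gesture says *how much*."""
    if "this much" not in utterance.lower():
        raise ValueError("Only the 'this much' style of command is sketched here")
    return stove.adjust(knob_delta(start_roll_deg, end_roll_deg))


if __name__ == "__main__":
    stove = Stove(level=40.0)
    new_level = handle_command("Turn the stove up by this much", 0.0, 54.0, stove)
    print(f"Flame is now at about {new_level:.0f}%")
```

Under these assumptions, a 54-degree turn of the hand is a fifth of a full turn, so a stove sitting at 40% would end up at roughly 60%, the same result as the spoken “turn the stove up to 60%”, but without having to think in numbers.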

This could also be used in hospitals by nurses, who have to walk a fair bit around a room to ensure all the equipment is delivering the right dosage to each patient. It could be used in a car to control the dashboard equipment without having to lift one’s hands off the steering wheel. All this ensures continuity in our daily lives without being perturbed by cumbersome interfaces.

This will be the first step in the evolution towards a screenless future, towards a future where rectangular slabs don’t govern our day-to-day activities, towards a future where computers don’t interfere, but instead improve our lives and help us connect with the world around us.

Written by Avyay Kashyap