We have all become familiar with the set of candidates that are lining up to replace the traditional keyboard and mouse as the primary interaction paradigm.  At this point, almost all of us have experience using multi-touch, and many of us have experience using voice and gesture control.

Very little of my day involves the mouse and keyboard anymore – I write emails and blog entries on the touchpad of my tablet, and I do product planning using voice input and Evernote on my smartphone, which is a far cry from my programming days that had me tapping at my keyboard for eight hours at a time.  Many people believe that touch technology is a productivity killer, a perception that could inhibit its long-term adoption. Leap Motion, among other companies, is positing gesture as a natural alternative for input, and they may be right. Others speculate that a screen-less device is really what we need to maximize productivity.

But researchers at Dartmouth College have recently added yet another candidate to the mix: eye tracking control.  Detecting and tracking the human eye using a camera is a problem that was taken up by the computer vision community in the 90s and has received intermittent attention ever since.  I’ve worked on this problem in the past with Amit Kale. He believed that gaze detection could help model individuals’ intentions at a distance, since people look more often at the things they are moving toward.  The statistical models we were developing as part of a wide-area tracking project could be better informed by accurate estimates of gaze.  Throughout the past decade, computer vision research projects like ours have focused on passive sensing in surveillance scenarios, with or without the active participation of the individual being tracked. Only recently has the technology been tied to applications that can directly assist the individual being tracked.

By using the front-facing camera now built into most smartphones and tablets, Andrew Campbell’s research team at Dartmouth asked a pretty straightforward question – can eye tracking technologies be the basis for smartphone control?  The EyePhone project explores a range of controllable behaviors, including eye blink patterns that answer a call, and gaze-directed selection or launching of applications.  I applaud them for their efforts, and the work is noteworthy in that the debate over how we will control the proliferation of data, devices, and displays is still unsettled.  I have my own (probably biased) opinions about where we need to head, but I will go out on a limb and say eye tracking control will not be successful.

Why? The eye is a cognitive input device and an integral part of your brain.  A large amount of image processing occurs just a few millimeters from the surface of the retina.  Borrowing from or retasking the visual system for output control, as opposed to input, is difficult.  Your eye performs very specific tasks to process its surroundings, many of which will need to be ignored or deemed irrelevant for eye tracking control to work.  For example, saccadic eye movements – a natural and unintentional part of the sensing process – may be interpreted as miscues by the input system. How practical would the mouse and keyboard be if our fingers unintentionally clicked up and down periodically?

These challenges to using the eye as an input device were put forward in a seminal human-computer interaction paper by Rob Jacob in 1990 entitled “What You Look At Is What You Get: Eye Movement-Based Interaction Techniques.” It is a great paper, and it describes how overloading the eye with output control causes the “Midas Touch” problem – input commands triggered by accident when you were really just looking at different parts of your desktop.
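The classic mitigation Jacob discusses is a dwell-time threshold: treat a fixation as a selection only after the gaze has rested on the same target for some minimum duration, so that brief saccades and glances don’t fire commands. A minimal sketch of the idea (the 0.5-second threshold and the target names are illustrative assumptions, not values from any real eye tracker):

```python
class DwellSelector:
    """Emit a selection only after gaze dwells on one target long enough.

    The threshold is illustrative; real systems tune it per user and task.
    """

    def __init__(self, dwell_threshold=0.5):
        self.dwell_threshold = dwell_threshold  # seconds of sustained gaze
        self._target = None   # target currently under the gaze
        self._start = None    # timestamp when gaze first landed on it
        self._fired = False   # ensure one selection per fixation

    def update(self, timestamp, target):
        """Feed one gaze sample; return the target if it was just selected."""
        if target != self._target:
            # Gaze moved (e.g. a saccade or glance): reset the dwell timer.
            self._target, self._start, self._fired = target, timestamp, False
            return None
        if (not self._fired and target is not None
                and timestamp - self._start >= self.dwell_threshold):
            self._fired = True  # fire once, then stay quiet until gaze moves
            return target
        return None
```

With this filter, a quick glance across an icon produces no command; only a deliberate, sustained look does – at the cost of making every selection slower than a mouse click, which is part of why the trade-off is hard.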

Despite the obvious drawbacks of gaze-based control, I’m happy to see that the debate about more efficient and natural human-computer interaction models is still robust.  The methods we use today need updating – my hands and neck are sore from typing this post on my tablet while crammed into a low-fare airplane seat.  We are launching Solstice this month. Once that’s available for download, I’ll tell you more about how I think the interaction problem should be solved.  Stay tuned!


About Christopher Jaynes

Jaynes received his doctoral degree at the University of Massachusetts, Amherst where he worked on camera calibration and aerial image interpretation technologies now in use by the federal government. Jaynes received his BS degree with honors from the School of Computer Science at the University of Utah. In 2004, he founded Mersive and today serves as the company's Chief Technology Officer. Prior to Mersive, Jaynes founded the Metaverse Lab at the University of Kentucky, recognized as one of the leading laboratories for computer vision and interactive media and dedicated to research related to video surveillance, human-computer interaction, and display technologies.
