Master's Thesis '11, at KAIST
Gesture-Recognition Interface with Keyboard Embedded IR Modules [pdf] (Korean)
Hyunyoung Kim
Advisor: Minsoo Hahn
Suggested gestures on the keyboard: 1) one-hand sweep, 2) hold & one-hand sweep, 3) both-hands sweep, 4) sweep between keycaps.
Interface artwork (left top), implemented interface on keyboard (left bottom), circuit close-up (right).
Sensor test video
In most tasks we use the keyboard and mouse at the same time. For example, we type with the keyboard and change fonts with the mouse. This kind of work causes hand shifts and inconvenience. Some experts use keyboard shortcuts (chords) to save time; however, chords are difficult for novices to remember. From this perspective, I suggest a gesture-recognizing keyboard and a set of on-keyboard gestures, which enable shorter input time than the mouse (because no hand shift is needed) and shorter learning time than chords.
Thesis Summary
Problem and Study Objective
When the Graphical User Interface was invented, the first pointing device was invented with it. The problem is that most computer tasks require a combination of keyboard and pointing-device input, so users must repeatedly move their right hand between the keyboard and the pointing device, resulting in distraction and reduced productivity. Considering how much time we spend using computers each day, this problem is worth resolving. Trackpads reduce the distance between the keyboard and the pointing device; however, users still need to move their hands from the keyboard to the trackpad. The pointing stick is a small stick installed between the G, H, and B keys on the keyboard, but only some users prefer it because it requires a certain amount of force to operate and time to become familiar with. An alternative to pointing devices for executing commands is keyboard shortcuts, i.e., pressing and holding several keys simultaneously. Shortcuts have the shortest task performance time, but users can have difficulty remembering them because many shortcuts bear little relation to their functions, e.g., Ctrl+E for cEnter alignment in MS Word. Although the first letter of the function name can be used for the corresponding shortcut, e.g., Ctrl+L for Left alignment in MS Word, non-English-speaking users might still have difficulty memorizing shortcuts. Gestures, on the other hand, are relatively easy to memorize because they borrow from the physical movement of objects on the screen.
The objective of this thesis is to develop a novel gestural keyboard interface using IR proximity sensors, providing shorter performance time than pointing devices for certain tasks. The specific objectives are: 1) to propose on-keyboard gestures for a gestural keyboard, 2) to implement the gesture recognition hardware and software, and 3) to suggest an evaluation process for designing gestures for the interface. This thesis contributes to gestural interfaces that retain typing functionality, as a preliminary step toward an appropriate design and implementation method (chapters 1, 2).
User Gesture Design
Gestures in an interface should be consistent so that users can infer one gesture from another and remember them easily. General gesture design principles were devised to provide this consistency; when two principles conflict for a gesture, the earlier principle has higher priority. First, for long-term use of the interface, users must be able to rest their wrists on the desk. Second, one hand or both hands can be used for a gesture, but performing different gestures with each hand simultaneously is not allowed, to prevent memory interference. Third, the gesture size should be related to the visual size of the function's effect, and the gesture's function should vary according to the target object being manipulated. For example, if a figure is selected, the pinch-out gesture enlarges the figure, whereas if text is highlighted, the same gesture increases the font size. Lastly, only flat palms are used for gestures, in order to reject false positives caused by normal hand posture; this avoids the need for an explicit trigger for gesture recognition.
The Pinch-in and Pinch-out gestures involve placing both hands flat on the keyboard and moving them horizontally toward the center of the keyboard or outward. Up and Down move all fingers except the thumb up and down while the wrist rests on the desk; the motion resembles scratching the keyboard. These gestures can be used to move a page up and down or to control the sound volume. Up-by-two-hands and Down-by-two-hands move the fingers up and down with both hands; following the third design principle, the larger gesture is mapped to moving a larger object. Left and Right move one hand to the left or right; text alignment, previous/next page navigation, and previous/next track navigation in a music player can be performed with this gesture. The Scribble gesture places four fingers between the rows of the Q, W, E, R and A, S, D, F keycaps and moves them horizontally above a certain speed; it uses the metaphor of scribbling over text to erase it. Scribble-by-two-hands performs the same gesture with both hands; because the gesture is larger than its one-handed counterpart, it is mapped to a larger change: minimizing and maximizing the current window. The last four gestures involve both hands and control application windows: Hold-and-left and Hold-and-right switch the highlighted window, Hold-and-down minimizes all windows except the selected one, and Hold-and-up undoes the minimizing action (chapter 3).
Implementation of Hardware Interface and Recognition Software
For the hardware interface, 30 SG-105F infrared proximity sensors are embedded between the keyboard's keycaps: ten sensors above each of the QWER, ASDF, and ZXCV rows, one sensor above each keycap in the row. An Arduino Duemilanove microcontroller board reads each sensor every 100 µs, converts the analog sensor values to digital values, and thresholds them into binary values. The one-or-zero values are transferred to the PC over the RS-232 protocol. On the PC side, the gesture recognition software reads the most recent sensor values every 16 ms and calculates the size of the physical contact area of the hand over the sensors. The software determines whether one hand or two hands are on the keyboard. When one hand is recognized, the software calculates the center point of the detected area and computes a motion vector across several frames. When two hands are recognized as one wide blob, the software divides the blob horizontally and finds the center of each hand. The gesture is then determined from the height of the blob, the number of hands, and the direction and magnitude of the motion vector (chapter 4).
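The PC-side loop can be summarized in code. The following is a minimal sketch, assuming a 3x10 grid of binary sensor values per frame; the type names, thresholds, and the simplified classification rules (only a few of the gestures) are illustrative assumptions and not the actual thesis implementation.

```cpp
// Minimal sketch of the PC-side recognition step (assumptions noted above).
#include <array>
#include <cstdint>

constexpr int COLS = 10, ROWS = 3;                          // 10 sensors per row, 3 rows
using Frame = std::array<std::array<uint8_t, COLS>, ROWS>;  // 1 = hand detected

struct Blob { double cx = 0, cy = 0; int area = 0; };       // contact area and its center

// Centroid and size of the detected area within a column range of one frame.
Blob blobOf(const Frame& f, int colBegin, int colEnd) {
  Blob b;
  for (int r = 0; r < ROWS; ++r)
    for (int c = colBegin; c < colEnd; ++c)
      if (f[r][c]) { b.cx += c; b.cy += r; ++b.area; }
  if (b.area) { b.cx /= b.area; b.cy /= b.area; }
  return b;
}

enum class Gesture { None, Left, Right, PinchIn, PinchOut /* ... */ };

// Compare the current frame (sampled every 16 ms) with an earlier one and
// classify by hand count, motion direction, and motion magnitude.
Gesture classify(const Frame& prev, const Frame& cur) {
  Blob whole = blobOf(cur, 0, COLS);
  bool twoHands = whole.area > 12;              // assumed "wide blob" threshold

  if (!twoHands) {                              // one hand: track its center
    double dx = whole.cx - blobOf(prev, 0, COLS).cx;
    if (dx >  1.5) return Gesture::Right;
    if (dx < -1.5) return Gesture::Left;
    return Gesture::None;
  }
  // Two hands: divide the wide blob at the middle column (a simplification
  // of splitting at the blob center) and track each half separately.
  double dl = blobOf(cur, 0, COLS / 2).cx - blobOf(prev, 0, COLS / 2).cx;
  double dr = blobOf(cur, COLS / 2, COLS).cx - blobOf(prev, COLS / 2, COLS).cx;
  if (dl < -1.0 && dr >  1.0) return Gesture::PinchOut;   // hands move apart
  if (dl >  1.0 && dr < -1.0) return Gesture::PinchIn;    // hands move together
  return Gesture::None;
}
```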
Comprehensive Interface Evaluation
Interface evaluation is conducted in three steps: gesture evaluation, interface evaluation, and performance evaluation with applications.
Gesture evaluation aims to find gestures that are intuitive and cause little fatigue. A screen, the implemented gestural keyboard interface, and survey documents were given separately to five daily computer users. The keyboard was used to observe how the participants reacted to the new interface and to ask qualitative questions; at this stage, the keyboard did not recognize the gestures. The participants were asked to perform the gestures shown on the screen in random order and to score, on the survey sheet, how relevant each gesture was to its functions. They were then asked to repeat each gesture three times and score the fatigue level. The Pinch-in/Pinch-out, Up/Down, and Scribble gestures showed high relevance scores for their target functions: resizing graphical objects and text fonts, page up/down, and deleting one line of text, respectively. The Left and Right gestures showed less relevance to their target function, left and right alignment of selected text. We assume the reason is that the number of possible alignments (three: left, right, and center) does not match the number of gestures (two: relative left and relative right). The other gestures received similar scores across functions; we assume this is because the gestures are new and participants had difficulty predicting their functions. In the fatigue survey, the participants indicated higher fatigue for the vertical gestures (Up/Down, Up/Down-by-two-hands, and Hold-and-up/down) because their fingers were fixed between keycaps. Scribble-by-two-hands received low relevance and high fatigue scores. For these reasons, all vertical gestures and Scribble-by-two-hands were eliminated from the final gesture list of the interface.
We then tested the recognition rate and asked about the usefulness of the remaining gestures, with five users participating. The recognition rates for Left, Right, Scribble, Hold-and-left, and Hold-and-right are higher than 95%; however, the rates for Pinch-in and Pinch-out are 89.8% and 74.2%, respectively, and should be improved in future work. In the usefulness survey, the participants indicated that the gesture functions are useful, with the exception of Hold-and-left and Hold-and-right, because all participants were already familiar with the shortcut for switching the top-level application (Alt+Tab). The Hold-and-left and Hold-and-right gestures were therefore also eliminated.
Two applications are used to compare the interface's performance with other pointing devices and with shortcuts. The first application is an album viewer implemented for the evaluation: users can navigate between pictures in a folder and assign tags to each picture. A series of tasks requiring different types of gesture input was assigned to five participants. The second application is map navigation: a web browser is used to navigate Google Maps, and the gesture recognition software generates key events according to the recognized gesture, so that users perceive that they are navigating the map through gestures. Both applications require users to perform GUI-related tasks (tasks that can be performed with pointing devices, gestures, or shortcuts) and typing tasks. Two non-IT experts and three IT experts participated in the evaluation, and participants were given as much time as they wanted to practice unfamiliar shortcuts. For the non-IT experts, the gesture performance time is shorter than with a trackpad or a pointing stick, while shortcuts are still faster than gestures. The IT experts, however, required longer to input gestures than the non-IT experts, and gestures were their slowest input interface. The IT experts indicated that they were confused by performing gestures on a typing device to which they are accustomed; we can hypothesize that memory interference occurred between gestures and typing (chapter 5).
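On Windows, one way to inject such key events is the SendInput API. The snippet below is a hedged sketch of turning a recognized gesture into a key press for the browser; the concrete key mapping (e.g., arrow keys to pan the map) is an assumption, not necessarily the mapping used in the thesis.

```cpp
// Hedged sketch: inject a key press/release so the browser pans the map.
// The VK_LEFT mapping is an assumed example, not the documented mapping.
#include <windows.h>

void sendKey(WORD vk) {
  INPUT in[2] = {};
  in[0].type = INPUT_KEYBOARD; in[0].ki.wVk = vk;     // key down
  in[1].type = INPUT_KEYBOARD; in[1].ki.wVk = vk;
  in[1].ki.dwFlags = KEYEVENTF_KEYUP;                 // key up
  SendInput(2, in, sizeof(INPUT));
}

// e.g. if (gesture == Gesture::Left) sendKey(VK_LEFT);  // pan the map left
```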
Discussions and Conclusions
An interesting aspect of the suggested interface was that most users found that they enjoyed it. A cognitive stress evaluation could be performed in addition to an ergonomic stress evaluation, and long-term acceptance could be evaluated if the interface is commercialized. Furthermore, additional applications were suggested: unlocking an OS, games, and emotional interaction. Unlocking an OS would be similar to Android's pattern lock. Arkanoid would be a good example of a game application. Emotional interaction shakes the whole screen when users repeatedly hit the keyboard forcefully; this application idea came from casual user research, in which most users answered that they have a habit of hitting the keyboard, flicking keycaps, or rapidly typing meaningless text.
In this thesis, we proposed a gesture recognition keyboard interface. Novel gestures for the interface were designed, and the related hardware and software systems were implemented. The gestures were subjected to an evaluation process to find those that were intuitive and comfortable. For computer novices, this interface can reduce the number of right-hand movements between the keyboard and a pointing device, and therefore reduce the performance time of tasks that require both GUI-related and typing input (chapter 6).