
Sunday, January 27, 2013

Specifying Gestures by Example


Blog on paper: Specifying Gestures by Example
Summary:
    Although human-computer interaction by means of gestures is appealing, it has developed slowly because gestures are difficult to create and gesture recognizers are complex to hand-code. This paper therefore presents GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture), with which gesture recognizers can be created automatically from example gestures, removing the need for complicated hand coding.
    Each recognizer is rapidly trained from a small number of examples of each gesture. The attributes of these gestures should be meaningful, so that they can be used both for classification and in the gestures' semantics.
    The paper then describes GDP, a gesture-based drawing program built with GRANDMA. All gestures in GDP are attached to a hierarchy of view classes, as shown in Figure 1.
 Figure 1. GDP view classes and associated gesture sets
    Next, it shows how to add a new gesture to GDP’s initial click-and-drag interface.
(1) Create a new gesture handler and associate it with the GraphicObjectView class in Figure 1. The gesture handler window is shown in Figure 2.
Figure 2. Four new gesture classes, gc6 – gc9
(2) Enter training examples for each new gesture class. Figure 3 shows the process of entering 7 training examples for the “Delete” gesture class (gc9). Typically, 15 training examples per gesture class are adequate.
Figure 3. 7 training examples for “delete” gesture
    After introducing how to add new gestures, the paper turns to gesture recognition. The essence of gesture recognition is: given an input gesture g, determine to which of a set of C gesture classes g belongs.
    Each gesture is represented by an array of sample points:
g_p = (x_p, y_p, t_p),  0 <= p < P
where P is the total number of sample points and t_p is the time at which point p was drawn.
    Features are the key to classifying gestures. There are three principles for the selection of features:
(1) They should be incrementally computable in constant time per input point.
(2) They should be meaningful, so that they can be used in gesture semantics as well as for recognition.
(3) There should be enough features to differentiate all gestures, but not too many.
    The following 13 features, shown in Figure 4, are a good choice for identifying strokes:



A few points worth noting:
(1) cos(α) and sin(α) are used as features instead of α itself, to avoid a discontinuity as the angle passes through 2π and wraps to 0.
(2) The initial angle features are computed from the first and third mouse points because the result is generally less noisy than when it is computed from the first two points. Nowadays, with more precise sampling, the first and fifth points might be used instead.
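
To make the feature computation concrete, here is a minimal Python sketch (illustrative code, not GRANDMA's implementation) that computes a few of the 13 features from a list of (x, y, t) sample points: the cosine and sine of the initial angle (taken between the first and third points, as noted above), the length of the bounding-box diagonal, and the total path length.

```python
import math

def some_stroke_features(points):
    """Compute a few of Rubine's 13 stroke features (illustrative subset).

    points: list of (x, y, t) samples, i.e. g_p = (x_p, y_p, t_p).
    """
    xs = [x for x, y, t in points]
    ys = [y for x, y, t in points]

    # Cosine and sine of the initial angle, measured between the first
    # and third points to reduce noise (see note (2) above).
    dx = points[2][0] - points[0][0]
    dy = points[2][1] - points[0][1]
    d = math.hypot(dx, dy) or 1.0            # guard against a zero distance
    cos_init, sin_init = dx / d, dy / d

    # Length of the bounding-box diagonal.
    bbox_diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))

    # Total length of the gesture path.
    path_len = sum(math.hypot(points[i + 1][0] - points[i][0],
                              points[i + 1][1] - points[i][1])
                   for i in range(len(points) - 1))

    return [cos_init, sin_init, bbox_diag, path_len]
```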

Gesture Classification formula:
    Given the feature vector f = (f_1, ..., f_F) of an input gesture, each class c is scored with a linear evaluation function

v_c = w_c0 + sum_{i=1..F} w_ci * f_i

and the gesture is classified as the class whose v_c is largest.
Training:
    The purpose of training is to determine the weights w_c0 and w_ci from the example gestures.
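
Once the weights are known, classifying a gesture reduces to evaluating v_c for every class and picking the maximum. A minimal Python sketch of this evaluation (the weight layout and names are illustrative, not GRANDMA's API):

```python
def classify(features, weights):
    """weights maps class c -> (w_c0, [w_c1, ..., w_cF]).

    Returns the best class and all per-class evaluations
    v_c = w_c0 + sum_i w_ci * f_i (the v_c values are reused for rejection).
    """
    scores = {}
    for c, (w0, w) in weights.items():
        scores[c] = w0 + sum(wi * fi for wi, fi in zip(w, features))
    best = max(scores, key=scores.get)
    return best, scores
```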

Rejection:
    The purpose of rejection is to reject ambiguous gestures and outliers.
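
Concretely, the paper estimates the probability that the top-ranked class is correct from the per-class evaluations v_c and rejects the gesture when that estimate is low (the paper suggests a threshold of 0.95); outliers are additionally rejected with a Mahalanobis-distance test. A rough sketch of the ambiguity check, reusing the scores returned by classify() above:

```python
import math

def is_ambiguous(scores, best, threshold=0.95):
    """Estimate P(best class correct) ~ 1 / sum_j exp(v_j - v_best) and
    flag the gesture for rejection when the estimate is below threshold."""
    v_best = scores[best]
    p = 1.0 / sum(math.exp(v - v_best) for v in scores.values())
    return p < threshold
```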

Eager recognition and multi-finger recognition:
    Eager recognition: the recognition of a gesture as soon as it becomes unambiguous. In this mode, classification is attempted on every mouse point.
    Multi-finger recognition: the recognition of gestures made with multiple fingers simultaneously. By treating the multi-finger input as multi-path data, the single-stroke recognition algorithm can be applied to each path individually and the results combined to classify the multi-path gesture, as sketched below.
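
The multi-path idea can be sketched as follows (purely illustrative structure; the paper's actual combination of per-path results is more elaborate):

```python
def classify_multipath(paths, classify_single, combine):
    """Classify a multi-finger gesture given as a list of per-finger paths.

    classify_single: maps one path to a single-stroke gesture class.
    combine: application-specific rule mapping the tuple of per-path
             classes to a multi-path gesture class.
    """
    per_path = tuple(classify_single(path) for path in paths)
    return combine(per_path)
```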

Bibliography:
Dean Rubine. Specifying Gestures by Example. SIGGRAPH '91: Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, pages 329–337. ACM, New York, NY, USA, 1991. http://dl.acm.org/citation.cfm?id=122753





Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes


Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes
Summary:
    This paper introduces the $1 recognizer, an easy, cheap, and usable recognizer implemented in about 100 lines of code. Although simple, it still achieves excellent recognition accuracy, even better than more complicated recognizers such as the Rubine recognizer.
The $1 gesture recognizer:
    The ultimate aim of the $1 recognizer is to determine which set of previously recorded template points Ti most closely matches the set of candidate points C from the user’s gesture. However, this is not straightforward, because templates and candidates can differ in size, angle, drawing speed, and so forth. Keeping these concerns and the desire for simplicity in mind, the $1 recognizer has to meet several requirements:
(1) Be resilient to variations in sampling due to movement speed or sensing;
(2) Support optional and configurable rotation, scale, and position invariance;
(3) Require no advanced mathematical techniques (e.g., matrix inversions, derivatives, integrals);
(4) Be easily written in few lines of code;
(5) Be fast enough for interactive purposes (no lag);
(6) Allow developers and application end-users to “teach” it new gestures with only one example;
(7) Return an N-best list with sensible [0..1] scores that are independent of the number of input points;
(8) Provide recognition rates that are competitive with more complex algorithms, such as Dynamic Time Warping and Rubine.
    With these requirements in mind, the $1 recognizer is realized with a simple 4-step algorithm.
A simple 4-step algorithm:
(1) Resample the Point Path
The purpose of this step is to eliminate the influence of movement speed on the number of input points in a gesture, as Figure 1 illustrates.
Figure 1 The relationship between articulation speed and number of input points
Usually, we resample gestures such that the path defined by the original M points is redefined by N equidistantly spaced points (32 <= N <= 256).
Figure 2 The effect of different number of resampling point
Figure 2 shows that when N = 32, the sampling points are too sparse to identify the path. However, when N = 128, the points are so dense that they add time to path comparisons.
A common way to do this is the following (a code sketch is given after the list):
(1) Calculate the total length of the M-point path.
(2) Divide that length by (N-1) to obtain the increment length L between the N new points.
(3) Step through the path; whenever the distance covered exceeds L, add a new point by interpolation.
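
A rough Python version of this resampling step, modeled on the paper's RESAMPLE pseudocode (points are assumed to be (x, y) pairs; timestamps are dropped):

```python
import math

def resample(points, n=64):
    """Resample a stroke to n equidistantly spaced points ($1, step 1)."""
    path_len = sum(math.dist(points[i - 1], points[i])
                   for i in range(1, len(points)))
    interval = path_len / (n - 1)           # increment length L
    d = 0.0
    new_points = [points[0]]
    pts = list(points)
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval:
            # Interpolate a new point exactly one increment along the path.
            t = (interval - d) / seg
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_points.append((qx, qy))
            pts.insert(i, (qx, qy))         # q becomes the next segment start
            d = 0.0
        else:
            d += seg
        i += 1
    while len(new_points) < n:              # guard against rounding shortfall
        new_points.append(points[-1])
    return new_points
```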




(2) Rotate Once Based on the “Indicative Angle”
We define the “indicative angle” as the angle formed between the centroid of the gesture and its starting point. Find this angle and rotate the gesture so that the indicative angle becomes 0°. The process is illustrated in the figure below.
Figure 3 Rotate gesture so that indicative angle is zero
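
A minimal Python sketch of the indicative-angle rotation (helper names are illustrative; rotate_by() is reused later by the angle search in step 4):

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def rotate_by(points, theta):
    """Rotate all points around the centroid by theta radians."""
    cx, cy = centroid(points)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [((x - cx) * cos_t - (y - cy) * sin_t + cx,
             (x - cx) * sin_t + (y - cy) * cos_t + cy)
            for x, y in points]

def rotate_to_zero(points):
    """Rotate the gesture so its indicative angle (centroid -> first point)
    becomes zero ($1, step 2)."""
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    return rotate_by(points, -theta)
```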


(3)Scale and Translate:
Because the candidate and a template may differ in size, the candidate is scaled (non-uniformly) to a reference square.
After scaling, the gesture is translated so that its centroid lies exactly at a reference point.
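
A sketch of the scaling and translation, reusing centroid() from the previous step (the reference-square side of 250 used here is an arbitrary assumption, and the origin is used as the reference point):

```python
def scale_to_square(points, size=250.0):
    """Scale the gesture non-uniformly into a size x size reference square
    ($1, step 3)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w = (max(xs) - min(xs)) or 1.0          # avoid division by zero
    h = (max(ys) - min(ys)) or 1.0
    return [(x * size / w, y * size / h) for x, y in points]

def translate_to_origin(points):
    """Translate the gesture so its centroid sits at the origin."""
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]
```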

(4) Find the Optimal Angle for the Best Score:
The key to deciding whether the candidate matches a template is the path-distance

d_i = sum_{k=1..N} sqrt( (C[k]_x - T_i[k]_x)^2 + (C[k]_y - T_i[k]_y)^2 ) / N

i.e. the average distance between corresponding points of the candidate C and the template T_i. The template with the minimum path-distance to the candidate is the result of the recognition.
After d_i is computed, it is converted into a score in the range [0, 1] using

score = 1 - d_i / (0.5 * sqrt(size^2 + size^2))

where size is the length of a side of the reference square from step 3.
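
In code, the two formulas above are only a few lines (assuming both paths have already been resampled to the same N, and the same reference-square size as in step 3):

```python
import math

def path_distance(a, b):
    """Average distance between corresponding points of two resampled paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def score(d, size=250.0):
    """Map a path-distance d to a [0, 1] score; size is the side length of
    the reference square used in step 3."""
    return 1.0 - d / (0.5 * math.sqrt(size ** 2 + size ** 2))
```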


An analysis of Rotation Invariance:
In step 2, adjusting the indicative angle to 0° does not guarantee the minimum path-distance between candidate and template; it is only an approximation. However, there are a few approaches that help find the minimizing rotation. One is “seed and search”: this brute-force approach simply rotates the candidate by +1° through all 360° and takes the best result. An improvement, called hill climbing, rotates the candidate by +/- 1° for as long as the candidate’s path-distance to the template decreases; hill climbing works well for similar gestures but is not as effective for dissimilar ones. A further improvement, Golden Section Search, is guaranteed to finish after exactly 10 iterations, regardless of whether the two gestures are similar.
Figure 4 Path-distance as a function of angular rotation away from zero indicative angle for similar and dissimilar gestures
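
A sketch of the Golden Section Search over the rotation angle, following the paper's ±45° range and 2° threshold, and reusing rotate_by() and path_distance() from the sketches above:

```python
import math

PHI = 0.5 * (-1.0 + math.sqrt(5.0))         # golden ratio constant

def distance_at_best_angle(candidate, template,
                           a=-math.radians(45), b=math.radians(45),
                           threshold=math.radians(2)):
    """Golden Section Search for the rotation of the candidate that
    minimizes its path-distance to the template ($1, step 4)."""
    x1 = PHI * a + (1 - PHI) * b
    f1 = path_distance(rotate_by(candidate, x1), template)
    x2 = (1 - PHI) * a + PHI * b
    f2 = path_distance(rotate_by(candidate, x2), template)
    while abs(b - a) > threshold:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = path_distance(rotate_by(candidate, x1), template)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(candidate, x2), template)
    return min(f1, f2)
```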


Limitation of the $1 recognizer:
(1) The $1 recognizer cannot distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations. For example, separating squares from rectangles, circles from ovals, or up-arrows from down-arrows is not possible without modifying the algorithm.
(2) Horizontal and vertical lines (1-D gestures) are distorted by the non-uniform scaling.
(3) The $1 recognizer does not use time, so gestures cannot be differentiated on the basis of speed.

Conclusion from evaluation:
(1) Gestures articulated at medium speed are recognized more accurately than those at slow or fast speed, for Rubine, $1, and DTW alike.
(2) The more templates there are, the more accurate the recognition becomes (Rubine shows the most significant improvement as the number of templates grows).

Bibliography:
Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li. Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes. UIST '07: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, pages 159–168. ACM, New York, NY, USA, 2007. http://dl.acm.org/citation.cfm?id=1294238