urjnasw xkfjjkn's hot blog

Sunday, January 27, 2013

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Summary:
    This paper introduces the $1 recognizer, a simple, cheap, and usable gesture recognizer implemented in about 100 lines of code. Despite its simplicity, it achieves excellent recognition accuracy, in some conditions even better than more complicated recognizers such as the Rubine recognizer.
The $1 gesture recognizer:
    The ultimate aim of the $1 recognizer is to determine which set of previously recorded template points Ti most closely matches the set of candidate points C drawn by the user. This is not trivial, because templates and candidates may differ in size, orientation, drawing speed, and so forth. With these concerns and the desire for simplicity in mind, the $1 recognizer is designed to meet several requirements:
(1) Be resilient to variations in sampling due to movement speed or sensing;
(2) Support optional and configurable rotation, scale, and position invariance;
(3) Require no advanced mathematical techniques (e.g., matrix inversions, derivatives, integrals);
(4) Be easily written in few lines of code;
(5) Be fast enough for interactive purposes (no lag);
(6) Allow developers and application end-users to “teach” it new gestures with only one example;
(7) Return an N-best list with sensible [0..1] scores that are independent of the number of input points;
(8) Provide recognition rates that are competitive with more complex algorithms, such as Dynamic Time Warping and Rubine.
    With these requirements in mind, the $1 recognizer is realized as a simple four-step algorithm.
A simple 4-step algorithm:
(1) Resample the Point Path
The purpose of this step is to eliminate the effect of movement speed on the number of input points in a gesture: the same path drawn quickly produces far fewer points than when drawn slowly. For example,
Figure 1 The relationship between articulation speed and number of input points
Usually, we resample gestures so that the path defined by the original M points is redefined by N equidistantly spaced points (32 <= N <= 256).
Figure 2 The effect of different number of resampling point
Figure 2 shows that when N = 32, the sampling points are too sparse to identify the path, while when N = 128 they are so dense that they add time to path comparisons.
A common way to resample is:
(1) Calculate the total length of the M-point path.
(2) Divide that length by (N-1) to obtain the increment length L between the N new points.
(3) Step through the path, adding a new point (by linear interpolation) each time the distance covered exceeds L.
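The resampling steps above can be sketched in Python. This is a minimal illustration, not the paper's own pseudocode; the (x, y) tuple point format and the function names are assumptions:

```python
import math

def path_length(points):
    """Total length of the polyline through the given (x, y) points."""
    return sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))

def resample(points, n=64):
    """Resample a gesture path into n equidistantly spaced points."""
    interval = path_length(points) / (n - 1)   # increment length L
    covered = 0.0
    pts = list(points)
    new_points = [pts[0]]
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if covered + d >= interval:
            # interpolate a new point exactly at distance `interval`
            t = (interval - covered) / d
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_points.append((qx, qy))
            pts.insert(i, (qx, qy))  # the new point starts the next segment
            covered = 0.0
        else:
            covered += d
        i += 1
    if len(new_points) < n:  # rounding may drop the final point
        new_points.append(pts[-1])
    return new_points
```

For example, resampling a straight 10-unit line with n = 5 yields points spaced 2.5 units apart.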




(2)Rotate Once Based on the “Indicative Angle”
We define the “indicative angle” as the angle formed between the centroid of the gesture and its starting point. We find this angle and rotate the gesture so that the indicative angle is at 0 degrees, as the following figure shows:
Figure 3 Rotate gesture so that indicative angle is zero
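A minimal Python sketch of this rotation step (the helper names and (x, y) tuple format are assumptions, not the paper's pseudocode):

```python
import math

def centroid(points):
    """Mean (x, y) position of the gesture's points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def rotate_by(points, angle):
    """Rotate points around their centroid by `angle` radians."""
    cx, cy = centroid(points)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in points]

def rotate_to_zero(points):
    """Rotate so the indicative angle (centroid -> first point) is 0."""
    cx, cy = centroid(points)
    angle = math.atan2(points[0][1] - cy, points[0][0] - cx)
    return rotate_by(points, -angle)
```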


(3) Scale and Translate:
Because the candidate and template may differ in size, the candidate is non-uniformly scaled to fit a reference square.
After scaling, the gesture is translated so that its centroid lies exactly at a reference point (e.g., the origin).
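These two normalizations can be sketched in Python as follows (a minimal illustration; the function names and the 250-unit square are assumptions):

```python
def scale_to_square(points, size=250.0):
    """Non-uniformly scale the gesture into a size x size reference square.
    Note: a perfectly horizontal or vertical line would give a zero
    width or height here, which is one of the recognizer's limitations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return [(x * size / width, y * size / height) for x, y in points]

def translate_to_origin(points):
    """Translate the gesture so its centroid sits at (0, 0)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]
```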

(4) Find the Optimal Angle for the Best Score:
Whether a candidate C matches a template Ti is determined by the path distance between them:

    di = (1/N) * sum_{k=1..N} sqrt( (C[k].x - Ti[k].x)^2 + (C[k].y - Ti[k].y)^2 )

The template with the minimum path distance to the candidate is the result of the recognition.
After di is computed, it is converted into a [0,1] score using:

    score = 1 - di / (0.5 * sqrt(size^2 + size^2))

The variable size is the length of a side of the reference square.
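In Python, the path distance and score can be sketched as follows (assuming equally resampled point lists as produced by the earlier steps):

```python
import math

def path_distance(a, b):
    """Average point-to-point distance between two equally sampled paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def score(d, size=250.0):
    """Convert a path distance into a [0, 1] score; 1 means a perfect match.
    The denominator is half the diagonal of the reference square."""
    half_diagonal = 0.5 * math.sqrt(size ** 2 + size ** 2)
    return 1.0 - d / half_diagonal
```

A candidate identical to a template has path distance 0 and thus a score of 1.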


An analysis of Rotation Invariance:
In Step 2, rotating the gesture so its “indicative angle” is 0 degrees does not guarantee the minimum path distance between candidate and template; it is only an approximation. However, a few approaches help us reach the true minimum. One is “seed and search”: this brute-force approach simply rotates the candidate by +1 degree through all 360 degrees and takes the best result. An improvement, called “hill climbing”, rotates the candidate by +/- 1 degree for as long as the candidate’s path distance to the template decreases. Nevertheless, hill climbing is not effective for dissimilar gestures, whose distance functions are not smooth. Another improvement, Golden Section Search (GSS), is guaranteed to finish after exactly 10 iterations (searching within a bounded angular range), regardless of whether or not the two gestures are similar.
Figure 4 Path-distance as a function of angular rotation away from zero indicative angle for similar and dissimilar gestures
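A self-contained Python sketch of Golden Section Search over rotation angle follows. The +/- 45 degree bounds and 2 degree threshold match the paper's description; the helper functions are minimal re-implementations, not the paper's pseudocode:

```python
import math

PHI = 0.5 * (-1.0 + math.sqrt(5.0))  # inverse golden ratio, ~0.618

def path_distance(a, b):
    """Average point-to-point distance between two equally sampled paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def rotate_by(points, angle):
    """Rotate points around their centroid by `angle` radians."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    c, s = math.cos(angle), math.sin(angle)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]

def best_distance(candidate, template,
                  lo=math.radians(-45.0), hi=math.radians(45.0),
                  threshold=math.radians(2.0)):
    """Golden Section Search for the rotation of `candidate` that
    minimizes its path distance to `template`."""
    x1 = PHI * lo + (1.0 - PHI) * hi
    f1 = path_distance(rotate_by(candidate, x1), template)
    x2 = (1.0 - PHI) * lo + PHI * hi
    f2 = path_distance(rotate_by(candidate, x2), template)
    while abs(hi - lo) > threshold:
        if f1 < f2:  # minimum lies in the lower sub-interval
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1.0 - PHI) * hi
            f1 = path_distance(rotate_by(candidate, x1), template)
        else:        # minimum lies in the upper sub-interval
            lo, x1, f1 = x1, x2, f2
            x2 = (1.0 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate_by(candidate, x2), template)
    return min(f1, f2)
```

For instance, a template rotated by 20 degrees and used as its own candidate should come back with a near-zero best distance, since GSS recovers the misalignment within the angular threshold.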


Limitation of the $1 recognizer:
(1) The $1 recognizer cannot distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations. For example, separating squares from rectangles, circles from ovals, or up-arrows from down-arrows is not possible without modifying the algorithm.
(2) Horizontal and vertical lines are distorted by non-uniform scaling, since a 1-D stroke has almost no extent along one dimension.
(3) The $1 recognizer does not use time, so gestures cannot be differentiated on the basis of speed.

Conclusion from evaluation:
(1) For Rubine, $1, and DTW, gestures articulated at medium speed are recognized more accurately than those articulated at slow or fast speeds.
(2) The more templates available, the more accurate the recognition (Rubine shows the most significant improvement as the number of templates increases).

Bibliography:
Wobbrock, J.O., Wilson, A.D., and Li, Y. (2007). Gestures without libraries, toolkits or training: A $1 recognizer for user interface prototypes. Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST 2007), pp. 159-168. New York: ACM.
http://dl.acm.org/citation.cfm?id=1294238






