urjnasw xkfjjkn's hot blog

Thursday, February 28, 2013

urjnasw xkfjjkn's new blog post on Protractor: A Fast and Accurate Gesture Recognizer


Protractor: A Fast and Accurate Gesture Recognizer
urjnasw xkfjjkn's summary of the Protractor paper
Protractor is faster and more accurate than peer recognizers because it employs a novel method to measure the similarity between gestures: it calculates the minimum angular distance between them with a closed-form solution. Its lower memory demand and higher speed make it especially suitable for mobile computing.

What is a template-based recognizer? What are its pros and cons?
---In a template-based recognizer, training samples are stored as templates, and at runtime, an unknown gesture is compared against these templates.
These recognizers are also purely data-driven, and they do not assume a distribution model that the target gestures have to fit. As a result, they can be easily customized for different domains or users, as long as training samples for the domain or user are provided.
Since a template-based recognizer needs to compare an unknown gesture against all of the stored templates to make a prediction, it can be both time and space consuming, especially on mobile devices with limited processing power and memory. Protractor, however, is a notable exception.
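To make the template-matching idea concrete, here is a minimal nearest-template sketch in Python (the function names and structure are mine, not from the paper): the recognizer simply returns the label of the stored template most similar to the unknown gesture.

# Minimal nearest-template classifier (illustrative sketch, not the paper's code).
def recognize(unknown, templates, similarity):
    # templates: list of (label, template) pairs.
    # similarity: a function scoring how alike two gestures are.
    best_label, best_score = None, float("-inf")
    for label, template in templates:
        score = similarity(template, unknown)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

In Protractor's case, the similarity function is the closed-form angular similarity described below, which keeps each comparison cheap.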

How does Protractor work?
(1)     Protractor first resamples a gesture into a fixed number, N, of equidistantly spaced points, using the procedure described previously for the $1 recognizer, and translates them so that the centroid of these points becomes (0, 0). This step removes variations in drawing speed and location on the screen, as sketched below.
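A rough Python sketch of this resampling and centering step, assuming a gesture is a list of (x, y) tuples (the helper names are mine; the loop follows the $1 recognizer's published resampling procedure):

import math

def dist(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

def path_length(points):
    return sum(dist(points[i - 1], points[i]) for i in range(1, len(points)))

def resample(points, n=16):
    # Walk along the stroke, emitting a point every `interval` units of arc length.
    interval = path_length(points) / (n - 1)
    accumulated = 0.0
    pts = list(points)
    resampled = [pts[0]]
    i = 1
    while i < len(pts):
        d = dist(pts[i - 1], pts[i])
        if d > 0 and accumulated + d >= interval:
            t = (interval - accumulated) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            resampled.append(q)
            pts.insert(i, q)  # continue walking from the newly created point
            accumulated = 0.0
        else:
            accumulated += d
        i += 1
    while len(resampled) < n:  # guard against floating-point shortfall
        resampled.append(pts[-1])
    return resampled

def translate_to_origin(points):
    # Move the centroid of the resampled points to (0, 0).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]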
(2)     Next, Protractor reduces noise in gesture orientation.
When Protractor is specified to be orientation invariant, it rotates a resampled gesture around its centroid by its indicative angle, which is defined as the direction from the centroid to the first point of the resampled gesture.
When Protractor is specified to be orientation sensitive, it employs a different procedure to remove orientation noise: it aligns the indicative orientation of a gesture with the one of eight base orientations that requires the least rotation. Since Protractor is data-driven, it can still become orientation-invariant even when specified to be orientation-sensitive, e.g., if a user provides gesture samples in every orientation for the same gesture category.

Based on the above process, we acquire an equal-length vector of the form (x1, y1, x2, y2, …, xN, yN) for each gesture. Note that Protractor does not rescale the resampled points to fit a square as the $1 recognizer does, because rescaling narrow gestures to a square would seriously distort them and amplify the noise in their trajectories. A sketch of this orientation alignment and vectorization follows.
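Here is a hedged Python sketch of the orientation handling and final vectorization, continuing the helpers above (I take the eight base orientations to be multiples of pi/4, which is how I read the paper; the snapping arithmetic is my paraphrase):

import math

def indicative_angle(points):
    # Direction from the centroid (already translated to (0, 0)) to the first point.
    return math.atan2(points[0][1], points[0][0])

def rotate_by(points, angle):
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in points]

def vectorize(points, orientation_sensitive):
    angle = indicative_angle(points)
    if orientation_sensitive:
        # Snap to the nearest of eight base orientations (multiples of pi/4)
        # and rotate only by the residual, preserving the gesture's rough direction.
        base = (math.pi / 4) * round(angle / (math.pi / 4))
        delta = base - angle
    else:
        # Orientation-invariant: rotate the indicative angle to zero.
        delta = -angle
    aligned = rotate_by(points, delta)
    # Flatten to (x1, y1, ..., xN, yN); note: no rescaling to a square.
    return [coordinate for point in aligned for coordinate in point]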
(3)     Classification by Calculating Optimal Angular Distances
For each pairwise comparison between a gesture template t and the unknown gesture g, Protractor uses the inverse cosine distance between their vectors, vt and vg, as the similarity score S of t to g.
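The similarity score, as I read it in the paper, is the inverse of the angular distance between the two vectors:

S(t, g) = 1 / arccos( (v_t · v_g) / (|v_t| |v_g|) )

where v_t · v_g is the dot product and |v| is the vector magnitude.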


From this, we can see Protractor is inherently scale invariant because the gesture size, reflected in the magnitude of the vector, becomes irrelevant to the distance.
Since the indicative angle is only an approximate measure of a gesture's orientation, the alignment in preprocessing cannot completely remove the noise in gesture orientation. This can lead to an imprecise measure of similarity and hence an incorrect prediction. To address this issue, at runtime Protractor rotates a template by an extra amount so that it yields the minimum angular distance to the unknown gesture, which better reflects their similarity.
Protractor employs a closed-form solution to find a rotation that leads to the minimum angular distance.

  Since we intend to rotate a preprocessed template gesture t by a hypothetical amount so that the resulting angular distance is minimized (i.e., the similarity reaches its maximum), we formalize this intuition as follows:
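As best I can reconstruct the paper's formalization, the score becomes an optimization over the rotation angle theta:

S(t, g) = max over theta of 1 / arccos( (v_t(theta) · v_g) / (|v_t| |v_g|) )

where v_t(theta) is the template vector rotated by theta. Letting a = Σ (x_ti·x_gi + y_ti·y_gi) and b = Σ (x_ti·y_gi − y_ti·x_gi), the optimal rotation is theta = arctan(b / a), and the minimum angular distance is arccos( (a·cos(theta) + b·sin(theta)) / (|v_t| |v_g|) ). A Python sketch of this closed-form comparison (my own transcription of the math, not the paper's code):

import math

def vector_norm(v):
    return math.sqrt(sum(c * c for c in v))

def optimal_angular_distance(vt, vg):
    # vt, vg: equal-length flat vectors (x1, y1, ..., xN, yN).
    a = sum(vt[i] * vg[i] + vt[i + 1] * vg[i + 1] for i in range(0, len(vt), 2))
    b = sum(vt[i] * vg[i + 1] - vt[i + 1] * vg[i] for i in range(0, len(vt), 2))
    theta = math.atan2(b, a)  # rotation of vt that maximizes the cosine
    cosine = (a * math.cos(theta) + b * math.sin(theta)) / (vector_norm(vt) * vector_norm(vg))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp for float safety

def similarity(vt, vg):
    # Inverse of the minimum angular distance; the small floor guards against
    # a zero distance when the two vectors are identical up to rotation.
    return 1.0 / max(optimal_angular_distance(vt, vg), 1e-12)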
Evaluation:
  Protractor is significantly faster than the $1 recognizer; the time it needs to recognize a gesture increases only linearly with the number of stored templates.
  As the training size increases, Protractor is significantly more accurate than the $1 recognizer on this data set.
  Protractor uses N = 16, whereas the $1 recognizer paper suggests that good results are expected with 32 <= N <= 256. As a result, Protractor uses about 1/4 of the space required by the $1 recognizer. It would be interesting to see how the closed-form solution helped in decreasing N while still providing good recognition results.

Bibliography:
Yang Li. Protractor: A Fast and Accurate Gesture Recognizer. In Proceedings of CHI 2010. ACM.
(Yang Li works at Google and has done some amazing work in the area of HCI.)