A Fast and Accurate Gesture Recognizer
urjnasw xkfjjkn's extract on the Protractor paper
Protractor is faster and more accurate than peer recognizers because it employs a novel method to measure the similarity between gestures: it calculates a minimum angular distance between them with a closed-form solution. Its lower memory demand and faster speed make it especially suitable for mobile computing.
What is a template-based recognizer? What are its pros and cons?
---In a template-based recognizer, training samples are stored as templates, and at runtime an unknown gesture is compared against these templates.
These recognizers are also purely
data-driven, and they do not assume a distribution model that the target
gestures have to fit. As a result, they can be easily customized for different
domains or users, as long as training samples for the domain or user are
provided.
Since a template-based recognizer needs to compare an unknown gesture with all of the stored templates to make a prediction, it can be both time and space consuming, especially on mobile devices with limited processing power and memory. However, Protractor is a special case.
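As a minimal Python sketch of this matching loop (the names recognize and similarity are hypothetical, not from the paper):

    # Hypothetical sketch: classify a gesture by its best-scoring template.
    def recognize(gesture_vector, templates, similarity):
        # templates: list of (name, template_vector) pairs
        best_name, best_score = None, float("-inf")
        for name, template_vector in templates:
            score = similarity(template_vector, gesture_vector)
            if score > best_score:
                best_name, best_score = name, score
        return best_name, best_score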
How does Protractor work?
(1)
Protractor first resamples a gesture into a fixed number, N, of equidistantly spaced points, using the procedure described for the $1 recognizer, and translates them so that the centroid of these points becomes (0, 0). This step removes variations in drawing speed and in location on the screen.
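A Python sketch of this step, assuming the $1-style resampling procedure (function names are mine; Protractor uses N = 16, as noted in the evaluation below):

    import math

    def resample(points, n=16):
        # points: list of (x, y) samples along the gesture trajectory
        path_len = sum(math.dist(points[i - 1], points[i])
                       for i in range(1, len(points)))
        interval = path_len / (n - 1)
        pts = list(points)
        resampled = [pts[0]]
        accumulated = 0.0
        i = 1
        while i < len(pts):
            d = math.dist(pts[i - 1], pts[i])
            if d > 0 and accumulated + d >= interval:
                # interpolate a new point at the exact interval boundary
                t = (interval - accumulated) / d
                q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                     pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
                resampled.append(q)
                pts.insert(i, q)   # q becomes the start of the next segment
                accumulated = 0.0
            else:
                accumulated += d
            i += 1
        while len(resampled) < n:  # guard against floating-point shortfall
            resampled.append(pts[-1])
        return resampled

    def translate_to_origin(points):
        # move the centroid of the resampled points to (0, 0)
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        return [(x - cx, y - cy) for x, y in points]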
(2)
Next, Protractor reduces noise in gesture orientation.

When Protractor is specified to be orientation invariant, it rotates a resampled gesture around its centroid so that its indicative angle becomes zero; the indicative angle is defined as the direction from the centroid to the first point of the resampled gesture.

When Protractor is specified to be orientation sensitive, it employs a different procedure to remove orientation noise: it aligns the indicative orientation of a gesture with the one of eight base orientations (at 45-degree intervals) that requires the least rotation. Since Protractor is data-driven, it can still become orientation invariant even if it is specified to be orientation sensitive, e.g., if a user provides gesture samples in each direction for the same category.
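A Python sketch of both options follows; the function names are mine, and the points are assumed to be already translated so that the centroid is at (0, 0):

    import math

    def indicative_angle(points):
        # direction from the centroid (assumed at (0, 0)) to the first point
        x0, y0 = points[0]
        return math.atan2(y0, x0)

    def rotate_by(points, angle):
        c, s = math.cos(angle), math.sin(angle)
        return [(x * c - y * s, x * s + y * c) for x, y in points]

    def normalize_orientation(points, orientation_sensitive=False):
        angle = indicative_angle(points)
        if orientation_sensitive:
            # snap to the nearest of eight base orientations (multiples of 45 degrees)
            base = (math.pi / 4) * round(angle / (math.pi / 4))
            return rotate_by(points, base - angle)
        # orientation invariant: rotate the indicative angle to zero
        return rotate_by(points, -angle)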
Based on the above process, we acquire an equal-length vector in the form of (x1, y1, x2, y2, …, xN, yN) for each gesture. Note that Protractor does not rescale the resampled points to fit a square as the $1 recognizer does, because rescaling narrow gestures to a square will seriously distort them and amplify the noise in their trajectories.
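In code, building the vector is just flattening the preprocessed point list; note the deliberate absence of a rescaling step (a trivial sketch, my naming):

    def vectorize(points):
        # flatten the N preprocessed (x, y) points into (x1, y1, ..., xN, yN);
        # unlike the $1 recognizer, no rescaling to a square is performed
        return [coordinate for point in points for coordinate in point]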
(3)
Classification by Calculating Optimal Angular Distances
For each pairwise comparison between a gesture template t and the unknown gesture g, Protractor uses the inverse cosine distance between their vectors, vt and vg, as the similarity score S of t to g:

S(t, g) = 1 / arccos( (vt · vg) / (|vt| |vg|) )

From this, we can see that Protractor is inherently scale invariant, because the gesture size, reflected in the magnitude of the vector, is irrelevant to this distance.
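A sketch of this score in Python, following the formula above; the clamp and the infinite score for a zero angle are my own guards for numerical safety:

    import math

    def similarity(vt, vg):
        # inverse cosine distance: the smaller the angle between the vectors,
        # the larger the score
        dot = sum(a * b for a, b in zip(vt, vg))
        mag = math.hypot(*vt) * math.hypot(*vg)
        cos = max(-1.0, min(1.0, dot / mag))  # clamp against rounding error
        angle = math.acos(cos)
        return 1.0 / angle if angle > 0 else float("inf")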
Since the indicative angle is only an approximate measure of a gesture's orientation, the alignment in preprocessing cannot completely remove the noise in gesture orientation. This can lead to an imprecise measure of similarity and hence an incorrect prediction. To address this issue, at runtime Protractor rotates a template by an extra amount so that its angular distance to the unknown gesture is minimized, which better reflects their similarity. Protractor employs a closed-form solution to find the rotation that leads to the minimum angular distance. Since we intend to rotate a preprocessed template gesture t by a hypothetical amount θ so that the resulting angular distance is the minimum (i.e., the similarity reaches its maximum), we formalize this intuition as:

S(t, g) = 1 / min over θ of arccos( (vt(θ) · vg) / (|vt| |vg|) )

where vt(θ) denotes vt rotated by θ. Setting the derivative of the dot product with respect to θ to zero yields the closed-form optimum:

θ = arctan(b / a), where a = Σ (x_ti · x_gi + y_ti · y_gi) and b = Σ (x_ti · y_gi - y_ti · x_gi), summing over i = 1 … N.
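In code, the closed form looks like this (a sketch; the helper name is mine, and atan2(b, a) selects the maximizing branch of arctan(b/a)):

    import math

    def optimal_similarity(vt, vg):
        # closed-form best rotation of template vector vt against gesture vector vg
        a = b = 0.0
        for i in range(0, len(vt), 2):
            xt, yt = vt[i], vt[i + 1]
            xg, yg = vg[i], vg[i + 1]
            a += xt * xg + yt * yg
            b += xt * yg - yt * xg
        theta = math.atan2(b, a)  # optimal extra rotation, arctan(b / a)
        mag = math.hypot(*vt) * math.hypot(*vg)
        # cosine of the minimum angular distance: (a*cos(theta) + b*sin(theta)) / mag
        cos = max(-1.0, min(1.0, (a * math.cos(theta) + b * math.sin(theta)) / mag))
        angle = math.acos(cos)
        return 1.0 / angle if angle > 0 else float("inf")

Plugging this in as the similarity function of the matching loop sketched earlier completes the recognizer.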
Evaluation:
Protractor is significantly faster than the $1 recognizer. For both recognizers, the time needed to recognize a gesture increases linearly with the number of stored templates, but Protractor's closed-form rotation makes each comparison far cheaper than the $1 recognizer's iterative search over rotations.
As the training size increases, Protractor performs significantly more accurately than the $1 recognizer on this data set.
Protractor uses N = 16, whereas for the $1 recognizer the paper says good results are expected with 32 <= N <= 256. Protractor thus uses about 1/4 of the space required by the $1 recognizer. It would be interesting to see how the closed-form solution helps in decreasing N while still providing good recognition results.
Bibliography:
Yang Li. Protractor: A Fast and Accurate Gesture Recognizer. In Proceedings of CHI 2010, ACM. Yang Li works at Google, and he has done some amazing work in the area of HCI.