A Visual Approach to Sketched Symbol Recognition
Introduction:
The primary contribution of this paper is an original way to recognize
freehand sketches by their visual appearance, in contrast with methods based
on individual strokes or geometric primitives that combine temporal and
spatial information. Another contribution is a new symbol classifier that is
invariant to rotation and local deformation.
Approach:
Unlike pure off-line shape recognition, the approach also uses the extra
information about the temporal nature of the strokes.
Procedure:
(1) Symbol Normalization:
Normalization has three steps: resampling, scaling, and translation.
Resampling places points along each stroke at a fixed distance interval. The
symbol is then scaled horizontally and vertically until it has unit standard
deviation along both axes, and translated so that its center of mass lies at
the origin (0,0).
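
The normalization steps can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `resample` and `normalize`, the spacing value, and the guard against zero standard deviation are all assumptions made for the example.

```python
import math

def resample(points, spacing):
    """Resample a polyline so consecutive points are `spacing` apart along the path."""
    out = [points[0]]
    prev = points[0]
    carried = 0.0  # path length covered since the last emitted point
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        while seg > 0 and carried + seg >= spacing:
            t = (spacing - carried) / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg = math.dist(prev, cur)
            carried = 0.0
        carried += seg
        prev = cur
    return out

def normalize(points):
    """Translate to zero mean and scale each axis to unit standard deviation."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Assumed guard: a perfectly flat symbol keeps its scale on that axis.
    sx = math.sqrt(sum((p[0] - mx) ** 2 for p in points) / n) or 1.0
    sy = math.sqrt(sum((p[1] - my) ** 2 for p in points) / n) or 1.0
    return [((p[0] - mx) / sx, (p[1] - my) / sy) for p in points]
```

Calling `normalize(resample(stroke, 2.0))` on a list of (x, y) tuples would give a symbol centered at the origin with unit spread on both axes.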
(2) Feature Representation:
Four orientation features (0, 45, 90, and 135 degrees) measure how
horizontal, vertical, or diagonal the stroke is at each point. If the stroke
angle equals the reference angle, the value is 1. If the two differ by more
than 45 degrees, the value is 0.
The fifth feature is an endpoint feature that identifies stroke endpoints:
its value is 1 if the point is at the beginning or end of a stroke, and 0
otherwise.
The authors use five 24*24 grids to represent the five features.
These grids can be thought of as feature images, in which the intensity of a
pixel is determined by the maximum feature value of the sample points that
fall within its cell. For example, for a horizontal stroke, the intensity of
the 0-degree orientation image is high.
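
A rough sketch of how the five feature images might be built from one normalized stroke is shown below. Several details are assumptions made for illustration: the linear falloff between 1 and 0 for the orientation values, estimating the stroke direction from neighboring samples, marking the first and last samples as endpoints, and the `extent` parameter that maps coordinates in [-2.5, 2.5] onto the grid. The names `orientation_features` and `feature_images` are hypothetical.

```python
import math

REFERENCE_ANGLES = (0.0, 45.0, 90.0, 135.0)

def orientation_features(p, q):
    """Four orientation values for the stroke direction from p to q."""
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0
    feats = []
    for ref in REFERENCE_ANGLES:
        diff = abs(angle - ref)
        diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
        feats.append(max(0.0, 1.0 - diff / 45.0))  # assumed linear falloff
    return feats

def feature_images(stroke, size=24, extent=2.5):
    """Render one stroke into five size x size grids; each cell keeps the max value."""
    images = [[[0.0] * size for _ in range(size)] for _ in range(5)]
    last = len(stroke) - 1
    for i, (x, y) in enumerate(stroke):
        # Direction is estimated from the neighboring samples (an assumption).
        a, b = stroke[max(i - 1, 0)], stroke[min(i + 1, last)]
        values = orientation_features(a, b)
        values.append(1.0 if i in (0, last) else 0.0)  # assumed endpoint rule
        col = min(size - 1, max(0, int((x + extent) / (2 * extent) * size)))
        row = min(size - 1, max(0, int((y + extent) / (2 * extent) * size)))
        for k, v in enumerate(values):
            images[k][row][col] = max(images[k][row][col], v)
    return images
```

For a horizontal stroke, only the first grid (0 degrees) and the endpoint grid receive nonzero values, which matches the example in the text.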
(3) Smoothing and Downsampling:
To make recognition of freehand sketches more tolerant to local shifts and
distortions, the authors apply a Gaussian smoothing function to each image,
which spreads feature values to neighboring pixels.
The images are then downsampled by a factor of 2 using a MAX filter, where
each pixel in the downsized image is the maximum of the four corresponding
pixels in the original.
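
The two operations could look roughly like this on a single feature image stored as a list of rows. The kernel width (`sigma`, `radius`) and the zero padding at the borders are assumptions chosen for the example, not values taken from the paper.

```python
import math

def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian weights (sigma and radius are assumed values)."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, kernel):
    """Separable Gaussian blur; pixels outside the image count as zero."""
    radius = len(kernel) // 2

    def blur_rows(img):
        return [[sum(kernel[k + radius] * row[c + k]
                     for k in range(-radius, radius + 1)
                     if 0 <= c + k < len(row))
                 for c in range(len(row))] for row in img]

    def transpose(img):
        return [list(col) for col in zip(*img)]

    # Blur along rows, then along columns.
    return transpose(blur_rows(transpose(blur_rows(image))))

def downsample_max(image):
    """Halve each dimension; each output pixel is the max of a 2x2 block."""
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]
```

Applied to a 24*24 feature image, `downsample_max(smooth(image, gaussian_kernel()))` yields a 12*12 image.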
(4) Recognition:
The Image Deformation Model (IDM) allows every point in the input image to
shift within a 3*3 local window to form the best match to the prototype
image. To avoid overfitting, the authors include the local context around
each point, shifting 3*3 image patches instead of single pixels.
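
One way to read this matching step is sketched below: for every pixel of the prototype, the 3*3 patch around it is compared with the input patches at each allowed shift, and the smallest squared difference is kept. The function name `idm_distance`, the use of squared differences summed over all feature images, and the zero padding outside the grid are assumptions for illustration.

```python
def idm_distance(inp, proto, shift=1, ctx=1):
    """Deformation-tolerant distance between two stacks of feature images.

    inp and proto are lists of equally sized 2-D grids (one per feature).
    shift=1 gives a 3x3 search window; ctx=1 gives 3x3 context patches.
    """
    h, w = len(proto[0]), len(proto[0][0])

    def px(stack, k, r, c):
        # Pixels outside the grid are treated as zero (an assumption).
        return stack[k][r][c] if 0 <= r < h and 0 <= c < w else 0.0

    total = 0.0
    for r in range(h):
        for c in range(w):
            best = float("inf")
            for dr in range(-shift, shift + 1):
                for dc in range(-shift, shift + 1):
                    cost = 0.0
                    for k in range(len(proto)):
                        for pr in range(-ctx, ctx + 1):
                            for pc in range(-ctx, ctx + 1):
                                d = (px(inp, k, r + dr + pr, c + dc + pc)
                                     - px(proto, k, r + pr, c + pc))
                                cost += d * d
                    best = min(best, cost)  # keep the best local shift
            total += best
    return total
```

With `shift=0` this reduces to a plain patch-wise squared distance, so comparing the two settings shows how much the local shifts absorb small misalignments.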
(5) Performance Optimization:
Coarse Candidate Pruning:
Images are indexed using their first K principal components. The distance
between these reduced feature sets is then used to find the nearest
candidates.
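
A minimal sketch of this pruning idea with NumPy follows, assuming each symbol's feature images have been flattened into one row of a matrix. The names `fit_pca` and `nearest_candidates`, and the use of Euclidean distance in the reduced space, are assumptions for the example.

```python
import numpy as np

def fit_pca(train, k):
    """Return the mean and the first k principal directions of the rows of train."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def nearest_candidates(query, train, mean, components, n_candidates):
    """Indices of the n_candidates training rows closest to query in PCA space."""
    reduced_train = (train - mean) @ components.T
    reduced_query = (query - mean) @ components.T
    dists = np.linalg.norm(reduced_train - reduced_query, axis=1)
    return np.argsort(dists)[:n_candidates]
```

Only the returned candidates would then need to go through the more expensive deformation-based match.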
Hierarchical Clustering:
Agglomerative hierarchical clustering is applied to the training examples in
each class, organizing them into groups based on complete-link distance. The
process first places each symbol in its own cluster, then progressively
merges the two nearest clusters until there is only one cluster per class. At
each step, it records the two sub-clusters that were merged to form the new
cluster.
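
The merging procedure can be sketched as below for the examples of a single class. The function name `complete_link_tree`, the nested-tuple representation of the merge history, and the brute-force search for the nearest pair are assumptions made to keep the example short.

```python
def complete_link_tree(items, dist):
    """Agglomerative clustering; returns a nested tuple recording every merge."""
    # Each cluster is (tree, members); a leaf's tree is the item's index.
    clusters = [(i, [i]) for i in range(len(items))]

    def linkage(a, b):
        # Complete link: distance between the two farthest members.
        return max(dist(items[i], items[j]) for i in a[1] for j in b[1])

    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merged = ((a[0], b[0]), a[1] + b[1])  # remember the two sub-clusters
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

For example, `complete_link_tree([0.0, 0.1, 5.0, 5.2], lambda a, b: abs(a - b))` first merges items 0 and 1, then items 2 and 3, and finally joins the two pairs.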
Rotational Invariance:
Rotated versions of the input symbol are generated and matched against each
of the training examples. The implementation uses 32 evenly spaced
orientations from 0 to 360 degrees.
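
A small sketch of the rotation step, assuming the symbol is already centered at the origin and given as a list of (x, y) points. The names `rotated_versions` and `best_rotation_distance` are hypothetical, and `distance` stands in for whichever matching function is used.

```python
import math

def rotated_versions(points, n_orientations=32):
    """Yield (angle_degrees, rotated_points) for evenly spaced rotations about the origin."""
    for k in range(n_orientations):
        theta = 2 * math.pi * k / n_orientations
        c, s = math.cos(theta), math.sin(theta)
        yield (math.degrees(theta),
               [(c * x - s * y, s * x + c * y) for x, y in points])

def best_rotation_distance(points, prototype, distance):
    """Smallest distance between any rotated copy of points and the prototype."""
    return min(distance(rotated, prototype)
               for _, rotated in rotated_versions(points))
```

With 32 orientations the step between rotations is 11.25 degrees, so the input is compared at 0, 11.25, 22.5, and so on up to 348.75 degrees.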