A Visual Approach to Sketched Symbol Recognition

Introduction:

The primary contribution of this paper is an original way to recognize freehand sketches according to their visual appearance, rather than according to individual strokes or geometric primitives that combine temporal and spatial information. Another contribution is a new symbol classifier that is invariant to rotation and local deformation.

Approach:

Unlike pure off-line shape recognition, this approach also uses the extra information about the temporal nature of the strokes.

Procedure:

(1) Symbol Normalization:

Resampling, scaling and translation. Resampling places points at a fixed distance interval along each stroke. The symbol is then scaled horizontally and vertically until it has unit standard deviation along both axes, and translated so that its center of mass lies at the point (0,0).
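The three normalization steps can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `resample` and `normalize` are hypothetical, the center of mass is approximated by the mean of the sample points, and the fallback to 1.0 for a zero standard deviation (a perfectly horizontal or vertical stroke) is an assumption of this sketch.

```python
import math

def resample(points, spacing):
    """Resample one stroke so consecutive points are `spacing` apart along the path."""
    out = [points[0]]
    prev = points[0]
    remaining = spacing  # path length left before the next sample is emitted
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        while seg > 0 and seg >= remaining:
            t = remaining / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg -= remaining
            remaining = spacing
        remaining -= seg
        prev = cur
    return out

def normalize(points):
    """Translate the mean to (0,0) and scale each axis to unit standard deviation."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Assumption: a degenerate axis (zero spread) is left unscaled.
    sx = math.sqrt(sum((p[0] - mx) ** 2 for p in points) / n) or 1.0
    sy = math.sqrt(sum((p[1] - my) ** 2 for p in points) / n) or 1.0
    return [((p[0] - mx) / sx, (p[1] - my) / sy) for p in points]
```

Translating before scaling gives the same result as scaling first, because the standard deviation does not depend on where the symbol is located.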

(2) Feature Representation:

Four orientation features (0, 45, 90 and 135 degrees) measure how horizontal, vertical or diagonal the stroke is at each point. If the stroke angle equals the reference angle, the value is 1. If they differ by more than 45 degrees, the value is 0.
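A small sketch of how the four orientation values could be computed for one stroke segment. Two things here are assumptions of the sketch rather than statements from this summary: the value falls off linearly between the two stated endpoints (1 at a difference of 0 degrees, 0 at 45 degrees or more), and a stroke direction is treated as an orientation modulo 180 degrees, so drawing left-to-right or right-to-left gives the same result. The name `orientation_features` is hypothetical.

```python
import math

REFERENCE_ANGLES = (0.0, 45.0, 90.0, 135.0)

def orientation_features(p, q):
    """Four orientation values for the stroke segment running from p to q."""
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0
    feats = []
    for ref in REFERENCE_ANGLES:
        diff = abs(angle - ref)
        diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
        # Assumed linear falloff: 1 at diff == 0, 0 at diff >= 45.
        feats.append(max(0.0, 1.0 - diff / 45.0))
    return feats
```

With this falloff, a segment drawn at 22.5 degrees contributes equally to the 0-degree and 45-degree features and nothing to the other two.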

The fifth feature is the endpoint feature, which marks stroke endpoints: its value is 1 if the point is at the beginning or end of a stroke, and 0 otherwise.

The authors use five 24*24 grids to represent the five features. These grids can be thought of as feature images, in which the intensity of a pixel is determined by the maximum feature value of the sample points that fall within its cell. For example, for a horizontal stroke, the intensity of the 0-orientation image is high.
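Building one feature image from sample points can be sketched as below. The function name `rasterize` and the `extent` parameter are hypothetical: the sketch assumes the grid covers normalized coordinates from -2.5 to 2.5 on each axis and clamps anything outside that range to the border cells. The same function would be called once per feature to obtain the five images.

```python
def rasterize(points, values, size=24, extent=2.5):
    """Build a size x size feature image; each cell keeps the maximum value seen."""
    image = [[0.0] * size for _ in range(size)]
    for (x, y), v in zip(points, values):
        # Map normalized coordinates in [-extent, extent] to a cell index.
        col = int((x + extent) / (2 * extent) * size)
        row = int((y + extent) / (2 * extent) * size)
        col = min(max(col, 0), size - 1)  # clamp points outside the grid
        row = min(max(row, 0), size - 1)
        image[row][col] = max(image[row][col], v)
    return image
```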

(3) Smoothing and Downsampling:

To make freehand sketches more tolerant to local shifts and distortions, the authors apply a Gaussian smoothing function to each image, which spreads feature values to neighboring pixels.

The images are then downsampled by a factor of 2 using a MAX filter, where each pixel in the downsized image is the maximum of the four corresponding pixels in the original.
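The smoothing and downsampling step might look like the following sketch. The kernel radius, the value of `sigma`, and the zero padding at the image border are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, kernel):
    """Separable Gaussian blur; pixels outside the image count as zero."""
    radius = len(kernel) // 2

    def blur_rows(img):
        w = len(img[0])
        return [[sum(kernel[k + radius] * row[c + k]
                     for k in range(-radius, radius + 1) if 0 <= c + k < w)
                 for c in range(w)] for row in img]

    def transpose(img):
        return [list(col) for col in zip(*img)]

    # Blur along rows, then along columns by transposing.
    return transpose(blur_rows(transpose(blur_rows(image))))

def downsample_max(image):
    """Halve each dimension, keeping the maximum of every 2x2 block."""
    h, w = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, w - 1, 2)] for r in range(0, h - 1, 2)]
```

Applied to a 24*24 feature image, `downsample_max(smooth(image, gaussian_kernel()))` yields a 12*12 image.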

(4) Recognition:

The Image Deformation Model (IDM) allows every point in the input image to shift within a 3*3 local window to form the best match to the prototype image. To avoid overfitting, the authors include the local context around each point, shifting 3*3 image patches instead of single pixels.
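One way to read this matching rule is sketched below: for every pixel of the prototype, the 3*3 patch around it is compared with the input patches centered at each of the nine positions in the 3*3 shift window, and the smallest squared difference is added to the total. The squared-difference cost, the zero padding at the borders, and the names `patch` and `idm_distance` are assumptions of this sketch, not details taken from the paper. Each argument is a list of feature images (channels) of equal size.

```python
def patch(image, r, c):
    """Flattened 3x3 patch centered on (r, c), zero padded outside the image."""
    h, w = len(image), len(image[0])
    return [image[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0.0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]

def idm_distance(input_channels, proto_channels):
    """Sum over pixels of the best patch match within a 3x3 shift window."""
    h, w = len(proto_channels[0]), len(proto_channels[0][0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            target = [v for ch in proto_channels for v in patch(ch, r, c)]
            best = float("inf")
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    cand = [v for ch in input_channels
                            for v in patch(ch, r + dr, c + dc)]
                    cost = sum((a - b) ** 2 for a, b in zip(cand, target))
                    best = min(best, cost)
            total += best
    return total
```

Under this reading, an input that differs from the prototype only by a one-pixel shift has distance 0, while larger displacements are penalized.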

(5) Performance Optimization:

Coarse Candidate Pruning:

Images are indexed using their first K principal components. The distance between these reduced feature sets is then used to find the nearest candidates.
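The pruning idea can be illustrated with a small standard-library sketch. It finds the leading principal components by power iteration with deflation, which is a technique swapped in here to keep the example self-contained and is not necessarily how the authors compute them. Each image is assumed to be flattened into a plain list of numbers, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pca(data, k, iters=100):
    """Return the mean and the first k principal components of the rows in data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    components = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary deterministic start
        for _ in range(iters):
            # Multiply by the (unnormalized) covariance matrix: X^T (X v).
            scores = [dot(row, v) for row in centered]
            w = [sum(centered[i][j] * scores[i] for i in range(n))
                 for j in range(d)]
            for c in components:  # deflate directions already found
                proj = dot(w, c)
                w = [wj - proj * cj for wj, cj in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [wj / norm for wj in w]
        components.append(v)
    return mean, components

def project(vec, mean, components):
    """Coordinates of vec in the reduced principal-component space."""
    centered = [x - m for x, m in zip(vec, mean)]
    return [dot(centered, c) for c in components]

def nearest_candidates(query, reduced, mean, components, n):
    """Indices of the n training examples closest to query in the reduced space."""
    q = project(query, mean, components)
    order = sorted(range(len(reduced)), key=lambda i: math.dist(q, reduced[i]))
    return order[:n]
```

The components and the reduced training vectors would be computed once up front, so that each query only costs one projection plus cheap low-dimensional distance computations. The surviving candidates are then passed to the more expensive matching step.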

Hierarchical Clustering:

Agglomerative hierarchical clustering is applied to the training examples in each class, organizing them into groups based on complete-link distance. The process first places each symbol in its own cluster, then progressively merges the two nearest clusters until there is only one cluster per class. At each step, it records the two sub-clusters that were merged to form the current cluster.
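A naive sketch of complete-link agglomerative clustering for the examples of a single class follows. The function name `agglomerate` is hypothetical, `dist` stands for whatever distance is used between two symbols, and the brute-force search over cluster pairs is chosen for readability rather than speed.

```python
def agglomerate(items, dist):
    """Merge clusters until one remains; return the recorded merges in order.

    Each merge is a pair of sub-clusters, given as tuples of item indices.
    """
    clusters = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete link: distance between the two farthest members.
                d = max(dist(items[a], items[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

For example, clustering the numbers 0.0, 1.0, 10.0 and 11.5 with absolute difference as the distance first merges the two small values, then the two large ones, and finally joins the two resulting pairs.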

Rotational Invariance:

Rotated versions of the input symbol are generated and matched against each of the training examples. In the implementation, 32 evenly spaced orientations from 0 to 360 degrees are used.
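Generating the rotated versions is straightforward once the symbol is centered at the origin. The sketch below assumes the points have already been normalized so that rotation happens about (0,0); the name `rotations` is hypothetical.

```python
import math

def rotations(points, count=32):
    """Return `count` copies of the symbol at evenly spaced angles over 360 degrees."""
    rotated = []
    for k in range(count):
        theta = 2 * math.pi * k / count
        c, s = math.cos(theta), math.sin(theta)
        rotated.append([(c * x - s * y, s * x + c * y) for x, y in points])
    return rotated
```

Each rotated copy would then go through the feature extraction and matching steps, and the best match over all orientations is kept.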

urjnasw xkfjjkn

urjnasw xkfjjkn's hot blog

Monday, April 22, 2013