urjnasw xkfjjkn's hot blog

Monday, April 22, 2013

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches



---By Michael Oltmans

Introduction:

This paper introduces a shape recognition approach based on the ink on the page (a visual approach to sketch recognition).

Challenges:

Since user-drawn sketches are informal, two people can have completely different drawing styles. This introduces several major challenges in the field of informal sketch recognition, such as overtracing, signal noise (variations the user does not intend), conceptual variation (the variety of ways a symbol can be drawn), segmentation (grouping the strokes that correspond to a single shape), etc.

Isolated Shape Classification:

Similar to recognition based on geometric semantic primitives, the classification approach here is also based on primitive parts. The difference is that it focuses on the visual parts of a shape instead of its geometric semantic parts. A visual part is an abstraction of the appearance of a region of the shape. It does not correspond to a semantic part of a shape (e.g., one of the lines in a rectangle); it represents the appearance of a region of the shape (e.g., the top right-hand corner of a rectangle).
Each visual part is a circular region subdivided into rings and wedges, like a dartboard. Each subdivision of the circular region is a histogram bin that counts how many pixels of ink fall in that region, and the part is represented as a vector formed from the count in each bin. The advantage of this representation is that it can describe a wide variety of shapes, including those that cannot easily be described with geometric primitives.
To form the representation, the authors first calculate the visual parts at a sampling of locations that cover the sketched symbol.
Next, they make use of a standard vocabulary of parts, which is used to describe every sketched symbol in terms of the same set of parts. This vocabulary of parts is called the codebook and generally contains 100 different parts. The representation of a sketched symbol is a vector of distances indicating the degree to which each codebook part appears in the symbol. This vector of distances is called the match vector; it is calculated by taking the minimum distance in each row of the pairwise-distance matrix (one row per codebook part, one column per part sampled from the symbol).
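A minimal sketch of how such a match vector could be computed, assuming each part is a histogram vector and using plain Euclidean distance between bin-count vectors (the paper's exact distance measure may differ):

```python
import numpy as np

def match_vector(codebook, symbol_parts):
    """Compute the match vector of a symbol against a codebook.

    codebook:     array of shape (K, D) -- K codebook parts, D histogram bins.
    symbol_parts: array of shape (M, D) -- M parts sampled from the symbol.

    Returns a length-K vector: for each codebook part, the distance to the
    closest part found in the symbol (the row-wise minimum of the K x M
    pairwise-distance matrix).
    """
    # Pairwise Euclidean distances between every codebook part and symbol part.
    diffs = codebook[:, None, :] - symbol_parts[None, :, :]   # (K, M, D)
    dists = np.linalg.norm(diffs, axis=2)                     # (K, M)
    return dists.min(axis=1)                                  # (K,)
```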

Summary of Shape Localization:

This is accomplished in three steps:
(1)    Scan the input sketch to identify a large number of candidate locations. These locations may contain a shape, a part of a shape, parts of multiple shapes, or nothing at all.
(2)    Apply the classifier to each of the candidate locations.
(3)    Sort through the candidates to produce a final set of predictions of where shapes are in the sketch.

Representation of Visual Parts:

For more accurate alignment between two shapes, the shapes are represented as a collection of “parts”, where each part represents the appearance of a portion of the shape. Classification of a shape is then based on the parts it is composed of.

Bulls-eye Feature:

Bulls-eye features are calculated at a point and represent the appearance of a circular region surrounding that point. The central point of the feature is called the interest point. The circular region surrounding the interest point is divided into wedges, like a dartboard, by a series of concentric rings and pie slices. Each wedge in the circular region is a histogram bin that counts the number of ink points contained inside it. The appearance of ink within the region is represented by a vector containing the point count for each bin.
As the feature diagram shows, the radial bins are not evenly spaced. The denser inner rings capture more detailed information about the appearance, while the sparser outer rings provide the “context” for the inner ones. This supports the goals of suppressing signal noise and representing a wide range of appearances.
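The binning could be sketched as follows; the specific ring radii and wedge count here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bullseye_histogram(points, center, ring_radii=(4, 8, 16, 32), n_wedges=8):
    """Histogram ink points into a dartboard of rings x wedges around `center`.

    points: (N, 2) ink-point coordinates; center: the (x, y) interest point.
    The uneven ring radii make the inner rings denser than the outer ones.
    Returns a flat vector of len(ring_radii) * n_wedges bin counts.
    """
    offsets = np.asarray(points, float) - np.asarray(center, float)
    radii = np.hypot(offsets[:, 0], offsets[:, 1])
    angles = np.arctan2(offsets[:, 1], offsets[:, 0]) % (2 * np.pi)

    hist = np.zeros((len(ring_radii), n_wedges), dtype=int)
    for r, a in zip(radii, angles):
        ring = np.searchsorted(ring_radii, r)      # which ring the point falls in
        if ring >= len(ring_radii):                # outside the outermost ring
            continue
        wedge = int(a / (2 * np.pi / n_wedges))
        hist[ring, min(wedge, n_wedges - 1)] += 1
    return hist.ravel()
```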


Strokes Have Direction:

The inputs to the system are still digital sketches. The focus is on the trajectory of each stroke rather than the sequence of strokes, so there are no constraints on the order in which users draw the components of a shape.

Making Features Rotational Invariant:

By calculating the bulls-eye’s histogram relative to the stroke direction, instead of relative to the x-axis, the bulls-eyes are rotationally invariant and do not change when the shape is drawn at a different orientation.
Two details must be dealt with to achieve true invariance.
(1)    Identical looking strokes can be drawn from either end.
(2)  The inherent noise in stroke trajectories results in the points along the stroke falling haphazardly onto either side of this boundary. This effect is eliminated by orienting the bullseye to the interest point's orientation and then rotating it by an additional amount equivalent to half a bin's angular width.
The stroke direction can be added as a third dimension to the histogram. It provides a count of how many points fall into each spatial bin at each direction.

How to calculate stroke direction:

To calculate the direction of the pen at a point along the stroke, we find the tangent of the stroke at that point. To deal with the error caused by the imprecision of pen movement, the tangent is calculated by fitting a line to a window of points immediately preceding and following the point. The authors find that calculating the tangent over a window of 19 points generally gives good results.
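A sketch of the windowed tangent estimate; the total-least-squares (principal-axis) fit used here is one reasonable way to "fit a line to a window of points", not necessarily the paper's exact fitting method:

```python
import math

def stroke_direction(points, i, window=19):
    """Estimate the pen direction at points[i] by fitting a line to a window
    of `window` points centered on i (19 per the paper).

    points: list of (x, y) tuples. Returns the tangent angle in radians.
    Uses a principal-axis (total least squares) fit so vertical strokes work.
    """
    half = window // 2
    lo, hi = max(0, i - half), min(len(points), i + half + 1)
    xs = [p[0] for p in points[lo:hi]]
    ys = [p[1] for p in points[lo:hi]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance terms; the major axis of the covariance matrix gives the
    # fitted line's direction.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return 0.5 * math.atan2(2 * sxy, sxx - syy)
```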

Stroke preprocessing:

Size normalization: individual-point-based normalization: scale the shape so that the maximum distance between any two points in the shape is 75 pixels.
Resampling: ensure the points on the stroke are evenly spaced. If two adjacent points are more than 1 pixel apart, additional sample points are interpolated between them.
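The two preprocessing steps could look like this, as straightforward readings of the 75-pixel and 1-pixel rules above:

```python
import math

def normalize_size(points, target=75.0):
    """Scale the stroke so the maximum pairwise distance is `target` pixels."""
    diam = max(math.dist(a, b) for a in points for b in points)
    s = target / diam
    return [(x * s, y * s) for x, y in points]

def resample(points, max_gap=1.0):
    """Interpolate extra points so adjacent samples are at most max_gap apart."""
    out = [points[0]]
    for a, b in zip(points, points[1:]):
        d = math.dist(a, b)
        n = int(math.ceil(d / max_gap))          # segments needed for this gap
        for k in range(1, n + 1):
            t = k / n
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out
```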

Calculating distance between Bulls-eyes:

The distance between two bulls-eyes is calculated by comparing their vectors of bin counts.

Representation of Shapes:

What is codebook?

---It is the standard vocabulary of parts, derived from the training set. The goal is to select a set of parts that spans the range of parts appearing in the training set. First, the collection of bulls-eye parts is calculated for each shape in the training set. These parts are then clustered into groups of similar parts, and the codebook is formed by taking a representative from each cluster center.

What is match vector?

---A kind of representation that indicates the degree to which each of the codebook parts appears in the input shape.

Recognition:

Each shape is represented by a class label plus a matched-class probability.

Steps:

Selecting candidate regions, classifying the candidate regions, and combining the classified regions to form the final predictions.
After clustering the initial candidates to form the intermediate candidates, there may still be predictions that overlap one another. To make a final set of predictions, a greedy strategy based on the total weight of each cluster is followed: first select the highest-scoring prediction, then remove any remaining predictions that significantly overlap the region covered by that prediction.
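The greedy overlap-suppression step can be sketched as follows; the bounding-box representation and the 0.5 overlap threshold are illustrative assumptions, not values from the paper:

```python
def greedy_select(predictions, max_overlap=0.5):
    """Greedy final-prediction selection (non-maximum-suppression style).

    predictions: list of (score, box) where box = (x0, y0, x1, y1).
    Repeatedly keep the highest-scoring prediction and drop any remaining
    prediction whose overlap with it exceeds `max_overlap`.
    """
    def overlap(a, b):
        # Intersection area over the smaller box's area.
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        if w <= 0 or h <= 0:
            return 0.0
        inter = w * h
        area = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
        return inter / area

    remaining = sorted(predictions, key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [p for p in remaining if overlap(best[1], p[1]) <= max_overlap]
    return kept
```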

Bibliography:

http://rationale.csail.mit.edu/publications/Oltmans2007Envisioning.pdf

A Visual Approach to Sketched Symbol Recognition



Introduction:

The primary contribution of this paper is an original way to recognize freehand sketches by their visual appearance, in contrast to approaches based on individual strokes or on geometric primitives that combine temporal and spatial information. Another contribution is a new symbol classifier that is invariant to rotation and deformation.

Approach:

Unlike pure off-line shape recognition, the extra information about the temporal nature of the strokes is used as well.

Procedure:

(1)    Symbol Normalization:
Resampling, scaling, and translation. Resampling places points at a fixed distance interval along the stroke. The symbol is then scaled horizontally and vertically until it has unit standard deviation along both axes, and translated so that its center of mass lies at the origin (0, 0).
(2)    Feature Representation:
Four orientation features (at the 0, 45, 90, and 135 degree reference angles) measure how horizontal, vertical, or diagonal the stroke is at each point. If the stroke angle equals the reference angle, the feature value is 1; if they differ by more than 45 degrees, the value is 0.
The fifth feature is an endpoint feature that marks the locations of stroke endpoints: it is 1 at the first and last points of a stroke and 0 elsewhere.
The authors use five 24×24 grids to represent the five features. These grids can be thought of as feature images, in which the intensity of a pixel is determined by the maximum feature value of the sample points that fall within its cell. For a horizontal stroke, for example, the 0-orientation image has high intensity.
(3)    Smoothing and Downsampling:
To make recognition of freehand sketches more tolerant of local shifts and distortions, the authors apply a Gaussian smoothing function to each feature image, spreading feature values to neighboring pixels.
The images are then downsampled by a factor of 2 using a MAX filter, where each pixel in the downsized image is the maximum of the four corresponding pixels in the original.
(4)    Recognition:
The image deformation model (IDM) allows every point in the input image to shift within a 3×3 local window to form the best match to the prototype image. To avoid overfitting, the authors include the local context around each point, shifting 3×3 image patches instead of single pixels.

(5)    Performance Optimization:
      Coarse Candidate Pruning:
Images are indexed by their first K principal components; the distances between these reduced feature sets are then used to find the nearest candidates.

Hierarchical Clustering:
Agglomerative hierarchical clustering is applied to the training examples in each class, organizing them into groups based on complete-link distance. The process first initializes each symbol in its own cluster, then progressively merges the two nearest clusters until there is only one cluster per class. At each step, it records the two sub-clusters that were merged to form the present one.


Rotational Invariance:
Rotated versions of the input symbol are generated and matched against each of the training examples. The implementation uses 32 evenly spaced orientations from 0 to 360 degrees.
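The normalization of step (1) and the MAX-filter downsampling of step (3) can be sketched as follows (straightforward readings of the procedure above; the Gaussian smoothing step is omitted for brevity):

```python
import numpy as np

def normalize_symbol(points):
    """Scale a symbol to unit standard deviation per axis and center its
    mass at the origin, as in the normalization step.

    points: (N, 2) array of resampled sample points.
    """
    pts = np.asarray(points, dtype=float)
    pts -= pts.mean(axis=0)               # translate center of mass to (0, 0)
    std = pts.std(axis=0)
    std[std == 0] = 1.0                   # guard against degenerate axes
    return pts / std                      # unit standard deviation on each axis

def max_downsample(image):
    """Downsample a feature image by 2 with a MAX filter: each output pixel
    is the maximum of the corresponding 2x2 block of the input.

    image: (H, W) array with even H and W (e.g., the 24x24 feature grids).
    """
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```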


Bibliography:

http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/viewFile/281/918

Thursday, April 11, 2013

ShortStraw-A Simple and Effective Corner Finder for Polylines


Summarized by urjnasw xkfjjkn
This paper introduces a corner-finding algorithm for polylines called ShortStraw. It is simple to implement and outperforms state-of-the-art corner finders for polylines.
The implementation is comprised of the following two basic steps:
(1)    Resampling:
Unlike traditional resampling, which uses a fixed time period to determine resampling points, ShortStraw uses a fixed distance to determine when to insert a point into the original stroke. The fixed distance is empirically set to (bounding-box diagonal / 40). Specifically, the resampled point set is generated by walking along the stroke and inserting a point each time the accumulated distance reaches this interspacing; the algorithm terminates once every point of the original stroke has been consumed.
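A sketch of fixed-interspacing resampling in the spirit described above (the interspacing formula follows the text; the original pseudocode's details may differ):

```python
import math

def resample_shortstraw(points):
    """ShortStraw-style resampling: insert points at a fixed interspacing,
    empirically (bounding-box diagonal / 40).

    points: list of (x, y) tuples from the raw stroke.
    """
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    spacing = diag / 40.0

    resampled = [points[0]]
    acc = 0.0                               # distance accumulated since last insert
    for prev, cur in zip(points, points[1:]):
        d = math.dist(prev, cur)
        while acc + d >= spacing and d > 0:
            t = (spacing - acc) / d         # fraction along this segment
            nx = prev[0] + t * (cur[0] - prev[0])
            ny = prev[1] + t * (cur[1] - prev[1])
            resampled.append((nx, ny))
            prev, d = (nx, ny), d - (spacing - acc)
            acc = 0.0
        acc += d
    return resampled
```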
(2)    Corner Finding:
There are two approaches for corner finding:
[1] Bottom-up:
First of all, the concept of a straw should be clarified. The straw at a point Pi is the Euclidean distance between points Pi+w and Pi-w, where w is the window size; empirically, w = 3 gives a higher corner recognition rate. Taking the median of the straw lengths over all points of the resampled stroke gives the median straw, and 0.95 × the median is used as a threshold. Where the straw length falls below the threshold, the point of local minimum straw length is taken as a corner.
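The bottom-up pass can be sketched as follows; treating the stroke's two endpoints as corners is an assumption of this sketch:

```python
import math

def bottom_up_corners(points, w=3):
    """ShortStraw bottom-up pass: a corner is a local minimum of the straw
    length below 0.95 x the median straw.

    points: resampled (x, y) points; w: window size (3 per the paper).
    Returns indices of detected corners, including the two endpoints.
    """
    straws = {}
    for i in range(w, len(points) - w):
        straws[i] = math.dist(points[i - w], points[i + w])
    median = sorted(straws.values())[len(straws) // 2]
    threshold = 0.95 * median

    corners = [0]                           # endpoints count as corners here
    i = w
    while i < len(points) - w:
        if straws[i] < threshold:
            # Walk to the local minimum of this below-threshold run.
            local_min, j = i, i
            while j < len(points) - w and straws[j] < threshold:
                if straws[j] < straws[local_min]:
                    local_min = j
                j += 1
            corners.append(local_min)
            i = j
        else:
            i += 1
    corners.append(len(points) - 1)
    return corners
```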
[2] Top-down:
Top-down is run after bottom-up. It is used to find missed corners and to reject false positives.
After running bottom-up, we already have a set of corners. For each pair of consecutive corners, we run the IS-Line test:

Basically, this is the simplest line test ever: the ratio r is the Euclidean distance between the two corners divided by the path length along the stroke between them. For a line, r equals 1; otherwise, r is less than 1. Here, r = 0.95 is set as the threshold. When r between two consecutive corners is less than the threshold, the threshold is relaxed to search for a missed corner around the middle half of the segment. This process is repeated until r is larger than the threshold.
In addition, a collinearity check is run on every three consecutive corners cl, cm, cn: if cl and cn pass the IS-Line test described above, cm is removed. It should be emphasized that all returned corners must come from the resampled points obtained in the first step.
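The IS-Line test can be written directly from its definition (r = endpoint distance over path length, assuming evenly resampled points):

```python
import math

def is_line(points, a, b, threshold=0.95):
    """ShortStraw IS-Line test between resampled point indices a and b:
    the segment is a line when (endpoint distance / path length) >= threshold.
    """
    path = sum(math.dist(points[i], points[i + 1]) for i in range(a, b))
    if path == 0:
        return True
    return math.dist(points[a], points[b]) / path >= threshold
```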
Pros and Cons for ShortStraw:
Pros: compared with other common corner-recognition techniques, such as Sezgin's and Kim's, ShortStraw is faster, easier to implement, and more accurate, and it does not need any temporal information.
Cons: it is only applicable to strokes consisting solely of polylines; it cannot handle arcs and is not sensitive to obtuse angles.

Bibliography:

A. Wolin, B. Eoff, and T. Hammond. 2008. ShortStraw: a simple and effective corner finder for polylines. In Proceedings of the Fifth Eurographics Conference on Sketch-Based Interfaces and Modeling (SBM'08), Christine Alvarado and Marie-Paule Cani (Eds.). Eurographics Association, Aire-la-Ville, Switzerland, 33-40.
https://diglib.eg.org/EG/DL/WS/SBM/SBM08/033-040.pdf.abstract.pdf%3binternal&action=action.digitallibrary.ShowPaperAbstract

Tuesday, April 9, 2013

Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes



This paper introduces Mechanix, a computer-assisted tutoring system for engineering students. It can recognize freehand sketches and provide instant feedback. Mechanix relieves the burden of grading so that instructors can assign more free-response questions as homework. Quite interesting!
First, let me clarify the concept of a “body” in this paper: any stroke or set of strokes that forms a closed shape counts as a possible “body”. To correctly recognize bodies, the following steps are needed:

Body Identification:

To check whether a collection of segments forms a “body” (closed shape), we pick a random starting point from one of the segments. From the other end of that segment, we find the closest endpoint of another segment in the collection. If the distance between these two endpoints is within a threshold of 9% of the total length of the strokes (segments) in the collection, we predict that the two strokes connect. This process repeats until the endpoint of the last segment connects back to the starting point of the first segment; if it does, we predict that the collection of segments is a “body”.
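A sketch of this chaining check, under the simplifying assumption that each segment's length is approximated by its endpoint distance:

```python
import math

def is_body(segments, threshold_ratio=0.09):
    """Check whether a set of stroke segments chains into a closed shape.

    segments: list of (start, end) point pairs, one per segment.
    Endpoints within 9% of the total stroke length are treated as connected.
    """
    total = sum(math.dist(s, e) for s, e in segments)
    threshold = threshold_ratio * total

    remaining = list(segments[1:])
    start, cur = segments[0]               # walk from the first segment's ends
    while remaining:
        # Find the unused segment whose nearer endpoint is closest to `cur`.
        best = min(remaining,
                   key=lambda seg: min(math.dist(cur, seg[0]),
                                       math.dist(cur, seg[1])))
        a, b = best
        near, far = (a, b) if math.dist(cur, a) <= math.dist(cur, b) else (b, a)
        if math.dist(cur, near) > threshold:
            return False                   # chain breaks: not a closed shape
        remaining.remove(best)
        cur = far
    return math.dist(cur, start) <= threshold   # must close back to the start
```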

Body Comparison:

One big difference between “body comparison” and “instance-based recognition” is that only one template is used to generate a binary match/no-match classification.
Three key metrics determine the similarity of a body and a template: the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient.

Hausdorff distance:
PA is the set of points in the template and PB is the set of points in the “body”. The directed distance from A to B is the maximum, over points of A, of the distance to the nearest point of B; the Hausdorff distance is the maximum of the two directed distances (A to B and B to A).

Modified Hausdorff distance:



It is the average of the averages of DA and DB.
To determine whether the “body” matches the reference (template), the authors introduce a “match probability”:

where 20 is half of the bounding box size.

Tanimoto coefficient:


T(A, B) = nAB / (nA + nB − nAB)

where nAB is the number of overlapping pixels, and nA and nB are the total numbers of pixels in A and B, respectively.
For easier computation, a laxer definition of overlap is usually used, dealing with points rather than pixels: nAB is then the number of overlapping points from A to B, where two points overlap when the distance between them is no more than 10% of the bounding box size.
T(A, B) represents the overall percentage of overlap between the two shapes.
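The point-based Tanimoto computation can be sketched as follows (the O(n²) nearest-point scan is a simplification for clarity):

```python
import math

def tanimoto(points_a, points_b, bbox_size, overlap_ratio=0.10):
    """Point-based Tanimoto coefficient T(A, B) = nAB / (nA + nB - nAB).

    Two points "overlap" when they are within 10% of the bounding box size,
    per the lax, point-based definition described above.
    """
    limit = overlap_ratio * bbox_size
    n_ab = sum(1 for a in points_a
               if any(math.dist(a, b) <= limit for b in points_b))
    return n_ab / (len(points_a) + len(points_b) - n_ab)
```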

About Trusses:

Trusses are important structures for engineers because of their wide-ranging applications: bridges, airplane wings, roof supports in buildings, etc.

Trusses identification:

We first use the PaleoSketch algorithm to preprocess each stroke and identify all of the line segments. We use all of the line segments in a diagram to build a graph with each segment as a node, and edges between segments that intersect. Once we have built the connection graph, we consider each edge AB as a potentially shared edge. We remove the AB edge from the graph and run a breadth-first search (BFS) to search for another path from A to B. If we do not find a path, then AB is not even part of a polygon, and so can’t be a shared edge. If we do find a path then AB is part of a polygon, and we remove all of the edges in that path from the graph and use BFS to search for a third path from A to B (AB being the first path). If we do not find a path, then AB is not a shared edge. If we do find a path, then the two paths we found with BFS each represent a polygon sharing the AB edge.
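The shared-edge test described above maps naturally onto a small BFS routine; the dict-of-neighbor-sets graph representation is an assumption of this sketch:

```python
from collections import deque

def is_shared_edge(graph, a, b):
    """Decide whether edge (a, b) of a segment-connection graph is shared by
    two polygons, following the BFS procedure described above.

    graph: dict mapping node -> set of neighbor nodes (undirected).
    Removes (a, b), searches for a second a-b path (a polygon), removes that
    path's edges, then searches for a third path (the second polygon).
    """
    def bfs_path(g, start, goal):
        # Standard BFS returning one start->goal path, or None.
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in g.get(path[-1], ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    g = {n: set(nbrs) for n, nbrs in graph.items()}
    g[a].discard(b); g[b].discard(a)       # remove the candidate edge itself
    second = bfs_path(g, a, b)
    if second is None:
        return False                       # AB is not even part of a polygon
    for u, v in zip(second, second[1:]):   # remove the second path's edges
        g[u].discard(v); g[v].discard(u)
    return bfs_path(g, a, b) is not None   # third path => two polygons share AB
```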

Trusses Comparison:

When comparing a student’s truss to the reference solution, we consider both the sketch and the adjacency graph. It is important that the student’s submission has the exact same graphical structure as the reference solution.

Bibliography:

http://dl.acm.org/citation.cfm?id=2283803&bnc=1





A Domain-Independent System for Sketch Recognition




This paper primarily introduces a domain-independent sketch recognition system that uses direction and curvature graphs to identify primitive shapes, uses feature-area verification as a double check, and post-processes the result to produce hierarchical output.
Let’s start with imprecise stroke approximation:
Initial stroke approximation is basically done with the help of the direction graph and the curvature graph.
These two graphs are obtained by analyzing the sampled points of the original stroke.

dn = arctan( (y(n+1) − yn) / (x(n+1) − xn) ),    cn = ( Σ |d(i+1) − di| ) / ( path length ),  for i from n−k to n+k−1

where dn is the direction between point (n+1) and point n, and cn is the curvature between points (n−k) and (n+k). The numerator of cn is the total direction change between points (n−k) and (n+k), and the denominator is the path length between these two points.
Then Direction graph and Curvature graph can be obtained:
For example,


For the star stroke, the direction changes four times, so there are four large jumps in the direction graph.
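The direction and curvature graphs can be computed as follows; the window size k = 2 is an illustrative choice, not a value from the paper:

```python
import math

def direction_and_curvature(points, k=2):
    """Compute the direction graph d_n and curvature graph c_n of a stroke.

    d_n is the tangent direction between consecutive sample points; c_n is
    the total direction change over a window of +/-k points divided by the
    path length over that window.
    """
    d = [math.atan2(points[n + 1][1] - points[n][1],
                    points[n + 1][0] - points[n][0])
         for n in range(len(points) - 1)]

    c = []
    for n in range(k, len(d) - k):
        turn = sum(abs(d[i + 1] - d[i]) for i in range(n - k, n + k))
        path = sum(math.dist(points[i], points[i + 1])
                   for i in range(n - k, n + k + 1))
        c.append(turn / path)
    return d, c
```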
Feature area is used for stroke verification. It works like this: the shaded area is the difference between the original stroke and the reference shape. When this difference is very small, it indicates that the reference shape is the one intended by the original stroke.
From now on, we will dive into specific cases on how to proceed with primitive shape detection:

For vertex detection:

When an input stroke cannot be approximated by any kind of primitive shape, the stroke is divided into two sub-strokes. The dividing point is the point with the highest curvature value in the current stroke.

For line detection:

Generally, there are two ways to detect a line. The first is to check whether the direction curve can be fitted with a horizontal line in the direction graph. The second is to check whether the actual coordinates of the stroke can be fitted by a line. Feature-area verification is then used as a double check, as shown in figure 3(a).

For curve detection:

If a line with non-zero slope appears in the direction graph, the stroke is likely to be a curve; if the slope is close to 2π/N, it is likely to be a circle. In that case, we hypothesize a candidate circle whose center is the center of the stroke’s bounding box and whose radius is the mean distance from that center to each stroke point. If the difference between the stroke and the candidate circle is within a reasonable threshold, the stroke is identified as a circle.

For arc detection:

The algorithm for arc recognition is devised as follows:
(1) Check whether the direction curve can be approximated by a line; abort if this fails.
(2) Find the perpendicular bisector, denoted M, of the segment joining the first and last stroke points; calculate the intersection point A between line M and the stroke path; then find the circle on which the first and last stroke points as well as point A reside. That circle’s center is taken as the center of the candidate arc.
(3) Take the mean distance from each stroke point to the center as the radius.
(4) Compute the ratio of the feature area against the candidate arc to that arc’s standard area; abort if the ratio is not less than a predefined threshold.
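Step (2) needs the circle through three points; its center can be computed in closed form (a standard circumcenter formula, not code from the paper):

```python
def circumcenter(p1, p2, p3):
    """Center of the circle through three points, as needed in step (2)
    to place the candidate arc's center.

    Solves the perpendicular-bisector intersection in closed form.
    """
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear; no unique circle")
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy)
```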
Handling Self-Intersecting Strokes:
Given a stroke that cannot be approximated by any of the predefined primitive shapes: if it has no self-intersection point, follow the original procedure; otherwise, make two copies of the stroke, divide them using different strategies (from the self-intersection points versus from the point of maximum curvature), then compare the respective recognition results and choose the better one.

Post-Process:

In this stage, we do simple relation retrieval, cleanup, and basic object recognition. Once everything is done, the sketch recognition result can be output.

Bibliography:

http://dl.acm.org/citation.cfm?id=604499