2013年4月22日星期一

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

---By Michael Oltmans

Introduction:

This paper introduces the shape recognition approach based on the ink on the page (visual approach sketch recognition).

Challenges:

Since user-drawn sketches are informal, two people can have completely different writing styles. This introduces several big challenges in information sketch recognition field, such as overtracing, signal noises (variations that users do not intend), conceptual variations (A variety of ways some symbols can be drawn), segmentation (grouping strokes that correspond to a single shape) etc.

Isolated Shape Classification:

Similar to geometric semantic primitive part recognition, the classification approach here is also based on primitive parts. The only difference is we will focus on visual part instead of geometric semantic part of shapes. A visual part is an abstraction of the appearance of a region of the shape. It does not correspond to the semantic part of a shape (one of the lines in rectangle). It represents the appearance of a region of a shape (the top right hand corner of a rectangle).

Each visual part is a circular region that is subdivided into rings and wedges, like a dartboard. Each of the subdivisions of the circular region is a histogram bin that measures how many pixels of ink are contained on that region. The part is represented as a vector formed from the count in each bin. The good point of this presentation is it allows us to represent a wide variety of shapes, including those cannot be easily described with geometric primitives.

To form the representation, they first calculate the visual parts at a sampling of locations that cover the sketched symbol.

Next step they make use of a standard vocabulary of parts. The standard vocabulary is used to describe each sketched symbol in terms of the same set of parts. The vocabulary of parts is called the codebook, and generally contains 100 different parts. The representation of the sketched symbol is a vector of distances that indicates the degree to which each of the codebook parts appears in the sketched symbol. This vector of distances is called the match vector. It is calculated by finding the minimum distance in each row of the matrix.

Summary of Shape Localization:

This is accomplished in three steps:

(1) Scan the input sketch to identify a large number of candidate locations. These locations may contain a shape, a part of a shape, parts of multiple shapes, or nothing at all.

(2) The second step is to apply the classifier to each of the candidate location.

(3) The third step is to sort through this candidates to produce a final set of predictions of where shapes are in the sketch

Representation of Visual Parts:

For more accurate lining up between two shapes, we represents the shapes as a collection of “parts” where each part represents the appearance of a portion of the shape. We then base our classification of the shape on the parts that it is composed of.

Bulls-eye Feature:

Bull-eyes features are calculated at a point and represent the appearance of a circular region surrounding that point. The central point of the feature is called the interest point. The circular region surrounding the interest point is divided into wedges, like a dartboard, by a series of concentric rings and pie slices. Each of the wedges in the circular region is a histogram bin that counts the number of ink points contained inside it. The appearance of ink within the region is represented by a vector containing the point count for each bin.

As can be clearly seen from the feature diagram, the radial bins are not evenly spaced. The inner rings are denser represents more detailed information about the appearance compared with the outer rings, which represents the “context” for inner rings. This is useful for our goals of suppressing signal noise and being able to represent a wide range of appearances.

Strokes Have Direction：

Inputs to system are still digital sketches. We focus on the trajectory of a stroke rather than the sequence of strokes. Therefore, there will be no constraints on the order users draw the components of each shape.

Making Features Rotational Invariant:

By calculating the bulls-eye’s histogram relative to the stroke direction, instead of relative to the x-axis, the bulls-eyes are rotationally invariant and do not change when the shape is drawn at a different orientation.

Two details must be dealt with to achieve true invariance.

(1) Identical looking strokes can be drawn from either end.

(2) The inherent noise in stroke trajectories results in the points along the stroke falling haphazardly onto either side of this boundary. This effect is eliminated by orienting the bullseye to the interest point's orientation and then rotating it by an additional amount equivalent to half a bin's angular width.

The stroke direction can be added as a third dimension to the histogram. It provides a count of how many points fall into each spatial bin at each direction.

How to calculate stroke direction:

To calculate the direction of the pen at a point along the stroke, we find the tangent of the stroke at that point. To deal with error caused by imprecision of pen movement, we deal with the direction of pen at a point in the following way: Calculating the tangent by fitting a line to a window of points immediately preceding and following the point. We find calculating the tangent over a window of 19 points generally provides good result.

Stroke preprocessing:

Size normalization: individual points based normalization: scale the shape so that the maximum distance between any two points in the shape is 75 pixels.

Resampling: Ensure the points on the stroke are constantly spaced. If there exits two points, whose distance is larger than 1 pixel, we will interpolate additional sample points between them.

Calculating distance between Bulls-eyes:

The distance between two bulls-eyes are calculated by comparing their vectors of bin counts.

Representation of Shapes:

What is codebook?

---It is the standard vocabulary of parts. It is derived from training set. We select a set of parts that span the range of parts that appear in the training set. Then we calculate the collection of bulls-eyes parts for each shape in our training set. After that, cluster them into groups of similar parts. Finally, form a codebook by using a representative from each cluster center.

What is match vector?

---A kind of representation that indicates the degree to which each of the codebook parts appears in the input shape.

Recognition:

Use the representation of Class label plus Matched Class Probability for each shape.

Steps:

Selecting candidate regions, classifying the candidate regions, and combining the classified regions to form the final predictions.

After clustering the initial candidates to form the intermediate candidates, there may still be predictions that overlap one another. To make a final set of predictions we follow a greedy strategy based on the total weight of each cluster. The final set of predictions is made by first selecting the highest scoring prediction and then removing any remaining predictions that significantly overlap the covered by that prediction.

Bibliography:

http://rationale.csail.mit.edu/publications/Oltmans2007Envisioning.pdf

A Visual Approach to Sketched Symbol Recognition

Introduction:

The primary contributions of this paper is proposing an original way to recognize freehand sketches according to their visual appearance in contrast with individual strokes or geometric primitives combining temporal and spatial information. Another contribution is the creation of a new symbol rotation-deformation invariant classifier.

Approach:

Unlike pure off-line shape recognition, the extra information about the temporal nature of the strokes are used as well.

Procedure:

(1) Symbol Normalization:

Resampling, Scale and Translation. The resampling approach is to assign points according to fixed distance interval. Scale horizontally and vertically until the shape has a unit standard deviation in both axes. Translate the mass center of shapes into point (0,0).

(2) Feature Representation:

Four orientation features (0,45,90,135 degrees) measures how horizontal, vertical or diagonal the stroke is at each point. If stroke angle == reference angle, the value is 1. If they differs more than 45 degree, the value is 0.

The fifth feature is endpoint feature testing whether the starting and ending stroke points are overlapping. If they are, the feature value is 1, else it is 0.

The authors use five 24*24 grids to present the five features. This grids can be thought as feature images, in which the intensity of a pixel is determined by the maximum feature value of the sample points that fall within its cell. For example, for horizontal stroke, the intensity of 0-orientation image is high.

(3) Smoothing and Downsampling:

To make freehand sketches more tolerant to local shifts and distortions, the authors apply Gaussian smoothing function to each image that spreads feature values to neighboring pixels.

Then downsampling the images by a factor of 2 using MAX filter, where each pixel in the downsized image is the maximum of the four corresponding pixels in the original.

(4) Recognition:

Image Deformation Model (DFM) allows every point in the input image to shift within a 3*3 local window to form the best match to the prototype image. To avoid overfitting, the authors include the local context around each point, shifting 3*3 image patches instead of single pixels.

(5) Performance Optimization:

Coarse Candidate Pruning:

Indexing images using their first K principle components. Then using the distance between these reduced feature sets to find the nearest candidates.

Hierarchical Clustering:

Applying agglomerative hierarchical clustering to the training examples in each class. Then organizing them into groups based on complete-link distance. The process first initializes each symbol into its own cluster, then progressively merges the two nearest clusters until there is only one cluster per class. At each step, it records the two sub-clusters that are merged to form the present.

Rotational Invariance:

Generating and matching rotated versions of the input symbol to each of the training examples. In the implementation, 32 evenly spaced orientations from 0 to 360 degrees are used.

Bibliography:

http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/viewFile/281/918

2013年4月11日星期四

ShortStraw-A Simple and Effective Corner Finder for Polylines

Summarized by urjnasw xkfjjkn

This paper introduces a polylines corner finding algorithm called ShortStraw. It is simple to implement and outperforms the state-of-the-art corner finding algorithm for polylines.

The implementation is comprised of the following two basic steps:

(1) Resampling:

Unlike traditional resampling using fixed time period to determine a resampling point, ShortStraw uses fixed distance to determine when to insert a point to original graph. The fixed distance is (size of bounding box / 40 ) empirically. Specifically speaking, resampling point set is generated according to the following algorithm:

The termination condition for this algorithm is when i > |number of points in original graph|.

(2) Corner Finding:

There are two approaches for corner finding:

[1] Bottom-up:

First of all, the concept of straw should be clarified. Straw means the Euclidean distance between point P_i+w and P_i-w, where w is window size. Empirically it should be 3 for higher corner recognition rate. After summing up the straw of each point in the original stroke and take the average, we will get a straw median. 0.95 * straw median is used as a threshold. When the straw length of a point is less than the threshold, we take the local minimum as a corner.

[2] Top-down:

Top-down is run after bottom-up. It is used to find missing corners and rejecting false positives.

After running Bottom-up, we have already got a bunch of corners. For each consecutive corners, we run IS-Line test:

Basically, this is the simplest line test ever. If it is a line, r equals to 1. Otherwise, r should be less than 1. Here, we set r equals 0.95 as threshold. When r between two consecutive corners is less than the threshold, we need to relax the threshold to find missing corner around the middle half of the segment. This process is repeated until r is larger than the threshold.

On the other hand, we need to do collinear check on three consecutive corners c_l, c_m, c_n. If c_l and c_n pass the IS-Line test mentioned above, just remove c_m. What should be emphasized is all returned corners should be from resampling points obtained in the first step.

Pros and Cons for ShortStraw:

Pros: Compared with other common corner recognition techniques, such as Sezgin, Kim, ShortStraw is quicker, easier to implement and more accurate. It does not need any temporal information.

Cons: Only applicable to stroke consisting only polylines, cannot handle arcs, not sensitive to obtuse angles.

Bibliography:

A. Wolin, B. Eoff, and T. Hammond. 2008. ShortStraw: a simple and effective corner finder for polylines. In Proceedings of the Fifth Eurographics conference on Sketch-Based Interfaces and Modeling (SBM'08), Christine Alvarado and Marie-Paule Cani (Eds.). Eurographics Association, Aire-la-Ville, Switzerland, Switzerland, 33-40.
https://diglib.eg.org/EG/DL/WS/SBM/SBM08/033-040.pdf.abstract.pdf%3binternal&action=action.digitallibrary.ShowPaperAbstract

2013年4月9日星期二

Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

This paper introduces a Mechanix---a computer-assisted tutoring system for engineering students. It can recognize freehand sketches and provide instant feedbacks. Mechanix will release the burden of grading so that instructors can assign more free response questions in homework. Quite interesting!

Firstly, let me clarify the concept of “body” in this paper. Any stroke or set of strokes that forms a closed shape as a possible “body”. To correctly recognize the “bodies”, we need the following steps:

Body Identification:

To check whether a collection of segments form a “body” (closed shape), we just pick a random starting point from one of the segments. For the other end of the segment, we try to find out the closest end point of another segment in the collection. If the distance between this two endpoints are within the threshold of 9% of the total length of the strokes (segments in the collection), then we predict the two strokes connect with each other. This simple process repeats again and again until the end point in the last segment connects back to the starting point of the first segment. If it is, we predict the collection of segments as a “body”.

Body Comparison:

One big difference between “body comparison” and “instance-based recognition” is only one template will be used to generate a binary “match/no match” classification.

Three key metrics to determine similarity of a body and a template are Hausdorff distance, modified Hausdorff distance and Tanimoto coefficient.

Hausdorff distance:

P_A is the vector of points in template. P_B is the vector of points in “body”. Hausdorff distance is to calculate the maximum of the maximum D from A to B and B to A.

Modified Hausdorff distance:

It is the average of the averages of D_A and D_B.

To determine whether the “body” matches with the reference (template), the authors introduces a “match probability”:

where 20 is the half of the bounding box size.

Tanimoto coefficient:

where n_AB is the number of overlapping pixels. n_A,n_B are the total number of pixels in A and B respectively.

For easier computation, we usually use a lax definition of overlapping. That means we deal with points rather than pixels any more.

where n_AB is the number of overlapping points from A to B (Here overlapping means the distance between the two points is no more than 10% of the bounding box size)

T(A,B) represents the overall percentage of overlap between two shapes.

About Trusses:

Trusses are important structure for engineers because of their comprehensive applications, such as bridges, airplane wings, building supporting roofs, etc.

Trusses identification:

We first use the PaleoSketch algorithm to preprocess each stroke and identify all of the line segments. We use all of the line segments in a diagram to build a graph with each segment as a node, and edges between segments that intersect. Once we have built the connection graph, we consider each edge AB as a potentially shared edge. We remove the AB edge from the graph and run a breadth-first search (BFS) to search for another path from A to B. If we do not find a path, then AB is not even part of a polygon, and so can’t be a shared edge. If we do find a path then AB is part of a polygon, and we remove all of the edges in that path from the graph and use BFS to search for a third path from A to B (AB being the first path). If we do not find a path, then AB is not a shared edge. If we do find a path, then the two paths we found with BFS each represent a polygon sharing the AB edge.

Trusses Comparison:

When comparing a student’s truss to the reference solution, we consider both the sketch and the adjacency graph. It is important that the student’s submission has the exact same graphical structure as the reference solution.

Bibliography:

http://dl.acm.org/citation.cfm?id=2283803&bnc=1

A Domain_Independent System for Sketch Recognition

This paper primarily introduces a domain-independent sketch recognition system, which uses direction graph and curvature graph to identify primitive shapes, uses feature-area verification for double check and post-process to complete hierarchical output.

Let’s start from imprecise stroke approximation:

Initial stroke approximation is basically done with the help of direction graph and curvature graph.

These two graphs are obtained by analyzing the sampling points on the original stroke.

where d_n is the direction between point (n+1) and point n. c_n is the curvature between point (n-k) and (n+k). The nominator is the total direction change between point (n-k) and point (n+k), and the denominator is the path length between these two points.

Then Direction graph and Curvature graph can be obtained:

For example,

For the Star Stroke, there are four times change of direction. So basically, there are four big changes in direction graph.

Feature area is used for stroke verification. It works like this:

The shaded area is the difference between original stroke and reference stroke. When the difference is very small, it basically indicates that the reference shape is exactly the one indicated by the original stroke.

From now on, we will dive into specific cases on how to proceed with primitive shape detection:

For vertex detection:

When an input stroke cannot be approximated with any kind of primitive shape, we will divide the stroke into two sub-strokes. The dividing point will be the point with highest curvature value in current stroke.

For line detection:

Generally, there are two ways for line detection. The first one is to check whether the direction curve can be fitted with a horizontal line in direction graph. The second one is to check whether the actual coordinates of the stroke can be fitted by a line. Then we can use feature-area verification for double check, just as shown in figure 3(a)

For curve detection:

If a line with non-zero slope shows in direction graph, the stroke is likely to be a curve. If the slope is close to 2*pi/N, then it is likely to be a circle. If succeeds, we hypothesize a candidate circle which takes the center of the stroke’s bounding box as its own center and the mean distance between the center and each stroke point as its radius. If the difference between the stroke and candidate circle is within the reasonable threshold, we can identify the stroke as a circle.

For arc detection:

The algorithm for arc recognition is devised as follows:
(1) Check whether the direction curve can be approximated with a line, and abort if fails.
(2) Find the perpendicular bisector, denoted as M, of the first and last stroke points, calculate the intersection point A between line M and the stroke path, and then find the circle on which the first and last stroke points as well as point A reside. The circle’s center is taken as the center of
the candidate arc.
(3) Take the mean distance from each stroke point to the center as the radius.
(4) Compute the ratio between the corresponding feature area against the candidate arc and that arc’s standard area. Check whether the ratio is less than a predefined threshold, and abort if fails.

Handling self-intersection Strokes
Given a stroke which cannot be approximated by any of our predefined primitive shapes, if it has no self-intersection point, then follow the original procedure; otherwise, make two copies of that stroke, divide them using different strategies, i.e. from self-intersection points versus from the point with maximum curvature, then compare respective recognition result and choose the better one.

Post-Process:

In this stage, we need to do simple relation retrieval, cleanup and basic object recognition. Then everything is done, we can output our sketch recognition result.

Bibliography:

http://dl.acm.org/citation.cfm?id=604499

2013年3月27日星期三

A few things to learn about machine learning

(1) Learning = Representation + Evaluation + Optimization
Representation: A classifier must be represented in some formal language that the computer
can handle. Hypothesis space: A set of classifiers that a learner can learn.
Evaluation: An objective function used to distinguish good classifiers from bad ones.
Optimization: Search among the classifiers in the language for the highest-scoring one.
(2) It’s generalization that counts
The fundamental goal of machine learning is to generalize beyond the examples in the training
set. It is a huge mistake to train and test on the same data. But holding data will reduce the amount of
available for training. One solution to this contradiction is cross-validation: randomly dividing your
training data into (say) 10 subsets, holding out each one while training on the rest, testing each learned
classifier on the examples it did not see, and averaging the results to see how well the particular
parameter setting does .
(3) Data alone is not enough
Every learner must embody some knowledge or assumptions beyond the data it is given in order
to generalize beyond it. This is called induction. It turns a small amount of input knowledge into
a large amount of output knowledge.
A corollary of this is that one of the key criteria for choosing a representation is which kinds of
knowledge are easily expressed in it.
(4) Overfitting has many faces
Overfitting problem occurs when we try to determine classifier with insufficient knowledge and
data.
One way to understand overfitting is by decomposing generalization error into bias and variance.
Bias is a learner’s tendency to consistently learn the same wrong thing. Variance is the tendency
to learn random things irrespective of the real signal.
(5) Intuition Fails in high Dimensions
Another big problem after Overfitting is curse of dimensionality. Basically, it means generalizing
gets harder as your dimension s get larger. If you have 100 dimensions and 1 trillion training
samples, you only cover 10 to the negative eighteenth power of the possible combination of the
input space. That’s tiny! Sometimes more dimensions mean more noise.
(6) Theoretical Guarantees are not What they seem
In machine learning field, the most common theoretical guarantee is a bound on the number of
examples needed to ensure good generalization. One of the major developments of recent
decades has been the realization that in fact we can have guarantees on the results of induction,
particularly if we are willing to settle for probabilistic guarantees.
(7) Feature engineering is the Key
The most important factor is the features used. Learning is easy if you have many independent
features that each correlate well with the class.
(8) A dumb algorithm with lots and lots of data beats a clever one with modest amounts of it.
The sufficiency of examples is necessary.
(9) Learn many models, not Just one
The best learner varies from application to application. If instead of selecting the best variation
found, we combine many variations, the results are better—often much better—and at little
extra effort for the user. Bagging is an example of this, in which “we simply generate random
variations on the training set by resampling, learn a classifier on each, and combine the results
by voting. This works because it greatly reduces variance while only slightly increasing bias.”
(10) Simplicity Does not imply Accuracy
Occam’s razor states that in machine learning, given two classifier with the same training error,
the simpler one has the lowest test error. However, “no free lunch” theorems imply it can-not
be true. Simplicity is a preference that is not related to accuracy.
(11) Just because a function can be represented does not mean it can be learned
Don’t be lured in by claims that every problem can be representable by a certain algorithm. It
may be representable but not learnable given your training set.
(12) Correlation Does Not Imply Causation
Correlation is potentially causation but should not be stated as such without proof

Bibliography:
Pedro Domingos, A few useful things to know about machine learning Communications of the ACM
CACM Homepage archive Volume 55 Issue 10, October 2012 Pages 78-87

2013年3月24日星期日

The Word-Gesture Keyboard: Reimaging Keyboard Interaction

Summary:

This paper primarily talks about Word-Gesture Keyboard, a creative HCI method through keyboard. It is on average considered more preferred, more fun, and less physically but more visually demanding. It is designed with following three aims in mind:

(1) Fast input speed

(2) Minimal recognition load on new users

(3) Improved efficiency with users getting more familiar with Word-Gesture Keyboard

Word-Gesture Keyboard will allow user to write each and every word via a word gesture. Here, the “word” is not limited in lexicon. It can be tokens defined by arbitrary strings of characters, such as “Gmail”.

Word-Gesture Keyboard Feasibilities:

One big problem of Word-Gesture Keyboard is most word gestures will run across letters that are not part of the word intended. Fortunately, it can be solved with statistical regularities of natural language, which indicates some character sequences are more likely than others and most simply don’t exist as legitimate words. It implies that all valid letter combinations will form a finite set that can be captured in a language model. With this theory breakthrough, all possible words can be represented geometrically on a given key-board layout as word gestures and matched against users’ input gestures. Just as mentioned above, “word” here has a more generalized meaning. It can be rare names, jargons, email addresses, passwords, etc. We name this kinds of words Out of Vocabularies (OOV). In our Word-Gesture Keyboard, OOV letter sequences can always be entered by typing the individual letter keys. If these OOV sequences are frequently used then they may be added to the system’s list of recognized words, either manually or automatically. An addition to the solution is N-best suggestions. When “tip top” conflict occurs, they can be addressed by manual selection from the alternative N-best suggestions, or automatically according to word context.

Word-Gesture Keyboard Efficiency:

One Continuous Movement: In comparison to tapping-based touchscreen keyboards, gesture keyboards do not require up and down movements for each letter. Therefore, it is undoubtedly faster. The speed advantage of a single-stroke word gesture input can also be understood in motor control modeling terms. Tapping individual letters in a word can be viewed as a sequence of discrete target pointing tasks, each can be modeled by Fitts’ law (showed below).

tk,k + 1 is the time duration from tapping the kth letter (key) to the (k+ 1)th letter in the word; Dk,k + 1 is the movement distance from the kth letter to the (k+ 1) letter; and Sis the size of the target key. a and b are two constants of Fitts’ law. ID is called Fitts’ index of difficulty, measured in bits.

Experts’ conclusion is goal-crossing task is faster than tapping on the same sized targets as long as ID is less than 4 bits. Here, “goal” means a letter key needed in a word.

Auto word ending and spacing: each time a user lifts the finger from the touch surface, a word and a space are entered. Not having to enter a space character after each word is another efficiency advantage of a gesture keyboard.

Error-tolerance: Error tolerance allows the user to cut corners, to be inaccurate but fast.

One finger operation: This is the only aspect that Word-Gesture Keyword is not as good as two-handed typing. This is particularly true when the keyboard layout is the conventional QWERTY on which consecutive letters of a word tend to alternate between the left and right side of the keyboard. With two handed-typing, when one hand strikes one letter the other hand can, to some degree, move towards the next letter in parallel.

Word-Gesture Keyboard Ease of Use:

First, typing on a keyboard is a familiar text input method to most, if not all computer and smartphone users.

Second, drawing or doodling is a fun and easy action that even children enjoy doing.

Third, the user does not have to have learned any gestures before using a word-gesture keyboard.

Word-Gesture Keyboard Ease VS Efficiency:

The two types of behavior are two ends of a continuum. Our main behavioral theory of word shorthand gesture key-boards is that their use automatically shifts from the ease end (visual tracing) to the efficient end (recall gesturing).

Importantly, we do not expect the users to gesture every word without looking at the keyboard. Due to the Zipf’s law effect, a small number of words are used disproportionally frequently and their stroke patterns are memorized early. Longer and less common words are typically made of common fragments whose shapes can be quickly remembered. An important word-gesture key-board property is that it does not force the user into either “mode”. The user gradually progresses from the easy end to the more efficient end in use. In this sense, a word-gesture keyboard is a “progressive user interface.”

Word-Gesture Keyboard Gesture Recognitions:

where P(G|W) is the likelihood of W’s word gesture matching a user’s input gesture G, and P(W) reflects the system’s estimate of prior probability that the word W is the user’s intended word. The denominator P(G) only depends on the user’s gesture and is invariant during the search.

The search for the user’s intended word is thus the product of two model estimates. The probability P(G|W) reflects the gestural model and the probability P(W) reflects the language model.

In order to estimate P(G|W), we have used various techniques, such as dynamic time warping and template matching, to compute gesture keyboarding shape similarities.

Word-Gesture Keyboard Two Novel Functions:

(1) Command Strokes:

With our systems, the user may issue commands (such as “Copy” and “Paste”) by tracing out the command names on the keyboard starting from a designated key (e.g. a Cmd key). The system suggests the command effect as soon as the command stroke is unambiguous.

(2) Case Key:

We introduced a new key on the keyboard, the Case key (see the lower left corner of Figure 1). This key cycles through the different word case alternatives for the word just entered or preceding the text caret. The Case key uses dictionary information to intelligently support nonstandard casing convention for some words, such as “iPhone”. Since the Case key modifies the word preceding the current text

caret position (“reverse Polish”) it enables users to perform case corrections after the word is entered and only when they are actually needed.

Bibliography:

Zhai, Shumin, and Per Ola Kristensson. "The Word-gesture Keyboard: Reimagining Keyboard Interaction." Communications of the ACM 55.9 (2012): 91-101. ACM Digital Library. Web. 23 Mar. 2013. <http://dl.acm.org/citation.cfm?id=2330689>.

The blog content is created by urjnasw xkfjjkn (Xu Yan) on 23rd, March, 2013.

订阅：博文 (Atom)

urjnasw xkfjjkn's hot blog

2013年4月22日星期一

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Introduction:

Challenges:

Isolated Shape Classification:

Summary of Shape Localization:

Representation of Visual Parts:

Bulls-eye Feature:

Strokes Have Direction：

Making Features Rotational Invariant:

How to calculate stroke direction:

Stroke preprocessing:

Calculating distance between Bulls-eyes:

Representation of Shapes:

What is codebook?

What is match vector?

Recognition:

Steps:

Bibliography:

A Visual Approach to Sketched Symbol Recognition

A Visual Approach to Sketched Symbol Recognition

Introduction:

Approach:

Procedure:

Bibliography:

http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/viewFile/281/918

2013年4月11日星期四

ShortStraw-A Simple and Effective Corner Finder for Polylines

2013年4月9日星期二

Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

Body Identification:

Body Comparison:

Hausdorff distance: PA is the vector of points in template. PB is the vector of points in “body”. Hausdorff distance is to calculate the maximum of the maximum D from A to B and B to A.

Modified Hausdorff distance:

Tanimoto coefficient:

About Trusses:

Trusses identification:

Trusses Comparison:

Bibliography:

A Domain_Independent System for Sketch Recognition

A Domain_Independent System for Sketch Recognition

For vertex detection:

For line detection:

For curve detection:

For arc detection:

Post-Process:

Bibliography:

2013年3月27日星期三

A few things to learn about machine learning

A few things to learn about machine learning

2013年3月24日星期日

The Word-Gesture Keyboard: Reimaging Keyboard Interaction

Hausdorff distance:

P_A is the vector of points in template. P_B is the vector of points in “body”. Hausdorff distance is to calculate the maximum of the maximum D from A to B and B to A.