urjnasw xkfjjkn's hot blog

Wednesday, March 27, 2013

A few things to learn about machine learning



(1)  Learning = Representation + Evaluation + Optimization
Representation: A classifier must be expressed in some formal language that the computer
can handle. The hypothesis space is the set of classifiers that a learner is able to learn.
Evaluation: An objective function used to distinguish good classifiers from bad ones.
Optimization: Search among the classifiers in the language for the highest-scoring one.
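As a toy illustration of this decomposition (hypothetical names and data, not from the paper), take a one-dimensional threshold classifier: the representation is the set of threshold rules, the evaluation function is training accuracy, and the optimization is an exhaustive search over candidate thresholds.

```python
def predict(threshold, x):
    """Representation: the hypothesis space is all classifiers of the form x >= t."""
    return 1 if x >= threshold else 0

def accuracy(threshold, data):
    """Evaluation: the fraction of labeled examples the classifier gets right."""
    return sum(predict(threshold, x) == y for x, y in data) / len(data)

def fit(data):
    """Optimization: search the hypothesis space for the highest-scoring classifier."""
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(t, data))

data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # toy training set
t = fit(data)
print("learned threshold:", t, "training accuracy:", accuracy(t, data))
```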
(2)  It’s generalization that counts
The fundamental goal of machine learning is to generalize beyond the examples in the training
set. It is a huge mistake to train and test on the same data, but holding out data reduces the amount
available for training. One solution to this contradiction is cross-validation: randomly divide your
training data into (say) 10 subsets, hold out each one while training on the rest, test each learned
classifier on the examples it did not see, and average the results to see how well a particular
parameter setting does.
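Here is a minimal sketch of that procedure, assuming a caller-supplied train_and_score(training, held_out) function (a hypothetical name) that fits a classifier on the training part and returns its accuracy on the held-out part:

```python
import random

def cross_validate(data, train_and_score, k=10, seed=0):
    """k-fold cross-validation: shuffle, split into k subsets, hold out each
    subset in turn, train on the rest, score on the held-out part, and
    average the k scores."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(train_and_score(training, held_out))
    return sum(scores) / k
```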
(3)  Data alone is not enough
Every learner must embody some knowledge or assumptions beyond the data it is given in order
to generalize beyond it. This is called induction. It turns a small amount of input knowledge into
a large amount of output knowledge.
A corollary of this is that one of the key criteria for choosing a representation is which kinds of
knowledge are easily expressed in it.
(4)  Overfitting has many faces
The overfitting problem occurs when we try to learn a classifier without sufficient knowledge and
data.
One way to understand overfitting is by decomposing generalization error into bias and variance.
Bias is a learner’s tendency to consistently learn the same wrong thing. Variance is the tendency
to learn random things irrespective of the real signal.
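A rough way to make this decomposition concrete is to train the same learner on many independently drawn training sets and look at its predictions for a single test point; the sketch below assumes hypothetical learn and sample_training_set callables.

```python
def estimate_bias_variance(learn, sample_training_set, x, true_y, trials=100):
    """Empirical sketch: bias ~ how far the average prediction at x is from
    the truth; variance ~ how much the predictions scatter around their own
    average across training sets."""
    preds = [learn(sample_training_set())(x) for _ in range(trials)]
    mean_pred = sum(preds) / len(preds)
    bias = mean_pred - true_y
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return bias, variance
```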
(5)  Intuition Fails in high Dimensions
Another big problem, after overfitting, is the curse of dimensionality. Basically, it means that generalizing
gets harder as the number of dimensions grows. If you have 100 dimensions and 1 trillion training
samples, you cover only about a 10^-18 fraction of the possible input configurations. That's tiny!
Sometimes more dimensions simply mean more noise.
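As a quick check of that figure (assuming binary features, so there are 2^100 possible inputs): 10^12 / 2^100 ≈ 10^12 / (1.27 × 10^30) ≈ 8 × 10^-19, i.e., on the order of 10^-18 of the input space.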
(6)  Theoretical Guarantees are not What they seem
In the machine learning field, the most common theoretical guarantee is a bound on the number of
examples needed to ensure good generalization. One of the major developments of recent
decades has been the realization that in fact we can have guarantees on the results of induction,
particularly if we are willing to settle for probabilistic guarantees.
(7)  Feature engineering is the Key
The most important factor is the features used. Learning is easy if you have many independent
features that each correlate well with the class.
(8)  A dumb algorithm with lots and lots of data beats a clever one with modest amounts of it.
In practice, having enough training examples often matters more than the sophistication of the algorithm.
(9)  Learn many models, not Just one
The best learner varies from application to application. If instead of selecting the best variation
found, we combine many variations, the results are better—often much better—and at little
extra effort for the user. Bagging is an example of this, in which “we simply generate random
variations on the training set by resampling, learn a classifier on each, and combine the results
by voting. This works because it greatly reduces variance while only slightly increasing bias.”
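A minimal sketch of bagging along these lines, assuming a learn(training_set) function (hypothetical) that returns a trained classifier callable as classifier(x):

```python
import random
from collections import Counter

def bagging_fit(data, learn, n_models=25, seed=0):
    """Draw bootstrap resamples of the training set (same size, sampled with
    replacement) and learn one classifier on each."""
    rng = random.Random(seed)
    return [learn([rng.choice(data) for _ in range(len(data))])
            for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine the classifiers by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```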
(10)  Simplicity Does not imply Accuracy
Occam's razor, applied to machine learning, would say that given two classifiers with the same training error,
the simpler one will have the lower test error. However, the "no free lunch" theorems imply this cannot
be true in general. Simplicity is a preference in its own right, not a guarantee of accuracy.
(11)  Just because a function can be represented does not mean it can be learned
Don't be lured in by claims that every problem can be represented by a certain algorithm. A function
may be representable but not learnable given your training data.
(12)  Correlation Does Not Imply Causation
Correlation may hint at causation, but it should not be treated as causation without further evidence.

Bibliography:
Pedro Domingos. "A Few Useful Things to Know About Machine Learning." Communications of the ACM 55.10 (October 2012): 78-87.

Sunday, March 24, 2013

The Word-Gesture Keyboard: Reimagining Keyboard Interaction


Summary:
This paper primarily discusses the Word-Gesture Keyboard, a creative HCI method built on the keyboard. On average, users rate it as more preferred, more fun, and less physically (though more visually) demanding. It is designed with the following three aims in mind:
(1)        Fast input speed
(2)        Minimal recognition load on new users
(3)        Improved efficiency as users become more familiar with the Word-Gesture Keyboard
The Word-Gesture Keyboard allows the user to write each and every word via a word gesture. Here, a "word" is not limited to the lexicon; it can be any token defined by an arbitrary string of characters, such as "Gmail".
Word-Gesture Keyboard Feasibility:
One big problem for the Word-Gesture Keyboard is that most word gestures run across letters that are not part of the intended word. Fortunately, this can be solved using the statistical regularities of natural language: some character sequences are much more likely than others, and most simply do not exist as legitimate words. This implies that all valid letter combinations form a finite set that can be captured in a language model. With this insight, all possible words can be represented geometrically on a given keyboard layout as word gestures and matched against users' input gestures. As mentioned above, "word" here has a more generalized meaning; it can be a rare name, a piece of jargon, an email address, a password, etc. We call these out-of-vocabulary (OOV) words. In the Word-Gesture Keyboard, OOV letter sequences can always be entered by typing the individual letter keys. If these OOV sequences are frequently used, they may be added to the system's list of recognized words, either manually or automatically. A further part of the solution is N-best suggestions: when a "tip/top"-style conflict occurs, it can be resolved by manual selection from the alternative N-best suggestions, or automatically according to word context.
Word-Gesture Keyboard Efficiency:
One Continuous Movement: In comparison to tapping-based touchscreen keyboards, gesture keyboards do not require up-and-down movements for each letter, so they are faster. The speed advantage of single-stroke word gesture input can also be understood in motor control modeling terms: tapping the individual letters of a word can be viewed as a sequence of discrete target-pointing tasks, each of which can be modeled by Fitts' law (shown below).
t(k,k+1) = a + b · log2( D(k,k+1) / S + 1 )
Here t(k,k+1) is the time from tapping the kth letter (key) to the (k+1)th letter in the word; D(k,k+1) is the movement distance from the kth letter to the (k+1)th letter; S is the size of the target key; and a and b are the two constants of Fitts' law. The logarithmic term, ID = log2(D/S + 1), is called Fitts' index of difficulty and is measured in bits.
The conclusion from goal-crossing research is that a crossing task is faster than tapping on targets of the same size as long as ID is less than 4 bits. Here, a "goal" is a letter key needed in the word.
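Rearranging the index of difficulty makes the threshold concrete: ID = log2(D/S + 1) < 4 bits is equivalent to D/S < 2^4 − 1 = 15, so crossing beats tapping as long as the distance to the next letter is less than about 15 key widths.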
Auto word ending and spacing: each time a user lifts the finger from the touch surface, a word and a space are entered. Not having to enter a space character after each word is another efficiency advantage of a gesture keyboard.
Error-tolerance: Error tolerance allows the user to cut corners, to be inaccurate but fast.
One finger operation: This is the only aspect in which the Word-Gesture Keyboard is not as good as two-handed typing. This is particularly true when the keyboard layout is the conventional QWERTY, on which consecutive letters of a word tend to alternate between the left and right sides of the keyboard. With two-handed typing, when one hand strikes one letter the other hand can, to some degree, move towards the next letter in parallel.
Word-Gesture Keyboard Ease of Use:
First, typing on a keyboard is a familiar text input method to most, if not all computer and smartphone users.
Second, drawing or doodling is a fun and easy action that even children enjoy doing.
Third, the user does not have to have learned any gestures before using a word-gesture keyboard.
Word-Gesture Keyboard Ease vs. Efficiency:
The two types of behavior are two ends of a continuum. Our main behavioral theory of word shorthand gesture keyboards is that their use automatically shifts from the ease end (visual tracing) to the efficient end (recall gesturing).
Importantly, we do not expect users to gesture every word without looking at the keyboard. Due to the Zipf's law effect, a small number of words are used disproportionately frequently, and their stroke patterns are memorized early. Longer and less common words are typically made of common fragments whose shapes can be quickly remembered. An important word-gesture keyboard property is that it does not force the user into either "mode"; the user gradually progresses from the easy end to the more efficient end through use. In this sense, a word-gesture keyboard is a "progressive user interface."
Word-Gesture Keyboard Gesture Recognition:
The recognizer searches for the word W that maximizes the posterior probability P(W|G) = P(G|W) · P(W) / P(G), where P(G|W) is the likelihood of W's word gesture matching a user's input gesture G, and P(W) reflects the system's estimate of the prior probability that the word W is the user's intended word. The denominator P(G) depends only on the user's gesture and is invariant during the search.
The search for the user's intended word thus amounts to maximizing the product of two model estimates: the probability P(G|W), which reflects the gestural model, and the probability P(W), which reflects the language model.
In order to estimate P(G|W), we have used various techniques, such as dynamic time warping and template matching, to compute gesture keyboarding shape similarities.
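As a rough illustration (a simplified sketch, not the authors' implementation), one can score each lexicon word by a gestural likelihood derived from a shape distance and multiply it by the word's language-model prior; the helper names and the Gaussian shape channel below are assumptions.

```python
import math

def shape_distance(gesture, template):
    """Mean point-to-point distance between two traces that have already been
    resampled to the same number of points (a stand-in for the dynamic time
    warping / template matching mentioned above)."""
    return sum(math.dist(p, q) for p, q in zip(gesture, template)) / len(template)

def recognize(gesture, lexicon, sigma=50.0):
    """lexicon: list of (word, template, prior). Score = P(G|W) * P(W), with a
    Gaussian over shape distance as the gestural model and the prior as the
    language model; return the best-scoring word."""
    def score(entry):
        word, template, prior = entry
        likelihood = math.exp(-shape_distance(gesture, template) ** 2 / (2 * sigma ** 2))
        return likelihood * prior
    return max(lexicon, key=score)[0]
```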
Two Novel Functions of the Word-Gesture Keyboard:
(1)        Command Strokes:
With our systems, the user may issue commands (such as “Copy” and “Paste”) by tracing out the command names on the keyboard starting from a designated key (e.g. a Cmd key). The system suggests the command effect as soon as the command stroke is unambiguous.
(2)        Case Key:
We introduced a new key on the keyboard, the Case key (see the lower left corner of Figure 1). This key cycles through the different word-case alternatives for the word just entered or preceding the text caret. The Case key uses dictionary information to intelligently support nonstandard casing conventions for some words, such as "iPhone". Since the Case key modifies the word preceding the current text caret position ("reverse Polish"), it enables users to perform case corrections after the word is entered and only when they are actually needed.
Bibliography:
Zhai, Shumin, and Per Ola Kristensson. "The Word-gesture Keyboard: Reimagining Keyboard Interaction." Communications of the ACM 55.9 (2012): 91-101. ACM Digital Library. Web. 23 Mar. 2013. <http://dl.acm.org/citation.cfm?id=2330689>.

This blog content was created by urjnasw xkfjjkn (Xu Yan) on 23 March 2013.

Saturday, March 2, 2013

Early Processing for Sketch Understanding


Sketch Based Interfaces: Early Processing for Sketch Understanding
Introduction:
This paper stems from the idea of using the powerful processing capability of a computer to build an easy and flexible UI for sketch recognition. In other words, the interface should make users feel as if they were drawing on paper. To reach this goal, the first problem is how to convert digitized pen strokes into the intended geometric objects. This is done by interpreting the pixels of the user's sketch through a three-step process.

Three-step processing:
(1)Approximation:
Aim: Match the most basic geometric primitives (lines and curves) to a given set of pixels.
Vertex detection:
Mark the corners of the sketch with vertices.
(1.1) How are vertices in a sketch determined?
---Curvature: the change in direction with respect to arc length.
Points at a maximum of the absolute value of curvature are vertices.
In Figure 1, the points in the red circles are vertices.
---Speed:
Points at a minimum of speed are vertices.
In Figure 1, the points in the green circles are vertices.


Figure 1 Direction, Curvature and Speed graph for Stroke
To reduce the influence of noise in the data, the authors introduce average-based filtering. In Figure 2 and Figure 3, the red line is the threshold: we look for curvature maxima only where the curvature is already high, and for speed minima only where the speed is already low. The thresholds are selected empirically; for the curvature graph the threshold is the mean value, while for the speed graph it is 90% of the mean value (a small sketch of this candidate selection appears after the figures below).
Figure 2 Speed Graph for Square with threshold
Figure 3 Curvature Graph for Square with threshold
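A small sketch of the candidate selection just described (hypothetical helper, not the authors' code): curvature candidates are local maxima of |curvature| above its mean, and speed candidates are local minima of speed below 90% of its mean.

```python
def average_filtered_candidates(curvature, speed):
    """Return (Fd, Fs): indices of curvature-based and speed-based vertex
    candidates after average-based filtering."""
    def local_extrema(values, threshold, keep_max):
        idx = []
        for i in range(1, len(values) - 1):
            prev, cur, nxt = values[i - 1], values[i], values[i + 1]
            if keep_max and cur >= threshold and cur >= prev and cur >= nxt:
                idx.append(i)
            elif not keep_max and cur <= threshold and cur <= prev and cur <= nxt:
                idx.append(i)
        return idx

    abs_curv = [abs(c) for c in curvature]
    curv_threshold = sum(abs_curv) / len(abs_curv)       # mean of |curvature|
    speed_threshold = 0.9 * sum(speed) / len(speed)      # 90% of mean speed
    Fd = local_extrema(abs_curv, curv_threshold, keep_max=True)
    Fs = local_extrema(speed, speed_threshold, keep_max=False)
    return Fd, Fs
```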
(1.2)Generating Hybrid Fits:
       Computing vertex certainties:
Our certainty metric for a curvature candidate vertex vi is the scaled magnitude of the curvature in a local neighborhood around the point, computed as |di−k − di+k|/L, where L is the curve length between points Si−k and Si+k and k is a small integer defining the neighborhood size around vi. The certainty metric for a speed-fit candidate vertex vi is a measure of the pen slowdown at the point, 1 − vi/vmax, where vmax is the maximum pen speed in the stroke. The certainty values are normalized to [0, 1].
      Generating a set of hybrid fits:
The initial hybrid fit H0 is the intersection of Fd (the curvature candidates) and Fs (the speed candidates). A succession of additional fits is then generated by appending to Hi the highest-scoring curvature and speed candidates not already in Hi. To do this, on each cycle two new fits are created: one with Hi augmented by the best remaining speed candidate vs, and one with Hi augmented by the best remaining curvature candidate vd.
      Selecting the best fit:
We set an error upper bound and designate as our final fit Hf the Hi with the fewest vertices whose error is below the threshold.
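The sketch below gives a simplified version of this hybrid-fit idea (not the authors' exact procedure); certainty(v) and fit_error(vertex_set) are assumed, caller-supplied functions for scoring a candidate vertex and measuring how well a fit matches the stroke.

```python
def select_hybrid_fit(Fd, Fs, certainty, fit_error, error_bound):
    """Start from the intersection of curvature (Fd) and speed (Fs) candidates,
    grow a succession of fits by appending the best remaining candidates, and
    pick the fit with the fewest vertices whose error is below the bound."""
    current = set(Fd) & set(Fs)
    fits = [frozenset(current)]
    remaining_s = sorted(set(Fs) - current, key=certainty, reverse=True)
    remaining_d = sorted(set(Fd) - current, key=certainty, reverse=True)
    while remaining_s or remaining_d:
        if remaining_s:
            fits.append(frozenset(current | {remaining_s[0]}))
            current.add(remaining_s.pop(0))
        if remaining_d:
            fits.append(frozenset(current | {remaining_d[0]}))
            current.add(remaining_d.pop(0))
    acceptable = [f for f in fits if fit_error(f) < error_bound]
    # fall back to the fullest fit if nothing meets the error bound
    return min(acceptable, key=len) if acceptable else max(fits, key=len)
```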
Handling curves:
Approximate each curved region with a Bézier curve, defined by two end points and two control points. Let u = si and v = sj, i < j, be the end points of the part of S to be approximated with a curve. We compute the control points as:
c1 = u + k·t1,   c2 = v + k·t2,   where k = (1/3) · Σ |s(m+1) − s(m)|, summed over m = i, ..., j−1   (the control-point formula of Figure 4)
Here t1 and t2 are the unit-length tangent vectors pointing inwards at the two ends of the curve segment to be approximated. The 1/3 factor in k controls how much t1 and t2 are scaled in order to reach the control points; the summation is simply the length of the stroke between Si and Sj.
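A small sketch of this computation (hypothetical helper name; points and tangents are 2D tuples):

```python
import math

def bezier_control_points(u, v, t1, t2, segment):
    """u, v: end points; t1, t2: unit tangent vectors pointing inward at u and v;
    segment: the list of stroke points s_i .. s_j approximated by the curve."""
    # k = 1/3 of the summed distances between consecutive stroke points
    k = sum(math.dist(p, q) for p, q in zip(segment, segment[1:])) / 3.0
    c1 = (u[0] + k * t1[0], u[1] + k * t1[1])
    c2 = (v[0] + k * t2[0], v[1] + k * t2[1])
    return c1, c2
```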
(2)Beautification:
This step modifies the output of the approximation layer to make it visually more appealing without changing its meaning, and secondarily aids the third phase, basic recognition.
Figure 5 The effect of Beautification
(3)Basic Recognition:
Recognition of the most basic objects that can be built from the line segments and curve segments produced so far, e.g., simple geometric objects (ovals, circles, rectangles, and squares).

Bibliography:


Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. "Sketch Based Interfaces: Early Processing for Sketch Understanding." Proceedings of the 2001 Workshop on Perceptive User Interfaces (PUI '01), pages 1-8.
http://faculty.cs.tamu.edu/hammond/courses/SR/papers/Sezgin/Sezgin2001Sketchbased.pdf