urjnasw xkfjjkn: 一月 2013

2013年1月27日星期日

Specifying Gestures by Example

Blog on paper: Specifying Gestures by Example

Summary:

Although the creative HCI by means of gestures is appealing, it is not rapidly developed due to the difficulty to create gestures and the code complexity of gesture recognizer. Therefore, this paper primarily talks about GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture), with which gesture recognizer can be created automatically from example gestures, removing the need for complicated hand coding.

Each recognizer is rapidly trained from a small number of examples of each gesture. The attributes of these gestures should be meaningful and be considered in gesture classification.

Then this paper refers to GDP, which is a gesture-based drawing program built with GRANDMA. All gestures in GDP are arranged in hierarchy view classes, as shown in Figure 1.

Figure 1. GDP view classes and associated gesture sets

Next, it tells us how to add a new gesture into GDP’s initial click-and-drag interface.
(1)Create a new gesture handler and associate it with the GraphicObjectView class in Figure 1. The gesture handler window is shown in Figure 2.

Figure 2. 4 new gesture class in gc6 – gc9

(2) Enter training examples for each new gesture class. Figure 3 shows the process of entering 7 training examples for “Delete” gesture class( gc9 ). Typically, 15 training examples per gesture class is perfect.

Figure 3. 7 training examples for “delete” gesture

After introducing how to add new gestures, the paper begins to talk about gesture recognition. The essential of gesture recognition is given a input gesture g, determine from a set of C gesture classes, to which g belongs.

Each gesture is represented by an array of sample points:

g_p = ( x_p, y_p, t_p ), 0<=p<=P,

P: total number of sample points, t_p: the time specific point is drawn

Feature is an important aspect to classify gestures. There are three principles for the selection of features:

(1)They should be incrementally computable in constant time per input point.

(2)They should be meaningful so that it can be used in gesture semantics and for recognition.

(3)There should be enough but not too many features to provide differentiation for all gestures.

The following 13 features, shown in Figure 4, are a good selection to identify strokes:

A few noticeable points:

(1)cos(α) and sin(α) are used as a feature instead of the α itself to avoid a discontinuity as the angle passes through 2pi and warps to 0.

(2)The initial angle features are computed from the first and third mouse point because the result is generally less noisy than when computed from the first two points. Nowadays, because of the enhancement in sampling precision, we may take the first and the fifth point as measurement points.

Gesture Classification formula:

Training:

The purpose of training is to determine the weight of Wc0 and Wci

from example gestures.

Rejection:

The purpose of rejection is to rejecting ambiguous gestures and outliers.

Eager recognition and multi-finger recognition:

Eager recognition: the recognition of gestures as soon as they are unambiguous. In this recognition, classification occurs on every mouse point.

Multi-finger recognition: the recognition of gestures made with multiple fingers simultaneously. By treating the multi-finger input as multi-path data, the single-stroke recognition algorithm may be applied to each of the path individually and the result combined to classify the multi-path gesture.

Bibliography:

Specifying Gestures by Example, Dean Rubine, Information Technology Center, Carnegie Mellon University, Pittsburgh, PA, SIGGRAPH 91, Proceedings of the 18th annual conference on Computer graphics and interactive techniques, Pages 329-337, ACM New York, NY, USA 1991. http://dl.acm.org/citation.cfm?id=122753

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Gestures without Libraries, Toolkits or Training:

A $1 Recognizer for User Interface Prototypes

Summary:

This paper primarily introduces $1 recognizer, which is an easy, cheap, and usable recognizer consisting of about 100 lines. Although it is a simple recognizer, it can still achieve excellent recognition accuracy, even better than more complicated recognizer, such as Rubine recognizer.

The $1 gesture recognizer:

The ultimate aim of $1 recognizer is to determine which set of previously recorded template points Ti most closely matches the set of candidate points C from user’s gesture result. However, this is not easy with a few concerns between templates and candidate, such as different size, angle, drawing speed and so forth. Keep these concerns in mind and the desire for simplicity, there are several requirements to $1 recognizer:

(1)Be resilient to variations in sampling due to movement speed or sensing;

(2)Support optional and configurable rotation, scale, and position invariance;

(3)Require no advanced mathematical techniques (e.g., matrix inversions, derivatives, integrals);

(4)Be easily written in few lines of code;

(5)Be fast enough for interactive purposes (no lag);

(6)Allow developers and application end-users to “teach” it new gestures with only one example;

(7)Return an N-best list with sensible [0..1] scores that are independent of the number of input points;

(8)Provide recognition rates that are competitive with more complex algorithms, such as Dynamic Time Warping and Rubine.

With these requirements in mind, $1 recognizer is realized with a simple 4-step algorithm.

A simple 4-step algorithm:

(1) Resample the Point Path

The purpose of this step is to eliminate the influence of movement speed to the number of input points in a gesture. For example,

Figure 1 The relationship between articulation speed and number of input points

Usually, we resample gestures such that the path defined by their original M points is defined by N equidistantly spaced points( 32<=N<=256 ).

Figure 2 The effect of different number of resampling point

Figure 2 shows that when N = 32, the sampling points are too sparse to identify path. However, when N = 128, it is so dense that will add time to path comparisons.

A common way to solve this problem is:

(1)Calculating the total length of the M-point path.

(2)Dividing the length by (N-1) to obtain the length of each increment L between N new points.

(3)Step through the path such that when the distances covered exceeds L, add a new point.

(2)Rotate Once Based on the “Indicative Angle”

We define “Indicative Angle” as the angle formed by the centroid and starting point of a gesture. Find this angle and rotate the gesture so that the angle is at 0 degree. The process just like the following figure shows,

Figure 3 Rotate gesture so that indicative angle is zero

(3)Scale and Translate:

Considering the difference in the size of candidate and template, the candidate will be scale into a reference square.

After scaling, the gesture is translated to a reference point so that the centroid is exactly at the reference point.

(4) Find the Optimal Angle for the Best Score:

A key to determine match or not between candidate and template is the following equation:

It defines the path-distance between candidate and template. The template with the minimum path-distance to candidate is the result of the recognition.

After di is computed, we converted it into a range [0,1] score using:

The variable size is the length of a side of the reference square.

An analysis of Rotation Invariance:

In step 2, adjusting “indicative angle” to 0 degree cannot guarantee the minimum path-distance between candidate and template. It is just an approximation. However, there exist a few approaches to help us achieve the minimized path-distance. One is “seed and search”, this brute force approach simply rotate candidate by +1 degree for all 360 degree and take the best result. An improvement to this approach is called “hill climbing”. It rotates candidate by +/- 1 degree for as long as candidate’s path-distance to template decreases. Nevertheless, hill climbing is not so effective for dissimilar ones. Another improvement called Golden Section Search guarantees it will finish after exactly 10 iterations, regardless of whether or not two gestures are similar.

Figure 4 Path-distance as a function of angular rotation away from zero indicative angle for similar and dissimilar gestures

Limitation of the $1 recognizer:

(1)$1 recognizer cannot distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations. For example, separating squares from rectangles, circles from ovals, or up-arrows from down-arrows is not possible without modifying the algorithm.

(2)horizontal and vertical lines are abused by non-uniform scaling

(3)$1 recognizer does not use time, so gestures cannot be differentiated on the basis of speed.

Conclusion from evaluation:

(1)Medium articulation speed is more accurate than slow and high articulation speed for Rubine, $1, and DTW.

(2)The more number of templates, the more accurate the recognition will be.( Rubine will have most significant improvement with the increase of template )

Bibliography:

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes, Jacob Wobbrock and Yang Li, The Information School, University of Washington, Seattle, Washington, Andew D Wilson, Microsoft Research, Redmond, Washington, UIST 2007, Proceedings of the 20th annual ACM symposium on User Interface Software and Technology, Pages 159-168, ACM New York, NY, USA 2007
http://dl.acm.org/citation.cfm?id=1294238

2013年1月22日星期二

Blog on Sketchpad---A man-machine graphical communication system

Ivan Sutherland's main accomplishments in terms of modern day topics of current research in computer science:

(1)Ivan Sutherland created a new way for people to interact with computers through drawing on the screen and some ancillary buttons, switches and knobs. The popular touch screen/pad control of a bunch of electronic devices nowadays is derived from the idea of Ivan Sutherland.

(2)His Sketchpad can be considered to be the ancestor of modern computer-aided drafting (CAD) programs as well as a major breakthrough in the development of computer graphics in general. For example, the Graphic User Interface was derived from the Sketchpad as well as modern object oriented programming.

What I learnt through this paper:

(1) Sketchpad is a computer program that realizes human-computer interaction through line drawings instead of cumbersome typed words.

(2) With sketchpad, we can draw as we wish with the help of a light pen, turning knobs and function switches.

(3) Sketchpad’s usefulness:

(3.1)For storing and updating drawings

(3.2)For gaining scientific or engineering understanding of operations that can be described graphically

(3.3)For circuit simulations as a topological input device

(3.4)For highly repetitive drawings

(4) Three general elements of Sketchpad system, they are sub-pictures, constraints and definition copying. Sub-pictures are symbols drawn by users. Constraints are kind of relationship between picture parts( such as perpendicular, parallel, of equal size, lying on the other parts and so forth ). Definition copying means composite operation.

(5) Structures:

(4.1)Ring structure:

In my opinion, this structure is just like a circular double linked list. By following pointers in the line block, not only will the end points of a line segment be found, but all the line segments terminating on a particular point will be found by following a string of pointers that starts within the point block.

(4.2)Generic structure:

The n-component elements( such as line blocks ) are collected under a generic header, which contains all the information that makes this type different from other types. It is a hierarchy structure.

(6) Uses of light pen:

(5.1)Positioning picture parts: indicate the coordinate a new picture part should be.

(5.2)Demonstrative parts: point to existing picture parts for a possible change. To accurately pointing to a picture part, the part should not be too far from the center of the light pen in order to be identified by the sketchpad system.

(7) The information for display spot:

Each display spot has 36 bits information. 20 of them give the coordinates of the spot in display system. The rest give the address of the n-component element which is responsible for adding that spot to the display. All the spots in a line are tagged with the ring structure of that line and all the spots in a sub-picture are tagged as belonging to that sub-picture.

(8) The difficulty of magnification of pictures:

After magnifying the picture, only a portion of the original picture can be shown on screen. Sktechpad system will compute which part will appear on screen and generate relevant display spots. The difficulty lies on the edge detection problem.

(9) Display of abstraction:

The abstraction that can be displayed is either constraints or the value of a set of digits.

Constraints: the fact that the start and end points of a circle arc should be equidistant from the circle center.

(10) Recursive functions of Sketchpad

(9.1)Expansion of sub-pictures: sub-pictures within sub-pictures to as many levels as desired.

(9.2)Recursive deletion: If one thing on which another thing depend is deleted, the dependent things will also be deleted.

(9.3)Recursive merging:

If two things of the same type which are independent are merged, a single thing of that type results, and all things which depended on either of the merged things depends on the result of the merger.

(11) Other functions:

Copy function: Once a sub-picture is copied, it loses its identity as a single part and the individual parts of the original sub-picture may be deleted at will.

(12) Constraints satisfaction:

The advantage of Sketchpad system over traditional pencil work is, once a constraint is added, the already drawn parts will be automatically satisfied by the computer to make the drawing take the exact shape desired.

(13) Constraint type definition:

A subroutine that computes the existing values of the variables of a particular constraint of that type, and the error introduced into the system by that particular constraint.

(14) Engineering and Artistic application of Sketchpad:

Patterns, linkages, dimension lines, bridges, artistic drawings and so forth.

Bibliography:

Sutherland, Ivan E. "Sketch Pad a Man-machine Graphical Communication System." Sketch Pad a Man-machine Graphical Communication System DAC '64 Proceedings of the SHARE Design Automation Workshop (1964): 6.329-.346. Print

2013年1月15日星期二

Introduction

Email: yanxu90@neo.tamu.edu

Class Standing: first year master

1. Why am I taking this class?

---Some of my friends recommended me that I could really learn something interesting and beneficial to my future career in this course. What’s more important, Professor Hammond is nice and easy-going.

2. What experience do you bring to this class?

---I have some programming experience on C, Java, Android, HTML and JSP in my undergraduate studies. Last semester, I successfully completed the study of CSCE 614 Computer Architecture and CSCE 629 Analysis of Algorithms.

3. What is my professional life goal?

---My ultimate professional life goal is to become a creative and inspirational technical director in a Fortune 500 companies. But everything starts from hard-working junior programmer.

4. What is my personal life goal?

---My personal life goal is to travel all over the world with my family, tasting their cuisine, enjoying their holidays and learning as many languages as I can.

5. What do I want to do after my graduation?

---I prefer to work in industry if I do not find a research topic that interests me.

6. What do I expect to be doing in 10 years?

---I wish I will have already become a project architect by then, closer to my ultimate goal.

7. What do you think will be the next biggest technological advancement in computer science?

---A reform in human-computer interaction, maybe. Just as depicted in sci-fi movies, we can click, drag and control by gestures via a virtual screen that identify our instructions through sensors. We do not need PC any more. It is just my wild imagination.

8. If you could travel back in time, who would you like to meet and why?

---I want to meet Dennis Ritchie, the father of C language. The reason is quite simple. C is the first programming language I learnt (I am still learning). I want to figure out what difficulties he met and how he overcame them in the process to create such a popular language with a history of more than 40 years.

9. Describe your favorite shoes and why they are your favorite?

---I like white sneakers. White is my favorite color. It can easily suit a lot of clothes. Sneakers looks casual and is OK for every season (Quite convenient).

10. If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?

---Spanish. I want to travel to Spanish after my graduation. My favorite soccer team is in Spanish. It is widely spoken in Latin America.

11. Give some interesting fact/story about yourself.

---I like collecting cool beverage pop cans.

订阅：博文 (Atom)

urjnasw xkfjjkn's hot blog