<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1086"> <Title>On-Line Cursive Handwriting Recognition Using Hidden Markov Models and Statistical Grammars</Title> <Section position="4" start_page="432" end_page="433" type="metho"> <SectionTitle> 3. AIRLINE TRAVEL INFORMATION SERVICE: AN INITIAL 3050 WORD, 52 SYMBOL TASK </SectionTitle> <Paragraph position="0"> In the initial system, the BBN BYBLOS Continuous Speech Recognition system [4, 5, 6] (see Figure 1) was used without modification on an on-line cursive handwriting corpus created from prompts from the ARPA Airline Travel Information Service (ATIS) corpus [7].</Paragraph> <Paragraph position="1"> These full-sentence prompts (approximately 10 words per sentence) were written by a single subject. These sentences were then reviewed (verified) to make sure that the prompts had been transcribed correctly.</Paragraph> <Paragraph position="2"> After verification, these sentences were separated into a set of 381 training sentences and a mutually exclusive set of 94 test sentences.</Paragraph> <Paragraph position="3"> The lexicon for this task consisted of 3050 words, where lowercase and capitalized versions of a word are considered distinct.</Paragraph> <Paragraph position="5"> For each sample point, an analysis program computed a two-element feature vector: the writing angle at that sample and the change in the writing angle [2] (see Figure 3). These time series of feature vectors were then fed into the BYBLOS system. For this task, BYBLOS quantizes the feature vectors for a sentence into 64 different clusters.</Paragraph> <Paragraph position="6"> These new time series are then used with their respective sentence transcriptions to train HMMs representing the script characters (note that the alignment of the clusters with the sentence transcriptions occurs automatically in this process). A 7-state HMM model was chosen to represent each symbol (see Figure 4).
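The two-element feature vector described above (writing angle and its change) can be computed directly from the pen trace; a minimal sketch in Python, assuming the trace is a list of (x, y) samples (the function and variable names are hypothetical, not from the BYBLOS code):

```python
import math

def angle_features(points):
    """Writing angle and change in angle at each sample (a sketch)."""
    feats = []
    prev_theta = None
    for i in range(1, len(points)):
        x0, y0 = points[i - 1]
        x1, y1 = points[i]
        theta = math.atan2(y1 - y0, x1 - x0)      # writing angle at this sample
        if prev_theta is None:
            dtheta = 0.0
        else:
            # wrap the angle difference back into the principal range
            dtheta = math.atan2(math.sin(theta - prev_theta),
                                math.cos(theta - prev_theta))
        feats.append((theta, dtheta))
        prev_theta = theta
    return feats
```

In the system described here, these two-element vectors would then be vector quantized into the 64 clusters before HMM training.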
Since the penning of a script letter often differs depending on the letters written before and after it, additional HMMs are used to model these contextual effects [8]. Adjacent effects between two letters (bilets) are modeled, as well as three-letter (trilet) contexts. In a given set of sentences there may be many trilets, up to the number of symbols cubed. However, in English only a subset of these are allowed. In the ATIS task there are 3639 different trilets in the training sentences.</Paragraph> <Paragraph position="7"> For this initial system there were 54 characters: 52 lower and upper case alphabetic, a space character, and a &quot;backspace&quot; character. The backspace character is appended onto words that contain &quot;i&quot;, &quot;j&quot;, &quot;x&quot;, or &quot;t&quot;. This character models the space the pen moves after finishing the body of the word to add the dot or the cross when drawing one of these characters.</Paragraph> <Paragraph position="8"> A statistical grammar can also be used to improve recognition performance. For this experiment, a bigram grammar (to relate pairs of words) was created using a larger set of 17209 sentences from the ATIS corpus (the 94 test sentences were not included). The resultant grammar has a perplexity of 20. Table 1 shows the word error rates for this task when doing recognition using context without the grammar (perplexity = 3050), using the grammar without context, and using both context and the grammar. Word error rate is measured as the sum of the percentage of words deleted, the percentage of words inserted, and the percentage of words that are substituted for other words in the set of test sentences.</Paragraph> <Paragraph position="9"> The data was acquired using a Momenta pentop which stored the script in a simple time series of x and y coordinates at a sampling rate of 66 Hz.
The handwriting data is sampled continuously in time, except when the pen is lifted (Momenta pentops provide no information about pen movement between strokes). Because we wanted to use our speech recognition system with no modification, we decided to simulate a continuous-time feature vector by arbitrarily connecting the samples from pen-up to pen-down with a straight line and then sampling that line ten times. Thus, the data effectively became one long criss-crossing stroke for the entire sentence, where words run together and &quot;i&quot; and &quot;j&quot; dots and &quot;t&quot; and &quot;x&quot; crosses cause backtracking over previously drawn script (see Figure 2).</Paragraph> <Paragraph position="10"> context + no gram. | no context + gram. | context + gram.</Paragraph> <Paragraph position="11"> word error rate: 4.2% | 2.2% | 1.1% As can be seen from the table, both context and a grammar are very powerful tools in aiding recognition. With no grammar but with context an error rate of 4.2% was observed. When the grammar was added and context not used, the error rate dropped to 2.2%. However, the best result used both context and a grammar for a word error rate of 1.1%. Of interest are the factors of two relating the error rates shown. Similar factors of two have also been observed in the research on the speech version of this corpus. With the best (1.1%) word error rate, only 10 errors occurred for the entire test set. Experimentation was suspended at this point since so few errors did not allow any further analysis of the problems in our methods.</Paragraph> <Paragraph position="12"> The above experiments demonstrated the potential utility of speech recognition methods, especially the use of HMMs and grammars, to the problem of on-line cursive handwriting recognition.
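The word error rates above follow the standard definition (substitutions plus deletions plus insertions, as a fraction of the reference words) and are obtained by edit-distance alignment of the recognized word sequence against the reference; a generic sketch of that computation, not the actual BYBLOS scoring code:

```python
def word_error_rate(ref, hyp):
    """Edit-distance word error rate: (subs + dels + ins) / len(ref)."""
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # i deletions
    for j in range(m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[n][m] / n
```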
Based on these good preliminary results, we embarked on a more ambitious task with a larger vocabulary and more writers.</Paragraph> </Section> <Section position="5" start_page="433" end_page="434" type="metho"> <SectionTitle> 4. WALL STREET JOURNAL: A 25,000 WORD, 86 SYMBOL TASK </SectionTitle> <Paragraph position="0"> During the past year, we have collected cursive written data using text from the ARPA Wall Street Journal task (WSJ) [10], including numerals, punctuation, and other symbols, for a total of 88 symbols (62 alphanumeric, 24 punctuation and special symbols, space, and backspace). The prompts from the Wall Street Journal consist mainly of full sentences with scattered article headings and stock listings (all are referred to as sentences for convenience). We have thus far collected over 7000 sentences (175,000 words total, or about 25 words/sentence) from 21 writers on two GRiD Convertible pentops.</Paragraph> <Paragraph position="1"> See Figure 5 for an example of the data collected. The writers were gathered from the Cambridge, Massachusetts area and were mainly students and young professionals. Several non-native writers were included (writers whose first working language was not English).</Paragraph> <Paragraph position="2"> While the handwriting input was constrained, the rules given the subjects were simple: write the given sentence in cursive; keep the body of a word connected (do not lift the pen in the middle of a word); and do crossings and dottings after completing the body of a word. However, since many writers could not remember how to write capital letters in cursive, great leniency was allowed. Furthermore, apostrophes were allowed to be written either in the body of the word or at the end of the word like a cross or dot. For example, the word &quot;don't&quot; could be written as &quot;dont&quot; followed by the placement of the apostrophe, or as &quot;don&quot;, apostrophe, and &quot;t&quot;.
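The writing conventions above (delayed dots and crosses, and optionally delayed apostrophes, modeled with the backspace symbol) determine which symbol sequences the HMMs must explain for a given word; a hypothetical sketch, with a BACKSPACE token standing in for the backspace character (helper names are invented for illustration):

```python
DELAYED = set("ijxt")   # letters whose dots and crosses follow the word body

def symbol_sequences(word):
    """Possible symbol sequences for one cursively written word (a sketch).

    A BACKSPACE symbol is appended to words containing i, j, x, or t,
    modeling the pen's return to add dots and crosses; an apostrophe may
    be written in the word body or delayed like a dot or cross.
    """
    variants = [list(word)]
    if "'" in word:
        # variant with the apostrophe written after the word body
        variants.append([c for c in word if c != "'"] + ["'"])
    out = []
    for seq in variants:
        if any(c in DELAYED for c in word.lower()):
            seq = seq + ["BACKSPACE"]
        out.append(seq)
    return out
```

For &quot;don't&quot; this yields both the intra-word apostrophe order and the delayed order, matching the leniency granted to the writers.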
Overall, this task might be best described as &quot;pure cursive&quot; in the handwriting recognition literature.</Paragraph> <Paragraph position="3"> For the purposes of this experiment, punctuation, numerals, and symbols are counted as words. Thus, &quot;.&quot;, &quot;,&quot;, &quot;0&quot;, &quot;1&quot;, &quot;$&quot;, &quot;{&quot;, etc., are each counted as a word. However, apostrophes within words are counted as part of that word. Again, a capitalized version of a word is counted as distinct from the lowercase version of the word.</Paragraph> <Paragraph position="4"> While these standards may artificially inflate the word error rates, they are a simple way to disambiguate the definition of a word.</Paragraph> <Paragraph position="5"> In addition to the angle and delta angle features described in the last section, the following features were added: delta x, delta y, pen up/pen down, and sgn(x - max(x)). Pen up/pen down is 1 only during the ten samples connecting one pen stroke to another; everywhere else it is 0. Sgn(x - max(x)) is 1 only when, at that time, the current sample is the right-most sample of the data to date. Also, two preprocessing steps were used on the subjects' data. The first was a simple noise filter which required that the pen traverse over one hundredth of an inch before allowing a new sample. The second step padded each pen stroke to a minimum size of ten samples.</Paragraph> <Paragraph position="6"> At the time of this writing, samples from six subjects were used for writer dependent experiments. Three fourths of a subject's sentences were used for training, with the remaining fourth used for testing. A bigram grammar was created from approximately two million Wall Street Journal sentences from 1987 to 1989 (not including the sentences used in data collection). The results of the writer dependent tests are shown in Table 3. Substitution, deletion, insertion, and the total word error rates are included.
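The preprocessing steps and the four added features just described can be sketched as follows; the 0.01-inch and ten-sample thresholds come from the text, while the function names and data layout are hypothetical:

```python
def preprocess(stroke, min_move=0.01, min_len=10):
    """Noise filter and stroke padding: keep a sample only after the pen
    has moved more than min_move inches, then pad short strokes."""
    kept = [stroke[0]]
    for x, y in stroke[1:]:
        px, py = kept[-1]
        if (x - px) ** 2 + (y - py) ** 2 > min_move ** 2:
            kept.append((x, y))
    while min_len > len(kept):        # pad to a minimum of min_len samples
        kept.append(kept[-1])
    return kept

def extra_features(samples):
    """Per-sample (delta x, delta y, pen up, sgn(x - max(x))) features.

    samples: (x, y, pen_up) triples, pen_up being 1 only on the
    interpolated points connecting one stroke to the next.
    """
    feats = []
    max_x = None
    prev = None
    for x, y, pen_up in samples:
        dx = 0.0 if prev is None else x - prev[0]
        dy = 0.0 if prev is None else y - prev[1]
        # 1 while this sample is the right-most sample seen so far
        right_most = 1 if (max_x is None or x > max_x) else 0
        max_x = x if (max_x is None or x > max_x) else max_x
        feats.append((dx, dy, pen_up, right_most))
        prev = (x, y)
    return feats
```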
Table 4 shows estimated character recognition error rates for each class of character: alphabetic, numeral, and punctuation and other symbols. The sum of the substitution and deletion error rates for each class is represented in this table, since insertions are not directly attributable to a particular class of character. However, the total character error shown incorporates insertion errors, since these errors are distributed over the entire set of classes. On average, the test sets consist of 1.9% numerals, 4.1% punctuation and other symbols, and 94% alphabetics. Both aim and shs are non-native writers. A test experiment was performed without a grammar (but with context) on subject shs, resulting in an error rate approximately four times the previous error rate. This is the same ratio seen in the ATIS task.</Paragraph> </Section> <Section position="6" start_page="434" end_page="434" type="metho"> <SectionTitle> 5. ANALYSIS AND FURTHER EXPERIMENTATION </SectionTitle> <Paragraph position="0"> These results are quite startling when put in context. The BYBLOS speech system was not significantly modified for handwriting recognition, yet it handled several difficult handwriting tasks. Furthermore, none of the BYBLOS automatic optimization features were used to improve the results of any writer (or group of writers). No particular stroke order was enforced on the writers for dottings and crossings (besides being after the body of the word), and there are known inaccuracies in the transcription files. Note that a significantly larger error rate was observed for numerals and symbols than for alphabetics. Even with all insertion errors added to the estimate of the alphabetic error, the error rates for numerals and symbols are still significantly higher.

Table 4 (estimated character error rates by class; of Table 3 only the headings subject, Subst., Delet., Insert., and Total are recoverable from this copy):
subject   Est. num.   Est. sym.   Est. alpha.   total
aim       7.1%        4.7%        .47%          1.4%
wcd       5.4%        5.7%        .47%          1.0%
ave.      6.2%        7.5%        .57%          1.4%

One way to improve the digit recognition may</Paragraph> <Paragraph position="1"> be to specifically train on common digit strings such as &quot;1989&quot;, &quot;80286&quot;, and &quot;747&quot; (presently, &quot;1989&quot; is recognized as four separate words instead of the more salient whole). Symbol recognition may be further improved by tuning the minimum stroke length in preprocessing. If the minimum stroke length is too small, a period or comma may be completely ignored due to too few samples comprising the symbol. However, if the minimum stroke length is too large, insertion errors may occur. A better solution would allow a varying number of states for different letter models. Thus, complicated letters like &quot;G&quot; would be given 7 to 11 states while a period (or letter dotting) would be given 3. This method may improve all classes of recognition. Another known improvement deals with apostrophes. Presently, apostrophes are handled incorrectly by expecting only the intra-word stroke version. By expecting both standard stroke orders in words with apostrophes, the system can increase the recognition accuracy of these words significantly. By fixing these problems and using BYBLOS's optimizing features, a 10-50% reduction in word error rate may occur.</Paragraph> <Paragraph position="2"> Supplying such a large amount of training text may be tiring for just one writer. However, there is some evidence that not as many training sentences per writer are needed for good performance. Furthermore, if good word error rates for the cursive dictation task can be assured, a writer may be willing to spend some time writing sample sentences. A possible compromise is to create a writer independent system which can then be adapted to a particular writer with a few sample sentences. With this level of training it may be possible to relax the few restrictions made on the writers in this experiment.
However, a more robust feature set may be necessary for creating the writer independent system.</Paragraph> <Paragraph position="3"> A practical issue in handwriting recognition is the speed of the recognizer. Approximately 20 seconds per word are required for recognition in the present experimental system. However, we suspect that real-time performance is attainable by increasing the efficiency of the code and porting the decoder to a more powerful hardware platform. Future experiments will be directed at further reduction of the error rates for the writer dependent task. More writers may also be incorporated into the test. In addition, writer independent and writer adaptive systems may be attempted. Scalability of the number of training sentences will be addressed along with possible changes to the BYBLOS system to better accommodate handwriting. Adapting the system to off-line handwriting recognition may also be explored at a later date.</Paragraph> </Section> </Paper>