<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2118"> <Title>Parsing Noisy Sentences</Title> <Section position="3" start_page="0" end_page="564" type="metho"> <SectionTitle> 2. The Parsing Algorithm </SectionTitle> <Paragraph position="0"> The grammar we are using is an Augmented Context-Free Grammar whose terminal symbols are phonemes rather than words. That is, the grammar includes rules like (5)-(8) in Figure 2-1 below, which spell words out as phoneme sequences. The Generalized LR parsing algorithm handles nondeterminism and ambiguity with a graph-structured stack. Tomita also showed that it can be used for word lattice parsing /Tomita 1986/. Our algorithm here is based on Tomita's parsing algorithm.</Paragraph> <Paragraph position="1"> A very simple example grammar is shown in Figure 2-1, and its LR parsing table, compiled automatically from the grammar, is shown in Figure 2-2. \[Footnote 3: The run-time grammar, which contains both syntax and semantics, is compiled automatically from more abstract formalisms: the Functional Grammar formalism for syntax and a frame representation for semantics. For more discussion of this Universal Parser architecture, see /Tomita 1987a/.\]</Paragraph> <Paragraph position="2">
(1) S --> NP PD
(2) S --> N
(3) S --> PD
(4) NP --> N P
(5) N --> m e
(6) N --> i
(7) P --> g a
(8) PD --> i t a i
Lower-case grammar symbols are terminals. The Generalized LR parsing algorithm is a table-driven shift-reduce parsing algorithm that can handle arbitrary context-free grammars in polynomial time. Entries &quot;s n&quot; in the action table (the left part of the table) indicate the action &quot;shift one word from the input buffer onto the stack and go to state n&quot;. Entries &quot;r n&quot; indicate the action &quot;reduce constituents on the stack using rule n&quot;. The entry &quot;acc&quot; stands for the action &quot;accept&quot;, and blank spaces represent &quot;error&quot;. The goto table (the right part of the table) decides which state the parser should go to after a reduce action.
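Since Figures 2-1 and 2-2 are not reproduced here, the behaviour of the toy grammar can be illustrated with a minimal sketch. A true Generalized LR parser drives every shift and reduce from the compiled table and merges competing stacks into a graph; the sketch below instead explores every shift/reduce choice with plain parallel configurations, which is enough to recognize sentences of the eight-rule grammar.

```python
# Naive nondeterministic shift-reduce recognizer for the toy grammar of
# Figure 2-1. This is NOT Tomita's algorithm: no LR table, no
# graph-structured stack -- every conflict is simply explored in parallel.

RULES = [
    ("S",  ("NP", "PD")),          # (1)
    ("S",  ("N",)),                # (2)
    ("S",  ("PD",)),               # (3)
    ("NP", ("N", "P")),            # (4)
    ("N",  ("m", "e")),            # (5)
    ("N",  ("i",)),                # (6)
    ("P",  ("g", "a")),            # (7)
    ("PD", ("i", "t", "a", "i")),  # (8)
]

def recognize(phonemes):
    """True iff the phoneme string derives S under the toy grammar."""
    worklist = [((), 0)]           # configurations: (symbol stack, input pos)
    seen = set()
    while worklist:
        stack, pos = worklist.pop()
        if (stack, pos) in seen:
            continue
        seen.add((stack, pos))
        if stack == ("S",) and pos == len(phonemes):
            return True
        if pos < len(phonemes):    # shift the next phoneme
            worklist.append((stack + (phonemes[pos],), pos + 1))
        for lhs, rhs in RULES:     # every reduce whose handle tops the stack
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                worklist.append((stack[:-len(rhs)] + (lhs,), pos))
    return False
```

In the table-driven version, branches that converge to the same LR state are merged in the graph-structured stack; that merging is what keeps Generalized LR parsing polynomial while this naive search is exponential in general.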
As long as each entry encountered contains only one action, parsing proceeds exactly as in standard LR parsers, which are often used in compilers for programming languages. When an entry contains multiple actions, called conflicts, all the actions are executed in parallel with the graph-structured stack. We do not describe the Generalized LR parsing algorithm in greater detail, referring the reader to /Tomita 1985/, /Tomita 1986/, and /Tomita 1987b/.</Paragraph> <Paragraph position="3">
2.2. Handling altered, extra, and missing phonemes
To cope with altered, extra, and missing phonemes, the parser must consider these errors as it parses an input from left to right. While the algorithm described in the previous subsection cannot by itself handle these noisy phenomena, it is well suited to considering many possibilities at the same time, and it can therefore be modified relatively easily to handle noisy phenomena such as the following.</Paragraph> <Paragraph position="4"> * Altered phonemes -- Each phoneme in a phoneme sequence may have been altered and thus may be incorrect. The parser has to consider all these possibilities. We can create a phoneme lattice dynamically by placing alternate phoneme candidates in the same location as the original phoneme. Each possibility is then explored by a branch of the parser. Not every phoneme can be altered into any other phoneme. For example, while /o/ can be mis-recognized as /u/, /i/ can never be mis-recognized as /o/. This kind of information can be obtained from a confusion matrix, which we shall discuss in the next section. With the confusion matrix, the parser does not have to exhaustively create alternate phoneme candidates.</Paragraph> <Paragraph position="5"> * Extra phonemes -- Each phoneme in a phoneme sequence may be an extra phoneme, and the parser has to consider these possibilities. One branch of the parser considers an extra phoneme by simply ignoring it.
The parser assumes that at most one extra phoneme can exist between two real phonemes, and we have found this assumption quite reasonable and safe.</Paragraph> <Paragraph position="6"> * Missing phonemes -- Missing phonemes can be handled by inserting possible missing phonemes between two real phonemes. The parser assumes that at most one phoneme can be missing between two real phonemes.</Paragraph> <Section position="1" start_page="561" end_page="564" type="sub_section"> <SectionTitle> 2.3. An Example </SectionTitle> <Paragraph position="0"> In this subsection, we present a sample trace of the parser.</Paragraph> <Paragraph position="1"> Here we use the grammar in Figure 2-1 and the LR table in Figure 2-2 to parse the phoneme sequence &quot;ebaitaai&quot; represented in Figure 2-3. (The correct sequence is &quot;megaitai&quot;, which means &quot;I have a pain in my eye.&quot;)</Paragraph> <Paragraph position="3"> In this example we make the following assumptions for altered and missing phonemes.</Paragraph> <Paragraph position="4"> * /i/ may possibly be mis-recognized as /e/.</Paragraph> <Paragraph position="5"> * /e/ may possibly be mis-recognized as /a/.</Paragraph> <Paragraph position="6"> * /g/ may possibly be mis-recognized as /b/.</Paragraph> <Paragraph position="7"> * /m/ may be missed in the output sequence with a higher probability.</Paragraph> <Paragraph position="8"> Now we begin parsing: first an initial state 0 is created. The action table indicates that the initial state is expecting &quot;m&quot; and &quot;i&quot; (Figure 2-4). Since the parsing proceeds strictly from left to right, the parser looks for the missing phoneme candidates between time frames 1 and 2. (We will use the terms T1, T2, ... to represent time 1, time 2, ... in Figure 2-3.) Only the missing phoneme &quot;m&quot; in this group is applicable to state 0.
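The three noise hypotheses above, combined with the penalty idea developed in Section 3, can be sketched as an enumeration of "repaired" phoneme sequences. This is only a sketch: the real parser interleaves these hypotheses with LR parsing instead of enumerating sequences up front, and the confusion pairs and penalty weights below are illustrative assumptions for the running example, not the device's actual confusion matrix.

```python
# Altered/extra/missing hypotheses as a penalised enumeration of repaired
# sequences. ALTERED, MISSING, and the penalty weights are assumptions
# chosen to mirror the worked example, not measured values.

ALTERED = {"e": ["i"], "a": ["e"], "b": ["g"]}   # heard -> possibly intended
MISSING = ["m"]                                  # phonemes the device may drop
ALTER_PEN, EXTRA_PEN, MISS_PEN = 1, 2, 2         # assumed penalty weights

def repairs(noisy, max_pen=6):
    """Yield (intended sequence, penalty) hypotheses for a noisy sequence."""
    def rec(i, out, pen, just_inserted):
        if pen > max_pen:                        # prune hopeless branches
            return
        if i == len(noisy):
            yield "".join(out), pen
            return
        ph = noisy[i]
        yield from rec(i + 1, out + [ph], pen, False)       # phoneme as heard
        for alt in ALTERED.get(ph, []):                     # altered phoneme
            yield from rec(i + 1, out + [alt], pen + ALTER_PEN, False)
        yield from rec(i + 1, out, pen + EXTRA_PEN, False)  # extra: skip it
        if not just_inserted:                               # missing: at most
            for m in MISSING:                               # one per gap
                yield from rec(i, out + [m], pen + MISS_PEN, True)
    yield from rec(0, [], 0, False)

# Best penalty per hypothesis for the noisy sequence of the example:
best = {}
for cand, pen in repairs("ebaitaai"):
    if cand not in best or pen < best[cand]:
        best[cand] = pen
```

Under these weights, both &quot;megaitai&quot; (insert m, b to g, drop one a) and &quot;igaitai&quot; (e to i, b to g, drop one a) survive the enumeration; a grammaticality check then filters out the remaining hypotheses, just as the parser's LR table does implicitly.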
The new state number 5 is determined from the action table (Figure 2-5).</Paragraph> <Paragraph position="9"> The next group of phonemes, between T2 and T3, consists of the &quot;e&quot; phoneme in the phoneme sequence and the altered candidate phonemes of &quot;e&quot;. In this group, &quot;e&quot; is expected by state 5 and &quot;i&quot; is expected by state 0 (Figure 2-6). After &quot;e&quot; is taken, the new state is 12, which is ready for the action &quot;reduce 5&quot;. Thus, using rule 5 (N --> m e), we reduce the phonemes &quot;m e&quot; into N. From state 0 with the nonterminal N, state 2 is determined from the goto table.</Paragraph> <Paragraph position="10"> The action table then indicates that state 2 has a multiple entry, i.e., state 2 is expecting &quot;g&quot; and is ready for the reduce action (Figure 2-7). Thus, we reduce the nonterminal N into S by rule 2 (S --> N), and the new state number 6 is determined from the goto table (Figure 2-8). The action table indicates that state 6 is an accept state, which means that &quot;m e&quot; is a successful parse. But only the first phoneme &quot;e&quot; of the input sequence &quot;ebaitaai&quot; has been consumed at this point. Thus we discard this parse by the following constraint. \[Constraint 1\] A successful parse should consume the phonemes at least up to the phoneme just before the end of the input sequence.</Paragraph> <Paragraph position="11"> Note that only the parse S in Figure 2-8 is discarded and that the nonterminal N in Figure 2-7 stays alive.</Paragraph> <Paragraph position="12"> Now we return to Figure 2-6 and continue with the shift action of &quot;i&quot;. After &quot;i&quot; is taken, the new state 4 is determined from the action table. This state has a multiple entry, i.e., state 4 is expecting &quot;t&quot; and is ready for the reduce action. Thus we reduce &quot;i&quot; into N by rule 6.
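The reduce-then-goto bookkeeping walked through above (shift &quot;m&quot; to state 5, shift &quot;e&quot; to state 12, then reduce by rules 5 and 2) can be sketched as follows. The stack pairs each grammar symbol with an LR state; only the states and goto entries quoted in the trace are filled in, so this is a fragment of Figure 2-2, not the full table.

```python
# One LR reduce action followed by a goto lookup, the step repeated
# throughout the trace. GRAMMAR and GOTO contain only the entries that
# the text itself quotes (states 0, 5, 12, 2, 6).

GRAMMAR = {5: ("N", ("m", "e")),   # rule 5: N --> m e
           2: ("S", ("N",))}       # rule 2: S --> N
GOTO = {(0, "N"): 2,               # from state 0 with N, go to state 2
        (0, "S"): 6}               # from state 0 with S, go to state 6

def reduce_action(stack, rule_no):
    """Apply 'r rule_no' to a stack of (state, symbol) pairs, then goto."""
    lhs, rhs = GRAMMAR[rule_no]
    del stack[len(stack) - len(rhs):]        # pop the handle
    exposed = stack[-1][0] if stack else 0   # state now on top (0 if empty)
    stack.append((GOTO[(exposed, lhs)], lhs))
    return stack

stack = [(5, "m"), (12, "e")]     # after shifting "m" and "e"
stack = reduce_action(stack, 5)   # N --> m e : goto(0, N) = 2
stack = reduce_action(stack, 2)   # S --> N   : goto(0, S) = 6, the accept state
```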
Here we use the local ambiguity packing technique: because the reduced nonterminal is the same, the starting state is 0 for both, and the new state is 2 for both, we do not create a new nonterminal N.</Paragraph> <Paragraph position="13"> Now we go on to the next group of phonemes, between T3 and T4. Only &quot;m&quot; is applicable to the initial state (Figure 2-9). The next group of phonemes, between T4 and T5, has one applicable phoneme, i.e., the altered phoneme candidate &quot;g&quot; for state 2. After &quot;g&quot; is taken, the new state 7 is determined from the action table (Figure 2-10).</Paragraph> <Paragraph position="14"> The next group of phonemes, between T5 and T6, has only one applicable phoneme: a missing phoneme candidate &quot;m&quot; for state 0. Here we can introduce another constraint, which discards this partial parse.</Paragraph> <Paragraph position="15"> \[Constraint 2\] After two phonemes of the input sequence have been consumed, no phonemes can be applied to the initial state 0. This constraint is natural because it is unlikely that more than two phonemes are recorded before the actual beginning phoneme for our speech recognition device.</Paragraph> <Paragraph position="16"> The next group of phonemes, between T6 and T7, has two applicable phonemes, i.e., the output phoneme &quot;a&quot; for state 7 and the altered phoneme candidate &quot;e&quot; for state 5. After &quot;a&quot; is taken, the new state 7 is ready for the reduce action.</Paragraph> <Paragraph position="17"> Thus, we reduce &quot;g a&quot; into P by rule 7 (Figure 2-11). The new state 8 is determined by the goto table, and it is also ready for the reduce action. Thus we reduce &quot;N P&quot; into NP by rule 4 (Figure 2-12). The new state is 3.
In applying &quot;e&quot;, there are two &quot;state 2&quot;s: one via the &quot;m&quot; between T1 and T2, the other via the &quot;m&quot; between T3 and T4. Here we can introduce a third constraint, which discards the former partial parse.</Paragraph> <Paragraph position="18"> \[Constraint 3\] A shift action is not applied when the distance between the phoneme and the applied (non)terminal is more than 4. (This distance contains at least one real phoneme.) Figure 2-13 shows the situation after &quot;e&quot; is applied. The parsing continues in this way, and the final situation is shown in Figure 2-14. As a result, the parser finds two successful parses: &quot;megaitai&quot; and &quot;igaitai&quot; (which means &quot;I have a stomachache.&quot;)
3. Scoring and the Confusion Matrix
There are two main reasons why we want to score each parse: first, to prune the search space by discarding branches of the parse whose score is hopelessly low; second, to select the best sentence out of multiple candidates by comparing their scores. Branches of the parse which consider fewer altered/extra/missing phonemes should be given higher scores. Whenever a branch of the parse handles an altered/extra/missing phoneme, a specific penalty is given to that branch. Scoring accuracy can be improved with the confusion matrix.</Paragraph> <Paragraph position="19"> Figure 3-1 shows part of the confusion matrix compiled by the manufacturer of the recognition device from a large body of word data. This matrix tells us, for example, that if the phoneme /a/ is input, the device recognizes it correctly most of the time, as /u/ 1.3% of the time, and so on. The column (I) says that the input is missed 0.9% of the time.</Paragraph> <Paragraph position="20"> Conversely, if the phoneme /o/ is produced by the device, there is a slight chance that the original input was /a/, /u/, or /w/, but no chance that the original input was /i/, /e/, or /j/. The probability that the original input was /a/ is much higher than the probability that it was /w/.
Thus, an altered phoneme /w/ should be given a more severe penalty than /a/. A score for altered phonemes can be obtained in this way; missing phonemes should be given a negative score, and extra phonemes should be given a zero or negative score. With this scoring, the score of a partial parse is calculated by summing up the scores of its constituents.</Paragraph> <Paragraph position="21"> Therefore, the more likely parse has the higher score.</Paragraph> <Paragraph position="22"> Two methods have been adopted to prune partial parses by score: * Discarding the low-score shift-waiting branches when a phoneme is applied.</Paragraph> <Paragraph position="23"> * Discarding the low-score branches in local ambiguity packing.</Paragraph> <Paragraph position="24"> The former method is very effective when strictly applied.</Paragraph> <Paragraph position="25"> The confusion matrix only shows us the phoneme-to-phoneme transitions; therefore, transitions of broader units should also be considered, such as the tendency for the /w/ phoneme in 'owa' or 'owo' to be missed, for the very first /h/ sound of an input to be missed, and for the 'su' sound in 'desuka' to be frequently transformed into 'h@'.</Paragraph> </Section> </Section> <Section position="4" start_page="564" end_page="564" type="metho"> <SectionTitle> 4. Conclusions </SectionTitle> <Paragraph position="0"> The parser has been implemented in Common Lisp on a Symbolics Lisp Machine and is being integrated into CMU's knowledge-based machine translation system to accept a spoken Japanese sentence in the domain of doctor-patient conversation and generate sentences in English, German, and Japanese.</Paragraph> <Paragraph position="1"> The parser has been tested on five persons. Each person pronounced 27 sentences; long sentences were not included due to the limits of the speech recognition device.
84% of the inputs were parsed correctly, and the right sentence appears as the best-score candidate for 88% of the correctly parsed inputs. The average parsing time for one sentence is 82 seconds.</Paragraph> </Section> <Section position="5" start_page="564" end_page="566" type="metho"> <SectionTitle> Acknowledgements </SectionTitle> <Paragraph position="0"> The authors would like to thank Shuji Morii for giving us the opportunity to use the speech recognition device, and to thank the other members of the Center for Machine Translation for useful comments and advice. We are also indebted to ATR Interpreting Telephony Research Laboratories for providing the computational environment.</Paragraph> <Paragraph position="1"> Appendix. Sample Runs Two actual outputs of the parser are shown on the next page. The first input phoneme sequence is &quot;ebaitaai&quot;, and the correct sequence is &quot;megaitai&quot; (the same sentence as in the example in Section 2), which is produced as the top-score sentence of all parses. The second input sequence is &quot;kurigakoogateiru =&quot;, and the correct sequence is &quot;kubigakowabaqteiru&quot;, which means &quot;I have a stiff neck.&quot; The frame-structure output after each parse is the meaning of the sentence. This meaning is extracted in the same way as in CMU's machine translation system.</Paragraph> <Paragraph position="2"> Namely, each rule of the context-free grammar has a function which is executed each time the rule is applied (i.e., when the reduce action occurs). If the function returns nil, the partial parse is discarded because it is not semantically correct. If the function returns a non-nil value, the value becomes the semantics of the right-hand side of the rule and is forwarded to the left-hand-side nonterminal symbol. The details are described in /Tomita 1987a/.</Paragraph> </Section> </Paper>