<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2097"> <Title>Compiling Language Models from a Linguistically Motivated Unification Grammar</Title> <Section position="3" start_page="670" end_page="671" type="metho"> <SectionTitle> 2 The Gemini Language Model Compiler </SectionTitle> <Paragraph position="0"> To make the paper more self-contained, this section provides some background on the method used by Gemini to compile unification grammars into CFGs, and then into language models. The basic idea is the obvious one: enumerate all possible instantiations of the features in the grammar rules and lexicon entries, and thus transform each rule and entry in the original unification grammar into a set of rules in the derived CFG. For this to be possible, the relevant features must be constrained so that they can only take values in a finite predefined range. The finite range restriction is inconvenient for features used to build semantic representations, and the formalism consequently distinguishes syntactic and semantic features; semantic features are discarded at the start of the compilation process.</Paragraph> <Paragraph position="1"> A naive implementation of the basic method would be impractical for any but the smallest and simplest grammars, and considerable ingenuity has been expended on various optimizations. Most importantly, categories are expanded in a demand-driven fashion, with information being percolated both bottom-up (from the lexicon) and top-down (from the grammar's start symbol). This is done in such a way that potentially valid combinations of feature instantiations in rules are successively filtered out if they are not licensed by the top-down and bottom-up constraints.
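The naive enumeration step can be sketched as follows; the rule encoding, feature names, and value ranges here are invented for illustration and are not Gemini's actual formats:

```python
from itertools import product

# Toy feature declarations: every syntactic feature has a finite value range.
FEATURE_RANGES = {"agr": ["sg", "pl"], "inv": ["y", "n"]}

def expand_rule(mother, daughters):
    """Naively enumerate the CFG rules derivable from one unification rule.
    Categories are (name, features) pairs; a feature value written "?X"
    is a variable, and all occurrences of "?X" must unify."""
    cats = [mother] + daughters
    # Each variable ranges over the value set of any feature slot it fills.
    var_range = {}
    for _name, feats in cats:
        for feat, val in feats.items():
            if isinstance(val, str) and val.startswith("?"):
                var_range.setdefault(val, FEATURE_RANGES[feat])
    variables = sorted(var_range)
    cfg_rules = []
    for values in product(*(var_range[v] for v in variables)):
        binding = dict(zip(variables, values))
        def ground(cat):
            # Instantiate one category under the current variable binding.
            name, feats = cat
            inst = sorted((f, binding.get(v, v)) for f, v in feats.items())
            return name if not inst else (
                name + "_" + "_".join(f"{f}={v}" for f, v in inst))
        expanded = [ground(c) for c in cats]
        cfg_rules.append((expanded[0], expanded[1:]))
    return cfg_rules

# S -> NP VP with agreement shared between the two daughters:
rules = expand_rule(("s", {}), [("np", {"agr": "?A"}), ("vp", {"agr": "?A"})])
# One atomic CFG rule per value of ?A, i.e. two rules.
```

The demand-driven version described above differs in that it never materializes instantiations that are not licensed from both the lexicon and the start symbol.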
Ranges of feature values are also kept together when possible, so that sets of context-free rules produced by the naive algorithm may in these cases be merged into single rules.</Paragraph> <Paragraph position="2"> By exploiting the structure of the grammar and lexicon, the demand-driven expansion method can often effect substantial reductions in the size of the derived CFG. (For the type of grammar we consider in this paper, the reduction is typically by a factor of over 10^20.) The downside is that even an apparently small change in the syntactic features associated with a rule may have a large effect on the size of the CFG, if it opens up or blocks an important percolation path. Adding or deleting lexicon entries can also have a significant effect on the size of the CFG, especially when there are only a small number of entries in a given grammatical category; as usual, entries of this type behave from a software engineering standpoint like grammar rules.</Paragraph> <Paragraph position="3"> The language model compiler also performs a number of other non-trivial transformations.</Paragraph> <Paragraph position="4"> The most important of these is related to the fact that Nuance GSL grammars are not allowed to contain left-recursive rules, and left-recursive unification-grammar rules must consequently be converted into a non-left-recursive form. Rules of this type do not however occur in the grammars described below, and we consequently omit further description of the method.</Paragraph> </Section> <Section position="4" start_page="671" end_page="672" type="metho"> <SectionTitle> 3 Initial Experiments </SectionTitle> <Paragraph position="0"> Our initial experiments were performed on a recent unification grammar in the ATIS (Air Travel Information System) domain, developed as a linguistically principled grammar with a domain-specific lexicon.
This grammar was created for an experiment comparing coverage and recognition performance of a hand-written grammar with that of automatically derived recognition language models, as increasing amounts of data from the ATIS corpus were made available for each method. Examples of sentences covered by this grammar are &quot;yes&quot;, &quot;on friday&quot;, &quot;i want to fly from boston to denver on united airlines on friday september twenty third&quot;, &quot;is the cheapest one way fare from boston to denver a morning flight&quot;, and &quot;what flight leaves earliest from boston to san francisco with the longest layover in denver&quot;. Problems obtaining a working recognition grammar from the unification grammar ended our original experiment prematurely, and led us to investigate the factors responsible for the poor recognition performance.</Paragraph> <Paragraph position="1"> We explored several likely causes of recognition trouble: number of rules, number of vocabulary items, size of node array, perplexity, and complexity of the grammar, measured by average and highest number of transitions per graph in the PFSG form of the grammar.</Paragraph> <Paragraph position="2"> We were able to immediately rule out simple size metrics as the cause of Nuance's difficulties with recognition. Our smallest air travel grammar had 141 Gemini rules and 1043 words, producing a Nuance grammar with 368 rules.</Paragraph> <Paragraph position="3"> This compares to the CommandTalk grammar, which had 1231 Gemini rules and 1771 words, producing a Nuance grammar with 4096 rules.</Paragraph> <Paragraph position="4"> To determine whether the number of words in the grammar or the structure of the phrases was responsible for the recognition problems, we created extreme cases of a Word+ grammar (i.e. a grammar that constrains the input to be any sequence of the words in the vocabulary) and a one-word-per-category grammar.
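For concreteness, a Word+ grammar needs only two structural CFG rules plus one lexical rule per vocabulary word; the following sketch (toy vocabulary and rule names invented for illustration) builds such a grammar, written right-recursively since, as noted in the previous section, Nuance GSL disallows left-recursive rules:

```python
def word_plus_grammar(vocab):
    """CFG for Word+: the input may be any nonempty sequence of words.
    Right-recursive on purpose, to avoid the GSL ban on left recursion."""
    rules = [("S", ("WORD", "S")), ("S", ("WORD",))]   # two structural rules
    rules += [("WORD", (w,)) for w in sorted(vocab)]   # one lexical rule per word
    return rules

def word_plus_accepts(words, vocab):
    # Membership in the Word+ language reduces to a per-token vocabulary check.
    return bool(words) and all(w in vocab for w in words)

vocab = {"boston", "denver", "to", "flights"}   # toy vocabulary
grammar = word_plus_grammar(vocab)              # 2 structural + 4 lexical rules
```

Because the grammar places no constraint on word order, it maximizes coverage at the cost of an extremely loose language model, which is exactly the accuracy trade-off reported below.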
We found that both of these variants of our grammar produced reasonable recognition, though the Word+ grammar was very inaccurate. However, a three-words-per-category grammar could not produce successful speech recognition.</Paragraph> <Paragraph position="5"> Many feature specifications can make a grammar more accurate, but will also result in a larger recognition grammar due to multiplication of feature values to derive the categories of the context-free grammar. We experimented with various techniques of selecting features to be retained in the recognition grammar. As described in the previous section, Gemini's default method is to select only syntactic features and not consider semantic features in the recognition grammar. We experimented with selecting a subset of syntactic features to apply and with applying only semantic sortal features, and no syntactic features. None of these grammars produced successful speech recognition.</Paragraph> <Paragraph position="6"> From these experiments, we were unable to isolate any simple set of factors to explain which grammars would be problematic for speech recognition. However, the number of transitions per graph in a PFSG did seem suggestive of a factor.
The ATIS grammar had a high of 1184 transitions per graph, while the semantic grammar of CommandTalk had a high of 428 transitions per graph, and produced very reasonable speech recognition.</Paragraph> <Paragraph position="7"> Still, at the end of these attempts, it became clear that we did not yet know the precise characteristic that makes a linguistically motivated grammar intractable for speech recognition, nor the best way to retain the advantages of the hand-written grammar approach while providing reasonable speech recognition.</Paragraph> </Section> <Section position="5" start_page="672" end_page="673" type="metho"> <SectionTitle> 4 Incremental Grammar </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="672" end_page="672" type="sub_section"> <SectionTitle> Development </SectionTitle> <Paragraph position="0"> In our second series of experiments, we incrementally developed a new grammar from scratch. The new grammar is basically a scaled-down and adapted version of the Core Language Engine grammar for English (Pulman 1992; Rayner 1993); concrete development work and testing were organized around a speech interface to a set of functionalities offered by a simple simulation of the Space Shuttle (Rayner, Hockey and James 2000). Rules and lexical entries were added in small groups, typically 2-3 rules or 5-10 lexical entries in one increment. After each round of expansion, we tested to make sure that the grammar could still be compiled into a usable recognizer, and at several points this suggested changes in our implementation strategy. The rest of this section describes the new grammar in more detail.</Paragraph> </Section> <Section position="2" start_page="672" end_page="672" type="sub_section"> <SectionTitle> 4.1 Overview of Rules </SectionTitle> <Paragraph position="0"> The current versions of the grammar and lexicon contain 58 rules and 301 uninflected entries respectively.
They cover the following phenomena: 1. Top-level utterances: declarative clauses, WH-questions, Y-N questions, imperatives, elliptical NPs and PPs, interjections. 2. WH-movement of NPs and PPs.</Paragraph> <Paragraph position="1"> 3. The following verb types: intransitive, simple transitive, PP complement, modal/auxiliary, -ing VP complement, particle+NP complement, sentential complement, embedded question complement.</Paragraph> <Paragraph position="2"> 4. PPs: simple PP, PP with postposition (&quot;ago&quot;), PP modification of VP and NP. 5. Relative clauses with both relative NP pronoun (&quot;the temperature that I measured&quot;) and relative PP (&quot;the dock where I am&quot;). 6. Numeric determiners, time expressions, and postmodification of NP by numeric expressions. 7. Constituent conjunction of NPs and clauses.</Paragraph> <Paragraph position="3"> The following example sentences illustrate current coverage: &quot;how about scenario three?&quot;, &quot;what is the temperature?&quot;, &quot;measure the pressure at flight deck&quot;, &quot;go to the crew hatch and close it&quot;, &quot;what were temperature and pressure at fifteen oh five?&quot;, &quot;is the temperature going up?&quot;, &quot;do the fixed sensors say that the pressure is decreasing?&quot;, &quot;find out when the pressure reached fifteen p s i&quot;, &quot;what is the pressure that you measured?&quot;, &quot;what is the temperature where you are?&quot;, &quot;can you find out when the fixed sensors say the temperature at flight deck reached thirty degrees celsius?&quot;.</Paragraph> </Section> <Section position="3" start_page="672" end_page="673" type="sub_section"> <SectionTitle> 4.2 Unusual Features of the Grammar </SectionTitle> <Paragraph position="0"> Most of the grammar, as already stated, is closely based on the Core Language Engine grammar.
We briefly summarize the main divergences between the two grammars.</Paragraph> <Paragraph position="1"> The new grammar uses a novel treatment of inversion, which is partly designed to simplify the process of compiling a feature grammar into context-free form. The CLE grammar's treatment of inversion uses a movement account, in which the fronted verb is moved to its notional place in the VP through a feature. So, for example, the sentence &quot;is pressure low?&quot; will in the original CLE grammar have the phrase-structure &quot;[[is]V [pressure]NP [[]V [low]ADJ]VP]S&quot; in which the head of the VP is a V gap coindexed with the fronted main verb.</Paragraph> <Paragraph position="2"> Our new grammar, in contrast, handles inversion without movement, by making the combination of inverted verb and subject into a</Paragraph> <Paragraph position="3"> VBAR constituent. A binary feature invsubj picks out these VBARs, and there is a question-formation rule of the form</Paragraph> <Paragraph position="5"> Continuing the example, the new grammar assigns this sentence the simpler phrase-structure &quot;[[[is]V [pressure]NP]VBAR [[low]ADJ]VP]S&quot; Sortal constraints are coded into most grammar rules as syntactic features in a straightforward manner, so they are available to the compilation process which constructs the context-free grammar, and ultimately the language model. The current lexicon allows 11 possible sortal values for nouns, and 5 for PPs.</Paragraph> <Paragraph position="6"> We have taken the rather non-standard step of organizing the rules for PP modification so that a VP or NP cannot be modified by two PPs of the same sortal type. The principal motivation is to tighten the language model with regard to prepositions, which tend to be phonetically reduced and often hard to distinguish from other function words.
For example, without this extra constraint we discovered that an utterance like &quot;measure temperature at flight deck and lower deck&quot; would frequently be misrecognized as &quot;measure temperature at flight deck in lower deck&quot;.</Paragraph> </Section> </Section> <Section position="6" start_page="673" end_page="675" type="metho"> <SectionTitle> 5 Experiments with Incremental Grammars </SectionTitle> <Paragraph position="0"> Our intention when developing the new grammar was to find out just when problems began to emerge with respect to compilation of language models. Our initial hypothesis was that these would probably become serious if the rules for clausal structure were reasonably elaborate; we expected that the large number of possible ways of combining modal and auxiliary verbs, question formation, movement, and sentential complements would rapidly combine to produce an intractably loose language model. Interestingly, this did not prove to be the case. Instead, the rules which appear to be the primary cause of difficulties are those relating to relative clauses. We describe the main results in Section 5.1; quantitative results on recognizer performance are presented together in Section 5.2.</Paragraph> <Section position="1" start_page="673" end_page="674" type="sub_section"> <SectionTitle> 5.1 Main Findings </SectionTitle> <Paragraph position="0"> We discovered that addition of the single rule which allowed relative clause modification of an NP had a drastic effect on recognizer performance. The most obvious symptoms were that recognition became much slower and the size of the recognition process much larger, sometimes causing it to exceed resource bounds. The false reject rate (the proportion of utterances which fell below the recognizer's minimum confidence threshold) also increased substantially, though we were surprised to discover no significant increase in the word error rate for sentences which did produce a recognition result.
To investigate the cause of these effects, we examined the results of performing compilation to GSL and PFSG level. The compilation processes are such that symbols retain mnemonic names, so that it is relatively easy to find the GSL rules and graphs used to recognize phrases of specified grammatical categories.</Paragraph> <Paragraph position="1"> At the GSL level, addition of the relative clause rule to the original unification grammar only increased the number of derived Nuance rules by about 15%, from 4317 to 4959. The average size of the rules however increased much more. It is easiest to measure size at the level of PFSGs, by counting nodes and transitions; we found that the total size of all the graphs had increased from 48836 nodes and 57195 transitions to 113166 nodes and 140640 transitions, rather more than doubling. The increase was not distributed evenly between graphs. We extracted figures for only the graphs relating to specific grammatical categories; this showed that the number of graphs for NPs had increased from 94 to 258, and moreover that the average size of each NP graph had increased from 21 nodes and 25.5 transitions to 127 nodes and 165 transitions, a more than sixfold increase. The graphs for clause (S) phrases had only increased in number from 53 to 68. They had however also greatly increased in average size, from 171 nodes and 212 transitions to 445 nodes and 572 transitions, or slightly less than a threefold increase. Since NP and S are by far the most important categories in the grammar, it is not strange that these large changes make a great difference to the quality of the language model, and indirectly to that of speech recognition.</Paragraph> <Paragraph position="2"> Comparing the original unification grammar and the compiled GSL version, we were able to make a precise diagnosis.
The problem with the relative clause rules is that they unify feature values in the critical S and NP subgrammars; this means that each constrains the other, leading to the large observed increase in the size and complexity of the derived Nuance grammar.</Paragraph> <Paragraph position="3"> GSL rules are written in a notation which allows disjunction and Kleene star.</Paragraph> <Paragraph position="4"> Specifically, agreement information and sortal category are shared between the two daughters in the relative clause modification rule, which is schematically as follows: NP:[agr=A, sort=S] --> NP:[agr=A, sort=S] REL:[agr=A, sort=S] These feature settings are needed in order to get the right alternation in pairs like &quot;the robot that *measure/measures the temperature&quot; [agr] and &quot;the *deck/temperature that you measured&quot; [sort]. We tested our hypothesis by commenting out the agr and sort features in the above rule.</Paragraph> <Paragraph position="5"> This completely solves the main problem of explosion in the size of the PFSG representation; the new version is only very slightly larger than the one with no relative clause rule (50647 nodes and 59322 transitions against 48836 nodes and 57195 transitions). Most importantly, there is no great increase in the number or average size of the NP and S graphs. NP graphs increase in number from 94 to 130, and stay constant in average size; S graphs increase in number from 53 to 64, and actually decrease in average size to 135 nodes and 167 transitions. Tests on speech data show that recognition quality is nearly the same as for the version of the recognizer which does not cover relative clauses.
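The multiplicative effect of linking features across the two daughters can be illustrated with a toy calculation; the category names and the two-valued agreement feature are invented for illustration, and only the 11 sortal values are taken from the lexicon described earlier:

```python
from itertools import product

AGR = ["sg", "pl"]                       # toy agreement values (assumed)
SORT = [f"sort{i}" for i in range(11)]   # 11 sortal values, as in the lexicon

def specialized_rules(linked):
    """Count the CFG rules derived from NP -> NP REL, with or without
    the agr/sort features shared between mother and daughters."""
    if linked:
        # One specialized rule per (agr, sort) pair: every specialized NP
        # category is cross-coupled with a matching REL category.
        return [(("NP", a, s), [("NP", a, s), ("REL", a, s)])
                for a, s in product(AGR, SORT)]
    # Features commented out: a single rule over unspecialized categories.
    return [(("NP",), [("NP",), ("REL",)])]

linked = specialized_rules(linked=True)      # 2 agr x 11 sort = 22 rules
unlinked = specialized_rules(linked=False)   # a single rule
```

In the real grammar the coupling also propagates through every rule that mentions NP or S, which is why the observed growth in the PFSG graphs is far larger than this two-feature toy suggests.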
Although speed is still significantly degraded, the process size has been reduced sufficiently that the problems with resource bounds disappear.</Paragraph> <Paragraph position="6"> It would be reasonable to expect that removing the explosion in the PFSG representation would result in an underconstrained language model for the relative clause part of the grammar, causing degraded performance on utterances containing a relative clause. Interestingly, this does not appear to happen, though recognition speed under the new grammar is significantly worse for these utterances compared to utterances with no relative clause.</Paragraph> </Section> <Section position="2" start_page="674" end_page="675" type="sub_section"> <SectionTitle> 5.2 Recognition Results </SectionTitle> <Paragraph position="0"> This section summarizes our empirical recognition results. With the help of the Nuance Toolkit batchrec tool, we evaluated three versions of the recognizer, which differed only with respect to the language model. no_rels used the version of the language model derived from a grammar with the relative clause rule removed; rels is the version derived from the full grammar; and unlinked is the compromise version, which keeps the relative clause rule but removes the critical features. We constructed a corpus of 41 utterances, of mean length 12.1 words.</Paragraph> <Paragraph position="1"> The utterances were chosen so that the first 31 were within the coverage of all three versions of the grammar; the last 10 contained relative clauses, and were within the coverage of rels and unlinked but not of no_rels. Each utterance was recorded by eight different subjects, none of whom had participated in development of the grammar or recognizers. Tests were run on a dual-processor Sun Ultra60 with 1.5 GB of RAM.</Paragraph> <Paragraph position="2"> The recognizer was set to reject utterances if their associated confidence measure fell under the default threshold.
Figures 1 and 2 summarize the results for the first 31 utterances (no relative clauses) and the last 10 utterances (relative clauses) respectively. Under 'xRT', we give mean recognition speed (averaged over subjects) expressed as a multiple of real time; 'FRej' gives the false reject rate, the mean percentage of utterances which were rejected due to low confidence measures; 'Mem' gives the mean percentage of utterances which failed due to the recognition process exceeding memory resource bounds; and 'WER' gives the mean word error rate on the sentences that were neither rejected nor failed due to resource bound problems. Since the distribution was highly skewed, all means were calculated over the six subjects remaining after exclusion of the extreme high and low values.</Paragraph> <Paragraph position="3"> Looking first at Figure 1, we see that rels is clearly inferior to no_rels on the subset of the corpus which is within the coverage of both versions: nearly twice as many utterances are rejected due to low confidence values or resource problems, and recognition speed is about five times slower. unlinked is in contrast not significantly worse than no_rels in terms of recognition performance, though it is still two and a half times slower.</Paragraph> <Paragraph position="4"> Figure 2 compares rels and unlinked on the utterances containing a relative clause (averaged across eight subjects, excluding extreme values). It seems reasonable to say that recognition performance is comparable for the two versions: rels has lower word error rate, but also rejects more utterances. Recognition speed is marginally lower for unlinked, though it is not clear to us whether the difference is significant given the high variability of the data.</Paragraph> </Section> </Section> </Paper>