File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-1040_metho.xml
Size: 14,802 bytes
Last Modified: 2025-10-06 14:12:08
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1040"> <Title>Robust parsing of severely corrupted spoken utterances</Title> <Section position="3" start_page="0" end_page="196" type="metho"> <SectionTitle> 2 A closer examination of the problem 2.1 From the acoustical viewpoint </SectionTitle> <Paragraph position="0"> The phenomenology of word undetection at the recognition level is somewhat complex but depends mainly on word length. The dependency on length penalizes short words over long ones; it is partly intrinsic to the signal-processing techniques used for recognition, and is also heavily enhanced by coarticulation events. The consequence is that short words are frequently undetected or are given unreliable scores; a standard parser would then either fail or run into heavy inefficiencies.</Paragraph> <Paragraph position="1"> as phonemes [Laface 87].</Paragraph> <Paragraph position="2"> This work has been partially supported by the EEC within the Esprit project 26.</Paragraph> <Paragraph position="3"> speech. Often short words are erroneously detected and assigned a good score. This happens frequently when their phonetic representation is also part of a longer word that was actually uttered. For this reason the efficiency of a traditional parser would be reduced by the necessity of taking such nonexistent words into consideration.</Paragraph> <Section position="1" start_page="196" end_page="196" type="sub_section"> <SectionTitle> 2.2 From the understanding viewpoint </SectionTitle> <Paragraph position="0"> Short words span the widest range of lexical categories and have various degrees of 'significance' (take this term informally). Some cannot be eluded and, if they are missing, it is necessary to understand the rest of the sentence and to initiate an additional interaction with the recognition level, trying to figure out the most plausible words among a very limited set given by the parser; if no acceptable word is found, a dialogue with the user may be started, aimed at eliciting the essential information. Both are time-consuming operations; the latter, moreover, requires careful ergonomic considerations [Kaplan 87]. However, there are words for which the situation is not so drastic. This is the case of determiners, prepositions, and auxiliary verbs.</Paragraph> <Paragraph position="1"> The treatment of words of these categories follows two main guidelines in the literature. In the former, such words act mainly as syntactic markers for multi-word semantic constituents, without providing an intrinsic semantic contribution; this philosophy includes case-based [Fillmore 68] and conceptual-dependency-based approaches to natural language understanding [Schank 75]. In the latter guideline, such words play an independent role as semantic units and contribute compositionally, with equal dignity, together with the other words to the global meaning [Hinrichs 86, Lesmo 85]. Clearly, given the specific problem we are addressing, it is mandatory to follow the former guideline. Happily, this commitment is coherent with the preference granted to caseframe-based parsing, which stems from different and independent reasons inherent in speech understanding (see [Hayes 86] for an excellent discussion).
The peculiar caseframe-based approach summarized in the next section provides in most cases the ability to understand a sentence without relying on such words.</Paragraph> </Section> </Section> <Section position="4" start_page="196" end_page="197" type="metho"> <SectionTitle> 3 The standard parsing strategy </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="196" end_page="197" type="sub_section"> <SectionTitle> 3.1 Linguistic knowledge representation </SectionTitle> <Paragraph position="0"> Linguistic knowledge representation is based on the notion of caseframe [Fillmore 68] and is described in detail in [Poesio 87]. Caseframes offer a number of advantages in speech parsing, hence their popularity in many recent speech understanding systems [Hayes 86, Brietzmann 86], but they cause two main difficulties.</Paragraph> <Paragraph position="1"> First, the analysis cannot be driven by casemarkers, as is the case with written language, since casemarkers are often just those kinds of short words that are unreliably recognized or not detected at all. The standard approach is to give case headers the leading role, that is, to instantiate caseframes using word hypotheses to fill their header slot and subsequently to try to expand the case slots. This strategy induces parsing to proceed in a top-down fashion and works satisfactorily when headers are among the best-scored lexical hypotheses. However, it can be shown [Gemello 87] to cause severe problems when a correct header word receives a bad score, because the corresponding caseframe instantiation will not be resumed until all of the caseframes having better-scored but false header words have been processed. Headers with bad scores occur quite frequently, especially when the uttered sentences suffer from strong local corruption due to coarticulation phenomena or environmental noise. Moreover, the standard strategy does not exploit the fact, dual to the one previously outlined, that some word hypotheses, though not being headers, have a good and reliable score. An integrated top-down/bottom-up strategy, able to exploit the predictive power of non-header words, is mandatory in such situations.</Paragraph> <Paragraph position="2"> A second difficulty is the integration of caseframes and syntax. This is due to two conflicting requirements. On one side, syntax should be defined and developed as a declarative knowledge base independently from caseframes, since this makes it possible to exploit syntactic formalisms at their best and ensures ease of maintenance when the linguistic domain has to be expanded or changed. On the other hand, syntactic constraints should be used together with semantic ones during parsing, because this reduces the size of the inferential activity.</Paragraph> <Paragraph position="3"> To overcome these problems, caseframes and syntactic rules are pre-compiled into structures called Knowledge Sources (KSs). Each KS owns the syntactic and semantic competence necessary to perform a well-formed interpretation of a fragment of the input. Fig. 1 shows a simple caseframe, represented via Conceptual Graphs [Sowa 84], and a simplified view of the resulting KS obtained by combining it with two rules of a Dependency Grammar [Hays 64]. The dependency rules are augmented with information about the functional role of the immediate constituents; this information is used by the offline compiler as the mapping between syntax and semantics necessary to automatically generate the KS.
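As a concrete illustration of this compilation step, the sketch below shows one way a caseframe and a functionally annotated dependency rule could be folded into a single KS-like structure. It is only a minimal sketch: all class names, fields, and the string encodings are invented for illustration and do not reproduce the actual KS format of Fig. 1.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CaseFrame:
    header_type: str          # semantic type of the header word, e.g. "TO_ORIGINATE"
    cases: Dict[str, str]     # case role -> required filler type, e.g. {"SOURCE": "MOUNT"}

@dataclass
class DependencyRule:
    governor: str             # syntactic category of the governor, e.g. "VERB"
    dependents: List[str]     # categories of the immediate constituents, e.g. ["PP", "NP"]
    roles: Dict[str, str]     # functional role of each dependent, e.g. {"PP": "SOURCE"}

@dataclass
class KnowledgeSource:
    composition: List[str]    # header slot plus case-filler slots
    constraints: List[str]    # checks performed whenever the KS is operating
    meaning: str              # rule used to build the meaning representation

def compile_ks(cf: CaseFrame, rule: DependencyRule) -> KnowledgeSource:
    """Fold one caseframe and one functionally annotated dependency rule into a KS."""
    composition = [f"HEADER:{rule.governor}:{cf.header_type}"]
    constraints = []
    for dep in rule.dependents:
        role = rule.roles.get(dep)
        if role in cf.cases:  # the functional-role annotation maps the constituent onto a case
            composition.append(f"CASE:{role}:{cf.cases[role]}")
            constraints.append(f"the {dep} filler must have a {cf.cases[role]}-type header")
    meaning = f"({cf.header_type} " + " ".join(f"({r} ?{r.lower()})" for r in cf.cases) + ")"
    return KnowledgeSource(composition, constraints, meaning)

# Example loosely modelled on Fig. 1: a verb of originating with a MOUNT source and a RIVER theme.
ks = compile_ks(
    CaseFrame("TO_ORIGINATE", {"SOURCE": "MOUNT", "THEME": "RIVER"}),
    DependencyRule("VERB", ["PP", "NP"], {"PP": "SOURCE", "NP": "THEME"}),
)
```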
The KS accounts for sentences like "Da quale monte nasce il Tevere?" ("From which mount does the Tevere originate?"). The Composition part represents a way of grouping a phrase having a MOUNT-type header satisfying the Activation Condition and a phrase having a RIVER-type header. The Constraints part contains checks to be performed whenever the KS is operating. The Meaning part makes it possible to generate the meaning representation starting</Paragraph> </Section> <Section position="2" start_page="197" end_page="197" type="sub_section"> <SectionTitle> 3.2 Parsing </SectionTitle> <Paragraph position="0"> Each of the phrase hypotheses generated by KSs during parsing relates to an utterance fragment and is called a Deduction Instance (DI). DIs are an extension of the island concept in the HWIM system [Woods 82]. A DI is supported by word hypotheses and has a tree structure reflecting the compositional constraints of the KSs that built it. It has a score computed by combining the scores of the word hypotheses supporting it. A simplified view of a DI is shown in Fig. 2. That DI refers to the sentence "Da quale monte nasce il Tevere?" ("From which mount does the Tevere originate?"); its root has been built by the KS of Fig. 1, and two more KSs were required to build the rest of it. The tree structure of the DI reflects the compositional structure of the KSs. The bottom-left part of the picture shows two node types (SPEC and JOLLY) that correspond to phrases still to be detected. Such 'empty' nodes are called goals. SPEC will account for the phrase "Quale" ("Which"); JOLLY represents the need for a preposition that might be missing from the lattice (this aspect is discussed later).</Paragraph> <Paragraph position="1"> Parsing is accomplished by selecting the best-scored DI or word hypothesis in the lattice and letting it be accreted by all of the KSs that can do the job. Such an opportunistic, score-guided search results in top-down, 'expectation-based' actions that are dynamically mixed with bottom-up, 'predictive' actions. The actions of KSs on DIs are described by operators.</Paragraph> <Paragraph position="2"> Top-down actions consist in starting from a DI having a goal, and: 1. if it is a header slot, solving it with a word hypothesis (VERIFY operator); 2. if it is a case-filler slot, * solving it with already existing complete DIs (MERGE), or * decomposing it according to the knowledge contents of a KS (SUBGOALING).</Paragraph> <Paragraph position="3"> Bottom-up actions consist in creating a new DI starting either 1. from a word hypothesis, which will occupy the header slot of the new DI (ACTIVATION), or 2. from a complete DI, which will occupy a case-filler slot (PREDICTION).</Paragraph> <Paragraph position="4"> Such a strategy is opportunistic, since the element on which the KSs will work is selected according to its score, and the actions to be performed on it are determined solely by its characteristics.</Paragraph>
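To make the control strategy concrete, the self-contained toy skeleton below shows a score-guided agenda loop and how the five operators could be dispatched. The WordHyp and DI classes, the score penalty, and the single-goal bookkeeping are invented placeholders and not the system's actual API; only the best-first selection and the operator dispatch follow the description above.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class WordHyp:
    word: str
    score: float

@dataclass
class DI:
    """Deduction Instance: a scored phrase hypothesis with open goal slots."""
    label: str
    score: float
    goals: List[str] = field(default_factory=list)  # e.g. ["HEADER", "CASE"]

    @property
    def complete(self) -> bool:
        return not self.goals

def parse(lattice: List[WordHyp], max_steps: int = 1000) -> Optional[DI]:
    counter = itertools.count()      # tie-breaker for items with equal scores
    agenda: list = []

    def push(item: Union[WordHyp, DI]) -> None:
        heapq.heappush(agenda, (-item.score, next(counter), item))  # best score first

    for w in lattice:
        push(w)
    for _ in range(max_steps):
        if not agenda:
            return None
        item = heapq.heappop(agenda)[2]
        if isinstance(item, WordHyp):
            # ACTIVATION (bottom-up): the word hypothesis occupies the header slot
            # of a new DI, which is left with an open case-filler goal.
            push(DI(f"DI[{item.word}]", item.score, ["CASE"]))
        elif item.complete:
            # A complete DI either covers the whole utterance (success) or is used
            # bottom-up to occupy a case-filler slot of a larger DI (PREDICTION).
            return item
        elif item.goals[0] == "HEADER":
            # VERIFY (top-down): solve the header goal with a word hypothesis.
            push(DI(item.label, item.score * 0.9, item.goals[1:]))
        else:
            # MERGE (top-down): fill the case slot with an existing complete DI, or
            # SUBGOALING: decompose it according to the KS's knowledge contents.
            push(DI(item.label, item.score * 0.9, item.goals[1:]))
    return None

best = parse([WordHyp("nasce", 0.82), WordHyp("Tevere", 0.75)])
```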
<Paragraph position="5"> The activity of the operators is mainly concerned with the propagation of constraints to the goal nodes of each newly created DI. Constraints are propagated from a father to a son, or vice versa, according to the current parsing direction. They consist of: time intervals, in the form of start and end ranges; morphological information, used to check agreements inside the DI; functional information, used to verify the correctness of the grammatical relations being established within the DI; and semantic type information. The last is used when, unlike the case of Fig. 1, more than one caseframe is represented by a single KS (the offline compiler may decide to do this if the caseframes are similar and the consequent estimated reduction of redundancy appears sufficiently great). In such a situation compliance with the individual caseframes may have to be checked, hence the need for this type of information.</Paragraph>
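The following sketch illustrates, with hypothetical field names and arbitrary time units, the four kinds of checks that such constraint propagation implies between a father node and a candidate son. It is a sketch of the idea only, not the system's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass
class Node:
    start_range: Tuple[int, int]   # admissible start times (earliest, latest)
    end_range: Tuple[int, int]     # admissible end times (earliest, latest)
    morphology: Dict[str, str]     # e.g. {"gender": "masc", "number": "sing"}
    functional_role: str           # grammatical relation the node fills, e.g. "SUBJECT"
    semantic_type: str             # e.g. "RIVER"

def compatible(father: Node, son: Node,
               allowed_roles: Set[str], allowed_types: Set[str]) -> bool:
    """Check the four kinds of constraints before attaching `son` under `father`."""
    # Time intervals: the son must fall inside the father's admissible ranges.
    if son.start_range[0] < father.start_range[0] or son.end_range[1] > father.end_range[1]:
        return False
    # Morphological information: agreement on all features shared with the father.
    for feature, value in father.morphology.items():
        if son.morphology.get(feature, value) != value:
            return False
    # Functional information: the grammatical relation being established must be allowed.
    if son.functional_role not in allowed_roles:
        return False
    # Semantic type: needed when one KS stands for several similar caseframes.
    return son.semantic_type in allowed_types

ok = compatible(
    Node((0, 40), (120, 200), {"number": "sing"}, "ROOT", "EVENT"),
    Node((10, 30), (60, 110), {"number": "sing", "gender": "masc"}, "SUBJECT", "RIVER"),
    allowed_roles={"SUBJECT", "PP_SOURCE"},
    allowed_types={"RIVER"},
)
```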
</Section> </Section> <Section position="5" start_page="197" end_page="197" type="metho"> <SectionTitle> 4 Dealing with missing short words </SectionTitle> <Paragraph position="0"> As was pointed out, there are many different kinds of words that are short. In general, their semantic relevance depends on the linguistic representation and on the chosen domain. If the words are determiners, prepositions or auxiliary verbs, however, the integration of syntax and semantics outlined above makes them irrelevant in most cases, as very often it allows them to be inferred from the other words of the sentence. Such an inference may turn out not to be possible (mainly where prepositions are concerned), or the word may belong to other categories, such as connectives ("and", "or") or proper nouns, which are short but whose semantic relevance is beyond question; in these cases the system must react exactly as it does to the lack of a 'normal' word.</Paragraph> <Paragraph position="1"> Let us call 'jollies' the types of word for which only a functional role is acknowledged. Jollies are considered merely as syntactic markers for constituents, to which they do not offer a meaning contribution per se. The pursued goal is twofold: 1. parsing must be enabled to proceed without them in most cases; 2. however, whenever possible and useful, one wishes to exploit their contribution in terms of time constraints and score (remember that there are also 'long' jollies, much more reliable than short ones). The general philosophy is 'ignore a jolly unless there are substantial reasons to consider it'. The proposed solution is as follows: 1. Jollies are represented as terminal slots in the compositional part of a KS, like headers. There can be syntactic and even semantic constraints on them, but they do not enter into the rule describing the meaning representation.</Paragraph> <Paragraph position="2"> 2. Since we assume that jollies have no semantic predictive power, all of the operators are inhibited from operating on them.</Paragraph> <Paragraph position="3"> 3. Another top-down operator, JVERIFY, is added to solve jolly slots, acting only when a DI has enough support from other 'significant' word hypotheses. Fig. 3 shows a KS deriving from the same caseframe as Fig. 1 but from a different dependency rule. Such a KS treats sentences like "Da quale monte si origina il Tevere?" ("From which mount does the Tevere originate?"), in which the word "si" is a marker for verb reflexivity. The way JVERIFY operates depends on the result of a predicate, JOLLY-TYPE, applied to the jolly slot. JOLLY-TYPE has three possible values: SHORT-OR-UNESSENTIAL, LONG-OR-ESSENTIAL, and UNKNOWN, which depend on various factors, including the lexical category assigned to the jolly slot, the temporal, morphological and semantic constraints imposed on that slot by other word hypotheses, and the availability of such data. If the returned value is LONG-OR-ESSENTIAL, the jolly must be found in the lattice, and its loss causes parsing to react exactly as it does to the loss of any other 'normal' word. Conversely, if the value is SHORT-OR-UNESSENTIAL, the jolly is ignored by placing a suitable temporal 'hole' in the slot of the DI. The hole has relaxed temporal boundaries so as not to impose too strict a constraint on the position of words that can fill adjacent slots; thresholds are used for this purpose. Finally, if the value is UNKNOWN, an action like the previous one is performed, followed by a limited search in the lattice, looking for words exceeding the maximum width of the 'hole'. Such a search is necessary because it ensures that parsing does not fail when the correct word is a jolly larger than the 'hole'. JVERIFY is submitted to the standard scheduling just as the other operators are.</Paragraph>
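A rough sketch of how the JVERIFY/JOLLY-TYPE logic just described could look is given below. The classifier heuristics, the relaxation threshold, the phone-count parameter, and the lattice-access callback are all assumptions made for illustration; only the three-way branching and the 'hole' placement follow the description above.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional, Tuple

class JollyType(Enum):
    SHORT_OR_UNESSENTIAL = auto()
    LONG_OR_ESSENTIAL = auto()
    UNKNOWN = auto()

HOLE_RELAX = 80   # assumed relaxation (e.g. in ms) of the hole boundaries

def jolly_type(category: Optional[str], expected_phones: Optional[int]) -> JollyType:
    """Toy classifier: the real predicate also weighs the temporal, morphological and
    semantic constraints imposed on the slot by the surrounding word hypotheses."""
    if category is None or expected_phones is None:
        return JollyType.UNKNOWN
    if category in ("preposition", "determiner", "auxiliary") and expected_phones <= 3:
        return JollyType.SHORT_OR_UNESSENTIAL
    return JollyType.LONG_OR_ESSENTIAL

def jverify(slot_interval: Tuple[int, int],
            category: Optional[str],
            expected_phones: Optional[int],
            lattice_words_in: Callable[[Tuple[int, int]], Iterable[Tuple[int, int]]]
            ) -> Optional[Tuple[int, int]]:
    """Return the temporal 'hole' assigned to the jolly slot, or None when the slot
    must instead be solved in the lattice like any other word."""
    kind = jolly_type(category, expected_phones)
    if kind is JollyType.LONG_OR_ESSENTIAL:
        return None                                   # must be found in the lattice
    start, end = slot_interval
    hole = (start - HOLE_RELAX, end + HOLE_RELAX)     # relaxed temporal boundaries
    if kind is JollyType.UNKNOWN:
        # Limited search: give up on the hole if the lattice contains a candidate
        # wider than the hole itself, so that a long jolly is not silently lost.
        for w_start, w_end in lattice_words_in(hole):
            if w_end - w_start > hole[1] - hole[0]:
                return None
    return hole

# Minimal usage: a short preposition slot expected between 500 and 560 ms.
hole = jverify((500, 560), "preposition", 3, lambda interval: [])
```
</Section> </Paper>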