<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1005">
  <Title>Real-time linguistic analysis for continuous speech understanding*</Title>
  <Section position="4" start_page="33" end_page="35" type="metho">
    <SectionTitle>
3 Language representation
</SectionTitle>
    <Paragraph position="0"> The task of the machine is to combine together adj&amp; cent word hypotheses, so as to create phrase hypothese~ (PHs), which are consistent according to the languag, model. Such parsing process continues until the systerr reaches a solution.</Paragraph>
    <Paragraph position="1"> The choice of a suitable linguistic knowledge representation poses a dilemma. For the machine, to react real-time, the representation must above all be efficient that is, it must require reasonable computational cos1 and must keep low the number of PHs generated dur.</Paragraph>
    <Paragraph position="2"> ing parsing. On the other hand, for the developer o the system the representation must be easy to declare interpret, and maintain. Ease of maintenance suggests for example, that it is preferable to keep syntax an( semantics separate as much as possible.</Paragraph>
    <Paragraph position="3"> The previous considerations suggest to adopt two rep.</Paragraph>
    <Paragraph position="4"> resentations, one suitable for the system developer, th, other for the machine \[Poesio and Rullent 1987\]. Th~ translation of the linguistic knowledge from the forme: representation (high-level representation) to the latte: one (low-level representation) is performed off-line b2 a compiler (see Fig. 2). This approach also permit: to maintain separate high-level representations for syn tax and for semantics, choosing for each the formalism that seem most suitable. For semantics the Casefram, formalism \[Fillmore 1968\] in the form of Conceptua Graphs \[Sowa 1984\] had been chosen, while for syn tax the Dependency Grammar formalism \[Hays 1964 has been used. A Dependency Grammar expresses th, syntactic structure of sentences through rules involvinl dependencies between morphological categories. Th, right-hand side of the rule contains one distinguishe~ terminal symbol called governor, while the other sym bols are called dependents. A mechanism has bee\] added to the dependency rules to describe the morphc logical agreements between the governor and the depen dents.</Paragraph>
    <Section position="1" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
3.1 The compiler
</SectionTitle>
      <Paragraph position="0"> Dependency grammars have been selected as a formalism for representing syntactic knowledge because they allow an easy integration with caseframes thanks to the similar notion of governor for the dependency rules and of header for the caseframes.</Paragraph>
      <Paragraph position="1"> The compiler operates off-line and generates internal structures, called Knowledge Sources (KSs) suitable to allow an efficient parsing strategy. The basic point is that each KS is aimed at generating a certain class of constituents. Then each KS must combine the time adjacency knowledge, the syntactic, morphological and semantic knowledge that it is necessary to handle a specific class of phrases.</Paragraph>
      <Paragraph position="2"> As an example, Table lb represents the dependency rules used to deal with the sentences of Table la. Note that prepositions are never governors as they are usually short and are likely to be missing from the lattice (see section 5). The star symbol in each rule represents the governor position. The associated rules for morphological agreement checks are not reported for simplicity.  (a) sent yesterday sent yesterday by John (b) rsl) verb = * adverb\[adv-phrase\] rs2) verb * adverb\[adv-phrase\] noun\[by-phrase\] rs3) noun = prep *  For each dependency rule the compiler must find all the conceptual graphs that can be associated to such rule and to use them to generate a KS. For this purpose, each dependency rule is augmented with information about grammatical relations, contained in square brakets in Table lb; a grammatical relation is associated  to each dependent Di, accounting for the grammatical relation existing between the governor G and the lower-level constituent having Di as a governor.</Paragraph>
      <Paragraph position="3"> For example, the associated grammatical relations for rs2 could be adv-phrase for the first dependent and by-phrase for the second one. Additional mapping knowledge associates one or more conceptual graphs to each grammatical relation, so that it is possible to find from the conceptual graphs the semantic constraints that the governor and the dependents of the rule have to follow. Referring to the conceptual graphs of Fig. 3, the conceptual relation agnt can be associated to by-phrase and the conceptual relation lime can be associated to adv-phrase. The semantic constraints derived from the conceptual graphs are: SEND for the &amp;quot;verb&amp;quot; governor, YESTERDAY for the &amp;quot;adverb&amp;quot; dependent and PERSON for the &amp;quot;noun&amp;quot; dependent of rule rs2.</Paragraph>
      <Paragraph position="4"> Each KS built by the compiler has one terminal slot called header, representing one single word, and other slots called fillers representing phrases, positionally reflecting the symbols of the dependency rules from which the KS derives. The main bulk of knowledge embedded in a KS is a set of structures that express constraints between the syntactic-semantic features of the header and those of the fillers.</Paragraph>
      <Paragraph position="5"> A first version of the compiler had the goal of enriching the dependency rules with the semantic constraints derived from the concepual graphs. In this case the set of generated KS could be sketched as in Table 3, where row c4 can correspond, for instance, to the compilation of the dependency rule rs2. A total of 70 conceptual graphs and 373 syntactic rules is used in the system.</Paragraph>
      <Paragraph position="6"> These knowledge bases are able to treat a large variety of sentences, a sample of which is shown in Table 2 (nearly literal English translation).</Paragraph>
      <Paragraph position="7"> Although the obtained efficiency was sufficiently good, a conceptual improvement to the compiler has been devised, as is described in the next subsection.</Paragraph>
    </Section>
    <Section position="2" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
3.2 Efficient representation of linguistic constraints: rule fusion
</SectionTitle>
      <Paragraph position="0"> One basic problem in moving towards real-time operation is to define the kind of structures that can be built for representing PHs. Suppose a "classical" grammar such as a context-free grammar is used, and that we are trying to connect two words into a grammatical structure. In general, this can be done in several ways, according to different grammar rules. Since structures built with different rules may connect with different word hypotheses, a new memory object is needed for every structure. In the case of speech this leads to two undesirable consequences. First, a very large memory size is required, owing to the high number of word combinations allowed by word lattices. Second, each of the structures will be separately selected and expanded, possibly with the same words, during the score-guided analysis, thus introducing redundant work. Therefore, the compiler should generate a smaller number of "compact" KSs, while keeping the maximum discrimination power.</Paragraph>
      <Paragraph position="1"> Table 2 (sample sentences, nearly literal English translation):
      I'd like to know Thursday's fourth one.
      Did any mail come in September?
      Tell me the mails I received since two days ago.
      Got any mail last week?</Paragraph>
      <Paragraph position="2"> The goal of generating a small number of KSs is accomplished through the fusion technique \[Baggia et al. 1991a\]. Fusion aims at compacting together KSs. KSs  may have constituents in different order or even a different number of constituents. Let us suppose we have a WH of class C and we want to connect it to other words that can depend on it and that are adjacent to the header on the right. Table 3 contains, for the header class C, a sketchy representation of the KSs involved (the rows in the table). The positions of the constituents are also shown. The zero position indicates the header while positions 1 and 2 indicate dependents on the right of the header. The numbers attached to each class mean that different constraints act on the corresponding constituent. Table 3 shows that constituents of both classes A and B are involved. Let us focus on the class A case.</Paragraph>
      <Paragraph position="3"> As we want to find class A constituents, on the right of the header, four KSs are involved, corresponding to rows cl, c3, c5, and c6; the first two KSs propagate constraints (summarized by A1) that will be considered by a proper KS of class A; the result is the generation of two couples of PHs (two generated by the A KS and two by the C KS). Two other couples of PHs are generated in a completely similar way by the KSs of row c5 and c6, the only difference being that the KSs propagate different constraints.</Paragraph>
      <Paragraph position="4"> In the fusion case there is just one KS for the seven different rows of Table 3. The C KS propagates the constraints for the A KS: it propagates AI+A2 and the time constraint that the constituent must be adjacent (on the right) to the header. Only one search into the lattice is performed by the A KS. Only a couple of PH is created for the rows cl, c3, c5, and c6 (one by the A KS and one by the C KS).</Paragraph>
      <Paragraph position="5"> The fusion technique is effective in reducing the number of PHs to be generated and the parsing time. The results of the experiments are reported in Table 4.</Paragraph>
      <Paragraph position="6">  The reduction of PtIs would be of no use if it were balanced by an increased activity for checking and propagating constraints. So, for execution efficiency, bit coded representations are used for the propagation of constraints about active rules, in a way similar to the propagation of morphological and semantic constraints. The system runs on a Sun SparcStation 1 and is implemented using the C language, which furtherly increases speed.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="35" end_page="37" type="metho">
    <SectionTitle>
4 Control of parsing activities
</SectionTitle>
    <Paragraph position="0"> The basic problem that control is cMled to face is the width of the search space, due to the combined effect of the non-determinism of the language model and the uncertainty and redundancy of the input. Since an exhaustive search is not feasible, scores are used to constrain it along the most promising directions just from the beginning: the analysis proceeds in cycles in a best-first perspective and at each cycle the parser processes the best-scored element produced so far. The score of a PH made up by a number of word hypotheses is defined as the average of the scores of its component words, weighted by their time durations. This &amp;quot;length normalization&amp;quot; insures that, when we have to compare two PHs having different length, we do not privilege longer or shorter ones.</Paragraph>
    <Paragraph position="1"> The building of parse trees may proceed through top-down or bottom-up steps. For instance, if the best-scored element selected in one cycle is a header word, a  top-down step consists in hypothesizing fillers and verifying the presence in the lattice of words that can support them. Hypothesizing headers from already parsed fillers is an example of a bottom-up step.</Paragraph>
    <Paragraph position="2"> If all of the correct word hypotheses are well-scored, any parsing strategy works satisfactorily. However, often a correct word happens to be badly recognized and hence receives a bad score, though the overall sentence score remains good. This can be due to a burst of noise, or to the fact that the word was badly uttered. Many incorrect words will be present in the lattice, scoring better than such word. Now, imagine a pure top-down parsing in the case where such a bad word is one of the headers. Prior to processing that header, the parser will process all of the better-scored words that are themselves headers. This may delay the finding of the correct solution beyond reasonable limits, or may favor the finding of a wrong solution in the meantime. Similar considerations hold in the ease of a pure bottom-up strategy. Such bottlenecks are avoided thanks to a strategy in which the good-scored words of the correct solution may hypothesize the few bad-scored ones in any case. This property implies that the parser must be able to dynamically switch from top-down steps to bottom-up steps and vice versa, according to the characteristics of the element that has been selected in that cycle. Apart from avoiding bottlenecks, a control strategy that follows this guideline has one important characteristic: it is admissible, that is the first-found solution is surely the best-scored one.</Paragraph>
    <Paragraph position="3"> This approach of exploiting only language constraints, if followed to its extremes, leads to an insufficient exploitation of time adjacency, which is a different criterion for designing an efficient control strategy. Time adjacency is at the base of the so-called island-driven parsing approaches, which recently received renewed attention \[Stock et al. 1989\]. Here the idea is to select only fillers that are temporally adjacent to the header, so that we can limit the number of word hypotheses that can be extracted from the lattice (i.e. that satisfy language and time adjacency constraints) and consequently the parse trees that have to be generated.</Paragraph>
    <Paragraph position="4"> The parsing process proceeds through elementary activities, or operators, that represent top-down steps (EXPAND and FILL operators) or bottom-up steps (Ac-TIVATE and PREDICT operators). The JOIN operator describes the activity in which a KS merges together parsing processes that had evolved separately; this may correspond either to a bottom-up or to a top-down step.</Paragraph>
    <Paragraph position="5"> By suitably defining when and how the KSs apply the operators it is possible to trade off with the language constraint and the time adjacency criteria with the result of switching down admissibility by a little amount while simultaneously gain a consistent reduction of the number of generated parse trees. The control strategy that has been adopted, described in detail in \[Giachin and Rullent 1990\], accepts a limited risk of getting the wrong solution in the first place (about 1.5%) but is balanced by a great speed-up in the parsing of a lattice.</Paragraph>
    <Paragraph position="6"> 5 Coping with special speech problems The adjacency between consecutive word hypotheses is seldom perfect, being them affected by a certain amount of gap or overlap. This is due to the fact that the end of a word is slightly confused with the beginning of the consecutive word. The understanding level is tolerant towards these phenomena and defines thresholds on maximum allowed gap or overlap between supposedly consecutive words.</Paragraph>
    <Paragraph position="7"> While coarticulation affects all words, it severely compromises the recognition of what are currently called, with an admittedly imprecise term, function words.</Paragraph>
    <Paragraph position="8"> Function words, such as articles, prepositions, etc., are generally short and they tend to be uttered very imprecisely, so that often they are not included in the lattice. Moreover, function words are often acoustically ineluded in longer words. The parsing strategy then does not rely on function words \[Giaehin and Rullent 1988\].</Paragraph>
    <Paragraph position="9"> The idea is that KS slots corresponding to function words are divided into three categories, namely short, long, and unknown. Short words are never searched in the lattice, and a plaeeholder is put in the Phrase Hypothesis (PH) that includes it. Long words are always searched, and failure is declared if no one is found. Unknown words are searched, but a plaeeholder may be put in the PH if some conditions are met. In a first phase, the categorization of a KS slot was made on the basis of the morphological features of the corresponding function words and on their length (e.g., words with one or two phonemes were declared &amp;quot;short&amp;quot; and never searched). Subsequent experiments showed that, unexpectedly, some very short words may be recognized with virtually no errors, while others, though longer, are much more difficult to recognize. Hence, better results have been obtained when the categorization has been made on the basis of the phonetic features of the words rather than of the morphological ones.</Paragraph>
    <Section position="1" start_page="36" end_page="37" type="sub_section">
      <SectionTitle>
5.1 Feedback verification procedure
</SectionTitle>
      <Paragraph position="0"> Though skipping function words permits to successfully analyze sentences for which these words were not detected, it also implies that the acoustic information of small portions of the waveform is not exploited, and this may lead the parser to find a wrong solution.</Paragraph>
      <Paragraph position="1"> Also, function words may be sometimes essential to correctly understand the meaning of a sentence. In order to cope with these problems, a two-way interaction between the recognition module and the parser has been investigated, called feedback verification procedure \[Baggia et aL 1991b\]. According to this procedure, the parser, instead of stopping at the first solution, continues to run until a predefined amount of resources is consumed. During this period many different solutions are found, possibly containing multiple possibilities in place of missing words. These solutions are then fed back to the recognizer which analyzes them sequentially. The recognizer task realigns the solutions against the acoustic data and attributes them a new likelihood score. Tile best-scored solution is then selected as the correct one.</Paragraph>
      <Paragraph position="2"> As a side effect, the best-matching candidate for function words that were missing in the lattice is also found.</Paragraph>
      <Paragraph position="3">  The verification procedure creates the best conditions to find these words with good reliability: for each place-holder a very small number of candidates are proposed, and the previous and following words are usually norreal reliable words. Hence the recognizer can detect the word with good accuracy. An example of a solution generated by the parser for the utterance &amp;quot;Ci sono messaggi da Rossi il venti?&amp;quot; (literally: &amp;quot;There are mails from Rossi on twenty?&amp;quot;) is shown in Fig. 4. The &amp;quot;??&amp;quot; symbol in the solution represents a possibly missing function word ignored during parsing, that is expanded into a set of candidates, according to the grammar, to be fed back to the recognizer.</Paragraph>
      <Paragraph position="4"> In addition to accurately finding function words, the verification procedure has the advantage that the final scores assigned to solutions by the recognizer are more accurate than those assigned to them by the parser, because these scores have been computed on the same time interval after a global realignment of the sentences.</Paragraph>
      <Paragraph position="5"> Hence comparing the solutions on the basis of their score is a more reliable procedure. The drawback of the verification procedure is that total analysis times are slightly increased by the overload imposed to the recognizer and by the fact that the parser must continue the analysis after the first solution is found.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>