<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1042">
  <Title>INTEGRATED TECHNIQUES FOR PHRASE EXTRACTION FROM SPEECH</Title>
  <Section position="3" start_page="0" end_page="228" type="metho">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Language modeling for speech recognition has focused on robustness, using statistical techniques such as n-grams, whereas work in language understanding and information extraction has relied more on rule based techniques to leverage linguistic and domain information. However, the knowledge needed in these two components of a speech language system is actually very similar. In our work, we take an integrated approach, which uses a single grammar for both language modeling and language understanding for targeted portions of the domain and uses a single parser for both training the language model and extracting information from the output of the recognizer.</Paragraph>
    <Paragraph position="1"> The goal of our work is provide speech recognition capabilities that are analogous to those of information extraction systems: given large amounts of (often low quality) speech, selectively interpret particular kinds of information. For example, in the air traffic control domain, we want to determine the flight IDs, headings, and altitudes of the planes, and to ignore other information, such as weather and ground movement.</Paragraph>
    <Paragraph position="2"> The following is a summary of the main techniques we use in our approach: Integration of N-gram and context free grammars for speech recognition: While statistically based Markovchain language models (N-gram models) have been shown to be effective for speech recognition, there is, in general, more structure present in natural language than N-gram models can capture. Linguistically based approaches that use statistics to provide probabilities for word sequences that are accepted by a grammar typically require a full coverage grammar, and therefore are only useful for constrained sublanguages. In the work presented here, we combine linguistic structure in the form of a partial-coverage phrase structure grammar with statistical N-gram techniques. The result is a robust statistical grammar which explicitly incorporates syntactic and semantic structure. A second feature of our approach is that we are able to determine which portions of the text were recognized by the phrase grammars, allowing us to isolate these phrases for more processing, thus reducing the overall time needed for interpretation.</Paragraph>
    <Paragraph position="3"> Partial parsing: It is well recognized that full coverage grammars for even subsets of natural language are beyond the state of the art, since text is inevitably errorful and new words frequently occur. There is currently a upsurge in research in partial parsing in the natural language community (e.g., Hindle 1983, Weischedel, et al. 1991), where rather than building a single syntactic tree for each sentence, a forest is returned, and phrases outside the coverage of the grammar and unknown words are systematically ignored. We are using the partial parser &amp;quot;Sparser&amp;quot; (McDonald 1992), which was developed for extracting information from open text, such as Wall Street Journal articles.</Paragraph>
    <Paragraph position="4">  Semantic grammar.&amp;quot; Central to our approach is the use of a minimal, semantically based grammar. This allows us to build targeted grammars specific to the domain. It also makes the grammar much more closely tied to the lexicon, since the lexical items appear in the rules directly and in general there are many categories, each covering only a small number of lexical items. As Schabes (1992) points out in reference to lexicalized stochastic tree adjoining grammars (SLTAG), an effective linguistic model must capture both lexical and hierarchical information.</Paragraph>
    <Paragraph position="5"> Context free grammars using only syntactic information fail to capture lexical information.</Paragraph>
    <Paragraph position="6"> Figure 1 shows a block diagram of the overall approach with the two components which use the parser shaded: the model construction component and the interpretation component.</Paragraph>
    <Paragraph position="7"> For both the language modeling and information extraction, we are using the partial parser Sparser (McDonald 1992). Sparser is a bottom-up chart parser which uses a semantic phrase structure grammar (i.e. the nonterminals are semantic categories, such as HEADING or FLIGHT-ID, rather than traditional syntactic categories, such as CLAUSE or NOUN-PHRASE). Sparser makes no assumption that the chart will be complete, i.e. that a top level category will cover all of the input, or even that all terminals will be covered by categories, effectively allowing unknown words to be ignored. Rather it simply builds constituent structure for those phrases that are in its grammar.</Paragraph>
    <Paragraph position="8"> In Section Two, we describe language modeling, and in Three, we focus on semantic interpretation. In Section Four, we present the results of our initial tests in the air traffic control domain, and finally we conclude with future directions for the work.</Paragraph>
  </Section>
  <Section position="4" start_page="228" end_page="230" type="metho">
    <SectionTitle>
2. LANGUAGE MODELING
</SectionTitle>
    <Paragraph position="0"> There are two main inputs to the model construction portion of the system: a transcribed speech training set and a phrase-structure grammar. The phrase-structure grammar Overall Approach is used to partially parse the training text. The output of this is: (1) a top-level version of the original text with subsequences of words replaced by the non-terminals that accept those subsequences; and (2) a set of parse trees for the instances of those nonterminals.</Paragraph>
    <Section position="1" start_page="228" end_page="229" type="sub_section">
      <SectionTitle>
2.1 Rule Probabilities
</SectionTitle>
      <Paragraph position="0"> Figure two below shows a sample of the rules in the ATC grammar followed by examples of transcribed text and the text modified by the parser. Note that in this case, where goal is to model aircraft identifiers and a small set of air traffic control commands, other phrases like the identification of the controller, traffic information, etc., are ignored. They will be modelled by the n-gram, rather than as specific phrases.</Paragraph>
      <Paragraph position="2"> Using the modified training text we construct a probabilistic model for sequences of words and nonterminals. The parse trees are used to obtain statistics for the estimation of production probabilities for the rules in the grammar. Since we assume that the production probabilities depend on their context, a simple count is  insufficient. Smoothed maximum likelihood production probabilities are estimated based on context dependent counts. The context is defined as the sequence of rules and positions on the right-hand sides of these rules leading from the root of the parse tree to the non-terminal at the leaf. The probability of a parse therefore takes into account that the expansion of a category may depend on its parents. However, it does not take into consideration the expansion of the sister nonterminals, though we are currently exploring means of doing this (cf. Mark, et al. 1992).</Paragraph>
      <Paragraph position="3"> In the above grammar (Figure 2), the expansion of TAKEOFF-ACTION may be different depending on whether it is part of rule 5 or rule 6. Therefore, the &amp;quot;context&amp;quot; of a production is a sequence of rules and positions that have been used up to that point, where the &amp;quot;position&amp;quot; is where in the RHS of the rule the nonterminal is. For example, in the parse shown below (Figure 4), the context of R2 (TAKEOFF-ACTION &gt; &amp;quot;takeoff') is rule 8/position 2, rule 6/position 3. We discuss the probabilities required to evaluate the probability of a parse in the next section.</Paragraph>
      <Paragraph position="5"> In order to use a phrase-structure grammar directly in a time-synchronous recognition algorithm, it is necessary to construct a finite-state network representation If there is no recursion in the grammar, then this procedure is straightforward: for each rule, each possible context corresponds to a separate subnetwork. The subnetworks for different rules are nested. We are currently comparing methods of allowing limited recursion (e.g. following Pereira &amp; Wright 1990). Figure 5 shows the expansion of the rules in from Figure 2.</Paragraph>
      <Paragraph position="6"> There have been several attempts to use probability estimates with context free grammars. The most common technique is using the Inside-Outside algorithm (e.g.</Paragraph>
      <Paragraph position="7"> Pereira &amp; Schabes 1992, Mark, et al. 1992) to infer a grammar over bracketed texts or to obtain Maximum-Likelihood estimates for a highly ambiguous grammar. However, most require a full coverage grammar, whereas we assume that only a selective portion of the text will be covered by the grammar. A second difference is that they use a syntactic grammar, which results in the parse being highly ambiguous (thus requiring the use of the Inside-Outside algorithm). We use a semantic grammar, with which there is rarely multiple interpretations for a single utterance. 1</Paragraph>
    </Section>
    <Section position="2" start_page="229" end_page="230" type="sub_section">
      <SectionTitle>
2.2 Probability Estimation
</SectionTitle>
      <Paragraph position="0"> Both the context-dependent production probabilities of the phrase grammar and one for the Markov chain probabilities for the top-level N-gram model must be estimated. We use the same type of &amp;quot;backing-off' approach in both cases. For the phrase grammar, we estimate probabilities of the form P(rn+ 1 I (r I, Pl), (r2, P2) ..... (rn, Pn)) where r i are the rules and Pi are the positions within the rules. In the N-gram case, we are estimating P(Sn+l I sl, s2 ..... Sn) where Sl, s2 ..... Sn is the sequence of words and non-terminals leading up to Sn+l. In both cases, the estimate is based on a combination of the Maximum-Likelihood estimate, and the estimates in a reduced context:</Paragraph>
      <Paragraph position="2"> The Maximum-Likelihood (ML) estimate reduces to a simple relative-frequency computation in the N-gram case.</Paragraph>
      <Paragraph position="3"> In the phrase grammar case, we assume that the parses are in general unambiguous, which has been the case so far in our domain. Specifically, we only consider a single parse and accumulate relative frequency statistics for the various contexts in order to obtain the ML production  probabilities.</Paragraph>
      <Paragraph position="4"> The approach we use to backing off is described in Placeway, et al. (1993). Specifically, we form pBO(y ix 1 ..... Xn ) = pML(y Ix 1 .... Xn) (1 - 0) +pBO(ylx 2 ..... x n) 0.</Paragraph>
      <Paragraph position="5"> The value of 0 depends on the context Xl ..... Xn and is motivated by approximation of the probability of 0= r/(n+r) where r is the number of different next symbols/rules seen in the context and n is the number of times the context was observed.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="230" end_page="231" type="metho">
    <SectionTitle>
3. INFORMATION EXTRACTION
</SectionTitle>
    <Paragraph position="0"> The final stage of processing is the interpretation of the recognized word sequence. We use the same phrase structure grammar for interpretation as that used to build the recognition model. However, in this last phase, we take advantage of the semantic interpretation facility of the parser.</Paragraph>
    <Paragraph position="1"> Most approaches to natural language understanding separate parsing (finding a structural description) from interpretation (finding a semantic analysis). In the work presented here, we use a single component for both. The Sparser system integrates parsing and interpretation to determine &amp;quot;referents&amp;quot; for phrases incrementally as they are recognized, rather than waiting for the entire parse to finish. The referent of a phrase is the object in the domain model that the phrase refers to. For example, the initial domain model consists of objects that have been created for entities which are known to the system in advance, such as airlines. When the name of an airline is recognized, such as &amp;quot;Delta&amp;quot;, its referent is the airline object, #&lt;airline delta&gt;. Referents for entities that cannot be anticipated, such as number sequences and individual airplanes, are created incrementally Controller Transmission: when the phrase is recognized. Figure 6 shows an example of the edges created by the parser and their referents.</Paragraph>
    <Paragraph position="2"> When a referent actually refers to an entity in the world, such as a runway or airplane, then the same referent object is cataloged and reused each time that entity is mentioned.</Paragraph>
    <Paragraph position="3"> The referent for a number sequence is a number object with the value the sequence represents. The referent for the entire phrase &amp;quot;Delta three five nine&amp;quot; is an object of type airplane. In some cases, the object will also be indexed by various subparts (such as indexing a flight ID by the digit portion of the ID) to aid in disambiguating incomplete subsequent references. For example, in the pilot reply in Figure 6, indexing allows the system to recognize that the number &amp;quot;three five nine&amp;quot; actually refers to the previously mentioned Delta flight.</Paragraph>
    <Paragraph position="4"> We extend the notion of referent from simply things in the world to utterance acts as well, such as commands. Each time a command is uttered, a new referent is created.</Paragraph>
    <Paragraph position="5"> Command referents are templates which are created when some core part is recognized and then added to compositional as other (generally optional) information is recognized. So following our earlier example of tower clearances, rules 4, 5, and 6 instantiate a takeoff clearance template and fill in the action type, whereas rules 7 and 8 fill in the &amp;quot;runway&amp;quot; field. We show examples of each of these groups and the templates in Figure 7 below:</Paragraph>
  </Section>
class="xml-element"></Paper>