<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1016">
  <Title>EVALUATION OF THE CMU ATIS SYSTEM</Title>
  <Section position="4" start_page="0" end_page="102" type="metho">
    <SectionTitle>
THE PHOENIX SYSTEM
</SectionTitle>
    <Paragraph position="0"> Some problems posed by spontaneous speech are:
* User noise - breath noise, filled pauses, and other user-generated noise.
* Environment noise - door slams, phone rings, etc.
* Out-of-vocabulary words - the subject says words that the system doesn't know.</Paragraph>
    <Paragraph position="1"> * Grammatical coverage - subjects often use grammatically ill-formed utterances and restart and repeat phrases.</Paragraph>
    <Paragraph position="2"> Phoenix addresses these problems by using non-verbal sound models, an out-of-vocabulary word model, and flexible parsing.  Models for sounds other than speech have been shown to significantly increase performance of HMM-based recognizers for noisy input. \[2\] \[4\] In this technique, additional models are added to the system that represent non-verbal sounds, just as word models represent verbal sounds. These models are trained exactly as if they were word models, but using the noisy input. Thus, sounds that are not words are allowed to map onto tokens that are also not words.</Paragraph>
    <Paragraph position="3"> Out-of-vocabulary Word Model This module has not yet been implemented. In order to deal with out-of-vocabulary words, we will use a technique essentially like the one presented by BBN. \[5\] We will create an explicit model for out-of-vocabulary words. This model allows any triphone (context-dependent phone) to follow any other triphone (given, of course, that the context is the same) with a bigram probability.</Paragraph>
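The bigram component of such a model can be sketched as follows. This is a minimal illustration with made-up phone sequences, not the BBN or Phoenix implementation; the real model operates on context-dependent triphones with matching contexts, but the bigram estimation is the same.

```python
from collections import defaultdict

def train_phone_bigram(phone_sequences):
    """Estimate bigram probabilities P(next_phone | phone) from training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in phone_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        probs[a] = {b: n / total for b, n in nexts.items()}
    return probs

def sequence_prob(probs, seq):
    """Probability of a phone sequence under the bigram model (zero if unseen)."""
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= probs.get(a, {}).get(b, 0.0)
    return p
```

In the full model, any phone sequence scored this way can stand in for a word the lexicon does not contain, so out-of-vocabulary speech maps onto an explicit "unknown word" token rather than being forced onto a known word.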
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
Flexible Parsing
</SectionTitle>
      <Paragraph position="0"> Our concept of flexible parsing combines frame based semantics with a semantic phrase grammar. We use a frame based parser similar to the DYPAR parser used by Carbonell, et al. to process ill-formed text, \[6\] and the MINDS system previously developed at CMU. \[7\] Semantic information is represented in a set of frames. Each frame contains a set of slots representing pieces of information. In order to fill the slots in the frames, we use a partitioned semantic phrase grammar. Each slot type is represented by a separate finite-state network which specifies all ways of saying the meaning represented by the slot. The grammar is a semantic grammar: non-terminals are semantic concepts instead of parts of speech. The grammar is also written so that phrases can stand alone (be recognized by a net) as well as being embedded in a sentence. Strings of phrases which do not form a grammatical English sentence are still parsed by the system. The grammar is compiled into a set of finite-state networks. It is partitioned in the sense that, instead of one big network, there are many small networks. Networks can &amp;quot;call&amp;quot; other networks, thereby significantly reducing the overall size of the system.</Paragraph>
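The partitioned grammar can be sketched in miniature. The networks and patterns below are invented for illustration (the actual Phoenix networks are compiled finite-state machines); the point is that each slot has its own pattern set and that a pattern may "call" another network by name.

```python
# Each "network" maps to a list of alternative patterns; a pattern is a
# sequence of literal words or references to other networks.
GRAMMAR = {
    "[depart_loc]": [["leaving", "[city]"], ["departing", "from", "[city]"]],
    "[arrive_loc]": [["arriving", "in", "[city]"]],
    "[city]": [["denver"], ["boston"]],
}

def match(net, words, i, grammar):
    """Return the set of end positions where `net` matches `words` starting at i."""
    ends = set()
    for pattern in grammar[net]:
        positions = {i}
        for tok in pattern:
            nxt = set()
            for p in positions:
                if tok in grammar:                        # a call to another network
                    nxt |= match(tok, words, p, grammar)
                elif p < len(words) and words[p] == tok:  # a literal word
                    nxt.add(p + 1)
            positions = nxt
        ends |= positions
    return ends
```

Because each net can be tried at any position, a phrase like "arriving in boston" is recognized whether it begins an utterance or appears mid-sentence, which is what lets ungrammatical strings of phrases still parse.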
      <Paragraph position="1"> These networks are used to perform pattern matches against input word strings. This general approach has been described in earlier papers. \[1\] \[3\] The operation of the parser can be viewed as &amp;quot;phrase spotting&amp;quot;. A beam of possible interpretations is pursued simultaneously.</Paragraph>
      <Paragraph position="2"> An interpretation is a frame with some of its slots filled. The finite-state networks perform pattern matches against the input string. When a phrase is recognized, it attempts to extend all current interpretations. That is, it is assigned to slots in active interpretations that it can fill. Phrases assigned to slots in the same interpretation are not allowed to overlap. In case of overlap, multiple interpretations are produced. When two interpretations for the same frame end with the same phrase, the lower scoring one is pruned. This amounts to dynamic programming on series of phrases. The score for an interpretation is the number of input words that it accounts for. At the end of the utterance, the best scoring interpretation is output.</Paragraph>
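The scoring rule can be illustrated directly. The sketch below enumerates sets of non-overlapping spotted phrases and scores each by the words it covers; Phoenix instead prunes incrementally with dynamic programming, but the objective being maximized is the same. The phrase triples used in the test are hypothetical (slot, start, end) spans.

```python
from itertools import combinations

def best_interpretation(phrases):
    """Exhaustive sketch: an interpretation is a set of non-overlapping spotted
    phrases (slot, start, end); its score is the number of input words covered."""
    def overlaps(a, b):
        return not (a[2] <= b[1] or b[2] <= a[1])
    best, best_score = [], -1
    for r in range(len(phrases) + 1):
        for combo in combinations(phrases, r):
            if any(overlaps(a, b) for a, b in combinations(combo, 2)):
                continue
            score = sum(end - start for _, start, end in combo)
            if score > best_score:
                best, best_score = list(combo), score
    return best, best_score
```

The exhaustive search is exponential; the dynamic-programming formulation in the text gets the same answer by keeping, for each frame and each ending phrase, only the highest-scoring interpretation.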
      <Paragraph position="3"> In our system, slots (pattern specifications) can be at different levels in a hierarchy. Higher level slots can contain the information specified in several lower level slots. These higher level forms allow more specific relations between the lower level slots to be specified. In the utterance &amp;quot;leaving denver and arriving in boston after five pm&amp;quot;, &amp;quot;leaving denver&amp;quot; is a \[depart_loc\] and &amp;quot;arriving in boston&amp;quot; is an \[arrive_loc\], but there is ambiguity as to whether &amp;quot;after 5 pm&amp;quot; is \[depart_time_range\] or \[arrive_time_range\]. The existence of the higher level slot \[ARRIVE\] allows this to be resolved. One rewrite for the slot \[ARRIVE\] is (\[arrive_loc\] \[arrive_time_range\]) in which the two lower level slots are specifically associated. Thus two interpretations for this utterance are produced:</Paragraph>
      <Paragraph position="5"> \[depart_loc\] leaving denver
\[arrive_loc\] arriving in boston
\[depart_time_range\] after 5 pm

\[depart_loc\] leaving denver
\[ARRIVE\]
    \[arrive_loc\] arriving in boston
    \[arrive_time_range\] after 5 pm

In picking which interpretation is correct, higher level slots are preferred to lower level ones because the associations between concepts are more tightly bound; thus the second (correct) interpretation is picked here.</Paragraph>
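The tie-break between equal-coverage interpretations can be sketched as a preference for the highest slot level used. The level table below is an assumption made for illustration; the paper only states that higher level slots are preferred because their concept associations are more tightly bound.

```python
# Hypothetical slot levels: 1 = atomic slot, 2 = higher-level grouping slot.
SLOT_LEVEL = {
    "[depart_loc]": 1, "[arrive_loc]": 1,
    "[depart_time_range]": 1, "[arrive_time_range]": 1,
    "[ARRIVE]": 2,
}

def prefer(interpretations):
    """Among interpretations (each a list of top-level slot names), pick the
    one whose deepest slot is highest in the hierarchy."""
    return max(interpretations, key=lambda slots: max(SLOT_LEVEL[s] for s in slots))
```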
      <Paragraph position="6"> Our strategy is to apply grammatical constraints at the phrase level and to associate phrases in frames. Phrases represent word strings that can fill slots in frames. The slots represent information which, taken together, the frame is able to act on. We also use semantic rather than lexical grammars: semantics provide more constraint than parts of speech and must ultimately be dealt with in order to take actions. Applying constraints at the phrase level is more flexible than recognizing sentences as a whole while providing much more constraint than word-spotting. Restarts and repeats are most often between phrases, so individual phrases can still be recognized correctly. Poorly constructed input often consists of well-formed phrases, and is often semantically well-formed. It is only syntactically incorrect.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="102" type="sub_section">
      <SectionTitle>
System Structure
</SectionTitle>
      <Paragraph position="0"> The overall structure of our current system is shown in Figure 1. We use the Sphinx system as our recognizer module \[8\].</Paragraph>
      <Paragraph position="1"> Sphinx is a speaker independent continuous speech recognition system.</Paragraph>
      <Paragraph position="2"> Currently the recognizer and parser are not integrated. The speech input is digitized and vector quantized and then passed to the Sphinx recognizer. The recognizer uses a bigram language model to produce a single best word string from the speech input. This word string is then passed to the frame-based parser which assigns word strings to slots in frames as explained above.</Paragraph>
      <Paragraph position="3"> The slots in the best scoring frame are then used to build objects. In this process, all dates, times, names, etc. are mapped into a standard form for the routines that build the database query. The objects represent the information that was extracted from the utterance. There is also a currently active set of objects which represent constraints from previous utterances. The new objects created from the frame are merged with the current set of objects. At this step ellipsis and anaphora are resolved. Resolution of ellipsis and anaphora is relatively simple in this system. The slots in frames are semantic, thus we know the type of object needed for the resolution. For ellipsis, we add the new objects.</Paragraph>
      <Paragraph position="4"> For anaphora, we simply have to check that an object of that type already exists.</Paragraph>
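The merge step described above can be sketched as an update of a typed context set. The object names are invented for illustration; the key point from the text is that slots are semantic, so the type of each object is known, which makes ellipsis (carry constraints forward, add the new ones) and anaphora (an object of the needed type must already exist) straightforward.

```python
def merge_context(context, new_objects):
    """Merge objects from the current utterance into the active context.
    Objects are keyed by semantic type; a new object of a type replaces any
    existing one, and types not mentioned carry over from prior utterances."""
    merged = dict(context)
    merged.update(new_objects)
    return merged

def resolve_anaphor(context, obj_type):
    """Anaphora resolution reduces to a typed lookup in the active context."""
    return context.get(obj_type)
```

For example, after "show flights from denver to boston" the context holds a departure and arrival location; an elliptical follow-up like "after 5 pm" only contributes a time object, and the merged context still constrains the locations.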
      <Paragraph position="5"> Each frame has an associated function. After the information is extracted, the function takes the action appropriate for the frame. It builds a database query (if appropriate) from objects, sends it to SYBASE (the DataBase Management System we use) and displays output to the user.</Paragraph>
    </Section>
  </Section>
</Paper>