<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1502">
  <Title>The TreeBanker: a Tool for Supervised Training of Parsed Corpora</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In a language understanding system where full, linguistically-motivated analyses of utterances are desired, the linguistic analyser needs to generate possible semantic representations and then choose the one most likely to be correct. If the analyser is a component of a pipelined speech understanding system, the problem is magnified, as the speech recognizer will typically deliver not a word string but an N-best list or a lattice; the problem then becomes one of choosing between multiple analyses of several competing word sequences.</Paragraph>
    <Paragraph position="1"> In practice, we can only come near to satisfactory disambiguation performance if the analyser is trained on a corpus of utterances from the same source (domain and task) as those it is intended to process. Since this needs to be done afresh for each new source, and since a corpus of several thousand sentences will normally be needed, economic considerations mean it is highly desirable to do it as automatically as possible. Furthermore, those aspects that cannot be automated should as far as possible not depend on the attention of experts in the system and in the representations it uses.</Paragraph>
    <Paragraph position="2"> The Spoken Language Translator (SLT; Becket et al, forthcoming; Rayner and Carter, 1996 and 1997) is a pipelined speech understanding system of the type assumed here. It is constructed from general-purpose speech recognition, language processing and speech synthesis components in order to allow relatively straightforward adaptation to new domains. Linguistic processing in the SLT system is carried out by the Core Language Engine (CLE; Alshawi, 1992). Given an input string, N-best list or lattice, the CLE applies unification-based syntactic rules and their corresponding semantic rules to create zero or more quasi-logical form (QLF, described below; Alshawi, 1992; Alshawi and Crouch, 1992) analyses of it; disambiguation is then a matter of selecting the correct (or at least, the best available) QLF.</Paragraph>
    <Paragraph position="3"> This paper describes the TreeBanker, a program that facilitates supervised training by interacting with a non-expert user and that organizes the results of this training to provide the CLE with data in an appropriate format. The CLE uses this data to analyse speech recognizer output efficiently and to choose accurately among the interpretations it creates. I assume here that the coverage problem has been solved to the extent that the system's grammar and lexicon license the correct analyses of utterances often enough for practical usefulness (Rayner, Bouillon and Carter, 1995).</Paragraph>
    <Paragraph position="4"> The examples given in this paper are taken from the ATIS (Air Travel Inquiry System; Hemphill et al, 1990) domain. However, wider domains, such as that represented in the North American Business News (NAB) corpus, would present no particular problem to the TreeBanker as long as the (highly non-trivial) coverage problems for those domains were close enough to solution. The examples given here are in fact all for English, but the TreeBanker has also successfully been used for Swedish and French customizations of the CLE (Gambäck and Rayner, 1992; Rayner, Carter and Bouillon, 1996).</Paragraph>
  </Section>
</Paper>