<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1525">
  <Title>Vancouver, October 2005. ©2005 Association for Computational Linguistics. Generic parsing for multi-domain semantic interpretation</Title>
  <Section position="4" start_page="0" end_page="196" type="evalu">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> As a rough baseline, we compared the bracketing accuracy of our parser to that of a statistical parser (Bikel, 2002), Bikel-M, trained on 4294 TRIPS parse trees from the Monroe corpus (Stent, 2001), task-oriented human dialogs in an emergency rescue domain. 100 randomly selected utterances were held out for testing. The gold standard for evaluation was created with the help of the parser (Swift et al., 2004). Corpus utterances are parsed, and the parsed output is checked by trained annotators for full-sentence syntactic and semantic accuracy, reliable with a kappa score of 0.79. For test utterances for which TRIPS failed to produce a correct parse, gold standard trees were manually constructed independently by two linguists and reconciled. Table 1 shows results for the 100 test utterances and for the subset for which TRIPS finds a spanning parse (74).</Paragraph>
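The inter-annotator reliability figure above (kappa 0.79) is the standard Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch (the annotator labels below are hypothetical, not from the Monroe data):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected agreement: sum over categories of the product of
    # each annotator's marginal probability for that category
    cats = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in cats)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical accept/reject judgments from two annotators
a = ["ok", "ok", "bad", "ok"]
b = ["ok", "bad", "bad", "ok"]
print(cohens_kappa(a, b))  # 0.5
```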
    <Paragraph position="1"> Bikel-M performs somewhat better on the bracketing task for the entire test set, which includes utterances for which TRIPS failed to find a parse, but it scores lower on complete matches, which are crucial for semantic interpretation.</Paragraph>
    <Paragraph position="2"> Table 1 reports bracketing accuracy (R: recall, P: precision, CM: complete match).</Paragraph>
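The bracketing metrics in Table 1 can be computed by intersecting the labeled bracket sets of gold and test parses per sentence; a minimal sketch under that standard definition (the bracket data below is illustrative, not from the corpus):

```python
def bracket_scores(gold, test):
    """Labeled bracketing precision, recall, and complete-match rate.

    gold, test: parallel lists, one set of (label, start, end)
    brackets per sentence.
    """
    assert len(gold) == len(test)
    # brackets found in both gold and test parses
    matched = sum(len(g & t) for g, t in zip(gold, test))
    precision = matched / sum(len(t) for t in test)
    recall = matched / sum(len(g) for g in gold)
    # a sentence counts as a complete match only if the
    # bracket sets are identical
    cm = sum(g == t for g, t in zip(gold, test)) / len(gold)
    return precision, recall, cm

# One illustrative sentence: test finds NP but misses VP
gold = [{("NP", 0, 2), ("VP", 2, 4)}]
test = [{("NP", 0, 2)}]
print(bracket_scores(gold, test))  # (1.0, 0.5, 0.0)
```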
    <Paragraph position="3"> Word senses are an important part of the LF representation, so we also evaluated TRIPS on word sense tagging against a baseline of the most common word senses in Monroe. There were 546 instances of ambiguous words in the 100 test utterances. TRIPS tagged 90.3% (493) of these correctly, compared to the baseline model of 75.3% (411) correct.</Paragraph>
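The word-sense accuracy figures above follow directly from the reported counts; a quick check of the arithmetic:

```python
ambiguous = 546          # ambiguous word instances in the 100 test utterances
trips_correct = 493      # tagged correctly by TRIPS
baseline_correct = 411   # most-common-sense baseline

trips_acc = trips_correct / ambiguous        # ~0.903
baseline_acc = baseline_correct / ambiguous  # ~0.753
print(f"TRIPS: {trips_acc:.1%}, baseline: {baseline_acc:.1%}")
# TRIPS: 90.3%, baseline: 75.3%
```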
    <Paragraph position="4"> To evaluate portability to new domains, we compared TRIPS full sentence accuracy on a subset of Monroe that underwent a fair amount of development (Tetreault et al., 2004) to corpora of keyboard tutorial session transcripts from new domains in basic electronics (BEETLE) and differentiation (LAM) (Table 2). The only development for these domains was addition of missing lexical items and two grammar rules. TRIPS full accuracy requires correct speech act, word sense and thematic role assignment as well as complete constituent match.</Paragraph>
    <Paragraph position="5"> Error analysis shows that certain senses and subcategorization frames for existing words are still needed in the new domains, which can be rectified fairly quickly. Finding and addressing such gaps is part of bootstrapping a system in a new domain. Table 2 reports accuracy in 3 domains (Acc: full accuracy; Cov.: # spanning parses; Prec: full accuracy on spanning parses).</Paragraph>
  </Section>
</Paper>