File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/w05-0307_intro.xml

Size: 1,904 bytes

Last Modified: 2025-10-06 14:03:09

<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0307">
  <Title>A Framework for Annotating Information Structure in Discourse</Title>
  <Section position="4" start_page="0" end_page="45" type="intro">
    <SectionTitle>
2 Corpus and Tools
</SectionTitle>
    <Paragraph position="0"> The Switchboard Corpus (Godfrey et al., 1992) consists of 2430 spontaneous phone conversations (averaging six minutes each) between speakers of American English, totalling three million words. The corpus is distributed as stereo speech signals with an orthographic transcription per channel, time-stamped at the word level. A third of the corpus is syntactically parsed as part of the Penn Treebank (Marcus et al., 1993) and has dialog act annotation (Shriberg et al., 1998).</Paragraph>
    <Paragraph position="1"> We used a subset of this. In adherence to current standards, we converted all the existing annotations, and are producing the new discourse annotations, in a coherent multi-layered XML-conformant schema, using NXT technology (Carletta et al., 2004).[1] This allows us to search over and integrate information from the many layers of annotation, including the sound files. [1] Besides the NXT tools, we also used the TIGER Switchboard filter (Mengel and Lezius, 2000) for the XML conversion. Using existing markup, we automatically selected and filtered the NPs to be annotated, excluding locative, directional, and adverbial NPs and disfluencies, and adding possessive pronouns. See Nissim et al. (2004) for technical details.</Paragraph>
    <Paragraph position="2"> NXT tools can be easily customised to accommodate the different layers of annotation users want to add, and they support data sets that have low-level annotations time-stamped against a set of synchronized signals, multiple, crossing tree structures, and connections to external corpus resources such as gesture ontologies and lexicons (Carletta et al., 2004).</Paragraph>
  </Section>
</Paper>
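The NP selection step described in the text above (keep NPs, drop locative, directional, and adverbial NPs and disfluent material, add possessive pronouns) can be illustrated with a minimal sketch. This is not the authors' implementation, and the details are assumptions: it assumes Penn Treebank-style labels in which the function tags LOC, DIR, and ADV mark the excluded NP types and PRP$ marks possessive pronouns, and the Node class, its in_disfluency flag, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: Penn Treebank-style function tags identify the NP types
# that the text says were excluded (locative, directional, adverbial).
EXCLUDED_FUNCTION_TAGS = {"LOC", "DIR", "ADV"}

# Assumption: possessive pronouns carry the Treebank POS tag PRP$.
POSSESSIVE_PRONOUN_TAG = "PRP$"


@dataclass
class Node:
    """Hypothetical record for one constituent or word in a parsed turn."""
    label: str                   # e.g. "NP-SBJ", "NP-LOC", "PRP$"
    text: str
    in_disfluency: bool = False  # hypothetical flag for disfluent material


def is_markable(node: Node) -> bool:
    """Return True if the node should be offered for annotation."""
    if node.in_disfluency:
        return False
    if node.label == POSSESSIVE_PRONOUN_TAG:
        return True
    # Split "NP-SBJ-1" into the category "NP" and the tags ["SBJ", "1"].
    category, *function_tags = node.label.split("-")
    if category != "NP":
        return False
    return not (set(function_tags) & EXCLUDED_FUNCTION_TAGS)


def select_markables(nodes):
    """Keep only the nodes that pass the selection filter, in order."""
    return [node for node in nodes if is_markable(node)]


if __name__ == "__main__":
    sample = [
        Node("NP-SBJ", "the dog"),
        Node("NP-LOC", "the park"),
        Node("PRP$", "her"),
        Node("NP", "uh the", in_disfluency=True),
        Node("VP", "ran"),
    ]
    for node in select_markables(sample):
        print(node.label, node.text)
```

Keeping the exclusion tags in one set makes the filter easy to adjust if a different tag inventory is used.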