File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/n04-1011_intro.xml
Size: 2,174 bytes
Last Modified: 2025-10-06 14:02:18
<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1011"> <Title>Sentence-Internal Prosody Does not Help Parsing the Way Punctuation Does</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Method </SectionTitle> <Paragraph position="0"> The data used for this study are the transcribed version of the Switchboard Corpus as released by the Linguistic Data Consortium (LDC). Switchboard is a corpus of telephone conversations between adult speakers of varying dialects. The corpus was split into training, development, and test data as described in Charniak and Johnson (2001). The training data consisted of all files in sections 2 and 3 of the Switchboard treebank. The test corpus consisted of files sw4004.mrg to sw4153.mrg, and files sw4519.mrg to sw4936.mrg were used as the development corpus.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Prosodic variables </SectionTitle> <Paragraph position="0"> Prosodic information for the corpus was obtained from forced alignments provided by Hamaker et al. (2003) and Ferrer et al. (2002).</Paragraph> <Paragraph position="1"> Hamaker et al. (2003) provided word alignments between the LDC parsed corpus and new alignments of the Switchboard Corpus. Most of the differences between the two alignments involved individual lexical items; where the two differed, we kept the lexical item from the LDC version. Ferrer et al. (2002) provided rich prosodic information for each word in the corpus, including duration, pausing, f0 (fundamental frequency), and individual speaker statistics. This information was likewise aligned to the LDC corpus.</Paragraph> <Paragraph position="2"> It is not known exactly which prosodic variables convey the information about syntactic boundaries that is most useful to a modern syntactic parser, so we investigated many different combinations of these variables.
We looked for changes in pitch and duration that we expected to correspond to syntactic boundaries. While we tested many combinations of variables, they were mainly based on the variables PAU DUR N,</Paragraph> </Section> </Section> </Paper>