File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/p05-2020_abstr.xml

Size: 952 bytes

Last Modified: 2025-10-06 13:44:31

<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2020">
  <Title>Learning Information Structure in The Prague Treebank</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper investigates the automatic identification of aspects of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank which is annotated with IS following the Praguian approach of Topic Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features and derived features inspired by the annotation guidelines. We show the performance of C4.5, Bagging, and Ripper classifiers on several classes of instances such as nouns and pronouns, only nouns, only pronouns. A baseline system assigning always f(ocus) has an F-score of 42.5%. Our best system obtains 82.04%.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML