<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2020"> <Title>Learning Information Structure in The Prague Treebank</Title> <Section position="7" start_page="119" end_page="119" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> In this paper we investigated the problem of learning aspects of Information Structure (IS) from annotated data. We presented results from a study testing whether Information Structure can be learned using mostly syntactic features. We used the Prague Dependency Treebank (PDT), which is annotated with IS following the Praguian theory of Topic-Focus Articulation. The results show that we can reliably identify t(opic) and f(ocus) with a Weighted Averaged F-score of over 82%, while the baseline is at 42%.</Paragraph> <Paragraph position="1"> Issues for further research include, on the one hand, a deeper investigation of the Topic-Focus Articulation in the Prague Dependency Treebank: improving the feature set, taking into account the distinction between contrastive and non-contrastive t items and, most importantly, investigating how we can use the t/f annotation in the PDT (and, correspondingly, our results) to detect the Topic/Focus partitioning of the whole sentence.</Paragraph> <Paragraph position="2"> On the other hand, we want to benefit from the experience with the Czech data in order to create an English corpus annotated with Information Structure. We want to exploit a parallel English-Czech corpus, available as part of the PDT, to extract correlations between different linguistic dimensions and Topic/Focus in the Czech data and to investigate how they can be transferred to the English version of the corpus.</Paragraph> </Section> </Paper>