File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/n03-2024_intro.xml
Size: 1,752 bytes
Last Modified: 2025-10-06 14:01:44
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2024"> <Title>References to Named Entities: a Corpus Study</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Automatically generated summaries, and particularly multi-document summaries, suffer from a lack of coherence. One explanation is that the most widespread summarization strategy is still sentence extraction, in which sentences are copied word for word from the original documents and strung together to form a summary. Syntactic form and its influence on summary coherence have not been taken into account in the implementation of a full-fledged summarizer, except in the preliminary work of Schiffman et al. (2002).</Paragraph> <Paragraph position="1"> Here we conduct a corpus study that identifies the syntactic properties of first and subsequent mentions of people in newswire text (e.g., &quot;Chief Petty Officer Luis Diaz of the U.S. Coast Guard in Miami&quot; followed by &quot;Diaz&quot;). The resulting statistical model of the flow of referential expressions suggests a set of rewrite rules that can transform an extracted summary into a more coherent and readable text.</Paragraph> <Paragraph position="2"> In the following sections, we first describe the corpus that we used and then the statistical model that we developed. The model is based on Markov chains and captures how subsequent mentions are conditioned by earlier mentions.</Paragraph> <Paragraph position="3"> We close with a discussion of our evaluation, which measures how well the highest-probability path in the model can be used to regenerate the sequence of references.</Paragraph> </Section> </Paper>