<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0703"> <Title>Event Clustering on Streaming News Using Co-Reference Chains and Event Words</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> News, which is an important information source, is reported anytime and anywhere, and is disseminated across geographic barriers through the Internet. Detecting the occurrences of new events and tracking the progress of those events (Allan, Carbonell, and Yamron, 2002) are useful for decision-making in this fast-changing network era.</Paragraph> <Paragraph position="1"> Event clustering automatically groups documents, in temporal order, by the events specified in them. The research issues behind event clustering include: how many features can be used to determine event clusters, which cue patterns can be employed to relate news stories in the same event, how clustering strategies affect clustering performance on retrospective or on-line data, how the time factor affects clustering performance, and how multilingual data is clustered.</Paragraph> <Paragraph position="2"> Chen and Ku (2002) considered named entities, other nouns, and verbs as cue patterns to relate news stories describing the same event. A centroid-based approach with a two-threshold scheme determines the relevance (or irrelevance) between a news story and a topic cluster. A least-recently-used removal strategy models the time factor in such a way that older and unimportant terms have no effect on clustering. Chen, Kuo and Su (2003) touched on event clustering in multilingual multi-document summarization. They showed that translation after clustering is better than translation before clustering, and that deferring translation until sentence clustering, which reduces the propagation of translation errors, is the most promising.
Fukumoto and Suzuki (2000) proposed the concepts of topic words and event words for event tracking.</Paragraph> <Paragraph position="3"> They introduced a more semantic approach to feature selection than one based on parts of speech. Wong, Kuo and Chen (2001) employed these concepts to select informative words for headline generation, and to rank the extracted sentences in multi-document summarization (Kuo, Wong, Lin, and Chen, 2002).</Paragraph> <Paragraph position="4"> Bagga and Baldwin (1998) proposed entity-based cross-document co-referencing, which uses the co-reference chains of each document to generate its summary and then uses the summary, rather than the whole article, to select informative words as the features of the document. Azzam, Humphreys, and Gaizauskas (1999) also proposed a primitive model for text summarization using co-reference chains. Silber and McCoy (2002) proposed a text summarization model using lexical chains and showed that proper nouns and anaphora resolution are indispensable.</Paragraph> <Paragraph position="5"> The two semantics-based feature selection approaches, i.e., co-reference chains and event words, are complementary in some sense. The former denotes equivalence classes of noun phrases, and the latter considers both nominal and verbal features, which appear across paragraphs.</Paragraph> <Paragraph position="6"> This paper employs both co-reference chains and event words for temporal event clustering. An event clustering system using co-reference chains is described in Section 2. The evaluation method and the related experimental results are described in Section 3. The event words are introduced and discussed in Section 4. Section 5 proposes a summation model and a two-level model for event clustering using both co-reference chains and event words. Section 6 presents concluding remarks.</Paragraph> </Section> </Paper>