<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0418"> <Title>Identifying Events using Similarity and Context</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Early work in natural language processing included ambitious research on the representation and use of information about commonly experienced situations (Schank and Riesbeck, 1981). This research introduced the concept of a script to explain how people understand these situations and make inferences about them. A script is a stereotypical sequence of events that occur as part of a larger situation. It can be used to infer missing details from a partial description of that situation, in essence providing a means of extracting information that is not explicitly stated in a text.</Paragraph> <Paragraph position="1"> Research on scripts includes demonstrations of hand-built scripts (Cullingford, 1978) and sketchy scripts (DeJong, 1982), as well as the adjustment of hand-built scripts using a genetic algorithm (Mauldin, 1989). Work on learning schemata under constrained circumstances (Mooney and DeJong, 1985) pursues similar goals.</Paragraph> <Paragraph position="2"> Our research has indicated that scripts may not explicitly occur in common types of text, such as newspaper stories or incident reports, and other research appears to support this conclusion (Clark and Porter, 1995). We are therefore investigating event correlations as a more appropriate and more readily extractable knowledge structure. Long event sequences do not appear to recur reliably in our data, so we instead look for reliable correlations among a small number of events.</Paragraph> <Paragraph position="3"> Our goal is to extract correlated events from text automatically, using only a partial parser as an outside resource.
To support this goal, we need to group clauses from distinct texts into coherent events. This means handling several sources of variation in how the same type of occurrence is described. Synonymy and abbreviations are two common contributors. A more important phenomenon is the existence of semantic categories keyed to the events themselves: different objects may participate in the same type of event yet differ enough to fall into different conventional semantic categories. For example, an aircraft may collide with a tree in one crash and with a parked vehicle in another, yet it is difficult to conceive of a reasonably specific semantic category that contains both.</Paragraph> <Paragraph position="4"> Each is a physical object, yet many other physical objects would not plausibly participate in a crash in the same way (books, hamburgers, and moons, for example).</Paragraph> <Paragraph position="5"> Because of these phenomena, conventional semantic lexicons, whether hand-built or automatically generated, differ from our work in two respects. First, they group words, not clauses. Second, they rely on predefined semantic categories rather than on contextual relevance.</Paragraph> <Paragraph position="6"> Our solution to this problem is a technique that uses textual similarity and context from neighboring events to decide when to group clauses. The only outside resource we use is a partial parser. Our technique takes parsed text and partially built event sequences and uses them to group clauses that represent the same event.</Paragraph> <Paragraph position="7"> In the remainder of this paper, we give a brief overview of the sequence learning system and then describe how we form events. We evaluate our event formation technique using human judges' ratings of the cohesiveness of the resulting events. Finally, we discuss related work and conclude.</Paragraph> </Section> </Paper>