<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3021"> <Title>Automating Temporal Annotation with TARSQI</Title> <Section position="5" start_page="81" end_page="81" type="metho"> <SectionTitle> 3 EVITA </SectionTitle> <Paragraph position="0"> Evita (Events in Text Analyzer) is an event recognition tool that performs two main tasks: robust event identification and analysis of grammatical features such as tense and aspect. Event identification is based on the notion of event as defined in TimeML.</Paragraph> <Paragraph position="1"> Different strategies are used for identifying events within the categories of verb, noun, and adjective.</Paragraph> <Paragraph position="2"> Event identification for verbs is based on lexical lookup, accompanied by minimal contextual parsing in order to exclude weak stative predicates such as be or have. Identifying events expressed by nouns, on the other hand, involves a disambiguation phase in addition to lexical lookup. Machine learning techniques are used to determine when an ambiguous noun is used with an event sense. Finally, identifying adjectival events takes the conservative approach of tagging as events only those adjectives that have been lexically pre-selected from TimeBank, whenever they appear as the head of a predicative complement. For each element identified as denoting an event, a set of linguistic rules is applied in order to obtain its temporally relevant grammatical features, such as tense and aspect. Evita relies on preprocessed input with part-of-speech tags and chunks. Current performance of Evita against TimeBank is .75 precision, .87 recall, and .80 F-measure. 
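The verb-identification strategy described above can be sketched as follows. This is an illustrative sketch, not Evita's actual implementation: the stoplist, function name, and feature values are assumptions; it tags verb tokens as events via lexical lookup, excludes weak stative predicates, and derives simple tense features from Penn Treebank POS tags.

```python
# Minimal sketch of verb event identification: lexical lookup over
# POS-tagged tokens, with weak statives excluded. WEAK_STATIVES and
# the tense mapping are illustrative assumptions.

WEAK_STATIVES = {"be", "have"}  # assumed stoplist of weak stative predicates

def tag_verbal_events(tokens):
    """tokens: list of (lemma, pos) pairs from a POS-tagged chunk."""
    events = []
    for lemma, pos in tokens:
        # only verbs, and not weak statives, are tagged as events
        if not pos.startswith("VB") or lemma in WEAK_STATIVES:
            continue
        if pos == "VBD":
            tense = "PAST"
        elif pos in ("VBP", "VBZ"):
            tense = "PRESENT"
        else:
            tense = "NONE"
        events.append({"lemma": lemma, "tense": tense})
    return events

print(tag_verbal_events([("have", "VBZ"), ("teach", "VBD"), ("win", "VB")]))
# → [{'lemma': 'teach', 'tense': 'PAST'}, {'lemma': 'win', 'tense': 'NONE'}]
```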
The low precision is mostly due to Evita's over-generation of generic events, which were not annotated in TimeBank.</Paragraph> </Section> <Section position="6" start_page="81" end_page="82" type="metho"> <SectionTitle> 4 GUTenLINK </SectionTitle> <Paragraph position="0"> Georgetown's GUTenLINK TLINK tagger uses hand-developed syntactic and lexical rules. It handles three cases at present: (i) the event is anchored without a signal to a time expression within the same clause, (ii) the event is anchored without a signal to the speech-time frame given by the document date (as in the case of reporting verbs in news, which are often at or offset slightly from the speech time), and (iii) the event in a main clause is anchored with a signal or tense/aspect cue to the event in the main clause of the previous sentence. In case (iii), a finite-state transducer is used to infer the likely temporal relation between the events based on the TimeML tense and aspect features of each event. For example, a past tense non-stative verb followed by a past perfect non-stative verb, with grammatical aspect maintained, suggests that the second event precedes the first.</Paragraph> <Paragraph position="1"> GUTenLINK uses default rules for ordering events; its handling of successive past tense non-stative verbs in case (iii) will not correctly order sequences like Max fell. John pushed him.</Paragraph> <Paragraph position="2"> GUTenLINK is intended as one component in a larger machine-learning-based framework for ordering events. Another component, still to be developed, will leverage document-level inference, as in the machine learning approach of (Mani et al., 2003), which required annotation of a reference time (Reichenbach, 1947; Kamp and Reyle, 1993) for the event in each finite clause.</Paragraph> <Paragraph position="3"> An early version of GUTenLINK was scored at .75 precision on 10 documents. 
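The case (iii) inference step described above can be sketched as a lookup on TimeML tense/aspect feature pairs for events in successive main clauses. Only the past + past-perfect rule is cited in the text; the simple-past default shown here is an assumption consistent with the "Max fell. John pushed him." caveat, not GUTenLINK's actual rule set, and the function name is invented for the example.

```python
# Hypothetical sketch of tense/aspect-based TLINK inference for
# case (iii). Feature values follow TimeML conventions.

def infer_tlink(e1, e2):
    """e1, e2: (tense, aspect) features of events in successive main
    clauses. Returns a guessed TimeML relation from e1 to e2."""
    if e1 == ("PAST", "NONE") and e2 == ("PAST", "PERFECTIVE"):
        return "AFTER"   # the past-perfect event precedes the simple-past one
    if e1 == ("PAST", "NONE") and e2 == ("PAST", "NONE"):
        return "BEFORE"  # narrative-order default; wrong for "Max fell. John pushed him."
    return None          # no rule fires; leave the pair unlinked

print(infer_tlink(("PAST", "NONE"), ("PAST", "PERFECTIVE")))  # → AFTER
```

The narrative-order default makes the known failure mode explicit: a simple-past pair is always ordered text-forward, which is exactly what goes wrong on *Max fell. John pushed him.*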
More formal precision and recall scoring is under way, but GUTenLINK compares favorably with an earlier approach developed at Georgetown. That approach converted event-event TLINKs from TimeBank 1.0 into feature vectors in which the TLINK relation type was used as the class label (some classes were collapsed). A C5.0 decision rule learner trained on that data obtained an F-measure of .54, with the low score due mainly to data sparseness.</Paragraph> </Section> <Section position="7" start_page="82" end_page="82" type="metho"> <SectionTitle> 5 Slinket </SectionTitle> <Paragraph position="0"> Slinket (SLINK Events in Text) is an application currently under development. Its purpose is to automatically introduce SLINKs, which in TimeML specify subordinating relations between pairs of events, and to classify them as factive, counterfactive, evidential, negative evidential, or modal, based on the modal force of the subordinating event. Slinket requires chunked input with events.</Paragraph> <Paragraph position="1"> SLINKs are introduced by a well-delimited subgroup of verbal and nominal predicates (such as regret, say, promise, and attempt), and in most cases are clearly signaled by the context of subordination.</Paragraph> <Paragraph position="2"> Slinket thus relies on a combination of lexical and syntactic knowledge. Lexical information is used to pre-select events that may introduce SLINKs. Predicate classes are taken from (Kiparsky and Kiparsky, 1970; Karttunen, 1971; Hooper, 1975) and subsequent elaborations of that work, as well as induced from the TimeBank corpus. A syntactic module is applied in order to properly identify the subordinated event, if any. This module is built as a cascade of shallow syntactic tasks such as clause boundary recognition and subject and object tagging. 
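The lexical pre-selection step described above can be sketched as a small predicate-class lexicon that proposes a SLINK type for a subordinating event. This is an illustrative sketch, not Slinket's actual code: the class assignments for the four example predicates follow standard TimeML practice, and the helper names are invented here; the syntactic module that locates the subordinated event is not shown.

```python
# Hypothetical lexicon mapping SLINK-introducing predicates to the
# modal force of the SLINK they introduce (TimeML relType values).

PREDICATE_CLASSES = {
    "regret": "FACTIVE",
    "say": "EVIDENTIAL",
    "promise": "MODAL",
    "attempt": "MODAL",
}

def propose_slink(subordinating_lemma, event_id, sub_event_id):
    """Return a candidate SLINK if the predicate is a known introducer."""
    rel_type = PREDICATE_CLASSES.get(subordinating_lemma)
    if rel_type is None:
        return None  # not a SLINK-introducing predicate
    return {"relType": rel_type,
            "eventInstanceID": event_id,
            "subordinatedEventInstance": sub_event_id}

print(propose_slink("regret", "ei1", "ei2"))
# → {'relType': 'FACTIVE', 'eventInstanceID': 'ei1', 'subordinatedEventInstance': 'ei2'}
```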
Such tasks are informed by both linguistically based knowledge (Papageorgiou, 1997; Leffa, 1998) and corpus-induced rules (Sang and Déjean, 2001); they are currently being implemented as sequences of finite-state transducers along the lines of (Aït-Mokhtar and Chanod, 1997). Evaluation results are not yet available.</Paragraph> </Section> <Section position="8" start_page="82" end_page="82" type="metho"> <SectionTitle> 6 SputLink </SectionTitle> <Paragraph position="0"> SputLink is a temporal closure component that takes the known temporal relations in a text and derives new implied relations from them, in effect making explicit what was implicit. A temporal closure component helps to find those global links that are not necessarily derived by other means. SputLink is based on James Allen's interval algebra (1983) and was inspired by (Setzer, 2001) and (Katz and Arosio, 2001), who both added a closure component to an annotation environment.</Paragraph> <Paragraph position="1"> Allen reduces all events and time expressions to intervals and identifies 13 basic relations between the intervals. The temporal information in a document is represented as a graph in which events and time expressions form the nodes and temporal relations label the edges. The SputLink algorithm, like Allen's, is basically a constraint propagation algorithm that uses a transitivity table to model the compositional behavior of all pairs of relations. For example, if A precedes B and B precedes C, then we can compose the two relations and infer that A precedes C. Allen allowed unlimited disjunctions of temporal relations on the edges and acknowledged that inconsistency detection is not tractable in his algebra. One of SputLink's aims is to ensure consistency, so it uses a restricted version of Allen's algebra proposed by (Vilain et al., 1990), in which inconsistency detection is tractable. 
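The composition step above can be sketched as constraint propagation to a fixpoint over a small transitivity table. This is a minimal sketch of the idea, restricted to three relations (BEFORE, AFTER, EQUALS) rather than the full restricted algebra SputLink uses; the function and table names are invented for the example.

```python
# Toy transitivity (composition) table: given r1 holding between
# (x, y) and r2 between (y, z), COMPOSE[(r1, r2)] holds between (x, z).

COMPOSE = {
    ("BEFORE", "BEFORE"): "BEFORE",
    ("AFTER", "AFTER"): "AFTER",
    ("BEFORE", "EQUALS"): "BEFORE",
    ("EQUALS", "BEFORE"): "BEFORE",
    ("AFTER", "EQUALS"): "AFTER",
    ("EQUALS", "AFTER"): "AFTER",
    ("EQUALS", "EQUALS"): "EQUALS",
}

def closure(edges):
    """edges: dict {(x, y): relation}. Repeatedly compose known
    relations until no new edge can be inferred (a fixpoint)."""
    edges = dict(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), r1 in list(edges.items()):
            for (c, d), r2 in list(edges.items()):
                # chain a -> b and b(=c) -> d into a new edge a -> d
                if b != c or a == d or (a, d) in edges:
                    continue
                r = COMPOSE.get((r1, r2))
                if r:
                    edges[(a, d)] = r
                    changed = True
    return edges

g = closure({("A", "B"): "BEFORE", ("B", "C"): "BEFORE"})
print(g[("A", "C")])  # → BEFORE, as in the example from the text
```

A full implementation would also track inverse edges and disjunctions of relations; this sketch only shows how the transitivity table drives propagation.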
An evaluation of SputLink on TimeBank showed that it more than quadrupled the number of temporal links in TimeBank, from 4200 to 17500.</Paragraph> <Paragraph position="2"> Moreover, closure adds non-local links that were systematically missed by the human annotators. Experimentation also showed that temporal closure allows one to structure the annotation task in such a way that it becomes possible to create a complete annotation from local temporal links only. See (Verhagen, 2004) for more details.</Paragraph> </Section> </Paper>