<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1315"> <Title>The Annotation of Temporal Information in Natural Language Sentences</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Temporal annotation </SectionTitle> <Paragraph position="0"> The idea of developing a treebank enriched with semantic information is not new. In particular, such semantically annotated corpora have been used in research on word sense disambiguation (WordNet, Eagles, Simple) and semantic role interpretation (Eagles). The public availability of large syntactically annotated treebanks (Penn, Verbmobil, Negra) makes such work attractive, particularly in light of the success that empirical methods have had (Kilgarriff & Rosenzweig 2000). Traditional semantic representational formalisms such as DRT (Kamp & Reyle 1993) are ill suited to semantic annotation. Since these formalisms are developed in the service of theories of natural language interpretation, they are - rightly - both highly articulated and highly constrained. In short, they are often too complex and sometimes not expressive enough for the purposes at hand (as the experience of Poesio et al. (1999) makes clear).</Paragraph> <Paragraph position="1"> Our proposal here is to adopt a radically simplified semantic formalism which, by virtue of its simplicity, is suited to the temporal-annotation application.</Paragraph> <Paragraph position="2"> The temporal interpretation of a sentence, for our purposes, can simply be taken to be the set of temporal relations that a speaker naturally takes to hold among the states and events described by the verbs of the sentence. To put it more formally, we associate with each verb a temporal interval, and concern ourselves with relations among these intervals. Of the interval relations discussed by Allen (1984), we will be concerned with only two: precedence and inclusion. 
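The two interval relations can be made concrete with a small sketch. The Python below is illustrative only and is not part of the annotation system: the `Interval` class, its method names, and the numeric times are assumptions introduced here, with the time of each verb modeled as a closed numeric interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed temporal interval [start, end]; a hypothetical stand-in
    for the time associated with one verb."""
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    def precedes(self, other: "Interval") -> bool:
        # Precedence: this interval is over before the other one begins.
        return other.start > self.end

    def included_in(self, other: "Interval") -> bool:
        # Inclusion: this interval lies entirely within the other one.
        return self.start >= other.start and other.end >= self.end


# Invented times, chosen only to exercise the two relations.
t_meet = Interval(1.0, 2.0)
t_kiss = Interval(3.0, 3.5)
t_span = Interval(0.0, 5.0)  # hypothetical interval containing both

print(t_meet.precedes(t_kiss))      # True
print(t_meet.included_in(t_span))   # True
print(t_kiss.precedes(t_meet))      # False
```

Under this reading, precedence is strict and asymmetric, while inclusion is reflexive, which is one reasonable way (among others) to interpret the two relations.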
Taking t_talk to be the time of talking,</Paragraph> <Paragraph position="4"> t_ask to be the time of asking and t_remember to be the time of remembering, the temporal interpretation of (1c), for example, can be given by the following table:</Paragraph> <Paragraph position="6"> Such a table, in effect, stores the native speaker's judgement about the most natural temporal interpretation of the sentence.</Paragraph> <Paragraph position="7"> Since our goal was to annotate a large number of sentences with their temporal interpretations in order to examine the interaction between lexical and syntactic structure, it was imperative that the interpretation be closely tied to its syntactic context. We needed to keep track of both the semantic relations among times referred to by the words in a sentence and the syntactic relations among the words in the sentence that refer to these times, but not much more. By adopting existing technology for syntactic annotation, we were able to do this quite directly, by essentially building the information in this table into the syntax.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The annotation system </SectionTitle> <Paragraph position="0"> To carry out our temporal annotation, we made use of the Annotate tool for syntactic annotation developed in Saarbrücken by Brants and Plaehn (2000). 
We exploited an aspect of the system originally designed for the annotation of anaphoric relations: the ability to link two arbitrary nodes in a syntactic structure by means of labeled &quot;secondary edges.&quot; This allowed us to add a layer of semantic annotation directly to that of syntactic annotation.</Paragraph> <Paragraph position="1"> A sentence was temporally annotated by linking the verbs in the sentence via secondary edges labeled with the appropriate temporal relation.</Paragraph> <Paragraph position="2"> As we were initially concerned only with the relations of precedence and inclusion, we had only four labels: &quot;<&quot;, &quot;⊆&quot;, and their duals. Sentence (1a), then, is annotated as in (2).</Paragraph> <Paragraph position="3"> (2) John kissed the girl he met at the party The natural ordering relation between the kissing and the meeting is indicated by the labeled edge. Note that the edge goes from the verb associated with the event that fills the first argument of the relation to the verb associated with the event that fills the second argument of the relation.</Paragraph> <Paragraph position="4"> The annotation of (1c), which was somewhat more complex, indicates the two relations that hold among the events described by the sentence.</Paragraph> <Paragraph position="5"> (3) He remembered talking and asking her name In addition to encoding the relations among the events described in a sentence, we anticipated that it would be useful also to encode the relationship between these events and the time at which the sentence is produced. This is, after all, what tenses usually convey. To encode this temporal indexical information, we introduce into the annotation an explicit representation of the speech time. 
This is indicated by the &quot;°&quot; symbol, which is automatically prefixed to all sentences prior to annotation.</Paragraph> <Paragraph position="6"> The complete annotation for sentence (1a), then, is (4).</Paragraph> <Paragraph position="7"> the girl he met at the party As we see in (5), this coding scheme enables us to represent the different interpretations that past-tensed and present-tensed clauses have.</Paragraph> <Paragraph position="8"> the girl who is at the party Notice that we do not annotate the tenses themselves directly.</Paragraph> <Paragraph position="9"> Note that in the case of reported speech, the time associated with the embedding verb plays, for the embedded sentence, much the same role that the speech time plays for the main clause.</Paragraph> <Paragraph position="10"> Formally, in fact, the relational analysis implicit in our notation makes it possible to avoid many of the problems associated with the treatment of these constructions (such as those discussed at length by von Stechow (1995)). We set these issues aside here.</Paragraph> <Paragraph position="11"> It should be clear that we are not concerned with giving a semantics for temporal markers, but rather with providing a language within which we can describe the temporal information conveyed by natural language sentences. With the addition of temporal indexical annotation, our annotation system gains enough expressive power to account for most of the relational information conveyed by natural language sentences. Left out at this point is temporal-metrical information such as that conveyed by the adverbial &quot;two hours later.&quot;</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Annotation procedure </SectionTitle> <Paragraph position="0"> The annotation procedure itself is quite straightforward. We begin with a syntactically annotated treebank and add the speech time marker to each of the sentences. 
The annotator then simply marks the temporal relations among verbs and the speech time for each sentence in the corpus. This is accomplished in accordance with the following conventions: (ii) the edge goes from the element that fills the first argument of the relation to the element that fills the second; (iii) edge labels indicate the temporal relation that holds; (iv) edge labels can be &quot;>&quot;, &quot;<&quot;, &quot;⊆&quot; and &quot;⊇&quot;. Annotators are instructed to annotate the sentences as they naturally understand them. When the treebank is made up of a sequence of connected text, the annotators are encouraged to make use of contextual information.</Paragraph> <Paragraph position="1"> The annotation scheme is simple, explicit and theory-neutral. The annotator needs only to exercise his native competence in his language and needs no special training in temporal semantics or in any specific formal language; in pilot studies we have assembled small temporally annotated databases in a few hours. Our current database consists of 300 sentences from six languages.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Comparing annotations </SectionTitle> <Paragraph position="0"> It is well known that hand-annotated corpora are prone to inconsistency (Marcus, Santorini & Marcinkiewicz, 1993), and for that reason it is desirable that the corpus be multiply annotated by different annotators and that these annotations be compared. The kind of semantic annotation we are proposing here introduces an additional complexity to inter-annotation comparison, in that the consistency of an annotation is best defined not in formal terms but in semantic terms. 
Two annotations should be taken to be equivalent, for example, if they express the same meanings, even if they use different sets of labeled edges.</Paragraph> <Paragraph position="1"> To make explicit what semantic identity is, we provide our annotations with a model-theoretic interpretation. The annotations are interpreted with respect to a structure ⟨D, <, ⊆⟩, where D is the domain (here the set of verb tokens in the corpus) and < and ⊆ are binary relations on D.</Paragraph> <Paragraph position="2"> Models for this structure are assignments of pairs of entities in D to < and ⊆ satisfying the following axioms: - ∀x,y,z. x<y & y<z → x<z - ∀x,y,z. x⊆y & y⊆z → x⊆z - ∀w,x,y,z. x<y & z⊆x & w⊆y → z<w - ∀w,x,y,z. x<y & y<z & x⊆w & z⊆w → y⊆w Thus < and ⊆ have the properties one would expect for the precedence and inclusion relations. We are assuming that in the cases of interest verbs refer to simply convex events. Intuitively, the set of verb tokens in the corpus corresponds to the set of times at which an event or state of the type indicated by the verb takes place or holds. In our corpus the number of sentences that involved quantified or generic event reference was quite low.</Paragraph> <Paragraph position="3"> An annotated relation of the following form iff all relations associated with the sentence are satisfied by the model. Intuitively, an annotation is satisfied by a model if the model assigns the appropriate relation to the verbs occurring in the sentence.</Paragraph> <Paragraph position="4"> There are four semantic relations that can hold between annotations. 
These can be defined in model-theoretic terms: * Annotations A and B are equivalent iff all models satisfying A satisfy B and all models satisfying B satisfy A.</Paragraph> <Paragraph position="5"> * Annotation A subsumes annotation B iff all models satisfying B satisfy A.</Paragraph> <Paragraph position="6"> * Annotations A and B are consistent iff there are models satisfying both A and B.</Paragraph> <Paragraph position="7"> * Annotations A and B are inconsistent iff there are no models satisfying both A and B. We can also define the minimal model satisfying an annotation in the usual way. We can then compute a distance measure between two annotations by comparing the sets of models satisfying the annotations. Let M In other words, the distance is the number of relation pairs that are not shared by the annotations normalized by the number that they do share. We can use this metric to quantify the &quot;goodness&quot; of both annotations and annotators. Consider again (1c). We gave one annotation for this in (3). In (6) and (7) there are two alternative annotations.</Paragraph> <Paragraph position="8"> (6) He remembered talking and asking her name (7) He remembered talking and asking her name As we can compute on the basis of the semantics for the annotations, (6) is equivalent to (3), so they are no distance apart, while (7) is inconsistent with (3), so they are infinitely far apart. The annotation (8) is compatible with (7) and is a distance of 1 away from it.</Paragraph> <Paragraph position="9"> (8) He remembered talking and asking her name As in the case of structural annotation, there are a number of ways of resolving inter-annotator variation. We can choose the most informative annotation as the correct one, or the most general. Or we can combine annotations. The intersection of two compatible annotations gives an equally compatible annotation which contains more information than either of the two alone. 
We do not yet have enough data to determine which of these strategies is most effective.</Paragraph> <Paragraph position="10"> In preliminary work, we had two annotators annotate 50 complex sentences extracted randomly from the BNC. The results were quite encouraging. Although the annotations were identical in only 70% of the cases, the annotations were semantically consistent in 85% of the cases.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Applications of temporal annotation </SectionTitle> <Paragraph position="0"> There are any number of applications for a temporally annotated corpus such as the one we have been outlining. Lexicon induction is the most interesting, but, as we indicated at the outset, this is a long-term project, as it requires a significant investment in hand annotation. We hope to get around this problem. Even so, there are a number of other applications which require less extensive corpora but which are of significant interest. One of these has formed the initial focus of our research: the development of a searchable multilingual database.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Multilingual database </SectionTitle> <Paragraph position="0"> Our annotation method has been applied to sentences from a variety of languages, creating a searchable multi-language treebank. This database allows us to search for sentences that express a given temporal relation in a language.</Paragraph> <Paragraph position="1"> We have already developed a pilot multilingual database with sentences from the Verbmobil database (see an example in fig. 
1) and we have developed a query procedure in order to extract relevant information.</Paragraph> <Paragraph position="2"> Fig. 1 A temporally annotated sentence from the Verbmobil English treebank as displayed by @nnotate.</Paragraph> <Paragraph position="3"> As can be seen, the temporal annotation is entirely independent of the syntactic annotation. In the context of the Annotate environment a number of tools have been developed (and are under development) for the querying of structural relations. Since each sentence is stored in the relational database with both syntactic and temporal semantic annotations, it is possible to make use of these querying tools to query on structures, on meanings, and on structures and meanings together. For example, a query such as &quot;Find the sentences containing a relative clause which is interpreted as temporally overlapping the main clause&quot; can be processed.</Paragraph> <Paragraph position="5"> This query is encoded as a partially specified tree, as indicated below: In this structure, both the syntactic configuration of the relative clause and the temporal relations between the matrix verb and the speech time and between the matrix verb and the verb occurring in the relative clause are represented. Querying our temporally annotated treebank with this request yields the following result: The application to cross-linguistic research should be clear. It is now possible to use the annotated treebank as an informant by storing the linguistically relevant aspects of the temporal system of a language in a compact searchable database.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Aid for translation technology </SectionTitle> <Paragraph position="0"> Another potential application of the annotation system is as an aid to automatic translation systems. That the behaviour of tenses differs from language to language makes the translation of tenses difficult. 
In particular, the application of example-based techniques faces serious difficulties (Arnold et al. 1994). Adding the intended temporal relation to the database of source sentences makes it possible to mitigate this problem.</Paragraph> <Paragraph position="1"> For example, (9a) is properly translated into Japanese as (10a) on one reading, where the embedded past tense is translated as a present tense, but as (10b) on the other, where the verb is translated as a past tense.</Paragraph> <Paragraph position="2"> (9) a. Bernard said that Junko was sick (10) a. Bernard-wa Junko-ga byookida to it-ta lit: Bernard said Junko is sick b. Bernard-wa Junko-ga byookidata to it-ta.</Paragraph> <Paragraph position="3"> lit: Bernard said Junko was sick.</Paragraph> <Paragraph position="4"> Only the intended reading can distinguish these two translations. If this is encoded as part of the input, we can hope to achieve much more reasonable output.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Extracting cues for temporal interpretation </SectionTitle> <Paragraph position="0"> While we see this sort of cross-linguistic investigation as being of intrinsic interest, our real goal is the investigation of the lexical and grammatical cues for temporal interpretation. As already mentioned, the biggest problem is one of scale. Generating a temporally annotated treebank of the size needed is a serious undertaking.</Paragraph> <Paragraph position="1"> It would, of course, be of great help to be able to partially automate this task. To that end we are currently engaged in research attempting to use overt cues such as perfect marking and temporal conjunctions such as before and after to bootstrap our way towards a temporally annotated corpus. Briefly, the idea is to use these overt markers to tag a corpus directly and to use this to generate a table of lexical preferences. 
So, for example, sentence (11) can be tagged automatically, because of the presence of the perfect marking.</Paragraph> <Paragraph position="2"> the girl he had met This automatic tagging will allow us to assemble an initial data set of lexical preferences, such as the one that would appear to hold between kiss and meet. If this initial data is confirmed by comparison with hand-tagged data, we can use this information to automatically annotate a much larger corpus based on these lexical preferences. It may then be possible to begin to carry out the investigation of cues to temporal interpretation before we have constructed a large hand-coded temporally annotated treebank.</Paragraph> </Section> </Section> </Paper>