<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1903">
  <Title>Ontology-based linguistic annotation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Ontology-based linguistic annotation framework
</SectionTitle>
    <Paragraph position="0"> An ontology is a formal specification of a conceptualization (Gruber, 1993). A conceptualization can be understood as an abstract representation of the world or domain we want to model for a certain purpose.</Paragraph>
    <Paragraph position="1"> The ontological model underlying this work is essentially that of (Bozsak et al., 2002). According to this model, an ontology is defined as follows: Definition 1 (Ontology) An ontology is a structure $O := (C, \leq_C, R, \leq_R)$ consisting of (i) two disjoint sets $C$ and $R$ called concept identifiers and relation identifiers respectively, (ii) a partial order $\leq_C$ on $C$ called concept hierarchy or taxonomy, (iii) a function $\sigma: R \rightarrow C \times C$ called signature and (iv) a partial order $\leq_R$ on $R$ called relation hierarchy.</Paragraph>
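    <Paragraph> As an illustration only, the structure of Definition 1 could be sketched in Python as follows. The class name, the field names and the toy concepts are assumptions made for this sketch and are not part of the framework. The sketch further assumes that every concept has at most one direct parent, whereas a partial order in general also admits multiple parents.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy version of the structure in Definition 1 (all names are illustrative)."""
    concepts: set                                          # concept identifiers
    relations: set                                         # relation identifiers
    concept_parent: dict = field(default_factory=dict)    # direct edges of the concept hierarchy
    signature: dict = field(default_factory=dict)         # relation -> (domain concept, range concept)
    relation_parent: dict = field(default_factory=dict)   # direct edges of the relation hierarchy

    def __post_init__(self):
        # Definition 1 requires the two identifier sets to be disjoint.
        if not self.concepts.isdisjoint(self.relations):
            raise ValueError("concept and relation identifiers must be disjoint")

    def is_subconcept(self, child, ancestor):
        """True if child equals ancestor or lies below it in the taxonomy."""
        current = child
        while current is not None:
            if current == ancestor:
                return True
            current = self.concept_parent.get(current)
        return False


onto = Ontology(
    concepts={"eventuality", "event", "state", "entity"},
    relations={"anaphoric_relation", "identity"},
    concept_parent={"event": "eventuality", "state": "eventuality"},
    signature={"identity": ("entity", "entity")},
    relation_parent={"identity": "anaphoric_relation"},
)
print(onto.is_subconcept("event", "eventuality"))   # True
print(onto.is_subconcept("entity", "eventuality"))  # False
```

Storing only the direct edges keeps the sketch short; the partial order itself is recovered by following the parent links upwards.</Paragraph>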
    <Paragraph position="2"> In addition, the underlying ontological model also allows axioms to be defined: Definition 2 ($\mathcal{L}$-Axiom System) Let $\mathcal{L}$ be a logical language. An $\mathcal{L}$-axiom system for an ontology $O$ as defined above is a pair $(AI, \alpha)$ where (i) $AI$ is a set whose elements are called axiom identifiers and (ii) $\alpha: AI \rightarrow \mathcal{L}$ is a mapping. The elements of $A := \alpha(AI)$ are called axioms.</Paragraph>
    <Paragraph position="3"> In our ontological framework, a relation $r$ can for example be defined as symmetric, i.e. SYM($r$). If F-logic (Kifer et al., 1995) is used as the underlying logical language, as in (Staab and Mädche, 2000), the translation of the SYM axiom identifier is as follows:</Paragraph>
    <Paragraph position="5"> In addition, we will also distinguish a special type of relation, which we will call attributes. These are relations with a plain datatype as range, i.e. relations with signatures of the type</Paragraph>
    <Paragraph position="7"> where the range is a plain datatype such as a string, an integer, etc.</Paragraph>
    <Paragraph position="8"> Our framework essentially offers three ways of annotating a text with regard to an ontology:
• a linguistic expression appearing in a text can be annotated as an instance of a certain ontological concept $c \in C$
• a linguistic expression in a text can be annotated as an attribute instance of some other linguistic expression previously annotated as a certain concept $c \in C$
• the semantic relation between two linguistic expressions annotated as instances of two concepts $c_1, c_2 \in C$ respectively can be annotated as an instance of a relation $r \in R$ if $\sigma(r) = (c_1, c_2)$
The advantages of an ontology-based linguistic annotation framework as described above are the following:
• The formalization of the annotation scheme as an ontology, as well as the use of standard formalisms such as RDF (Lassila and Swick, 1999) or OWL to encode it, allows the scheme to be reused across different annotation tools. This meets the interoperability requirement mentioned in (Ide, 2002).</Paragraph>
    <Paragraph position="9"> • The specification of the annotation task, i.e. the annotation scheme, can be performed in an arbitrary ontology development environment and thus becomes completely independent of the annotation tool actually used.</Paragraph>
    <Paragraph position="10"> • The ontology-based linguistic annotation model offers the kind of flexibility mentioned in (Ide, 2002) in the sense that it is general enough to be applied to a broad variety of annotation tasks.</Paragraph>
    <Paragraph position="11"> • The fact that annotation is performed with respect to an ontological hierarchy allows annotators to choose the appropriate level of annotation detail, so that they are never forced to overspecify, i.e. to annotate more specifically than they actually feel comfortable with.
In addition, a hierarchical annotation offers further possibilities regarding the computation of the agreement between different annotators as well as the evaluation of a system against a certain annotation. Instead of measuring only the categorial agreement between annotators with the kappa statistic (Carletta, 1996), or the performance of a system in terms of precision/recall, we could take into account the hierarchical organization of the categories or concepts by making use of measures that consider the 'hierarchical distance' between two concepts, such as those proposed by (Hahn and Schnattinger, 1998) or (Mädche et al., 2002). Furthermore, the use of an ontology-based and thus more semantic framework for linguistic annotation has two further, very interesting properties. On the one hand, the use of an ontology helps to constrain the possible relations between two concepts, thus reducing the number of errors in the annotation process. For example, when annotating Coreference relations in a text, it seems obvious that an event and an entity will never be coreferring, and such an erroneous annotation can in fact be avoided if the underlying ontological model forbids it (see below). On the other hand, by using axioms such as those described above, for example stating that Coreference is reflexive, symmetric and transitive and thus an equivalence relation, the evaluation of systems becomes much easier and more straightforward when using an inference machine such as (Decker et al., 1999).
If an annotator, for example, annotates the coreferences Coreference(A,B) and Coreference(B,C), a system's answer such as Coreference(A,C) will be counted as correct, because Coreference is defined as a transitive relation within the ontology.</Paragraph>
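    <Paragraph> The effect of treating Coreference as an equivalence relation during evaluation can be illustrated with a minimal Python sketch. It is not the F-Logic inference engine referred to above; it replaces logical inference by a simple union-find grouping, and the function names are assumptions made for this illustration.

```python
def equivalence_classes(pairs):
    """Group mentions under the reflexive, symmetric and transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return {x: find(x) for x in list(parent)}


def entailed(gold_pairs, a, b):
    """True if Coreference(a, b) follows from the annotated pairs."""
    if a == b:
        return True  # reflexivity
    classes = equivalence_classes(gold_pairs)
    return a in classes and b in classes and classes[a] == classes[b]


gold = [("A", "B"), ("B", "C")]
print(entailed(gold, "A", "C"))  # True
print(entailed(gold, "A", "D"))  # False
```

A system answer is then counted as correct whenever it is entailed by the gold annotation, even if the annotator never marked that pair explicitly.</Paragraph>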
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Annotating anaphoric relations
</SectionTitle>
    <Paragraph position="0"> Before showing how our framework can be applied to the annotation of anaphoric relations in written texts, we have to explain the assumptions underlying our model. First, we aim at a more semantic annotation of anaphoric relations than, for example, that described in (Müller and Strube, 2001), because we think that such a model can to some extent be subsumed by the one we propose. In fact, we will understand the term anaphoric in a much wider sense, in line with (Krahmer and Piwek, 2000) and (van Deemter and Kibble, 2000). They argue, for example, that coreference is not a necessary property of anaphora, as is assumed in (Müller and Strube, 2001). So annotating the relation between two expressions as anaphoric will correspond to the most general relation in our hierarchy. In particular, in our model Identity or Coreference will only be a special type of anaphoric relation (compare figure 2).</Paragraph>
    <Paragraph position="1"> On the other hand, bridging will be defined in our framework in line with (Asher and Lascarides, 1999) as "the inference that two objects or events that are introduced in a text are related in a particular way that isn't explicitly stated". Thus Coreference or Identity can represent an anaphoric relation or, more specifically, a bridging reference, depending on whether the identity relation is explicit or not. Consider the following minimal pair:
(2) John bought a car yesterday. The car was in a good state.</Paragraph>
    <Paragraph position="2"> (3) John bought a car yesterday. The vehicle was in a good state.</Paragraph>
    <Paragraph position="3"> In example (2), the anaphoric relation is explicit due to the matching heads of the NPs a car and The car. In (3), the anaphoric or bridging relation is not explicit, as world knowledge, such as that cars are vehicles, is needed to resolve the reference. In the semantics-based model for the annotation of anaphoric relations we propose in this paper, both examples will be annotated as instances of the Coreference or Identity relation. Consequently, we will completely omit the concept bridging reference in the ontology underlying the annotation. In fact, we claim that the classification of an anaphor as a bridging reference, direct anaphora, pronominal anaphora, etc., as pursued in (Müller and Strube, 2001), can be seen as a byproduct of a more semantic classification as proposed here, provided that additional grammatical information supplied by the annotators is available. This grammatical information can be added to the concepts depicted in figure 2 in the form of attributes specifying the grammatical form of the expression, i.e. whether it is for example a noun, an NP, a pronoun, a verb or a VP, as well as information about its head, gender or tense. The semantic classification proposed here, together with the grammatical information modeled as attributes of a concept, will then yield a classification as envisioned by (Müller and Strube, 2001). For example, if two expressions are annotated as coreferring, this semantic relation can be further classified as pronominal anaphora if the referring expression is a pronoun, as direct anaphora if the heads of the expressions match, or as a bridging reference otherwise. On the other hand, all the Non-Identity relations modeled in the ontology underlying the annotation task will lead to a classification as a bridging reference (compare figure 2). However, it should be mentioned that we do not aim at such a 'grammatical' classification of anaphoric relations.
We envision a task as in (Asher and Lascarides, 1999), where bridging reference resolution corresponds to the task of finding the discourse referent serving as antecedent as well as the semantic relation between this discourse referent and that of the referring expression.</Paragraph>
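    <Paragraph> The derivation of a 'grammatical' category from the semantic relation and the grammatical attributes, as described above, can be written down as a small decision procedure. The following Python sketch is only an illustration: the function name, the argument names and the string labels are assumptions and do not stem from any existing tool.

```python
def classify(relation, referring_form, referring_head, antecedent_head):
    """Map a semantic relation plus grammatical attributes to a surface category."""
    if relation != "identity":
        return "bridging reference"      # every Non-Identity relation
    if referring_form == "pronoun":
        return "pronominal anaphora"
    if referring_head == antecedent_head:
        return "direct anaphora"         # matching heads, as in example (2)
    return "bridging reference"          # identity that needs world knowledge, as in (3)


print(classify("identity", "NP", "car", "car"))       # direct anaphora
print(classify("identity", "NP", "vehicle", "car"))   # bridging reference
print(classify("identity", "pronoun", "it", "car"))   # pronominal anaphora
print(classify("part_of", "NP", "engine", "car"))     # bridging reference
```

The first two calls correspond to examples (2) and (3); the remaining inputs are invented for the sketch.</Paragraph>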
    <Paragraph position="4"> In our model, an expression can be the antecedent of more than one referring expression, an assumption which seems to be shared by many annotation schemes. However, in our model a certain expression can also refer to more than one antecedent. (Poesio and Reyle, 2001), for instance, show that the antecedent of a referring expression can be ambiguous in a way that does not affect the overall interpretation of the expression or sentence. Furthermore, (Poesio and Reyle, 2001) argue that it is not clear whether the addressees of an utterance are actually aware of all the possible antecedents of a certain referring expression, whether they underspecify the antecedent of a referring expression in case the overall interpretation is not affected, or whether they just choose one of the possible antecedents without being aware of the other ones. In any case, a model for the annotation of anaphoric or bridging relations should not exclude a priori that referring expressions can have more than one antecedent. Consequently, the annotation of the semantic relation between a referring expression and an antecedent can take place neither at the antecedent nor at the referring expression, as in (Müller and Strube, 2001), but only in a functional way, i.e. at a virtual edge between them.</Paragraph>
    <Paragraph position="5"> The ontology underlying our annotation scheme is depicted schematically in figure 1. We distinguish two types of eventualities, events and states, and model the discourse relations described in (Lascarides and Asher, 1991) as semantic relations between them. In addition, we distinguish between three types of (meta-) entities: sets of entities, intensional entities (van Deemter and Kibble, 2000) and (real-world) entities, together with the potential relations between them, such as member of, part of, etc., as well as relations to other types: an entity, for example, can play a certain thematic role in some event (compare figure 1).</Paragraph>
    <Paragraph position="6"> With such a concept hierarchy as well as semantic relations with a precisely defined signature, we can for example overcome the annotation problems of intensionality and predication discussed in (van Deemter and Kibble, 2000). In order to profit from the benefits of a hierarchical annotation, we also define a hierarchy on the semantic relations (see figure 2). Thus, if annotators feel that there is an anaphoric relation between two linguistic expressions but cannot specify the type of relation, they can choose the most general relation in the hierarchy, i.e. anaphoric relation. As mentioned in section 2, the idea is that annotators are never forced to overspecify and can annotate at the hierarchical level they feel comfortable with.</Paragraph>
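    <Paragraph> How a hierarchy on the semantic relations could enter the comparison of two annotations is sketched below in Python. The sketch uses a plain edge count via the lowest common ancestor; it is not the measure of any of the works cited in section 2. The relation names are assumptions, since the hierarchy of figure 2 is not reproduced here, and each relation is assumed to have at most one parent.

```python
def ancestors(node, parent):
    """Return the node followed by all nodes above it, most specific first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain


def path_distance(a, b, parent):
    """Number of edges between a and b via their lowest common ancestor."""
    chain_a, chain_b = ancestors(a, parent), ancestors(b, parent)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    return None  # no common ancestor


relation_parent = {
    "identity": "anaphoric_relation",
    "non_identity": "anaphoric_relation",
    "part_of": "non_identity",
    "member_of": "non_identity",
}
print(path_distance("part_of", "member_of", relation_parent))           # 2
print(path_distance("part_of", "anaphoric_relation", relation_parent))  # 2
print(path_distance("identity", "part_of", relation_parent))            # 3
```

Under such a measure, an annotator who chooses the general relation and one who chooses a specific subrelation disagree less than two annotators who choose relations from different branches.</Paragraph>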
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 CREAM and OntoMat
</SectionTitle>
    <Paragraph position="0"> CREAM is an annotation and authoring framework and OntoMat-Annotizer (OntoMat for short) is its concrete implementation. The framework itself was developed for the creation of ontology-based annotation in the context of the Semantic Web. Its main objective is thus the transformation of existing syntactic resources (viz. textual documents) into interlinked knowledge structures that represent relevant underlying information (Handschuh et al., 2001).</Paragraph>
    <Paragraph position="2"> However, with an appropriate ontology one can also take advantage of the framework and use it for linguistic annotation. In the subsequent section we will explain only the features that are relevant to this purpose.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 CREAM Features
</SectionTitle>
      <Paragraph position="0"> OntoMat's document viewer visualizes the document contents. The user can easily provide annotations by selecting pieces of text and aligning them with parts of the ontology. The document viewer supports various formats (HTML, PDF, XML, etc.). (Footnote 3: HTML/XHTML and plain text. Support for PDF is in development.) The Ontology and Fact Browser is the visual interface to the ontology and the annotated facts. The annotation framework needs guidance from the ontology.</Paragraph>
      <Paragraph position="1"> In order to allow for sharing of knowledge, newly created annotations must be consistent with a given ontology. Otherwise, if annotators instantiate arbitrary classes and properties, the semantics of these properties remains void and the annotation is thus useless. Both the Ontology and Fact Browser and the document editor/viewer are intuitive to use: drag'n'drop helps to avoid syntax errors and typos, and a good visualization of the ontology helps the annotators to choose the most appropriate class for an instance (compare figure 3).</Paragraph>
      <Paragraph position="2"> An annotation in our context is a set of instantiations of classes, relationships and attributes. These instances are not directly embedded into the text, but point to appropriate fragments of the document. The link between the annotation and the document is established by using XPointer (DeRose et al., 2001) as an addressing mechanism. This has some advantages with regard to the flexibility of annotation, as it allows (i) multiple annotation, (ii) nested annotation and (iii) overlapping annotation of text segments.
The annotation inference server reasons on the instances and on the ontology. In doing so, it also takes into account the axioms modeled within the ontology and can thus be used in the evaluation of a system as described in section 2. We use Ontobroker's F-Logic-based inference engine (Decker et al., 1999) as annotation inference server. The F-Logic inference engine combines ordering-independent reasoning in a high-level logical language with a well-founded semantics.</Paragraph>
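      <Paragraph> The idea of annotations that point into the document instead of being embedded in it can be illustrated with the following Python sketch. It deliberately does not use XPointer: plain character offsets serve as a stand-in for the addressing mechanism, and the class and function names are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int     # offset of the first character (stand-in for an XPointer)
    end: int       # offset one past the last character
    concept: str   # ontology class that the fragment instantiates


def fragment(text, ann):
    """Resolve a stand-off annotation against the unchanged document text."""
    return text[ann.start:ann.end]


def overlap(a, b):
    """True if two annotated spans share at least one character."""
    return min(a.end, b.end) > max(a.start, b.start)


text = "John bought a car yesterday."
anns = [
    Annotation(12, 17, "entity"),  # "a car"
    Annotation(5, 17, "event"),    # "bought a car", which contains the entity span
]
for ann in anns:
    print(ann.concept, repr(fragment(text, ann)))
print(overlap(anns[0], anns[1]))  # True
```

Because the text itself is never modified, any number of nested or overlapping annotations can refer to the same fragment.</Paragraph>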
      <Paragraph position="3">  CREAM supports different ways of storing the annotation. This flexibility is provided by the XPointer technique, which allows the annotation to be separated from the document. Hence, the annotations can be stored together with the document. Alternatively or simultaneously, it is also possible to store them remotely, either in a separate file or in the annotation inference server.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Annotating anaphoric relations
</SectionTitle>
      <Paragraph position="0"> The ontology described in section 3 is available in the form of DAML+OIL classes and properties, in OWL, as pure RDF-Schema and in F-Logic. In the following, we briefly explain how OntoMat can be used for the creation of instances consistent with the ontology described in section 3.</Paragraph>
      <Paragraph position="1"> Figure 3 shows the screen for navigating the ontology and creating annotations in OntoMat. The right pane displays the document and the left pane shows the ontological structures contained in the ontology, namely classes, attributes and relations. In addition, the left pane shows the current semantic annotation knowledge base, i.e. existing class instances, attribute instances and relationship instances created during the semantic annotation. First of all, the user opens a document by entering the URL of the web document that he would like to annotate. Then he loads the corresponding ontology into the ontology browser. He selects a text fragment by highlighting it. There are two possibilities for the text fragment to be annotated: as an instance or as a relation. In the case of an instance, the user selects in the ontology the class to which the text fragment belongs, e.g. for the expression "a car" in example (2), he would select the class entity. By clicking on the class, the annotation is created and the text fragment is displayed as an instance of the selected class in the ontology browser. The relationships between the created instances can then be specified, e.g. the entity The car can be annotated as coreferring with the preceding entity a car as described in section 2. For this purpose, when the user selects a certain class instance as well as a corresponding semantic relation from the ontology, OntoMat presents only the possible target class instances according to the range restrictions of the chosen relation. In this way, erroneous annotations of relations are avoided (compare section 2). Furthermore, literal attributes can be assigned to every created instance by typing them into the related attribute field. The choice of the predefined attributes depends on the class the instance belongs to.
In this way, instances of a certain concept can be annotated with grammatical information about how they are linguistically expressed, i.e. through an NP, a noun, a pronoun, a verb, etc. (compare section 3).</Paragraph>
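      <Paragraph> The way range restrictions narrow down the admissible targets of a relation can be sketched in Python as follows. This is not OntoMat code: the function names, the relation names and their signatures are assumptions made for this illustration, and each concept is assumed to have at most one parent.

```python
def subsumed(concept, ancestor, parent):
    """True if concept equals ancestor or lies below it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent.get(concept)
    return False


def candidate_targets(relation, source, instances, signature, parent):
    """List the instances that may serve as target of the relation from source."""
    domain, range_ = signature[relation]
    if not subsumed(instances[source], domain, parent):
        return []  # the relation cannot start at this instance
    return sorted(
        name for name, concept in instances.items()
        if name != source and subsumed(concept, range_, parent)
    )


parent = {"event": "eventuality", "state": "eventuality"}
signature = {"identity": ("entity", "entity"), "thematic_role": ("entity", "event")}
instances = {"a car": "entity", "The car": "entity", "bought": "event"}
print(candidate_targets("identity", "The car", instances, signature, parent))       # ['a car']
print(candidate_targets("thematic_role", "The car", instances, signature, parent))  # ['bought']
```

With such a filter, an event can never be offered as the target of an identity relation that starts at an entity.</Paragraph>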
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Discussion of Related Work
</SectionTitle>
    <Paragraph position="0"> A vast number of frameworks and tools have been developed for the purpose of linguistic annotation.</Paragraph>
    <Paragraph position="1"> However, in this paper we will focus on the discussion of frameworks for the annotation of anaphoric or discourse relations in written texts. In the annotation scheme proposed by (Müller and Strube, 2001) in the context of their annotation tool MMAX, and in contrast to the one proposed in this paper, anaphoric relations are restricted to coreferring expressions, while bridging relations are restricted to non-coreferring ones. In line with (Krahmer and Piwek, 2000) and (van Deemter and Kibble, 2000), this is in our view too strict a definition of anaphora, so we propose a more relation-based classification of anaphoric and bridging relations. Furthermore, in (Müller and Strube, 2001), anaphoric relations are further differentiated according to the lexical items taking part in the relation. We have shown that, under the assumption that the corresponding grammatical information is provided by the annotators, such a classification can be seen as a byproduct of a more semantic one such as outlined in this paper. In addition, (Müller and Strube, 2001) propose to specify antecedence with regard to equivalence classes rather than with regard to particular antecedents.</Paragraph>
    <Paragraph position="2"> However, this has the disadvantage that the information about the actual antecedent an annotator has selected is actually lost. Thus in our annotation proposal the fact that the Coreference relation forms equivalence classes is modeled by an underlying axiom system which can be exploited in the evaluation of a system against the annotation standard.</Paragraph>
    <Paragraph position="3"> The annotation scheme proposed by Poesio et al. (Poesio and Vieira, 1998) is a product of a corpus-based analysis of definite description (DD) use showing that more than 50% of the DDs in their corpus are discourse new or unfamiliar. Thus in Poesio et al.'s annotation scheme definite descriptions are also explicitly annotated as discourse new.</Paragraph>
    <Paragraph position="5"> The MUC coreference scheme (Hirschman and Chinchor, 1997) is restricted to the annotation of coreference relations, where coreference is also defined as an equivalence relation. Though this annotation scheme may seem quite simple, we agree with (Hirschman and Chinchor, 1997) that it is complex enough when taking into account the agreement of the annotators on a task. In fact, it has been shown that the agreement of subjects annotating bridging (Poesio and Vieira, 1998) or discourse (Cimiano, 2003) relations can be too low for even tentative conclusions to be drawn (Carletta, 1996). The motivation of the MUC coreference scheme was thus to develop an annotation scheme leading to good agreement.</Paragraph>
    <Paragraph position="6"> On the other hand, our motivation is to show how our ontology-based framework can be applied to the annotation of anaphoric relations in written texts and from this perspective the MUC coreference annotation scheme would have been in fact too restricted to actually show all the advantages of our approach.</Paragraph>
    <Paragraph position="7"> The UCREL (Fligelstone, 1992) and DRAMA (Passonneau, 1996) annotation schemes are more closely related to ours than the schemes above in the sense that they also provide a rich set of particular bridging relations that can be annotated. However, in contrast to the ontology-based framework presented in this paper, these bridging relations are not constrained with regard to the conceptual types of their arguments, so that erroneous annotations cannot be avoided.</Paragraph>
    <Paragraph position="8"> The coreference annotation scheme proposed within the MATE Workbench project consists of a core as well as an extended scheme (Davies et al., 1998).</Paragraph>
    <Paragraph position="9"> The core scheme is in principle identical with the MUC coreference scheme and is restricted to the annotation of coreference in the sense of (van Deemter and Kibble, 2000). The extended scheme also allows the annotation of bound anaphors, of the relationship between a function and its values, of different set, part and possession relations, of instantiation relations as well as of event relations. The MATE scheme is related to our ontology-based annotation scheme in the sense that relations are also annotated as triples via the link-tag (Davies et al., 1998). Like our framework, the MATE scheme allows ambiguities of reference to be marked up. However, in contrast to the MATE scheme, our framework has no means to specify a preference order on these ambiguous antecedents. On the other hand, the MATE scheme also includes a reasonable and complete taxonomy of markables as well as some features relevant for the annotation of coreference in dialogues, such as the treatment of hesitations, disfluencies and misunderstandings.</Paragraph>
  </Section>
</Paper>