<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1002"> <Title>ADAM: An Architecture for xml-based Dialogue Annotation on Multiple levels</Title> <Section position="3" start_page="0" end_page="9" type="metho"> <SectionTitle> 2 Corpus Description </SectionTitle> <Paragraph position="0"> The ADAM Corpus, which is currently being developed, will consist of 450 Italian dialogues, both human-human and human-machine, belonging to the tourist domain. The human-machine component of the corpus consists of telephone dialogues, which are recordings of actual interactions between customers and the Italian national railway information system (FS-Informa, (Bahia et al., 2000)).</Paragraph> <Paragraph position="1"> The human-human component consists of task-oriented dialogues between a person playing the role of a travel agent and another playing the role of a customer. Each dialogue in the corpus is represented by an orthographic transcription, recording the words uttered by the speakers plus any other non-linguistic sound. The transcription is linked to an audio file in PCM format. In addition, each dialogue is annotated according to five annotation levels, namely prosody, morphosyntax, syntax, semantics and pragmatics (see below).</Paragraph> </Section> <Section position="4" start_page="9" end_page="11" type="metho"> <SectionTitle> 3 ADAM: Architectural Principles </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="9" end_page="9" type="sub_section"> <SectionTitle> 3.1 Reusability Requirements </SectionTitle> <Paragraph position="0"> The ADAM approach is mainly driven by the need to meet the requirements of potential users of annotated corpora, with a particular emphasis on corpus reusability. 
An annotated corpus is reusable insofar as it complies with several requirements, such as corpus representativeness and use of standardized annotation schemes (see some widely renowned standardization efforts such as EAGLES, MATE, DRI, ATLAS, etc.). The physical format or mark-up language for corpus encoding is another crucial issue, as will be argued in section 3.4. In addition to this, we claim that an annotated corpus is useful beyond its immediate application aims only to the extent to which it is designed to meet two other important needs. A fundamental requirement concerns the way annotation is organized, structured and represented in a corpus. In short, it is essential that the annotation, i.e. the linguistic information added to the data, should be easily and quickly modifiable at a moderately low cost by subsequent users of the corpus. We can think of at least two possible scenarios, referring to two orthogonal dimensions of customization operations. First, it might be the case that a user wishes to reuse a corpus which is annotated for several types of linguistic information, but lacks a particular annotation type; the potential user could nevertheless be interested in the existing annotations, and would like to supplement them with a new one. On the other hand, it might be the case that a user is interested in some annotations only (e.g., pos-tagging or syntactic structure) and might want to leave aside other annotation types. Reusability of an annotated corpus can thus be thought of as a function of the extent to which new levels of linguistic information can be added, or uninteresting ones can be removed. This is what we call the vertical dimension of customization in annotated corpora. Second, for each level of linguistic analysis, an annotated corpus is likely to be reused depending on the extent to which existing annotation can be changed, so as to accommodate different annotation practices. 
It is often the case that a corpus which is annotated with a given annotation scheme &quot;hardwires&quot; the annotation so that it is impossible to replace the annotation without reverting to the raw text and rebuilding the annotation from scratch, which is enormously expensive.</Paragraph> <Paragraph position="1"> This is what we call the horizontal dimension of customization of an annotated corpus.</Paragraph> <Paragraph position="2"> The extent to which an annotated corpus can comply with these two requirements clearly depends on the architectural choices made at the design level: if, for instance, all types of annotation are flattened onto a single representation level, the customizing operations above become hardly feasible. We claim that the vertical and horizontal customization requirements can be easily met on the one hand by appealing to the two related notions of modularity of annotation and use of annotation meta-schemes, and on the other by exploiting a physical encoding format that fully supports them. In the next three sections we illustrate the two concepts as well as the way they are implemented in the ADAM Corpus.</Paragraph> </Section> <Section position="2" start_page="9" end_page="10" type="sub_section"> <SectionTitle> 3.2 Annotation Modularity </SectionTitle> <Paragraph position="0"> In an annotated corpus, several different types of annotation or linguistic information may be present in relation to the same input data. These types of information can be thought of as independent, yet related, levels or dimensions of linguistic description. We can thus think of a level of prosodic analysis, another of pos-tagging, another of semantic analysis, etc. By annotation modularity we mean that the different layers of annotation are to be kept independent of one another. In the ADAM Corpus we provide five layers of annotation: prosodic, morpho-syntactic (pos tagging), syntactic, conceptual-semantic, and pragmatic. 
Each dialogue in the corpus is annotated at the above five levels of analysis; synchronization among the different analyses and between these and the speech signal is ensured by the different annotations (stored as separate files) making reference to the same input file. This file, containing the transcription of the dialogue, is in turn linked to the audio file in PCM (A-law or mu-law) format. As will be argued in section 3.4, support for this structure is provided by the use of XML as mark-up language. By adopting this structure, annotation layers are linguistically heterogeneous and mutually orthogonal, so that changing one of them affects others only to a limited extent; layers are nevertheless indirectly related through a) their hinging on a common reference file (the &quot;raw&quot; text represented by the transcription file); b) the indirect correlation of the linguistic information they convey. This vertical modularity of the ADAM approach has interesting consequences for the purposes of reusability. A potential user of the ADAM Corpus is left free to select, among the proposed levels of annotation, those which best reflect his/her theoretical and practical interests. (S)he may also feel the need to add a new layer of information, not contemplated in today's ADAM realization. Incidentally, level modularity is also of theoretical interest, since most annotation schemes we know differ mainly in the way pieces of linguistic information are categorized, rather than in the intrinsic nature of these levels. Moreover, level modularity seems to have a useful impact on our theoretical understanding of the linguistic phenomena at stake, since it is capable of expressing correlations between layers, and ultimately between dimensions of linguistic analysis. 
</Paragraph> </Section> <Section position="3" start_page="10" end_page="11" type="sub_section"> <SectionTitle> 3.3 Annotation meta-schemes </SectionTitle> <Paragraph position="0"> Horizontal customization in annotated corpora can be enhanced by implementing the concept of annotation meta-schemes. The different layers of linguistic description implied by the concept of annotation modularity presuppose as many annotation schemes. As will be made clear in section 4, for each of the five annotation layers envisaged for the ADAM Corpus, a particular annotation scheme has been designed and applied.</Paragraph> <Paragraph position="1"> However, it should be emphasized that the ADAM specifications do not merely amount to another set of ready-made, off-the-shelf annotation schemes. Rather, we would like to focus attention on what we call an annotation meta-scheme, and on the implications of this choice. In our view, an annotation meta-scheme is a general descriptive framework in which different annotation schemes can be accommodated. In many cases the same unit of linguistic information can be annotated in different, arguably mutually incompatible ways, which are nonetheless all compatible with the recommended vertical modularity described above: so it is better to provide the potential user with the possibility of adopting any arbitrary annotation scheme without being forced to rebuild the annotation from scratch or to comply with some other annotation scheme, no matter how standardized. To do so, it is necessary to have a representation format for the annotation that is general enough for competing schemes to be mutually substitutable.</Paragraph> <Paragraph position="2"> In ADAM we achieve this aim by building a general scheme where those features that are common to several competing schemes become slots or descriptive element tags to be associated with linguistic elements; the values of these attributes can be any arbitrary set of tags. 
Let's consider, for instance, the case of pragmatic annotation. The main difference between annotation schemes for this level of analysis lies in the particular types of dialogue act chosen rather than in the notion of dialogue act itself, which appears to be uncontroversial. If, however, we adopt a scheme where the basic descriptive element of any arbitrarily long set of words is the general tag &lt;dialogue act&gt;, further described by an attribute type, different schemes can be applied to the same corpus without totally discarding the existing annotation: a substitution in the set of values will be enough. It is our belief that enforcing this practice in the design of annotation schemes will bring us to more effective corpora exchange and reuse (see footnote 3).</Paragraph> </Section> <Section position="4" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 3.4 XML-based Annotation </SectionTitle> <Paragraph position="0"> In fact, actual corpus reusability also crucially depends on the physical format or mark-up language used for corpus encoding. The mark-up language used for the encoding of the ADAM Corpus is XML. XML proved to be the ideal candidate for a number of reasons, all related to corpus reusability. First, it is an emerging and widespread standard, which ensures a good degree of corpus reusability in the future. Second, because of its platform-independence it enhances the potential for wide circulation of the annotated material, together with a considerable flexibility of use. More crucially, however, XML proved essential for implementation of the architectural choices described above. Annotation modularity is supported via extensive use of XLink elements (DeRose et al., 2000).</Paragraph> <Paragraph position="1"> Each XML element in the annotation files is actually a hypertextual link which refers to an element (or set of elements) in the transcription file. 
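As a minimal sketch of such a link, an annotation element might point to a single word of the transcription as shown in the following fragment; the element name, the attribute values and the file name are illustrative assumptions and are not taken from the ADAM DTDs.
<![CDATA[
<!-- Hypothetical stand-off element: "da_001", "info-request", "dial01_transcription.xml" and "w_012" are invented for illustration -->
<dialogue_act id="da_001" type="info-request"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="dial01_transcription.xml#w_012"/>
]]>
Under this assumption, replacing the value of the type attribute would leave the link to the transcription untouched.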
All annotations for each dialogue are thus connected to the same input reference source (the transcription), which ensures synchronization of the different annotations while preserving their independence. On the other hand, the concept of annotation meta-scheme is implemented by making the XML translation of the different annotation schemes content-independent. In other words, a general preference was given to representing the different annotation tags as values of generic, scheme-independent attributes of XML elements. In this way the different annotation schemes (represented as different DTDs) are encoded in a sufficiently generic way that a future user of the corpus will only need to change the values of the different attributes for the entire annotation scheme to be changed. We believe that this approach represents a further value of the ADAM Corpus.</Paragraph> <Paragraph position="2"> Footnote 3: In addition, the meta-scheme can be seen as a tool for effective comparison of alternative annotation schemes.</Paragraph> </Section> </Section> <Section position="5" start_page="11" end_page="12" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="11" end_page="12" type="sub_section"> <SectionTitle> 3.5 Previous and related work </SectionTitle> <Paragraph position="0"> Our work builds on some important standardization efforts which have been under way during the past few years in the field of dialogue annotation (DRI, EAGLES, and MATE). We are also indebted to the experience gained in other projects using stand-off XML annotation, and in particular to the MATE project. The multi-level markup framework adopted in ADAM closely reflects the MATE approach (Dybkjaer et al., 1998). In addition, in our project we are using the MATE workbench (Dybkjaer and Bernsen, 2000) for visualization and information extraction purposes. 
However, to the best of our knowledge ADAM is the first corpus whose architecture explicitly adopts the concepts of annotation modularity and meta-scheme at different levels. A recent standardization project in the annotation field is ATLAS (Bird et al., 2000), a consortium including NIST, LDC and MITRE.</Paragraph> <Paragraph position="1"> The ATLAS architecture is based on a formal model for annotating linguistic data (Bird and Liberman, 1999). ATLAS offers a three-layer solution to the problem of integrating different data storage formats by providing a logical level which consists of the language formalism and the API. The architecture we are proposing for the ADAM corpus is not a software architecture such as the one implemented by ATLAS. While the latter meets the requirement of flexible and dynamic extension of the software modules, the ADAM architecture mainly refers to the functional organization of the different annotation layers. In ADAM the flexibility requirement concerns the possible extensions of those layers. In addition, the ATLAS architecture covers a large variety of possibly annotated data (not only linguistic data, but visual data of different kinds too), while ADAM is focused only on linguistic and speech annotations.</Paragraph> </Section> </Section> <Section position="6" start_page="12" end_page="14" type="metho"> <SectionTitle> 4 Levels of Annotation </SectionTitle> <Paragraph position="0"> ADAM's five levels of annotation were mainly chosen in consideration of their interest for practical applications of the annotated material. In spite of the number of levels considered, and their sometimes conflicting requirements, we tried to develop a coherent, unitary approach to the design and application of annotation schemes. 
In particular, in developing the different annotation schemes for the five levels envisaged, care was taken to satisfy criteria of robustness, wide coverage and compliance with existing standards.</Paragraph> <Section position="1" start_page="12" end_page="12" type="sub_section"> <SectionTitle> 4.1 The prosodic level </SectionTitle> <Paragraph position="0"> For the annotation of the prosodic phenomena of dialogue we are adopting the meta-scheme for prosody annotation developed by Quazza and Garrido (Klein et al., 1998) within the MATE project. This meta-scheme allows the prosodic phenomena of natural dialogue to be annotated by distinguishing four sub-levels of prosodic annotation, two phonetic and two phonological. The four levels do not represent a fixed hierarchy. The two phonetic levels are directly aligned with the speech signal and in this sense may be considered as base levels. The two phonological levels keep a natural relationship both with the base prosodic levels and with other linguistic units. In the actual use of the scheme, the levels and their links can be fully or partially specified. In a linguistic text-oriented analysis, prosody could be considered in its function, leaving out the details of its realization. In this case, only the phonological levels may be filled and linked to the orthographic level of words. Complex schemes like ToBI could be used in this way, or simpler schemes providing labels to distinguish types of accents, associated with words, and types of intonation boundaries. In a speech technology context, a more signal-oriented approach could be adopted. In order to recognize or synthesize prosodic patterns, detailed phonetic descriptions are necessary, requiring both phonetic segmentation and phonetic representation of intonation - in terms of pitch movements or target f0 levels.</Paragraph> <Paragraph position="1"> The ADAM prosodic annotation is done at the level of prosodic phrasing. 
The third section of Appendix B reports an excerpt of the prosodic annotation file of a human-human dialogue of the corpus. These are four dialogue turns whose text is translated in the first section of the Appendix. The element breakindex encodes the ToBI labels, which constitute the values of the attribute type. For example, the element brkndx_O07 is annotated with type label 3p to mark up a hesitation pause.</Paragraph> </Section> <Section position="2" start_page="12" end_page="13" type="sub_section"> <SectionTitle> 4.2 The Morpho-Syntactic and Syntactic Levels </SectionTitle> <Paragraph position="0"> The ADAM proposal for the morphosyntactic level is a two-layer annotation structure, containing respectively information on word category and morphosyntactic features (pos tagging), and non-recursive phrasal nuclei (called chunks). Robustness and coverage were crucial aspects in the development of the two schemes, in particular with regard to i) syntactic constructions specific to spoken dialogues (ellipses, anacolutha, non-verbal predicative sentences etc.), and ii) disfluencies (repetitions, false starts, trailing off etc.). The morphosyntactic annotation level encodes the following information: a) identification of morphological words and linking to their corresponding orthographic counterparts; b) annotation of their pos-category; c) annotation of morphosyntactic features (such as number, gender, person, tense, etc.); d) annotation of their corresponding lemma. The particular tag set, though adapted to the representation of Italian, is compliant with EAGLES recommendations (Gibbon, 1999). In addition, the tag set is structured into a core scheme, supplying basic means for annotating morphological information, and a periphery tag set, which makes provision for further linguistic annotation to be added to obligatory information. 
The syntactic annotation level is built on top of the previous one and consists of the identification of non-recursive phrasal nuclei (called chunks) and annotation of their category (Mengel et al., 1999). The preference given to shallow parsing over, e.g., phrase structure trees is chiefly motivated by the locality of the analysis offered by this approach, a useful feature if one wants to prevent a local parsing failure from causing the entire parse of an utterance to fail. This is particularly desirable when dealing with noisy and fragmented input such as spoken dialogue transcripts. For an illustration of morphosyntactic and syntactic annotation, see examples 4 and 5 in Appendix B.</Paragraph> </Section> <Section position="3" start_page="13" end_page="13" type="sub_section"> <SectionTitle> 4.3 The Conceptual Level </SectionTitle> <Paragraph position="0"> The annotation scheme for the conceptual level has been designed on the following requirements and assumptions: * portability: although most concepts encode strictly domain-dependent information, the annotation scheme should be as domain-independent as possible; * expressiveness: the scheme should allow the representation of the content of complex dialogues; * minimality: a turn should be annotated in a unique way; * simplicity: the syntactic complexity of the concept is to be minimized; * locality: the annotation should not take into account the history of the dialogue.</Paragraph> <Paragraph position="1"> The proposed annotation scheme takes inspiration from the so-called &quot;Frame-based Description Languages&quot; (Cattoni and Franconi, 1990), a well-established framework in the field of Knowledge Representation. In our annotation scheme a concept is encoded like a &quot;frame&quot;, a typed structure with &quot;slots&quot;. Slots represent the properties of the concept and its relations with other concepts. 
Slots are encoded with the couple &lt;slot-name, slot-value&gt;: the former contains the name of a property, the latter either a simple value or a reference to another concept. This recursion allows the encoding of complex and structured semantic information. Concepts are typed: different types of concepts (e.g. &quot;trip&quot;, &quot;room&quot;) encode different contents to be represented. For example, given the sentence to be annotated &quot;the train leaves from rome at eight o'clock of monday fifteen&quot;, in its conceptual annotation the concept c_001 of type trip has three slots; the slot representing the departure-time encodes a reference (introduced by the character '*') to the other concept c_002 of type time.</Paragraph> <Paragraph position="2"> The annotation scheme is domain-independent: the tag set does not change when the domain changes since the domain-dependent information is encoded in the values of the attributes. The user is free to adopt the preferred ontology, although a good reference is the set of symbols adopted by the C-STAR consortium (Waibel, 1996) for the inter-lingua: they have been developed on the basis of experience with six different (Asian and European) languages and this appears to guarantee good portability across languages.</Paragraph> </Section> <Section position="4" start_page="13" end_page="14" type="sub_section"> <SectionTitle> 4.4 The Pragmatic Level </SectionTitle> <Paragraph position="0"> For annotating the pragmatic level of dialogue, we base our work on the concept of dialogue act. Informally speaking, a dialogue act tag is a label belonging to a tag set which refers to a given illocutionary dimension that may be performed by uttering a sentence. A dialogue utterance may be annotated with a dialogue act label representing the discourse function it plays in the dialogue. 
The annotation scheme used for the pragmatic level of the ADAM corpus is an extension of both DAMSL (Core and Allen, 1997) and SWITCHBOARD-DAMSL (Jurafsky et al., 1997). The extension was motivated not by the domain but by the dependency on the dialogue type. Indeed, most dialogue acts encode information that is strictly dependent on whether the communication is task-oriented, familiar, formal, and so on. So the inventory of dialogue act labels should be sufficiently wide to cover different types of dialogues, and sufficiently open to allow new dialogue act labels to be added for different annotation tasks. For the design of the extended tag set we have identified the following requirements and assumptions: * minimality: an utterance should be tagged with a unique dialogue act label; * context-sensitiveness: each turn is managed by considering the previous turns, that is the annotation should take into account the history of the dialogue.</Paragraph> <Paragraph position="1"> The tag set used in the corpus is reported in Table 1 (see the Appendix). In Appendix B, Section 7 the four dialogue turns translated at the beginning of the Appendix are annotated by using the ADAM pragmatic tag set. Each dialogue turn is annotated as a whole by tagging the communicative level of the turn itself (if it is about the task or about managing the task, for example). Within the turn the different communicative intentions are labeled on the basis of the dialogue act tag set.</Paragraph> </Section> </Section> </Paper>