File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/01/p01-1040_evalu.xml
Size: 3,566 bytes
Last Modified: 2025-10-06 13:58:47
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1040"> <Title>A Common Framework for Syntactic Annotation</Title> <Section position="9" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> Despite its seeming complexity, the XCES framework is designed to reduce overhead for annotators and users. Part of the work of the XCES is to provide XML support (e.g., development of XSLT scripts, XML schemas, etc.) for use by the research community, thus eliminating the need for XML expertise at each development site. Because XMLencoded annotated corpora are increasingly used for interchange between processing and analytic tools, we are developing XSLT scripts for mapping, and extraction of annotated data, import/export of (partially) annotated material, and integration of results of external tools into existing annotated data in XML. Tools for editing annotations in the abstract format, which automatically generate virtual AML from Data Category and Dialect Specifications, are already under development in the context of work on the Terminological Markup Language, and a tool for automatically generating RDF specifications for user-specified data categories has already been developed in the SALT project.9 Several freely distributed interpreters for XSLT have also been developed (e.g., xt10, Xalan11). In practice, annotators and users of annotated corpora will rarely see XML and RDF instantiations of annotated data; rather, they will access the data via interfaces that automatically generate, interpret, and display the data in easy-to-read formats.</Paragraph> <Paragraph position="1"> The abstract model that captures the fundamental properties of syntactic annotation schemes provides a conceptual tool for assessing the coherence and consistency of existing schemes and those being developed.</Paragraph> <Paragraph position="2"> The model enforces clear distinctions between implicit and explicit information (e.g., functional relations implied by structural relations in constituent analyses), and phrasal and functional relations. It is alarmingly common for annotation schemes to represent these different kinds of information in the same way, rendering their distinction computationally intractable (even if they are perfectly understandable by the informed human reader). Hand-developed annotation schemes used in treebanks are often described informally in guidebooks for annotators, leaving considerable room for variation; for example, Charniak (1996) notes that the PTB implicitly contains more than 10,000 context-free rules, most of which are used only once. Comparison and transduction of schemes becomes virtually impossible under such circumstances. While requiring that annotators make relations explicit and consider the mapping to the XCES abstract format increases overhead, we feel that the exercise will help avoid such problems and can only lead to greater coherence, consistency, and inter-operability among annotation schemes.</Paragraph> <Paragraph position="3"> The most important contribution to inter-operability of annotation schemes is the Data Category Registry. By mapping site-specific categories onto definitions in the Registry, equivalences (and non-equivalences) are made explicit. Again, the provision of a standard set of categories, together with the requirement that scheme-specific categories are mapped to them where possible, will contribute to greater consistency and commonality among annotation schemes.</Paragraph> </Section> class="xml-element"></Paper>