<?xml version="1.0" standalone="yes"?> <Paper uid="N06-4006"> <Title>Knowtator: A Protege plug-in for annotated corpus construction</Title> <Section position="3" start_page="1" end_page="1" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> A Protege knowledge-base typically consists of class, instance, slot, and facet frames. Class definitions represent the concepts of a domain and are organized in a subsumption hierarchy. Instances correspond to individuals of a class. Slots define properties of a class or instance and relationships between classes or instances. Facets constrain the values that slots can have.</Paragraph> <Paragraph position="1"> Protege has garnered widespread usage by providing an architecture that facilitates the creation of third-party plug-ins such as visualization tools and inference engines. Knowtator has been implemented as a Protege plug-in and runs in the Protege environment. In Knowtator, an annotation schema is defined with Protege class, instance, slot, and facet definitions using the Protege knowledge-base editing functionality. The defined annotation schema can then be applied to a text annotation task without having to write any task-specific software or edit specialized configuration files. Annotation schemas in Knowtator can model both syntactic phenomena (e.g. shallow parses) and semantic phenomena (e.g. protein-protein interactions).</Paragraph> </Section> <Section position="4" start_page="1" end_page="273" type="metho"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> There exists a plethora of manual text annotation tools for creating annotated corpora. While it has been common for individual research groups to build customized annotation tools for their specific annotation tasks, several text annotation tools have emerged in the last few years that can be employed to accomplish a wide variety of annotation tasks. [Figure 1: Simple co-reference annotations in Knowtator] 
Each of the better general-purpose annotation tools is distributed with a limited number of annotation tasks that can be used 'out of the box.' Many of the tasks that are provided can be customized to a limited extent to suit the requirements of a user's annotation task via configuration files. In Callisto, for example, a simple annotation schema can be defined with an XML DTD; the resulting schema is essentially a tag set augmented with simple (e.g. string) attributes for each tag. In addition to configuration files, WordFreak provides a plug-in architecture for creating task-specific code modules that can be integrated into the user interface.</Paragraph> <Paragraph position="1"> A complex annotation schema might include hierarchical relationships between annotation types and constrained relationships between the types.</Paragraph> <Paragraph position="2"> Creating such an annotation schema can be a formidable challenge for the available tools, either because configuration options are too limiting or because implementing a new plug-in is too expensive or time-consuming. [Footnote URL: http://mmax.eml-research.de/]</Paragraph> </Section> <Section position="5" start_page="273" end_page="274" type="metho"> <SectionTitle> 3 Implementation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="273" end_page="274" type="sub_section"> <SectionTitle> 3.1 Annotation schema </SectionTitle> <Paragraph position="0"> Knowtator approaches the definition of an annotation schema as a knowledge engineering task by leveraging Protege's strengths as a knowledge-base editor. Protege has user interface components for defining class, instance, slot, and facet frames.</Paragraph> <Paragraph position="1"> A Knowtator annotation schema is created by defining frames using these user interface components, as a knowledge engineer would when creating a conceptual model of some domain. 
For Knowtator, the frame definitions model the phenomena that the annotation task seeks to capture.</Paragraph> <Paragraph position="2"> As a simple example, the co-reference annotation task that comes with Callisto can be modeled in Protege with two class definitions called markable and chain. The chain class has two slots, references and primary_reference, which are constrained by facets to have values of type markable. This simple annotation schema can now be used to annotate co-reference phenomena occurring in text using Knowtator. Annotations in Knowtator created using this simple annotation schema are shown in Figure 1.</Paragraph> <Paragraph position="3"> A key strength of Knowtator is its ability to relate annotations to each other via the slot definitions of the corresponding annotated classes. In the co-reference example, the slot references of the class chain relates the markable annotations for the text extents 'the cat' and 'It' to the chain annotation. The constraints on the slots ensure that the relationships between annotations are consistent.</Paragraph> <Paragraph position="4"> Protege is capable of representing much more sophisticated and complex conceptual models, which can be used, in turn, by Knowtator for text annotation. Also, because Protege is often used to create conceptual models of domains relating to biomedical disciplines, Knowtator is especially well suited for capturing named entities and relations between named entities for those domains.</Paragraph> </Section> <Section position="2" start_page="274" end_page="274" type="sub_section"> <SectionTitle> 3.2 Features </SectionTitle> <Paragraph position="0"> In addition to its flexible annotation schema definition capabilities, Knowtator has many other features that are useful for executing text annotation projects. A consensus set creation mode allows one to create a gold standard using annotations from multiple annotators. 
First, annotations from multiple annotators are aggregated into a single Knowtator annotation project. Annotations that represent agreement between the annotators are then consolidated, so that further human review focuses on disagreements between annotators.</Paragraph> <Paragraph position="1"> Inter-annotator agreement (IAA) metrics provide descriptive reports of consistency between two or more annotators. Several different match criteria (i.e. what counts as agreement between multiple annotations) have been implemented.</Paragraph> <Paragraph position="2"> Each gives a different perspective on how well annotators agree with each other and can be useful for uncovering systematic differences. IAA can also be calculated for selected annotation types, giving very fine-grained analysis data.</Paragraph> <Paragraph position="3"> Knowtator provides a pluggable infrastructure for handling different kinds of text source types.</Paragraph> <Paragraph position="4"> By implementing a simple interface, one can annotate any kind of text (e.g. from XML or a relational database) with a modest amount of coding.</Paragraph> <Paragraph position="5"> Knowtator provides stand-off annotation, so the original text that is being annotated is not modified. Annotation data can be exported to a simple XML format.</Paragraph> <Paragraph position="6"> Annotation filters can be used to view a subset of available annotations. This is useful, for example, for viewing only named entity annotations in an annotation project that also contains many part-of-speech annotations. Filters are also used to focus IAA analysis and the export of annotations to XML.</Paragraph> <Paragraph position="7"> Knowtator can be run as a stand-alone system (e.g. 
on a laptop) without a network connection.</Paragraph> <Paragraph position="8"> For increased scalability, Knowtator can be used with a relational database backend (via JDBC).</Paragraph> <Paragraph position="9"> Knowtator and Protege are provided under the Mozilla Public License 1.1 and are freely available with source code at http://bionlp.sourceforge.net/Knowtator and http://protege.stanford.edu, respectively. Both applications are implemented in the Java programming language and have been successfully deployed and used in the Windows, MacOS, and Linux environments.</Paragraph> </Section> </Section> </Paper>