<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2019">
  <Title>eBonsai: An integrated environment for annotating treebanks</Title>
  <Section position="2" start_page="0" end_page="109" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Statistical approaches have been the mainstream of natural language processing research for the last decade. In particular, syntactically annotated corpora (treebanks), such as the Penn Treebank (Marcus et al., 1993), the Negra Corpus (Skut et al., 1997) and the EDR Corpus (Jap, 1994), have contributed to improving the performance of morpho-syntactic analysis systems. It is well known, however, that building a large treebank is labor-intensive and time-consuming work. In addition, it is quite difficult to maintain the quality and consistency of a large treebank. To remedy these problems, there have been many attempts to develop software tools for annotating treebanks (Plaehn and Brants, 2000; Bird et al., 2002).</Paragraph>
    <Paragraph position="1"> This paper presents an integrated environment for annotating treebanks, called eBonsai. Figure 1 shows a snapshot of eBonsai. eBonsai first performs syntactic analysis of a sentence using a parser based on the GLR algorithm (the MSLR parser) (Tanaka et al., 1993), and presents candidates for its syntactic structure. An annotator then chooses the correct structure from these candidates.</Paragraph>
    <Paragraph position="2"> When choosing the correct structure, the annotator can ask the system to retrieve similar sentences that have already been annotated, and use them to inform the current decision. The integration of annotation and retrieval is a significant feature of eBonsai.</Paragraph>
    <Paragraph position="3"> To realize the tight coupling of annotation and retrieval, eBonsai has been implemented as the following two plug-in modules of a universal tool platform, Eclipse (The Eclipse Foundation, 2001).</Paragraph>
    <Paragraph position="4"> + Annotation plug-in module: This module helps the annotator choose the correct syntactic structure from the candidate structures.</Paragraph>
    <Paragraph position="5"> + Retrieval plug-in module: This module retrieves sentences similar to the sentence in question from the already annotated sentences in the treebank.</Paragraph>
    <Paragraph position="6"> These two plug-in modules work cooperatively in the Eclipse framework. For example, information can be transferred easily between the two modules in a copy-and-paste manner. Furthermore, since they are implemented as Eclipse plug-in modules, their functionality can also interact with other plug-in modules and with native Eclipse features such as CVS.</Paragraph>
  </Section>
</Paper>