<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-4009">
  <Title>WordFreak: An Open Tool for Linguistic Annotation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Components
</SectionTitle>
    <Paragraph position="0"> WordFreak is built from several types of components: two types of data visualization components, annotation scheme components that define the kind of annotation being performed, and automatic annotators or taggers. Each component type has a common interface, so adding a new component only requires implementing that interface. Additionally, WordFreak examines the environment in which it is run and gathers any components that implement one of these interfaces. This allows components to be added without recompiling the original source code.</Paragraph>
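    <Paragraph><![CDATA[ The gathering of components by interface can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not WordFreak's actual code: the interfaces Viewer and Tagger, the classes TextViewer and PosTagger, and ComponentRegistry are all hypothetical names, and loading classes by name through reflection merely stands in for however WordFreak scans its runtime environment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interfaces; the real WordFreak interfaces are not shown in the paper.
interface Viewer { String name(); }
interface Tagger { String name(); }

// Two example components, each implementing one of the interfaces.
class TextViewer implements Viewer { public String name() { return "text"; } }
class PosTagger implements Tagger { public String name() { return "pos"; } }

class ComponentRegistry {
    private final List<Object> components = new ArrayList<>();

    /** Instantiates a class by name; stands in for scanning the runtime environment. */
    void loadByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            components.add(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load component " + className, e);
        }
    }

    /** Returns every loaded component that implements the given interface. */
    <T> List<T> ofType(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object component : components) {
            if (type.isInstance(component)) {
                result.add(type.cast(component));
            }
        }
        return result;
    }
}

class PluginDemo {
    public static void main(String[] args) {
        ComponentRegistry registry = new ComponentRegistry();
        // No compile-time wiring between the registry and the components is needed.
        registry.loadByName(TextViewer.class.getName());
        registry.loadByName(PosTagger.class.getName());
        for (Viewer viewer : registry.ofType(Viewer.class)) {
            System.out.println("viewer: " + viewer.name());
        }
    }
}
```

Because the registry only knows the interfaces, a new component can be supplied as a compiled class and picked up without touching or recompiling the registry itself. ]]></Paragraph>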
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Visualization
</SectionTitle>
      <Paragraph position="0"> The visualization components are called Viewers and Choosers. Prototypically, the Viewer is where the user looks while performing the annotation. WordFreak currently contains four such Viewers, which display text, trees, a concordance, and tables, respectively. While particular viewers are better suited to certain tasks, multiple viewers can be used simultaneously. The viewers are displayed in a tabbed pane for easy access but can also be removed from it if the user wishes to see multiple views of the data simultaneously. The second type of visualization component is the Chooser. Choosers are typically used to display the choices that an annotator needs to make in a particular annotation scheme. Choosers are specific to an annotation scheme but are constructed from a set of reusable chooser components. For example, a typical chooser consists of a navigation component that allows the user to move through annotations, a buttons component parameterized to contain the names of the relationships being annotated, and a comment component that allows the user to make free-form comments about a particular annotation. Currently there are chooser components for the tasks described above, as well as tree representations, which have been used to display annotation choices for tasks such as coreference and word sense disambiguation.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Task Definitions
</SectionTitle>
      <Paragraph position="0"> Adapting WordFreak to new annotation tasks is a common need. This has led us to try to minimize the amount of new code that must be written for a new task definition. We have used a two-tiered approach to new task definitions.</Paragraph>
      <Paragraph position="1"> The first tier employs the inheritance mechanisms available in Java. To define a new task or annotation scheme, one simply subclasses an existing AnnotationScheme class, initializes the types of annotations the new task will be based on, defines the names of the relationships to be posited over these annotations, and specifies which chooser components to use to display this set of names. While many options can be customized, such as keyboard shortcuts, color assignments for particular relationships, and constraints on valid annotations, the defaults use the most likely settings, so a typical annotation scheme requires under 100 lines of well-delimited code. Annotation schemes that involve more complicated interactions, such as coreference and word sense disambiguation, have taken approximately 300 lines of task-specific code.</Paragraph>
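      <Paragraph><![CDATA[ The subclassing approach can be illustrated with a short Java sketch. Only the class name AnnotationScheme comes from the text; its methods, the abstract-base design, the default shortcut rule, and the NamedEntityScheme subclass are assumptions made for illustration and are not WordFreak's real API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: a task supplies three lists and inherits sensible defaults.
abstract class AnnotationScheme {
    /** Types of annotations the task is built on (for example tokens). */
    abstract List<String> annotationTypes();

    /** Names of the relationships posited over those annotations. */
    abstract List<String> relationNames();

    /** Chooser components used to display the relationship names. */
    abstract List<String> chooserComponents();

    /** Assumed default: bind the first nine relationships to the keys 1 to 9. */
    Map<Character, String> keyboardShortcuts() {
        Map<Character, String> shortcuts = new LinkedHashMap<>();
        List<String> names = relationNames();
        for (int i = 0; i < names.size() && i < 9; i++) {
            shortcuts.put((char) ('1' + i), names.get(i));
        }
        return shortcuts;
    }
}

// A new task overrides only what is specific to it and keeps the defaults.
class NamedEntityScheme extends AnnotationScheme {
    @Override
    List<String> annotationTypes() { return Arrays.asList("token"); }

    @Override
    List<String> relationNames() { return Arrays.asList("PERSON", "ORGANIZATION", "LOCATION"); }

    @Override
    List<String> chooserComponents() { return Arrays.asList("navigation", "buttons", "comment"); }
}

class SchemeDemo {
    public static void main(String[] args) {
        AnnotationScheme scheme = new NamedEntityScheme();
        System.out.println(scheme.keyboardShortcuts());
    }
}
```

In this sketch the subclass states only the task-specific lists, which mirrors the point that defaults keep a typical scheme short; a task with unusual key bindings would override keyboardShortcuts as well. ]]></Paragraph>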
      <Paragraph position="2"> The second mechanism, which is currently being developed, allows a task to be parameterized with an XML file. This can be applied when an annotation scheme similar to the new task has already been developed.</Paragraph>
      <Paragraph position="3"> At present we have used this mechanism to customize named-entity and coreference tasks that are similar to their corresponding MUC or ACE tasks. Likewise, this mechanism can be used to customize the tag sets used for different types of tree-banking tasks.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Automatic Annotators
</SectionTitle>
      <Paragraph position="0"> We have integrated a number of automatic annotators with WordFreak. These include sentence detectors, POS taggers, parsers, and coreference resolvers. The APIs these annotators implement allow them to optionally determine the order in which annotation choices are displayed to the user, as well as to provide a confidence measure with each annotation they determine automatically.</Paragraph>
      <Paragraph position="1"> The ordering mechanism is quite useful for tasks that have a large number of potential choices, such as POS tagging or coreference resolution, because the most likely choices can be displayed first. The confidence measure can be used for active learning or simply to assist in correcting the output of the automatic annotator. We are currently adapting open-source taggers to be used and distributed as plug-ins to WordFreak.</Paragraph>
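      <Paragraph><![CDATA[ The two uses of annotator output described above, ordering the displayed choices and flagging uncertain annotations, can be sketched in Java. All names here (Suggestion, AutomaticAnnotator, ChoiceRanker, ToyPosTagger) are hypothetical, the confidence values are invented, and deriving the display order from the confidence scores is an assumption of this sketch; the paper does not specify how its annotator APIs compute either.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A candidate label together with the annotator's confidence in it.
class Suggestion {
    final String label;
    final double confidence;

    Suggestion(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
}

// Hypothetical annotator interface: candidate labels for one item.
interface AutomaticAnnotator {
    List<Suggestion> suggest(String item);
}

class ChoiceRanker {
    /** Orders labels with the most confident first, as a chooser might display them. */
    static List<String> displayOrder(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(Comparator.comparingDouble((Suggestion s) -> s.confidence).reversed());
        List<String> labels = new ArrayList<>();
        for (Suggestion s : sorted) {
            labels.add(s.label);
        }
        return labels;
    }

    /** True when even the best candidate falls below the threshold. */
    static boolean needsReview(List<Suggestion> suggestions, double threshold) {
        double best = 0.0;
        for (Suggestion s : suggestions) {
            best = Math.max(best, s.confidence);
        }
        return best < threshold;
    }
}

// Toy annotator returning fixed, made-up scores for any input.
class ToyPosTagger implements AutomaticAnnotator {
    @Override
    public List<Suggestion> suggest(String item) {
        List<Suggestion> out = new ArrayList<>();
        out.add(new Suggestion("VB", 0.30));
        out.add(new Suggestion("NN", 0.55));
        out.add(new Suggestion("JJ", 0.15));
        return out;
    }
}

class AnnotatorDemo {
    public static void main(String[] args) {
        List<Suggestion> suggestions = new ToyPosTagger().suggest("run");
        System.out.println(ChoiceRanker.displayOrder(suggestions));
        System.out.println(ChoiceRanker.needsReview(suggestions, 0.6));
    }
}
```

A threshold check like needsReview is one simple way to pick items for human correction or for an active learning loop; the threshold value itself would have to be tuned per task. ]]></Paragraph>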
    </Section>
  </Section>
</Paper>