File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/i05-2041_metho.xml

Size: 12,491 bytes

Last Modified: 2025-10-06 14:09:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2041">
  <Title>Tree Annotation Tool using Two-phase Parsing to Reduce Manual Effort for BuildingaTreebank</Title>
  <Section position="4" start_page="238" end_page="239" type="metho">
    <SectionTitle>
3 Tree Annotation Tool
</SectionTitle>
    <Paragraph position="0"> The tree annotation tool is composed of segmentation, tree annotation for intra-structure, and tree annotation for inter-structure as shown in Figure1.</Paragraph>
    <Section position="1" start_page="238" end_page="239" type="sub_section">
      <SectionTitle>
3.1 Sentence Segmentation
</SectionTitle>
      <Paragraph position="0"> The sentence segmentation consists of three steps: segmentation step, examination step, and cancellation step. In the segmentation step, the annotator segments a long sentence. In the examination step, the tree annotation tool checks the length of each segment. In the cancellation step, it induces the annotator to merge the adjacent short segments by cancelling some brackets. Given a short sentence, the tree annotation tool skips over the sentence segmentation.</Paragraph>
      <Paragraph position="1"> As shown in the top of Figure 2, the annotator segments a sentence by clicking the button &amp;quot;)(&amp;quot; between each pair of words. Since the tool regards segments as too short given the segment length 9, it provides the cancel buttons. And then, the annotator merges some segments by clicking the third button, the fifth button, and the sixth button of the middle figure. The bottom figure does not include the fourth button of the middle figure because a new segment will be longer than the segment length 9. When every segment is suitable, the annotator exclusively clicks the confirm button ignoring the cancel buttons.</Paragraph>
    </Section>
    <Section position="2" start_page="239" end_page="239" type="sub_section">
      <SectionTitle>
3.2 Tree Annotation for Intra-Structure
</SectionTitle>
      <Paragraph position="0"> The tree annotation for intra-structure consists of three steps: generation step, cancellation step, and reconstruction step. In the generation step, the parser generates a candidate intra-structure for each segment of a sentence. And then, the tree annotation tool shows the annotator the candidate intra-structure. In the cancellation step, the annotator can cancel some incorrect constituents in the candidate intra-structure. In the reconstruction step, the annotator reconstructs the correct constituents to complete a correct intra-structure.</Paragraph>
      <Paragraph position="1"> For example, the tree annotation tool shows the candidate intra-structure of a segment as shown in the top of Figure 3. Assuming that the candidate intra-structure includes two incorrect constituents, the annotator cancels the incorrect constituents after checking the candidate intra-structure as represented in the middle figure. And then, the annotator reconstructs the correct constituent, and the intra-structure is completed as described in the bottom of Figure 3.</Paragraph>
    </Section>
    <Section position="3" start_page="239" end_page="239" type="sub_section">
      <SectionTitle>
3.3 Tree Annotation for Inter-Structure
</SectionTitle>
      <Paragraph position="0"> The tree annotation for inter-structure also consists of three steps: generation step, cancellation step, and reconstruction step. In the generation step, the parser generates a candidate inter-structure based on the given correct intrastructures. And then, the tree annotation tool shows an annotator the candidate syntactic structure which includes both the intra-structures and the inter-structure. In the cancellation step, an  annotator can cancel incorrect constituents. In the reconstruction step, the annotator reconstructs correct constituents to complete a correct syntactic structure.</Paragraph>
      <Paragraph position="1"> For example, the tree annotation tool represents the candidate syntactic structure of a sentence as illustrated in the top of Figure 4. Assuming that the candidate syntactic structure includes two incorrect constituents, the annotator cancels the incorrect constituents, and reconstructs the correct constituent. Finally, the intra-structure is completed as described in the bottom of Figure 3.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="239" end_page="242" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> In order to examine how much the proposed tree annotation tool reduces manual effort for building a Korean treebank, it is integrated with a parser (Park et al., 2004), and then it is evaluated on the test sentences according to the following criteria. The segment length (Length) indicates that a segment of a sentence is splitted when it  is longer than the given segment length. Therefore, the annotator can skip the sentence segmentation when a sentence is shorter than the segment length. The number of segments (#Segments) indicates the number of segments splitted by the annotator. The number of cancellations (#Cancellations) indicates the number of incorrect constituents cancelled by the annotator where the incorrect constituents are generated by the parser. The number of reconstructions (#Reconstructions) indicates the number of constituents reconstructed by the annotator. Assume that the annotators are so skillful that they do not cancel their decision unnecessarily. On the other hand, the test set includes 3,000 Korean sentences which never have been used for training the parser in a Korean treebank, and the sentences are the part of the treebank converted from the KAIST language resource (Choi, 2001). In this section, we analyze the parsing performance and the reduction effect of manual effort according to segment length for the purpose of finding the best segment length to minimize manual effort.</Paragraph>
    <Section position="1" start_page="240" end_page="240" type="sub_section">
      <SectionTitle>
4.1 Parsing Performance According to
Segment Length
</SectionTitle>
      <Paragraph position="0"> For the purpose of evaluating the parsing performance given the correct segments, we classify the constituents in the syntactic structures into the constituents in the intra-structures of segments and the constituents in the inter-structures.</Paragraph>
      <Paragraph position="1"> Besides, we evaluate each classified constituents based on labeled precision, labeled recall, and distribution ratio. The labeled precision (LP) indicates the ratio of correct candidate constituents from candidate constituents generated by the parser, and the labeled recall (LR) indicates the ratio of correct candidate constituents from constituents in the treebank (Goodman, 1996). Also, the distribution ratio (Ratio) indicates the distribution ratio of constituents in the intra-structures from all of constituents in the original structure.</Paragraph>
      <Paragraph position="2"> Table 1 shows that the distribution ratio of the constituents in the intra-structures increases according to the longer segment length while the distribution ratio of the constituents in the inter-structures decreases. Given the segment length 1, the constituents in the inter-structures of a sentence are the same as the constituents of the sen- null tence because there is no evaluated constituent on intra-structure of one word. In the same way, the constituents in the intra-structures of a sentence are the same as the constituents of the sentence given the segment length 28 because all test sentences are shorter than 28 words.</Paragraph>
      <Paragraph position="3"> As described in Table 1, the labeled precision and the labeled recall decrease on the intra-structures according to the longer segment length because both the parsing complexity and the parsing ambiguity increase more and more. On the other hand, the labeled precision and the labeled recall tend to increase on the inter-structures since the number of constituents in the inter-structures decrease, and it makes the parsing problem of the inter-structure easy. It is remarkable that the labeled precision and the labeled recall get relatively high performance on the inter-structures given the segment length less than 2 because it is so easy that the parser generates the syntactic structure for 3 or 4 words.</Paragraph>
    </Section>
    <Section position="2" start_page="240" end_page="241" type="sub_section">
      <SectionTitle>
4.2 Reduction Effect of Manual Effort
</SectionTitle>
      <Paragraph position="0"> In order to examine the reduction effect of manual effort according to segment length, we measure the frequency of the annotator's intervention on the proposed tool, and also classify the frequency into the frequency related to the intra-structure and the frequency related to the inter-structure.</Paragraph>
      <Paragraph position="1">  As shown in Figure 5, the total manual effort includes the number of segments splitted by the annotator, the number of constituents cancelled by the annotator, and the number of constituents reconstructed by the annotator.</Paragraph>
      <Paragraph position="2"> Figure 5 shows that too short segments can aggravate the total manual effort since the short segments require too excessive segmentation work. According to longer length, the manual effort on the intra-structures increase because the number of the constituents increase and the labeled precision and the labeled recall of the parser decrease. On the other hand, the manual effort on the inter-structures decreases according to longer length on account of the opposite reasons. As presented in Figure 5, the total manual effort is reduced best at the segment length 9. It describes that the annotation's intervention related to cancellation and reconstruction remarkably decrease although it requires the annotator to segment some sentences.</Paragraph>
      <Paragraph position="3"> Also, Figure 5 describes that we hardly expect the effect of two-phase parsing based on the long segments while the short segments require too excessive segmentation work.</Paragraph>
    </Section>
    <Section position="3" start_page="241" end_page="242" type="sub_section">
      <SectionTitle>
4.3 Comparison with Other Methods
</SectionTitle>
      <Paragraph position="0"> In this experiment, we compare the manual effort of the four methods: the manual tree annotation tool (only human), the tree annotation tool using the parser (no segmentation), the tree annotation tool using the parser with the clause-based sentence segmentation (clause-based segmentation), and the tree annotation tool using the parser with the length-based sentence segmentation (length-based segmentation) where the segment length is 9 words.</Paragraph>
      <Paragraph position="1"> As shown in Figure 6, the first method (only  human) does not need any manual effort related to segmentation and cancellation but it requires too expensive reconstruction cost. The second method (no segmentation) requires manual effort related to cancellation and reconstruction without segmentation. As compared with the first method, the rest three methods using the parser can reduce manual effort by roughly 50% although the parser generates some incorrect constituents. Furthermore, the experimental results of the third and fourth methods shows that the two-phase parsing methods with sentence segmentation can reduce manual effort more about 9.4% and 24.5% each as compared with the second method.</Paragraph>
      <Paragraph position="2"> Now, we compare the third method (clause-based segmentation) and the fourth method (length-based segmentation) in more detail. As represented in Figure 6, the third method is somewhat better than the fourth method on manual effort related to intra-structure. It show that the parser generates more correct constituents given the clause-based segments because the intra-structure of the clause-based segments is more formalized than the intra-structure of the length-based segments. However, the third method is remarkably worse than the fourth method on manual effort related to inter-structure. It describes that the third method can split a short sentence into shorter segments unnecessarily since the third method allows the segment length covers a wide range. As already described in Figure 5, too short segments can aggravate the manual effort. Finally, the experimental results shows that the length-based segments help the tree annotation tool to reduce manual effort rather than the clause-based segments.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML