<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2041"> <Title>Tree Annotation Tool using Two-phase Parsing to Reduce Manual Effort for Building a Treebank</Title> <Section position="3" start_page="0" end_page="238" type="intro"> <SectionTitle> 2 Previous Works </SectionTitle> <Paragraph position="0"> To date, several approaches have been developed to reduce the manual effort of building a treebank. They can be classified into approaches that use heuristics (Hindle, 1989; Chang et al., 1997) and approaches that use rules extracted from an already built treebank (Kwak et al., 2001; Lim et al., 2004).</Paragraph> <Paragraph position="1"> The heuristic-based approaches were used for the Penn Treebank (Marcus et al., 1993) and the KAIST language resource (Lee et al., 1997; Choi, 2001).</Paragraph> <Paragraph position="2"> Given a sentence, these approaches try to assign an unambiguous partial syntactic structure to a segment of the sentence based on the heuristics.</Paragraph> <Paragraph position="3"> The heuristics are written by grammarians and are therefore highly reliable (Hindle, 1989; Chang et al., 1997). However, it is difficult to modify the heuristics or to change the features used to construct them (Lim et al., 2004).</Paragraph> <Paragraph position="4"> The rule-based approaches were used for the SEJONG treebank (Kim and Kang, 2002). Like the heuristic-based approaches, they try to attach partial syntactic structures to each sentence, but according to rules that are automatically extracted from an already built treebank. Therefore, the extracted rules can be updated whenever the annotator wants (Kwak et al., 2001; Lim et al., 2004). 
Nevertheless, they reduce manual effort and improve annotation efficiency only to a limited extent, because the extracted rules are less reliable than the heuristics.</Paragraph> <Paragraph position="5"> In this paper, we propose a tree annotation tool that uses a parser, shifting the responsibility for extracting reliable syntactic rules to the parser. The parser can always be replaced with another parser. However, most parsers still tend to perform poorly on long sentences (Li et al., 1990; Doi et al., 1993; Kim et al., 2000). Moreover, one reason for the decrease in parsing performance is that an initial syntactic error in a word or phrase propagates to the whole syntactic structure.</Paragraph> <Paragraph position="6"> To prevent initial errors from propagating without modifying the parser, the proposed tool requires the annotator to segment a sentence. It then performs two-phase parsing for the intra-structure of each segment and for the inter-structure. Parsing methods that use clause-based segmentation have been studied to improve parsing performance and reduce parsing complexity (Kim et al., 2000; Lyon and Dickerson, 1997; Sang and Dejean, 2001). However, clause-based segmentation can split a short sentence into even shorter segments unnecessarily, and overly short segments increase the manual effort of building a treebank.</Paragraph> <Paragraph position="7"> To minimize manual effort, the proposed tree annotation tool guides the annotator to segment a sentence according to a few heuristics verified by experimentally analyzing an already built treebank. Consequently, the heuristics may favor a specific length unit over linguistic units such as phrases and clauses.</Paragraph> </Section> </Paper>