<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1519">
  <Title>Extracting Syntactic Features from a Korean Treebank</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In a Tree Adjoining Grammar, a feature structure is associated with each node in an elementary tree (Vijay-Shanker and Joshi, 1991). This feature structure contains information about how the node interacts with other nodes in the tree. It consists of a top part, which generally contains information relating to the super-node, and a bottom part, which generally contains information relating to the sub-node.</Paragraph>
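    <Paragraph>
The division of each node's feature structure into a top and a bottom part can be illustrated with a minimal Python sketch. It assumes a simplified model in which a feature structure is a flat mapping from feature names to atomic values. Real FB-LTAG feature structures can be nested and can share values, so this is only an approximation. The names `unify`, `TreeNode`, `finalize`, and `adjoin` are hypothetical and are not part of the system described in this paper. The adjunction step follows the usual FB-TAG convention. The top part of the adjunction site unifies with the top part of the auxiliary tree's root, and the bottom part of the site unifies with the bottom part of the auxiliary tree's foot node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical simplification: a flat map from feature names to atomic values
# (no nested structures, no shared/reentrant values).
FeatureStructure = Dict[str, str]


def unify(fs1: FeatureStructure, fs2: FeatureStructure) -> Optional[FeatureStructure]:
    """Unify two flat feature structures; return None if an atomic value clashes."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values: unification fails
        result[feature] = value
    return result


@dataclass
class TreeNode:
    """Hypothetical elementary-tree node carrying top and bottom feature structures."""
    label: str
    top: FeatureStructure = field(default_factory=dict)     # information from above the node
    bottom: FeatureStructure = field(default_factory=dict)  # information from below the node

    def finalize(self) -> Optional[FeatureStructure]:
        """When the derivation is complete, a node's top and bottom parts must unify."""
        return unify(self.top, self.bottom)


def adjoin(site: TreeNode, aux_root: TreeNode,
           aux_foot: TreeNode) -> Optional[Tuple[FeatureStructure, FeatureStructure]]:
    """Feature side of adjunction at `site`. Returns the new top of the auxiliary
    root and the new bottom of the foot node, or None if either unification fails."""
    new_root_top = unify(site.top, aux_root.top)
    new_foot_bottom = unify(site.bottom, aux_foot.bottom)
    if new_root_top is None or new_foot_bottom is None:
        return None
    return new_root_top, new_foot_bottom


vp = TreeNode("VP", top={"tense": "past"}, bottom={"mode": "ind"})
print(vp.finalize())  # {'tense': 'past', 'mode': 'ind'}
```

Each node keeps two feature structures until the derivation is complete. An adjunction can then occur between them: the auxiliary tree is inserted at the node, and the node's top and bottom parts are distributed to the auxiliary tree's root and foot node.
    </Paragraph>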
    <Paragraph position="1"> In this paper, we present a system that extracts syntactic feature structures from a treebank in order to develop a Feature-based Lexicalized Tree Adjoining Grammar (FB-LTAG). Several approaches to grammar extraction have been proposed, especially within the TAG formalism. Chen (2001) extracted lexicalized grammars from the English Penn Treebank. Other works build on Chen's procedure, such as Nasr (2004) for French and Habash and Rambow (2004) for Arabic. Xia et al. (2000) developed a uniform method of grammar extraction for English, Chinese, and Korean. Neumann (2003) extracted Lexicalized Tree Grammars from the English Penn Treebank for English and from the NEGRA Treebank for German.</Paragraph>
    <Paragraph position="2"> However, none of these works has attempted to extract syntactic features for FB-LTAG.</Paragraph>
    <Paragraph position="3"> We use the Sejong Treebank (SJTree), which contains 32,054 eojeols (the unit of segmentation in a Korean sentence) in 2,526 sentences. SJTree uses 43 part-of-speech tags and 55 syntactic tags (Sejong Project, 2003).</Paragraph>
  </Section>
</Paper>