File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/w06-1519_metho.xml

Size: 6,349 bytes

Last Modified: 2025-10-06 14:10:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1519">
  <Title>Extracting Syntactic Features from a Korean Treebank</Title>
  <Section position="4" start_page="0" end_page="133" type="metho">
    <SectionTitle>
2 Extracting a Feature Structure for FB-LTAG
</SectionTitle>
    <Paragraph position="0"> FB-LTAG grammars use a reduced tagset because they encode their syntactic information in feature structures. For example, the NP_SBJ syntactic tag in LTAG is changed into NP, and a syntactic feature &lt;case=nominative&gt; is added. We therefore use a reduced tagset of 13 tags for FB-LTAG grammars, compared with 55 syntactic tags for an LTAG without features. From full-scale syntactic tags that end with _SBJ (subject), _OBJ (object) and _CMP (attribute), we extract &lt;case&gt; features, which describe the argument structure of the sentence.</Paragraph>
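The tag reduction described above can be sketched in Python as follows. This is a minimal illustration, not the extraction code itself: `reduce_tag` and `CASE_BY_SUFFIX` are hypothetical names, and only the _SBJ to nominative mapping comes from the text. The values shown for _OBJ and _CMP are assumed placeholders standing in for the conversion scheme of Table 2.

```python
# Hypothetical suffix-to-case table. Only "_SBJ" -> "nominative" is given in
# the text; the "_OBJ" and "_CMP" values are illustrative assumptions.
CASE_BY_SUFFIX = {
    "_SBJ": "nominative",
    "_OBJ": "accusative",  # assumption
    "_CMP": "complement",  # assumption: placeholder value
}


def reduce_tag(tag: str) -> tuple[str, dict[str, str]]:
    """Map a full-scale SJTree tag (e.g. "NP_SBJ") to a reduced tag plus features."""
    for suffix, case in CASE_BY_SUFFIX.items():
        if tag.endswith(suffix):
            # Strip the functional suffix and move its information into a case feature.
            return tag[: -len(suffix)], {"case": case}
    # Tags without an argument suffix keep their label and get no case feature.
    return tag, {}


print(reduce_tag("NP_SBJ"))  # ('NP', {'case': 'nominative'})
print(reduce_tag("VP"))      # ('VP', {})
```

Keeping the suffix table as data rather than code makes it easy to swap in the full conversion scheme.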
    <Paragraph position="1"> Alongside &lt;case&gt; features, we also extract &lt;mode&gt; and &lt;tense&gt; from the morphological analyses in SJTree. However, since SJTree simply divides the morphological analyses of verbal and adjectival endings into EP, EF and EC, which denote non-final, final and conjunctive endings respectively, &lt;mode&gt; and &lt;tense&gt; features cannot be extracted directly from SJTree. In this paper, we analyze the 7 non-final endings (EP) and 77 final endings (EF) used in SJTree in order to extract &lt;mode&gt; and &lt;tense&gt; features automatically. In general, EF carries &lt;mode&gt; inflections and EP carries &lt;tense&gt; inflections.</Paragraph>
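A minimal sketch of deriving &lt;mode&gt; and &lt;tense&gt; from the EP and EF endings, assuming each predicate is available as a list of (form, tag) morpheme pairs. The two table entries are taken from example (1) (eoss as past, da as declarative); the full EP and EF tables are not reproduced, and the stem tags NNG and XSV as well as the name `mode_and_tense` are assumptions.

```python
# Hypothetical fragments of the EP -> tense and EF -> mode tables. Only
# eoss -> past and da -> decl are attested (example (1)); the full analysis
# covers 7 non-final (EP) and 77 final (EF) endings.
EP_TO_TENSE = {"eoss": "past"}
EF_TO_MODE = {"da": "decl"}


def mode_and_tense(morphs: list[tuple[str, str]]) -> dict[str, str]:
    """Derive mode and tense features from the (form, tag) morphemes of one predicate."""
    feats = {}
    for form, tag in morphs:
        if tag == "EP" and form in EP_TO_TENSE:
            feats["tense"] = EP_TO_TENSE[form]  # EP carries tense
        elif tag == "EF" and form in EF_TO_MODE:
            feats["mode"] = EF_TO_MODE[form]    # EF carries mode
        # EC (conjunctive) endings contribute neither feature.
    return feats


# balpyo.ha.eoss.da 'announced'; NNG and XSV are assumed stem tags.
print(mode_and_tense([("balpyo", "NNG"), ("ha", "XSV"), ("eoss", "EP"), ("da", "EF")]))
# {'tense': 'past', 'mode': 'decl'}
```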
    <Paragraph position="2"> Conjunctive endings (EC) do not contribute to &lt;mode&gt; or &lt;tense&gt; features, so we only extract an &lt;ec&gt; feature with its string value. &lt;ef&gt; and &lt;ep&gt; features are also extracted with their string values. Some non-final endings, such as si, are extracted as &lt;hor&gt; features, which carry honorific meaning. In the extracted FB-LTAG grammars, we present lexical heads as bare infinitives, together with morphological features such as &lt;ep&gt;, &lt;ef&gt; and &lt;ec&gt; that correspond to their inflected forms.</Paragraph>
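The string-valued ending features and the bare-infinitive head can be illustrated with the following sketch, assuming each predicate is given as a list of (form, tag) morpheme pairs. The helper `lexical_entry` is hypothetical; treating every non-ending morpheme as part of the stem, and joining several non-honorific EPs with ".", are assumptions of this sketch rather than details from the text.

```python
# "si" is the honorific non-final ending named in the text; treating every
# other EP as part of the ep feature is an assumption of this sketch.
HONORIFIC_EP = {"si"}


def lexical_entry(morphs: list[tuple[str, str]]) -> tuple[str, dict[str, str]]:
    """Split a predicate into a bare stem and string-valued ending features."""
    stem, feats, eps = [], {}, []
    for form, tag in morphs:
        if tag == "EP":
            if form in HONORIFIC_EP:
                feats["hor"] = "+"
            else:
                eps.append(form)
        elif tag == "EF":
            feats["ef"] = form
        elif tag == "EC":
            feats["ec"] = form
        else:
            stem.append(form)  # assumption: all non-ending morphemes form the stem
    if eps:
        feats["ep"] = ".".join(eps)  # assumption: multiple EPs joined with "."
    return "".join(stem), feats


print(lexical_entry([("balpyo", "NNG"), ("ha", "XSV"), ("eoss", "EP"), ("da", "EF")]))
# ('balpyoha', {'ef': 'da', 'ep': 'eoss'})
```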
    <Paragraph position="3">  &lt;det&gt; is another automatically extractable feature in SJTree and it is extracted from both syntactic tag and morphological analysis unlike other extracted features. For example, while &lt;det=-&gt; is extracted from dependant nouns which always need modifiers (extracted by morphological analyses), &lt;det=+&gt; is extracted from _MOD phrases (extracted by syntactic tags).</Paragraph>
    <Paragraph position="4"> &lt;det=+&gt; is also extracted from the syntactic tag DP, which contains MMs (determinatives or demonstratives). See Table 1 for all the features that can be extracted from SJTree.</Paragraph>
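The two sources of &lt;det&gt; can be combined in a small decision function. This is a sketch under assumptions: `det_feature` is a hypothetical name, NNB is assumed to be the dependent-noun tag, and checking the morphological source before the syntactic ones is an arbitrary choice made for this sketch, since the text does not specify a precedence.

```python
from typing import Optional


def det_feature(syn_tag: str, morph_tags: list[str]) -> Optional[str]:
    """Return "+", "-" or None (no det feature) for one node."""
    # Morphological source: dependent nouns always need a modifier.
    if "NNB" in morph_tags:  # assumption: NNB is the dependent-noun tag
        return "-"
    # Syntactic sources: _MOD phrases, and DP phrases containing an MM.
    if syn_tag.endswith("_MOD"):
        return "+"
    if syn_tag == "DP" and "MM" in morph_tags:
        return "+"
    return None


print(det_feature("NP", ["NNB"]))      # -
print(det_feature("DP", ["MM"]))       # +
print(det_feature("NP_SBJ", ["NNG"]))  # None
```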
    <Paragraph position="5"> Table 1 (only partially recovered): ... of instantiating mode and tense string values like eoss, da, go, etc.; &lt;hor&gt;: honorific, values + / null.</Paragraph>
    <Paragraph position="6"> Unlike English, Korean does not need &lt;person&gt; or &lt;number&gt; features. Han et al. (2000) proposed several features for a Korean FB-LTAG that we do not use in this paper, such as &lt;advpp&gt;, &lt;top&gt; and &lt;aux-pp&gt; for nouns and &lt;clause-type&gt; for predicates. While postpositions are separated from the eojeol during our grammar extraction procedure, Han et al. treated them as inflectional morphology within a single noun phrase eojeol. &lt;aux-pp&gt; adds the semantic meaning of auxiliary postpositions such as 'only' and 'also', which we cannot extract automatically from SJTree or other Korean Treebank corpora, because syntactically annotated Treebanks generally do not contain such semantic information. &lt;top&gt; marks the presence or absence of a topic marker such as neun; however, topic markers are annotated like subjects in SJTree, so only &lt;case=nominative&gt; is extracted for them. &lt;clause-type&gt; indicates the type of a clause, with values such as main, coord(inative), subordi(native), adnom(inal), nominal and aux-connect. Since the distinction between clause types other than the main clause is very vague in Korean, we do not adopt this feature. Instead, &lt;ef&gt; is extracted if the clause is a main clause, and &lt;ec&gt; is extracted for the other clause types.</Paragraph>
  </Section>
  <Section position="5" start_page="133" end_page="134" type="metho">
    <SectionTitle>
3 Experiments
</SectionTitle>
    <Paragraph position="0"> The feature extraction procedure is implemented in two phases. In the first phase, we convert syntactic tags and morphological analyses into feature structures as explained above (see Table 2 for our conversion scheme for syntactic tags and Table 3 for morphological analyses). In the second phase, we complete the feature structures on the nodes of the "spine" (the path between the root and the anchor node in an initial tree, and the path between the root and the foot node in an auxiliary tree). For example, we copy the features of VV bottom in Figure 1a onto VV top, VP top/bottom and S bottom, because the nodes on the spine share certain features of VV bottom. The initial tree for the verb balpyoha.eoss.da ('announced') in (1) is thus completed as shown in Figure 1b for FB-LTAG.</Paragraph>
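As a rough illustration of the second phase, the sketch below handles only the initial-tree case, where the spine is the root-to-anchor path. The `Node` class, the helper names and the use of shared variable names in place of true coindexation are assumptions made for illustration; the variable names x, y, i and j follow Figure 1b. The anchor's bottom keeps its concrete values, every other spine node receives the shared variables, and the root's top is left empty, matching the VV top, VP top/bottom and S bottom placement described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    is_anchor: bool = False
    top: dict[str, str] = field(default_factory=dict)
    bottom: dict[str, str] = field(default_factory=dict)


# Variable names as in Figure 1b; sharing a name stands in for coindexation.
SHARED_VARS = {"ep": "x", "ef": "y", "mode": "i", "tense": "j"}


def path_to_anchor(node: Node) -> Optional[list[Node]]:
    """Return the root-to-anchor path (the spine of an initial tree)."""
    if node.is_anchor:
        return [node]
    for child in node.children:
        rest = path_to_anchor(child)
        if rest:
            return [node] + rest
    return None


def complete_spine(root: Node) -> None:
    """Copy the anchor's bottom features, as shared variables, along the spine."""
    spine = path_to_anchor(root) or []
    if not spine:
        return
    anchor = spine[-1]
    shared = {f: SHARED_VARS[f] for f in anchor.bottom if f in SHARED_VARS}
    for node in spine:
        if node is not root:    # root top stays empty
            node.top.update(shared)
        if node is not anchor:  # anchor bottom keeps its concrete values
            node.bottom.update(shared)


# Usage on a simplified tree for balpyo.ha.eoss.da.
vv = Node("VV", is_anchor=True,
          bottom={"ep": "eoss", "ef": "da", "mode": "decl", "tense": "past"})
vp = Node("VP", children=[Node("NP"), vv])
s = Node("S", children=[Node("NP"), vp])
complete_spine(s)
print(s.top, s.bottom)
print(vv.top, vv.bottom)
```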
    <Paragraph position="1"> (1) ilbon oemuseongeun jeuggag haemyeong seongmyeongeul balpyohaessda .
ilbon oimuseong.eun jeukgak haemyeng seongmyeng.eul balpyo.ha.eoss.da
Japan ministry_of_foreign_affairs.Nom immediately elucidation declaration.Acc announce.Past.Ter
'The ministry of foreign affairs in Japan immediately announced their elucidation'
Figure 1b (partially recovered): in the bottom (b:) feature structure of VV, &lt;ep&gt; = eoss, &lt;ef&gt; = da, &lt;mode&gt; = decl and &lt;tense&gt; = past; the other top (t:) and bottom (b:) feature structures on the spine carry &lt;ep&gt; = x, &lt;ef&gt; = y, &lt;mode&gt; = i, &lt;tense&gt; = j.</Paragraph>
    <Paragraph position="3"> Table 4 shows the results of our experiments in extracting feature-based lexicalized grammars. See Park (2006) for the detailed extraction scheme.</Paragraph>
  </Section>
</Paper>