<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2146">
  <Title>Structural disambiguation of morpho-syntactic categorial parsing for Korean *</Title>
  <Section position="3" start_page="0" end_page="1002" type="metho">
    <SectionTitle>
2 Overview of KCCG
</SectionTitle>
    <Paragraph position="0"> This section briefly reviews the basic KCCG formalism. Following (Steedman, 1985), order-preserving type-raising rules are used to convert nouns in the grammar into functors over a verb. The following rules are obligatorily activated during parsing when case-marking morphemes attach to noun stems.</Paragraph>
    <Paragraph position="2"> This rule indicates that a noun in the presence of a case morpheme becomes a functor looking for a verb on its right; this verb is also a functor looking for the original noun with the appropriate case on its left. After the noun functor combines with the appropriate verb, the result is a functor looking for the remaining arguments of the verb. 'v' is a variable for a verb phrase at any level, e.g., the verb of a matrix clause or the verb of an embedded clause, and 'v' is matched to all of the &amp;quot;v\[X\]\Args&amp;quot; patterns of the verb categories. Since all case-marked nouns in Korean occur in front of the verb, we do not need to employ the directional rules introduced by (Hoffman, 1995).</Paragraph>
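The type-raising rule above can be illustrated with a minimal sketch. The string representation of categories and the example morphemes are invented for illustration and are not the authors' implementation.

```python
# Toy sketch of the order-preserving type-raising rule: a noun stem plus a
# case-marking morpheme becomes the forward-looking functor v/(v\np[case]),
# i.e. it seeks a verb on its right that itself seeks an np with that case
# on its left. Category notation here is a hypothetical string encoding.

def type_raise(noun_stem, case):
    """Return the raised category for a case-marked noun."""
    # v/(v\np[case]): a functor over verb phrases at any level
    return "v/(v\\np[%s])" % case

# 'sae' (bird) with a nominative case morpheme (illustrative example)
print(type_raise("sae", "nom"))  # -> v/(v\np[nom])
```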
    <Paragraph position="3"> We extend the combinatory rules for uncurried functions as follows. The sets indicated by braces in these rules are order-free.</Paragraph>
    <Paragraph position="5"> Using these rules, a verb can apply to its arguments in any order, or, as in most cases, the case-marked noun phrases, which are type-raised functors, can apply to the appropriate verbs.</Paragraph>
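The order-free behaviour of the braces can be sketched by modelling a verb's outstanding arguments as a set, so that case-marked noun phrases may attach in any order. The data shapes and example cases are assumptions for illustration only.

```python
# Sketch of set-valued (order-free) application: the braces in the combinatory
# rules are modeled as a Python set of outstanding case-marked arguments.

def apply_arg(verb_args, np_case):
    """Combine a verb still seeking `verb_args` with an np of case `np_case`,
    returning the set of arguments that remain outstanding."""
    if np_case not in verb_args:
        raise ValueError("verb does not seek an np[%s]" % np_case)
    remaining = set(verb_args)
    remaining.discard(np_case)
    return remaining

# a transitive verb seeks {nom, acc}; the arguments may attach in either order
after_acc = apply_arg({"nom", "acc"}, "acc")
print(apply_arg(after_acc, "nom"))  # -> set(): a saturated clause
```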
    <Paragraph position="6"> Coordination constructions are modified to allow two type-raised noun phrases that are looking for the same verb to combine together. Since noun phrases, or a noun phrase and an adverb phrase, are functors, the following composition rules combine two functors with set-valued arguments.</Paragraph>
    <Paragraph position="8"/>
  </Section>
  <Section position="4" start_page="1002" end_page="1002" type="metho">
    <SectionTitle>
3 Basic morpho-syntactic chart parsing
</SectionTitle>
    <Paragraph position="0"> A Korean chart parser has been developed based on our KCCG modeling with a 100,000-morpheme dictionary. Each morpheme entry in the dictionary has a morphological category, morphotactic connectivity, and KCCG syntactic categories for the morpheme.</Paragraph>
    <Paragraph position="1"> In the morphological analysis stage, an unknown-word treatment method based on a morpheme pattern dictionary and syllable bigrams is used, after (Cha et al., 1998). A POS (part-of-speech) tagger that is tightly coupled with the morphological analyzer removes the irrelevant morpheme candidates from the morpheme graph. The morpheme graph is a compact representation of Korean morphological structure. The KCCG parser analyzes the morpheme graph at once through the morpheme graph embedding technique (Lee et al., 1996).</Paragraph>
    <Paragraph position="2"> The KCCG parser incrementally analyzes the sentence, eojeol by eojeol. Whenever an eojeol is newly processed by the morphological analyzer, the morphemes, represented in a new morpheme graph, are embedded in the chart, analyzed, and combined with the previous parsing results.</Paragraph>
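The incremental embedding step can be sketched as follows. The edge representation, class, and example morphemes are hypothetical; this is not the parser of (Lee et al., 1996), only an illustration of adding each eojeol's morpheme lattice to a shared chart.

```python
# Minimal sketch of morpheme-graph embedding: each newly analysed eojeol
# yields a small lattice of (start, end, morpheme, category) edges, which are
# appended to one global chart so parsing can resume incrementally.

class Chart:
    def __init__(self):
        self.edges = []  # (start, end, morpheme, category) tuples

    def embed(self, morpheme_graph):
        """Embed a new eojeol's morpheme graph into the chart."""
        self.edges.extend(morpheme_graph)

chart = Chart()
# first eojeol: two competing analyses share the same span (ambiguity kept)
chart.embed([(0, 1, "nal", "np"), (0, 1, "nal", "v")])
# second eojeol: a nominative case morpheme
chart.embed([(1, 2, "-ka", "case[nom]")])
print(len(chart.edges))  # -> 3
```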
  </Section>
  <Section position="5" start_page="1002" end_page="1005" type="metho">
    <SectionTitle>
4 Statistical structural disambiguation for KCCG parsing
</SectionTitle>
    <Paragraph position="0"> The statistics used in the experiments have been collected from the KCCG parsed corpora. The data required for training have been collected by parsing the standard Korean sentence types, example sentences from a grammar book, and colloquial sentences in a trade interview domain and a hotel reservation domain. We use about 1,500 sentences for training and 591 independent sentences for evaluation. The evaluation is based on the PARSEVAL method (Black et al., 1991). In the evaluation, &amp;quot;No-crossing&amp;quot; is the number of sentences which have no crossing brackets between the result and the corresponding correct trees of the sentences. &amp;quot;Ave. crossing&amp;quot; is the average number of crossings per sentence.</Paragraph>
    <Section position="1" start_page="1002" end_page="1003" type="sub_section">
      <SectionTitle>
4.1 Basic statistical model
</SectionTitle>
      <Paragraph position="0"> A basic method of choosing the most plausible parse tree is to order the probabilities by the lexical preferences and the syntactic merge probability. In general, a statistical parsing model defines the conditional probability, P(r|S), for each candidate tree r for a sentence S. A generative model uses the observation that maximising P(r, S) is equivalent to maximising P(r|S).</Paragraph>
      <Paragraph position="1"> Thus, when S is a sentence consisting of a sequence of morphemes tagged for part-of-speech, (w1, t1), (w2, t2), ..., (wn, tn), where wi is the i-th morpheme, ti is the part-of-speech tag of the morpheme wi, and cij is a category with relative position i, j, the basic statistical model is given by:</Paragraph>
      <Paragraph position="3"> The model assigns probabilities to a bottom-up composition of the tree.</Paragraph>
      <Paragraph position="5"> The basic statistical model has been applied to morpheme/part-of-speech/category 3-tuples.</Paragraph>
      <Paragraph position="6"> Due to the sparseness of the data, we have used part-of-speech/category pairs together, i.e., we collected the frequencies of the categories associated with the parts of speech assigned to the morphemes. Table 1 illustrates sample entries of the category probability database. In the table, 'nal (fly)' has two categories, with probability 0.6375 and 0.3625 respectively. Table 2 illustrates sample entries of the merge probability database using equation 7.</Paragraph>
      <Paragraph position="7"> We define this as P(cij | ti) ~ frequency(cij, ti) / frequency(ti). Table 3 summarizes the results on an open test set of 591 sentences.</Paragraph>
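The relative-frequency estimate just described can be sketched directly from counts. The tags, categories, and counts below are invented for illustration (chosen so the example reproduces the 0.6375 / 0.3625 split mentioned for 'nal').

```python
# Sketch of the relative-frequency estimate
#   P(c_ij | t_i) ~ frequency(c_ij, t_i) / frequency(t_i)
# with hypothetical counts for a verb tag seen with two categories.

from collections import Counter

pair_freq = Counter()  # (tag, category) -> count
tag_freq = Counter()   # tag -> count

def observe(tag, category, n=1):
    pair_freq[(tag, category)] += n
    tag_freq[tag] += n

def cat_prob(category, tag):
    """Relative-frequency estimate of P(category | tag)."""
    if tag_freq[tag] == 0:
        return 0.0
    return pair_freq[(tag, category)] / tag_freq[tag]

observe("V", "v[fin]", 51)  # hypothetical counts: 51 + 29 = 80 observations
observe("V", "v[adn]", 29)
print(round(cat_prob("v[fin]", "V"), 4))  # -> 0.6375
```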
    </Section>
    <Section position="2" start_page="1003" end_page="1004" type="sub_section">
      <SectionTitle>
4.2 Head-head co-occurrence heuristics
</SectionTitle>
      <Paragraph position="0"> In the basic statistical model, lexical dependencies between the morphemes that take part in the merging process cannot be incorporated into the model. When there are different morphemes with the same syntactic category, a mismatch can occur in the merging process. This limitation can be overcome through the co-occurrence between the head morphemes of the left and right sub-constituents.</Paragraph>
      <Paragraph position="1"> When Bh is the head morpheme of the left sub-constituent, r is a case relation, and Ch is the head morpheme of the right sub-constituent, as shown in figure 1, the head-head co-occurrence heuristics are defined by:</Paragraph>
      <Paragraph position="3"> The head-head co-occurrence heuristics have been added to equation 5 to model the lexical co-occurrence preference in the category merging process. Table 4 illustrates sample entries of the co-occurrence probability database.</Paragraph>
      <Paragraph position="4"> In Table 4, the morpheme 'sae (meaning 'bird')', which has &amp;quot;MCK (common noun)&amp;quot; as its POS tag, has been used as a nominative argument of the verb 'nal (meaning 'fly')' with 0.8925 probability.</Paragraph>
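A relative-frequency version of the head-head co-occurrence estimate can be sketched as below. The counts and the second verb are hypothetical, chosen so the bird/fly entry comes out near the probability quoted from Table 4.

```python
# Sketch of a head-head co-occurrence estimate: how often left head B_h fills
# case relation r of right head C_h, relative to all observed uses of B_h.
# All counts here are invented for illustration.

from collections import Counter

triple_freq = Counter()  # (b_head, relation, c_head) -> count
head_freq = Counter()    # b_head -> count

def observe(b_head, relation, c_head, n=1):
    triple_freq[(b_head, relation, c_head)] += n
    head_freq[b_head] += n

def cooc_prob(b_head, relation, c_head):
    """Relative-frequency estimate of P(relation, c_head | b_head)."""
    if head_freq[b_head] == 0:
        return 0.0
    return triple_freq[(b_head, relation, c_head)] / head_freq[b_head]

observe("sae", "nom", "nal", 83)  # 'sae' (bird) as nominative of 'nal' (fly)
observe("sae", "acc", "po", 10)   # some other, hypothetical use of 'sae'
print(round(cooc_prob("sae", "nom", "nal"), 4))  # -> 0.8925
```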
      <Paragraph position="5"> The modified model has been tested on the same set of the open sentences as in the basic model experiment. Table 5 summarizes the results of these experiments.</Paragraph>
      <Paragraph position="6"> * Experiment: (linear combination of the basic model and the head-head co-occurrence heuristics).</Paragraph>
      <Paragraph position="8"> if cij is a terminal,</Paragraph>
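The linear combination named in the experiment can be sketched as a simple interpolation of the two scores. The weight `lam` is a hypothetical tuning parameter; the source does not state its value.

```python
# Sketch of a linear combination of the basic-model probability and the
# head-head co-occurrence probability. The interpolation weight is assumed.

def combined_score(basic_prob, cooc_prob, lam=0.7):
    """Interpolate the two probabilities with weight lam on the basic model."""
    return lam * basic_prob + (1.0 - lam) * cooc_prob

# with equal weighting, the two example probabilities average together
print(round(combined_score(0.6375, 0.8925, lam=0.5), 4))
```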
    </Section>
    <Section position="3" start_page="1004" end_page="1005" type="sub_section">
      <SectionTitle>
4.3 The coverage heuristics
</SectionTitle>
      <Paragraph position="0"> If there is a case relation or a modification relation between two constituents, the coverage heuristics state that it is easier to add the smaller tree to the larger one than to merge two medium-sized trees. On the contrary, in the coordination relation, it is easier to merge two medium-sized trees. We implemented these heuristics using the following coverage score. Case relation, modification relation: COV_score = 2 x sqrt(left subtree coverage x right subtree coverage) / (left subtree coverage + right subtree coverage).</Paragraph>
      <Paragraph position="1"> The coverage heuristics are added to the basic model to model the structural preferences. Table 6 shows the results of the experiments on the same set of the open sentences.</Paragraph>
      <Paragraph position="2"> * Experiment: (the basic model combined with the COV_score heuristics). We have used the COV_score as an exponent weight feature for this experiment since the two numbers are statistics of a different nature.</Paragraph>
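One possible reading of the coverage score is the ratio of the geometric to the arithmetic mean of the two subtree coverages, which is 1.0 for equally sized subtrees and approaches 0 as they diverge. This is an illustrative reconstruction, not the authors' exact formula; per the prose, coordination merges would prefer values near 1, while case/modification merges (small tree added to large) would prefer small values.

```python
# Sketch of a balance-style coverage score: geometric mean over arithmetic
# mean of the two subtree coverages. Reconstruction is an assumption.

import math

def cov_score(left_cov, right_cov):
    """1.0 when the two subtrees cover equal spans; near 0 when unbalanced."""
    return 2.0 * math.sqrt(left_cov * right_cov) / (left_cov + right_cov)

print(cov_score(4, 4))  # balanced merge: 1.0
print(cov_score(1, 7))  # unbalanced merge: well below 1.0
```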
      <Paragraph position="4"/>
    </Section>
  </Section>
</Paper>