<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0635">
  <Title>Corpus-Based Approach for Nominal Compound Analysis for Korean Based on Linguistic and Statistical Information</Title>
  <Section position="4" start_page="0" end_page="292" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Nominal compound analysis is one of the crucial issues that have been continuously studied by computational and theoretical linguists. Many linguists have dealt with nominal compounds in view of semantic interpretation, and tried to explain how nominal compounds are semantically interpreted (Levi, 1978; Selkirk, 1982). [Footnote: This work was partially supported by a KOSEF postdoctoral fellowship grant.]</Paragraph>
    <Paragraph position="1"> In the field of natural language processing, various computational models have been established for syntactic analysis and semantic interpretation of nominal compounds (Finin, 1980; McDonald, 1982; Arens et al., 1987; Pustejovsky et al., 1993; Kobayasi et al., 1994; Vanderwende, 1994; Lauer, 1995). Recently it has been shown that noun phrase analysis is effective for improving natural language processing applications such as information retrieval (Zhai, 1997).</Paragraph>
    <Paragraph position="2"> Parsing a nominal compound is a basic step for all problems related to it. From a bracketing point of view, structural ambiguity is a main problem in nominal compound analysis, as in other parsing problems. Recent work has shown that the corpus-based approach to nominal compound analysis resolves these ambiguities well (Pustejovsky et al., 1993; Kobayasi et al., 1994; Lauer, 1995; Zhai, 1997).</Paragraph>
    <Paragraph position="3"> Lauer (1995) compared two different models of corpus-based approaches for nominal compound analysis. One is the adjacency model, which was inspired by Pustejovsky et al. (1993), and the other is the dependency model, which was presented by Kobayasi et al. (1994) [2] and Lauer (1995). Given a nominal compound of three nouns n1 n2 n3, let As be a metric used to evaluate the association of two nouns. In the adjacency model, if As(n1, n2) &gt; As(n2, n3), then the structure is determined as ((n1 n2) n3). Otherwise, it is (n1 (n2 n3)). [Footnote 2: In their work, the structure is determined by comparing the products of the associations between each pair of nouns, that is, by comparing As(n1, n2) As(n2, n3) and As(n1, n3) As(n2, n3). It gives results similar to the dependency model.]</Paragraph>
    <Paragraph position="4"> On the other hand, in the dependency model, the decision depends on the association strength of n1 for n2 and n3. That is, the left branching tree ((n1 n2) n3) is constructed if As(n1, n2) &gt; As(n1, n3), and the right branching tree (n1 (n2 n3)) is made otherwise. Lauer (1995) has claimed that the dependency model makes intuitive sense and produces better results.</Paragraph>
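The two bracketing rules above differ only in which pair of associations is compared. A minimal Python sketch of that difference follows; the function names, the tuple encoding of trees, and the toy scores are assumptions made for illustration and do not come from the paper.

```python
from typing import Callable, Tuple

# Association metric As(a, b); any corpus-derived score could be plugged in.
AssocFn = Callable[[str, str], float]
Tree = Tuple[object, object]


def bracket_adjacency(n1: str, n2: str, n3: str, assoc: AssocFn) -> Tree:
    """Adjacency model: compare As(n1, n2) with As(n2, n3)."""
    if assoc(n1, n2) > assoc(n2, n3):
        return ((n1, n2), n3)   # left branching
    return (n1, (n2, n3))       # right branching


def bracket_dependency(n1: str, n2: str, n3: str, assoc: AssocFn) -> Tree:
    """Dependency model: compare As(n1, n2) with As(n1, n3)."""
    if assoc(n1, n2) > assoc(n1, n3):
        return ((n1, n2), n3)   # left branching
    return (n1, (n2, n3))       # right branching


# Hypothetical scores, invented only to show that the two models can disagree.
TOY_SCORES = {("a", "b"): 0.2, ("b", "c"): 0.5, ("a", "c"): 0.1}


def toy_assoc(x: str, y: str) -> float:
    return TOY_SCORES.get((x, y), 0.0)


print(bracket_adjacency("a", "b", "c", toy_assoc))
print(bracket_dependency("a", "b", "c", toy_assoc))
```

With these invented scores the adjacency rule yields a right branching tree and the dependency rule a left branching one, which is the kind of disagreement the comparison is about.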
    <Paragraph position="5"> In this paper, we propose a new model for nominal compound analysis on the basis of word co-occurrences and grammatical relationships immanent in nominal compounds.</Paragraph>
    <Paragraph position="6"> The grammatical relation can sometimes make the disambiguation more precise, and it also gives a clue to the interpretation of the nominal compound. For example, in the nominal compound &quot;KYEONGJAENG(competition) YUBAL[3](bringing about) CHEJE(system)&quot;, which means a system to bring about competition, the nominal compound &quot;KYEONGJAENG CHEJE(competition system)&quot; co-occurs much more frequently than &quot;KYEONGJAENG YUBAL(bringing about competition)&quot;. However, its structure is selected to be [[KYEONGJAENG YUBAL] CHEJE]. Why it is analyzed in such a way can be shown easily by transforming the nominal compound to a clause. Because &quot;YUBAL(bringing about)&quot; is a predicative noun that derives a verb when the predicative suffix is attached, the modifying noun phrase can be transformed to the corresponding VP, which has the meaning of &quot;to bring about competition&quot; (Figure 1). The verb &quot;YUBAL-HA-NEUN(to bring about)&quot; in the VP takes &quot;KYEONGJAENG(competition)&quot; as its object. The predicative noun &quot;YUBAL(bringing about)&quot; also subcategorizes the noun phrase &quot;KYEONGJAENG(competition)&quot; in the same manner as the verb. In the right syntactic tree of Figure 1, it should be noted that the object of a verb does not have a dependency relation to a noun outside the maximal projection of its head, VP. Likewise, the object &quot;KYEONGJAENG(competition)&quot; does not have any dependency with the other noun beyond the predicative noun &quot;YUBAL(bringing about)&quot;.</Paragraph>
    <Paragraph position="7"> [Footnote 3] YUBAL is a noun in Korean which means to cause or to bring about something.</Paragraph>
  </Section>
  <Section position="5" start_page="292" end_page="293" type="metho">
    <SectionTitle>
2 Structure of Nominal Compound
</SectionTitle>
    <Paragraph position="0"> There is no adjective derivation in Korean. Rather, a noun itself plays an adverbial or adjectival role in a nominal compound, or modifies another noun with a possessive postposition attached. Table 1 shows various relations occurring in nominal compounds.</Paragraph>
    <Paragraph position="1"> As shown in the example, there is a relationship between two nouns that are in a dependency relation in a nominal compound.</Paragraph>
    <Paragraph position="2"> For instance, the first nominal compound in the example expresses a compound meaning of the individual nouns, i.e. the attribute that a file has. On the other hand, in example (c), the noun &quot;GAENYEOM(concept)&quot; is the object of the predicative noun &quot;GUBUN(discrimination)&quot;.</Paragraph>
    <Paragraph position="3"> A nominal compound, as such, often has a structure similar to a simple sentence, e.g. a complement-predicate structure, as well as representing a compound meaning with several nouns combined.</Paragraph>
    <Paragraph position="4"> Many researchers have tried to explain the constraints imposed on the process of word combination and the principle of semantic composition. Levi (1978) tried to find the semantic constraints which govern the combination of nouns in a nominal compound.</Paragraph>
    <Paragraph position="5"> Sproat (1985) took into consideration the predicate-argument relation of nominals on the basis of generative syntax. He explained that the nominalization suffix nominalizes the syntactic category of a verb, but the θ-role of the verb is percolated into its parent node.</Paragraph>
    <Paragraph position="6"> We claim that nominalization is a phenomenon occurring at the syntactic level, and hence the syntactic relations should be reflected in nominal parsing. Namely, for accurate nominal compound parsing, we need syntactic knowledge about nominal compounds in addition to lexical information about lexical selection. We propose a nominal parsing model based on two relations, which can be immediately applied to nominal interpretation. We classify the syntactic relations in a nominal compound as follows:</Paragraph>
    <Paragraph position="7"> modifier-head relation: One noun (adnominal, adjective) adds a certain meaning to the other noun (head), producing a compound meaning (1, 2 in Table 1).</Paragraph>
    <Paragraph position="9"> complement-predicate relation: One noun is the complement (subject, object, adverb) of the other noun (predicative noun) in a nominal compound (3, 4, 5 in Table 1).</Paragraph>
    <Paragraph position="10"> When considering the complement-predicate relation, we can figure out some syntactic constraints imposed on nominal compounds. For example, in &quot;PA'IL(file) SOGSEONG(attribute) BYEONKYEONG(change)&quot;, &quot;SOGSEONG(attribute)&quot; is the object of the predicative noun &quot;BYEONKYEONG(change)&quot;. It can be expanded to a sentence like &quot;X changes the file attribute&quot;. In other words, the syntactic levels of the two phrases &quot;PA'IL SOGSEONG(file attribute)&quot; and &quot;BYEONKYEONG(change)&quot; in the compound noun are different: one is an NP and the other is a VP. Because the syntactic levels (i.e. syntactic categories) within nominal compounds differ, a different method is required for the proper analysis of their structures.</Paragraph>
    <Paragraph position="11"> Next, a predicative noun does not subcategorize two or more nominals with the same grammatical case. For instance, a predicative noun in a nominal compound governs at most one subject and at most one object. The situation is very similar to that in a sentence. In this paper, this is called one case per sentence, which means that a predicative noun cannot subcategorize two nouns of the same grammatical case when the relations of the nominals can be expanded to a sentence.</Paragraph>
  </Section>
  <Section position="6" start_page="293" end_page="294" type="metho">
    <SectionTitle>
3 Acquiring Lexical Knowledge
</SectionTitle>
    <Paragraph position="0"> We collect lexical co-occurrence instances from a corpus in order to get knowledge for nominal compound analysis. The text material is composed of 40 million eojeols of the Yonsei Lexicographical Center corpus and the KAIST corpus (330M bytes). The Korean morphological analyzer, the POS tagger and the partial parser are used to obtain co-occurrences.</Paragraph>
    <Paragraph position="1"> In order to construct linguistic lexical data for nominals, we first extracted verb-noun co-occurrence data from the corpus using the partial parser. A noun is connected to a verb with a syntactic relation, and the co-occurrences are represented by triples (verb, noun, syntactic relation). The postpositions are stored in the syntactic relation field in order to represent the syntactic relations which might occur between two nouns. Nominal pairs with the complement-predicate relation are derived from the extracted data.</Paragraph>
    <Paragraph position="2"> Predicative nouns become verbs with a verbalization suffix such as '-HA-' attached. For example, the predicative noun 'KEOMSAEK(retrieval)' is verbalized to 'KEOMSAEK-HA(retrieve)' by adding the suffix '-HA-'. Therefore, we can get complement-predicate relations by reducing verbs to predicative nouns, cutting off the verbalization suffix, if any. Table 2 shows some noun-noun co-occurrence examples of the complement-predicate relation derived in that way.</Paragraph>
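The reduction from verb-noun triples to complement-predicate pairs can be sketched as follows. This is an illustration under stated assumptions: verbs are given in a romanized form that ends in "-HA", only verbs carrying that suffix are kept, the relation label "OBJ" stands in for the stored postposition, and the nouns other than KEOMSAEK are invented.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def verb_triples_to_noun_pairs(triples: List[Triple], suffix: str = "-HA") -> List[Triple]:
    """Turn (verb, noun, syn_rel) triples into (comp_noun, pred_noun, syn_rel)."""
    pairs = []
    for verb, noun, syn_rel in triples:
        # Assumption: only verbs carrying the verbalization suffix
        # are derived from predicative nouns.
        if verb.endswith(suffix):
            pred_noun = verb[: -len(suffix)]
            pairs.append((noun, pred_noun, syn_rel))
    return pairs


# Hypothetical input; "JEONGBO", "MEOG" and "BAB" are made up for the example.
example = [("KEOMSAEK-HA", "JEONGBO", "OBJ"), ("MEOG", "BAB", "OBJ")]
print(verb_triples_to_noun_pairs(example))
```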
    <Paragraph position="3"> Second, co-occurrences composed of only two nouns (complete nominal compounds) were obtained. In Korean, complete nominal compounds are extracted in the following way. Let N, NA and NP be the set of nouns, the set of nouns with the possessive postposition, and the set of nouns with a postposition other than the possessive postposition, respectively. For eojeols e1, e2, e3, where e1 ∉ N ∪ NA, e2 ∈ N ∪ NA and e3 ∈ NP, count (n2, n3), where n2 and n3 are the nouns that belong to e2 and e3 respectively.</Paragraph>
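The extraction rule for complete nominal compounds can be sketched as a sliding window over three eojeols. The sketch assumes the membership conditions are: e1 outside N ∪ NA, e2 inside N ∪ NA, e3 inside NP. The encoding of an eojeol as a (noun, kind) pair and the kind label "OTHER" are hypothetical, and pairs at the very start of the input are skipped because they have no preceding eojeol.

```python
from collections import Counter
from typing import List, Optional, Tuple

# An eojeol is encoded as (noun or None, kind), with kind one of
# "N" (bare noun), "NA" (noun + possessive postposition),
# "NP" (noun + other postposition), "OTHER" (anything else).
Eojeol = Tuple[Optional[str], str]


def count_complete_compounds(eojeols: List[Eojeol]) -> Counter:
    """Count (n2, n3) for windows e1, e2, e3 that match the rule."""
    counts: Counter = Counter()
    for i in range(1, len(eojeols) - 1):
        _, k1 = eojeols[i - 1]
        n2, k2 = eojeols[i]
        n3, k3 = eojeols[i + 1]
        preceded_by_noun = k1 in ("N", "NA")
        if not preceded_by_noun and k2 in ("N", "NA") and k3 == "NP":
            counts[(n2, n3)] += 1
    return counts


two_nouns = [(None, "OTHER"), ("PAIL", "N"), ("SOGSEONG", "NP")]
three_nouns = [("PAIL", "N"), ("SOGSEONG", "N"), ("BYEONKYEONG", "NP")]
print(count_complete_compounds(two_nouns))
print(count_complete_compounds(three_nouns))
```

The second input is a three-noun compound, so no pair is counted from it; only compounds of exactly two nouns contribute.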
    <Paragraph position="4"> The data can contain two relations, namely the modifier-head relation and the complement-head relation.</Paragraph>
    <Paragraph position="5"> Therefore, we manually divide them into two classes according to the relation.</Paragraph>
    <Paragraph position="6"> Many erroneous pairs could be removed by the manual process. Furthermore, we manually assign to each nominal pair syntactic relations such as SUBJ, OBJ and ADV, since the syntactic relation does not explicitly appear in the pairs obtained in the second step (Table 3). Actually, there is no immanent syntactic relation between two nouns in the modifier-head relation. On the other hand, a syntactic relation such as a case marker or an adverbial relation can be given to two nouns in the complement-predicate relation.</Paragraph>
    <Paragraph position="7"> Some examples are given in Table 3. The data of the complement-head relation are merged with those established with the partial parser, which are complement-head co-occurrences. The rest of the data are modifier-head co-occurrences.</Paragraph>
    <Paragraph position="8"> Consequently, the complement-predicate co-occurrence is represented with a triple (comp-noun, pred-noun, syn-rel) as shown in Table 2. The syntactic relation is described with the postposition for case marking or with ADV in Korean. No syntactic relation is given to the modifier-head co-occurrence. In the corpus-based approach to natural language processing, we should take the data sparseness problem into consideration, because in most cases the data do not contain all phenomena of the language. Many researchers have proposed conceptual association to back off from lexical association, on the assumption that words within a class behave similarly (Resnik, 1993; Kobayasi et al., 1994; Lauer, 1995). Namely, word classes were stored instead of word co-occurrences. Here, we must note that predicates do not act according to their semantic category.</Paragraph>
    <Paragraph position="9"> Predicates tend to have wholly different case frames from each other. Thus, we stored individual predicative nouns and the semantic classes of their arguments, instead of a semantic class for each of the two nouns. In effect, given a word co-occurrence pair (n1, n2) and, if any, a syntactic relation s, it is transformed and counted in the following way.</Paragraph>
    <Paragraph position="10">
      1. Let ci be the thesaurus class which ni belongs to.
      2. If (n1, n2) is a pair in the co-occurrences of the complement-predicate relation
      3. Then
      4. For each ci which n1 belongs to,
      5. Increase the frequency of (ci, n2, s) with the count of (n1, n2).</Paragraph>
    <Paragraph position="11"> (Here, s is an immanent syntactic relation)
      6. Else
      7. For each class ci and cj to which n1 and n2 belong respectively,
      8. Increase the frequency of (ci, cj) with the count of (n1, n2).</Paragraph>
    <Paragraph position="13"> Consequently, we built two knowledge sources with different properties, so we needed a method to deal with both. In the next section, we explain an effective method of analysis based on these different kinds of lexical knowledge.</Paragraph>
  </Section>
  <Section position="7" start_page="294" end_page="297" type="metho">
    <SectionTitle>
4 Nominal Compound Analysis
</SectionTitle>
    <Paragraph position="0"> In order to make the process efficient, the analyzer identifies the relations in a nominal compound, if any, which can be the guideline of phrase structuring, and then analyzes the structures based on the relations.</Paragraph>
    <Paragraph position="1"> Figure 2 shows an example of the phrase structure of a nominal compound that includes the complement-predicate relation. We showed that a nominal compound with the complement-predicate relation can be expanded to a simple sentence which contains NPs and a VP. [Table column headers: argument, predicative noun, syntactic relation]</Paragraph>
    <Paragraph position="3"/>
    <Paragraph position="5"> This means again that a nominal compound with the complement-predicate relation can be divided into one or more phrasal units, which we call inside phrases.</Paragraph>
    <Paragraph position="6"> The nominal compound in Figure 2 has three inside phrases: NP_SUBJ, NP_OBJ and V. Some nominal compounds may not have any inside phrase. Besides, the structure within each inside phrase can be determined by the word co-occurrence based method presented by Lauer (1995) and Kobayasi et al. (1994), i.e. by statistical association only.</Paragraph>
    <Section position="1" start_page="295" end_page="296" type="sub_section">
      <SectionTitle>
4.1 Association between nouns
</SectionTitle>
      <Paragraph position="0"> Inside phrases can be detected based on the association, since two nouns associated by the complement-predicate relation indicate the existence of an inside phrase. We distinguish the association relation according to the knowledge source. Thus the associations are calculated in different ways as follows. Here, ambi(n) is the number of thesaurus classes in which n appears, and N_CP and N_MH are the total numbers of complement-predicate and modifier-head co-occurrences, respectively.</Paragraph>
      <Paragraph position="1"> Complement-Predicate: The association can be computed based on the complement-predicate relations obtained from the complement-predicate co-occurrence data. It measures the strength of statistical association between a noun, n1, and a predicative noun, n2, with a given syntactic relation s, such as subject, object or adverb. Let ci be the categories to which n1 belongs.</Paragraph>
      <Paragraph position="2"> Then, the degree to which n1 is associated with n2 as the complement of n2 is defined as follows:</Paragraph>
      <Paragraph position="4"> The association of two nouns is estimated from the co-occurrences which were collected for the modifier-head relation. In a similar way to the above, let ci and cj be the categories to which n1 and n2 belong respectively. Then, the association degree of n1 and n2 is defined as follows:
$$ \mathrm{Assoc}_{MH}(n_1, n_2) = \frac{1}{N_{MH}} \sum_{c_i, c_j} \frac{freq(c_i, c_j)}{ambi(n_1)\, ambi(n_2)} \qquad (2) $$
The syntactic relation is determined by the association. If the association between two nouns can be computed by formula (1), the complement-predicate relation is given to the nouns. If not, the relation of the two nouns is simply taken to be the modifier-head relation. We can recognize the syntactic relation inside a nominal compound by the association involved. In order to distinguish the associations in accordance with the relations, the association is expressed by a triple (relation, (syn-rel, value)). The relation is CP or MH according to the formula used to estimate the association. If the relation is CP, syn-rel has as its value SUBJ, OBJ, ADV etc., as given by the acquired co-occurrence data. Otherwise, ∅ is assigned. Lastly, the value is computed by the formula. The association is estimated in the following way:</Paragraph>
      <Paragraph position="6"/>
      <Paragraph position="8"> If no co-occurrence data for a nominal compound are found in either database, the modifier-head relation is assumed and left association is favored for unseen data. The preference for left association is reasonable for bracketing nominal compounds, since left associations account for far more of the bracketing patterns than right associations, as shown in Table 6.</Paragraph>
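The association triple can be sketched in Python as below. The modifier-head function follows equation (2). The complement-predicate function is only an ASSUMED form built by analogy with equation (2), namely the class-backed-off frequency freq(ci, n2, s) summed over the classes of n1 and divided by N_CP times ambi(n1); it is not taken from the text. Choosing the syntactic relation with the largest value is likewise an assumption, and all names and numbers are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Thesaurus = Dict[str, Sequence[str]]
Triple = Tuple[str, Tuple[Optional[str], float]]


def ambi(n: str, thesaurus: Thesaurus) -> int:
    """Number of thesaurus classes in which n appears."""
    return len(thesaurus.get(n, ()))


def assoc_mh(n1, n2, mh_freq, thesaurus, n_mh) -> float:
    """Modifier-head association, equation (2)."""
    a1, a2 = ambi(n1, thesaurus), ambi(n2, thesaurus)
    if a1 == 0 or a2 == 0 or n_mh == 0:
        return 0.0
    total = sum(mh_freq.get((ci, cj), 0)
                for ci in thesaurus[n1] for cj in thesaurus[n2])
    return total / (n_mh * a1 * a2)


def assoc_cp(n1, n2, s, cp_freq, thesaurus, n_cp) -> float:
    """ASSUMED complement-predicate form, by analogy with equation (2)."""
    a1 = ambi(n1, thesaurus)
    if a1 == 0 or n_cp == 0:
        return 0.0
    total = sum(cp_freq.get((ci, n2, s), 0) for ci in thesaurus[n1])
    return total / (n_cp * a1)


def association(n1, n2, cp_freq, mh_freq, thesaurus, n_cp, n_mh,
                relations=("SUBJ", "OBJ", "ADV")) -> Triple:
    """Return (relation, (syn_rel, value)); CP if any CP evidence exists."""
    scored = [(assoc_cp(n1, n2, s, cp_freq, thesaurus, n_cp), s) for s in relations]
    value, syn_rel = max(scored, key=lambda t: t[0])
    if value > 0:
        return ("CP", (syn_rel, value))
    return ("MH", (None, assoc_mh(n1, n2, mh_freq, thesaurus, n_mh)))


# Hypothetical classes and counts.
toy_thes = {"MUNHWA": ["c1", "c2"], "DAEJUNG": ["c3"], "BIPAN": ["c4"]}
toy_cp = {("c1", "BIPAN", "OBJ"): 6}
toy_mh = {("c3", "c1"): 4}
print(association("MUNHWA", "BIPAN", toy_cp, toy_mh, toy_thes, 100, 100))
print(association("DAEJUNG", "MUNHWA", toy_cp, toy_mh, toy_thes, 100, 100))
```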
    </Section>
    <Section position="2" start_page="296" end_page="297" type="sub_section">
      <SectionTitle>
4.2 Parsing
</SectionTitle>
      <Paragraph position="0"> Since the head always follows its complement in Korean, the ith noun in a nominal compound consisting of n nouns has n - i head candidates that it might depend on, and the parser selects the most probable one from them. The parser determines the head of a complement by the association degree of the head candidates for the complement.</Paragraph>
      <Paragraph position="1"> The easiest way is to keep the head candidate list sorted by association and select the most strongly associated candidate. During selection, the following constraints are imposed if the relation of two nouns is complement-predicate (CP). Given a nominal compound of three nouns (n1, n2, n3):
        * If (n2, n3) are related by CP and the syntactic relation of (n2, n3) is the same as that of (n1, n3), then n1 is not dependent on n3. This is called the one case per sentence constraint.
        * If n1 has an association with n2 by the CP relation, it does not have a dependency relation with n3 (see Figure 1).
        * If n2 plays an adverbial role for n3, then n1 is not linked with n2.</Paragraph>
      <Paragraph position="2"> Cross dependency is not allowed. It means that dependent-head relations do not cross each other.</Paragraph>
      <Paragraph position="3"> As an example, given the nominal compound &quot;DAEJUNG(public) MUNHWA(culture) BIPAN(criticism)&quot;, we can get the association table shown in Table 4. According to the table, the first and second nouns can be linked with the modifier-head relation and an association degree of 0.00021. The second noun can depend on the third noun with the complement-predicate relation, and the association degree is 0.00018. Furthermore, the argument is inferred to be the object of the predicate, which can be easily recognized from the extracted co-occurrence data.</Paragraph>
      <Paragraph position="4"> The table is sorted by association so that the parser can easily search for the most probable head candidate. In order to effectively detect inside phrases and check the constraints, the syntactic relation should be checked prior to the comparison of the association values. That is, the first key is the relation and the second is the association value. Thus, CP &gt; MH, and the association values are compared when the relation values are the same.</Paragraph>
      <Paragraph position="5"> As a consequence, the association table is actually implemented as an association list as follows:</Paragraph>
      <Paragraph position="7"> From the list we know it is probable that the noun &quot;DAEJUNG(public)&quot; is dependent on &quot;BIPAN(criticism)&quot; with the OBJ relation. On the other hand, the two words &quot;DAEJUNG(public)&quot; and &quot;MUNHWA(culture)&quot; are found in the modifier-head co-occurrences and are thus associated with the modifier-head relation.</Paragraph>
      <Paragraph position="8"> Then, the parsing process can be defined as follows:
$$ \mathrm{head}(n_i) = n_l, \qquad l = \mathrm{index}\Bigl(\max_{j = i+1, \ldots, k} \mathrm{Assoc}(n_i, n_j)\Bigr) \qquad (3) $$
Here index returns the index l of the noun n_l whose association with n_i is the maximum.</Paragraph>
      <Paragraph position="9"> Namely, the parser tries to find the best candidate for the head of each noun n_i in a nominal compound consisting of k nouns, and makes a link between them. If constraints are violated while parsing, the parser considers the next candidate in the list. According to the algorithm, the given example is parsed as follows. There is only one candidate for &quot;MUNHWA&quot;. &quot;MUNHWA(culture)&quot; has a dependency on &quot;BIPAN(criticism)&quot; with the object relation. The fact that there is a complement-predicate relation between two nouns indicates that they are elements of inside phrases, where one belongs to an NP and the other has the property of a VP. The inside phrases are [...] to the constraints as the modifier-head relation, and &quot;DAEJUNG(public)&quot; is linked to &quot;MUNHWA(culture)&quot; with the relation.</Paragraph>
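The head selection with relation-first sorting and constraint checking can be sketched as follows. Several points are assumptions rather than the paper's implementation: nouns are processed from right to left, only the one case per sentence constraint and the no-crossing constraint are checked, and an unseen noun falls back to a modifier-head link with its right neighbor. In the example, 0.00021 and 0.00018 are the values quoted in the text, while 0.00005 for the DAEJUNG-BIPAN pair is invented.

```python
from typing import Dict, Optional, Tuple

# assoc maps (dependent index, head index) to (relation, syn_rel, value).
AssocEntry = Tuple[str, Optional[str], float]
Link = Tuple[int, str, Optional[str]]


def violates(i: int, j: int, rel: str, syn: Optional[str],
             heads: Dict[int, Link]) -> bool:
    """Check a candidate link from noun i to head j against existing links."""
    for d, (h, r, s) in heads.items():
        # One case per sentence: head j already governs this case.
        if rel == "CP" and r == "CP" and h == j and s == syn:
            return True
        # No crossing: link (i, j) must not cross the existing link (d, h).
        if h > j > d > i:
            return True
    return False


def parse_compound(k: int, assoc: Dict[Tuple[int, int], AssocEntry]) -> Dict[int, Link]:
    """Pick a head for every noun except the last one (the root)."""
    heads: Dict[int, Link] = {}
    for i in range(k - 2, -1, -1):
        cands = [(j,) + assoc[(i, j)] for j in range(i + 1, k) if (i, j) in assoc]
        # First key: CP before MH. Second key: larger association value first.
        cands.sort(key=lambda c: (c[1] == "CP", c[3]), reverse=True)
        for j, rel, syn, _value in cands:
            if not violates(i, j, rel, syn, heads):
                heads[i] = (j, rel, syn)
                break
        else:
            # Unseen data or all candidates rejected: left association.
            heads[i] = (i + 1, "MH", None)
    return heads


nouns = ["DAEJUNG", "MUNHWA", "BIPAN"]
table = {
    (0, 1): ("MH", None, 0.00021),
    (1, 2): ("CP", "OBJ", 0.00018),
    (0, 2): ("CP", "OBJ", 0.00005),   # hypothetical value
}
result = parse_compound(len(nouns), table)
for dep in sorted(result):
    head, rel, syn = result[dep]
    print(nouns[dep], "->", nouns[head], rel, syn)
```

In this run the CP candidate for DAEJUNG is tried first because CP outranks MH, it is rejected because BIPAN already has an object, and the modifier-head link to MUNHWA is chosen instead.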
    </Section>
  </Section>
</Paper>