File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/w98-0603_metho.xml
Size: 17,107 bytes
Last Modified: 2025-10-06 14:15:08
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0603"> <Title>Representation and Processing of Chinese Nominals and Compounds</Title> <Section position="3" start_page="20" end_page="20" type="metho"> <SectionTitle> 3 Role of Lexico-Semantic Rules </SectionTitle> <Paragraph position="0"> This section deals with the use of morpho-semantic lexical rules (MSLRs) in large-scale lexical acquisition. The advantage of MSLRs is twofold: first, they can be considered a means of reducing the number of lexicon entry types and, more generally, of making the acquisition process faster and cheaper; second, they can enhance the results of analysis by creating new entries from the lexicon for unknown words found in corpora. Lexical rules have been addressed by many researchers. Here we apply the methodology of Viegas et al. (1996) to Chinese.</Paragraph> <Paragraph position="1"> Briefly, applying MSLRs to the Spanish entry comprar (buy), our MSLR generator automatically produced 26 new entries (comprador-N1 (buyer), comprable-Adj (buyable), etc.). This includes creating new syntax, semantics and syntax-semantics mappings with the correct subcategorisations and the right semantics; for instance, the lexical entry for comprable has the subcategorisation for predicative and attributive adjectives, and its semantics adds the attribute &quot;FeasibilityAttribute&quot; to the basic meaning &quot;Buy&quot; of comprar. The list of forms produced by the morpho-semantic generator is checked against MRDs, dictionaries and corpora: only the forms found in them are submitted to the acquisition process.</Paragraph> <Paragraph position="2"> MSLRs constitute a powerful conceptual tool to extend a core lexicon from a monolingual viewpoint.</Paragraph> <Paragraph position="3"> We applied the same methodology to Chinese. 
The rules are language independent; what is language dependent is the set of morphemes to which they can apply.</Paragraph> <Paragraph position="4"> For Chinese, we do not have to worry about developing a morpho-semantic generator, as morphology is not very productive, except for compound characters whose semantics is not compositional (see the example of ~ (glucose) below). In this latter case, we acquired the entry manually. Rules are therefore used to link nominalisations to verbs, and vice versa: once verbs have been acquired, nominal derivations can be produced automatically using rules.</Paragraph> <Paragraph position="6"> We present below the entry for ~-N1 (affirmation) after the application of the LR2event rule to ~-V1 (affirm).</Paragraph> <Paragraph position="7"> #0=[key: ~, gram:[pos: N], semRep:#sem, synSem:[gram:#o1, semRep:#t], [gram:#o2, semRep:#a], lexRule: LR2event[root:[key: ~ ~, gram:[pos:V, subc:NPVNP], semRep:#sem=[name:Assert, agent:#a, theme:#t]], vn:#0]] In our corpus (from the Chinese newspaper Xinhua Daily), we found that 166 nouns could be derived from 351 verbs; that is, about 47% of the verbs can produce nouns. From an acquisition viewpoint, it is cheaper to use lexical rules to produce nouns automatically from verbs with the same semantics. This is possible because of our transcategorial approach to semantics, in which the same piece of semantics can be lexicalised as either a Noun or a Verb.</Paragraph> </Section> <Section position="4" start_page="20" end_page="23" type="metho"> <SectionTitle> 4 A Transcategorial Approach </SectionTitle> <Paragraph position="0"> Compounding in Chinese is a common phenomenon (Jin, 1994; Jin and Chen, 1995; Palmer and Wu, 1995). 
It is mainly used to combine i) characters whose semantics is different and non-compositional, and ii) sequences of nouns.</Paragraph> <Paragraph position="1"> In i) we create entries for single characters and entries for combined characters (e.g., (2)): (2) a. ~j (grape) b. (sugar) c. ~j~j~ (glucose) In this paper we are concerned with ii) only. In the following, we investigate three ways to translate Chinese nominal compounds into English, using word order information, semantic information and co-occurrence information in syntactic, semantic and transfer approaches, respectively.</Paragraph> <Section position="1" start_page="20" end_page="21" type="sub_section"> <SectionTitle> 4.1 A Syntactic Approach </SectionTitle> <Paragraph position="0"> Compounds proliferate in Chinese. The head of the compound seems to be easily identified as the last noun in the sequence. Since English, unlike French for example, also makes use of compounds, one could therefore translate Chinese compounds into English compounds with a transfer-based approach, in which each Chinese noun is translated into English in the same sequence: 1~ ~ (application software); ~$.~ q~ ~.~ (data management system). Things become more complex when the result is a long sequence of nouns in English, even though such a sequence is still acceptable and normal in Chinese. In our corpus we found compounds containing up to 6 nouns: ~ ~ ~ t~1~ ~ (~ (military theory test database management system) (the management system of database for testing military theory). In these cases, the English compound is difficult to comprehend, and some &quot;linking information&quot; is needed to break the compound and make it understandable in English. This is where the semantics comes in, as one needs to understand the underlying relationships between the nouns and identify &quot;sub-heads&quot; inside the Chinese compounds, which will become the heads of smaller English compounds linked via relations. 
For instance, in ~t1~ ~ ~ ~ '~ ~ (military theory test database management system) (the management system of database for testing military theory) one might want to &quot;break&quot; the Chinese compound into the smaller English compounds &quot;management system,&quot; &quot;database&quot; and &quot;military theory&quot;, with a relation &quot;test&quot; between the last two (the management system of database for testing military theories).</Paragraph> </Section> <Section position="2" start_page="21" end_page="22" type="sub_section"> <SectionTitle> 4.2 A Lexico-Semantic Approach </SectionTitle> <Paragraph position="0"> First, we show examples of how the semantics can help identify sub-heads inside the Chinese compound (the head of the Chinese compound is the last noun).</Paragraph> <Paragraph position="1"> Second, we show how a transcategorial approach can help go from an NN compound ~: ~ (economy policy) in Chinese to an AdjN construction (economic policy) in English. Finally, we show how nominal mismatches are dealt with as a generation issue.</Paragraph> <Paragraph position="2"> For illustration purposes, we will mainly consider compounds composed of two nouns; however, this semantic approach applies to compounds of more than two nouns.</Paragraph> <Paragraph position="3"> Lexemes can be mapped to Objects (O) (&quot;Car&quot; car), Events (E) (&quot;Explode&quot; explosion), Relations (R) (&quot;Utilizes&quot; use) or Attributes (A) (&quot;ColourAttribute&quot; colour). In the case of NNs, 14 of the 16 possible ordered combinations of types are allowed (RR and AA do not seem to co-occur), where E, O and R can be heads, with the following hierarchy of headhood: E > R > O. When the semantics of the NN is expressed with a combination of identical types (e.g. EE or OO), the semantic analyser must score the constraints between the two nouns to find the head. 
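The type combinations and the headhood hierarchy described above can be sketched in a few lines of Python. This is an illustration only, not the analyser used in this work: the names TYPES, EXCLUDED, HEAD_RANK, allowed_pairs and pick_head are hypothetical, the numeric ranks are just one way to encode E > R > O, and returning None stands in for the constraint scoring that the analyser would have to perform.

```python
from itertools import product

# Semantic types named in the text: Object, Event, Relation, Attribute.
TYPES = ("O", "E", "R", "A")
# Pairs the text reports as not co-occurring.
EXCLUDED = {("R", "R"), ("A", "A")}
# Headhood hierarchy E > R > O; the numeric ranks are an assumed encoding.
HEAD_RANK = {"E": 3, "R": 2, "O": 1}


def allowed_pairs():
    """Enumerate ordered type pairs for a two-noun compound, minus RR and AA."""
    return [pair for pair in product(TYPES, repeat=2) if pair not in EXCLUDED]


def pick_head(types):
    """Return the index of the unique highest-ranked noun, or None.

    None means the hierarchy alone cannot decide (identical top types such
    as OO or EE, or no head-capable type), so constraint scoring is needed.
    """
    ranks = [HEAD_RANK.get(t, 0) for t in types]
    best = max(ranks)
    winners = [i for i, rank in enumerate(ranks) if rank == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


print(len(allowed_pairs()))
print(pick_head(("O", "R")))
print(pick_head(("O", "O")))
```

For an (O, R) pair the relation is selected as head, while an (O, O) pair is left undecided and must be resolved by other means. 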
Sometimes it is possible to find the semantic relation linking the two nouns in the ontological entry of the nouns, as in the example OO below.</Paragraph> <Paragraph position="5"> Here, both nouns are typed as O, and therefore we need a mechanism to assign the head. The generator must identify the underlying relation between the Os. This can be done by searching the ontology for a relation R shared by the two Os, such as &quot;appliedto&quot;, with a domain which is in an ISA relationship with &quot;technology&quot; and a range which is in an ISA relationship with &quot;computer&quot;. Needless to say, this approach is knowledge intensive, and when this type of knowledge is not encoded we rely on a transfer-based approach that follows the Chinese word order. Here, we could successfully generate technology about computer and computer technology, with a preference for the latter.</Paragraph> <Paragraph position="6"> (OR) Object - Relation [np [mod ~: (n, hang2, business, AreaOfExpertise)] [n ~ (n, zhang3, leader, HeadOf)]] &quot;HeadOf&quot; is a relation and therefore the head, as the other noun is an O. The generator can lexicalise this as leader of business or business leader via a rule; the latter is preferred in the absence of modifiers, so that we can still generate the leader of a big business instead of big business leader. 
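The preference just described can be illustrated with a small Python sketch. It rests on stated assumptions rather than on the actual generation rule: the function name lexicalise and its parameters are hypothetical, the modifier noun is assumed to arrive with a possibly empty list of its own adjectives, and the articles &quot;the&quot; and &quot;a&quot; are hard-coded for simplicity.

```python
def lexicalise(head, modifier, modifier_adjectives=()):
    """Choose between the compound and the 'of' paraphrase.

    The compound (modifier + head) is preferred when the modifier noun has
    no adjectives of its own; otherwise the paraphrase keeps each adjective
    attached to the modifier noun.
    """
    if not modifier_adjectives:
        return f"{modifier} {head}"
    adjectives = " ".join(modifier_adjectives)
    return f"the {head} of a {adjectives} {modifier}"


print(lexicalise("leader", "business"))
print(lexicalise("leader", "business", ("big",)))
```

Under these assumptions, the unmodified pair is rendered as a compound and the modified one as an of-phrase, which avoids the ambiguous big business leader. 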
Note that we do not need the hierarchy to identify the head when there are only two Ns, because the head is the last noun in a Chinese compound; we give this example because it could be embedded in a larger compound such as &quot;business leader major office&quot;, which one might want to break into &quot;the major office&quot; of &quot;the business leader&quot;.</Paragraph> <Paragraph position="7"> Here, E is the head and this semantics is lexicalised as way of working or work style, again with a preference for the latter.</Paragraph> <Paragraph position="8"> (OR) illustrates our transcategorial approach: [np [mod ~g (n, jing1ji4, economy, Economy)] [n ~ (n, xiao4yi4, benefit, BenefitFrom)]] Here is a case where our transcategorial approach to lexicon representation helps in generating the AdjN construction economic benefit for an NN Chinese compound. This is because economy and economic share the same semantics, so the generator will present both possibilities; moreover, economic and benefit co-occur in English, whereas economy and benefit do not. The head is easily identified as the R &quot;BenefitFrom&quot;, and as such the compound could also be generated as benefit to economy.</Paragraph> <Paragraph position="9"> This NNN compound presents a case of mismatch between Chinese and English; it can be paraphrased as: plan to solve key problems in science and technology. Here, a transfer-based approach would fail to translate adequately, as ~ (attack-key-problem) must be rendered by an expression equivalent to solving important problems, and as such the English compound science technology solving key problem plans must be broken into smaller compounds with explicit relations between them.5 These examples illustrate why a semantic approach is preferable, and sometimes necessary, to translate Chinese compounds into English. 
However, as discussed earlier, 1) this approach is knowledge intensive, and 2) English compounding seems to follow the same word order as Chinese regularly enough that we consider using a transfer approach as a back-up to generation.</Paragraph> </Section> <Section position="3" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 4.3 A Transfer-based Approach </SectionTitle> <Paragraph position="0"> Semantics can be expensive to use, so we also rely on a transfer-based approach as a back-up method when semantics fails to give us the semantic relation between the nouns. We can do this because English allows compounding (whereas for French and Spanish, a transfer approach would be more problematic, as compounding is not as productive and relations must be identified). However, as we noted previously, it can become difficult in English to get the meaning of a large compound; it is therefore better to &quot;break&quot; the compound into 2 or 3 compounds.</Paragraph> <Paragraph position="1"> We hope to bypass part of this problem by using co-occurrence information in a transfer approach.6 Example: computer database for test of theory of military affairs. In this case, only co-occurrence information will signal the generator to link &quot;computer&quot; to &quot;database&quot; to produce &quot;computer database&quot;; this information must be encoded in the lexicon, as we show in the next section.</Paragraph> <Paragraph position="2"> [np [mod [mod [mod [mod ~i~ (n, jun1shi4, military-affairs, MilitaryActivity)] 5 In &quot;Solve+Attitude&quot;, Attitude reflects the importance attached to the event.</Paragraph> <Paragraph position="3"> 6 We saw that in a semantic approach the headhood hierarchy provides a good clue to break a compound. 
[n ~ (n, li3lun4, theory, Theory)]] [n :~J~ (n, kao3he2, test, Examination)]]</Paragraph> <Paragraph position="4"> [n ~ (abbr, ti2ku4, text database, Database)]] [np [mod ~ (n, guan3li3, management, ManagementActivity)] [n ~ (n, xi4tong3, system, System)]]] Following the Chinese word order to produce military theory test database management system seems to be acceptable in English. However, a better translation might be the management system of a database for testing military theory, in which case the relations between nouns must be made explicit, using the semantic information found in the ontological concepts in a semantic approach.</Paragraph> </Section> <Section position="4" start_page="22" end_page="23" type="sub_section"> <SectionTitle> 5 Processing of Chinese Nominals and Nominal Compounds </SectionTitle> <Paragraph position="0"> We utilise an efficient constraint-based control mechanism called Hunter-Gatherer (HG) (Beale, 1997) to process Chinese nominals and compounds. This mechanism has been successfully applied to the analysis of Spanish and the generation of English. We refer to (Beale et al., 1995) for details on how the semantic analyser works, and to (Beale et al., 1997) for how the generator works.</Paragraph> <Paragraph position="1"> In this paper, we are interested in showing how HG allows us to mark certain compositions as being dependent on each other: once we have two lexicon entries that we know go together, from a syntactic, lexical or semantic viewpoint, HG will ensure that they are correctly treated. HG gives preference to words which &quot;co-occur&quot;, from any of the above viewpoints. 
The analyser simply needs to detect the co-occurrence and add the constraint that the corresponding senses be used together.</Paragraph> <Paragraph position="2"> In the case of &quot;computer database,&quot; the lexicon entry for &quot;database&quot; encodes the syntagmatic relation (LSFSyn), which keeps the semantics of the nouns compositional and signals the processor (analyser or generator) to consider the nouns as syntactically linked: #0=[key: &quot;~ ~&quot;, rel: [syntagmatic: LSFSyn [base: #0, co-occur: [key: &quot;~ ~ ~&quot;, sense: n1, ...]]]] We provide below the example of a Chinese sentence, its English translation and the relevant parts of the result of the semantic analysis, showing the analysis of the compound ~ ~ &quot;tackle-key-problem&quot;. English Literal Translation: This classifier attack-key-problem big project by State-Maritime-Bureau direct whole country 34 classifier units adjective-marker 2000 more classifier scientific-and-technical personnel participate attack-key-problem, is one classifier including 7 classifier tasks, 39 classifier special topics adjective-marker large-scale engineering application project.</Paragraph> <Paragraph position="3"> English Translation: This project, which deals with important problems, was directed by the State Maritime Bureau and involved more than 2,000 scientific and technical personnel from 34 units throughout the country; it was a large-scale engineering application project including seven tasks and 39 special topics.</Paragraph> <Paragraph position="4"> In this paper, we showed the advantage of adopting a transcategorial (semantic-based) approach to relate verbs to their nominalisations. We showed how to use lexico-semantic rules to relate different forms carrying the same semantics. 
These rules can be applied at run time in analysis, thus facilitating syntactico-semantic recovery for unknown words.</Paragraph> <Paragraph position="5"> Concerning compounds, we have shown that a semantic approach cannot be avoided if we want a high-quality translation, because the number of nouns that can enter into a Chinese compound makes it difficult to get the meaning of the corresponding compound in English. Thus, breaking the compound requires an understanding of the Chinese compound. However, we have suggested a transfer-like approach for Chinese to English translation that uses co-occurrences to &quot;signal&quot; privileged lexical links (computer database).</Paragraph> <Paragraph position="6"> We have illustrated that, by considering the information in the lexicon as constraints, the linguistic difference between pure compositionality and co-occurrence information becomes a virtual difference for HG.</Paragraph> </Section> </Section> </Paper>