<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2403"> <Title>Automatic Extraction of Chinese Multiword Expressions with a Statistical Tool</Title> <Section position="2" start_page="0" end_page="17" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In real-life human communication, meaning is often conveyed by word groups, or meaning groups, rather than by single words. Very often, it is difficult to interpret human speech word by word. Consequently, for an MT system, it is important to identify and interpret the accurate meaning of such word groups, or multiword expressions (MWEs hereafter), in a source language and render them accurately in a target language.</Paragraph> <Paragraph position="1"> However, accurate identification and interpretation of MWEs remains an unsolved problem in MT research.</Paragraph> <Paragraph position="2"> In this paper, we present our experiment on identifying Chinese MWEs using a statistical tool for MT purposes. Here, by multiword expressions, we refer to word groups whose constituent words have strong collocational relations and which can be translated into stable equivalents in the target language, either single words or MWEs, e.g. noun phrases, prepositional phrases, etc. They may include technical terminology in specific domains as well as more general fixed expressions and idioms.</Paragraph> <Paragraph position="3"> We observed that existing Chinese-English MT systems cannot satisfactorily translate MWEs, although some may employ a machine-readable bilingual dictionary of idioms.</Paragraph> <Paragraph position="4"> Whereas highly compositional MWEs pose little challenge for human speakers to interpret, they make it hard for fully automatic MT systems to produce even remotely fluent translations.
Therefore, in our context, we extend the concept of MWE to include those compositional ones that have relatively stable, identifiable translation patterns in the target language.</Paragraph> <Paragraph position="5"> To illustrate the challenge, we experimented with simple Chinese sentences containing some commonly used MWEs in SYSTRAN (http://www.systransoft.com/) and Huan-Yu-Tong (HYT henceforth) of CCID (China Centre for Information Industry Development) (Sun, 2004). The former is one of the most efficient MT systems today, claiming to be &quot;the leading provider of the world's most scalable and modular translation architecture&quot;, while the latter is one of the most successful MT systems in China. Table 1 shows the results, where SL and TL denote source and target languages respectively. As shown by the samples, such highly sophisticated MT tools still struggle to produce adequate English sentences.</Paragraph> <Section position="1" start_page="17" end_page="17" type="sub_section"> <SectionTitle> This afternoon </SectionTitle> <Paragraph position="0"> can practice a ball game? I hope not to be able.</Paragraph> <Paragraph position="1"> Can practise a ball game this afternoon? I hope can not.</Paragraph> <Paragraph position="2"> Ni nullKe nullNa Yang null, Rang null Men Ge nullGe null.</Paragraph> <Paragraph position="3"> You may not such do, let us pay respectively each. You cannot do like that, and let us make it Dutch. Kong null Mei Ban null Rang Ni Men null Tong Zhuo , Ni Men Jie nullJie null nullKai nullnull? Perhaps does not have the means to let you sit shares a table, did you mind sits separately? null Perhaps no way out(ly) let you sit with table, are you situated between not mind to separate to sit?
Lai nullnullZhen null null nullPei .</Paragraph> </Section> <Section position="2" start_page="17" end_page="17" type="sub_section"> <SectionTitle> Selects the </SectionTitle> <Paragraph position="0"> milk coffee which ices.</Paragraph> <Paragraph position="1"> Ice breasts coffee take is selected. nullnull,nullnullPi null, null Lai null null Pei .</Paragraph> <Paragraph position="2"> Good, I want the beer, again comes to select the coffee.</Paragraph> <Paragraph position="3"> Alright, I want beer, and take the coffee of order- null Ignoring the eccentric English syntactic structures these tools produced, we focus on the translations of Chinese MWEs (see the italic characters in Table 1) which have straightforward equivalent expressions in English. For example, in this context, Xi Wang Bu Hui can be translated into &quot;hope not&quot;, Ge Fu Ge into &quot;go Dutch&quot;, Tong Zhuo into &quot;together&quot; or &quot;at the same table&quot;, Nai Ka Pei into &quot;white coffee&quot; or &quot;coffee with milk&quot;, Zai Lai Dian into &quot;want some more (in addition to something already ordered)&quot;. Although these Chinese MWEs are highly compositional, translating them word by word yields verbose and awkward translations (for correct translations, see the appendix).</Paragraph> <Paragraph position="4"> To solve such problems, we need algorithms and tools for identifying MWEs in the source language (Chinese in this case) and for accurately mapping them to adequate translation equivalents in the target language (English in our case) that are appropriate for the given contexts. In the examples above, an MT tool should be able to identify the Chinese MWE Ge Fu Ge and either provide the literal translation &quot;pay for each&quot; or map it to the more idiomatic expression &quot;go Dutch&quot;.</Paragraph> <Paragraph position="5"> Obviously, a satisfactory solution to this problem would involve a wide range of issues and techniques.
In this paper, we focus on the sub-issue of automatically recognising and extracting Chinese MWEs. Specifically, we test and evaluate a statistical tool for automatic MWE extraction from Chinese corpus data. As the results of our experiment demonstrate, the tool is capable of identifying many MWEs with little language-specific knowledge. Coupled with an MT system, such a tool could be useful for addressing the MWE issue.</Paragraph> </Section> </Section> </Paper>