File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/96/c96-1037_concl.xml
Size: 3,480 bytes
Last Modified: 2025-10-06 13:57:32
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1037"> <Title>j schang@cs.nthu.edu.tw</Title> <Section position="6" start_page="213" end_page="214" type="concl"> <SectionTitle> 6. Concluding remarks </SectionTitle> <Paragraph position="0"> This paper has presented an algorithm capable of identifying words and their translations in a bilingual corpus. It is effective for specific linguistic reasons.</Paragraph> <Paragraph position="1"> A significant majority of words in bilingual sentences have diverging translations, which are often not found in a bilingual dictionary. However, those deviations are largely confined to the classes defined by thesauri.</Paragraph> <Paragraph position="2"> Therefore, by using a class-based approach, the problem's complexity can be reduced in the sense that fewer candidates need to be considered, with a greater likelihood of finding the correct translation. In general, a slight amount of precision can apparently be expended to gain a substantial increase in applicability. Our results suggest that mixed strategies can yield a broad-coverage, high-precision word alignment and sense tagging system that can produce richer information for MT and NLP tasks such as word sense disambiguation. The word sense information can provide a certain degree of generality which is lacking in most statistical procedures. The performance of the algorithm discussed here can definitely be improved by enhancing its various components, e.g., morphological analysis, the bilingual dictionary, the monolingual thesauri, and rule acquisition. However, this work has presented a workable core for processing bilingual corpora. 1. Read a pair of English-Chinese sentences.</Paragraph> <Paragraph position="3"> 2. Two dummies are placed, one to the left of the first word and one to the right of the last word of the source sentence. Two similar dummies are added to the target sentence. The left dummies in the source and target sentences align with each other. Similarly, the right dummies align with each other. This establishes anchor points for calculating the relative distortion score.</Paragraph> <Paragraph position="4"> 3. Perform part-of-speech tagging and analysis for the sentences in both languages.</Paragraph> <Paragraph position="5"> 4. Look up the words in LEXICON and CILIN to determine the classes consistent with the part-of-speech analyses.</Paragraph> <Paragraph position="6"> 5. Follow the procedure in Section 2.3 to calculate a composite probability for each connection candidate according to fan-out, applicability, specificity of alignment rules, relative distortion, and dictionary evidence.</Paragraph> <Paragraph position="7"> The proposed algorithm can produce effective word-alignment results with sense tagging, which can provide a basis for such NLP tasks as word sense disambiguation (Chen and Chang 1994) and PP attachment (Chen and Chang 1995). While this paper has specifically addressed only English-Chinese corpora, the linguistic issues that motivated the algorithm are quite general and are to a great degree language independent. If this is the case, the algorithm presented here should be adaptable to other language pairs. The prospects for Japanese, in particular, seem highly promising. There is some work on the alignment of English-Japanese texts using both dictionaries and statistics (Utsuro, Ikeda, Yamane, Matsumoto and Nagao 1994).</Paragraph> </Section> </Paper>