<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1413"> <Title>Using the Web as a Bilingual Dictionary</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Discussion and Related Works </SectionTitle> <Paragraph position="0"> Previous studies on bilingual text have mainly focused on parallel, non-parallel, or comparable texts, in which the two texts of a pair are written in different languages (Veronis, 2000). However, apart from governmental documents from Canada (English/French) and Hong Kong (Chinese/English), bilingual texts are usually subject to limitations such as licensing conditions, usage fees, domain, and language pair. One approach that partially overcomes these limitations is to collect parallel texts from the web (Nie et al., 1999; Resnik, 1999).</Paragraph> <Paragraph position="1"> To provide better coverage with fewer restrictions, we focused on partially bilingual text. Considering the enormous volume of such texts and the variety of fields they cover, we believe they are the best resource to mine for MT-related applications involving English and Asian languages. The current system for extracting the translation of a given term is more similar to the information extraction system for term descriptions of Fujii and Ishikawa (2000) than to any machine translation system. To collect descriptions of a technical term X, such as 'data mining', Fujii and Ishikawa (2000) gathered phrases such as &quot;X is Y&quot; and &quot;X is defined as Y&quot; from the web. Since our system uses a scoring function based solely on byte distance, introducing this kind of pattern matching might improve its accuracy.</Paragraph> <Paragraph position="2"> In practice, the factor that most influences the accuracy of the term translation extractor is the set of documents returned by the search engine. 
To evaluate the system, we used a test set guaranteed to contain at least one document with both the Japanese term and its English translation; this is a rather optimistic assumption, since a real query comes with no such guarantee. Because the search engine is an uncontrollable factor, one possible solution is to build a dedicated search engine. We are very interested in combining ideas such as focused crawling (Chakrabarti et al., 1999) and domain-specific Internet portals (McCallum et al., 2000) with the proposed term translation extractor to develop a domain-specific online dictionary service.</Paragraph> </Section> </Paper>