<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1117"> <Title>Cognate Mapping -- A Heuristic Strategy for the Semi-Supervised Acquisition of a Spanish Lexicon from a Portuguese Seed Lexicon</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions and Further Work </SectionTitle> <Paragraph position="0"> In a first round of experiments, we have shown that a considerable number of Portuguese subwords from the medical domain can be mapped to Spanish cognate stems by applying simple string transformation rules. We then used the local context in language-specific corpora to validate these cognate pairs. However, our results also reveal the limitations of such an approach, at least for infrequent stems, due to the small corpus size. Accordingly, much larger text corpora will be needed, particularly in the next steps of our experiments, in which the Spanish lexicon will be completed by subwords that cannot be generated from their Portuguese translations. Here, we will acquire new Spanish lexeme candidates by automated stemming and retrieve their Portuguese translations by exploring their local context. This requires, however, huge corpora, exceeding the current ones by several orders of magnitude. Additionally, their documents will have to be related using clustering techniques. The usability of the resulting, mainly automatically generated Spanish extension of the MORPHOSAURUS lexicon for cross-language text retrieval can then be evaluated in real CLIR experiments, as previously done for English, German and Portuguese (cf.
Hahn et al. (2004)).</Paragraph> <Paragraph position="2"> Acknowledgements.</Paragraph> <Paragraph position="3"> This work was partly supported by the German Research Foundation (DFG), grant KL 640/5-1, and by the Brazilian National Council for Scientific Research and Development (CNPq), grants 551277/01-7 and 550240/03-9.</Paragraph> </Section> </Paper>