<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2035"> <Title>Multilingual Lexical Database Generation from parallel texts in 20 European languages with endogenous resources</Title> <Section position="10" start_page="274" end_page="275" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> We have shown that it is possible to contribute to the processing of languages for which few linguistic resources are available. We propose a solution for spotting multi-grained translations in parallel corpora. The results are surprisingly good and encourage us to improve the method, with the aim of semi-automatic construction of a multilingual lexical database.</Paragraph> <Paragraph position="1"> The endogenous approach makes it possible to handle inflectional variations. We also show the importance of using the proper knowledge at the proper level (sentence grain, document grain and corpus grain). One improvement would be to calculate inflectional variations at corpus grain rather than at document grain. In addition, any external, exogenous component can be plugged into our architecture to improve the overall quality.</Paragraph> <Paragraph position="2"> The size of this &quot;massive compilation&quot; (we work with a corpus covering 20 languages) requires specific strategies to handle it properly and reasonably efficiently. Special efforts have been made to manage the AC Corpus from our document management platform, WIMS.</Paragraph> <Paragraph position="3"> The next step is to evaluate the system precisely. Another perspective is to integrate an endogenous coreference solver (Giguet &amp; Lucas, 2004).</Paragraph> </Section> </Paper>