<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2401">
  <Title>Named Entities Translation Based on Comparable Corpora</Title>
  <Section position="7" start_page="7" end_page="7" type="concl">
    <SectionTitle>
6 Conclusions and Further Works
</SectionTitle>
    <Paragraph position="0"> We have presented an approach for the design and development of an entity translation system from Basque to Spanish and the different techniques and resources we have used for this work.</Paragraph>
    <Paragraph position="1"> On the one hand, we have combined bilingual dictionaries with a phonological/spelling grammar to translate the entity elements; on the other hand, we have applied a language-independent grammar based on edit distance. Both combinations perform well, and although the linguistic tool obtains better results, the language-independent grammar may be very useful for experiments with language pairs other than Basque and Spanish.</Paragraph>
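As an illustration of the edit-distance idea mentioned above, the following is a minimal sketch of plain Levenshtein distance used to rank target-language candidates. It is not the grammar used in this work: the function names `edit_distance` and `closest_candidates`, the uniform cost of 1 per operation, and the sample word list are all assumptions made for the example.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn source into target."""
    # previous[j] is the distance between the prefix of source processed
    # so far and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            # Substitution is free when the two characters already match.
            substitution = previous[j - 1] + (s_char != t_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def closest_candidates(word: str, lexicon: list[str], limit: int = 3) -> list[str]:
    """Return the lexicon entries closest to word, nearest first (hypothetical helper)."""
    return sorted(lexicon, key=lambda entry: edit_distance(word, entry))[:limit]


# Hypothetical example: a Basque form compared with Spanish candidates.
print(closest_candidates("Frantzia", ["Finlandia", "Franco", "Francia"]))
```

Because nothing in the distance itself depends on Basque or Spanish, a measure of this kind can in principle be reused for other language pairs, which is the property the paragraph above points to.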
    <Paragraph position="2"> Because the syntactic structures of Basque and Spanish differ, the entity elements must be rearranged to translate whole NEs correctly, in particular those entities with more than one element. For that purpose, we have used two different techniques: probabilistic rules and a simple combination method (all candidates combined with all).</Paragraph>
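The simple combination method can be pictured with the sketch below. The text does not spell out the exact procedure, so this is only one plausible reading: it assumes that every ordering of the elements is tried and that, within each ordering, every translation candidate of one element is paired with every candidate of the others. The function name `combine_all` and the sample candidates are hypothetical.

```python
from itertools import permutations, product


def combine_all(element_candidates: list[list[str]]) -> list[str]:
    """Build whole-entity candidates from per-element translation candidates.

    Assumption: all element orderings are tried, and within each ordering
    one candidate is picked per element in every possible way.
    """
    combined = []
    for ordering in permutations(element_candidates):
        for choice in product(*ordering):
            combined.append(" ".join(choice))
    return combined


# Hypothetical candidates for a two-element entity.
for candidate in combine_all([["Bilbao"], ["Ayuntamiento", "Municipio"]]):
    print(candidate)
```

The number of generated candidates grows quickly with the number of elements and of candidates per element, which is why a later selection step is needed to keep only the best ones.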
    <Paragraph position="3"> Finally, we have applied different resources and techniques for the selection of the best candidates. On the one hand, we have tried searching the web (Google and Wikipedia); on the other hand, we have used a comparable Basque-Spanish corpus.</Paragraph>
    <Paragraph position="4"> We have verified that, although Google is a bigger dataset, the information it provides for the NE translation task is about as significant as that given by Wikipedia.</Paragraph>
    <Paragraph position="5"> All the experiments carried out with the comparable corpus have performed very well, and the best results have been obtained when combining it with Wikipedia. Developing an NE translation system based on comparable information has therefore proved to be a good way to build a robust system.</Paragraph>
    <Paragraph position="6"> However, some modules can be improved.</Paragraph>
    <Paragraph position="7"> Firstly, the methods to rank and select candidates are very simple; more complex ones would considerably reduce the number of candidates passed to the following modules, making the system's final selection easier and more precise. Regarding the use of the web, so far we have only used Google and Wikipedia. Searches in Wikipedia are more precise than those made in Google, so the information they offer can be considered complementary. Furthermore, we can obtain very valuable information for other entity processes from Wikipedia. For instance, since Wikipedia is a topic-classified encyclopedia, an entity search also reveals the kind of documents in which the entity can occur; in other words, the topic in which it most usually occurs. That classification category can be very useful for entity disambiguation too.</Paragraph>
    <Paragraph position="8"> With the improvements proposed above, we hope to build a stronger entity name translation system in the future.</Paragraph>
  </Section>
</Paper>