<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1082">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Word Alignment in English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies</Title>
  <Section position="6" start_page="653" end_page="653" type="concl">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper focuses on developing word alignment schemes suited to parallel texts in which the corpus is small. For languages where rich linguistic tools have yet to be developed or are not freely available, such an algorithm may prove beneficial for various NLP tasks, such as vocabulary extraction and alignment. This work considers word alignment in an English-Hindi parallel corpus of about 18 thousand English words and 20 thousand Hindi words.</Paragraph>
    <Paragraph position="1"> The paucity of resources suggests that statistical techniques are not suitable for this task. Lexicon-based approaches, on the other hand, are highly resource-dependent, so they too could not be considered suitable.</Paragraph>
    <Paragraph position="2"> Recency-vector-based approaches provide a suitable alternative. Variations of this approach have already been used for word alignment in parallel texts involving European languages as well as Chinese and Japanese. However, our initial experiments with these algorithms on English-Hindi did not produce good results, so we introduced several measures to improve their performance. The proposed algorithm improved performance many-fold, which suggests that this approach can be used for word alignment in language pairs such as English-Hindi.</Paragraph>
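The recency-vector idea summarized above can be illustrated with a minimal Python sketch. It assumes the definition commonly used in recency-vector work, in which a word's recency vector lists the distances between its successive occurrences in a text, and it compares two such vectors with plain dynamic time warping (DTW), using absolute difference as the local cost; DTW is one common choice here, assumed rather than taken from this section. The helper names `recency_vector` and `dtw_distance`, the use of token offsets as positions, and the toy token lists are hypothetical. This is not the improved algorithm proposed in this paper.

```python
# Hypothetical helpers for illustration only; not the paper's improved algorithm.

def recency_vector(tokens, word):
    """Gaps between successive occurrences of `word` (positions are token offsets)."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(u, v):
    """Plain dynamic time warping with |x - y| as the local cost (an assumed choice)."""
    n, m = len(u), len(v)
    inf = float("inf")
    # cost[i][j] holds the cheapest warping path aligning u[:i] with v[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(u[i - 1] - v[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


english = "the cat sat on the mat and the dog sat".split()
target = "a x b c d x e x f g".split()  # placeholder target-language tokens

rv_the = recency_vector(english, "the")  # [4, 3]
rv_x = recency_vector(target, "x")       # [4, 2]
print(dtw_distance(rv_the, rv_x))                                # 1.0
print(dtw_distance(rv_the, recency_vector(english, "sat")))      # 7.0
```

In this sketch a smaller distance means the two words recur with similar spacing in their respective texts, which makes them candidate translations; with real data the token lists would come from the English and Hindi sides of the parallel corpus.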
    <Paragraph position="3"> Since the available corpus is rather small, we could not compare our results with those of other word alignment algorithms proposed in the literature. In particular, we would like to compare the proposed scheme with the well-known IBM models. We hope that, with a much larger corpus, we will be able to make these comparisons in the near future.</Paragraph>
  </Section>
</Paper>