File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/p06-2100_concl.xml

Size: 1,674 bytes

Last Modified: 2025-10-06 13:55:25

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2100">
  <Title>Learning and Natural Language Processing: A</Title>
  <Section position="9" start_page="784" end_page="784" type="concl">
    <SectionTitle>
7 Conclusions &amp; Future Work
</SectionTitle>
    <Paragraph position="0"> We have described in this paper a POS tagger for Hindi that can overcome the scarcity of annotated corpora by exploiting the rich morphology of the language and the relatively rigid word order within a verb group (VG). The work was driven by identifying, and then eliminating, the factors that lower the tagging accuracy for verbs. A detailed study of how accuracy is distributed across the POS tags pointed out the cases that call for elaborate disambiguation rules. A major strength of the work is that these disambiguation rules are learned; hand-coding them would instead have demanded an exhaustive analysis of language phenomena. Attaining an accuracy of close to 94% from a corpus of only about 15,562 words lends credence to the belief that morphological richness can offset resource scarcity. This work could guide similar efforts to build POS taggers for languages that have rich morphology but cannot afford to invest heavily in creating large annotated corpora.</Paragraph>
    <Paragraph position="1"> Several interesting future directions suggest themselves. It would be worthwhile to investigate a statistical approach such as Conditional Random Fields (CRFs), with feature functions constructed from morphological information. The next logical step beyond the POS tagger is a chunker for Hindi; in fact, a start on this has already been made through VG identification.</Paragraph>
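    <Paragraph position="2"> To make the CRF direction concrete, the following is a minimal sketch, not the authors' method, of how morphology-derived observation features for a linear-chain CRF might be extracted. It assumes romanized Hindi input, and the suffix and auxiliary inventories (VERB_SUFFIXES, AUXILIARIES), as well as the helper names token_features and sentence_features, are hypothetical and purely illustrative. Suffix features stand in for inflectional markers, and the "next token is an auxiliary" cue reflects the relatively rigid order inside a verb group.

```python
from typing import Dict, List, Union

# Hypothetical, illustrative inventory of romanized Hindi verb endings
# (aspect/gender/number markers); NOT the paper's morphological analyser.
VERB_SUFFIXES = ("taa", "tii", "te", "naa", "egaa", "egii", "enge")

# Hypothetical auxiliaries that often close a Hindi verb group (VG).
AUXILIARIES = {"hai", "hain", "thaa", "thii", "the", "rahaa", "rahii", "rahe"}

Feature = Union[str, bool]


def token_features(sent: List[str], i: int) -> Dict[str, Feature]:
    """Morphology-derived observation features for token i of a romanized sentence."""
    word = sent[i].lower()
    feats: Dict[str, Feature] = {
        "bias": True,
        "word": word,
        "is_aux": word in AUXILIARIES,
        "has_verb_suffix": word.endswith(VERB_SUFFIXES),
    }
    # Word-final substrings of length 1-3 approximate inflectional suffixes.
    for n in (1, 2, 3):
        feats[f"suffix{n}"] = word[-n:]
    # Verb groups are fairly rigid (main verb, then auxiliaries),
    # so an auxiliary as the right neighbour is a useful verb cue.
    if i == len(sent) - 1:
        feats["EOS"] = True
    else:
        feats["next_is_aux"] = sent[i + 1].lower() in AUXILIARIES
    # Left-context morphology: the previous word's two-character ending.
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev_suffix2"] = sent[i - 1].lower()[-2:]
    return feats


def sentence_features(sent: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, i.e. one observation sequence for a CRF."""
    return [token_features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    # "ladkaa khaanaa khaa rahaa hai" = "the boy is eating food" (romanized, illustrative).
    sentence = ["ladkaa", "khaanaa", "khaa", "rahaa", "hai"]
    for tok, f in zip(sentence, sentence_features(sentence)):
        print(tok, f["suffix2"], f["has_verb_suffix"], f.get("next_is_aux"))
```

A linear-chain CRF would conjoin each of these observation features with the current tag (and add tag-transition features) to obtain its feature functions; the list-of-dicts-per-sentence layout is the shape accepted by toolkits such as sklearn-crfsuite. Note that "khaanaa" ("food" here) also matches the hypothetical "naa" verb ending, which is exactly the kind of ambiguity the learned weights would have to resolve.</Paragraph>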
  </Section>
</Paper>