<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1073">
  <Title>Maximum Entropy Based Restoration of Arabic Diacritics</Title>
  <Section position="10" start_page="582" end_page="583" type="concl">
    <SectionTitle>
8 Conclusion
</SectionTitle>
    <Paragraph position="0"> We presented in this paper a statistical model for Arabic diacritic restoration. The approach we propose is based on the maximum entropy framework, which gives the system the ability to integrate different sources of knowledge. Our model has the advantage of successfully combining diverse sources of information: lexical, segment-based, and POS features. Both the POS and the segment-based features are generated by separate statistical systems, not extracted manually, in order to simulate real-world applications. The segment-based features are extracted from a statistical morphological analysis system using a WFST approach, and the POS features are generated by a parsing model that also uses the maximum entropy framework. [Table caption fragment: ... diacritic restoration and segmentation using FST and Kneser-Ney LM. Columns marked with &quot;True shadda&quot; represent results on documents containing the original consonant doubling &quot;shadda&quot;, while columns marked with &quot;Predicted shadda&quot; represent results where the system restored all diacritics including shadda.]</Paragraph>
    <Paragraph position="1"> Evaluation results show that combining these sources of information leads to state-of-the-art performance.</Paragraph>
    <Paragraph position="2"> As future work, we plan to incorporate information from the Buckwalter morphological analyzer to extract new features that reduce the search space. One idea is to restrict the search to the hypotheses, if any, proposed by the morphological analyzer. We also plan to investigate additional conjunction features to improve the accuracy of the model.</Paragraph>
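    <Paragraph position="3"> The idea of restricting the search to analyzer hypotheses can be illustrated with a minimal sketch. It is not the system described in this paper: the feature weights, the transliterated candidate forms, and the names WEIGHTS, maxent_distribution, and restore are all hypothetical, and the analyzer output is assumed to be a plain set of candidate forms. The sketch scores each candidate with a log-linear (maximum entropy) model over lexical and POS features and falls back to the full candidate list when the analyzer proposes nothing.

```python
import math

# Hypothetical toy weights for a log-linear (maximum entropy) scorer.
# Feature names and values are illustrative assumptions, not the paper's.
WEIGHTS = {
    ("lex", "ktb", "kataba"): 1.2,
    ("lex", "ktb", "kutub"): 0.9,
    ("lex", "ktb", "kutiba"): 0.4,
    ("pos", "VERB", "kataba"): 0.8,
    ("pos", "VERB", "kutiba"): 0.5,
    ("pos", "NOUN", "kutub"): 1.0,
}


def maxent_distribution(word, pos, candidates):
    """Return p(candidate | word, pos) as a softmax over summed feature weights."""
    scores = {}
    for cand in candidates:
        score = WEIGHTS.get(("lex", word, cand), 0.0)
        score += WEIGHTS.get(("pos", pos, cand), 0.0)
        scores[cand] = score
    z = sum(math.exp(s) for s in scores.values())
    return {cand: math.exp(s) / z for cand, s in scores.items()}


def restore(word, pos, all_candidates, analyzer_hypotheses):
    """Pick the most probable diacritized form.

    The search is restricted to the analyzer's hypotheses when it proposes
    any; otherwise every candidate is scored.
    """
    proposed = [c for c in all_candidates if c in analyzer_hypotheses]
    search_space = proposed if proposed else list(all_candidates)
    dist = maxent_distribution(word, pos, search_space)
    return max(dist, key=dist.get), search_space
```

    Calling restore("ktb", "NOUN", ["kataba", "kutub", "kutiba"], {"kutub"}) scores only the single form proposed by the assumed analyzer, while passing an empty set scores all three candidates.</Paragraph>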
  </Section>
</Paper>