<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1616">
  <Title>Stemming the Qur'an</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Stemming has been widely used in several fields of natural language processing, such as data mining, information retrieval, and multivariate analysis. Some applications of multivariate analysis of text involve identifying the lexical occurrences of word stems in a text. Such lexical analysis, in which the frequency of word occurrences is significant, cannot be done without some form of stemming.</Paragraph>
    <Paragraph position="1"> In morphology, variants of words which have similar semantic interpretations are considered to belong to the same stem and to be equivalent for purposes of text analysis and information retrieval. For this reason, a number of stemming algorithms have been developed in an attempt to reduce such morphological variants of words to their common stem.</Paragraph>
    <Paragraph position="2"> Various stemming algorithms have been proposed for a number of languages. The structure of these stemmers ranges from the simplest technique, such as removing suffixes, to more complicated designs that use the morphological structure of words to derive a stem.</Paragraph>
    <Paragraph position="3"> In the case of Arabic, several stemming algorithms have been developed. The major inadequacy of existing systems for stemming the Qur'an results from the fact that most of them take Modern Standard Arabic as their input text, whereas the language of the Qur'an is Classical Arabic. Orthographic variations and the use of diacritics and glyphs in the representation of Classical Arabic increase the difficulty of stemming. In many respects, the Qur'an, with its unique lexicon and orthography, requires dedicated attention.</Paragraph>
    <Paragraph position="4"> Therefore, I have developed a new light stemmer that uses the Qur'an in western transliteration to improve the effectiveness of stemming the text.</Paragraph>
  </Section>
</Paper>