<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4038">
  <Title>Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Data, Setup and Evaluation Metrics
</SectionTitle>
      <Paragraph position="0"> The Arabic TreeBank consists of 4519 sentences.</Paragraph>
      <Paragraph position="1"> The development set, training set and test set are the same for all the experiments. The sentences are randomly distributed with 119 sentences in the development set, 400 sentences in the test set and 4000 sentences in the training set. The data is transliterated in the Arabic TreeBank into Latin based ASCII characters using the Buckwalter transliteration scheme.6 We used the non vocalized version of the treebank for all the experiments.</Paragraph>
      <Paragraph position="2"> All the data is derived from the parsed trees in the treebank. We use a standard SVM with a polynomial kernel, of degree 2 and C=1.7 Standard metrics of Accuracy (Acc), Precision (Prec), Recall (Rec), and the F-measure, a1 a2a5a4a3a6 , on the test set are utilized.8</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Tokenization
</SectionTitle>
      <Paragraph position="0"> Results: Table 2 presents the results obtained using the current SVM based approach, SVM-TOK, compared against two rule-based baseline approaches, RULE and RULE+DICT. RULE marks a prefix if a word starts with one of five proclitic letters described in Section 4.1. A suffix is marked if a word ends with any of the possessive pronouns, enclitics, mentioned above in Section 4.1. A small set of 17 function words that start with the proclitic letters is explicitly excluded.</Paragraph>
      <Paragraph position="1"> RULE+DICT only applies the tokenization rules in RULE if the token does not occur in a dictionary. The dictionary used comprises the 47,261 unique non vocalized word entries in the first column of Buckwalter's dictStem, freely available with the AraMorph distribution. In some cases, dictionary entries retain inflectional morphology and clitics.</Paragraph>
      <Paragraph position="2">  the baseline RULE+DICT. While RULE+DICT could certainly be improved with larger dictionaries, however, the largest dictionary will still have coverage problems, therefore, there is a role for a data-driven approach such as SVM-TOK. An analysis of the confusion matrix for SVM-TOK shows that the most confusion occurs with the PREF2 class. This is hardly surprising since PREF2 is an infix category, and thus has two ambiguous boundaries.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Part of Speech Tagging
</SectionTitle>
      <Paragraph position="0"> Results: Table 3 shows the results obtained with the SVM based POS tagger, SVM-POS, and the results obtained with a simple baseline, BASELINE, where the most frequent POS tag associated with a token from the training set is assigned to it in the test set. If the token does not occur in the training data, the token is assigned the NN tag as a default tag.</Paragraph>
      <Paragraph position="1">  BASELINE on the task of POS tagging of Arabic text Discussion: The performance of SVM-POSis better than the baseline BASELINE. 50% of the errors encountered result from confusing nouns, NN, with adjectives, JJ, or vice versa. This is to be expected since these two categories are confusable in Arabic leading to inconsistencies in the training data. For example, the word for United in United States of America or United Nations is randomly tagged as a noun, or an adjective in the training data. We applied a similar SVM based POS tagging system to English text using the English TreeBank. The size of the training and test data corresponded to those evaluated in the Arabic experiments. The English experiment resulted in an accuracy of 94.97%, which is comparable to  Arabic text Discussion: The overall performance of SVM-BP is a1 a2a5a4a3a6 score of 92.08. These results are interesting in light of state-of-the-art for English BP chunking performance which is at an a1 a2a5a4a3a6 score of 93.48, against a baseline of 77.7 in CoNLL 2000 shared task (Tjong et al., 2000).</Paragraph>
      <Paragraph position="2"> It is worth noting that SVM-BP trained on the English TreeBank, with a comparable training and test size data to those of the Arabic experiment, yields an a1 a2a5a4 a6 score of 93.05. The best results obtained are for VP and PP, yielding a1 a2a5a4 a6 scores of 97.6 and 98.4, respectively.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>