<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1086">
  <Title>MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects</Title>
  <Section position="12" start_page="684" end_page="686" type="evalu">
    <SectionTitle>
8 Evaluation
</SectionTitle>
    <Paragraph position="0"> The goal of the evaluation is primarily to investigate how reduced lexical resources affect the performance of morphological analysis, as we will not have complete lexicons for the dialects. A second goal is to validate MAGEAD in analysis mode by comparing it to the Buckwalter analyzer (Buckwalter, 2004) when MAGEAD has a full lexicon at its disposal. Because of the lack of resources for the dialects, we use primarily MSA for both goals, but we also discuss a more modest evaluation on a  Levantine corpus.</Paragraph>
    <Paragraph position="1"> We first discuss the different sources of lexical knowledge, and then present our evaluation metrics. We then separately evaluate MSA and Levantine morphological analysis.</Paragraph>
    <Section position="1" start_page="685" end_page="685" type="sub_section">
      <SectionTitle>
8.1 Lexical Knowledge Sources
</SectionTitle>
      <Paragraph position="0"> We evaluate the following sources of lexical knowledge on what roots, i.e, combinations of radicals, are possible. Except for all, these are lists of attested verbal roots. It is not a trivial task to compile a list of verbal roots for MSA, and we compare different sources for these lists.</Paragraph>
      <Paragraph position="1">  a0 all: All radical combinations are allowed, we use no lexical knowledge at all.</Paragraph>
      <Paragraph position="2"> a0 dar: List of roots extracted by (Darwish, 2003) from Lisan Al'arab, a large Arabic dictionary. null a0 bwl: A list of roots appearing as comments in the Buckwalter lexicon (Buckwalter, 2004).</Paragraph>
      <Paragraph position="3"> a0 lex: Roots extracted by us from the list of lex null eme citation forms in the Buckwalter lexicon using surfacy heuristics for quick-and-dirty morphological analysis.</Paragraph>
      <Paragraph position="4"> a0 mbc: This is the same list as lex, except that we pair each root with the MBCs with which it was seen in the Buckwalter lexicon (recall that for us, a lexeme is a root with an MBC). Note that mbc represents a full lexicon, though it was converted automatically from the Buckwalter lexicon and it has not been hand-checked.</Paragraph>
    </Section>
    <Section position="2" start_page="685" end_page="685" type="sub_section">
      <SectionTitle>
8.2 Test Corpora and Metrics
</SectionTitle>
      <Paragraph position="0"> For development and testing purposes, we use MSA and Levantine. For MSA, we use the Penn Arabic Treebank (ATB) (Maamouri et al., 2004). The morphological annotation we use is the &amp;quot;before-file&amp;quot;, which lists the untokenized words (as they appear in the Arabic original text) and all possible analyses according to the Buckwalter analyzer (Buckwalter, 2004). The analysis which is correct for the given token in its context is marked; sometimes, it is also hand-corrected (or added by hand), while the contextually incorrect analyses are never hand-corrected. For development, we use ATB1 section 20000715, and for testing, Sections 20001015 and 20001115 (13,885 distinct verbal types).</Paragraph>
      <Paragraph position="1"> For Levantine, we use a similarly annotated corpus, the Levantine Arabic Treebank (LATB) from the Linguistic Data Consortium. However, there are three major differences: the text is transcribed speech, the corpus is much smaller, and, since, there is no morphological analyzer for Levantine currently, the before-files are the result of running the MSA Buckwalter analyzer on the Levantine token, with many of the analyses incorrect, and only the analysis chosen for the token in context usually hand-corrected. We use LATB files fsa 16* for development, and for testing, files fsa 17*, fsa 18* (14 conversations, 3,175 distinct verbal types).</Paragraph>
      <Paragraph position="2"> We evaluate using three different metrics. The token-based metrics are the corresponding type-based metric weighted by the number of occurrences of the type in the test corpus.</Paragraph>
      <Paragraph position="3"> a0 Recall (TyR for type recall, ToR for token recall): what proportion of the analyses in the gold standard does MAGEAD get? a0 Precision (TyP for type precision, ToP for token precision): what proportion of the analyses that MAGEAD gets are also in the gold standard? a0 Context token recall (CToR): how often does MAGEAD get the contextually correct analysis for that token? We do not give context precision figures, as MAGEAD does not determine the contextually correct analysis (this is a tagging problem). Rather, we interpret the context recall figures as a measure of how often MAGEAD gets the most important of the analyses (i.e., the correct one) for each token.  Analyzer on MSA for different root restrictions, and for different metrics; &amp;quot;Roots&amp;quot; indicates the number of possible roots for that restriction; all numbers are percent figures</Paragraph>
    </Section>
    <Section position="3" start_page="685" end_page="686" type="sub_section">
      <SectionTitle>
8.3 Quantitative Analysis: MSA
</SectionTitle>
      <Paragraph position="0"> The results are summarized in Figure 1. We see that we get a (rough) recall-precision trade-off, both for types and for tokens: the more restrictive we are, the higher our precision, but recall declines. For all, we get excellent recall, and an overgeneration by a factor of only 2. This performance, assuming it is roughly indicative of dialect performance, allows us to conclude that we can use MAGEAD as a dialect morphological analyzer without a lexicon.</Paragraph>
      <Paragraph position="1"> For the root lists, we see that precision is al- null ways higher than for all, as many false analyses are eliminated. At the same time, some correct analyses are also eliminated. Furthermore, bwl under performs somewhat. The change from lex to mbc is interesting, as mbc is a true lexicon (since it does not only state which roots are possible, but also what their MBC is). Precision increases substantially, but not as much as we had hoped. We investigate the errors of mbc in the next subsection in more detail.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>