<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1604">
  <Title>The Architecture of a Standard Arabic lexical database: some figures, ratios and categories from the DIINAR.1 source program Ramzi ABBES</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In the present state of the art in the development of software and language resources for Arabic, there is an urgent need for evaluation and validation criteria based on solid analytic grounds: a substantial number of Arabic lexical databases now exist, and more are nearing completion.</Paragraph>
    <Paragraph position="1"> Existing lexical databases are not always, for the time being, available as such to researchers and/or developers, because they are usually embedded in software (such as a morphological analyser or a parser) and remain very difficult to use independently. It is to be expected, though, that the issue of availability will be overcome in the reasonably near future, and that a number of Arabic lexical databases will be found on the market, or in catalogues such as that of ELRA in Europe and that of LDC in the USA. The ongoing European project NEMLAR is presently working on the availability of language resources, including lexical databases. As a result, the crucial question of the quality and consistency of these databases should be addressed as soon as possible.</Paragraph>
    <Paragraph position="2"> One of the criteria for the evaluation and validation of a lexical database for Arabic is both quantitative (how many?) and qualitative (what of, precisely?). In this paper, which refers to previous work on the processing of Arabic and the related lexical resources, we try to give evidence on the structure of a lexical database, based on an analysis of the DIINAR.1 database. Quantitative results are only interesting if they can be interpreted in such a way as to yield information on the actual structure and categories of the lexicon of the language under consideration. We will endeavour to show that a quantitative and qualitative analysis of the lexical categories incorporated in DIINAR.1 can be interpreted in this way. Moreover, the investigation leads us to propose a more consistent organisation of lexical information and relations, which should be included in future versions of DIINAR.</Paragraph>
    <Paragraph position="3"> 2 The type of lexical dB required by the automatic analysis of Arabic
What are the fundamental requirements of a lexical database for Arabic? The first challenge to be met in building language resources for Arabic is the structure of the writing system of the language (Dichy, 1990), the two main features of which are the non-diacriticized script of standard texts (§ 2.1) and the structure of the word-form (§ 2.2). The combined effect of these features entails the need for a lexical database that includes a substantial number of grammar-lexis relations (§ 2.3). Such a database is to be considered a sine qua non of high-level, elaborate Arabic NLP.</Paragraph>
  </Section>
</Paper>