<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1032">
  <Title>Robust parsing of natural language descriptions</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The sublanguage
</SectionTitle>
    <Paragraph position="0"> Examples of the sublanguage we are considering are reported below: Produzione conto terzi olio di oliva.</Paragraph>
    <Paragraph position="1"> *production (for) third parties (of) olive oil. 1 Mulino. Produzione di foraggio cereali mais. *Mill. Production of fodder cereals corn.</Paragraph>
    <Paragraph position="2"> La produzione, l'importazione, l'esportazione, il commercio all'ingrosso e al dettaglio di prodotti tessili in fibre naturali e sintetiche e articoli per la casa in genere.</Paragraph>
    <Paragraph position="3"> 1 The translation provided is rather literal, but reflects the ungrammatical, telegraphic style of the sublanguage.</Paragraph>
    <Paragraph position="4"> *Production, import, export, wholesale and retail trade of textile products of natural and synthetic fibres and articles for the home.</Paragraph>
    <Paragraph position="5"> As we can see, the utterances are mostly formed by complex noun phrases rather than complete sentences, in which the number of constituents and the nesting degree may be rather high. Ellipses of prepositions and/or coordinating conjunctions are also frequent; adjectives are relatively rare and, when used, generally attach to the technical component of the information; locutions with adjectival function are also common. Finally, verbs are scarce and restricted to conversational parts; lexical mistakes are also recurrent. In their basic version, these noun phrases are formed either by a single noun or by a group of the form N1-P-N2 (noun, preposition, noun), where N1 is associated with an economic activity and N2 with the object of this activity; the linking preposition P is often absent.</Paragraph>
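    <Paragraph position="6"> The basic N1-P-N2 group described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the tag names ACT, OBJ and PREP, and the function match_n1_p_n2 are all assumptions introduced here, and the single-noun case is left out.

```python
# Hypothetical toy lexicon: the coarse tags are assumptions for illustration only.
LEXICON = {
    "produzione": "ACT",   # economic activity (N1)
    "commercio": "ACT",
    "di": "PREP",          # linking preposition (P)
    "olio": "OBJ",         # object of the activity (N2)
    "foraggio": "OBJ",
}


def match_n1_p_n2(tokens):
    """Return (n1, prep, n2) if tokens start with N1 [P] N2, else None.

    The preposition is optional because the sublanguage often omits it;
    a missing preposition is reported as None.
    """
    tags = [LEXICON.get(t.lower()) for t in tokens]
    if len(tokens) == 0 or tags[0] != "ACT":
        return None
    if len(tokens) >= 3 and tags[1] == "PREP" and tags[2] == "OBJ":
        return (tokens[0], tokens[1], tokens[2])
    if len(tokens) >= 2 and tags[1] == "OBJ":
        return (tokens[0], None, tokens[1])
    return None
```

Under these assumptions, "Produzione di foraggio" and the telegraphic "Produzione olio" both match, the second with an empty preposition slot.</Paragraph>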
  </Section>
  <Section position="3" start_page="0" end_page="229" type="metho">
    <SectionTitle>
3 Syntactic analysis
</SectionTitle>
    <Paragraph position="0"> The core of our approach is the extraction of elementary syntactic relationships (ESR) from a possibly ill-formed sentence. ESR's are described by a definite clause grammar (DCG), a fragment of which appears in Fig.1. Due to the presence of the special symbol skip on the right-hand side of the rules, this grammar turns out to be a special, very simple case of a discontinuous grammar (Dahl, 1989).</Paragraph>
    <Paragraph position="2"> Prolog expansion of the skip symbol: skip(S0,S2) :- append(S1,S2,S0).</Paragraph>
    <Paragraph position="3"> Fig. 1: A part of our DCG grammar. The introduction of the skip symbol allows us to ignore unknown or ill-formed words, and accounts in a very compact way for the positional variations of the structure elements.</Paragraph>
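    <Paragraph position="4"> The effect of skip can be sketched outside Prolog as well. Since skip(S0,S2) holds whenever S0 is some prefix S1 followed by S2, S2 ranges over every suffix of S0 on backtracking. The Python sketch below mimics this with generators; the toy noun lexicon and the rule noun_skip_noun are hypothetical and are not taken from the grammar fragment of Fig. 1.

```python
NOUNS = {"produzione", "olio", "oliva"}   # hypothetical toy lexicon


def skip(tokens):
    """Yield every suffix of tokens, like append(S1, S2, S0) on backtracking."""
    for i in range(len(tokens) + 1):
        yield tokens[i:]


def noun(tokens):
    """Consume one known noun; yield (word, remaining tokens)."""
    if tokens and tokens[0] in NOUNS:
        yield tokens[0], tokens[1:]


def noun_skip_noun(tokens):
    """Hypothetical rule: pair --> noun(N1), skip, noun(N2)."""
    for n1, rest in noun(tokens):
        for after_skip in skip(rest):
            for n2, _ in noun(after_skip):
                yield (n1, n2)


# The unknown word "xyzzy" and the preposition "di" are simply skipped over.
pairs = list(noun_skip_noun(["produzione", "xyzzy", "olio", "di", "oliva"]))
```

Here pairs contains both ("produzione", "olio") and ("produzione", "oliva"): skipping tolerates unknown words but also overgenerates, which is consistent with the need for a later ambiguity-reduction step.</Paragraph>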
  </Section>
  <Section position="4" start_page="229" end_page="229" type="metho">
    <SectionTitle>
4 The system
</SectionTitle>
    <Paragraph position="0"> Input text first passes through a morphological analyzer similar to the one used in (Antonacci et al., 1988). Before undergoing syntactical analysis, the relevant phrases occurring within descriptions are isolated by taking into account the coarse semantic trait (activities vs. activity objects) of the nouns involved. After a single description has been analyzed by our grammar, ESR's undergo an intermediate processing in order to reduce morphological and syntactic ambiguity and to take into account ellipses and conjunctions. These phenomena are handled by employing preference schemes improved by semantic control (Hobbs and Bear, 1990).</Paragraph>
    <Paragraph position="1"> Finally, ESR's are converted into conceptual relationships by using a many-to-many mapping between syntactic and conceptual relations similar to the one used in (Antonacci et al., 1988); conceptual relationships are subsequently validated by using a semantic knowledge base, and finally merged into a semantic tree. The knowledge representation technique adopted is inspired by the ITL system (Guarino, 1991), where semantic validation reduces to order-sorted unification.</Paragraph>
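    <Paragraph position="2"> To give an idea of what validation by order-sorted unification involves, here is a minimal sketch under strong simplifying assumptions: sorts form a tree, and two sorts unify only when one subsumes the other, in which case the more specific one is returned. The sort names and the functions ancestors and unify_sorts are hypothetical and do not reproduce the ITL knowledge base.

```python
# Hypothetical sort hierarchy (child: parent); the sort names are assumptions.
PARENT = {
    "product": "thing",
    "food": "product",
    "olive_oil": "food",
    "textile": "product",
}


def ancestors(sort):
    """Return the sort followed by all of its supersorts, most specific first."""
    chain = [sort]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def unify_sorts(a, b):
    """Return the more specific of two comparable sorts, or None if they clash."""
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None
```

For instance, if a conceptual relation expects an argument of sort food, a filler of sort olive_oil is accepted (the result is olive_oil), whereas a filler of sort textile is rejected because the two sorts are incomparable in this toy hierarchy.</Paragraph>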
  </Section>
  <Section position="5" start_page="229" end_page="229" type="metho">
    <SectionTitle>
5 Preliminary results
</SectionTitle>
    <Paragraph position="0"> A prototype of the system has been implemented on a Macintosh workstation in LPA MacProlog. It has been tested on a significant fragment of our corpus (about 2000 descriptions in the areas of agriculture and services).</Paragraph>
    <Paragraph position="1"> The output of the system has been manually checked by a linguist for correctness and completeness. Errors turn out to be independent of the syntactic algorithm, and are mainly due to (i) lack of semantic knowledge; (ii) lack of lexical knowledge (unknown or ill-formed words); (iii) difficulties with disambiguation and phrase separation.</Paragraph>
    <Paragraph position="2"> The query system is now under development.</Paragraph>
    <Paragraph position="3"> As far as system efficiency is concerned, the time complexity of the whole analysis process seems to be almost quadratic with respect to the length of the sentence, while the mean understanding time is well below 10 seconds on a Macintosh IIfx.</Paragraph>
  </Section>
</Paper>