<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2304">
  <Title>A Robust and Efficient Parser for Non-Canonical Inputs</Title>
  <Section position="2" start_page="0" end_page="19" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Parsing spoken language and non-canonical inputs remains a challenge for NLP systems. Many different solutions have been tried, depending on the kind of material to be parsed or the kind of application: in some cases, superficial information such as bracketing is enough, whereas in other situations the system needs more detail. The question of robustness, and more generally of the parsing strategy, is addressed differently according to these parameters. Classically, three families of solutions are proposed: - Reducing the complexity of the output - Controlling the parsing strategy - Training and adapting the system to the type of input. In the first case, the idea is to build structures carrying little information, possibly underspecified (which allows partial structures to be built). This family includes the various shallow parsing techniques (see for example [Hindle83], [Abney96]). Unsurprisingly, statistical methods are used frequently and effectively in this kind of application (see [Tjong Kim Sang00] for some results of a comparison between different shallow parsers). Generally, such parsers (whether symbolic or not) are deterministic and build non-recursive units. In some cases, they can also determine relations between units.</Paragraph>
    <Paragraph position="1"> The second family contains many different techniques. The goal is to control a given parsing strategy by means of different mechanisms.</Paragraph>
    <Paragraph position="2"> Among them, we can highlight three proposals: - Implementing recovery mechanisms that trigger specific treatments in case of error (cf. [Boulier05]) - Controlling the parsing process by means of probabilistic information (cf. [Johnson98]) - Controlling deep parsers by means of shallow parsing techniques (cf. [Crysmann02], [UszKoreit02], [Marimon02])</Paragraph>
    <Paragraph position="3"> The last family consists in adapting the system to the material to be parsed. This can be done in different ways: - Adding specific information in order to reduce the search space of the parsing process. This information can take the form of ad hoc rules or of information that depends on the kind of data to be treated. - Adapting the resources (lexicon, grammars) to the linguistic material</Paragraph>
    <Paragraph position="4"> These different strategies offer several advantages, and some of them can be used together. Their appeal is that they address the related questions of robustness and efficiency together. However, they do not constitute a generic solution, in the sense that something has to be modified either in the goal, in the formalism or in the process. In other words, they constitute an additional mechanism to be plugged into a given framework.</Paragraph>
    <Paragraph position="5"> We propose in this paper a parsing technique that relies on a constraint-based framework and is both efficient and robust, with no need to modify the underlying formalism or the process. The notion of constraint is used in many different ways in NLP systems. Constraints can serve as a very basic filtering process, as proposed by Constraint Grammars (see [Karlsson90]), or can be part of an actual theory, as with HPSG (see [Sag03]), Optimality Theory (see [Prince03]) or Constraint Dependency Grammars (cf. [Maruyama90]). Our approach is very different: all information is represented by means of constraints; they do not stipulate requirements on the syntactic structure (as in the above-cited approaches) but directly represent syntactic knowledge. In this approach, robustness is intrinsic to the formalism, in the sense that what is built is not a structure of the input (for example in the form of a tree) but a description of its properties. The parsing mechanism can then be seen as a satisfaction process instead of a derivational one. Moreover, it becomes possible to give a characterization of the input whatever its form. The technique relies on constraint relaxation and is controlled by means of a simple left-corner strategy. One of its advantages is that, on top of its efficiency, the same resources and the same parsing technique are used whatever the input.</Paragraph>
    <Paragraph position="6"> After a presentation of the formalism and the parsing scheme, we describe an evaluation of the system for the treatment of spoken language.</Paragraph>
    <Paragraph position="7"> This evaluation was carried out for French during the Easy evaluation campaign.</Paragraph>
  </Section>
</Paper>