<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1075">
  <Title>Parsing Incomplete Sentences</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> It is often necessary in practical situations to attempt parsing an incorrect or incomplete input. This may take many forms: e.g.</Paragraph>
    <Paragraph position="1"> missing or spurious words, misspelled or misunderstood or otherwise unknown words \[28\], missing or unidentified word boundaries \[22,27\]. Specific techniques may be developed to deal with these situations according to the requirements of the application area (e.g. natural language processing, programming language parsing, real-time or off-line processing).</Paragraph>
    <Paragraph position="2"> The context-free (CF) parsing of a sentence with unknown words has been considered by other authors \[28\]. Very simply, an unknown word may be considered as a &quot;special multi-part-of-speech word whose part of speech can be anything&quot;. This multi-part-of-speech word need not be introduced in the CF grammar of the language, but only implicitly in the construction of its parser. This works very well with Earley-like (chart) parsers that can simulate all possible parsing paths that could lead to a correct parse.</Paragraph>
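The idea of an unknown word whose part of speech can be anything may be sketched as follows. This is only an illustration under stated assumptions: it uses a CYK recognizer over a toy grammar in Chomsky normal form rather than an Earley-like parser, and the grammar, the lexicon, the "?" marker and the function names are all hypothetical. The unknown word is handled in the construction of the chart only; the grammar rules are left untouched.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = {            # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {                 # word -> set of parts of speech
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}
ALL_POS = set().union(*LEXICON.values())
UNKNOWN = "?"               # marker for an unknown word (assumed convention)


def categories(word):
    """Parts of speech of a word; an unknown word may be anything."""
    if word == UNKNOWN or word not in LEXICON:
        return set(ALL_POS)
    return set(LEXICON[word])


def recognize(words, start="S"):
    """CYK recognizer: chart[i][j] holds the nonterminals spanning words[i:j]."""
    n = len(words)
    if n == 0:
        return False
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = categories(word)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                # Combine every pair of categories found on the two sub-spans.
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())
    return start in chart[0][n]
```

With this sketch, recognize(["the", "?", "sees", "the", "cat"]) succeeds because the unknown word can act as a noun, while a four-word input fails since the toy grammar only derives five-word sentences.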
    <Paragraph position="3"> In this paper, we deal with the more complex problem of parsing a sentence for which one or several subparts of unknown length are missing. Again we can use a chart parser to try all possible parses on all possible inputs. However, the fact that the length of the missing subsequence is unknown raises an additional difficulty. Many published chart parsers \[24,28,23,21\] are constructed with the assumption that the CF grammar of the language has no cyclic rules. This hypothesis is reasonable for the syntax of natural (or programming) languages. However, the resulting simplification of the parser construction does not allow its extension to parsing sentences with unknown subsequences of words.</Paragraph>
    <Paragraph position="4"> If the length (in words) of the missing subsequence were known, we could simply replace it with as many unknown words, a problem we know how to handle. When this length is not known, the algorithm has to simulate the parsing of an arbitrary number of words, and thus may have to go several times through reduction by the same rules of the grammar (see footnote 1) without ever touching the stack present before scanning the unknown sequence, and without reading the input beyond that sequence.</Paragraph>
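To give some intuition for why a gap of unknown length forces repeated reductions by the same rules, the following sketch computes, for each nonterminal of a toy grammar, the numbers of words it can span. It is not the algorithm of this paper: it is a plain fixpoint iteration with an artificial bound max_len, and the grammar and the function names are hypothetical. The recursive rule NP -> NP PP is applied again and again as longer spans are discovered.

```python
# Toy CNF grammar with a recursive rule (hypothetical, for illustration only).
BINARY_RULES = [            # (A, B, C) stands for A -> B C
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),     # recursive: NP can grow without bound
    ("PP", "P", "NP"),
    ("VP", "V", "NP"),
]
PRETERMINALS = {"Det", "N", "P", "V"}   # each derives exactly one word


def derivable_lengths(max_len):
    """Map each nonterminal to the yield lengths (up to max_len) it can derive."""
    lengths = {pos: {1} for pos in PRETERMINALS}
    for head, _, _ in BINARY_RULES:
        lengths.setdefault(head, set())
    changed = True
    while changed:                      # iterate until nothing new is found
        changed = False
        for head, left, right in BINARY_RULES:
            # Copy the sets: head may be the same symbol as left or right.
            for a in list(lengths[left]):
                for b in list(lengths[right]):
                    total = a + b
                    if max_len >= total and total not in lengths[head]:
                        lengths[head].add(total)
                        changed = True
    return lengths


def gap_can_be(nonterminal, k, max_len=20):
    """True if k unknown words in a row could be parsed as `nonterminal`."""
    return k in derivable_lengths(max_len)[nonterminal]
```

When the length k of the gap is known, a single membership test such as gap_can_be("PP", 6) suffices. When it is not known, every length would have to be considered, which is why a bounded enumeration of this kind is only a support for intuition and not a solution.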
    <Paragraph position="5"> If we consider the unknown sequence as a special input word, we are in a situation that is analogous to that created by cyclic grammars, i.e. grammars where a nonterminal may derive onto itself without producing any terminal. This explains why techniques limited to non-cyclic grammars cannot deal with this problem.</Paragraph>
    <Paragraph position="6"> Footnote 1: This grammar-oriented view of the computation of the automaton is only meant as a support for intuition.</Paragraph>
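The notion of a cyclic grammar used here, one where a nonterminal may derive onto itself without producing any terminal, can be made concrete with a small check. The sketch below is an illustration only, under stated assumptions: the grammar representation (a dictionary from nonterminals to lists of right-hand sides), the example grammar and the function names are hypothetical and are not taken from the parser construction of this paper.

```python
# Hypothetical grammar: each nonterminal maps to a list of right-hand sides.
# Symbols that are not keys of the dictionary are terminals.
GRAMMAR = {
    "S": [["A", "b"], ["S", "E"]],   # S -> S E is cyclic because E is nullable
    "A": [["a"], ["B"]],
    "B": [["b", "A"]],
    "E": [[]],                       # E derives the empty string
}


def nullable_symbols(grammar):
    """Nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def cyclic_nonterminals(grammar):
    """Nonterminals A such that A derives A in one or more steps."""
    nullable = nullable_symbols(grammar)
    # direct[A]: nonterminals B such that A => B once all siblings are erased.
    direct = {head: set() for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            for i, sym in enumerate(body):
                rest = body[:i] + body[i + 1:]
                if sym in grammar and all(s in nullable for s in rest):
                    direct[head].add(sym)
    # Transitive closure of the direct relation by repeated expansion.
    reach = {head: set(targets) for head, targets in direct.items()}
    changed = True
    while changed:
        changed = False
        for head in reach:
            new = set().union(*(reach[m] for m in reach[head]))
            if not new.issubset(reach[head]):
                reach[head] |= new
                changed = True
    return {head for head in grammar if head in reach[head]}
```

On the example grammar, only S is reported as cyclic, through the rule S -> S E. A grammar such as S -> a S | a is recursive but not cyclic, since every step of the recursion produces a terminal.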
    <Paragraph position="7"> It may be noted that the problem is different from that of parsing in a word lattice \[22,27\] since all possible paths in the lattice have a known bounded length, even when the lattice contains separated unknown words. However, the technique presented here combines well with word lattice parsing.</Paragraph>
    <Paragraph position="8"> The ability to parse unknown subsequences may be useful to parse badly transmitted sentences, and sentences that are interrupted (e.g. in a discussion) or otherwise left unfinished (e.g. because the rest may be inferred from the context). It may also be used in programming languages: for example the programming language SETL \[9\] allows some statements to be left unfinished in some contexts.</Paragraph>
    <Paragraph position="9"> The next section contains an introduction to all-paths parsing. In section 3 we give a more detailed account of our basic algorithm and point at the features that allow the handling of cyclic grammars. Section 4 contains the modifications that make this algorithm capable of parsing incomplete sentences.</Paragraph>
    <Paragraph position="10"> The full algorithm is given in appendix C, while two examples are given in appendices A and B.</Paragraph>
  </Section>
</Paper>