<?xml version="1.0" standalone="yes"?>
<Paper uid="J99-2004">
  <Title>Supertagging: An Approach to Almost Parsing</Title>
  <Section position="4" start_page="0" end_page="238" type="intro">
    <SectionTitle>
2.1 Finite-State-Grammar-based Parsers
</SectionTitle>
    <Paragraph position="0"> Finite-state-grammar-based approaches to parsing are exemplified by the parsing systems in Joshi (1960), Abney (1990), Appelt et al. (1993), Roche (1993), Grishman (1995), Hobbs et al. (1997), Joshi and Hopely (1997), and Karttunen et al. (1997). These systems use grammars that are represented as cascaded finite-state regular expression recognizers. The regular expressions are usually hand-crafted. Each recognizer in the cascade provides a locally optimal output. The output of these systems, often called a shallow parse, is mostly in the form of noun groups and verb groups rather than constituent structure. There are no clause-level attachments or modifier attachments in the shallow parse. These parsers always produce one output, since they use the longest-match heuristic to resolve cases of ambiguity when more than one regular expression matches the input string at a given position. [Footnote 1: The use of descriptions for primitives to capture constraints locally has a precursor in AI. The Waltz algorithm (Waltz 1975) for labeling vertices of polygonal solid objects can be thought of in these terms. Waltz made the description of vertices more complex by including information about the incident edges, associated surfaces and other information. This increases the local ambiguity but the local constraints on the complex descriptions are strong enough to efficiently disambiguate the descriptions. Of course, Waltz did not use statistical information for disambiguation. See also Joshi (1998).]</Paragraph>
    <Paragraph position="1"> At present none of these systems use any statistical information to resolve ambiguity. The grammar itself can be partitioned into domain-independent and domain-specific regular expressions, which implies that porting to a new domain would involve rewriting the domain-specific expressions.</Paragraph>
    <Paragraph position="2"> This approach has proved to be quite successful as a preprocessor in information extraction systems (Hobbs et al. 1995; Grishman 1995).</Paragraph>
    <Section position="1" start_page="238" end_page="238" type="sub_section">
      <SectionTitle>
2.2 Statistical Parsers
</SectionTitle>
      <Paragraph position="0"> Pioneered by the IBM natural language group (Fujisaki et al. 1989) and later pursued by, for example, Schabes, Roth, and Osborne (1993), Jelinek et al. (1994), Magerman (1995), Collins (1996), and Charniak (1997), this approach decouples the issue of well-formedness of an input string from the problem of assigning a structure to it. These systems attempt to assign some structure to every input string. The rules to assign a structure to an input are extracted automatically from hand-annotated parses of large corpora, which are then subjected to smoothing to obtain reasonable coverage of the language. The resultant set of rules is not linguistically transparent and is not easily modifiable. Lexical and structural ambiguity is resolved using probability information that is encoded in the rules. This allows the system to assign the most likely structure to each input. The output of these systems consists of constituent analysis, the degree of detail of which is dependent on the detail of annotation present in the treebank that is used to train the system.</Paragraph>
      <Paragraph position="1"> There are also parsers that use probabilistic (weighting) information in conjunction with hand-crafted grammars, for example, Black et al. (1993), Nagao (1994), Alshawi and Carter (1994), and Srinivas, Doran, and Kulick (1995). In these cases the probabilistic information is primarily used to rank the parses produced by the parser and not so much for the purpose of robustness of the system.</Paragraph>
    </Section>
  </Section>
</Paper>