<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-4001">
  <Title>Dependency Parsing with an Extended Finite-State Approach (© 2003 Association for Computational Linguistics)</Title>
  <Section position="2" start_page="0" end_page="518" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Finite-state machines have been used for many tasks in language processing, such as tokenization, morphological analysis, and parsing. Recent advances in the development of sophisticated tools for building finite-state systems (e.g., XRCE Finite State Tools [Karttunen et al. 1996], AT&amp;T Tools [Mohri, Pereira, and Riley 1998], and Finite State Automata Utilities [van Noord 1997]) have fostered the development of quite complex finite-state systems for natural language processing. In the last several years, there have been a number of studies on developing finite-state parsing systems (Koskenniemi 1990; Koskenniemi, Tapanainen, and Voutilainen 1992; Grefenstette 1996; Chanod and Tapanainen 1996; Ait-Mokhtar and Chanod 1997; Hobbs et al. 1997). Another stream of work in using finite-state methods in parsing is based on approximating context-free grammars with finite-state grammars, which are then processed by efficient methods for such grammars (Black 1989; Pereira and Wright 1997; Grimley-Evans 1997; Johnson 1998; Nederhof 1998, 2000). There have also been a number of approaches to natural language parsing using extended finite-state approaches in which a finite-state engine is applied multiple times to the input, or various derivatives thereof, until some termination condition is reached (Abney 1996; Roche 1997).</Paragraph>
    <Paragraph position="1"> This article presents a finite-state approach to dependency parsing. The approach is similar to those of Roche and Abney in that all three use an extended finite-state scheme to parse the input sentences. Our contributions can be summarized as follows: * Our approach differs from Roche's and Abney's in that it is based on dependency grammar and produces as output an encoding of the dependency structure of a sentence. The lexical items and the dependency relations are encoded in an intertwined manner and manipulated by grammar rules, as well as by structural and linguistic constraints implemented as finite-state filters, to arrive at parses. The output of the parser is a finite-state transducer that compactly packs all the ambiguities as a lattice.</Paragraph>
    <Paragraph position="2"> Author's address: Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli, 34956, Tuzla, Istanbul, Turkey. E-mail: oflazer@sabanciuniv.edu. Published in Computational Linguistics, Volume 29, Number 4.</Paragraph>
    <Paragraph position="3"> * As our approach is an all-parses approach with no statistical component, we have used Lin's (1995) proposal for ranking the parses based on the total link length and have obtained promising results. For over 48% of the sentences, the correct parse was among the dependency trees with the smallest total link length.</Paragraph>
    <Paragraph position="4"> * Our approach can employ violable constraints for robust parsing so that when the parser fails to link all dependents to a head, one can use lenient filtering to allow parses with a small number of unlinked dependents to be output.</Paragraph>
    <Paragraph position="5"> * The rules for linking dependents to heads can specify constraints on the intervening material between them, so that, for instance, certain links may be prevented from crossing barriers such as punctuation or lexical items with certain parts of speech or morphological properties (Collins 1996; Giguet and Vergne 1997; Tapanainen and Järvinen 1997).</Paragraph>
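The ranking criterion mentioned in the contributions above (preferring parses with the smallest total link length) can be illustrated with a minimal sketch. The representation of a parse as a list of (dependent position, head position) pairs, and the function names, are assumptions made for this illustration only; the article itself packs parses into a finite-state lattice.

```python
from typing import List, Tuple

# Assumed toy representation: one (dependent_index, head_index) pair per link.
Link = Tuple[int, int]
Parse = List[Link]


def total_link_length(parse: Parse) -> int:
    """Sum of the word-position distances between each dependent and its head."""
    return sum(abs(head - dep) for dep, head in parse)


def shortest_parses(parses: List[Parse]) -> List[Parse]:
    """Return every parse that ties for the smallest total link length."""
    if not parses:
        return []
    best = min(total_link_length(p) for p in parses)
    return [p for p in parses if total_link_length(p) == best]
```

Because several parses can tie for the minimum, the sketch returns a list, which matches the article's phrasing that the correct parse was "among" the trees with the smallest total link length.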
    <Paragraph position="6"> We summarize in Figure 1 the basic idea of our approach. This figure presents in a rather high-level fashion, for a Turkish and an English sentence, the input and output representation for the approach to be presented. For the purposes of this summary, we assume that none of the words in the sentences have any morphological ambiguity and that their morphological properties are essentially obvious from the glosses. We represent the input to the parser as a string of symbols encoding the words with some additional delimiter markers. Panel (a) of Figure 1 shows this input representation for a Turkish sentence, on the top right, and panel (b) shows it for an English sentence. The parser operates in iterations. In the first iteration, the parser takes the input string encoding the sentence and manipulates it to produce the intermediate string in which we have three dependency relations encoded by additional symbols (highlighted with boldface type) injected into the string. The partial dependency trees encoded are depicted to the left of the intermediate strings. It should be noted that the sets of dependency relations captured in the first iteration are different for Turkish and English. In the Turkish sentence, two determiner links and one object link are encoded in parallel, whereas in the English sentence, two determiner links and one subject link are encoded in parallel. The common property of these links is that they do not "interfere" with each other.</Paragraph>
    <Paragraph position="7"> The second iteration of the parser takes the output of the first iteration and manipulates it to produce a slightly longer string in which symbols encoding a new subject (object) link are injected into the Turkish (English) string. (We again highlight these symbols with boldface type.) Note that in the English string the relative positions of the link start and end symbols indicate that this is a right-to-left link. The dependency structures encoded by these strings are again on their left. After the second iteration, there are no further links that can be added, since in each case there is only one word left without any outgoing links and it happens to be the head of the sentence.</Paragraph>
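The iterative behaviour described above can be mimicked with a small toy sketch. It is not the article's string encoding and uses no finite-state transducers: the tag names, the RULES table, the five-word example sentence, and the simplification that a word stops being a candidate head once it has an outgoing link are all assumptions made for illustration. The sketch only shows the control flow: each iteration adds, in parallel, every link that does not interfere with the others, and the process stops when no link can be added.

```python
from typing import Dict, List, Tuple

# Hypothetical toy rules: (dependent tag, head tag, side on which the head lies, label).
RULES: List[Tuple[str, str, str, str]] = [
    ("Det", "Noun", "right", "det"),
    ("Noun", "Verb", "right", "subj"),
    ("Noun", "Verb", "left", "obj"),
]

Links = Dict[int, Tuple[int, str]]  # dependent index -> (head index, label)


def one_iteration(tags: List[str], links: Links) -> Links:
    """Collect the links that can be added in parallel, given earlier links."""
    new_links: Links = {}
    for dep, dep_tag in enumerate(tags):
        if dep in links:
            continue  # this word already has an outgoing link
        for rule_dep, rule_head, side, label in RULES:
            if rule_dep != dep_tag or dep in new_links:
                continue
            step = 1 if side == "right" else -1
            head = dep + step
            # Skip words that received an outgoing link in an earlier iteration.
            while head in range(len(tags)) and head in links:
                head += step
            if head in range(len(tags)) and tags[head] == rule_head:
                new_links[dep] = (head, label)
    return new_links


def parse(tags: List[str]) -> Tuple[Links, List[Links]]:
    """Iterate until no link can be added; return all links and the per-round links."""
    links: Links = {}
    rounds: List[Links] = []
    while True:
        new_links = one_iteration(tags, links)
        if not new_links:
            return links, rounds
        links.update(new_links)
        rounds.append(new_links)


# Hypothetical sentence "the dog chased a cat", given only as part-of-speech tags.
all_links, rounds = parse(["Det", "Noun", "Verb", "Det", "Noun"])
print(rounds)
```

On this hypothetical sentence the first round adds two determiner links and one subject link, and the second round adds the object link once the intervening determiner has been linked. The verb is then the only word without an outgoing link, so it is the head of the sentence.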
    <Paragraph position="8"> The article is structured as follows: After a brief overview of related work, we summarize dependency grammars and aspects of Turkish relevant to this work. We provide a summary of concepts from finite-state transducers so that subsequent sections can be self-contained. We continue by describing the representation that we have employed for encoding dependency structures, along with the encoding of dependency linking rules operating on these representations and configurational constraints</Paragraph>
    <Section position="1" start_page="517" end_page="518" type="sub_section">
      <SectionTitle>
Oflazer Dependency Parsing
</SectionTitle>
      <Paragraph position="0"> Figure 1. Dependency parsing by means of iterative manipulations of strings encoding dependency structures.</Paragraph>
      <Paragraph position="1"> for filtering them. We then describe the parser and its operational aspects, with details on how linguistically motivated constraints for further filtering are implemented. We briefly provide a scheme for a robust-parsing extension of our approach using the lenient composition operation. We then provide results from a prototype implementation of the parser and its application to dependency parsing of Turkish. We close with remarks and conclusions.</Paragraph>
    </Section>
  </Section>
</Paper>