<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0408">
  <Title>Multiword expressions as dependency subgraphs</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 XDG
</SectionTitle>
    <Paragraph position="0"> In this section, we explain the intuitions behind XDG before proceeding to its formalization and a description of the XDG solver for parsing and generation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 XDG intuitions
</SectionTitle>
      <Paragraph position="0"> Extensible Dependency Grammar (XDG) is a new grammar formalism generalizing Topological Dependency Grammar (TDG) (Duchier and Debusmann, 2001). XDG characterizes linguistic structure along arbitrarily many dimensions of description. All dimensions correspond to labeled graphs that share the same set of nodes but have different edges.</Paragraph>
      <Paragraph position="1"> The well-formedness conditions for XDG analyses are determined by principles. Principles can either be one-dimensional, applying to a single dimension only, or multi-dimensional, constraining the relation of several dimensions. Basic one-dimensional principles are treeness and valency. Multi-dimensional principles include climbing (as in (Duchier and Debusmann, 2001); one dimension must be a flattening of another) and linking (e.g. to specify how semantic arguments must be realized syntactically).</Paragraph>
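The shared-nodes, per-dimension-edges picture, and a one-dimensional principle such as treeness, can be sketched in Python as follows (node ids and the edge-triple encoding are illustrative assumptions, not the paper's concrete syntax):

```python
# Sketch of an XDG-style analysis: all dimensions share one set of nodes,
# but each dimension carries its own labeled edge set.

nodes = {1, 2, 3}  # 1 = He, 2 = dates, 3 = her

analysis = {
    "ID": {(2, 1, "subj"), (2, 3, "obj")},   # immediate dominance
    "PA": {(2, 1, "arg1"), (2, 3, "arg2")},  # predicate-argument structure
}

def is_tree(nodes, edges):
    """One-dimensional treeness principle: a unique root, every other node
    has exactly one head, and following head links never cycles."""
    heads = {d: h for (h, d, _) in edges}
    if len(heads) != len(edges):
        return False                          # some node has two heads
    roots = nodes - set(heads)
    if len(roots) != 1:
        return False                          # no unique root
    for start in heads:
        seen, n = set(), start
        while n in heads:
            if n in seen:
                return False                  # cycle detected
            seen.add(n)
            n = heads[n]
    return True

print(is_tree(nodes, analysis["ID"]))  # True
```

The PA dimension would be checked by a weaker acyclicity principle instead, since it need not be a tree.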
      <Paragraph position="2"> The lexicon plays a central role in XDG. For each node, it provides a set of possible lexical entries (feature structures) serving as the parameters for the principles. Because a lexical entry constrains all dimensions simultaneously, it can also help to synchronize the various dimensions, e.g.</Paragraph>
      <Paragraph position="3"> with respect to valency. For instance, a lexical entry could synchronize syntactic and semantic dimensions by requiring a subject in the syntax, and an agent in the semantics.</Paragraph>
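One way to picture how a single lexical entry synchronizes several dimensions is to encode it as one record with a component per dimension; accepting the entry for a node then commits the analysis on all dimensions at once. The feature names and the counting check below are hypothetical simplifications:

```python
# One entry for "dates" (node 2) constrains ID and PA simultaneously:
# a subject and object in the syntax, two arguments in the semantics.
dates_entry = {
    "ID": {"out": {"subj": 1, "obj": 1}},
    "PA": {"out": {"arg1": 1, "arg2": 1}},
    "link": {"arg1": "subj", "arg2": "obj"},
}

def satisfies_out_valency(entry_dim, edges, node):
    """True iff the node's outgoing edge labels match the required counts."""
    got = {}
    for (h, _, l) in edges:
        if h == node:
            got[l] = got.get(l, 0) + 1
    return got == entry_dim["out"]

id_edges = {(2, 1, "subj"), (2, 3, "obj")}
pa_edges = {(2, 1, "arg1"), (2, 3, "arg2")}
print(satisfies_out_valency(dates_entry["ID"], id_edges, 2))  # True
print(satisfies_out_valency(dates_entry["PA"], pa_edges, 2))  # True
```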
      <Paragraph position="4"> As an example, we show in (1) an analysis for the sentence He dates her, along two dimensions of description, immediate dominance (ID) and predicate-argument structure (PA). We display the ID part of the analysis on the left, and the PA part on the right:</Paragraph>
      <Paragraph position="5"> He dates her [analysis (1): the ID and PA dependency graphs over these words are not reproduced in this text version]</Paragraph>
      <Paragraph position="7"> The set of edge labels on the ID dimension includes subj for subject and obj for object. On the PA dimension, we have arg1 and arg2, standing for the argument slots of semantic predicates. The ID part of the analysis states that He is the subject, and her the object, of dates. The PA part states that He is the first argument and her the second argument of dates.</Paragraph>
      <Paragraph position="8"> [Footnote, fragment: ... edge labels, but since the assumption of thematic roles is very controversial, we decided to choose more neutral labels.] The principles of the underlying grammar require that the ID part of each analysis is a tree, and the PA part a directed acyclic graph (dag). In addition, we employ the valency principle on both dimensions, specifying the licensed incoming and outgoing edges of each node. The only multi-dimensional principle employed is the linking principle, specifying how semantic arguments are realized syntactically.</Paragraph>
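Under assumed encodings (node ids and edge triples are illustrative, not the paper's notation), the linking principle can be checked for analysis (1) roughly like this: every PA edge must be matched by an ID edge between the same nodes carrying the label that the head's link feature prescribes.

```python
id_edges = {(2, 1, "subj"), (2, 3, "obj")}   # dates -> He, dates -> her
pa_edges = {(2, 1, "arg1"), (2, 3, "arg2")}
link = {2: {"arg1": "subj", "arg2": "obj"}}  # link feature of "dates" (node 2)

def linking_ok(id_edges, pa_edges, link):
    """Each PA edge (h, d, l) with a link entry must be realized by the
    ID edge (h, d, link[h][l])."""
    for (h, d, l) in pa_edges:
        wanted = link.get(h, {}).get(l)
        if wanted is not None and (h, d, wanted) not in id_edges:
            return False
    return True

print(linking_ok(id_edges, pa_edges, link))  # True
```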
      <Paragraph position="9"> Figure 1 shows the lexicon of the underlying grammar. Each lexical entry corresponds to both a word and a semantic literal. inID and outID parametrize the valency principle on the ID dimension: inID specifies the licensed incoming edges, and outID the licensed outgoing edges.</Paragraph>
      <Paragraph position="10"> E.g. He licenses zero or one incoming edge labeled subj, and no outgoing edges. inPA and outPA parametrize the valency principle on the PA dimension. E.g. dates licenses no incoming edges, and requires precisely one outgoing edge labeled arg1 and one labeled arg2. link parametrizes the multi-dimensional linking principle. E.g. dates syntactically realizes its first argument by a subject and its second argument by an object.</Paragraph>
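The Figure 1 lexicon can be transcribed as data, with "?" for an optional edge (at most one) and "!" for an obligatory one (exactly one); this encoding and the checker are a sketch, not the paper's implementation:

```python
lexicon = {
    "He":    {"inID": {"subj": "?"}, "outID": {},
              "inPA": {"arg1": "?", "arg2": "?"}, "outPA": {}},
    "dates": {"inID": {}, "outID": {"subj": "!", "obj": "!"},
              "inPA": {}, "outPA": {"arg1": "!", "arg2": "!"}},
    "her":   {"inID": {"obj": "?"}, "outID": {},
              "inPA": {"arg1": "?", "arg2": "?"}, "outPA": {}},
}

def valency_ok(spec, counts):
    """Check observed label counts against a {label: '?' or '!'} spec."""
    for label, card in spec.items():
        n = counts.get(label, 0)
        if card == "!" and n != 1:
            return False                 # obligatory edge missing or duplicated
        if card == "?" and n > 1:
            return False                 # optional edge used more than once
    # labels not mentioned in the spec are not licensed at all
    return all(l in spec for l in counts)

print(valency_ok(lexicon["dates"]["outID"], {"subj": 1, "obj": 1}))  # True
print(valency_ok(lexicon["dates"]["outID"], {"subj": 1}))            # False
```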
      <Paragraph position="11"> Observe that all the principles are satisfied in (1), and hence the analysis is well-formed. Also notice that we can use the same grammar and lexicon for both parsing (from words) and generation (from semantic literals).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 XDG formalization
</SectionTitle>
      <Paragraph position="0"> Formally, an XDG grammar is built up of dimensions, a lexicon and principles, and characterizes a set of well-formed analyses.</Paragraph>
      <Paragraph position="1"> A dimension is a tuple D = (Lab, Fea, Val, Pri) of a set Lab of edge labels, a set Fea of features, a set Val of feature values, and a set Pri of one-dimensional principles. A lexicon for the dimension D is a set Lex ⊆ (Fea → Val) of total feature assignments called lexical entries. An analysis on dimension D is a triple (V, E, F) of a set V of nodes, a set E ⊆ V × V × Lab of directed labeled edges, and an assignment F : V → (Fea → Val) of lexical entries to nodes. V and E form a graph. [Footnote, fragment: ... want it to reflect the re-entrancy e.g. in control constructions, where the same subject is shared by more than one node.] We write Ana_D for the set of all possible analyses on dimension D. The principles characterize subsets of Ana_D. We assume that the elements of Pri are finite representations of such subsets. An XDG grammar ((Lab_i, Fea_i, Val_i, Pri_i))_{i=1..n}, Pri, Lex) consists of n dimensions, multi-dimensional principles Pri, and a lexicon Lex. An XDG analysis (V, E_i, F_i)_{i=1..n} is an element of Ana = Ana_1 × ... × Ana_n, where all dimensions share the same set of nodes V. We call a dimension of a grammar a grammar dimension.</Paragraph>
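The tuples D = (Lab, Fea, Val, Pri) and (V, E, F) can be mirrored directly as Python structures; the concrete values below are toy assumptions, since the paper gives no implementation syntax here:

```python
from collections import namedtuple

Dimension = namedtuple("Dimension", "Lab Fea Val Pri")  # D = (Lab, Fea, Val, Pri)
Analysis = namedtuple("Analysis", "V E F")              # (V, E, F)

ID = Dimension(
    Lab={"subj", "obj"},
    Fea={"in", "out"},
    Val={"subj?", "subj! obj!", "obj?", ""},  # toy value domain
    Pri=[],  # finite representations of subsets of Ana_ID
)

a = Analysis(
    V={1, 2, 3},
    E={(2, 1, "subj"), (2, 3, "obj")},  # E is a subset of V x V x Lab
    F={1: {"in": "subj?"}, 2: {"out": "subj! obj!"}, 3: {"in": "obj?"}},
)

# Basic well-typedness: edges draw their labels from Lab, and F assigns a
# feature-to-value map (a lexical entry) to every node in V.
ok = all(l in ID.Lab for (_, _, l) in a.E) and set(a.F) == a.V
print(ok)  # True
```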
      <Paragraph position="2"> Multi-dimensional principles specify subsets of Ana, i.e. of tuples of analyses for the individual dimensions. The lexicon Lex ⊆ Lex_1 × ... × Lex_n constrains all dimensions at once, thereby synchronizing them. An XDG analysis is licensed by Lex iff (F_1(v), ..., F_n(v)) ∈ Lex for every node v ∈ V.</Paragraph>
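The licensing condition is a pointwise membership test: at every node, the tuple of that node's entries across all dimensions must be in the lexicon. A sketch with hypothetical toy entries:

```python
# F[i] assigns each node its lexical entry on dimension i+1.
F = [
    {1: "subj?", 2: "subj!obj!", 3: "obj?"},   # F_1, the ID dimension
    {1: "arg?", 2: "arg1!arg2!", 3: "arg?"},   # F_2, the PA dimension
]
# The lexicon: admissible entry tuples, one component per dimension.
Lex = {("subj?", "arg?"), ("subj!obj!", "arg1!arg2!"), ("obj?", "arg?")}

def licensed(F, Lex, V):
    """(F_1(v), ..., F_n(v)) must be in Lex for every node v in V."""
    return all(tuple(Fi[v] for Fi in F) in Lex for v in V)

print(licensed(F, Lex, {1, 2, 3}))  # True
```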
      <Paragraph position="3"> In order to compute analyses for a given input, we employ a set of input constraints (Inp), which again specify a subset of Ana. XDG solving then amounts to finding elements of Ana that are licensed by Lex, and consistent with Inp and Pri.</Paragraph>
      <Paragraph position="4"> The input constraints determine, among other things, whether XDG solving is used for parsing or generation. For parsing, they specify a sequence of words; for generation, a multiset of semantic literals.</Paragraph>
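Both tasks can be seen as filters over the same space of candidate analyses; the encoding of a candidate as (word, literal) pairs below is an illustrative assumption:

```python
from collections import Counter

def parse_input(words):
    """Parsing: the word sequence of the analysis is fixed."""
    return lambda ana: [w for (w, _) in ana] == words

def gen_input(literals):
    """Generation: the multiset of semantic literals is fixed, order-free."""
    return lambda ana: Counter(l for (_, l) in ana) == Counter(literals)

# A candidate analysis, one (word, literal) pair per node:
ana = [("He", "he'"), ("dates", "date'"), ("her", "she'")]
print(parse_input(["He", "dates", "her"])(ana))   # True
print(gen_input(["date'", "he'", "she'"])(ana))   # True: multiset comparison
```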
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 XDG solver
</SectionTitle>
      <Paragraph position="0"> XDG solving has a natural reading as a constraint satisfaction problem (CSP) on finite sets of integers, where well-formed analyses correspond to the solutions of the CSP (Duchier, 2003). We have implemented an XDG solver (Debusmann, 2003) using the Mozart-Oz programming system (Mozart Consortium, 2004).</Paragraph>
      <Paragraph position="1"> XDG solving operates on all dimensions concurrently. This means that the solver can infer information about one dimension from information on another, either through a multi-dimensional principle linking the two dimensions or through the synchronization induced by the lexical entries. For instance, syntactic information can trigger inferences on the semantic dimension, and vice versa.</Paragraph>
      <Paragraph position="2"> Because XDG allows us to write grammars with completely free word order, XDG solving is an NP-complete problem (Koller and Striegnitz, 2002). This means that the worst-case complexity of the solver is exponential. The average-case complexity of many smaller-scale grammars that we have experimented with seems polynomial, but it remains to be seen whether we can scale this up to large-scale grammars. [Figure 1, the lexicon of the underlying grammar: He / he': inID {subj?}, outID {}, inPA {arg1?, arg2?}, outPA {}, link {}; dates / date': inID {}, outID {subj!, obj!}, inPA {}, outPA {arg1!, arg2!}, link {arg1 ↦ {subj}, arg2 ↦ {obj}}; her / she': inID {obj?}, outID {}, inPA {arg1?, arg2?}, outPA {}, link {}]</Paragraph>
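The CSP reading can be illustrated with a toy generate-and-test search over the two ID attachments in the example sentence; a real constraint solver would propagate information instead of enumerating, but the search space it prunes is of this shape (all encodings here are illustrative):

```python
from itertools import product

labels = ["subj", "obj"]

def candidates():
    # every way to label the two edges from "dates" (node 2) to He (1) and her (3)
    for l1, l3 in product(labels, repeat=2):
        yield {(2, 1, l1), (2, 3, l3)}

def well_formed(edges):
    out = sorted(l for (h, _, l) in edges if h == 2)   # valency of "dates"
    he_in = {l for (_, d, l) in edges if d == 1}       # He takes only subj in
    her_in = {l for (_, d, l) in edges if d == 3}      # her takes only obj in
    return (out == ["obj", "subj"]
            and he_in.issubset({"subj"})
            and her_in.issubset({"obj"}))

solutions = [e for e in candidates() if well_formed(e)]
print(len(solutions))  # 1: only the analysis with He = subj, her = obj survives
```

With free word order the number of such candidate assignments grows exponentially in the sentence length, which is where the NP-completeness bites.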
    </Section>
  </Section>
</Paper>