<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1063">
  <Title>PARSING A FLEXIBLE WORD ORDER LANGUAGE</Title>
  <Section position="4" start_page="0" end_page="397" type="metho">
    <SectionTitle>
2. PROBLEMS FOR THE ID/LP FORMAT
</SectionTitle>
    <Paragraph position="0"> In the Immediate Dominance/Linear Precedence (ID/LP) format of GPSG (Gazdar &amp; Pullum 1981, Gazdar et al. 1985), where information concerning constituency (= immediate dominance) and information concerning linear order are kept separate, WO rules are expressed concisely, declaratively and modularly over the domain of local trees (i.e. trees of depth 1). For example, the ID rule A -&gt;ID B, C, D, if no linearization restrictions are declared, stands for the mother node A expanded into its daughters B, C and D appearing in any order; if the restriction { D &lt; C } is declared, it stands for the CFG rules { A --&gt; B D C, A --&gt; D B C, A --&gt; D C B }.</Paragraph>
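The expansion of an ID rule under LP restrictions can be sketched in a few lines of Python. This is a minimal illustration, not GPSG's own machinery: `expand` is a hypothetical helper, each LP statement is encoded as an (x, y) pair read as "x precedes y", and daughters are assumed to be distinct categories.

```python
from itertools import permutations


def expand(mother, daughters, lp):
    """Expand an ID rule into the CFG rules it abbreviates.

    `lp` is a set of (x, y) pairs, each meaning "x precedes y whenever
    both are siblings". Daughters are assumed to be distinct categories.
    """
    rules = []
    for order in permutations(daughters):
        pos = {cat: i for i, cat in enumerate(order)}
        # Keep the ordering only if it respects every applicable LP pair.
        if all(pos[y] > pos[x] for x, y in lp if x in pos and y in pos):
            rules.append((mother, order))
    return rules


# ID rule A ->ID B, C, D with the single LP restriction D before C.
for mother, rhs in expand("A", ["B", "C", "D"], {("D", "C")}):
    print(mother, "-->", " ".join(rhs))
```

Run as is, the loop prints exactly the three CFG rules listed in the paragraph; passing an empty LP set instead yields all six orderings of the daughters.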
    <Paragraph position="1"> It is important to note that in GPSG the linear precedence rules stated for a pair of sibling constituents should be valid for the whole set of grammar rules in which these constituents occur, and not just for some specific rule (this "global" empirical constraint on WO is called the Exhaustive Constant Partial Ordering (ECPO) property).</Paragraph>
    <Paragraph position="2"> However, there are problems with ECPO.</Paragraph>
    <Paragraph position="3"> They may be illustrated with a simple example from Bulgarian. Consider a grammar describing sentences with a reflexive verb and a reflexive particle (the NP subject and the adverb being optional), which generates expressions whose English equivalent is, e.g., "(Ivan) shaved himself (yesterday)".</Paragraph>
    <Paragraph position="4">  (1) S -&gt;ID NP, VP
(2) S -&gt;ID VP % omitted subject
(3) VP -&gt;ID V[refl], Part[refl], Adv
(4) VP -&gt;ID V[refl], Part[refl] % omitted adverb
First, assume we derive a sentence by applying rules (2) and (3). (5a-b) are the only acceptable linearizations of the sister constituents in (3). The CFG (6) corresponding to (5a-b) is non-ECPO. Thus, fixing any ordering between any two constituents in (3) will, of necessity, block at least one of the correct orderings (5a-b); alternatively, sanctioning no WO restriction will result in overgeneration, admitting, besides the grammatical (5a-b), 4 ungrammatical permutations. We will call this inability to impose an arbitrary ordering on siblings the ordering-problem of ID/LP grammars.</Paragraph>
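The ordering-problem can be checked mechanically. The Python sketch below assumes, following the LP rules given in Section 4 for a sentence without a subject, that (5a-b) are the orders V Part Adv and Adv Part V; this identification of (5a-b) is an assumption. It enumerates every set of pairwise LP statements over the three daughters of rule (3) and confirms that none generates exactly these two orders, while the empty set overgenerates. The helper names are hypothetical.

```python
from itertools import permutations, product

DAUGHTERS = ("V", "Part", "Adv")          # siblings of rule (3)
# Assumption: (5a-b) are the orders V Part Adv and Adv Part V.
TARGET = {("V", "Part", "Adv"), ("Adv", "Part", "V")}


def generated(lp):
    """Orders of DAUGHTERS allowed by LP pairs (x, y), read as x before y."""
    result = set()
    for order in permutations(DAUGHTERS):
        pos = {c: i for i, c in enumerate(order)}
        if all(pos[y] > pos[x] for x, y in lp):
            result.add(order)
    return result


def all_lp_sets():
    """Yield every LP set: each pair is unordered, x before y, or y before x."""
    pairs = [("V", "Part"), ("V", "Adv"), ("Part", "Adv")]
    for choice in product((None, "xy", "yx"), repeat=len(pairs)):
        lp = set()
        for (x, y), c in zip(pairs, choice):
            if c == "xy":
                lp.add((x, y))
            elif c == "yx":
                lp.add((y, x))
        yield lp


exact = [lp for lp in all_lp_sets() if generated(lp) == TARGET]
print("LP sets generating exactly (5a-b):", len(exact))                    # 0
print("ungrammatical orders with no LP:", len(generated(set()) - TARGET))  # 4
```

Because every pair of daughters occurs in opposite orders in the two target strings, any single LP statement excludes one of them, which is exactly why no ECPO grammar fits this rule.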
    <Paragraph position="5"> Now assume we derive a sentence by applying rules (1) and (4). The ordering of the siblings in (4), the reflexive verb and the particle, now depends on the order of the nodes NP and VP higher up in the tree, in rule (1): if NP precedes VP in (1), then the reflexive particle must precede the verb in (4); otherwise it must follow it.</Paragraph>
    <Paragraph position="6">  (meaning: Ivan shaved himself) Again we are in trouble, since LP rules cannot impose orderings among non-siblings: their domain of application is restricted to siblings. We call this the domain-problem of ID/LP grammars. It is essential to note that the domain-problem cannot be remedied (even if we are willing to sacrifice linguistic intuitions) by "flattening" the tree, e.g. by collapsing rules (1) and (4) into
(8) S -&gt;ID NP, V[refl], Part[refl]
Escaping the second problem thrusts us into the first: we now cannot properly order the siblings, since the CFG corresponding to (7a-b) is of the non-ECPO form (6).</Paragraph>
    <Paragraph position="7"> Sporadic counter-evidence against ECPO grammars has been found for some languages, such as English (the verb-particle construction; Sag 1987, Pollard and Sag 1987), German (complex fronting; Uszkoreit 1985, Engelkamp et al. 1992) and Finnish (the adverb myös 'also, too'; Zwicky and Nevis 1986). Bulgarian offers massive counter-evidence (Pericliev 1992b); we discuss one major example, the Bulgarian clitic system, in Section 4.</Paragraph>
  </Section>
  <Section position="5" start_page="397" end_page="397" type="metho">
    <SectionTitle>
3. THE FORMALISM
</SectionTitle>
    <Paragraph position="0"> EFOG (Extended Flexible word Order Grammar) extends the expressive power of the ID/LP format. First, EFOG introduces further WO restrictions in addition to precedence (enabling it to avoid the ordering-problem), and, second, the formalism extends the domain of application of these WO restrictions (in order to handle the domain-problem).</Paragraph>
    <Paragraph position="1">  In the immediate dominance part of rules, EFOG has two types of constituents: non-contiguous (notated #Node) and contiguous (notated simply Node), where Node is some node. Informally, a contiguous node indicates that its daughters form a contiguous sequence, whereas a non-contiguous one allows its daughters to be interspersed among the sisters of this non-contiguous node.</Paragraph>
    <Paragraph position="2"> For example, in EFOG notation (using a double arrow for ID rules, lower-case letters for constants and upper-case ones for variables), we capture the grammar of the Latin sentence Puella bona puerum parvum amat ("good girl loves small boy"), which is grammatical in all 120 permutations of its words and, moreover, exhibits discontinuity in the noun phrases, with the following structured EFOG rules with no WO restrictions:</Paragraph>
    <Paragraph position="4"> accompanied by the dictionary rules:</Paragraph>
    <Paragraph position="6"> Non-contiguous nodes allow us to impose an ordering on (or, as in the above case, to intersperse) all their daughter nodes without having to sacrifice the natural constituency. Clearly, this extension of the domain of LP rules (which can reach any depth we like) allows, besides ordering between non-siblings, an elegant treatment of discontinuities.</Paragraph>
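A toy linearizer makes the effect of non-contiguous nodes concrete. In this Python sketch (all names are hypothetical; this is not EFOG's implementation), a node marked non-contiguous hands its daughters up to the parent's ordering domain, while a contiguous node is permuted as a single block. Since the EFOG rules for the Latin example are not reproduced here, the tree used (an s node dominating two noun phrases and the verb amat) is an assumed structure.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    contiguous: bool = True  # False plays the role of an EFOG "#Node"


def domain(node):
    """Units ordered together under `node`; a non-contiguous daughter
    contributes its own daughters (recursively) instead of itself."""
    units = []
    for child in node.children:
        if child.children and not child.contiguous:
            units.extend(domain(child))
        else:
            units.append(child)
    return units


def linearize(node):
    """All word strings of `node` when no WO restrictions are declared."""
    if not node.children:
        return {(node.label,)}
    strings = set()
    for order in permutations(domain(node)):
        partial = {()}
        for unit in order:
            partial = {p + s for p in partial for s in linearize(unit)}
        strings |= partial
    return strings


def latin(discontinuous):
    """Assumed tree for 'Puella bona puerum parvum amat'."""
    def np(noun, adj):
        return Node("np", [Node(noun), Node(adj)], contiguous=not discontinuous)
    return Node("s", [np("puella", "bona"), np("puerum", "parvum"), Node("amat")])


print(len(linearize(latin(True))))   # 120: every word order
print(len(linearize(latin(False))))  # 24: noun phrases stay unbroken
```

With non-contiguous noun phrases all 120 permutations are produced; making them contiguous cuts this to 3! × 2 × 2 = 24 strings, none of which separates a noun from its adjective.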
    <Paragraph position="7"> In order to solve the ordering-problem, we have introduced additional WO constraints. The following atomic WO constraints have been defined:  Node), where Node is a node; e.g. first(a, s) designates that a is sentence-initial.</Paragraph>
    <Paragraph position="8"> We also allow atomic WO constraints to be combined into complex logical expressions, using the following operators with obvious semantics:  Our WO restriction language is, of course, partly logically redundant (e.g. immediate precedence may be expressed through precedence and adjacency, and the same holds for the last two of the operators, etc.).</Paragraph>
    <Paragraph position="9"> However, what is logically equivalent is not necessarily psychologically equivalent, and our goal has been to maintain a linguist-friendly notation (cf. requirement (ii) of Section 1). To take just one example, we have 'after' in addition to 'before', since linguists normally speak of the precedence of a dependent with respect to its head word, not vice versa, and hence will use both expressions in the respective situations (surely it is not by chance that NLs also have both words).</Paragraph>
    <Paragraph position="10"> As a simple example of the ordering possibilities of EFOG, consider WO Universal 20 (of Greenberg and Hawkins), to the effect that NPs comprising dem(onstrative), num(eral), adj(ective) and noun can appear in that order or in its mirror image. We can write a "universal" rule that enforces adjacency between each pair of consecutive constituents, and thus admits only these two orders, as follows:
np ==&gt; dem, num, adj, noun.</Paragraph>
    <Paragraph position="11"> lp: dem &lt;&gt; num and num &lt;&gt; adj and adj &lt;&gt; noun.</Paragraph>
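The WO constraints can be modelled as predicates over word positions. The Python sketch below is an illustration under stated assumptions: `prec`, `iprec`, `adjacent` and `conj` are hypothetical stand-ins for EFOG's precedence, immediate precedence, adjacency and 'and', evaluated over a single flat ordering domain. It applies the Universal 20 rule to the four NP constituents and also checks the redundancy noted above, namely that immediate precedence coincides with precedence plus adjacency.

```python
from itertools import permutations


# Each WO constraint is a predicate over `pos`, a dict from category to index.
def prec(x, y):
    """x precedes y, not necessarily immediately."""
    return lambda pos: pos[y] > pos[x]


def iprec(x, y):
    """x immediately precedes y."""
    return lambda pos: pos[y] - pos[x] == 1


def adjacent(x, y):
    """x and y are next to each other, in either order."""
    return lambda pos: abs(pos[x] - pos[y]) == 1


def conj(*constraints):
    """Logical 'and' of several constraints."""
    return lambda pos: all(c(pos) for c in constraints)


def orders(cats, constraint):
    """All orders of `cats` that satisfy `constraint`."""
    result = []
    for order in permutations(cats):
        pos = {c: i for i, c in enumerate(order)}
        if constraint(pos):
            result.append(order)
    return result


# Universal 20: dem, num, adj and noun, with each consecutive pair adjacent.
universal20 = conj(adjacent("dem", "num"), adjacent("num", "adj"),
                   adjacent("adj", "noun"))
for order in orders(["dem", "num", "adj", "noun"], universal20):
    print(" ".join(order))

# Redundancy check: immediate precedence == precedence and adjacency.
positions = [{c: i for i, c in enumerate(o)} for o in permutations("xyz")]
redundant = all(iprec("x", "y")(p) == conj(prec("x", "y"), adjacent("x", "y"))(p)
                for p in positions)
print(redundant)  # True
```

Only dem num adj noun and its mirror image noun adj num dem survive out of the 24 permutations: four items in a line have exactly three adjacent pairs, so these must be the three pairs the rule names.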
  </Section>
  <Section position="6" start_page="397" end_page="397" type="metho">
    <SectionTitle>
4. BULGARIAN CLITICS
</SectionTitle>
    <Paragraph position="0"> Bulgarian clitics fall into different categories: (1) nominals (short accusative pronouns: me "me", te "you", etc.; short dative pronouns: mi "to me", ti "to you", etc.); (2) verbs (the present tense forms of "to be": sam "am", si "(you) are", etc.); (3) adjectives (short possessive pronouns: mi "my", ti "your", etc.; the short reflexive pronoun si "one's own"); and (4) particles (interrogative li "do", reflexive se "myself/yourself...", the negative ne "no(t)", etc.). They have the distribution of the specific categories they belong to, but show diverse and quite complex orderings, which vary according to the positions of their siblings and non-siblings as well as the positions of other clitics appearing in the sentence.1 In effect, their ordering as a rule cannot be correctly stated in the standard ID/LP format.
1 This often results in discontinuities (or non-projectivities). For an automated way of discovering and a description of such constructs</Paragraph>
    <Paragraph position="1"> By way of illustration, we present below the EFOG version (simplified for expository reasons) of the grammar (1-4) of Section 2, to give the flavour of how we handle the problems mentioned there. The ID rules are as follows (note that the non-contiguous node #vp allows its daughters v(refl), part(refl) and adv to be ordered with respect to np):  part(refl), part(refl).</Paragraph>
    <Paragraph position="2"> np ==&gt; [ivan].</Paragraph>
    <Paragraph position="3"> v(refl) ==&gt; [brasna].</Paragraph>
    <Paragraph position="4"> part(refl) ==&gt; [se].</Paragraph>
    <Paragraph position="5"> adv ==&gt; [vcera].</Paragraph>
    <Paragraph position="6"> The WO of v(refl) and part(refl) is as follows. First, the reflexive particle never occurs sentence-initially (information we cannot express in ID/LP); in EFOG we express this as:
lp: not(first(part(refl), s)).</Paragraph>
    <Paragraph position="7"> Second, we use the default rule 'ifthenelse' to declare the regularity that the particle in question immediately precedes the verb, except when the verb occurs sentence-initially, in which case the particle immediately follows it (which, of course, is also inexpressible in ID/LP):
lp: ifthenelse(first(v(refl), s), v(refl) &lt;&lt; part(refl), part(refl) &lt;&lt; v(refl)).</Paragraph>
    <Paragraph position="8"> These two straightforward LP rules are thus all we need to obtain exactly the linearizations we want: those of (5a-b) and (7a-b), as well as all and only the other correct expressions derivable from the ID grammar. These LP rules are also interesting in that they express the overall behaviour of a number of other clitics that behave proclitically (e.g. those of nominal and verbal nature; see above).</Paragraph>
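The two LP rules can be tested by brute force. The Python sketch below rests on a hypothetical reconstruction of the ID rules (s ==&gt; np, #vp; s ==&gt; #vp; vp ==&gt; v(refl), part(refl), adv; vp ==&gt; v(refl), part(refl)) and, because #vp is non-contiguous, simply places the daughters of vp in the sentence's ordering domain. It then filters every permutation through hand-coded versions of not(first(part(refl), s)) and the ifthenelse rule; the function and variable names are illustrative only.

```python
from itertools import permutations

# Hypothetical reconstruction of the ID rules:
#   s ==> np, #vp.     s ==> #vp.
#   vp ==> v(refl), part(refl), adv.     vp ==> v(refl), part(refl).
# Since #vp is non-contiguous, each expansion puts the daughters of vp
# directly into the sentence's single ordering domain.
DOMAINS = [
    ["np", "v", "part", "adv"],
    ["v", "part", "adv"],
    ["np", "v", "part"],
    ["v", "part"],
]
LEXICON = {"np": "ivan", "v": "brasna", "part": "se", "adv": "vcera"}


def lp_ok(order):
    """Hand-coded versions of the two LP rules over the sentence domain."""
    pos = {c: i for i, c in enumerate(order)}
    # lp: not(first(part(refl), s)).
    if pos["part"] == 0:
        return False
    # lp: ifthenelse(first(v(refl), s), v immediately before part,
    #                part immediately before v).
    if pos["v"] == 0:
        return pos["part"] - pos["v"] == 1
    return pos["v"] - pos["part"] == 1


sentences = [" ".join(LEXICON[c] for c in order)
             for cats in DOMAINS
             for order in permutations(cats)
             if lp_ok(order)]
for s in sentences:
    print(s)
print(len(sentences), "linearizations")  # 11 linearizations
```

Under these assumptions the sketch yields 11 strings, among them brasna se vcera, vcera se brasna, ivan se brasna and brasna se ivan (presumably the examples (5a-b) and (7a-b)), while orders such as se brasna or brasna vcera se are rejected.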
    <Paragraph position="9"> Because of space limitations we cannot go into further details here. Suffice it to say that EFOG has been tested successfully in the description of this very complicated domain2 as well as of some other hard ordering problems in Bulgarian.</Paragraph>
  </Section>
  <Section position="7" start_page="397" end_page="397" type="metho">
    <SectionTitle>
6. CONCLUSION
</SectionTitle>
    <Paragraph position="0"> Logic grammars have generally failed to handle flexible WO in a satisfactory way. We have described a formalism that allows the grammar writer to express complex WO rules of a language (including discontinuity) in a concise, modular and natural way. EFOG extends the expressive power of the ID/LP format both by allowing complex LP rules and by extending their domain of application.</Paragraph>
    <Paragraph position="1"> EFOG is based on a previous version of the formalism, called FOG (Pericliev and Grigorov 1992), which also sought to overcome the difficulties with the ID/LP format. FOG, however, looked for different solutions to these problems (e.g. using LP rules attached to each specific ID rule rather than global ones, which unnecessarily proliferated the LP part of the grammar, or employing flattening rather than non-contiguous grammar symbols to the same effect).</Paragraph>
    <Paragraph position="2"> EFOG is also related to FO-TAG (Becker et al. 1991) and to the HPSG approach (Engelkamp et al. 1992, Oliva 1992) in extending the domain of applicability of LP rules. A comparison with these formalisms is beyond the scope of this study; we only mention here that our inventory of LP relations is larger and that, unlike e.g. the latter approach, we do not confine ourselves to binary branching trees.</Paragraph>
  </Section>
</Paper>