<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-2005">
  <Title>Squibs and Discussions A Delayed Syntactic-Encoding-based LFG Parsing Strategy for an Indian Language Bangla Probal Sengupta* Indian Statistical Institute</Title>
  <Section position="2" start_page="0" end_page="346" type="metho">
    <SectionTitle>
2. Delayed Syntactic Encoding
</SectionTitle>
    <Paragraph position="0"> As suggested in Bresnan (1982a, 1982b) and Mohanan (1982), a flat constituent structure for a Bangla sentence S is given by the rule in (1), where constituent NPs and/or [Footnote *: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700 035, India. E-mail: sprobal@isical.ernet.in; bbc@isical.ernet.in] [Footnote 1: Maxwell and Kaplan (1991) also propose a method to minimize unification overhead.]</Paragraph>
    <Paragraph position="1"> © 1997 Association for Computational Linguistics. Computational Linguistics, Volume 23, Number 2. Table 1: Case markers and their possible grammatical functions. The GEN case marker normally marks a genitive qualifier of a noun.</Paragraph>
    <Paragraph position="2"> However, for certain verb forms (for example, ones in pseudopassive voice), it also marks the subject.</Paragraph>
    <Paragraph position="3"> Marker Name SUBJ OBJ IOBJ ADJUNCT</Paragraph>
    <Paragraph position="5"> In (1), syntactic encoding of GFs is carried out using the simplified encoding schemata (2) annotating the NPs (Bresnan 1982b, 297-299). In an implementation, schemata (2) works quite well if the mapping from case marker to function is nearly one-to-one. Unfortunately, as shown in Table 1, many modern Indian languages lack this property: almost every marker maps to more than one grammatical function. The classical way of handling such situations is to use alternation or disjunction. However, given a priori lexical knowledge of the verb, the alternations cease to exist in most cases. To express this more formally, let $G = \{g_1, g_2, \ldots\}$ be the set of relevant GFs, $C = \{c_1, c_2, \ldots\}$ be the set of NP case markers, and cToG be a mapping from case markers to GFs, such that cToG(c), $c \in C$, gives the grammatical function(s) predictable from c. In our case, cToG(c) is actually a finite disjunction $g_{i1} \vee g_{i2} \vee \ldots$ of functions. If $f_N$ is the f-structure of an NP of a sentence S with f-structure $f_S$, and the case marker on the head noun of the NP is c, the semantics of schemata (2) annotating the NP is $(f_S\ \mathrm{cToG}[c]) = f_N$, where "=" denotes unification. Since cToG[c] is a disjunction, in a parser implementation it effectively multiplies out to $|\mathrm{cToG}(c)|$ nondeterministic choices for the functional role played by $f_N$ in $f_S$. If, in the ultimate analysis, the NP is found to play the functional role g in $f_S$, the constraint set in (3), projected by the verb, must have been satisfied: $(f_S\ g\ \mathrm{CASE}) = c$ and $\bigwedge_i (f_S\ g\ q_i) = v_i$ (3), where the $q_i$ are different normal agreement features (like NUMBer, PERSon, etc.) and/or other semantic agreement features (like ANIMacy, etc.). We shall call the schema (3) the agreement schema for the function g projected by the verb.
Observations show that in most well-formed sentences, the agreement schema of the verb for any function g is satisfied by at most one constituent NP of the sentence, provided some order of processing the agreement schema of different GFs is maintained. The mapping cToG is therefore nearly one-to-one in the context of the agreement schema of the verb, and the agreement schema may serve as test criteria for selecting grammatical functions from internal properties of NPs. The parser must ensure that an encoding schemata of a constituent NP is evaluated in the context of the agreement schema of the verb, somewhat like handling a forward reference (where an item is referred to before it is defined). The trick is to delay the evaluation of the encoding schema of constituent NPs until an appropriate moment, while maintaining a persistent data structure, such as a symbol table, to keep track of the points of forward reference (at which actual function names get instantiated) and their local environments (the internal f-structures of the constituent NPs). [Running head: Sengupta and Chaudhuri, Delayed Syntactic Encoding]</Paragraph>
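    <Paragraph> The cost of the disjunctive encoding discussed above can be illustrated with a minimal Python sketch. It only counts and enumerates the role assignments that a parser would have to explore if every NP were encoded with the full disjunction cToG(c). The marker names and their function sets are hypothetical placeholders, not the contents of Table 1, and the helper names are invented for this illustration.

```python
from itertools import product
from math import prod

# Hypothetical cToG mapping: the marker names and function sets below are
# illustrative assumptions, not the contents of Table 1.
C_TO_G = {
    "NOM": ("SUBJ", "OBJ"),
    "ACC": ("OBJ", "IOBJ"),
    "GEN": ("SUBJ", "ADJUNCT"),
}


def count_choices(markers):
    """Number of role assignments a disjunctive encoding must explore."""
    return prod(len(C_TO_G[m]) for m in markers)


def enumerate_choices(markers):
    """Every way of picking one candidate function per NP."""
    return list(product(*(C_TO_G[m] for m in markers)))


sentence_markers = ["NOM", "ACC", "NOM"]  # one marker per constituent NP
print(count_choices(sentence_markers))           # product of |cToG(c)| over the NPs
print(len(enumerate_choices(sentence_markers)))  # the same number, enumerated
```

Most of the enumerated assignments are ruled out later by uniqueness or by the verb's agreement schema, which is the overhead that delayed encoding is meant to avoid.
    </Paragraph>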
  </Section>
  <Section position="3" start_page="346" end_page="349" type="metho">
    <SectionTitle>
3. The Proposed Solution Technique
</SectionTitle>
    <Paragraph position="0"> In this section, we provide the basic solution technique for simple sentences (i.e., consisting of a single verb only) in two parts.</Paragraph>
    <Section position="1" start_page="346" end_page="346" type="sub_section">
      <SectionTitle>
3.1 Solution Part I: Initiation of Forward Reference
</SectionTitle>
      <Paragraph position="0"> The forward reference discussed in the previous section is encountered while Locate-ing the left-hand side of a schemata like (2) during the processing of an NP. In our delayed encoding proposal, the (modified) Locate operation should leave the "name" of the functional role played by the NP "underspecified." To force the Locate operator to behave in this manner, we propose:</Paragraph>
      <Paragraph position="2"> (a) the introduction of a new type of underspecification metavariable, ?; and (b) the modification of encoding schemata (2) to schemata (4).</Paragraph>
      <Paragraph position="4"> The ? metavariables generate placeholders for hitherto anonymous grammatical functions, which we shall call nameholders and denote by name variables $n_1, n_2, \ldots$. Locate-ing schemata (4) creates such a nameholder (n, say) in the scope of the functional placeholder (f, say) for the ↑ metavariable and simultaneously stores the pair (f, n) in the symbol table. Locate-ing a construct like (f n), where f and n are an already defined placeholder and nameholder, respectively, returns (a pointer to) the "value" part of the pair in the f-structure (pointed at by) f whose name is (pointed at by) n. The extended semantics of Locate is therefore as follows. Consider Locate[d], where d has the form (x y), and let f be the reference to the f-structure Locate[x]. If y is a ? metavariable, let n be a new nameholder for the metavariable.</Paragraph>
      <Paragraph position="5"> An anonymous slot is created in the scope of f, and n is made to point to it. Simultaneously, the pair (f, n) is entered as a new entry in the symbol table. If, however, y is a nameholder n, Locate returns the value field of the pair in f whose name field is held by n.</Paragraph>
      <Paragraph position="6"> With this, the semantics of Locate with respect to the form in (5), which is the left-hand side of schemata (4), may be pictorially represented as in Figure 1.</Paragraph>
      <Paragraph position="8"/>
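      <Paragraph> The extended Locate semantics can be sketched in Python as follows. This is one possible reading, not the authors' implementation: the class names, the list-of-slots representation of an f-structure, the module-level symbol table, and the choice to return the new nameholder in the ? case are all assumptions made for illustration.

```python
class NameHolder:
    """Placeholder for a grammatical-function name that is not yet known."""

    def __init__(self, label):
        self.label = label   # e.g. "n1"; used only for printing
        self.name = None     # bound later to an actual function name

    def __repr__(self):
        return self.name or self.label


class FStructure:
    """An f-structure held as a list of [name, value] slots (an assumption).

    A slot name is either a string (ordinary attribute) or a NameHolder
    (an anonymous grammatical function awaiting its name).
    """

    def __init__(self):
        self.slots = []

    def slot(self, name):
        for entry in self.slots:
            if entry[0] == name:
                return entry
        return None


UNDERSPECIFIED = "?"   # stands for the ? metavariable
symbol_table = []      # entries are (f, n) pairs


def locate(f, y):
    """Extended Locate for a designator (x y), with f = Locate[x] given."""
    if y == UNDERSPECIFIED:
        n = NameHolder(f"n{len(symbol_table) + 1}")
        f.slots.append([n, FStructure()])   # anonymous slot in the scope of f
        symbol_table.append((f, n))         # record the forward reference
        return n
    entry = f.slot(y)
    if entry is None:                       # ordinary attribute: create it
        entry = [y, FStructure()]
        f.slots.append(entry)
    return entry[1]                         # value part of the pair


fs = FStructure()
n = locate(fs, UNDERSPECIFIED)   # creates a nameholder and a table entry
value = locate(fs, n)            # returns the value field named by n
print(n, len(symbol_table), value is fs.slots[0][1])
```

Keeping the nameholder as an object, rather than a string, lets a later binding step fill in the function name in one place without touching the slot that points to it.
      </Paragraph>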
    </Section>
    <Section position="2" start_page="346" end_page="349" type="sub_section">
      <SectionTitle>
3.2 Solution Part II: Name Binding of Forward References
</SectionTitle>
      <Paragraph position="0"> The next point to be considered is binding actual function names to nameholders.</Paragraph>
      <Paragraph position="1"> We assume that the agreement schema for a function g may select the structure that satisfies the constraints. For this, the agreement schema must be handled in a different manner than normal projection schema. [Figure 1: Semantics of Locate with respect to (5).]</Paragraph>
      <Paragraph position="2"> We choose the notation $(\#\ g\ q_1) = v_1$ for one agreement schemata for the function g. We shall call the forms $(\#\ g\ q_i) = v_i$ a metastructure or m-structure. M-structure schema are projected by the main verb of a sentence. A symbol table entry (f, n) satisfies an m-structure schemata $(\#\ g\ q_i) = v_i$ projected by the verb V of a sentence S iff f is the f-structure of S and the structure (f n), where n is treated as an atom, contains the pair $[q_i\ v_i]$. If a symbol table entry satisfies all m-structure schema for a function g, then, by our proposed scheme, the nameholder n that points to the entry is bound to the function name g, and the satisfying symbol table entry is deleted.</Paragraph>
      <Paragraph position="3"> Testing of symbol table entries against m-structure schema, and the resulting binding of nameholders to actual function names, are carried out by a newly introduced operator, Search. The operator Search takes the entire set of m-structure schema for a particular GF and carries out the process described in the previous paragraph. If more than one symbol table entry satisfies the m-structure schema for a particular function g, the one earlier in order of occurrence is chosen. The relative order in which the sets of m-structure schema for different functions are evaluated (by applying Search) is motivated by the default ordering of phrases in a sentence in the target language. In Bangla, for example, the default ordering is SUBJ-IOBJ-OBJ. Thus, the test for SUBJ is carried out first, followed by IOBJ and OBJ, if any.</Paragraph>
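      <Paragraph> A minimal Python sketch of the Search operator follows, under simplifying assumptions: each symbol table entry is reduced to a (nameholder label, NP feature dictionary) pair, an m-structure schema set is a dictionary of required feature values, and the feature values shown are hypothetical rather than taken from the paper's lexical entries.

```python
def search(function, schema, symbol_table, bindings):
    """Bind `function` to the earliest symbol table entry satisfying `schema`.

    schema: {feature: required value}, the m-structure schema for one GF.
    Returns True if a nameholder was bound, False otherwise.
    """
    for index, (holder, features) in enumerate(symbol_table):
        if all(features.get(q) == v for q, v in schema.items()):
            bindings[holder] = function   # the nameholder receives its name
            del symbol_table[index]       # the satisfying entry is deleted
            return True
    return False


# Hypothetical entries, in order of occurrence in the sentence.
symbol_table = [
    ("n1", {"CASE": "NOM", "PERS": 2, "HON": 1}),
    ("n2", {"CASE": "DAT", "PERS": 1}),
    ("n3", {"CASE": "NOM", "PERS": 3}),
]
# Hypothetical m-structure schema projected by the verb.
verb_schema = {
    "SUBJ": {"CASE": "NOM", "HON": 1},
    "IOBJ": {"CASE": "DAT"},
    "OBJ": {"CASE": "NOM"},
}
bindings = {}
for function in ("SUBJ", "IOBJ", "OBJ"):   # default order stated for Bangla
    search(function, verb_schema[function], symbol_table, bindings)
print(bindings, symbol_table)
```

In this data the OBJ schema alone would also accept the first NP. Because SUBJ is tested first and its entry is deleted, the later OBJ test can only select the remaining candidate, which is the role the evaluation order plays.
      </Paragraph>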
      <Paragraph position="4"> The final solution technique therefore involves first evaluating all f-structure schema, including those with underspecification metavariables annotating the children nodes of an S-dominated c-structure tree. This would generate symbol table entries corresponding to NPs annotated with the ? schema. Next, the m-structure schema of the main verb are operated on with the Search operator in the default phrasal order for the language. A sentence is well formed if and only if all the m-structure schema for the verb are satisfied and all nameholders in the scope of the sentence are bound to names (i.e., at the end, the symbol table is empty). The evaluation process naturally satisfies the uniqueness property for sentence-level grammatical functions.</Paragraph>
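      <Paragraph> The two-phase evaluation and the well-formedness test can be sketched end to end in Python. This is an illustrative simplification: the function name analyse, the dictionary encoding of NPs and verb schema, and all feature values are assumptions, and phase 1 is reduced to creating one symbol table entry per NP.

```python
DEFAULT_ORDER = ("SUBJ", "IOBJ", "OBJ")   # default phrasal order stated for Bangla


def analyse(nps, verb_schema, order=DEFAULT_ORDER):
    """Return {function: NP index} if the sentence is well formed, else None.

    nps: list of feature dicts, one per NP annotated with the ? schema.
    verb_schema: {function: {feature: value}}, the verb's m-structure schema.
    """
    # Phase 1: all f-structure schema are evaluated; each ? yields one entry.
    symbol_table = list(enumerate(nps))
    bound = {}
    # Phase 2: m-structure schema are fed to Search in the default order.
    for function in order:
        if function not in verb_schema:
            continue
        wanted = verb_schema[function]
        match = next(
            (entry for entry in symbol_table
             if all(entry[1].get(q) == v for q, v in wanted.items())),
            None,
        )
        if match is None:
            return None                  # an m-structure schema is unsatisfied
        symbol_table.remove(match)       # the satisfying entry is deleted
        bound[function] = match[0]
    # Well formed only if the symbol table is empty at the end.
    return bound if not symbol_table else None


# Hypothetical feature bundles for three NPs and one ditransitive verb.
you = {"CASE": "NOM", "PERS": 2, "HON": 1}
me = {"CASE": "DAT", "PERS": 1}
book = {"CASE": "NOM", "PERS": 3}
verb = {
    "SUBJ": {"CASE": "NOM", "PERS": 2, "HON": 1},
    "IOBJ": {"CASE": "DAT"},
    "OBJ": {"CASE": "NOM"},
}
print(analyse([you, me, book], verb))   # roles by NP position
print(analyse([book, me, you], verb))   # same roles after reordering the NPs
print(analyse([you, book], verb))       # None: no NP satisfies IOBJ
```

Since each function is bound at most once and each entry is removed when bound, no two NPs can end up with the same sentence-level function in this sketch.
      </Paragraph>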
      <Paragraph position="5"> Regarding the relative evaluation order of f- and m-structure schema, the general principle is "all f-structure schema are evaluated before any m-structure schemata is evaluated (i.e., fed to the Search operator)." Example 1. Let us consider the Bangla simple sentence below, in which the NPs have been underlined:</Paragraph>
      <Paragraph position="7"> You (honored) will give me a book</Paragraph>
      <Paragraph position="9"> Lexical entries of head nouns and verbs in</Paragraph>
      <Paragraph position="11"> Any permutation of the underlined phrases and the verb should give identical results. The lexical entries of the head nouns and the verb are given in Figure 2. The feature HON is a three-valued scalar, 1 for honored, 0 for casual, and -1 for intimate.</Paragraph>
      <Paragraph position="12"> Since Bangla has no subject-verb agreement based on number, the NUM feature has been omitted. The f-structure $f_S$ of the sentence before processing the m-structure of the verb appears as in Figure 3, and the final solution is as given in Figure 4. The f-structures $f_a$, $f_b$, and $f_c$ are for the NPs in order.</Paragraph>
      <Paragraph position="13"> Figure 4: Final solution for Bangla sentence a'pni a'ma'ke ekt'a" bai dilen.</Paragraph>
    </Section>
  </Section>
</Paper>