<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1522">
  <Title>From Metagrammars to Factorized TAG/TIG Parsers. Vancouver, October 2005. ©2005 Association for Computational Linguistics</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Generic factorization operators
</SectionTitle>
    <Paragraph position="0"> The first factorization operators provided by DYALOG are the disjunction, Kleene star, and optionality operators. Finer control of optionality is provided through guards, which state conditions on the presence or absence of a node (or of a node sequence). An expression $(G^+, x; G^-)$ means that the guard $G^+$ (resp. $G^-$) must be satisfied for $x$ to be present (resp. absent). A guard $G$ is a boolean expression on equations between FS paths and is equivalent to a finite set of substitutions $S_G$. Used to handle local free word order, the interleaving (or shuffling) of two sequences $(a_i)_{i=1 \ldots n} \#\# (b_j)_{j=1 \ldots m}$ returns all sequences containing all $a_i$ and $b_j$ in any order that preserves the original orderings (i.e., $a_i &lt; a_{i+1}$ and $b_j &lt; b_{j+1}$).</Paragraph>
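    <Paragraph>To make the semantics of the interleaving operator concrete, the following Python sketch enumerates the sequences denoted by $(a_i) \#\# (b_j)$. It only illustrates what the operator means; DYALOG itself does not expand interleavings. The function name `interleave` is hypothetical.

```python
from typing import Iterator, Sequence, Tuple


def interleave(a: Sequence[str], b: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every shuffle of a and b that keeps the internal order of each."""
    if not a:  # only b is left: its order is fixed
        yield tuple(b)
        return
    if not b:  # only a is left
        yield tuple(a)
        return
    # Either the next element comes from a ...
    for rest in interleave(a[1:], b):
        yield (a[0],) + rest
    # ... or it comes from b.
    for rest in interleave(a, b[1:]):
        yield (b[0],) + rest


for seq in interleave(["a1", "a2"], ["b1"]):
    print(seq)
# ('a1', 'a2', 'b1')
# ('a1', 'b1', 'a2')
# ('b1', 'a1', 'a2')
```

For two sequences of lengths n and m this yields binomial(n+m, n) results, e.g. 10 shuffles for lengths 3 and 2.</Paragraph>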
    <Paragraph position="1"> These operators do not increase the expressive power or the worst-case complexity of TAGs. They are implemented without expansion, ensuring good performance and a more natural parsing output (with no added non-terminals).</Paragraph>
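    <Paragraph>A rough way to see why avoiding expansion matters is to count the plain trees that a single factorized tree stands for. The sketch below assumes the factorized parts are independent, ignores guards (which can only prune the expansion), and leaves out Kleene stars, which have no finite expansion; `expansion_size` and its parameters are hypothetical, and the numbers in the example are illustrative only.

```python
from math import comb, prod
from typing import Iterable, Tuple


def expansion_size(disjunction_sizes: Iterable[int],
                   optional_nodes: int,
                   interleavings: Iterable[Tuple[int, int]]) -> int:
    """Count the plain trees a factorized tree would expand into.

    Assumes independent factorized parts; guards and Kleene stars are ignored.
    """
    alternatives = prod(disjunction_sizes)  # one branch chosen per disjunction node
    optionals = 2 ** optional_nodes         # each optional node is present or absent
    # An interleaving of n and m ordered elements has comb(n + m, n) shuffles.
    shuffles = prod(comb(n + m, n) for n, m in interleavings)
    return alternatives * optionals * shuffles


print(expansion_size([3, 4], 2, [(3, 1)]))  # 192
```

Here a tree with two disjunctions of 3 and 4 alternatives, 2 optional nodes and one interleaving of 3 and 1 elements stands for 3 × 4 × 4 × 4 = 192 plain trees, whereas the factorized version is parsed as a single tree.</Paragraph>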
  </Section>
  <Section position="5" start_page="0" end_page="190" type="metho">
    <SectionTitle>
3 Meta-Grammars
</SectionTitle>
    <Paragraph position="0"> MGs allow modular descriptions of syntactic phenomena, using elementary constraints grouped into classes. A class may inherit constraints from several parent classes and may also provide or require a resource. Constraints on nodes include equality, precedence, and immediate and indirect dominance. Constraints may also apply to node and class decorations, expressed as feature structures.</Paragraph>
    <Paragraph position="1"> The objective of our MG compiler, also developed with DYALOG, is to cross the terminal classes (i.e., classes without descendants) in order to obtain neutral classes, in which each provided resource has been consumed and each required resource has been provided. Constraints are accumulated during crossing, and only the neutral classes whose accumulated constraints are satisfiable (taking into account their logical consequences) are kept. Minimal trees satisfying the constraints of these neutral classes are then produced.</Paragraph>
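    <Paragraph>The crossing step can be illustrated with a deliberately naive Python sketch. It assumes a hypothetical `MGClass` record, reduces constraints to flat path = value equations (real MG constraints also cover dominance, precedence and feature structures), lets a shared ancestor contribute only once, and simply tries every combination of terminal classes, keeping those that are neutral (provided and required resources balance) and whose equations do not clash. All class, resource and feature names are made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Equations = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class MGClass:
    """Hypothetical, simplified metagrammar class (names must be unique)."""
    name: str
    parents: Tuple["MGClass", ...] = ()
    provides: Tuple[str, ...] = ()
    requires: Tuple[str, ...] = ()
    equations: Equations = ()  # constraints reduced to path = value pairs


def ancestors(c: MGClass) -> List[MGClass]:
    """The class itself plus all its ancestors, each listed once."""
    seen: Dict[str, MGClass] = {}
    stack = [c]
    while stack:
        k = stack.pop()
        if k.name not in seen:
            seen[k.name] = k
            stack.extend(k.parents)
    return list(seen.values())


def merge(eqs: Dict[str, str], new: Equations) -> Optional[Dict[str, str]]:
    """Conjoin equations; None signals a clash (unsatisfiable constraints)."""
    out = dict(eqs)
    for path, value in new:
        if out.setdefault(path, value) != value:
            return None
    return out


def cross(terminals: List[MGClass]) -> List[Tuple[str, ...]]:
    """Brute-force crossing: keep neutral, satisfiable combinations."""
    neutral = []
    for size in range(1, len(terminals) + 1):
        for combo in combinations(terminals, size):
            members = {a.name: a for t in combo for a in ancestors(t)}
            provided, required = Counter(), Counter()
            eqs: Optional[Dict[str, str]] = {}
            for k in members.values():
                provided.update(k.provides)
                required.update(k.requires)
                if eqs is not None:
                    eqs = merge(eqs, k.equations)
            # Neutral: every provided resource is consumed and vice versa.
            if provided == required and eqs is not None:
                neutral.append(tuple(t.name for t in combo))
    return neutral


base = MGClass("verb_base", equations=(("cat", "v"),))
verb = MGClass("verb", parents=(base,), requires=("subj",),
               equations=(("mode", "finite"),))
subj_np = MGClass("subj_np", provides=("subj",))
subj_inf = MGClass("subj_inf", provides=("subj",),
                   equations=(("mode", "infinitive"),))
print(cross([verb, subj_np, subj_inf]))  # [('verb', 'subj_np')]
```

Crossing `verb` with `subj_inf` is neutral but rejected because the accumulated `mode` equations clash; an actual compiler would cross classes incrementally rather than enumerate all subsets.</Paragraph>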
    <Paragraph position="2"> Factorized trees result from several mechanisms. A node may group alternatives, and may be made optional or repeatable (for Kleene stars). When trees are generated, underspecified precedences between sibling nodes are handled by the interleaving operator.</Paragraph>
    <Paragraph position="3"> Positive and negative guards may be attached to nodes and are accumulated conjunctively during the crossing phase, i.e. $N = G_1$ and $N = G_2$ is equivalent to $N = (G_1, G_2)$. The compiler checks the satisfiability of the guards, removing alternatives that lead to failure as well as guard equations that become trivially true. The remaining guards are emitted as DYALOG guards in the trees.</Paragraph>
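    <Paragraph>Following the remark that a guard is equivalent to a finite set of substitutions $S_G$, a guard can be sketched as a list of alternatives, each a path-to-value map. The Python sketch below (all names hypothetical, and only equations between a path and an atomic value are handled) shows conjunctive accumulation and the compile-time simplification: alternatives contradicting known facts are removed, and equations already entailed by those facts are dropped as trivially true.

```python
from itertools import product
from typing import Dict, List, Optional

Subst = Dict[str, str]  # one alternative: a conjunction of path = value equations
Guard = List[Subst]     # disjunction of alternatives; [] is false, [{}] is true


def conjoin(s1: Subst, s2: Subst) -> Optional[Subst]:
    """Merge two substitutions, or return None if they clash on some path."""
    out = dict(s1)
    for path, value in s2.items():
        if out.setdefault(path, value) != value:
            return None
    return out


def guard_and(g1: Guard, g2: Guard) -> Guard:
    """Accumulate two guards on the same node conjunctively, i.e. (G1, G2)."""
    result: Guard = []
    for s1, s2 in product(g1, g2):
        merged = conjoin(s1, s2)
        if merged is not None:  # clashing alternatives lead to failure
            result.append(merged)
    return result


def simplify(g: Guard, known: Subst) -> Guard:
    """Use facts known at compile time to prune and shorten a guard."""
    out: Guard = []
    for alt in g:
        if any(p in known and known[p] != v for p, v in alt.items()):
            continue  # contradicts a known fact: remove this alternative
        residual = {p: v for p, v in alt.items() if p not in known}
        if not residual:
            return [{}]  # one alternative is trivially true: so is the guard
        if residual not in out:
            out.append(residual)
    return out


g1 = [{"mode": "finite"}, {"mode": "participle"}]
g2 = [{"extraction": "none"}]
print(simplify(guard_and(g1, g2), {"mode": "finite"}))  # [{'extraction': 'none'}]
```

An empty result means the guard can never be satisfied, while `[{}]` means it always holds and need not be emitted.</Paragraph>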
  </Section>
  <Section position="6" start_page="190" end_page="190" type="metho">
    <SectionTitle>
4 Grammar anatomy
</SectionTitle>
    <Paragraph position="0"> In just a few months, we have developed an MG for French with 191 classes, used to generate a very compact TAG of only 126 trees. Only 27 trees are anchored by verbs, and they suffice to cover canonical, passive and extracted verbal constructions with at most 2 arguments (including objects, attributes, completives, infinitives, prepositional arguments, wh-completives). These trees would correspond to several thousand trees if the factorization operators were expanded. This strong compaction rate stems from the presence of 820 guards, 92 disjunctions (to handle choices in realizations), 26 interleavings (to handle verb argument positions) and 13 Kleene stars (to handle coordinations). The grammar mostly consists of simple trees (fewer than 17 nodes) and a few complex trees (26 trees with between 30 and 46 nodes), essentially anchored by verbs.</Paragraph>
    <Paragraph position="1"> For instance, tree #1111 (see perl/frmg/tree.pl), used for canonical verb constructions, results from the crossing of 25 terminal classes and has 43 nodes, plus 3 disjunction nodes (for the different realizations of the subject and other verb arguments) and 1 interleaving node (between the verb arguments and a possible post-verbal subject).</Paragraph>
    <Paragraph position="2"> The tree is controlled by 35 guards, governing, for instance, the presence and position of a subject and of clitics.</Paragraph>
    <Paragraph position="3"> Such a tree covers many more verb subcategorization frames than are usually attached to a given verb. The anchoring of a tree $\tau$ by a word $w$ is done by unifying two feature structures $H_\tau$ and $H_w$, called hypertags (Kinyon, 2000), which list the syntactic properties covered by $\tau$ and allowed by $w$, respectively. The link between $H_\tau$ and the allowed syntactic constructions is established through the variables occurring in $H_\tau$ and in the guards and node decorations.</Paragraph>
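    <Paragraph>Anchoring by hypertag unification can be sketched with a minimal feature-structure unifier in Python. Feature structures are nested dicts, atoms are strings, and shared variables are `Var` objects; the feature names (`arg1`, `kind`, `prep`) are hypothetical and do not reproduce the actual hypertags of the grammar, and there is no occurs check. The bindings obtained for the tree's variables are what guards and node decorations sharing those variables would then see.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Var:
    """A logical variable shared between a hypertag, guards and decorations."""
    name: str


Bindings = Dict[Var, Any]


def walk(v: Any, b: Bindings) -> Any:
    """Follow variable bindings until reaching an unbound variable or a value."""
    while isinstance(v, Var) and v in b:
        v = b[v]
    return v


def unify(x: Any, y: Any, b: Bindings) -> Optional[Bindings]:
    """Unify two feature structures (nested dicts); None means failure."""
    x, y = walk(x, b), walk(y, b)
    if isinstance(x, Var):
        return b if x == y else {**b, x: y}
    if isinstance(y, Var):
        return {**b, y: x}
    if isinstance(x, dict) and isinstance(y, dict):
        # Only features present on both sides constrain each other.
        for key in x:
            if key in y:
                b = unify(x[key], y[key], b)
                if b is None:
                    return None
        return b
    return b if x == y else None  # atoms must match exactly


K1, K2 = Var("K1"), Var("K2")
h_tree = {"arg1": {"kind": K1}, "arg2": {"kind": K2}}  # what the tree covers
h_word = {"arg1": {"kind": "obj"}, "arg2": {"kind": "prepobj", "prep": "de"}}
bindings = unify(h_tree, h_word, {})
print(walk(K1, bindings), walk(K2, bindings))  # obj prepobj
```

Unification fails (returns `None`) when the word forbids a property the tree requires, so the tree is simply not anchored by that word.</Paragraph>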
  </Section>
</Paper>