<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1505">
  <Title>Maintaining the Forest and Burning out the Underbrush in XTAG</Title>
  <Section position="3" start_page="31" end_page="34" type="metho">
    <SectionTitle>
2 Grammar Organization
</SectionTitle>
    <Paragraph position="0"> The XTAG English grammar currently consists of 768 tree templates, so grammar maintenance is no small task. In general, lexicalizing a TAG creates redundancy because the same trees, modulo their anchor labels, may be associated with many different lexical items. We have eliminated this redundancy by storing only abstract tree templates with uninstantiated anchor labels, and instantiating lexicalized trees on the fly, as words are encountered in the input. Another source of redundancy, however, is the reuse of tree substructures in many different tree templates. For example, most sentential tree templates include a structural fragment corresponding to the phrase-structure rule S --+ NP VP.</Paragraph>
    <Paragraph position="1"> This redundancy poses a problem for grammar maintenance and revision. To consistently implement a change in the grammar, all the relevant trees currently must be edited individually, although we do have an implementation of Becket's metarules (Becker, 1994) which allows us to automate this process to a great extent. For instance, the addition of a new feature equation associated with the structural fragment corresponding to S -~ NP VP would affect most clausal trees in the grammar. Crucially, one can only manually verify that such an update does not conflict with any other principle already instantiated in the grammar. As the grammar grows, the difficulty of this task grows with it.</Paragraph>
    <Paragraph position="2"> Following the idea first proposed in (Vijay-Shankar and Schabes, 1992), we extend the idea of abstraction over lexical anchors. A tree template with an unspecified anchor label subsumes an entire class of lexically specified trees; similarly, we define &amp;quot;meta-templates', or quasi-trees, which subsume classes of tree templates. The quasi-trees are specified by partial tree descriptions in a logical language patterned after Rogers and Vijay-Shanker (Rogers and Vijay-Shankar, 1994); we call the partial descriptions blocks. Since we are using a feature-based LTAG, our language has also been equipped with descriptive predicates allowing us to specify a tree's feature-structure equations, in addition to its structural characteristics. Each block abstractly describes all trees incorporating the partial structure it represents.</Paragraph>
    <Paragraph position="3"> An elementary tree template is expressed as a con- null junction of blocks. The blocks are organized as an inheritance lattice, so that descriptive redundancy is localized within an individual block. Within this description lattice, we isolate two sub-lattices which form more or less independent dimensions: the sub-categorization sub-lattice and the sub-lattice of descriptions of &amp;quot;transformations&amp;quot; on base subcategorization frames, such as wh-question formation and imperative mood. The subcategorization sub-lattice is further divided into four fairly orthogonal subparts: (1) the set of blocks describing the syntactic subject, (2) those for the main anchor(s), (3) those describing complements and (4) those for structure below a complement.</Paragraph>
    <Paragraph position="4"> Similar approaches have been pursued for a large French LTAG by (Candito, 1996) and for the XTAG English grammar by (Becket, 1994). Following the ideas set forth in (Vijay-Shankar and Schabes, 1992), Candito constructs a description hierarchy in much the same way as the present work, albeit for a smaller range of constructions than what exists in the XTAG grammar. Becker's meta-rules can also been seen as partial descriptions, wherein the inputs and outputs of the meta-rules are sisters in a description hierarchy and the parent is the common structure shared by both. However, there is still redundancy across meta-rules whose inputs apply to the same partial descriptions. For instance, the sub-ject wh- extraction and subject relative metarules would be specified independently and both refer to an NP in subject position of a clause.</Paragraph>
    <Section position="1" start_page="32" end_page="33" type="sub_section">
      <SectionTitle>
2.1 Hierarchical Organization of the
Current English Grammar
</SectionTitle>
      <Paragraph position="0"> We use the hierarchy to build the tree templates for the XTAG English grammar. In maintaining the grammar, however, only the abstract descriptions need ever be manipulated; the larger sets of tree templates and actual trees which they subsume are computed deterministically from these high-level descriptions, as given in Figure 1.</Paragraph>
      <Paragraph position="1"> Consider, for example, the description of the relative clause tree for transitive verbs which contains four blocks: one specifying that its subject is extracted, one that the subject is an NP, one that the main anchor is a verb, and one that the complement is an NP. These blocks correspond to the quasi-trees (partially specified trees) shown in Figure 2 and 3(1) and when combined will generate the elementary tree in Figure 3(2). For the sake of simplicity, feature equations are not shown. In these figures, solid lines and dashed lines denote the parent and dominance relations respectively; each node has a label, enclosed in parentheses, and at least one name. Multiple names for the same node are separated by commas such as VP, AnchorP in Figure 2(2). The arc in Figure 3(1) indicates that the precedence order of V and AnchorP is unspecified.</Paragraph>
      <Paragraph position="2"> (In small clauses, the main anchor is a preposition, adjective or noun, not a verb, so AnchorP and VP are not always the same node.) Our lexical organization tool is implemented in Prolog, and contains blocks which account for 85%  of the current English grammar. By the time of the workshop, the remainder of the grammar will also be implemented. There is also an interface to the Prolog module, and a visualization tool for displaying portions of the description lattice.</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="34" type="sub_section">
      <SectionTitle>
2.2 A tool for grammar examination
</SectionTitle>
      <Paragraph position="0"> Being able to specify the grammar in a high-level description language has obvious advantages for maintenance and updating of the grammar, in that changes need only be made in one place and are automatically percolated appropriately throughout the grammar. We expect to reap additional benefits from this approach when developing a grammar for another language. Beyond these issues of efficiency and consistency, this approach also gives us a unique perspective on the existing grammar as a whole. Defining hierarchical blocks for the grammar both necessitates and facilitates an examination of the linguistic assumptions that have been made with regard to feature specification and tree-family definition. This can be very useful for gaining a overview of the theory that is being implemented and exposing gaps that have not yet been explained. Because of the organic way in which the grammar was built over the years, we have always suspected that there might exist a fair amount of inconsistency either within the feature structures, or within the tree families. The effort in organizing the lexicon has so far turned up very few non-linguistically motivated inconsistencies, which is a gratifying validation of the constraints imposed by the LTAG formalism.</Paragraph>
      <Paragraph position="1"> Our work in tree organization has allowed us to characterize three principal types of exceptions in the XTAG English grammar: (1) a class of trees is missing from the grammar, though this class would be expected from allowing the descriptive blocks to combine freely (for example, a sentential sub-ject with a verb anchor and a PP complement); (2) within a class of trees, some member is missing, though an analogous member is present in another class (extraction of the clausal complement of a noun-anchored predicative); (3) one tree in a class can be generated by combining quite general descriptions, but there is an exceptional piece of structure or feature equation (the ergative alternation of transitive verbs). While these may sometimes reflect known syntactic generalizations (e.g. extraction islands, as with the example in (2)), they may also reflect inconsistencies which have arisen over the lengthy time-course of grammar development and need to be corrected. As previously noted, the latter have so far been quite limited in number and significance.</Paragraph>
      <Paragraph position="2"> Our approach makes it incumbent on us to seek principled explanations for these irregularities, since they must be explicitly encoded in the description  hierarchy. Without the description hierarchy, there would be no need to reconcile these differences, since they would be entirely independent pieces of a flat grammar.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="34" end_page="35" type="metho">
    <SectionTitle>
3 Tailoring XTAG to the Weather Domain
</SectionTitle>
      <Paragraph position="0"> While it is certainly interesting to develop a wide-coverage grammar for its own sake, it is clear that for any practical application the grammar will have to be tailored to the particular domain. Our overarching goal in building the English grammar was to make it broad enough and general enough that tailoring would be a matter of extracting the desired subset of the lexicon and/or the tree database. In this section, we will discuss and evaluate various approaches to specializing a large grammar, and then will discuss our effort at specializing the XTAG English grammar for a weather-message domain.</Paragraph>
    <Section position="2" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
3.1 General Considerations
</SectionTitle>
      <Paragraph position="0"> In considering how one might specialize a grammar, we make the following basic assumptions: that a sub-language exists; that it can be identified; that there is training data (usually unannotated) available; that default mechanisms will be adequate for handling over-specialization (since we know training data will not perfectly reflect the genre) and that the smaller grammar combined with defaults will still be more efficient than the large grammar.</Paragraph>
      <Paragraph position="1"> Based on these assumptions, the first choice is whether to do full parsing at all in the final application. If the domain contains a large number of fragments, it might be preferable to use a partial parsing approach, in which case development of a sub-grammar will be less crucial. Supertagging (Joshi and Srinivas, 1994) is one such approach; once the supertagger is trained for the domain, it could be used in place of the full parser. If, however, it is determined that full parsing is practicable for the domain, there are still a number of considerations in deriving the sub-grammar.</Paragraph>
      <Paragraph position="2"> In the ideal situation, there would already be a corrected parsed corpus (treebank), which can be used for crafting a sub-grammar for the domain.</Paragraph>
      <Paragraph position="3"> This is exceptionally unlikely, and in the more common case, training data will have to be constructed, either manually or automatically. In a lexicalized grammar like LTAG, this turns out to be quite manageable, since there are distinct representations which encode syntactic structures. We can use a statistical approach, such as supertagging, to make a first pass at assigning the correct structures to each word, and then hand-correct them to derive the relevant set of structures. In non-lexicalized grammars, this process would be much more difficult, because there is no straightforward way to associate structures with lexical word and to identify the rules to be eliminated. If it is impossible to create training data by any other method, the full grammar can be applied and then the output corrected to create a treebank of the training data. Needless to say, this is a tedious, time-consuming and computationally expensive task. Alternatively, a domain expert could provide a list of grammatical phenomena needing to be handled, and this list used to extract the sub-grammar.</Paragraph>
      <Paragraph position="4"> Once the training data has been processed by one of these methods, the sub-grammar is extracted based on the elementary objects in the grammar required to handle all of the syntactic phenomena identified in the training set. This could mean extracting precisely the constructions used in the training set, or generalizing from them. A lexical hierarchy such as that described in Section 2 can be used for this process, with generalization performed along either of the hierarchy dimensions. The expansion could be done by general principles (add all trees of a certain subcat frame if any are present), or could be done based on performance of the sub-grammar on held-out training data.</Paragraph>
      <Paragraph position="5"> Most domains have a rich terminological vocabulary, which if not taken into account can cause prohibitive ambiguity in parsing and interpretation.</Paragraph>
      <Paragraph position="6"> Identifying and demarcating domain specific terminology is helpful for all of these approaches, since the terms can then be treated as single tokens. This can been done either manually or automatically (Daille, 1994; Jacquemin and Royaut, 1994).</Paragraph>
      <Paragraph position="7"> Once the sub-grammar has been finalized, strategies for recovering from failure to parse should be developed. One simple strategy is to fall back to the large/whole grammar. A more sophisticated strategy would be to back off using a lexical hierarchy in the same way it was used for generalizing from the training set.</Paragraph>
    </Section>
    <Section position="3" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
3.2 Specializing to the Weather Domain
</SectionTitle>
      <Paragraph position="0"> The domain we chose to test out these strategies was weather reports, provided to us by CoGenTex3 The sentences tend to be quite long (an average of 20 tokens/sentence) and complex, and included a large amount of domain specific terminology in addition to many geographical names. To identify the domain 4Thanks to the Contrastive Syntax Project, Linguistics Department of the University of Montreal, for the use of their weather synopsis corpus.</Paragraph>
      <Paragraph position="1">  specific terms, we are using a hand-collected list, but we are currently working with Beatrice Daille (Daille, 1994) to collect them automatically. Collapsing these terms reduced the length of the test sentences by 22%. Example 1 is illustrative of the type of sentences and the terminology in this domain. We split the development data into a training set (99 sentences) and a test set (50 sentences).</Paragraph>
      <Paragraph position="2"> (1) Skies were beginning to clear over \[western New-Brunswick\] and \[western Nova Scotia\] early this morning as \[drier air\] pushed into the district from the west.</Paragraph>
      <Paragraph position="3"> We primarily pursued the full-parsing approach, but explored partial parsing to a more limited extent as well. Since we did not have access to parsed training data, we tried several of the approaches discussed above for creating the small grammar. Parsing with the full grammar was impractical and inefficient. We also attempted to parse the training sentences using a sub-grammar, created with the aid of a domain expert who identified relevant syntactic constructions. We used this information as input to the lexical organization tool to extract a sub-lattice of the grammar hierarchy (along both the subcat and transformational dimensions). However, initial experiments suggest this first pass sub-grammar was still too large, and that more radical pruning of the large grammar would be required.</Paragraph>
      <Paragraph position="4"> The most effective strategy for us was to use the supertagger to create an annotated training corpus. The supertagger (which had been trained on 200,000 words of correctly supertagged WSJ data) performed at about 87%. We then manually corrected the erroneous supertags, and prepared a sub-grammar using the word/POS-tag/supertag triples from the weather training corpus. Using this subgrammar, we set up the task to parse the 50 test sentences, backing off to the full grammar. As of the time of submission of this paper, we were still parsing these sentences. Although the sentences which could be parsed by the sub-grammar were assigned a parse very quickly, overall, we did not see the anticipated speed up that we expected. We suspect that backing off to the full grammar is not the best way to go, and are working on ways to back off using the lexical inheritance hierarchy.</Paragraph>
      <Paragraph position="5"> There are a number of directions for future work suggested by these initial experiments. With regard to partial parsing, we retrained the supertagger on the 100 training sentences (1416 tokens). This supertagger performed at 78%, a considerable decrease from the WSJ-trained supertagger, but respectable given the small training set. Some of the errors produced by the WSJ-trained supertagger were idiosyncratic to the newswire domain, so we plan to explore strategies for combining the information from the WSJ domain with the weather report domain, analogous to techniques used in the speech domain.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>