<?xml version="1.0" standalone="yes"?> <Paper uid="I05-6001"> <Title>The TIGER 700 RMRS Bank: RMRS Construction from Dependencies</Title> <Section position="4" start_page="1" end_page="2" type="metho"> <SectionTitle> 2 From TIGER Dependency Bank to TIGER RMRS Bank 2.1 The TIGER Dependency Bank </SectionTitle> <Paragraph position="0"> The input to the treebank conversion process consists of dependency representations from the TIGER Dependency Bank (TIGER-DB). The TIGER-DB has been derived semi-automatically from (a subset of) the TIGER-LFG Bank of Forst (2003), which is in turn derived from the TIGER treebank. The dependency format is similar to that of the PARC 700 Dependency Bank (King et al., 2003). It abstracts away from constituency in order to remain as theory-neutral as possible. So-called dependency triples are sets of two-place predicates that encode grammatical relations, the arguments representing the head of the dependency and the dependent, respectively.</Paragraph> <Paragraph position="2"> [Figure caption, truncated] ... of sentence #8595: Privatmuseum muss weichen - Private museum deemed to vanish.</Paragraph> <Paragraph position="3"> The triples further retain a number of morphological features from the LFG representations, such as agreement features or tense information. Figure 1 displays an example.</Paragraph> <Paragraph position="4"> For the purpose of RMRS construction, the triples format has advantages and disadvantages. The LFG-derived dependencies offer all the advantages of a functional as opposed to a constituent-based representation. This representation already filters out the semantically inappropriate status of auxiliaries as heads; their contribution is encoded by features such as perf or fut, which can be directly translated into features of semantic event variables.
Most importantly, the triples localise dependencies which are not locally realised in phrase structure (as in long-distance constructions), so that there is no need for additional mechanisms to identify the arguments of a governing predicate. Moreover, the dependency representation format is to a large extent uniform across languages, in contrast to phrase-structural encoding. Therefore, the dependency-based semantics construction mechanism can be quickly ported to other languages.</Paragraph> <Paragraph position="5"> The challenges we face mainly concern a lack of specific types of phrase structure information that are crucial for RMRS composition. Linear precedence, for example, plays a crucial role in multiple modification and coordination. It is, however, possible to reconstruct the surface order from the indices attached to the Pred values in the triples. Part-of-speech information, which is useful for triggering different types of semantics construction rules, can be induced from the presence or absence of certain morphological features, though only to a limited extent.</Paragraph> <Paragraph position="6"> For our current purpose of treebank conversion, we are dependent on the specific input format of the TIGER-DB, while in a more general parsing context, one could ensure that missing information of this type is included in the input to semantics construction.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 2.2 Treebank Conversion </SectionTitle> <Paragraph position="0"> As in the TIGER to TIGER-DB conversion (Forst, 2003; Forst et al., 2004), we use the term rewriting system of Crouch (2005) for treebank conversion.
Originally designed for machine translation, the system is a powerful rewriting tool that has been applied to other tasks, such as frame semantics construction (Frank and Erk, 2004) or induction of knowledge representations (Crouch, 2005).</Paragraph> <Paragraph position="1"> The input to the system consists of a set of facts in a Prolog-like term representation. The rewrite rules refer to these facts in the left-hand side (LHS), either conjunctively (marked by ',') or disjunctively (marked by '|'). Expressions on the LHS may be negated (by prefix '-'), thereby encoding negative constraints for matching. A rule applies if and only if all facts specified on the LHS are satisfied by the input set of facts. The right-hand side (RHS) of a rewrite rule defines a conjunction of facts which are added to the input set of facts if the rule applies. The system further allows the user to specify whether a matched fact will be consumed (i.e., removed from the set of facts) or whether it will be retained in the output set of facts (marked by prefix '+').2 The system offers powerful rule encoding facilities in terms of macros and templates.</Paragraph> <Paragraph position="2"> Macros are parameterized patterns of (possibly disjunctive) facts; templates are parameterized abstractions over entire (disjunctive) rule applications. These means of abstraction help the user to define rules in a perspicuous and modular way, and significantly enhance the maintainability of complex rule sets.</Paragraph> <Paragraph position="3"> [Footnote 2] The system additionally features optional rules ('?=>'), as opposed to deterministic rewriting ('==>'). However, given that the input structures for RMRS construction are disambiguated, and since our target structures are underspecified semantic structures, we can define the semantics deterministically.</Paragraph> <Paragraph position="4"> The processing of rules is strictly ordered.</Paragraph> <Paragraph position="5"> The rules are applied in the order of textual appearance. Each rule is tested against the current input set of facts and, if it matches, produces an output set of facts that provides the input to the next rule in sequence. Each rule applies concurrently to all distinct sets of matching facts, i.e. it performs parallel application in case of alternative matching facts.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="7" type="metho"> <SectionTitle> 3 RMRS Construction from </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> Dependencies </SectionTitle> <Paragraph position="0"> Within the framework of HPSG, every lexical item defines a complete (R)MRS structure. The semantic representation of a phrase is defined as the assembly and combination of the RMRSs of its daughters, according to semantic constraints, which apply in parallel with syntactic constraints. In each composition step, the RMRSs of the daughters are combined according to semantic composition rules that define the semantic representation of the phrase, cf. Copestake et al. (2005). Following the scaffolding of the syntactic structure in this way finally yields the semantics of the sentence.</Paragraph> <Paragraph position="1"> For the present task, the input to semantics construction is a dependency structure. As established by work on Glue Semantics (Dalrymple, 1999), semantics construction from dependency structures can similarly proceed recursively to deliver a semantic projection of the sentence.
Note, however, that the resource-based approach of Glue Semantics leads to alternative derivations in case of scope ambiguities, whereas RMRS targets an underspecified semantic representation.</Paragraph> <Paragraph position="2"> For (R)MRS construction from dependencies we follow the algebra for semantics composition in Copestake et al. (2001). In HPSG implementations of this algebra, composition is triggered by phrasal configurations. Yet, the algebra is neutral with regard to the syntactic representation, and can be transposed to composition on the basis of dependency relations, much as in the Glue framework.</Paragraph> <Paragraph position="3"> However, the rewriting system we are using is not suited for a typical recursive application scheme: the rules are strictly ordered, and each rule simultaneously applies to all facts that satisfy the constraints in the LHS. That is, the RMRS composition cannot recursively follow the composition of dependents in the input structure. In section 3.2 we present a design of RMRS that is suited for this concurrent application scheme. First, we briefly sketch the semantic algebra.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.1 An Algebra for Semantic Construction </SectionTitle> <Paragraph position="0"> Copestake et al. (2001) define a semantic entity as a 5-tuple $\langle s_1, s_2, s_3, s_4, s_5 \rangle$ such that $s_1$ is a hook, $s_2$ is a (possibly empty) set of holes, $s_3$ and $s_4$ are bags of Elementary Predications (EPs) and handle constraints, respectively, and $s_5$ is a set of equalities holding between variables. The hook is understood to represent the externalised part of the semantic entity as a pair of a handle and an index (a variable). It is used for reference in composition: hooks of semantic arguments fill holes (or slots) of functors. Holes, in turn, record gaps in a semantic representation which remain to be filled.
Like hooks, holes are pairs of a handle and an index; furthermore, holes are labelled with the grammatical function they bear syntactically. That is, the labels on holes serve two purposes: they help determine the appropriate operation of composition (see below), and they link the semantics to syntax.3 EPs (predicate applications) represent the binding of argument variables to their predicators. An EP $h : r(a_1, \dots, a_n, sa_1, \dots, sa_m)$ consists of the EP's handle (or label) $h$, a relation $r$, and a list of zero or more variable arguments $a_1, \dots, a_n$, followed by zero or more scopal arguments $sa_1, \dots, sa_m$ (denoting handles) of the relation.</Paragraph> <Paragraph position="1"> Finally, the bag of handle constraints (Hcons) contains conditions which (partially) specify the relations between scopal arguments and their scope, i.e. between the scopal argument and the handles that may fill the hole.</Paragraph> <Paragraph position="2"> [Footnote 3] Copestake et al. (2001) mention a third feature to be included in the hook as an externally visible variable, which they instantiate with the index of the controlled subject in equi constructions and which is also used to implement the semantics of predicative modification. However, this feature is not crucial given that the underlying syntactic structures represent dependencies rather than immediate dominance relations, and therefore make non-local information available locally. Likewise, the dependency scenario does not necessitate that modifiers externalise their ARG1 argument position (see section 3.3.3).</Paragraph> <Paragraph position="3"> The operators of semantic composition $op_{l_1}, \dots, op_{l_k}$ are drawn from $S \times S \to S$, where $S$ is the set of all semantic entities, and $l_1, \dots, l_k$ correspond to the labels on holes. An operator $op_{l_i}$ defines the composition of a semantic head which has a hole labelled $l_i$ with the argument filling that hole as follows: the result of $op_{l_i}(a_1, a_2)$ is undefined if $a_2$ has no hole labelled $l_i$; otherwise:</Paragraph> <Paragraph position="5"> [The defining equations and the symbol explained here are missing in the source.] ... stands for the transitive closure.</Paragraph> </Section> <Section position="3" start_page="3" end_page="4" type="sub_section"> <SectionTitle> 3.2 RMRS Design </SectionTitle> <Paragraph position="0"> As mentioned earlier, the concurrent nature of rule application makes it impossible to proceed recursively along the scaffolding of the syntactic structure, as is inherent to tree-based analyses, since the rules apply simultaneously to all structures. RMRS construction is therefore designed around one designated &quot;global&quot; RMRS. Instead of projecting and accumulating RMRS constraints step-wise by recursive composition, we directly insert the meaning descriptions into a single global RMRS. Otherwise, composition strictly follows the semantic operations of the algebra of Copestake et al. (2001): the composition rules only refer to the hook and slots of functors and arguments, to achieve the binding of argument variables and the encoding of scope constraints.</Paragraph> <Paragraph position="1"> Global and Lexical RMRSs. The global RMRS features a top handle (Top, usually the label of the matrix proposition), sets of EPs (Rels) and handle constraints (Hcons), respectively, as described in the algebra, and a set of Ing constraints.4 [Footnote 4] Whenever two labels are related via an Ing (in-group) constraint, they can be understood to be conjoined. This is relevant, e.g., for intersective modification, since a quantifier that outscopes the modified noun must also take scope over the modifier.</Paragraph> <Paragraph position="3"> [Figure caption, truncated] ... resulting lexical and global RMRS (bottom).</Paragraph> <Paragraph position="4"> In addition, every predicate in the dependency structure projects a lexical RMRS. Lexical RMRSs are semantic entities which consist only of a hook (i.e. a label and a variable), which makes the entity available for reference by subsequent (composition) rules. The basic semantic content (which is determined on the basis of the predicate's category and comprises, at least, EPs for the relation and the ARG0)5 is uniformly maintained in the bags of the global RMRS, yet remains anchored to the lexical hook labels and variables. Figure 2 shows an example of a lexical RMRS with its links to the global RMRS, and a simplified version of the corresponding rule. The rule applies to predicates, i.e. pred features, with a value Pred. It introduces the lexical RMRS, i.e., the hook's label and variable, and adds the predicate's basic semantic content to the global RMRS, here the relation represented by Pred and the ARG0 variable, which is co-referent with the hook's variable.</Paragraph> <Paragraph position="5"> Composition. The semantic composition of arguments and functors is driven by the predicate arg(Fctor,N,Arg), where N encodes the argument position, and Fctor and Arg are indices of functor and argument, respectively.6 We interpret the arg-predicate as a slot/hole of the functor, such that the binding of the argument to the functor comes down to filling the hole, in the sense of the algebra described above. This is steered by the previously defined hooks of the two semantic entities, in that the matching rule introduces an EP with an attribute ARGN that is anchored to the externalised label in the functor's hook. The value of the attribute ARGN is the hook variable or hook label of the argument, depending on the category. A slightly more complicated example is shown in Figure 3; it involves the introduction of an additional proposition and a scope constraint. This rule performs the composition of a declarative finite clausal object (oc_fin) with its verbal head. It assigns a proposition relation as the value of the verb's ARG2, which in turn has an ARG0 that takes scope over the hook label of the matrix verb in the object clause.</Paragraph> <Paragraph position="6"> [Footnote 5] The category information required to define the concrete basic semantics is not explicit in the dependencies, but is induced from the grammatical function borne by the predicate, as well as the presence or absence of certain morphological features (section 2.1). [Footnote 6] The arg predicates are introduced by a set of preprocessing rules which reconstruct the argument structure by referring to the local grammatical functions of a predicate and testing for (morphological) features typically borne by non-arguments. E.g., pron_type(_,expl) identifies an expletive pronoun.</Paragraph> <Paragraph position="7"> [Figure caption, truncated] ... triggered by arg(X,2,Arg) (top), referred lexical RMRSs and resulting global RMRS (bottom). [Figure caption, truncated] ... ein anderer die Studentenmassen [. . .] zu versammeln wusste. - [. . .] when hardly anybody knew how to rally the crowd of students [. . .] as well as he did. (from corpus sentence # 8074).</Paragraph> <Paragraph position="8"> In general, composition does not depend on the order of rule applications. That is, the fact that the system performs concurrent rule applications in a cascaded rule set is not problematic for semantics construction. However, we have to ensure that every partial structure is assigned a hook prior to the application of composition rules.
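The ordering requirement can be illustrated with a minimal Python sketch. It is not the rewriting system of Crouch (2005): the tuple encoding of facts, the function names and the naming of hooks (h0, x0, ...) are assumptions made purely for illustration, and the sketch always binds the hook variable of the argument, ignoring the category-dependent choice between hook variable and hook label.

```python
def lexical_rule(facts, rmrs, hooks):
    """Project a hook for every pred fact and add its basic EP to the global RMRS."""
    for fact in facts:
        if fact[0] == "pred":
            _, index, relation = fact
            label, variable = f"h{index}", f"x{index}"
            hooks[index] = (label, variable)
            # The basic semantic content lives in the global RMRS,
            # anchored to the label and variable of the lexical hook.
            rmrs["rels"].append({"label": label, "rel": relation, "ARG0": variable})


def composition_rule(facts, rmrs, hooks):
    """Fill the hole arg(Fctor, N, Arg) with the hook variable of the argument."""
    for fact in facts:
        if fact[0] == "arg":
            _, functor, position, argument = fact
            functor_label, _ = hooks[functor]  # KeyError if no hook exists yet
            _, argument_variable = hooks[argument]
            rmrs["rels"].append({"label": functor_label, f"ARG{position}": argument_variable})


def run_cascade(facts, rules):
    """Apply the rules in textual order; each rule sees all matching facts."""
    rmrs = {"rels": []}
    hooks = {}
    for rule in rules:
        rule(facts, rmrs, hooks)
    return rmrs


# Hypothetical input: a verb (index 0) whose first argument is a noun (index 1).
facts = [("pred", 0, "weichen"), ("pred", 1, "Privatmuseum"), ("arg", 0, 1, 1)]
for ep in run_cascade(facts, [lexical_rule, composition_rule])["rels"]:
    print(ep)
```

If the two rules are listed in the opposite order, the composition rule finds no hook for the functor and the lookup fails.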
This ordering is ensured by stating the rules for lexical RMRSs first.</Paragraph> <Paragraph position="9"> Scope constraints. By introducing handle constraints, we define restrictions on the possible scoped readings. This is achieved by gradually adding qeq relations to the global Hcons set. Typically, such a constraint relates a handle argument of a scopal element, e.g. a quantifier, and the label of the outscoped element. However, we cannot always fully predict the interaction among several scoping elements. This is the case, inter alia, for the modification of verbs by more than one scopal adverb. This ambiguity is modeled by means of a UDRT-style underspecification, that is, we leave the scope among the modifiers unspecified, but restrict each to outscope the verb handle.7</Paragraph> </Section> <Section position="4" start_page="4" end_page="6" type="sub_section"> <SectionTitle> 3.3 Selected Phenomena 3.3.1 Verbal complements. </SectionTitle> <Paragraph position="0"> The treebank distinguishes three kinds of verbal complements: infinitival phrases governed by a raising verb or by a control verb, and finite clausal arguments.</Paragraph> <Paragraph position="1"> Infinitival complements. Raising verbs do not assign an ARG1, and the infinitival argument is bound via an additional proposition which fills the ARG2 position of the governor.</Paragraph> <Paragraph position="2"> A handle constraint requires the proposition to take scope over the label of the infinitive. Modal verbs lend themselves most naturally to the same analysis, by virtue of identical annotation in the dependency triples.</Paragraph> <Paragraph position="3"> [Footnote 7, beginning truncated] ... grammar of Crysmann (2003), and will also be adapted in the ERG (p.c. D. Flickinger).</Paragraph> <Paragraph position="4"> The implementation of RMRS for equi constructions relies on external lexicon resources, since the underlying dependency structures do not encode the coreference between the controlled subject and the external controller.
Instead, the controllee is annotated as a null pronoun. In order to differentiate subject from object control, we enrich the transfer input with a list of static facts s_control(Pred) and o_control(Pred), respectively, which we extracted from the German HPSG grammar (Crysmann, 2003). The rules refer to these facts and establish the appropriate bindings. If no information about coreference is available (due to sparse lexical data), the controlled subject appears in the RMRS as an unbound pronoun, as assumed in the syntactic structure. This is shown in Fig. 4. In the manual correction phase, these cases are corrected in the output RMRS by introducing the missing control relation.</Paragraph> <Paragraph position="5"> Finite complements. For finite clausal complements we assume the basic analysis illustrated in section 3.2. But finite clauses are not necessarily declarative; they can also have interrogative meaning. In RMRS, this distinction is typically drawn in a type hierarchy, of which we assume a simplified version: message_m_rel has the subtypes prop_ques_m_rel and imp_m_rel, and prop_ques_m_rel in turn has the subtypes prpstn_m_rel and int_m_rel. [Figure caption, truncated] ... their affection and love (from corpus sentence # 8345).</Paragraph> <Paragraph position="6"> German embedded clauses are usually marked by one of the complementizers dass (that) or ob (whether) in initial position, but may, less frequently, occur without one. If a complementizer is present, this is recorded as comp_form(_,dass) or comp_form(_,ob), respectively, and we can fully determine the kind of message relation from its lexical form, i.e., prpstn_m_rel for declarative and int_m_rel for interrogative clauses. In the absence of an overt complementizer, we could introduce the underspecified type prop_ques_m_rel, but instead chose to use a default rule for the declarative reading prpstn_m_rel, which occurs far more often.
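The choice of message relation can be sketched as a simple lookup with a default. The following Python fragment is only an illustration, not the actual rewrite rule: the function and table names are hypothetical, the hierarchy is encoded on the assumption that message_m_rel is the root with prop_ques_m_rel and imp_m_rel below it and prpstn_m_rel and int_m_rel below prop_ques_m_rel, and any complementizer other than dass or ob is assumed to fall back to the default as well.

```python
# Assumed shape of the simplified hierarchy: child -> parent (None marks the root).
SUPERTYPE = {
    "message_m_rel": None,
    "prop_ques_m_rel": "message_m_rel",
    "imp_m_rel": "message_m_rel",
    "prpstn_m_rel": "prop_ques_m_rel",
    "int_m_rel": "prop_ques_m_rel",
}

# Message relation determined by the lexical form of the complementizer.
BY_COMPLEMENTIZER = {"dass": "prpstn_m_rel", "ob": "int_m_rel"}

# Default when no overt complementizer is recorded: the declarative reading.
DEFAULT_MESSAGE = "prpstn_m_rel"


def is_subtype(candidate, ancestor):
    """Return True if candidate equals ancestor or lies below it in the hierarchy."""
    while candidate is not None:
        if candidate == ancestor:
            return True
        candidate = SUPERTYPE[candidate]
    return False


def message_relation(comp_form=None):
    """Pick the message relation for a finite clausal complement."""
    return BY_COMPLEMENTIZER.get(comp_form, DEFAULT_MESSAGE)


for form in ("dass", "ob", None):
    print(form, message_relation(form))
```

Both possible results are subtypes of prop_ques_m_rel, so the default only commits to one of the readings that the underspecified type would have left open.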
Defaulting to the declarative reading reduces the manual correction effort.</Paragraph> </Section> <Section position="5" start_page="6" end_page="7" type="sub_section"> <SectionTitle> 3.3.2 Coordination </SectionTitle> <Paragraph position="0"> The HPSG analysis of coordinate structures takes the form of a binary, right-branching structure. Since semantics construction in HPSG proceeds along this tree, an RMRS for a coordinate phrase likewise mirrors the recursive organisation of conjuncts in the syntax. Each partial coordination introduces an implicit conj_rel, while the meaning contributed by the lexical conjunction is conveyed in the EP which spans the entire coordination.</Paragraph> <Paragraph position="1"> By contrast, the dependency structures preserve the flat LFG analysis of coordination as a set of conjuncts. To overcome this discrepancy between source and target structures, we define specialised rules that mimic recursion in that they process the conjuncts from right to left, two at a time, thereby building the desired binary-structure semantics for the coordination. Fig. 5 shows a sample output RMRS for coordinated NPs.8 Note that we posit the L/R_hndl handle arguments to outscope each label that takes scope over the noun. This accounts for scope ambiguities among quantifiers and scopal adjectives.</Paragraph> <Paragraph position="2"> The algebra of Copestake et al. (2001) defines modifiers to externalise the variable of the ARG1. This, however, runs into problems when a construction needs to incorporate the inherent event variable (ARG0) of a modifier as an argument, as, for example, in recursive modification.
In these cases, the ARG0 variable is not accessible as a hook for composition.</Paragraph> <Paragraph position="3"> In contrast, we identify the hook variable of modifiers with their ARG0 variable.</Paragraph> <Paragraph position="4"> This enables a uniform account of recursive intersective modification, since the inherent variable is legitimately accessible via the hook, whereas the ARG1, like any other argument, is bound in a slot-filling operation.9 The corresponding rule and an example output RMRS are displayed in Fig. 6: whenever the dependency relation mo is encountered, no matter what the exact pred value, the semantics contributed by the head of the dependency can be unambiguously identified as the argument of the semantic head. In fact, given that modifiers are in this way locally annotated as mo dependents in the triples, we can bind the ARG1 already when defining the lexical RMRS of the modifier.</Paragraph> <Paragraph position="5"> [Footnote 8] The semantic contribution of the possessive pronouns has been neglected for ease of exposition. [Footnote 9] Similarly, this treatment of modification correctly accounts for modification in coordination structures, as in the NP ihrer munteren und farbenfreudigen Inszenierung - of her lively and colourful production (from corpus sentence # 9821).</Paragraph> <Paragraph position="6"> [Figure caption, truncated] ... modifiers (top), resulting global RMRS for the recursive modification in liege [. . .] sehr hoch - [. . .] is at a very high level (from corpus sentence # 8893).</Paragraph> </Section> </Section> <Section position="6" start_page="7" end_page="8" type="metho"> <SectionTitle> 4 The TIGER 700 RMRS Bank </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="7" end_page="8" type="sub_section"> <SectionTitle> 4.1 Design and methodology </SectionTitle> <Paragraph position="0"> Treebank Design. Our aim is to make available manually validated RMRS structures for 700 sentences of the TIGER-DB.
Since the underlying data is contiguous newspaper text, we chose to select a block of consecutive sentences instead of a random sample. In this way, the treebank can be further extended by annotation of intersentential phenomena, such as co-reference or discourse relations.</Paragraph> <Paragraph position="1"> However, we have to accommodate gaps, due to sentences for which there are reasonable functional-syntactic, but (currently) no sound semantic analyses. This problem arises for sentences involving, e.g., elliptical constructions, or else ungrammatical or fragmented sentences. We will include, but explicitly mark, such sentences for which we can only obtain partial, but no fully sound semantic analyses. We will correspondingly extend the annotation set to yield a total of 700 correctly annotated sentences.</Paragraph> <Paragraph position="2"> The composition rules are designed to record their application by way of rule-specific identifiers. These may serve as a means of filtering in case the analysis of certain phenomena as assumed in the treebank is incompatible with the grammar to be evaluated.</Paragraph> <Paragraph position="3"> Quality Control. For compilation of a manually controlled RMRS bank, we implemented a cascaded approach to quality control, with a feedback loop between (i) and (ii):</Paragraph> <Paragraph position="4"> (i) Manual sample-based error detection. We use the application markers of specific construction rules to select sample RMRSs for phenomenon-based inspection, as well as random sampling, in order to detect problems that can be corrected by adjustments of the automatic conversion procedure. (ii) Adjustment of conversion rules. The construction rules are modified to correct errors detected in the automatic conversion process. Errors that cannot be covered by general rules need to be manually corrected in (iii).</Paragraph> <Paragraph position="5"> (iii) Manual control.
Finally, we perform manual control and correction of errors that cannot be covered by automatic RMRS construction. Here, we mark and separate the phenomena that are not covered by the state-of-the-art in RMRS-based semantic theory.</Paragraph> <Paragraph position="6"> Viewing and editing support. The inspection of RMRSs is supported by converting the underlying XML format to HTML.</Paragraph> <Paragraph position="7"> RMRSs can thus be comfortably viewed in a browser, with highlighting of coreferences, display of agreement features, and links of EPs to the surface forms they originated from.</Paragraph> <Paragraph position="8"> Correction is supported by an XSLT-based interactive editing tool. It enables the user to specify which EPs, arguments or constraints are to be added/removed. With each change, the HTML representation is updated, so that the result is immediately visible for verification. The tool features a simple mechanism for version maintenance and retrieval, and separate storage for fully validated structures.</Paragraph> </Section> <Section position="2" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 4.2 First Results </SectionTitle> <Paragraph position="0"> The transfer grammar comprises 74 rewrite rules for converting dependency structures to RMRS, plus 34 macros and templates.</Paragraph> <Paragraph position="1"> In a first validation experiment on the basis of 100 structures, we classified 20% of the RMRSs as involving errors that can be captured by adjustments of the automatic conversion rules (see step (ii) above), while 59% were fully correct.10 After improvement of the rules we evaluated the quality of the automatic construction procedure by validating the 700 sentences of the treebank. Average counts for this sample are 15.57 tokens/sentence, 15.92 dependencies/sentence. Table 1 (top) summarises the results. Of the 700 structures, 4% contained phenomena which we do not analyse at all. 40% required no correction at all. 
For the 59% that needed manual correction, the average count of units to be corrected per sentence was 3.75. The number of RMRSs that needed fewer corrections than the average was 601, i.e. 85.86%. The time needed for inspection and correction was 5 mins 12 secs/sentence, calculated on the entire data set. [Footnote 10] This evaluation did not perform correction of part-of-speech tags (cf. below, error analysis).</Paragraph> <Paragraph position="2"> Error analysis. A large portion of the errors did not concern the RMRS as such, but simply the part-of-speech tags, encoded in the relation names. If part-of-speech errors are ignored, the number of correct RMRSs increases from 41% to 68%. The results of validation without part-of-speech correction, calculated on a third sample of 100 sentences, are given in Table 1 (bottom).</Paragraph> <Paragraph position="3"> Significant structural errors arise primarily in the context of modification. This is due to the TIGER annotation scheme. For example, certain adjunct clauses are embedded in main clauses as mo dependents, yet the embedding conjunction is, again, annotated as a modifier of the embedded clause. This leads to erroneous analyses. Refinement of the rules could considerably improve accuracy, but distinguishing these cases from ordinary modification is not always possible, due to missing category information.</Paragraph> <Paragraph position="4"> While modifiers turned out to be challenging in the mapping from dependencies to semantics, we did not observe many errors in the treatment of arguments: the rules that map dependents to semantic arg predicates yield a very precise argument structure.</Paragraph> </Section> </Section> </Paper>