<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2026">
  <Title>Accurate Parsing of the Proposition Bank</Title>
  <Section position="3" start_page="101" end_page="102" type="metho">
    <SectionTitle>
2 The Data and the Extended Parser
</SectionTitle>
    <Paragraph position="0"> In this section we describe the augmentations to our base parsing models necessary to tackle the joint learning of parse tree and semantic role labels.</Paragraph>
    <Paragraph position="1"> PropBank encodes propositional information by adding a layer of argument-structure annotation to the syntactic structures of the Penn Treebank (Marcus et al., 1993). Verbal predicates in the Penn Treebank (PTB) receive the label REL. Complements of the predicative verb that are considered arguments are annotated with the abstract semantic role labels A0-A5 or AA, while complements labelled with a semantic functional label in the original PTB receive the composite semantic role label AM-X, where X stands for labels such as LOC, TMP or ADV, for locative, temporal and adverbial modifiers respectively. PropBank uses two levels of granularity in its annotation, at least conceptually. Arguments receiving the labels A0-A5 or AA do not express consistent semantic roles and are specific to a verb, while arguments receiving an AM-X label are supposed to be adjuncts, and the roles they express are consistent across all verbs.</Paragraph>
    <Paragraph position="2"> To achieve the complex task of assigning semantic role labels while parsing, we use a family of state-of-the-art history-based statistical parsers, the Simple Synchrony Network (SSN) parsers (Henderson, 2003), which use a form of left-corner parse strategy to map parse trees to sequences of derivation steps. These parsers do not impose any a priori independence assumptions, but instead smooth their parameters by means of the novel SSN neural network architecture. This architecture is capable of inducing a finite history representation of an unbounded sequence of derivation steps, which we denote h(d1,...,di-1). The representation h(d1,...,di-1) is computed from a set f of hand-crafted features of the derivation move di-1, and from a finite set D of recent history representations h(d1,...,dj), where j &lt; i-1. Because the history representation computed for the move i-1 is included in the inputs to the computation of the representation for the next move i, virtually any information about the derivation history can flow from history representation to history representation and be used to estimate the probability of a derivation move. In our experiments, the set D of earlier history representations is modified to yield a model that is sensitive to regularities in structurally defined sequences of nodes bearing semantic role labels, within and across constituents. For more information on this technique for capturing structural domains, see (Musillo and Merlo, 2005), where it was applied to function parsing. Given the hidden history representation h(d1,...,di-1) of a derivation, the SSN computes a normalised exponential output function to estimate a probability distribution over the possible next derivation moves di.</Paragraph>
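The recurrence above can be sketched in a few lines. This is a minimal toy illustration, not the authors' trained model: the dimensions, the randomly initialised weight matrices, and the simple sum over the set D are all our assumptions, standing in for the learned SSN parameters and its structurally selected history set.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, HIDDEN, N_MOVES = 10, 8, 5   # toy sizes, not the paper's

# Random weights stand in for trained SSN parameters.
W_f = rng.normal(scale=0.1, size=(HIDDEN, N_FEATURES))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_o = rng.normal(scale=0.1, size=(N_MOVES, HIDDEN))

def history_rep(f_prev, earlier_reps):
    """h(d1,...,di-1): combines the hand-crafted features f of the
    previous move with a finite set D of earlier history representations."""
    h = W_f @ f_prev
    for h_j in earlier_reps:             # the structurally selected set D
        h = h + W_h @ h_j
    return np.tanh(h)

def next_move_distribution(h):
    """Normalised exponential (softmax) over the possible next moves di."""
    logits = W_o @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

f_prev = rng.normal(size=N_FEATURES)     # features of the move d_{i-1}
D = [np.zeros(HIDDEN)]                   # e.g. the representation at the start
h = history_rep(f_prev, D)
p = next_move_distribution(h)
```

Because each new representation folds in earlier ones, information can in principle propagate from any point of the derivation history to the current move's probability estimate.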
    <Paragraph position="3"> To exploit the intuition that semantic role labels are predictive of syntactic structure, we must provide semantic role information as early as possible to the parser. Extending a technique presented in (Klein and Manning, 2003) and adopted in (Merlo and Musillo, 2005) for function labels with state-of-the-art results, we split some part-of-speech tags into tags marked with AM-X semantic role labels.</Paragraph>
    <Paragraph position="4"> As a result, 240 new POS tags were introduced to partition the original tag set, which consisted of 45 tags. Our augmented model has a total of 613 non-terminals to represent both PTB and PropBank labels, instead of the 33 of the original SSN parser.</Paragraph>
    <Paragraph position="5"> The 580 newly introduced labels consist of a standard PTB label followed by one or more PropBank semantic roles, such as PP-AM-TMP or NP-A0-A1.</Paragraph>
    <Paragraph position="6"> These augmented tags and the new non-terminals are included in the set f, and will influence bottom-up projection of structure directly.</Paragraph>
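As a rough illustration of the label scheme, augmented tags such as PP-AM-TMP or NP-A0-A1 can be built and decomposed as follows. The helper names and the simple split heuristic are ours, for exposition only; they are not the parser's internal representation and do not handle every PTB label shape.

```python
def augment(label, roles):
    """Join a PTB label with PropBank roles: ('NP', ['A0','A1']) -> 'NP-A0-A1'."""
    return "-".join([label] + roles)

def split_augmented(tag):
    """Recover (PTB label, roles) from an augmented tag.
    Roles are A0-A5/AA, or 'AM-' followed by a function tag (TMP, LOC, ...)."""
    parts = tag.split("-")
    label, roles, i = parts[0], [], 1
    while i < len(parts):
        if parts[i] == "AM" and i + 1 < len(parts):
            roles.append("AM-" + parts[i + 1])   # composite adjunct label AM-X
            i += 2
        elif parts[i].startswith("A"):
            roles.append(parts[i])               # core argument label A0-A5/AA
            i += 1
        else:
            label += "-" + parts[i]              # part of the PTB label itself
            i += 1
    return label, roles
```

For example, `split_augmented("NP-A0-A1")` recovers the PTB category `NP` together with the two core roles, mirroring how the 580 new non-terminals pair a standard PTB label with one or more PropBank roles.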
    <Paragraph position="7"> These newly introduced fine-grained labels fragment our PropBank data. To alleviate this problem, we enlarge the set f with two additional binary features. One feature indicates whether a given preterminal or nonterminal label is a semantic role label belonging to the set comprising the labels A0-A5 and AA. The other indicates whether a given label is a semantic role label of type AM-X. These features allow the SSN to generalise in several ways. All constituents bearing an A0-A5 or AA label will share a common feature, and the same will be true of all nodes bearing an AM-X label, so the SSN can generalise across these two types of labels. Finally, all constituents that do not bear any semantic role label now constitute a third class, the class of nodes for which both features are false.</Paragraph>
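The two binary features can be sketched as a simple predicate over a label's hyphen-separated parts. Again, the function name and the string-based test are our illustrative assumptions, not the parser's actual feature extraction.

```python
CORE_ROLES = {"A0", "A1", "A2", "A3", "A4", "A5", "AA"}

def role_features(label):
    """The two binary features added to the input set f:
    (bears a core A0-A5/AA role, bears an AM-X adjunct role)."""
    parts = label.split("-")
    has_core = any(p in CORE_ROLES for p in parts)   # first feature
    has_am = "AM" in parts                           # second feature
    return has_core, has_am
```

Under this sketch, `NP-A0-A1` maps to (True, False), `PP-AM-TMP` to (False, True), and an unannotated `VP` to (False, False), giving the three classes the network can generalise over.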
  </Section>
</Paper>