<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2015"> <Title>Semantic Role Labeling for Coreference Resolution</Title>
<Section position="3" start_page="0" end_page="143" type="metho"> <SectionTitle> 2 Coreference Resolution Using SRL </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Corpora Used </SectionTitle>
<Paragraph position="0"> The system was initially prototyped using the MUC-6 and MUC-7 data sets (Chinchor & Sundheim, 2003; Chinchor, 2001), with the standard partitioning of 30 texts for training and 20-30 texts for testing. We then developed and tested the system with the ACE 2003 Training Data corpus (Mitchell et al., 2003)1. Both the Newswire (NWIRE) and Broadcast News (BNEWS) sections were split into 60-20-20% document-based partitions for training, development, and testing, and later merged per partition (MERGED) for system evaluation. The distribution of coreference chains and referring expressions is given in Table 1.</Paragraph> </Section>
<Section position="2" start_page="0" end_page="143" type="sub_section"> <SectionTitle> 2.2 Learning Algorithm </SectionTitle>
<Paragraph position="0"> For learning coreference decisions, we used a Maximum Entropy (Berger et al., 1996) model.</Paragraph>
<Paragraph position="1"> Coreference resolution is viewed as a binary classification task: given a pair of REs, the classifier has to decide whether they are coreferent or not.</Paragraph>
<Paragraph position="2"> First, a set of pre-processing components including a chunker and a named entity recognizer is applied to the text in order to identify the noun phrases, which are then taken as the REs used for instance generation. Instances are created following Soon et al. (2001). During testing, the classifier imposes a partitioning on the available REs by clustering each set of expressions labeled as coreferent into the same coreference chain.</Paragraph>
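<Paragraph position="3"> A minimal Python sketch of this training-instance generation and test-time clustering, assuming REs are simple dicts with "text" and "sentence" fields, gold_chain maps each RE index to its gold chain id (or None), and extract_features and the pairwise classifier are hypothetical stand-ins; it only illustrates the Soon et al. (2001)-style procedure described above.

def extract_features(re_i, re_j):
    # Hypothetical stand-in for the 12 baseline features of Section 2.3;
    # only string match and sentence distance are sketched here.
    return {
        "string_match": re_i["text"].lower() == re_j["text"].lower(),
        "distance": re_j["sentence"] - re_i["sentence"],
    }

def make_training_instances(res, gold_chain):
    # Soon et al. (2001)-style pairs: for each anaphoric RE j, one positive
    # instance with its closest coreferent antecedent, and negative
    # instances for every RE in between.
    instances = []
    for j in range(1, len(res)):
        if gold_chain[j] is None:
            continue
        antecedent = None
        for i in range(j - 1, -1, -1):
            if gold_chain[i] == gold_chain[j]:
                antecedent = i
                break
        if antecedent is None:
            continue
        instances.append((extract_features(res[antecedent], res[j]), True))
        for k in range(antecedent + 1, j):
            instances.append((extract_features(res[k], res[j]), False))
    return instances

def resolve(res, classifier):
    # At test time, link each RE to the closest preceding RE classified as
    # coreferent (searching candidates right-to-left); linked REs end up
    # sharing a chain id, which induces the partition into chains.
    chain_id = {}
    next_id = 0
    for j in range(len(res)):
        antecedent = None
        for i in range(j - 1, -1, -1):
            if classifier.predict(extract_features(res[i], res[j])):
                antecedent = i
                break
        if antecedent is None:
            chain_id[j] = next_id
            next_id += 1
        else:
            chain_id[j] = chain_id[antecedent]
    return chain_id
</Paragraph>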
</Section> </Section>
<Section position="4" start_page="143" end_page="144" type="metho"> <SectionTitle> BNEWS NWIRE </SectionTitle>
<Paragraph position="0"> [Table 1: number of coreference chains, pronouns, common nouns, and proper names in the BNEWS and NWIRE sections.] </Paragraph>
<Section position="1" start_page="143" end_page="143" type="sub_section"> <SectionTitle> 2.3 Baseline System Features </SectionTitle>
<Paragraph position="0"> Following Ng & Cardie (2002), our baseline system reimplements the Soon et al. (2001) system.</Paragraph>
<Paragraph position="1"> The system uses 12 features. Given a pair of candidate referring expressions REi and REj, the features are computed as follows2.</Paragraph>
<Paragraph position="2"> (a) Lexical features STRING MATCH T if REi and REj have the same spelling; else F.</Paragraph>
<Paragraph position="3"> ALIAS T if one RE is an alias of the other; else F.</Paragraph>
<Paragraph position="4"> (b) Grammatical features I PRONOUN T if REi is a pronoun; else F. J PRONOUN T if REj is a pronoun; else F. J DEF T if REj starts with the; else F. J DEM T if REj starts with this, that, these, or those; else F.</Paragraph>
<Paragraph position="5"> NUMBER T if REi and REj agree in number; else F.</Paragraph>
<Paragraph position="6"> GENDER U if REi or REj has an undefined gender. Else if they are both defined and agree T; else F.</Paragraph>
<Paragraph position="7"> PROPER NAME T if both REi and REj are proper names; else F.</Paragraph>
<Paragraph position="8"> APPOSITIVE T if REj is in apposition with REi; else F.</Paragraph>
<Paragraph position="9"> (c) Semantic features WN CLASS U if REi or REj has an undefined WordNet semantic class. Else if they both have a defined one and it is the same T; else F. 2Possible values are U(nknown), T(rue) and F(alse). Note that, in contrast to Ng & Cardie (2002), we classify ALIAS as a lexical feature, as it relies solely on string comparison and acronym string matching.</Paragraph>
<Paragraph position="10"> (d) Distance features DISTANCE how many sentences REi and REj are apart.</Paragraph> </Section>
<Section position="2" start_page="143" end_page="144" type="sub_section"> <SectionTitle> 2.4 Semantic Role Features </SectionTitle>
<Paragraph position="0"> The baseline system employs only a limited amount of semantic knowledge. In particular, semantic information is limited to WordNet semantic class matching. Unfortunately, a simple WordNet semantic class lookup exhibits coverage and sense disambiguation problems3, which make the WN CLASS feature very noisy. As a consequence, we propose in the following to enrich the semantic knowledge made available to the classifier by using SRL information.</Paragraph>
<Paragraph position="1"> In our experiments we use the ASSERT parser (Pradhan et al., 2004), an SVM-based semantic role tagger which uses a full syntactic analysis to automatically identify all verb predicates in a sentence together with their semantic arguments, which are output as PropBank arguments (Palmer et al., 2005). It is often the case that the semantic arguments output by the parser do not align with any of the previously identified noun phrases. In this case, we pass a semantic role label to an RE only if the two phrases share the same head. Labels have the form &quot;ARG1 pred1 ... ARGn predn&quot; for n semantic roles filled by a constituent, where each semantic argument label ARGi is always defined with respect to a predicate lemma predi. Given this level of semantic information at the RE level, we introduce two new features4.</Paragraph>
<Paragraph position="2"> I SEMROLE the semantic role argument-predicate pairs of REi.</Paragraph>
<Paragraph position="3"> 3Following the system to be replicated, we simply mapped each RE to the first WordNet sense of the head noun. 4During prototyping we experimented with unpairing the arguments from the predicates, which yielded worse results. This is supported by the PropBank arguments always being defined with respect to a target predicate. Binarizing the features -- i.e., do REi and REj have the same argument or predicate label with respect to their closest predicate? -- also gave worse results.</Paragraph>
<Paragraph position="4"> J SEMROLE the semantic role argument-predicate pairs of REj.</Paragraph>
<Paragraph position="5"> For the ACE 2003 data, 11,406 of 32,502 automatically extracted noun phrases were tagged with 2,801 different argument-predicate pairs.</Paragraph> </Section> </Section> </Paper>
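<!-- A minimal Python sketch of how the I/J SEMROLE features of Section 2.4 might be assembled. It assumes labeler output and chunker noun phrases are given as token spans, uses a hypothetical head_of helper for head-word identification, and joins argument label and predicate lemma with an illustrative underscore; an RE receives an argument-predicate pair only when it shares its head with the argument constituent.

def head_of(span, tokens):
    # Hypothetical head finder: takes the last token of the span as its head.
    start, end = span
    return tokens[end - 1]

def semrole_label(re_span, predicate_args, tokens):
    # predicate_args: list of (arg_label, predicate_lemma, arg_span) triples
    # produced by the semantic role labeler for one sentence.
    pairs = []
    for arg_label, pred_lemma, arg_span in predicate_args:
        if head_of(arg_span, tokens) == head_of(re_span, tokens):
            pairs.append(arg_label + "_" + pred_lemma)
    # e.g. "ARG1_sell ARG0_buy"; an empty string means no role was assigned.
    return " ".join(pairs)

# The I SEMROLE / J SEMROLE values for a pair of REs are then simply
# semrole_label(re_i_span, args, tokens) and semrole_label(re_j_span, args, tokens).
-->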