Multi-Lingual Coreference Resolution With Syntactic Features

1 Introduction

A coreference resolution system aims to group together mentions that refer to the same entity, where a mention is an instance of reference to an object and the collection of mentions referring to the same object in a document forms an entity. Consider the following example:

(I) John believes himself to be the best student.

The three mentions, John, himself, and the best student, are of type name, pronoun, and nominal, respectively (throughout this paper, pronoun refers to both anaphors and ordinary pronouns). They form an entity since they all refer to the same person.

Syntactic information plays an important role in coreference resolution. For example, binding theory (Haegeman, 1994; Beatrice and Kroch, 2000) provides a good account of the constraints on the antecedent of English pronouns. The theory relies on syntactic parse trees to determine the governing category, which defines the scope of binding constraints. We will use the theory as a guideline to help us design features in a machine learning framework.

Previous pronoun resolution work (Hobbs, 1976; Lappin and Leass, 1994; Ge et al., 1998; Stuckardt, 2001) has explicitly exploited syntactic information, but this study faces its own set of challenges: (1) syntactic information is extracted from automatically generated parse trees, which is feasible because statistical parsers can be trained on human-annotated treebanks (Marcus et al., 1993; Xia et al., 2000; Maamouri and Bies, 2004) for multiple languages; (2) binding theory is used as a guideline, and syntactic structures are encoded as features in a maximum entropy coreference system; (3) the syntactic features are evaluated on three languages, Arabic, Chinese, and English (one goal is to see whether features motivated by English can help coreference resolution in other languages), and all contrastive experiments are run on publicly available data; (4) our coreference system resolves coreferential relationships among all annotated mentions, not just pronouns.
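The paper does not give an implementation, but the mention-pair, maximum-entropy setup referred to in challenge (2) can be sketched briefly. Everything below is illustrative: the feature names, the toy training pairs, and the use of scikit-learn's LogisticRegression as the maximum entropy classifier are assumptions made for this example, not the authors' actual system.

```python
# Minimal sketch of a mention-pair model in a maximum-entropy framework.
# All feature names and training pairs here are toy examples, not the paper's.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each candidate (antecedent, mention) pair is described by a few features;
# the label indicates whether the two mentions are coreferent.
train_pairs = [
    ({"type_pair": "name-pronoun", "same_sentence": True,  "gender_agree": True},  1),
    ({"type_pair": "name-nominal", "same_sentence": True,  "number_agree": True},  1),
    ({"type_pair": "name-pronoun", "same_sentence": False, "gender_agree": False}, 0),
]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform([features for features, _ in train_pairs])
y = [label for _, label in train_pairs]

# Logistic regression is the binary maximum-entropy model: every feature gets
# a weight estimated from data rather than assigned by hand.
model = LogisticRegression().fit(X, y)

# Score a new candidate pair, e.g. (John, himself) from example (I).
candidate = {"type_pair": "name-pronoun", "same_sentence": True, "gender_agree": True}
print(model.predict_proba(vectorizer.transform([candidate]))[0, 1])
```

In a full system, a classifier of this kind would score candidate antecedent-mention pairs, and those scores would drive the grouping of mentions into entities.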
Using machine-generated parse trees eliminates the need for hand-labeled trees in a coreference system. However, extracting useful information from these noisy parse trees is a major challenge. Our approach is to encode the structures contained in a parse tree as a set of computable features, each associated with a weight that is determined automatically by a machine learning algorithm. This contrasts with the approach of extracting rules and assigning weights to those rules by hand (Lappin and Leass, 1994; Stuckardt, 2001). The advantage of our approach is robustness: if a particular structure is helpful, it will be assigned a high weight; if a feature is extracted from a highly noisy parse tree and is not informative for coreference resolution, it will be assigned a small weight. By avoiding hand-written rules, we automatically incorporate useful information into our model while limiting the potentially negative impact of noisy parsing output.
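As a rough illustration of how parse-tree structure can be turned into a computable feature, the sketch below checks whether one mention c-commands another in a bracketed parse of example (I). The use of NLTK's Tree class, the hand-written bracketing (which would normally come from an automatic parser), and the head-word-to-NP heuristic are all assumptions made for the illustration; they are not the paper's actual feature definitions.

```python
# Sketch: encode a binding-theory-style structural relation (c-command) as a
# binary feature over a parsed sentence. Illustrative only.
from nltk import Tree

def mention_np(tree, word):
    """Very rough heuristic: the NP two levels above the mention's head word."""
    leaf = tree.leaves().index(word)
    return tree.leaf_treeposition(leaf)[:-2]

def c_commands(pos_a, pos_b):
    """Approximate c-command: a's parent dominates b, and a does not dominate b."""
    def dominates(x, y):
        return x != y and y[:len(x)] == x
    return pos_a != pos_b and dominates(pos_a[:-1], pos_b) and not dominates(pos_a, pos_b)

# Hand-written parse of example (I); in the paper's setting this tree would be
# produced by a statistical parser trained on a treebank.
parse = Tree.fromstring(
    "(S (NP (NNP John)) (VP (VBZ believes)"
    " (S (NP (PRP himself)) (VP (TO to) (VP (VB be)"
    " (NP (DT the) (JJS best) (NN student)))))))"
)

john = mention_np(parse, "John")        # position of the NP over "John"
himself = mention_np(parse, "himself")  # position of the NP over "himself"

# Binary features that a learner could weight automatically:
print(c_commands(john, himself))   # True: "John" c-commands "himself"
print(c_commands(himself, john))   # False: the relation is not symmetric
```

Features of this kind are exactly what a maximum entropy model can weight automatically: a structural test that turns out to be reliable receives a high weight, while one that is mostly noise receives a small one.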