<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0504"> <Title>Ontology Population from Textual Mentions: Task Definition and Benchmark</Title> <Section position="3" start_page="0" end_page="26" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Mentions are portions of text which refer to entities. As an example, given a particular textual context, both the mentions &quot;George W. Bush&quot; and &quot;the U.S. President&quot; refer to the same entity, i.e. a particular instance of Person whose first name is &quot;George&quot;, whose middle initial is &quot;W.&quot;, whose family name is &quot;Bush&quot; and whose role is &quot;U.S. President&quot;.</Paragraph> <Paragraph position="1"> In this paper we propose and investigate Ontology Population from Textual Mentions (OPTM), a sub-task of Ontology Learning and Population (OLP) from text where we assume that mentions for several kinds of entities (e.g. PERSON, ORGANIZATION, LOCATION, GEO-POLITICAL_ENTITY) are already extracted from a document collection.</Paragraph> <Paragraph position="2"> The terms &quot;mention&quot; and &quot;entity&quot; were introduced within the ACE Program (Linguistic Data Consortium, 2004). &quot;Mentions&quot; are equivalent to &quot;referring expressions&quot; and &quot;entities&quot; are equivalent to &quot;referents&quot;, as widely used in computational linguistics. In this paper, we use italics for &quot;mentions&quot; and small caps for ENTITY and ENTITY_ATTRIBUTE.</Paragraph> <Paragraph position="3"> We assume an ontology with a set of classes C and a set of mentions in which each mention m_j is classified into a class c_i in C; the OPTM task is then defined in three steps: Recognition and Classification of Entity Attributes, Normalization, and Resolution of inter-text Entity Co-reference. (i) Recognition and Classification of Entity Attributes (RCEA). 
The textual material expressed in a mention is extracted and distributed along the attribute-value pairs already defined for the class c_i of the mention; as an example, given the PERSON mention &quot;U.S. President Bush&quot;, we expect that the attribute LAST_NAME is filled with the value &quot;Bush&quot; and the attribute ROLE is filled with the value &quot;U.S. President&quot;. Note that fillers, at this step, are still portions of text.</Paragraph> <Paragraph position="4"> (ii) Normalization. The textual material extracted at step (i) is assigned to concepts and relations already defined in the ontology; for example, the entity BUSH is created as an instance of COUNTRY_PRESIDENT, and an instance of the relation PRESIDENT_OF is created between BUSH and U.S.A. At this step different instances are created for co-referring mentions.</Paragraph> <Paragraph position="5"> (iii) Resolution of inter-text Entity Co-reference (REC). Each mention m_j has to be assigned to a single individual entity belonging to a class in C. For example, we recognize that the instances created at step (i) for &quot;U.S. President Bush&quot; and &quot;George W. Bush&quot; actually refer to the same entity. In this paper we address steps (i) and (iii), while step (ii) is work in progress. The input of the OPTM task consists of classified mentions and the output consists of individual entities filled with textual material (i.e. there is no normalization) with their co-reference relations. The focus is on the definition of the task and on an empirical analysis of the aspects that determine its complexity, rather than on approaches and methods for the automatic solution of OPTM.</Paragraph> <Paragraph position="6"> There are several advantages of OPTM which make it appealing for OLP. First, mentions provide an obvious simplification with respect to the more general Ontology Population from text (cf. Buitelaar et al. 
2005); in particular, mentions are well defined and there are systems for automatic mention recognition. Although there is no univocally accepted definition for the OP task, a useful approximation has been suggested by Bontcheva and Cunningham (2005) as Ontology Driven Information Extraction, with the goal of extracting and classifying instances of concepts and relations defined in an Ontology, in place of filling a template. A similar task has been approached from a variety of perspectives, including term clustering (Lin, 1998; Almuhareb and Poesio, 2004) and term categorization (Avancini et al. 2003). A rather different task is Ontology Learning, where new concepts and relations are supposed to be acquired, with the consequence of changing the definition of the Ontology itself (Velardi et al. 2005). However, since mentions have been introduced as an evolution of the traditional Named Entity Recognition task (see Tanev and Magnini, 2006), they guarantee a reasonable level of difficulty, which makes OPTM challenging for both the Computational Linguistics and the Knowledge Representation communities. Second, there already exist annotated data with mentions, delivered under the</Paragraph> </Section> </Paper>