<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1003"> <Title>Robust Reading: Identification and Tracing of Ambiguous Names</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Robust Reading </SectionTitle> <Paragraph position="0"> We consider reading a collection of documents $D = \{d_1, d_2, \ldots, d_m\}$, each of which may contain mentions (i.e., real occurrences) of $|T|$ types of entities. In the current evaluation we consider $T = \{\text{Person}, \text{Location}, \text{Organization}\}$.</Paragraph> <Paragraph position="1"> An entity refers to the &quot;real&quot; concept behind a mention and can be viewed as a unique identifier of a real-world object. Examples include the person &quot;John F. Kennedy&quot;, who became a president, and the &quot;White House&quot;, the residence of the US presidents. $E$ denotes the collection of all possible entities in the world and $E^d = \{e_i^d\}_{1}^{l_d}$ is the set of entities mentioned in document $d$. $M$ denotes the collection of all possible mentions and $M^d = \{m_i^d\}_{1}^{n_d}$ is the set of mentions in document $d$. $M_i^d$ ($1 \leq i \leq l_d$) is the set of mentions that refer to entity $e_i^d \in E^d$. For the entity &quot;John F. Kennedy&quot;, the corresponding set of mentions in a document may contain &quot;Kennedy&quot;, &quot;J. F. Kennedy&quot; and &quot;President Kennedy&quot;. Among all mentions of an entity $e_i^d$ in document $d$ we distinguish the one occurring first, $r_i^d \in M_i^d$, as the representative of $e_i^d$. In practice, $r_i^d$ is usually also the longest mention of $e_i^d$ in the document, and the other mentions are variations of it. A representative is viewed as a typical representation of an entity as mentioned at a specific time and place. For example, &quot;President J.F.Kennedy&quot; and &quot;Congressman John Kennedy&quot; may be representatives of &quot;John F. Kennedy&quot; in different documents. $R$ denotes the collection of all possible representatives and $R^d = \{r_i^d\}_{1}^{l_d} \subseteq M^d$ is the set of representatives in document $d$.
This way, each document is represented as the collection of its entities, representatives and mentions, $d = \{E^d, R^d, M^d\}$.</Paragraph> <Paragraph position="2"> Elements in the name space $W = E \cup R \cup M$ each have an identifying writing (denoted $\mathrm{wrt}(n)$ for $n \in W$)$^{1}$ and an ordered list of attributes, $A = \{a_1, \ldots, a_p\}$, which depends on the entity type. Attributes used in the current evaluation include both internal attributes, such as, for People, {title, firstname, middlename, lastname, gender}, and contextual attributes, such as {time, location, proper-names}. Proper-names refers to a list of proper names that occur around the mention in the document. All attributes are string-valued, and their values may be missing or unknown$^{2}$.</Paragraph> <Paragraph position="3"> The fundamental problem we address in robust reading is to decide which entities are mentioned in a given document (given the observed set $M^d$) and what the most likely assignment of an entity to each mention is.</Paragraph> </Section> </Paper>