<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1003">
  <Title>Robust Reading: Identification and Tracing of Ambiguous Names</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 A Model of Document Generation
</SectionTitle>
    <Paragraph position="0"> We define a probability distribution over documents d = fEd;Rd;Mdg, by describing how documents are being generated. In its most general form the model has the following three components:  how entities (of different types) are distributed into a document and reflects their co-occurrence dependencies. (2) The number of entities in a document, size(Ed), and the number of mentions of each entity in Ed, size(Mdi ), need to be decided. The current evaluation makes the simplifying assumption that these numbers are determined uniformly over a small plausible range.</Paragraph>
    <Paragraph position="1"> (3) The appearance probability of a name generated (transformed) from its representative is modelled as a  product distribution over relational transformations of attribute values. This model captures the similarity between appearances of two names. In the current evaluation the same appearance model is used to calculate both the probability P(rje) that generates a representative r given an entity e and the probability P(mjr) that generates a mention m given a representative r. Attribute transformations are relational, in the sense that the distribution is over transformation types and independent of the specific names.</Paragraph>
    <Paragraph position="2"> Given these, a document d is assumed to be generated as follows (see Fig. 1): A set of size(Ed) entities Ed E is selected to appear in a document d, according to P(Ed). For each entity edi 2 Ed, a representative rdi 2 R is chosen according to P(rdijedi), generating Rd. Then mentions Mdi of an entity are generated from each representative rdi 2 Rd -- each mention mdj 2 Mdi is independently transformed from rdi according to the appearance probability P(mdjjrdi ). Assuming conditional independency between Md and Ed given Rd, the probability distribution over documents is therefore</Paragraph>
    <Paragraph position="4"> and the probability of the document collection D is:</Paragraph>
    <Paragraph position="6"> Given a mention m in a document d (Md is the set of observed mentions in d), the key inference problem is to determine the most likely entity e/m that corresponds to it. This is done by computing:</Paragraph>
    <Paragraph position="8"> where is the learned model's parameters. This gives the assignment of the most likely entity e/m for m.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Relaxations of the Model
</SectionTitle>
      <Paragraph position="0"> In order to simplify model estimation and to evaluate some assumptions, several relaxations are made to form three simpler probabilistic models.</Paragraph>
      <Paragraph position="1"> Model I: (the simplest model) The key relaxation here is in losing the notion of an &amp;quot;author&amp;quot; - rather than first choosing a representative for each document, mentions are generated independently and directly given an entity.</Paragraph>
      <Paragraph position="2"> That is, an entity ei is selected from E according to the prior probability P(ei); then its actual mention mi is selected according to P(mijei). Also, an entity is selected into a document independently of other entities. In this way, the probability of the whole document set can be computed simply as follows:</Paragraph>
      <Paragraph position="4"> and the inference problem for the most likely entity given</Paragraph>
      <Paragraph position="6"> Model II: (more expressive) The major relaxation made here is in assuming a simple model of choosing entities to appear in documents. Thus, in order to generate a document d, after we decide size(Ed) and fsize(Md1;size(Md2);:::g according to uniform distributions, each entity edi is selected into d independently of others according to P(edi). Next, the representative rdi for each entity edi is selected according to P(rdijedi) and for each representative the actual mentions are selected independently according to P(mdjjrdj). Here, we have individual documents along with representatives, and the distribution over documents is:</Paragraph>
      <Paragraph position="8"> after we ignore the size components (they do not influence inferences). The inference problem here is the same as in Equ. (2).</Paragraph>
      <Paragraph position="9"> Model III: This model performs the least relaxation. After deciding size(Ed) according to a uniform distribution, instead of assuming independency among entities which does not hold in reality (For example, &amp;quot;Gore&amp;quot; and &amp;quot;George. W. Bush&amp;quot; occur together frequently, but &amp;quot;Gore&amp;quot; and &amp;quot;Steve. Bush&amp;quot; do not), we select entities using a graph based algorithm: entities in E are viewed as nodes in a weighted directed graph with edges (i;j) labelled P(ejjei) representing the probability that entity ej is chosen into a document that contains entity ei. We distribute entities to Ed via a random walk on this graph starting from ed1 with a prior probability P(edi). Representatives and mentions are generated in the same way as in Model II. Therefore, a more general model for the distribution over documents is:</Paragraph>
      <Paragraph position="11"> The inference problem is the same as in Equ. (2).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Inference Algorithms
</SectionTitle>
      <Paragraph position="0"> The fundamental problem in robust reading can be solved as inference with the models: given a mention m, seek the most likely entity e 2 E for m according to Equ. (3) for Model I or Equ. (2) for Model II and III. Instead of all entities in the real world, E can be viewed without loss as the set of entities in a closed document collection that we use to train the model parameters and it is known after training. The inference algorithm for Model I (with time complexity O(jEj)) is simple and direct: just compute P(e;m) for each candidate entity e 2 E and then choose the one with the highest value. Due to exponential number of possible assignments of Ed;Rd to Md in Model II and III, precise inference is infeasible and approximate algorithms are therefore designed: In Model II, we adopt a two-step algorithm: First, we seek the representatives Rd for the mentions Md in document d by sequentially clustering the mentions according to the appearance model. The first mention in each group is chosen as the representative. Specifically, when considering a mention m 2 Md, P(mjr) is computed for each representative r that have already been created and a fixed threshold is then used to decide whether to create a new group for m or to add it to one of the existing groups with the largest P(mjr). In the second step, each representative rdi 2 Rd is assigned to its most likely entity according to e/ = argmaxe2EP(e)/P(rje). This algorithm has a time complexity of O((jMdj+jEj)/jMdj).</Paragraph>
      <Paragraph position="1"> Model III has a similar algorithm as Model II. The only difference is that we need to consider the global dependency between entities. Thus in the second step, instead of seeking an entity e for each representative r separately, we determine a set of entities Ed for Rd in a Hidden Markov Model with entities in E as hidden states and Rd as observations. The prior probabilities, the transitive probabilities and the observation probabilities are given by P(e), P(ejjei) and P(rje) respectively. Here we seek the most likely sequence of entities given those representatives in their appearing order using the Viterbi algorithm. The total time complexity is  the correct assignment of entities to mentions. r1;r2 are representatives.</Paragraph>
      <Paragraph position="2"> O(jMdj2 + jEj2 / jMdj). The jEj2 component can be simplified by filtering out unlikely entities for a representative according to their appearance similarity.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Discussion
</SectionTitle>
      <Paragraph position="0"> Besides different assumptions, some fundamental differences exist in inference with the models as well. In Model I, the entity of a mention is determined completely independently of other mentions, while in Model II, it relies on other mentions in the same document for clustering.</Paragraph>
      <Paragraph position="1"> In Model III, it is not only related to other mentions but to a global dependency over entities. The following conceptual example illustrates those differences as in Fig. 2.</Paragraph>
      <Paragraph position="2"> Example 3.1 Given E = fGeorge Bush, George W. Bush, Steve Bushg, documents d1, d2 and 5 mentions in them, and suppose the prior probability of entity &amp;quot;George W. Bush&amp;quot; is higher than those of the other two entities, the entity assignments to the five mentions in the models could be as follows: For Model I, mentions(e1) = `, mentions(e2) = fm1;m2;m5g and mentions(e3) = fm4g. The result is caused by the fact that a mention tends to be assigned to the entity with higher prior probability when the appearance similarity is not distinctive.</Paragraph>
      <Paragraph position="3"> For Model II, mentions(e1) = `, mentions(e2) = fm1;m2g and mentions(e3) = fm4;m5g. Local dependency (appearance similarity) between mentions inside each document enforces the constraint that they should refer to the same entity, like &amp;quot;Steve Bush&amp;quot; and &amp;quot;Bush&amp;quot; in d2. For Model III, mentions(e1) = fm1;m2g, mentions(e2) = `, mentions(e3) = fm4;m5g. With the help of global dependency between entities, for example, &amp;quot;George Bush&amp;quot; and &amp;quot;J. Quayle&amp;quot;, an entity can be distinguished from another one with a similar writing.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Other Tasks
</SectionTitle>
      <Paragraph position="0"> Other aspects of &amp;quot;Robust Reading&amp;quot; can be solved based on the above inference problem.</Paragraph>
      <Paragraph position="1">  Here it's assumed that we already know the possible mentions of e/ after training the models with D. Prominence: Given a name n 2 W, the most prominent entity for n is given by (P(e) is given by the prior distribution PE and P(nje) is given by the appearance model.):</Paragraph>
      <Paragraph position="3"/>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Learning the Models
</SectionTitle>
    <Paragraph position="0"> Confined by the labor of annotating data, we learn the probabilistic models in an unsupervised way given a collection of documents; that is, the system is not told during training whether two mentions represent the same entity. A greedy search algorithm modified after the standard EM algorithm (We call it Truncated EM algorithm) is adopted here to avoid complex computation.</Paragraph>
    <Paragraph position="1"> Given a set of documents D to be studied and the observed mentions Md in each document, this algorithm iteratively updates the model parameter (several underlying probabilistic distributions described before) and the structure (that is, Ed and Rd) of each document d. Different from the standard EM algorithm, in the E-step, it seeks the most likely Ed and Rd for each document rather than the expected assignment.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Truncated EM Algorithm
</SectionTitle>
      <Paragraph position="0"> The basic framework of the Truncated EM algorithm to  learn Model II and III is as follows: 1. In the initial (I-) step, an initial (Ed0;Rd0) is assigned to each document d by an initialization algorithm. After this step, we can assume that the documents are annotated with D0 = f(Ed0;Rd0;Md)g.</Paragraph>
      <Paragraph position="1"> 2. In the M-step, we seek the model parameter t+1 that maximizes P(Dtj ). Given the &amp;quot;labels&amp;quot; supplied in the previous I- or E-step, this amounts to the maximum likelihood estimation. (to be described in Sec. 4.3).</Paragraph>
      <Paragraph position="2"> 3. In the E-step, we seek (Edt+1;Rdt+1) for each document d that maximizes P(Dt+1j t+1) where Dt+1 = f(Edt+1;Rdt+1;Md)g. It's the same inference problem as in Sec. 3.2.</Paragraph>
      <Paragraph position="3"> 4. Stopping Criterion: If no increase is achieved over  P(Dtj t), the algorithm exits. Otherwise the algorithm will iterate over the M-step and E-step. The algorithm for Model I is similar to the above one, but much simpler in the sense that it does not have the notions of documents and representatives. So in the E-step we only seek the most likely entity e for each mention m 2 D, and this simplifies the parameter estimation in the M-step accordingly. It usually takes 3!10 iterations before the algorithms stop in our experiments.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Initialization
</SectionTitle>
      <Paragraph position="0"> The purpose of the initial step is to acquire an initial guess of document structures and the set of entities E in a closed collection of documents D. The hope is to find all entities without loss so duplicate entities are allowed. For all the models, we use the same algorithm: A local clustering is performed to group mentions inside each document: simple heuristics are applied to calculating the similarity between mentions; and pairs of mentions with similarity above a threshold are then clustered together. The first mention in each group is chosen as the representative (only in Model II and III) and an entity having the same writing with the representative is created for each cluster3. For all the models, the set of entities created in different documents become the global entity set E in the following M- and E-steps.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Estimating the Model Parameters
</SectionTitle>
      <Paragraph position="0"> In the learning process, assuming documents have already been annotated D = f(e;r;m)gn1 from previous Ior E-step, several underlying probability distributions of the relaxed models are estimated by maximum likelihood estimation in each M-step. The model parameters include a set of prior probabilities for entities PE, a set of transitive probabilities for entity pairs PEjE (only in Model III) and the appearance probabilities PWjW of each name in the name space W being transformed from another.</Paragraph>
      <Paragraph position="1"> + The prior distribution PE is modelled as a multi-nomial distribution. Given a set of labelled entity-mention pairs f(ei;mi)gn1,</Paragraph>
      <Paragraph position="3"> where freq(e) denotes the number of pairs containing entity e.</Paragraph>
      <Paragraph position="4"> + Given all the entities appearing in D, the transitive</Paragraph>
      <Paragraph position="6"> Here, the conditional probability between two real-world entities P(e2je1) is backed off to the one between the identifying writings of the two entities P(wrt(e2)jwrt(e1)) in the document set D to avoid 3Note that the performance of the initialization algorithm is 97:3% precision and 10:1% recall (measures are defined later.) sparsity problem. doc#(w1;w2;:::) denotes the number of documents having the co-occurrence of writings w1;w2;:::.</Paragraph>
      <Paragraph position="7"> + Appearance probability, the probability of one name being transformed from another, denoted as P(n2jn1) (n1;n2 2 W), is modelled as a product of the transformation probabilities over attribute values 4. The transformation probability for each attribute is further modelled as a multi-nomial distribution over a set of predetermined transformation types: TT =</Paragraph>
      <Paragraph position="9"> longing to the same entity type, the transformation probabilities PMjR, PRjE and PMjE, are all modelled as a product distribution (naive Bayes) over attributes:</Paragraph>
      <Paragraph position="11"> We manually collected typical and non-typical transformations for attributes such as titles, first names, last names, organizations and locations from multiple sources such as U.S. government census and online dictionaries. For other attributes like gender, only copy transformation is allowed. The maximum likelihood estimation of the transformation probability P(t;k) (t 2 TT;ak 2 A) from annotated representative-mention pairs f(r;m)gn1 is:</Paragraph>
      <Paragraph position="13"> vrk !t vmk denotes the transformation from attribute ak of r to that of m is of type t. Simple smoothing is performed here for unseen transformations.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experimental Study
</SectionTitle>
    <Paragraph position="0"> Our experimental study focuses on (1) evaluating the three models on identifying three entity types (People, Locations, Organization); (2) comparing our induced similarity measure between names (the appearance model) with other similarity measures; (3) evaluating the contribution of the global nature of our model, and finally, (4) evaluating our models on name expansion and prominence ranking.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Methodology
</SectionTitle>
      <Paragraph position="0"> We randomly selected 300 documents from 1998-2000  of vk, for example, &amp;quot;Prof.&amp;quot; for &amp;quot;Professor&amp;quot;, &amp;quot;Andy&amp;quot; for &amp;quot;Andrew&amp;quot;; non-typical denotes a non-typical transformation. 2002). The documents were annotated by a named entity tagger for People, Locations and Organizations. The annotation was then corrected and each name mention was labelled with its corresponding entity by two annotators.</Paragraph>
      <Paragraph position="1"> In total, about 8;000 mentions of named entities which correspond to about 2;000 entities were labelled. The training process gets to see only the 300 documents and extracts attribute values for each mention. No supervision is supplied. These records are used to learn the probabilistic models.</Paragraph>
      <Paragraph position="2"> In the 64 million possible mention pairs, most are trivial non-matching one -- the appearances of the two mentions are very different. Therefore, direct evaluation over all those pairs always get almost 100% accuracy in our experiments. To avoid this, only the 130;000 pairs of matching mentions that correspond to the same entity are used to evaluate the performance of the models. Since the probabilistic models are learned in an unsupervised setting, testing can be viewed simply as the evaluation of the learned model, and is thus done on the same data. The same setting was used for all models and all comparison performed (see below).</Paragraph>
      <Paragraph position="3"> To evaluate the performance, we pair two mentions iff the learned model determined that they correspond to the same entity. The list of predicted pairs is then compared with the annotated pairs. We measure Precision (P) - Percentage of correctly predicted pairs, Recall</Paragraph>
      <Paragraph position="5"> Comparisons: The appearance model induces a &amp;quot;similarity&amp;quot; measure between names, which is estimated during the training process. In order to understand whether the behavior of the generative model is dominated by the quality of the induced pairwise similarity or by the global aspects (for example, inference with the aid of the document structure), we (1) replace this measure by two other &amp;quot;local&amp;quot; similarity measures, and (2) compare three possible decision mechanisms - pairwise classification, straightforward clustering over local similarity, and our global model. To obtain the similarity required by pairwise classification and clustering, we use this formula sima(n1;n2) = P(n1jn2) to convert the appearance probability described in Sec. 4.3 to it.</Paragraph>
      <Paragraph position="6"> The first similarity measure we use is a simple baseline approach: two names are similar iff they have identical writings (that is, simb(n1;n2) =</Paragraph>
      <Paragraph position="8"> one is a state-of-art similarity measure sims(n1;n2) 2 [0;1] for entity names (SoftTFIDF with Jaro-Winkler distance and = 0:9); it was ranked the best measure in a recent study (Cohen et al., 2003).</Paragraph>
      <Paragraph position="9"> Pairwise classification is done by pairing two mentions iff the similarity between them is above a fixed threshold. For Clustering, a graph-based clustering al- null ilarity measures. Three similarity measures are evaluated (rows) across three decision levels (columns). Performance is evaluated by the F1 values over the whole test set. The first number averages all entity types; numbers in parentheses represent People, Location and Organization respectively. gorithm is used. Two nodes in the graph are connected if the similarity between the corresponding mentions is above a threshold. In evaluation, any two mentions belonging to the same connected component are paired the same way as we did in Sec. 5.1 and all those pairs are then compared with the annotated pairs to calculate Precision, Recall and F1.</Paragraph>
      <Paragraph position="10"> Finally, we evaluate the baseline and the SoftTFIDF measure in the context of Model II, where the appearance model is replaced. We found that the probabilities directly converted from the SoftTFIDF similarity behave badly so we adopt this formula P(n1jn2) = e10C/sims(n1;n2)!1 e10!1 instead to acquire P(n1jn2) needed byModel II. Those probabilities are fixed as we estimate other model parameters in training.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5.2 Results
</SectionTitle>
    <Paragraph position="0"> The bottom line result is given in Tab. 1. All the similarity measures are compared in the context of the three levels of decisions - local decision (pairwise), clustering and our probabilistic model II. Only the best results in the experiments, achieved by trying different thresholds in pairwise classification and clustering, are shown.</Paragraph>
    <Paragraph position="1"> The behavior across rows indicates that, locally, our unsupervised learning based appearance model is about the same as the state-of-the-art SoftTFIDF similarity. The behavior across columns, though, shows the contribution of the global model, and that the local appearance model behaves better with it than a fixed similarity measure does. A second observation is that the Location appearance model is not as good as the one for People and Organization, probably due to the attribute transformation types chosen.</Paragraph>
    <Paragraph position="2"> Tab. 2 presents a more detailed evaluation of the different approaches on the entity identity task. All the three probabilistic models outperform the discriminatory approaches in this experiment, an indication of the effectiveness of the generative model.</Paragraph>
    <Paragraph position="3"> We note that although Model III is more expressive and reasonable than model II, it does not always perform better. Indeed, the global dependency among entities in Model III achieves two-folded outcomes: it achieves better precision, but may degrade the recall. The following example, taken from the corpus, illustrates the advantage of this model.</Paragraph>
    <Paragraph position="4">  examples. B, D, I, II and III denote the baseline model, the SoftTFIDF similarity model with clustering, and the three probabilistic models. We distinguish between pairs of mentions that are inside the same document (InDoc, 15% of the pairs) or not (InterDoc).</Paragraph>
    <Paragraph position="5"> Example 5.1 &amp;quot;Sherman Williams&amp;quot; is mentioned along with the baseball team &amp;quot;Dallas Cowboys&amp;quot; in 8 out of 300 documents, while &amp;quot;Jeff Williams&amp;quot; is mentioned along with &amp;quot;LA Dodgers&amp;quot; in two documents.</Paragraph>
    <Paragraph position="6"> In all models but Model III, &amp;quot;Jeff Williams&amp;quot; is judged to correspond to the same entity as &amp;quot;Sherman Williams&amp;quot; since their appearances are similar and the prior probability of the latter is higher than the former. Only Model III, due to the co-occurring dependency between &amp;quot;Jeff Williams&amp;quot; and &amp;quot;Dodgers&amp;quot;, identifies it as corresponding to an entity different from &amp;quot;Sherman Williams&amp;quot;.</Paragraph>
    <Paragraph position="7"> While this shows that Model III achieves better precision, the recall may go down. The reason is that global dependencies among entities enforces restrictions over possible grouping of similar mentions; in addition, with a limited document set, estimating this global dependency is inaccurate, especially when the entities themselves need to be found when training the model.</Paragraph>
    <Paragraph position="8"> Hard Cases: To analyze the experimental results further, we evaluated separately two types of harder cases of the entity identity task: (1) mentions with different writings that refer to the same entity; and (2) mentions with similar writings that refer to different entities. Model II and III outperform other models in those two cases as well.</Paragraph>
    <Paragraph position="9"> Tab. 3 presents F1 performance of different approaches in the first case. The best F1 value is only 73:1%, indicating that appearance similarity and global dependency are not sufficient to solve this problem when the writings are very different. Tab. 4 shows the performance of different approaches for disambiguating similar writings that correspond to different entities.</Paragraph>
    <Paragraph position="10"> Both these cases exhibit the difficulty of the problem, and that our approach provides a significant improvement over the state of the art similarity measure -- column D vs. column II in Tab. 4. It also shows that it is necessary to use contextual attributes of the names, which are not yet included in this evaluation.</Paragraph>
    <Paragraph position="11">  (F1). We filter out identical writings and report only on cases of different writings of the same entity. The test set contains 46;376 matching pairs (but in different writings) in the whole data set.</Paragraph>
    <Paragraph position="12">  entities(F1). The test set contains 39;837 pairs of mentions that associated with different entities in the 300 documents and have at least one token in common.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Other Tasks
</SectionTitle>
      <Paragraph position="0"> In the following experiments, we evaluate the generative model on other tasks related to robust reading. We present results only for Model II, the best one in previous experiments.</Paragraph>
      <Paragraph position="1"> Name Expansion: Given a mention m in a query, we find the most likely entity e 2 E for m using the inference algorithm as described in Sec. 3.2. All unique mentions of the entity in the documents are output as the expansions of m. The accuracy for a given mention is defined as the percentage of correct expansions output by the system.</Paragraph>
      <Paragraph position="2"> The average accuracy of name expansion of Model II is shown in Tab. 5. Here is an example: Query: Who is Gore ? Expansions: Vice President Al Gore, Al Gore, Gore.</Paragraph>
      <Paragraph position="3"> Prominence Ranking: We refer to Example 3.1 and use it to exemplify quantitatively how our system supports prominence ranking. Given a query name n, the ranking of the entities with regard to the value of P(e)/P(nje) (shown in brackets) by Model II is as follows.</Paragraph>
      <Paragraph position="4"> Input: George Bush  1. George Bush (0.0448) 2. George W. Bush (0.0058) Input: Bush 1. George W. Bush (0.0047) 2. George Bush (0.0015) 3. Steve Bush (0.0002)</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper presents an unsupervised learning approach to several aspects of the &amp;quot;robust reading&amp;quot; problem - cross-document identification and tracing of ambiguous names.</Paragraph>
    <Paragraph position="1"> We developed a model that describes the natural generation process of a document and the process of how  over 30 randomly chosen queries for each entity type.</Paragraph>
    <Paragraph position="2"> names are &amp;quot;sprinkled&amp;quot; into them, taking into account dependencies between entities across types and an &amp;quot;author&amp;quot; model. Several relaxations of this model were developed and studied experimentally, and compared with a state-of-the-art discriminative model that does not take a global view. The experiments exhibit encouraging results and the advantages of our model.</Paragraph>
    <Paragraph position="3"> This work is a preliminary exploration of the robust reading problem. There are several critical issues that our model can support, but were not included in this preliminary evaluation. Some of the issues that will be included in future steps are: (1) integration with more contextual information (like time and place) related to the target entities, both to support a better model and to allow temporal tracing of entities; (2) studying an incremental approach of training the model; that is, when a new document is observed, coming, how to update existing model parameters ? (3) integration of this work with other aspects of general coreference resolution (e.g., other terms like pronouns that refer to an entity) and named entity recognition (which we now take as given); and (4) scalability issues in applying the system to large corpora.</Paragraph>
  </Section>
class="xml-element"></Paper>