<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1038">
  <Title>Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text</Title>
  <Section position="6" start_page="298" end_page="301" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="298" end_page="300" type="sub_section">
      <SectionTitle>
5.1 Data
</SectionTitle>
      <Paragraph position="0"> We sampled 1127 paragraphs from 271 Wikipedia articles, for a total of 4701 relation instances. In addition to a large set of person-to-person relations, we also included links between people and organizations, as well as biographical facts such as birthday and jobTitle. In all, there are 53 labels in the training data (Table 1). We sampled articles that result in a high density of interesting relations by choosing, for example, a collection of related family members and associates. Figure 3 shows a small example of the type of connections in the data. We then split the data into training and testing sets (70-30 split), attempting to separate the entities into connected components. For example, all Bush family members were placed in the training set, while all Kennedy family members were placed in the testing set. While there are still occasional paths connecting entities in the training set to those in the test set, we believe this methodology reflects a typical real-world scenario in which we would like to extend an existing database to a different, but slightly related, domain.</Paragraph>
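A minimal sketch of the component-based split described above: group entities into connected components of the relation graph, then assign whole components to the training set until roughly 70% of the entities are covered. The helper names and the union-find implementation are ours, not the authors'.

```python
from collections import defaultdict

def connected_components(edges):
    """Group entities into connected components via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

def split_by_component(edges, train_fraction=0.7):
    """Assign whole components to train until ~train_fraction of entities are covered."""
    components = sorted(connected_components(edges), key=len, reverse=True)
    total = sum(len(c) for c in components)
    train, test, covered = [], [], 0
    for comp in components:
        if covered + len(comp) <= train_fraction * total:
            train.append(comp)
            covered += len(comp)
        else:
            test.append(comp)
    return train, test
```

Because components are assigned as a unit, related entities (e.g., one family) never straddle the train/test boundary, matching the paper's Bush/Kennedy example.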
      <Paragraph position="1"> The structure of the Wikipedia articles somewhat simplifies the extraction task, since important entities are hyper-linked within the text. This provides an automated way to detect entities in the text, although these entities are not classified by type. This also allows us to easily construct database queries, since we can reason at the entity level, rather than the token level. (Although, see Sarawagi and Cohen (2004) for extensions of CRFs that model the entity length distribution.) The results we report here are constrained to predict relations only for hyper-linked entities. Note that despite this property, we still desire to use a sequence model to capture the dependencies between adjacent labels.</Paragraph>
      <Paragraph position="2"> We use the MALLET CRF implementation (McCallum, 2002) with the default regularization parameters. Based on initial experiments, we restrict relational path features to length two or three. Paths of length one will learn trivial paths and can lead to overfitting. Paths longer than three can increase computational costs without adding much new information. In addition to the relational pattern features described in Section 4, the list of local features includes context words (such as the token identity within a 6-word window of the target token), lexicons (such as whether a token appears in a list of cities, people, or companies), regular expressions (such as whether the token is capitalized or contains digits or punctuation), part-of-speech tags (predicted by a CRF trained separately for part-of-speech tagging), prefixes and suffixes (such as whether a word ends in -ed or begins with ch-), and offset conjunctions (combinations of adjacent features within a window of size six).</Paragraph>
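The local features listed above can be sketched as a simple per-token feature extractor. This is an illustrative approximation, not MALLET's actual feature pipeline; the function name and feature-string conventions are ours, and the lexicon and offset-conjunction features are omitted for brevity.

```python
import re

def token_features(tokens, i, window=3):
    """Local features for token i: identity, orthography, affixes, and
    context words within a +/-window (6-word) window, as in the paper."""
    tok = tokens[i]
    feats = {
        f"word={tok.lower()}": 1,
        "capitalized": int(tok[:1].isupper()),          # regex-style orthographic cues
        "has_digit": int(bool(re.search(r"\d", tok))),
        "has_punct": int(bool(re.search(r"[^\w\s]", tok))),
        f"prefix={tok[:2].lower()}": 1,                  # e.g., begins with ch-
        f"suffix={tok[-2:].lower()}": 1,                 # e.g., ends in -ed
    }
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats[f"word@{off}={tokens[j].lower()}"] = 1
    return feats
```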
    </Section>
    <Section position="2" start_page="300" end_page="300" type="sub_section">
      <SectionTitle>
5.2 Extraction Results
</SectionTitle>
      <Paragraph position="0"> We evaluate performance by calculating the precision (P) and recall (R) of extracted relations, as well as the F1 measure, which is the harmonic mean of precision and recall.</Paragraph>
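The evaluation metrics above reduce to a few lines when extracted relations are treated as sets. A minimal sketch (the function name and the relation-tuple encoding are our own choices):

```python
def precision_recall_f1(predicted, gold):
    """P, R, and F1 over sets of extracted relation instances."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # correctly extracted relations
    p = tp / len(predicted) if predicted else 0.0   # precision
    r = tp / len(gold) if gold else 0.0             # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0      # harmonic mean of P and R
    return p, r, f1
```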
      <Paragraph position="1"> CRF0 is the conditional random field constructed without relational features. Results for CRF0 are displayed in the second column of Table 2. ME is a maximum entropy classifier trained on the same feature set as CRF0. The difference between these two models is that CRF0 models the dependence of relations that appear consecutively in the text. The superior performance of CRF0 suggests that this dependence is important to capture.</Paragraph>
      <Paragraph position="2"> The remaining models incorporate the relational patterns described in Section 4. We compare three different confidence thresholds for the construction of the initial testing database, as described in Section 4.2. CRFr uses no threshold, while CRFr0.9 and CRFr0.5 restrict the database to extractions with confidence greater than 0.9 and 0.5, respectively.</Paragraph>
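The threshold variants can be sketched as a simple filter over the first-pass extractions. This is an illustrative helper under an assumed `(subject, relation, object, confidence)` tuple format, not the authors' actual database code.

```python
def build_database(extractions, threshold=0.0):
    """Keep only first-pass extractions whose confidence exceeds the threshold.

    extractions: iterable of (subject, relation, object, confidence) tuples.
    threshold=0.0 corresponds to CRFr; 0.9 and 0.5 to CRFr0.9 and CRFr0.5.
    """
    return {(s, rel, o) for s, rel, o, conf in extractions if conf > threshold}
```

Raising the threshold trades recall of the database (fewer facts available for relational patterns) for precision (fewer fallacious inferences), which is exactly the trade-off the table columns explore.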
      <Paragraph position="3"> As shown by comparing CRF0 and CRFr in Table 2, the relational features constructed from the database with no confidence threshold provide a considerable boost in recall (reducing error by 7%), at the cost of a decrease in precision. Here we see the effect of making fallacious inferences from a noisy database.</Paragraph>
      <Paragraph position="4"> In column four, we see the opposite effect for the overly conservative threshold of CRFr0.9. Here, precision improves slightly over CRF0, and considerably over CRFr (12% error reduction), but this is accompanied by a drop in recall (8% reduction).</Paragraph>
      <Paragraph position="5"> Finally, in column five, a confidence threshold of 0.5 results in the best F1 measure (a 3.5% error reduction over CRF0). CRFr0.5 also obtains better recall and precision than CRF0, reducing recall error by 3.6% and precision error by 2.5%.</Paragraph>
      <Paragraph position="6"> Comparing the performance on different relation types, we find that the biggest increase from CRF0 to CRFr0.5 is on the memberOf relation, for which the F1 score improves from 0.4211 to 0.6093. We conjecture that the reason for this is that the patterns most useful for the memberOf label contain relations that are well-detected by the first-pass CRF. Also, the local language context seems inadequate to properly extract this relation, given the low performance of CRF0.</Paragraph>
      <Paragraph position="7"> To better gauge how much relational pattern features are affected by errors in the database, we run two additional experiments for which the relational features are fixed to be correct. That is, imagine that we construct a database from the true labeling of the testing data, and create the relational pattern features from this database. Note that this does not trivialize the problem, since there are no relational path features of length one (e.g., if X is the wife of Y, there will be no feature indicating this).</Paragraph>
      <Paragraph position="8"> We construct two experiments under this scheme, one where the entire test database is used (CRFt), and another where only half the relations are included in the test database, selected uniformly at random (CRFt0.5).</Paragraph>
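The CRFt0.5 setup amounts to sampling half of the true test relations uniformly at random. A minimal sketch (the function name, seeding, and sorting for determinism are our own choices, not the authors'):

```python
import random

def half_database(relations, fraction=0.5, seed=0):
    """Sample a uniform-at-random subset of the true relations,
    as in the CRFt0.5 experiment (fraction=1.0 gives CRFt)."""
    rng = random.Random(seed)
    ordered = sorted(relations)            # deterministic order before sampling
    k = int(len(ordered) * fraction)
    return set(rng.sample(ordered, k))
```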
      <Paragraph position="9"> Column six shows the improvements enabled by using the complete testing database. More interestingly, column seven shows that even with only half the database accurately known, performance improves considerably over both CRF0 and CRFr0.5.</Paragraph>
      <Paragraph position="10"> A realistic scenario for CRFt0.5 is a semi-automated system, in which a partially-filled database is used to bootstrap extraction.</Paragraph>
    </Section>
    <Section position="3" start_page="300" end_page="301" type="sub_section">
      <SectionTitle>
5.3 Mining Results
</SectionTitle>
      <Paragraph position="0"> Comparing the impact of discovered patterns on extraction is a way to objectively measure mining performance. We now give a brief subjective evaluation of the learned patterns. By examining relational patterns with high weights for a particular label, we can glean some regularities from our dataset. Examples of such patterns are in Table 3.</Paragraph>
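Inspecting the highest-weighted relational pattern features for a given label, as described above, is a straightforward ranking over the learned weight vector. A hypothetical sketch, assuming weights are keyed by `(label, pattern)` pairs (our encoding, not MALLET's internal representation):

```python
def top_patterns(weights, label, k=3):
    """Return the k relational pattern features with the highest
    learned weight for one relation label."""
    scored = [(w, pat) for (lab, pat), w in weights.items() if lab == label]
    return [pat for w, pat in sorted(scored, reverse=True)[:k]]
```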
      <Paragraph position="1"/>
      <Paragraph position="2"> From the familial relations in our training data, we are able to discover many equivalences for mothers, cousins, grandfathers, and husbands. In addition to these high-precision patterns, the system also generates interesting, low-precision patterns. Rows 3-7 of Table 3 can be summarized by the following generalizations: friends tend to be classmates; children of alumni often attend the same school as their parents; a boss' child often becomes the boss; grandchildren are often members of the same organizations as their grandparents; and rivals of a person from one political party are often rivals of other members of the same political party. While many of these patterns reflect the high concentration of political entities and familial relations in our training database, many will have applicability across domains.</Paragraph>
    </Section>
    <Section position="4" start_page="301" end_page="301" type="sub_section">
      <SectionTitle>
5.4 Implicit Relations
</SectionTitle>
      <Paragraph position="0"> It is difficult to measure system performance on implicit relations, since our labeled data does not distinguish between explicit and implicit relations. Additionally, accurately labeling all implicit relations is challenging even for a human annotator.</Paragraph>
      <Paragraph position="1"> We perform a simple exploratory analysis to determine how relational patterns can help discover implicit relations. We construct a small set of synthetic sentences for which CRF0 successfully extracts relations using contextual features. We then add sentences with slightly more ambiguous language and measure whether CRFr can overcome this ambiguity using relational pattern features.</Paragraph>
      <Paragraph position="2"> For example, we create an article about an entity named &amp;quot;Bob Smith&amp;quot; that includes the sentences &amp;quot;His brother, Bill Smith, was a biologist&amp;quot; and &amp;quot;His companion, Bill Smith, was a biologist.&amp;quot; CRF0 successfully returns the brother relation in the first sentence, but not the second. After a fact is added to the database that says Bob and Bill have a brother in common named John, CRFr is able to correctly label the second sentence in spite of the ambiguous word &amp;quot;companion,&amp;quot; because CRF0 has a highly-weighted relational pattern feature for brother.</Paragraph>
      <Paragraph position="3"> Similar behavior is observed for low-precision patterns like &amp;quot;associates tend to win the same awards.&amp;quot; A synthetic article for the entity &amp;quot;Tom Jones&amp;quot; contains the sentences &amp;quot;He was awarded the Pulitzer Prize in 1998&amp;quot; and &amp;quot;Tom got the Pulitzer Prize in 1998.&amp;quot; Because CRF0 is highly reliant on the presence of the verb &amp;quot;awarded&amp;quot; or &amp;quot;won&amp;quot; to indicate a prize fact, it fails to label the second sentence correctly. After the database is augmented to include the fact that Tom's associate Jill received the Pulitzer Prize, CRFr labels the second sentence correctly.</Paragraph>
      <Paragraph position="4"> However, we also observed that CRFr still requires some contextual clues to extract implicit relations. For example, if the Tom Jones article instead contains the sentence &amp;quot;The Pulitzer Prize was awarded to him in 1998,&amp;quot; neither CRF labels the prize fact correctly, since this passive construction is rarely seen in the training data.</Paragraph>
      <Paragraph position="5"> We conclude from this brief analysis that relational patterns used by CRFr can help extract implicit relations when (1) the database contains accurate relational information, and (2) the sentence contains limited contextual clues. Since relational patterns are treated only as additional features by CRFr, they are generally not powerful enough to overcome a complete absence of contextual clues.</Paragraph>
      <Paragraph position="6"> From this perspective, relational patterns can be seen as enhancing the signal from contextual clues. This differs from deterministically applying learned rules independent of context, which may boost recall at the cost of precision.</Paragraph>
    </Section>
  </Section>
</Paper>