<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1038">
  <Title>Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text</Title>
  <Section position="2" start_page="0" end_page="296" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Consider these four sentences: 1. George W. Bush's father is George H. W. Bush. 2. George H. W. Bush's sister is Nancy Bush Ellis. 3. Nancy Bush Ellis's son is John Prescott Ellis. 4. John Prescott Ellis analyzed George W. Bush's campaign.</Paragraph>
    <Paragraph position="1"> We would like to build an automated system to extract the set of relations shown in Figure 1. State-of-the-art extraction algorithms may be able to detect the son and sibling relations from local language clues. However, the cousin relation is only implied by the text, and extracting it requires additional knowledge. Specifically, the system requires knowledge of familial relation patterns. One could imagine a system that accepts such rules as input (e.g., cousin = father's sister's son) and applies them to extract implicit relations. However, exhaustively enumerating all possible rules is tedious, and any hand-written list is likely to be incomplete. More importantly, many relational patterns unknown a priori may both improve extraction accuracy and uncover informative trends in the data (e.g., that children often adopt the religion of their parents). Indeed, the goal of data mining is to learn such patterns from database regularities. Since these patterns will not always hold, we would like to handle them probabilistically. We propose an integrated supervised machine learning method that learns both contextual and relational patterns to extract relations. In particular, we construct a linear-chain conditional random field (Lafferty et al., 2001; Sutton and McCallum, 2006) to extract relations from biographical texts while simultaneously discovering interesting relational patterns that improve extraction performance.</Paragraph>
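    <Paragraph position="2"> The rule-based alternative mentioned above (cousin = father's sister's son) can be sketched as follows. This is only a minimal, deterministic illustration under stated assumptions, not the probabilistic method proposed in this paper: the triple format and the names FACTS, RULES, build_index, follow_path and infer are hypothetical and chosen for the example. The facts are the explicit relations stated in sentences 1-3.

```python
from collections import defaultdict

# Explicit relations read off sentences 1-3, stored as
# (subject, relation, object): "X's father is Y" becomes (X, "father", Y).
FACTS = [
    ("George W. Bush", "father", "George H. W. Bush"),
    ("George H. W. Bush", "sister", "Nancy Bush Ellis"),
    ("Nancy Bush Ellis", "son", "John Prescott Ellis"),
]

# Hypothetical hand-written rule: an implied relation is defined as a
# chain of explicit relations, followed from left to right.
RULES = {"cousin": ["father", "sister", "son"]}


def build_index(facts):
    """Map each (subject, relation) pair to the set of its objects."""
    index = defaultdict(set)
    for subj, rel, obj in facts:
        index[(subj, rel)].add(obj)
    return index


def follow_path(index, start, path):
    """Return every entity reachable from start along the relation path."""
    frontier = {start}
    for rel in path:
        frontier = {
            obj
            for entity in frontier
            for obj in index.get((entity, rel), set())
        }
    return frontier


def infer(facts, rules):
    """Apply each rule to each known subject and collect implied triples."""
    index = build_index(facts)
    subjects = {subj for subj, _, _ in facts}
    inferred = set()
    for name, path in rules.items():
        for subj in subjects:
            for obj in follow_path(index, subj, path):
                inferred.add((subj, name, obj))
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer(FACTS, RULES)):
        print(triple)
```

Every rule in this sketch has to be written by hand and is applied as if it always held. Those are the two limitations the paragraph above raises against rule enumeration.</Paragraph>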
  </Section>
</Paper>