<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1052">
  <Title>Investigating a Generic Paraphrase-based Approach for Relation Extraction</Title>
  <Section position="7" start_page="412" end_page="414" type="evalu">
    <SectionTitle>
6 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="412" end_page="413" type="sub_section">
      <SectionTitle>
6.1 Experimental Settings
</SectionTitle>
      <Paragraph position="0"> To acquire a set of entailing templates we first executed TEASE on the input template 'X subj- interact mod- with pcomp-n- Y', which corresponds to the &quot;default&quot; expression of the protein interaction relation. (Footnote 2: We chose a dependency parser because it captures directly the relations between words; we use Minipar due to its speed.)</Paragraph>
      <Paragraph position="1"> Table 7 (top 18 correct templates learned by TEASE): 1. X bind to Y; 2. X activate Y; 3. X stimulate Y; 4. X couple to Y; 5. interaction between X and Y; 6. X become trapped in Y; 7. X Y complex; 8. X recognize Y; 9. X block Y; 10. X binding to Y; 11. X Y interaction; 12. X attach to Y; 13. X interaction with Y; 14. X trap Y; 15. X recruit Y; 16. X associate with Y; 17. X be linked to Y; 18. X target Y. TEASE learned 118 templates for this relation. Table 7 lists the top 18 learned templates that we considered correct (out of the top 30 templates in the TEASE output). We then extracted candidate interacting protein pairs by applying the syntactic matcher to the 119 templates (the 118 learned templates plus the input template). Candidate pairs that do not consist of two proteins, as tagged in the input dataset, were filtered out (see Section 4.1; recall that our experiments were run on the protein interaction dataset, which isolates the RE task from the protein name recognition task).</Paragraph>
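The extraction step above can be sketched as follows. This is a minimal word-level illustration only: the actual system matches templates against Minipar dependency trees, and all identifiers, example tokens, and protein names here are hypothetical.

```python
# Sketch: extract candidate protein pairs by matching templates with
# X/Y slots against token sequences, then filter by the dataset's
# protein tags. (Illustrative; the paper matches dependency parses.)

def match_template(template, tokens):
    """Yield (x, y) fillers wherever the template's literal words match."""
    words = template.split()
    n = len(words)
    for i in range(len(tokens) - n + 1):
        slots = {}
        ok = True
        for w, t in zip(words, tokens[i:i + n]):
            if w in ("X", "Y"):
                slots[w] = t          # slot captures the candidate filler
            elif w != t:
                ok = False            # literal word mismatch
                break
        if ok and "X" in slots and "Y" in slots:
            yield slots["X"], slots["Y"]

def extract_pairs(templates, tokens, protein_tags):
    """Keep only candidate pairs where both fillers are tagged proteins."""
    pairs = set()
    for tpl in templates:
        for x, y in match_template(tpl, tokens):
            if x in protein_tags and y in protein_tags:
                pairs.add((x, y))
    return pairs
```

The protein-tag filter mirrors the paper's setup, in which gold protein annotations isolate relation extraction from name recognition.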
      <Paragraph position="2"> In a subsequent experiment we iteratively executed TEASE on the 5 top-ranked learned templates to acquire additional relevant templates. In total, we obtained 1233 templates that were likely to imply the original input relation. The syntactic matcher was then reapplied to extract candidate interacting protein pairs using all 1233 templates. We used the development set to tune a small set of 10 generic hand-crafted transformation rules that handle different syntactic variations. To handle transparent head nouns, which is the only phenomenon that demonstrates domain dependence, we extracted a set of the 5 most frequent transparent head patterns in the development set, e.g.</Paragraph>
      <Paragraph position="3"> 'fragment of X'.</Paragraph>
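Normalizing a transparent head before matching can be sketched as below. Of the patterns listed, only 'fragment of' comes from the paper; 'mutant of' is an invented placeholder for the other mined patterns.

```python
# Sketch: strip a transparent head noun so the embedded protein mention
# can fill a template slot. 'fragment of' is from the paper; 'mutant of'
# is an illustrative stand-in for the other mined patterns.
TRANSPARENT_HEADS = ["fragment of", "mutant of"]

def strip_transparent_head(phrase):
    """Return the argument of a transparent head, or the phrase unchanged."""
    for head in TRANSPARENT_HEADS:
        if phrase.startswith(head + " "):
            return phrase[len(head) + 1:]
    return phrase
```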
      <Paragraph position="4"> In order to compare (roughly) our performance with that of supervised methods applied to this dataset, as summarized in (Bunescu et al., 2005), we adopted their recall and precision measurement. Their scheme counts over distinct protein pairs per abstract, which yields 283 interacting pairs in our test set and 418 in the development set.</Paragraph>
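Scoring over distinct protein pairs per abstract can be sketched as follows; gold and predicted sets are keyed by (abstract id, protein, protein) triples (the keying scheme here is an illustrative assumption).

```python
# Sketch: precision/recall/F1 over distinct (abstract_id, prot1, prot2)
# pairs, following the counting scheme of Bunescu et al. (2005).
def prf(gold, predicted):
    tp = len(gold & predicted)                      # true positives
    p = tp / len(predicted) if predicted else 0.0   # precision
    r = tp / len(gold) if gold else 0.0             # recall
    f = 2 * p * r / (p + r) if p + r else 0.0       # harmonic mean
    return p, r, f
```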
    </Section>
    <Section position="2" start_page="413" end_page="414" type="sub_section">
      <SectionTitle>
6.2 Manual Analysis of TEASE Recall
</SectionTitle>
      <Paragraph position="0"> Table 8 (coverage of TEASE output over distinct protein pairs and template instances, out of 341 instances, in the development set):
experiment                  pairs  instances
input                        39%    37%
input + iterative            49%    48%
input + iterative + morph    63%    62%
Before evaluating the system as a whole, we wanted to manually assess in isolation the coverage of TEASE output relative to all template instances that were manually annotated in the development set. We considered a template as covered if there is a TEASE output template that is equal to the manually annotated template, or differs from it only by the syntactic phenomena described in Section 3 or due to some parsing errors. Counting these matches, we calculated the number of template instances and distinct interacting protein pairs that are covered by TEASE output.</Paragraph>
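The coverage criterion can be sketched as below. The normalization function is purely illustrative (a single toy rule standing in for the syntactic phenomena of Section 3); the actual matching criteria are those the paper describes.

```python
# Sketch: a manually annotated template counts as covered if some TEASE
# template equals it up to a set of allowed variations, modeled here as
# a toy normalization. (Illustrative; not the paper's real criteria.)
def normalize(template):
    # toy rule: ignore a copula, so 'X be linked to Y' ~ 'X linked to Y'
    return template.replace(" be ", " ").strip()

def is_covered(annotated, tease_templates):
    norm = normalize(annotated)
    return any(normalize(t) == norm for t in tease_templates)

def coverage(annotated_set, tease_templates):
    """Fraction of annotated templates covered by TEASE output."""
    covered = sum(is_covered(a, tease_templates) for a in annotated_set)
    return covered / len(annotated_set)
```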
      <Paragraph position="1"> Table 8 presents the results of our analysis. The 1st line shows the coverage of the 119 templates learned by TEASE for the input template 'X interact with Y'. It is interesting to note that, though we aim to learn templates relevant to the specific domain, TEASE also learned relevant templates by finding anchor-sets from different domains that use the same jargon, such as particle physics.</Paragraph>
      <Paragraph position="2"> We next analyzed the contribution to recall of iteratively learning from the additional 5 templates (2nd line in Table 8). With the additional learned templates, recall increased by about 25%, showing the importance of the iterative steps.</Paragraph>
      <Paragraph position="3"> Finally, when allowing matching between a TEASE template and a manually annotated template, even if one is based on a morphological derivation of the other (3rd line in Table 8), TEASE recall increased further by about 30%.</Paragraph>
      <Paragraph position="4"> We conclude that the potential recall of the current version of TEASE on the protein interaction dataset is about 60%. This indicates that significant coverage can be obtained using completely unsupervised learning from the web, as performed by TEASE. However, the upper bound for our current implemented system is only about 50% because our syntactic matching does not handle morphological derivations.</Paragraph>
    </Section>
    <Section position="3" start_page="414" end_page="414" type="sub_section">
      <SectionTitle>
6.3 System Results
</SectionTitle>
      <Paragraph position="0"> System results on the test set:
experiment          recall  precision  F1
input                0.18     0.62     0.28
input + iterative    0.29     0.42     0.34
The performance of our current implementation is notably worse than the upper bound of the manual analysis because of two general shortcomings of the current syntactic matcher: (1) parsing errors and (2) limited transformation rule coverage. First, the texts from the biology domain presented quite a challenge for the Minipar parser. For example, in sentences containing the phrase 'X bind specifically to Y' the parser consistently attaches the PP 'to' to 'specifically' instead of to 'bind'. Thus, the template 'X bind to Y' cannot be directly matched.</Paragraph>
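A transformation rule repairing the attachment error just described could look like the sketch below, over (head, relation, dependent) edges. The relation labels are simplified placeholders, not actual Minipar labels.

```python
# Sketch: re-attach a PP that the parser hung on an adverb back onto the
# verb, so 'X bind specifically to Y' matches the template 'X bind to Y'.
# Edges are (head, relation, dependent); labels are illustrative only.
def reattach_pp_from_adverb(edges):
    fixed = []
    # map each adverb to the verb it modifies
    adverb_heads = {d: h for h, rel, d in edges if rel == "mod-adv"}
    for h, rel, d in edges:
        if rel == "pcomp-to" and h in adverb_heads:
            fixed.append((adverb_heads[h], rel, d))  # move PP to the verb
        else:
            fixed.append((h, rel, d))
    return fixed
```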
      <Paragraph position="1"> Second, we manually created only a small number of transformation rules to handle various syntactic phenomena, since we aimed at generic, domain-independent rules. The most difficult phenomenon to model with transformation rules is transparent heads. For example, in &quot;the dimerization of Prot1 interacts with Prot2&quot;, the transparent head 'dimerization of X' is domain dependent. Transformation rules that handle such examples are difficult to acquire unless a domain-specific learning approach (either supervised or unsupervised) is used. Finally, we did not handle co-reference resolution in the current implementation.</Paragraph>
      <Paragraph position="2"> Bunescu et al. (2005) and Bunescu and Mooney (2005) approached the protein interaction RE task using both handcrafted rules and several supervised machine learning techniques, which utilize about 180 manually annotated abstracts for training. Our results are not directly comparable with theirs because they adopted 10-fold cross-validation, while we had to divide the dataset into a development and a test set, but a rough comparison is possible. At the same 30% recall, their rule-based method achieved 62% precision and their best supervised learning algorithm achieved 73% precision. Compared to these supervised and domain-specific rule-based approaches our system is noticeably weaker, yet it provides useful results given that we supply very little domain-specific information and acquire the paraphrasing templates in a fully unsupervised manner. Still, the matching models need considerable additional research to achieve the potential performance suggested by TEASE.</Paragraph>
    </Section>
  </Section>
</Paper>