<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1112">
  <Title>Exploring Correlation of Dependency Relation Paths for Answer Extraction</Title>
  <Section position="9" start_page="894" end_page="895" type="evalu">
    <SectionTitle>
6.2 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the overall performance of the five methods. The main observations from the table are as follows: 1. The methods SynPattern, StrictMatch, ApprMatch and CorME significantly improve MRR by 25.0%, 26.8%, 34.5% and 50.1% over the baseline method Density. The improvements may benefit from the various explorations of syntactic relations.</Paragraph>
    <Paragraph position="1"> 2. The performance of SynPattern (0.56MRR) and StrictMatch (0.57MRR) are close. SynPattern matches relation sequences of candidate answers with the predefined relation sequences extracted from a training data set, while StrictMatch matches relation sequences of candidate answers with the corresponding relation sequences in questions. But, both of them are based on the assumption that the more number of same relations between two sequences, the more similar the sequences are. Furthermore, since most TREC04 questions only have one or two phrases and many questions have similar expressions, SynPattern and StrictMatch don't make essential difference.</Paragraph>
    <Paragraph position="2"> 3. ApprMatch and CorME outperform SynPattern and StrictMatch by about 6.1% and 18.4% improvement in MRR. Strict matching often fails due to various relation representations in syntactic trees. However, such variations of syntactic relations may be captured by ApprMatch and CorME using a MI-based statistical method.</Paragraph>
    <Paragraph position="3"> 4. CorME achieves the better performance by 11.6% than ApprMatch. The improvement may benefit from two aspects: 1) ApprMatch assigns equal weights to the paths of a candidate answer and question phrases, while CorME estimate the weights according to phrase type and path length. After training a ME model, the weights are assigned, such as 5.72 for topic path ; 3.44 for constraints path and 1.76 for target path. 2) CorME incorporates approximate phrase mapping scores into path correlation measure.</Paragraph>
    <Paragraph position="4"> We further divide the questions into two classes according to whether NER is used in answer extraction. If the expected answer type of a question is unknown, such as &amp;quot;How did James Dean die?&amp;quot; or the type cannot be annotated by NER, such as &amp;quot;What ethnic group/race are Crip members?&amp;quot;, we put the question in Qw/oNE set, otherwise, we put it in QwNE. For the questions in Qw/oNE, we extract all basic noun phrases and verb phrases as candidate answers. Then, answer extraction module has to work on the larger candidate sets. Using a MUC-based NER, the recognized types include person, location, organization, date, time and money. In TREC04 questions, 123 questions are put in QwNE and 80 questions in Qw/oNE.</Paragraph>
    <Paragraph position="5">  We evaluate the performance on QwNE and Qw/oNE respectively, as shown in Table 2.</Paragraph>
    <Paragraph position="6"> The density-based method Density (0.11MRR) loses many questions in Qw/oNE, which indicates that using only surface word information is not sufficient for large candidate answer sets. On the contrary, SynPattern(0.36MRR), Strict-Pattern(0.36MRR), ApprMatch(0.42MRR) and CorME (0.47MRR) which capture syntactic information, perform much better than Density. Our method CorME outperforms the other syntactic-based methods on both QwNE and Qw/oNE. Es- null pecially for more difficult questions Qw/oNE, the improvements (up to 31% in MRR) are more obvious. It indicates that our method can be used to further enhance state-of-the-art QA systems even if they have a good NER.</Paragraph>
    <Paragraph position="7"> In addition, we evaluate component contributions of our method based on the main idea of relation path correlation. Three components are tested: 1. Appr. Mapping (Section 3.4). We replace approximate question phrase mapping with exact phrase mapping and withdraw the phrase mapping scores from path correlation measure. 2. Answer Ranking (Section 4). Instead of using ME model, we sum all of the path correlations to rank candidate answers, which is similar to (Cui et al., 2004). 3. Answer Re-ranking (Section 5). We disable this component and select top 5 answers according to answer ranking scores.</Paragraph>
    <Paragraph position="8">  The contribution of each component is evaluated with the overall performance degradation after it is removed or replaced. Some findings are concluded from Table 3. Performances degrade when replacing approximate phrase mapping or ME-based answer ranking, which indicates that both of them have positive effects on the systems. This may be also used to explain why CorME out-performs ApprMatch in Table 1. However, removing answer re-ranking doesn't affect much. Since short questions, such as &amp;quot;What does AARP stand for?&amp;quot;, frequently occur in TREC04, exploring the phrase relations for such questions isn't helpful.</Paragraph>
  </Section>
class="xml-element"></Paper>