<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1112"> <Title>Exploring Correlation of Dependency Relation Paths for Answer Extraction</Title> <Section position="3" start_page="0" end_page="889" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Answer Extraction is one of the basic modules in open domain Question Answering (QA). It further processes the relevant sentences extracted by Passage / Sentence Retrieval and pinpoints exact answers using more linguistically motivated analysis. Since QA has shifted in recent years toward finding exact answers rather than text snippets, answer extraction has become increasingly crucial.</Paragraph> <Paragraph position="1"> Typically, answer extraction works in the following steps: * Recognize the expected answer type of a question. * Annotate relevant sentences with various types of named entities.</Paragraph> <Paragraph position="2"> * Regard the phrases annotated with the expected answer type as candidate answers.</Paragraph> <Paragraph position="3"> * Rank candidate answers.</Paragraph> <Paragraph position="4"> In the above work flow, answer extraction relies heavily on named entity recognition (NER). On one hand, NER reduces the number of candidate answers and eases answer ranking. On the other hand, errors from NER directly degrade answer extraction performance. To our knowledge, most top-ranked QA systems in TREC are supported by effective NER modules that identify and classify more than 20 types of named entities (NE), such as abbreviation, music, and movie. However, developing such a named entity recognizer is not trivial. To date, we have not found any paper on QA-specific NER development, so it is hard to follow that work. In this paper, we simply use a general MUC-based NER, which makes our results reproducible.</Paragraph> <Paragraph position="5"> A general MUC-based NER cannot annotate a large number of NE classes.
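The four-step work flow above can be sketched as a small pipeline. This is an illustrative sketch only: `recognize_type`, `annotate_ner`, and `score` are hypothetical stand-ins for the answer-type classifier, the NER module, and the ranking model, not components of any particular QA system.

```python
def extract_answers(question, sentences, recognize_type, annotate_ner, score):
    """Toy sketch of the four-step answer extraction work flow.

    recognize_type, annotate_ner and score are placeholders for the
    answer-type classifier, the NER module and the candidate-ranking
    model (all hypothetical here).
    """
    expected = recognize_type(question)                 # step 1: expected answer type
    candidates = []
    for sent in sentences:
        for phrase, ne_type in annotate_ner(sent):      # step 2: NE annotation
            if ne_type == expected:                     # step 3: type-matching phrases
                candidates.append((phrase, sent))
    # step 4: rank candidates, best first
    return sorted(candidates, key=lambda c: score(question, *c), reverse=True)
```

Note how step 3 depends entirely on the NER output: any entity the recognizer misses or mislabels can never surface as a candidate, which is the dependence discussed above.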
In this case, all noun phrases in sentences are regarded as candidate answers, which makes candidate answer sets much larger than those filtered by a well-developed NER. Larger candidate answer sets make answer extraction more difficult, and previous methods working at the surface word level, such as density-based ranking and pattern matching, may not perform well. Deeper linguistic analysis has to be conducted. This paper proposes a statistical method that explores the correlation of dependency relation paths to rank candidate answers. It is motivated by the observation that the relations between proper answers and question phrases in candidate sentences are usually similar to the corresponding relations in the question. Consider, for example, the question &quot;What did Alfred Nobel invent?&quot; and the candidate sentence &quot;... in the will of Swedish industrialist Alfred Nobel, who invented dynamite.&quot; For each question, firstly, dependency relation paths are defined and extracted from the question and each of its candidate sentences. Secondly, the paths from the question and the candidate sentence are paired according to a question phrase mapping score. Thirdly, the correlation between the two paths of each pair is calculated with the Dynamic Time Warping algorithm. The input to this calculation is the correlations between dependency relations, which are estimated from a set of training path pairs. Lastly, a Maximum Entropy-based ranking model is proposed to incorporate the path correlations and rank candidate answers. Furthermore, a sentence supportive measure is presented based on the correlations of relation paths among question phrases. It is applied to re-rank the candidate answers extracted from different candidate sentences.
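The third step, aligning two relation paths with Dynamic Time Warping, can be sketched as follows. The relation labels (`nsubj`, `dobj`, ...) and the `rel_cor` function are illustrative stand-ins: in the paper, relation correlations are estimated from training path pairs, and the exact normalization of the accumulated score may differ from the simple average used here.

```python
def path_correlation(path_q, path_s, rel_cor):
    """Align two dependency relation paths with Dynamic Time Warping.

    path_q, path_s: sequences of dependency relation labels.
    rel_cor(r1, r2): correlation between two relations (here a stand-in
    for the correlations estimated from training path pairs).
    Returns the accumulated correlation along the best warping path,
    normalized by the longer path length (one simple choice).
    """
    n, m = len(path_q), len(path_s)
    # dp[i][j] = best accumulated correlation aligning path_q[:i] with path_s[:j]
    dp = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = rel_cor(path_q[i - 1], path_s[j - 1])
            # DTW recurrence: extend from insertion, deletion, or match
            dp[i][j] = c + max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m] / max(n, m)
```

Because DTW warps one path onto the other, paths of different lengths (e.g. a question path versus a longer candidate-sentence path) can still be compared, which is exactly what the paired question/sentence paths require.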
Considering that phrases may provide more accurate information than individual words, we extract dependency relations at the phrase level instead of the word level.</Paragraph> <Paragraph position="6"> Experiments on TREC questions show that our method significantly outperforms a density-based method by 50% in MRR and three state-of-the-art syntactic-based methods by up to 20% in MRR. Furthermore, we classify questions by whether NER is used and investigate how these methods perform on the two question sets. The results indicate that our method achieves better performance than the other syntactic-based methods on both question sets. Especially for the more difficult questions, for which NER may not help, our method improves MRR by up to 31%.</Paragraph> <Paragraph position="7"> The paper is organized as follows. Section 2 discusses related work and clarifies what is new in this paper. Section 3 presents relation path correlation in detail. Sections 4 and 5 discuss how to incorporate the correlations for answer ranking and re-ranking. Section 6 reports experiments and results.</Paragraph> </Section> </Paper>