<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1043"> <Title>References</Title> <Section position="3" start_page="0" end_page="293" type="intro"> <SectionTitle> 2 The NLP Techniques </SectionTitle> <Paragraph position="0"> Surprising quality for open-domain textual Q/A can be achieved when several lightweight knowledge-based NLP techniques complement mostly shallow, surface-based approaches. The processing imposed by Q/A systems must be distinguished, on the one hand, from IR techniques, which locate sets of documents containing the required information by means of keyword-based techniques. Q/A systems are presented with natural language questions, far richer in semantics than a set of keywords eventually structured around some operators. Furthermore, the output of Q/A systems is either the actual answer identified in a text or small text fragments containing the answer. This eliminates the user's trouble of finding the required information in sometimes large sets of retrieved documents. Open-domain Q/A systems must also be distinguished, on the other hand, from IE systems, which model the information need through database templates, thus less naturally than a textual answer. Moreover, open-domain IE is still difficult to achieve, because its linguistic pattern recognition relies on domain-dependent lexico-semantic knowledge.</Paragraph> <Paragraph position="1"> To be able to satisfy the open-domain constraints, textual Q/A systems replace the linguistic pattern matching capabilities of IE systems with methods that rely on the recognition of the question type and of the expected answer type. Generally, this information is available by accessing a classification based on the question stem (e.g., what, how much, who) and the head of the first noun phrase of the question. Question processing also includes the identification of the question keywords.
Empirical methods, based on a set of ordered heuristics operating on the phrasal parse of the question, extract keywords that are passed to the search engine. The overall precision of the Q/A system depends also on the recognition of the question focus, since the answer extraction, succeeding the IR phase, is centered around the question focus. Unfortunately, empirical methods for focus recognition are hard to develop without the availability of richer semantic knowledge.</Paragraph> <Paragraph position="2"> Special requirements are set on the document processing component of a Q/A system. To speed up the answer extraction, the search engine returns only those paragraphs from a document that contain all queried keywords. The paragraphs are ordered, and answers are extracted whenever the question topic and the answer type are recognized in a paragraph. Thereafter the answers are scored based on several bag-of-words heuristics. Throughout all this processing, the NLP techniques are limited to (a) named entity recognition; (b) semantic classification of the question type, based on information provided by an off-line question taxonomy and semantic class information available from WordNet (Fellbaum 1998); and (c) phrasal parsing produced by enhancing Brill's part-of-speech tagger with some rules for phrase formation.</Paragraph> <Paragraph position="3"> However simple, this technology surpasses 75% precision on trivia questions, as posed in the TREC-8 competition (cf. (Moldovan et al. 1999)). An impressive improvement of 14% is achieved when more knowledge-intensive NLP techniques are applied at both the question and the answer processing level. Figure 1 illustrates the architecture of a system that has enhanced Q/A performance.</Paragraph> <Paragraph position="4"> As represented in Figure 1, all three modules of the Q/A system preserve the shallow processing components that determine good performance.
In the Question Processing module, the Question Class recognizer, working against a taxonomy of questions, still constitutes the central processing that takes place at this stage. However, a far richer representation of the question classes is employed. To be able to classify against the new question taxonomy, each question is first fully parsed and transformed into a semantic representation that captures all relationships between phrase heads.</Paragraph> <Paragraph position="5"> The recognition of the question class is based on the comparison of the question semantic representation with the semantic representation of the nodes from the question taxonomy. Taxonomy nodes also encode the answer type, the question focus and the semantic class of question keywords. Multiple sets of keywords are generated based on their semantic class, all pertaining to the same original question. This feature enables the search engine to retrieve multiple sets of documents, pertaining to multiple sets of answers, that are extracted, combined and ranked based on several heuristics, reported in (Moldovan et al. 1999). This process of obtaining multiple sets of answers increases the likelihood of finding the correct answer.</Paragraph> <Paragraph position="6"> However, the big boost in the precision of the knowledge-based Q/A system is provided by the option of enabling the justification of the extracted answer. All extracted answers are parsed and transformed into semantic representations. Thereafter, the semantic representations of both questions and answers are translated into logic forms and presented to a simplified theorem prover. The proof backchains from the question to the answer, its trace generating a justification. The prover may access a set of abduction rules that relax the justification process. Whenever an answer cannot be proven, it is discarded.
This option resolves many situations in which the correct answer is not ranked first because some other, incorrect answers carry stronger surface-text-based indicators.</Paragraph> <Paragraph position="7"> This architecture allows for simple integration of semantic and axiomatic knowledge sources in a Q/A system and enables efficient interaction of text-surface-based and knowledge-based NLP techniques.</Paragraph> </Section> </Paper>