<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1201">
  <Title>A Language Independent Method for Question Classification</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Most approaches to question classification are based on handcrafted rules (Voorhees, 2001).</Paragraph>
    <Paragraph position="1"> Only recently have machine learning techniques been applied to the problem of question classification. Zhang and Lee present a method for question classification using Support Vector Machines (SVM) (Zhang and Lee, 2003). They compared the accuracy of SVM against Nearest Neighbors, Naive Bayes, Decision Trees and Sparse Network of Winnows (SNoW), with SVM producing the best results.</Paragraph>
    <Paragraph position="2"> In their work, Zhang and Lee improve accuracy by introducing a tree kernel function that represents the syntactic structure of questions. Their experimental results show that SVM using this tree kernel function achieves an accuracy of 90%; however, a parser is needed in order to acquire the syntactic information.</Paragraph>
    <Paragraph position="3"> Li and Roth reported a hierarchical approach for question classification based on the SNoW learning architecture (Li and Roth, 2002). This hierarchical classifier discriminates among 5 coarse classes, which are then refined into 50 more specific classes. The learners are trained using lexical and syntactic features such as POS tags, chunks and head chunks, together with two semantic features: named entities and semantically related words. They reported a question classification accuracy of 98.80% for the coarse classification, using 5,500 instances for training.</Paragraph>
    <Paragraph position="4"> A different approach, used for Japanese question classification, is that of Suzuki et al. (Suzuki et al., 2003). They used SVM with a new kernel function, called Hierarchical Directed Acyclic Graph, which allows the use of structured data. They experimented with 68 question types and compared the performance of bag-of-words features against more elaborate combinations of attributes, namely named entities and semantic information. Their best results, an accuracy of 94.8% at the first level of the hierarchy, were obtained when using SVM trained on bag-of-words together with named entities and semantic information.</Paragraph>
    <Paragraph position="5"> The idea of using the Internet in a QA system is not new. What is new, however, is that we are using the Internet to obtain values for features in our question classification process, as opposed to previous approaches where the redundancy of information available on the Internet has been used in the answer extraction process (Brill et al., 2002; Lin et al., 2002; Katz et al., 2003).</Paragraph>
  </Section>
</Paper>