<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1201">
  <Title>A Language Independent Method for Question Classification</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have presented experimental results for a language-independent question classification method. We consider the method language independent because the features used as attributes in the learning task can be extracted from the questions in a fully automated manner; we do not use semantic or syntactic information, because doing so would restrict us to languages for which parsers that can extract this information are available. We believe that this method can be successfully applied to other languages, such as Romanian, French, Portuguese and Catalan, that share the morphological characteristics of the three languages considered here. Comparing our results with those of previous work, we can say that our method is promising. For instance, Zhang and Lee (Zhang and Lee, 2003) reported an accuracy of 90% for English questions, while Li and Roth (Li and Roth, 2002) achieved 98.8% accuracy. However, they used a training set of 5,500 questions and a test set of 500 questions, whereas in our experiments we trained on 405 questions for every 45 test questions (10-fold cross-validation). When Zhang and Lee used only 1,000 questions for training, they achieved an accuracy of 80.2%.</Paragraph>
    <Paragraph position="1"> It is well known that machine learning algorithms generally perform better when a larger training set is available, so we expect that experiments with our method on a larger training set will yield improved results.</Paragraph>
    <Paragraph position="2"> As future work, we plan to investigate active learning with SVMs for this problem. Given that manually labelling questions is a very time-consuming task, active learning can provide a faster way to build accurate question classifiers.</Paragraph>
    <Paragraph position="3"> Instead of question instances being selected at random, labelled manually and then provided to the learner, the learner can analyze the unlabeled instances and select for labelling those that seem most relevant to the task.</Paragraph>
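    The selection step described above can be illustrated with a minimal sketch. It assumes a binary linear classifier (such as a linear SVM) whose weights and bias have already been trained, and it uses one common heuristic, picking the unlabeled instances closest to the decision boundary. The function names, the weights and the toy feature vectors are hypothetical and are not taken from our experiments; a multi-class question classifier would need a per-class variant of the same idea.

```python
def decision_value(weights, bias, x):
    """Signed score w.x + b of a linear classifier (for example a linear SVM)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def select_for_labelling(weights, bias, unlabeled, k):
    """Return the indices of the k unlabeled instances closest to the boundary.

    A small absolute score means the classifier is least certain about the
    instance, so a manual label for it is assumed to be most useful.
    """
    margins = [abs(decision_value(weights, bias, x)) for x in unlabeled]
    ranked = sorted(range(len(unlabeled)), key=lambda i: margins[i])
    return ranked[:k]


# Hypothetical toy model and pool of four unlabeled feature vectors.
weights = [1.0, -1.0, 0.5]
bias = 0.0
pool = [
    [2.0, 0.0, 0.0],  # score 2.0, far from the boundary
    [1.0, 1.0, 0.0],  # score 0.0, on the boundary
    [0.0, 2.0, 1.0],  # score -1.5
    [0.5, 0.0, 0.0],  # score 0.5
]

chosen = select_for_labelling(weights, bias, pool, k=2)
print(chosen)  # indices of the two instances to send for manual labelling
```

    In a full active learning loop, the selected instances would be labelled manually, added to the training set, and the classifier retrained before the next selection round.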
    <Paragraph position="4"> Another interesting line of future work is exploring the advantage of using mixed-language corpora to learn question classification. The Romance languages, such as Italian, French and Spanish, have stems in common.</Paragraph>
    <Paragraph position="5"> It is therefore plausible that questions from several languages may help to train a classifier for a different language. The advantage of this idea would be the availability of larger corpora for languages for which a large enough corpus is not available, which would benefit languages that are under-represented on the Internet. We could circumvent this lack of presence on the Internet for some languages by using information available in other, better represented languages.</Paragraph>
  </Section>
</Paper>