<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0703">
  <Title>Question Pre-Processing in a QA System on Internet Discussion Groups</Title>
  <Section position="8" start_page="22" end_page="22" type="concl">
    <SectionTitle>
5 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper proposes question pre-processing methods for a FAQ-style QA system on Internet discussion groups. For a posting that already exists in, or is being submitted to, a discussion group, garbage texts are removed first, and then the different questions in the posting are identified so that each can be compared with other questions individually.</Paragraph>
    <Paragraph position="1"> An expanded list of garbage keywords is used to detect garbage texts. If a garbage keyword appears in a sentence fragment and the fragment is shorter than the length threshold assigned to the class of that keyword, the fragment is judged to be garbage text. This method achieves 92.57% accuracy on the test set, which suggests that a small set is sufficient to collect all classes of garbage keywords.</Paragraph>
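    The keyword-plus-length rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `is_garbage`, the example keywords, their classes, and the threshold values are all hypothetical, and fragment length is assumed to be measured in characters.

```python
# Hypothetical keyword-to-class map and per-class length thresholds.
# None of these values come from the paper; they only illustrate the rule.
KEYWORD_CLASS = {"thanks": "gratitude", "hello": "greeting"}
CLASS_THRESHOLD = {"gratitude": 20, "greeting": 15}


def is_garbage(fragment: str) -> bool:
    """Return True if the fragment contains a garbage keyword and is
    shorter than the length threshold of that keyword's class."""
    text = fragment.lower()
    for keyword, cls in KEYWORD_CLASS.items():
        # Assumption: length is counted in characters.
        if keyword in text and CLASS_THRESHOLD[cls] > len(fragment):
            return True
    return False
```

    With these illustrative values, a short fragment such as "Thanks a lot!" is flagged, while a longer sentence that merely contains "thanks" is kept because it exceeds the threshold of its class.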
    <Paragraph position="2"> In question segmentation, sentence fragments in interrogative form are treated as question fragments. In addition, repeated fragments are removed, and fragments of the same question type are merged into one fragment. The overall accuracy is 85.87% on the test set.</Paragraph>
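    The three segmentation steps summarized above (keep interrogative fragments, remove repeats, merge by question type) can be sketched as follows. This is only a rough sketch: the names `segment_questions` and `toy_type` are hypothetical, the toy classifier stands in for whatever interrogative-form detection the system actually uses, and joining merged fragments with a space is an assumption.

```python
from typing import Callable, Optional


def segment_questions(
    fragments: list[str],
    question_type: Callable[[str], Optional[str]],
) -> list[str]:
    """Keep interrogative fragments, drop repeats, merge by question type."""
    merged: dict[str, list[str]] = {}
    seen: set[str] = set()
    for fragment in fragments:
        qtype = question_type(fragment)  # None means "not interrogative"
        if qtype is None or fragment in seen:
            continue
        seen.add(fragment)
        merged.setdefault(qtype, []).append(fragment)
    # One merged fragment per question type, in order of first appearance.
    return [" ".join(parts) for parts in merged.values()]


def toy_type(fragment: str) -> Optional[str]:
    """Hypothetical classifier: a trailing '?' marks a question."""
    if not fragment.endswith("?"):
        return None
    return "how" if fragment.lower().startswith("how") else "other"
```

    Passing the classifier in as a parameter keeps the sketch independent of any particular language, since the rules for recognizing interrogative forms are language-specific.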
    <Paragraph position="3"> In the future, the performance of a QA system with and without question pre-processing will be evaluated to verify the value of pre-processing.</Paragraph>
    <Paragraph position="4"> New methods for creating the list of garbage keywords more automatically should be studied, as well as the automatic assignment of length thresholds to the classes of garbage keywords.</Paragraph>
    <Paragraph position="5"> New features should also be identified in order to segment questions more accurately.</Paragraph>
    <Paragraph position="6"> Although the strategies and the thresholds were developed from experimental data in Chinese, many of them are language-independent or can be adapted to other languages with little effort.</Paragraph>
  </Section>
</Paper>