File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-0412_concl.xml

Size: 2,349 bytes

Last Modified: 2025-10-06 13:54:09

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0412">
  <Title>Non-Contiguous Word Sequences for Information Retrieval</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have introduced a new type of phrase to the problem of information retrieval, and we have developed and presented a method for using maximal frequent sequences (MFS) in information retrieval. Using the INEX document collection, we compared it to a well-known state-of-the-art technique. Our technique outperformed statistical phrases, which are reported in the literature to perform comparably to syntactic and linguistic phrases.</Paragraph>
    <Paragraph position="1"> These results are due to allowing a gap between the words that form a sequence, which offers a more realistic model of natural language. Furthermore, the number of phrases to index is rather small. A weak spot is the greedy algorithm used to extract MFS. However, many improvements are under way on this side, and the partition-join technique mentioned in subsection 4.1 already permits good approximations to be extracted efficiently.</Paragraph>
    <Paragraph position="2"> Our results confirm that the best improvements are obtained at the highest levels of recall. MFS would therefore be most useful for exhaustive information needs, that is, cases where no relevant information should be missed and 100% recall should be reached in a minimal number of hits (their inner ordering being a less serious matter). Typical examples of such information needs lie in the judicial domain and in patent searching.</Paragraph>
    <Paragraph position="3"> More experiments remain to be done to find out whether similar improvements can be obtained on other document collections. The INEX collection consists of scientific articles and consistently uses a terminology of its own. Whether similar performance would be observed on a more general document collection, such as newspaper articles, remains to be verified.</Paragraph>
    <Paragraph position="4"> Phrases are used in many languages, which makes us optimistic about applying this work to multilingual document corpora. Compared with the other techniques, the gap should give us robustness against the challenges of multilingualism.</Paragraph>
  </Section>
</Paper>