File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-0412_abstr.xml
Size: 1,403 bytes
Last Modified: 2025-10-06 13:43:42
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0412"> <Title>Non-Contiguous Word Sequences for Information Retrieval</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The growing amount of textual information available electronically has increased the need for high-performance retrieval. The use of phrases has long been seen as a natural way to improve retrieval performance over the common document models that ignore the sequential aspect of word occurrences in documents, considering them as &quot;bags of words&quot;. However, both statistical and syntactical phrases have shown disappointing results for large document collections. In this paper we present a recent type of multi-word expression in the form of Maximal Frequent Sequences (Ahonen-Myka, 1999).</Paragraph> <Paragraph position="1"> These are mined phrases rather than statistical or syntactical phrases. Their main strengths are that they form a very compact index and that they account for the sequentiality and adjacency of meaningful word co-occurrences by allowing for a gap between words.</Paragraph> <Paragraph position="2"> We introduce a method for using these phrases in information retrieval and present our experiments. They show a clear improvement over the well-known technique of extracting frequent word pairs.</Paragraph> </Section> </Paper>