<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2069"> <Title>Examining the Content Load of Part of Speech Blocks for Information Retrieval</Title> <Section position="4" start_page="531" end_page="531" type="intro"> <SectionTitle> 2 Related Studies </SectionTitle> <Paragraph position="0"> We examine the distribution of POS blocks in language. This is only one of several kinds of language distribution analysis. One can also examine the distribution of character or word n-grams, e.g. in language modeling (Croft and Lafferty, 2003), of phrases (Church and Hanks, 1990; Lewis, 1992), and so on. In class-based n-gram modeling (Brown et al., 1992), for example, class-based n-grams are used to determine the probability of occurrence of a POS class given its preceding classes, and the probability of a particular word given its own POS class. Unlike the class-based n-gram model, we do not use POS blocks to make predictions. We estimate their probability of occurrence as blocks, not the individual probabilities of their components, motivated by the intuition that the more frequently a POS block occurs, the more content it bears. In the context of IR, efforts have been made to use syntactic information to enhance retrieval (Smeaton, 1999; Strzalkowski, 1996; Zukerman and Raskutti, 2002), but not by using POS block-based distribution representations.</Paragraph> </Section> </Paper>