<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1105"> <Title>Discriminative Power and Retrieval Effectiveness of Phrasal Indexing Terms</Title> <Section position="2" start_page="672" end_page="672" type="metho"> <SectionTitle> 2. NTCIR Data Analysis </SectionTitle> <Paragraph position="0"> Greiff presented an analysis of TREC data that plots each query term according to its distribution in the whole document collection and in the relevant document sets (Greiff, 1998), and Pickens et al. applied this analysis to statistical phrases (Pickens et al., 2000). Adopting their plotting approach, we try to clarify the distribution characteristics of phrasal terms, using mainly p(occ|rel) and p(occ). These are computed as the document frequency of the term in the relevant documents and in the whole collection, respectively, each divided by the corresponding number of documents.</Paragraph> <Section position="1" start_page="672" end_page="672" type="sub_section"> <SectionTitle> 2.1. Occurrence in Relevant Documents and in Non-relevant Documents </SectionTitle> <Paragraph position="0"> Table 2 and Table 3 show terms with high document frequency extracted from the short query set of the test topics.</Paragraph> <Paragraph position="1"> A short query is a query constructed using only the &lt;description&gt; field of the topic description, and a long query is one constructed using all fields of the topic description. First, plotting p(occ|non-rel) as a function of p(occ) is not interesting, since the relation p(occ|non-rel) ≈ p(occ) is observed.</Paragraph> <Paragraph position="2"> This is not surprising: the number of relevant documents is generally very small, so p(occ|non-rel) can be approximated by p(occ).</Paragraph> <Paragraph position="3"> From Table 2 and Table 3, we can infer that the distribution characteristics of phrasal terms are almost the same as those of single words, i.e. 
a Zipfian distribution, but the document frequencies of phrasal terms are much smaller than those of single words.</Paragraph> <Paragraph position="4"> It is difficult to get a clear intuition about term distribution characteristics from Figure 1, where p(occ|rel) is plotted as a function of p(occ). Where some frequent terms in the plots share the same p(occ) value, this indicates multiple occurrences of one term in different queries.</Paragraph> <Paragraph position="5"> As Greiff suggests, a different visualization is desirable for this graph.</Paragraph> <Paragraph position="7"> [Figure panels. Upper left: short query single words; upper right: short query phrases; lower left: long query single words; lower right: long query phrases.] First, p(occ) is replaced by log(O(occ)) = log(p(occ)/(1-p(occ))), since the distribution of p(occ) is too skewed. In Figure 1, if the dot representing a term is located above the line p(occ)=p(occ|rel), the term can be a good discriminator and should contribute to retrieval performance given an adequate weighting scheme. On the other hand, terms plotted below the line p(occ)=p(occ|rel) are by no means useful for retrieval performance, irrespective of the weighting scheme.</Paragraph> <Paragraph position="8"> p(occ|rel) is replaced by log(p(occ|rel)/p(occ)) in order to illustrate this borderline. When p(occ|rel) is zero, -6 is assigned to log(p(occ|rel)/p(occ)).</Paragraph> <Paragraph position="9"> This quantity is equivalent to the mutual information MI(occ;rel) in information theory, as follows: log(p(occ|rel)/p(occ)) = log(p(occ,rel)/(p(occ)p(rel))). Finally, Figure 2 illustrates the distribution characteristics of terms much better than Figure 1.</Paragraph> <Paragraph position="10"> As this shows, single words and phrases have very similar distribution characteristics, but document frequencies for phrases are much lower. 
The average of log(O(occ)) is -5.22 for single words and -8.64 for phrases in long queries.</Paragraph> <Paragraph position="11"> On the other hand, the ratios of good terms, i.e. those whose log(p(occ|rel)/p(occ)) is larger than 0, are shown in Table 4.</Paragraph> <Paragraph position="12"> From this observation, we can see the limited usefulness of phrasal terms with regard to relevance. The ratio of terms with positive log(p(occ|rel)/p(occ)) is lower for phrases than for single words. This explains the poor performance of precoordinated, longer-phrase-based indexing that uses phrases as replacements for single words. Phrasal terms tend to have high values of log(p(occ|rel)/p(occ)), but this does not necessarily mean that phrasal terms are effective. As Figure 1 and Figure 2 illustrate, terms with a high log(p(occ|rel)/p(occ)) value tend to have a low log(O(occ)), that is, an extremely low document frequency, so they are not very useful because of this low frequency.</Paragraph> </Section> <Section position="2" start_page="672" end_page="672" type="sub_section"> <SectionTitle> 2.2. Measures for Phrasal Term Effectiveness </SectionTitle> <Paragraph position="0"> Table 4 and Table 5 seem to support supplemental phrasal indexing, because a fairly high ratio of terms with positive log(p(occ|rel)/p(occ)) and a higher average value of log(p(occ|rel)/p(occ)) are observed. 
But for short queries, supplementing phrasal terms did not show any positive effect, as we have seen earlier. The following explanations can be offered.</Paragraph> <Paragraph position="1"> 1) Over-weighted phrasal terms may cause topic deviation from the concepts represented by single words to the concepts represented by phrasal terms.</Paragraph> <Paragraph position="2"> 2) Supplemental phrasal terms are not always informative, because their constituent single words are already indexed.</Paragraph> <Paragraph position="3"> Phrasal terms are effective in the case where the phrasal term AB has a high MI(AB,rel) value in contrast with MI(A,rel) and MI(B,rel).</Paragraph> <Paragraph position="4"> We consider a supplemental phrasal term informative if and only if its MI(occ,rel) is positive and higher than the sum of the MI(occ,rel) values of its constituent single words, in view of the query and the relevance judgements. That a phrase &quot;AB&quot; is informative means that the occurrence of the phrase &quot;AB&quot; gives more information about relevance than the occurrence of both single words &quot;A&quot; and &quot;B&quot;.</Paragraph> <Paragraph position="5"> Table 6 shows the number and the ratio of informative phrasal terms. -1 is assigned to MI(occ,rel) when p(occ|rel) is 0.</Paragraph> <Paragraph position="6"> Assigning different values (-3 and -6) to MI(occ,rel) when p(occ|rel)=0 did not change the results.</Paragraph> </Section> <Section position="3" start_page="672" end_page="672" type="sub_section"> <SectionTitle> 2.3. Three Categories of Phrasal Terms </SectionTitle> <Paragraph position="0"> From the previous discussion, we propose the following three categories of phrasal terms in view of their possible contribution to retrieval effectiveness.</Paragraph> </Section> <Section position="4" start_page="672" end_page="672" type="sub_section"> <SectionTitle> 2.4. 
Weight Ratio of Phrasal Terms </SectionTitle> <Paragraph position="0"> Retrieval status values are computed as a linear combination of term weights, where each term weight is the product of the query weight and the document weight of the term. Using atn weighting in the SMART system with the same settings as the runs reported in Table 1, the sum of the query term weights is computed for each query, and the ratios of informative phrasal terms and of destructive phrasal terms within each query weight sum are also computed. Macro-averaged ratios of informative phrasal terms and destructive phrasal terms are shown in Table 8.</Paragraph> <Paragraph position="1"> Still, short queries seem to contain a better ratio of phrases, despite the fact that no consistent effect on retrieval performance is observed.</Paragraph> <Paragraph position="2"> 2.5. Correlation between phrasal term weight ratio and performance difference For each run against the 53 test topics, with both short queries and long queries, the correlation between the query-by-query performance difference and the query-by-query weight ratios of informative and destructive phrasal terms is examined. 
Performance difference is measured by non-interpolated average precision, and a positive value is given when the supplemental phrasal term run performs better, as we have seen in Table 1.</Paragraph> <Paragraph position="3"> Table 9 shows Pearson's correlation coefficient between the performance difference and each weight ratio, as well as between the performance difference and the difference between the weight ratios.</Paragraph> <Paragraph position="4"> As expected, a positive correlation coefficient for informative phrasal terms and a negative correlation coefficient for destructive phrasal terms are observed, although the coefficient values are very small.</Paragraph> <Paragraph position="5"> Given a topic set, a document collection and relevance judgements, we are able to know which terms are good (and possibly how good they are) for retrieval performance, but explaining a slight performance difference between different indexing strategies seems to be much more difficult.</Paragraph> <Paragraph position="6"> Short queries contain relatively better phrasal terms, even though the absolute number of such terms is smaller than in long queries. But using such phrasal terms does not always lead to performance improvement in evaluation based on macro-averaged precision and recall.</Paragraph> </Section> </Section> <Section position="3" start_page="672" end_page="672" type="metho"> <SectionTitle> 3. 
Topic Deviation </SectionTitle> <Paragraph position="0"> What we mean by topic deviation is a phenomenon that is similar to the query drift caused by relevance feedback, but is incurred by over-weighted supplemental phrasal terms.</Paragraph> <Paragraph position="1"> When terms representing some of the concepts in a topic are over-weighted, the search results are biased towards these concepts.</Paragraph> <Paragraph position="2"> We examined the short queries where supplemental phrasal terms caused considerable degradation (a difference in average precision of more than 20%) and listed the phrasal terms that caused such degradation in Table 10.</Paragraph> <Paragraph position="3"> As we can see, degradation was caused not only by the neutral phrases in topics 50, 62 and 77, but also by adding only informative phrases, as in topic 76.</Paragraph> <Paragraph position="4"> The &lt;description&gt; field of topic 76 is translated as follows: &quot;(I want to know about) methods for interference detection between polyhedral representations.&quot; This topic consists of two concepts, namely &quot;interference detection&quot; and &quot;polyhedral representation&quot;, and the supplemented phrasal term (&quot;between polyhedral&quot; in translation) is part of the second concept.</Paragraph> <Paragraph position="5"> Retrieval effectiveness depends on a subtle balance of weighting across the concepts, especially in short queries, and redundant or over-weighted terms cause the scoring function to lose this balance.</Paragraph> <Paragraph position="6"> Conclusions Effects of phrasal indexing with respect to query length were observed in experiments using NACSIS test collection 1, the first large-scale test collection for Japanese information retrieval.</Paragraph> <Paragraph position="7"> Our observations and conclusions are as follows: 1) The distribution characteristics of phrasal terms and of single word terms are examined by plotting each term's MI(occ,rel) as a function of log(O(occ)).</Paragraph> 
<Paragraph position="8"> 2) The distribution characteristics of phrasal terms are similar to those of single word terms, but their frequencies are much smaller. 3) Generally, phrasal terms are as good discriminators of relevant documents as single words, if not better.</Paragraph> <Paragraph position="9"> 4) In supplemental phrasal indexing, good discriminator terms are not always effective for retrieval performance; only some phrasal terms are informative and possibly effective. 5) Informative, neutral and destructive phrasal terms are defined by means of MI(occ,rel).</Paragraph> <Paragraph position="10"> 6) The correlation between performance difference and the weight ratio of informative/destructive terms is examined, and a very weak correlation is observed.</Paragraph> <Paragraph position="11"> 7) Explaining the effectiveness of each query term is not sufficient for explaining the effectiveness of phrasal indexing. Even good discriminator terms may hurt retrieval effectiveness.</Paragraph> <Paragraph position="12"> This research is by no means conclusive; it is the starting point of a longer project that will hopefully lead to a new weighting scheme to replace the current empirical down-weighting approach for supplemental phrasal terms.</Paragraph> </Section> </Paper>