Orthogonal Negation in Vector Spaces for Modelling Word-Meanings and Document Retrieval

2 Negation and Disjunction in Vector Spaces

In this section we use well-known linear algebra to define vector negation in terms of orthogonality, and disjunction as the linear sum of subspaces. The mathematical apparatus is covered in greater detail in (Widdows and Peters, 2003). If A is a set (in some universe of discourse U), then 'NOT A' corresponds by definition to the set complement $A'$ of A in U. By a simple analogy, let A be a vector subspace of a vector space V (equipped with a scalar product). Then the concept 'NOT A' should correspond to the orthogonal complement $A^\perp$ of A under the scalar product (Birkhoff and von Neumann, 1936, §6). If we think of a basis for V as a set of features, this says that 'NOT A' refers to the subspace of V which has no features in common with A.

We make the following definitions. Let V be a (real) vector space equipped with a scalar product. We will use the notation $A \leq V$ to mean "A is a vector subspace of V." For $A \leq V$, define the orthogonal subspace $A^\perp$ to be the subspace

$$A^\perp \equiv \{ v \in V : \forall a \in A,\ a \cdot v = 0 \}.$$

For the purposes of modelling word-meanings, we might think of 'orthogonal' as a model for 'completely unrelated' (having similarity score zero). This makes perfect sense for information retrieval, where we assume (for example) that if two words never occur in the same document then they have no features in common.

Definition 1. Let $a, b \in V$ and $A, B \leq V$. By NOT A we mean $A^\perp$, and by NOT a we mean $\langle a \rangle^\perp$, where $\langle a \rangle = \{ \lambda a : \lambda \in \mathbb{R} \}$ is the 1-dimensional subspace generated by a. By a NOT B we mean the projection of a onto $B^\perp$, and by a NOT b we mean the projection of a onto $\langle b \rangle^\perp$.

We now show how to use these notions to perform calculations with individual term or query vectors in a form which is simple to program and efficient to run.

Theorem 1. Let $a, b \in V$. Then a NOT b is represented by the vector

$$a \;\mathrm{NOT}\; b \equiv a - \frac{a \cdot b}{|b|^2}\, b,$$

where $|b|^2 = b \cdot b$ is the squared modulus of b.

Proof. A simple proof is given in (Widdows and Peters, 2003).

For normalised vectors, Theorem 1 takes the particularly simple form

$$a \;\mathrm{NOT}\; b = a - (a \cdot b)\, b, \qquad (1)$$

which in practice is then renormalised for consistency. One computational benefit is that Theorem 1 gives a single vector for a NOT b, so finding the similarity between any other vector and a NOT b is just a single scalar product computation.
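To make the computation concrete, here is a minimal numpy sketch of Theorem 1 (an illustration, not code from the paper; the vector values are invented):

```python
import numpy as np

def negate(a, b):
    """'a NOT b' (Theorem 1): project a onto the orthogonal complement
    of b, then renormalise for consistency."""
    result = a - (np.dot(a, b) / np.dot(b, b)) * b
    return result / np.linalg.norm(result)

# Invented toy term vectors for illustration.
suit = np.array([0.7, 0.5, 0.5])
lawsuit = np.array([0.9, 0.1, 0.1])

query = negate(suit, lawsuit)
assert abs(np.dot(query, lawsuit)) < 1e-12  # orthogonal to the negated term

# Scoring a document against 'suit NOT lawsuit' is one scalar product.
doc = np.array([0.2, 0.6, 0.6])
print(np.dot(doc, query))
```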
Disjunction is also simple to envisage, the expression $b_1 \;\mathrm{OR}\; \dots \;\mathrm{OR}\; b_n$ being modelled by the subspace

$$B = \{ \lambda_1 b_1 + \dots + \lambda_n b_n : \lambda_i \in \mathbb{R} \}.$$

Theoretical motivation for this formulation can be found in (Birkhoff and von Neumann, 1936, §1, §6) and (Widdows and Peters, 2003): for example, B is the smallest subspace of V which contains the set $\{b_j\}$.

Computing the similarity between a vector a and this subspace B is computationally more expensive than for the negation of Theorem 1, because the scalar product of a with (up to) n vectors in an orthogonal basis for B must be computed. Thus the gain we get by comparing each document with the query a NOT b using only one scalar product operation is absent for disjunction.

However, this benefit is regained in the case of negated disjunction. Suppose we negate not only one argument but several. If a user specifies that they want documents related to a but not $b_1, b_2, \dots, b_n$, then (unless otherwise stated) it is clear that they want only those documents related to none of the unwanted terms $b_i$ (rather than, say, the average of these terms). This motivates a process which can be thought of as a vector formulation of the classical de Morgan equivalence $\neg a \wedge \neg b \longleftrightarrow \neg(a \vee b)$, by which the expression a AND NOT $b_1$ AND NOT $b_2$ ... AND NOT $b_n$ is translated to

$$a \;\mathrm{NOT}\; (b_1 \;\mathrm{OR}\; \dots \;\mathrm{OR}\; b_n). \qquad (2)$$

Using Definition 1, this expression can be modelled with a unique vector which is orthogonal to all of the unwanted arguments $\{b_i\}$. However, unless the vectors $b_1, \dots, b_n$ are orthogonal (or identical), we need to obtain an orthogonal basis for the subspace $b_1 \;\mathrm{OR}\; \dots \;\mathrm{OR}\; b_n$ before we can implement a higher-dimensional version of Theorem 1. This is because the projection operators involved are in general non-commutative, one of the hallmark differences between Boolean and quantum logic.

In this way vector negation generates a meaning-vector which takes into account the similarities and differences between the negative terms. A query for chip NOT computer, silicon is treated differently from a query for chip NOT computer, potato. Vector negation is capable of realising that for the first query, the two negative terms refer to the same general topic area, whereas in the second case the task is to remove radically different meanings from the query. This technique has been used to remove several meanings from a query iteratively, allowing a user to 'home in on' the desired meaning by systematically pruning away unwanted features.
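As a sketch of how Equation (2) might be implemented (my own illustration of the procedure described above; names and the tolerance are assumptions): first build an orthogonal basis for the unwanted subspace by Gram-Schmidt, then remove the component of a along each basis vector, so the non-commutativity of the individual projections no longer matters.

```python
import numpy as np

def negated_disjunction(a, negations, tol=1e-10):
    """'a NOT (b1 OR ... OR bn)': project a onto the orthogonal
    complement of the subspace spanned by the unwanted vectors."""
    basis = []
    # Gram-Schmidt: orthogonalise the b_i first, because the individual
    # projection operators do not commute in general.
    for b in negations:
        for u in basis:
            b = b - np.dot(b, u) * u
        norm = np.linalg.norm(b)
        if norm > tol:  # skip vectors already in the span of earlier ones
            basis.append(b / norm)
    # Subtract the component of a lying in the unwanted subspace.
    for u in basis:
        a = a - np.dot(a, u) * u
    return a / np.linalg.norm(a)

# e.g. chip NOT (computer OR silicon), with invented vectors:
chip = np.array([0.6, 0.6, 0.5])
computer = np.array([0.9, 0.1, 0.2])
silicon = np.array([0.8, 0.3, 0.1])
print(negated_disjunction(chip, [computer, silicon]))
```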
2.1 Initial experiments modelling word-senses

Our first experiments with vector negation were to determine whether the negation operator could find different senses of ambiguous words by negating a word closely related to one of the meanings. A vector space model was built using Latent Semantic Analysis, similar to the systems of (Landauer and Dumais, 1997; Schütze, 1998). The effect of LSA is to increase linear dependency between terms, and for this reason it is likely that LSA is a crucial step in our approach. Terms were indexed depending on their co-occurrence with 1000 frequent "content-bearing words" in a 15-word context-window, giving each term 1000 coordinates. This was reduced to 100 dimensions using singular value decomposition. Document vectors were then assigned in the usual manner by summation of term vectors using tf-idf weighting (Salton and McGill, 1983, p. 121). Vectors were normalised, so that the standard (Euclidean) scalar product and cosine similarity coincided. This scalar product was used as a measure of term-term and term-document similarity throughout our experiments. This method was used because it has been found to be effective at producing good term-term similarities for word-sense disambiguation (Schütze, 1998) and automatic lexical acquisition (Widdows, 2003), and these similarities were used to generate interesting queries and to judge the effectiveness of different forms of negation. More details on the building of this vector space model can be found in (Widdows, 2003; Widdows and Peters, 2003).

    suit                      suit NOT lawsuit
    suit         1.000000     pants        0.810573
    lawsuit      0.868791     shirt        0.807780
    suits        0.807798     jacket       0.795674
    plaintiff    0.717156     silk         0.781623
    sued         0.706158     dress        0.778841
    plaintiffs   0.697506     trousers     0.771312
    suing        0.674661     sweater      0.765677
    lawsuits     0.664649     wearing      0.764283
    damages      0.660513     satin        0.761530
    filed        0.655072     plaid        0.755880
    behalf       0.650374     lace         0.755510
    appeal       0.608732     worn         0.755260

    Terms related to 'suit NOT lawsuit' (NYT data)

    play                      play NOT game
    play         1.000000     play         0.779183
    playing      0.773676     playing      0.658680
    plays        0.699858     role         0.594148
    played       0.684860     plays        0.581623
    game         0.626796     versatility  0.485053
    offensively  0.597609     played       0.479669
    defensively  0.546795     roles        0.470640
    preseason    0.544166     solos        0.448625
    midfield     0.540720     lalas        0.442326
    role         0.535318     onstage      0.438302
    tempo        0.504522     piano        0.438175
    score        0.475698     tyrone       0.437917

    Terms related to 'play NOT game' (NYT data)

Two early results using negation to find senses of ambiguous words are given in Table 1, showing that vector negation is very effective for removing the 'legal' meaning from the word suit and the 'sporting' meaning from the word play, leaving respectively the 'clothing' and 'performance' meanings. Note that removing a particular word also removes concepts related to the negated word. This gives credence to the claim that our mathematical model is removing the meaning of a word, rather than just a string of characters. This encouraged us to set up a larger-scale experiment to test this hypothesis, which is described in Section 4.

3 Other forms of Negation in IR

There have been rigorous studies of Boolean operators for information retrieval, including the p-norms of Salton et al. (1983) and the matrix forms of Turtle and Croft (1989), which have focussed particularly on mathematical expressions for conjunction and disjunction. However, typical forms of negation (such as NOT $p = 1 - p$) have not taken into account the relationship between the negated argument and the rest of the query.

Negation has been used in two main forms in IR systems: for the removal of unwanted documents after retrieval, and for negative relevance feedback. We describe these methods and compare them with vector negation.

3.1 Negation by filtering results after retrieval

A traditional Boolean search for documents related to the query a NOT b would return simply those documents which contain the term a and do not contain the term b. More formally, let D be the document collection and let $D_i \subseteq D$ be the subset of documents containing the term i. Then the results of the Boolean query a NOT b would be the set $D_a \cap D'_b$, where $D'_b$ is the complement of $D_b$ in D. Variants of this are used within a vector model, by using vector retrieval to retrieve a (ranked) set of relevant documents and then 'throwing away' documents containing the unwanted terms (Salton and McGill, 1983, p. 26). This paper will refer to such methods under the general heading of 'post-retrieval filtering'.
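A minimal sketch of post-retrieval filtering as just described (illustrative only; the document representation, with a vector and a set of terms, is an assumption):

```python
import numpy as np

def filtered_search(query_vec, docs, unwanted_term, top_k=10):
    """Post-retrieval filtering: rank documents against the positive
    query alone, then discard any document containing the unwanted
    term, however incidental its occurrence."""
    ranked = sorted(docs, key=lambda d: -np.dot(d["vector"], query_vec))
    return [d for d in ranked if unwanted_term not in d["terms"]][:top_k]
```

A long document containing a single incidental occurrence of the unwanted term is discarded outright, which is exactly the lack of principle criticised below.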
There are at least three reasons for preferring vector negation to post-retrieval filtering. Firstly, post-retrieval filtering is not very principled and is subject to error: for example, it would remove a long document containing only one instance of the unwanted term.

One might argue here that if a document containing unwanted terms is given a 'negative score' rather than just disqualified, this problem is avoided. This would leave us considering a combined score, $\mathrm{sim}(d, a \;\mathrm{NOT}\; b) = d \cdot a - \lambda\, d \cdot b$, for some parameter $\lambda$. However, since this is the same as $d \cdot (a - \lambda b)$, it is computationally more efficient to treat $a - \lambda b$ as a single vector. This is exactly what vector negation accomplishes, and it also determines a suitable value of $\lambda$ from a and b. Thus a second benefit of vector negation is that it produces a combined vector for a NOT b which enables the relevance score of each document to be computed using just one scalar product operation.

The third gain is that vector retrieval proves to be better at removing not only an unwanted term but also its synonyms and related words (see Section 4), which is clearly desirable if we wish to remove not only a string of characters but the meaning represented by this string.

3.2 Negative relevance feedback

Relevance feedback has been shown to improve retrieval (Salton and Buckley, 1990). In this process, documents judged to be relevant have (some multiple of) their document vector added to the query, and documents judged to be non-relevant have (some multiple of) their document vector subtracted from the query, producing a new query according to the formula

$$Q_{i+1} = \alpha Q_i + \beta \frac{1}{|D_i^{+}|} \sum_{d \in D_i^{+}} d \;-\; \gamma \frac{1}{|D_i^{-}|} \sum_{d \in D_i^{-}} d,$$

where $Q_i$ is the i-th query vector, $D_i$ is the set of documents returned by $Q_i$, which has been partitioned into relevant ($D_i^{+}$) and non-relevant ($D_i^{-}$) subsets, and $\alpha, \beta, \gamma \in \mathbb{R}$ are constants. Salton and Buckley (1990) report best results using $\beta = 0.75$ and $\gamma = 0.25$.
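A compact sketch of this feedback step (illustrative; the function and variable names are my own, with the constants reported above as defaults):

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """One round of relevance feedback: move the query towards the
    centroid of the relevant documents and (for gamma > 0) away from
    the centroid of the non-relevant ones."""
    new_query = alpha * query
    if relevant:
        new_query = new_query + beta * np.mean(relevant, axis=0)
    if nonrelevant:
        new_query = new_query - gamma * np.mean(nonrelevant, axis=0)
    return new_query
```

Note that the amount subtracted is a fixed multiple of the unwanted centroid, independent of how close that centroid lies to the query; this is the behaviour analysed next.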
The positive feedback part of this process has become standard in many search engines, with options such as "More documents like this" or "Similar pages". The subtraction option (called 'negative relevance feedback') is much rarer. A widely held opinion is that negative feedback is liable to harm retrieval, because it may move the query away from relevant as well as non-relevant documents (Kowalski, 1997, p. 160).

The concepts behind negative relevance feedback are discussed instructively by Dunlop (1997). Negative relevance feedback introduces the idea of subtracting an unwanted vector from a query, but gives no general method for deciding "how much to subtract". We shall refer to such methods as 'Constant Subtraction'. Dunlop (1997, p. 139) gives an analysis which leads to a very intuitive reason for preferring vector negation over constant subtraction. If a user removes an unwanted term which the model deems to be closely related to the desired term, this should have a strong effect, because there is a significant 'difference of opinion' between the user and the model. (From an even more informal point of view, why would anyone take the trouble to remove a meaning that isn't there anyway?) With any kind of constant subtraction, however, the removal of distant points has a greater effect on the final query statement than the removal of nearby points.

Vector negation corrects this intuitive mismatch. Recall from Equation 1 that (using normalised vectors for simplicity) the vector a NOT b is given by $a - (a \cdot b)\, b$. The similarity of a with a NOT b is therefore

$$a \cdot (a - (a \cdot b)\, b) = 1 - (a \cdot b)^2.$$

The closer a and b are, the greater the $(a \cdot b)^2$ factor becomes, so the similarity of a with a NOT b becomes smaller the closer a is to b. This coincides exactly with Dunlop's intuitive view: removing a concept which in the model is very close to the original query has a large effect on the outcome.
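A quick numerical check of this identity (with invented vectors):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

a = unit(np.array([1.0, 1.0, 0.0]))
for b in (unit(np.array([1.0, 0.0, 0.0])),   # close to a
          unit(np.array([0.0, 0.0, 1.0]))):  # unrelated to a
    a_not_b = a - np.dot(a, b) * b           # Equation 1, unrenormalised
    print(np.dot(a, a_not_b), 1.0 - np.dot(a, b) ** 2)  # equal values
```

(If a NOT b is renormalised, the similarity becomes $\sqrt{1 - (a \cdot b)^2}$ instead, which decreases with $|a \cdot b|$ in the same way.)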