<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1202"> <Title>The Distributional Similarity of Sub-Parses</Title> <Section position="3" start_page="7" end_page="7" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> One well-studied approach to the identification of paraphrases is to employ a lexical similarity function. As noted by Barzilay and Elhadad (2003), even a lexical function that simply computes word overlap can accurately select paraphrases. The problem with such a function lies not in the accuracy of the paraphrases it selects but in its low recall. One popular way of improving recall is to relax the requirement that words in the two sentences be identical in form to the weaker requirement that they be identical or similar in meaning. Methods for finding the semantic similarity of two words can be broadly divided into those that use lexical resources, e.g., WordNet (Fellbaum, 1998), and those that use a distributional similarity measure (see Weeds (2003) for a review of distributional similarity measures). Both Jijkoun and de Rijke (2005) and Herrera et al. (2005) show how such a measure of lexical semantic similarity might be incorporated into a system for recognising textual entailment between sentences.</Paragraph> <Paragraph position="1"> Previous work on the NatHab project (Weeds et al., 2004) used such an approach to extend lexical coverage. Each word uttered by the user was mapped to a set of candidate words in a core lexicon, identified using a measure of distributional similarity. For example, the word send is used when talking about printing or about emailing, and a good measure of lexical similarity would identify both of these conceptual services as candidates. The best candidate was then chosen by optimising the match between grammatical dependency relations and paths in the ontology over the entire sentence. 
For example, an indirect-object relation between the verb send and a printer can be mapped to the path in the ontology relating a print request to its target printer.</Paragraph> <Paragraph position="2"> As well as lexical variation, our previous work (Weeds et al., 2004) allowed a certain amount of syntactic variation through its use of grammatical dependencies and policy templates. For example, the passive &quot;paraphrase&quot; of a sentence can be identified by comparing the sets of grammatical dependency relations produced by a shallow parser such as the RASP parser (Briscoe and Carroll, 1995).</Paragraph> <Paragraph position="3"> In other words, by looking at grammatical dependency relations, we can identify that &quot;John is liked by Mary,&quot; is a paraphrase of &quot;Mary likes John,&quot; and not of &quot;John likes Mary.&quot; Further, where only a limited number of sentence styles occur, we can manually identify and list other templates for matches over the trees or sets of dependency relations. For example, &quot;If C1 then C2&quot; is equivalent to &quot;C2 if C1&quot;.</Paragraph> <Paragraph position="4"> However, the limitations of this approach, which combines lexical variation, grammatical dependency relations and template matching, become increasingly obvious as one tries to scale up. As noted by Herrera et al. (2005), similarity at the word level is not required for similarity at the phrasal level. For example, in the context of our project, the phrases &quot;if my mobile phone needs charging&quot; and &quot;if my mobile phone battery is low&quot; have the same intended meaning, but it is not possible to obtain the second by substituting similar words in the first. It appears that &quot;X needs charging&quot; and &quot;battery (of X) is low&quot; have roughly similar meanings without their component words having similar meanings. Further, this does not appear to be due to either phrase being non-compositional. 
As noted by Pearce (2001), it is not possible to substitute similar words within non-compositional collocations. In this case, however, both phrases appear to be compositional; words cannot be substituted between the two phrases because the phrases are composed in different ways.</Paragraph> </Section> </Paper>
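The two lexical approaches described in this section, word overlap and distributional similarity, can be illustrated with a minimal Python sketch. Word overlap is computed here as the Jaccard overlap of word types, and distributional similarity as the cosine of context-count vectors; cosine is only one of the measures reviewed by Weeds (2003) and is not claimed to be the measure used in the NatHab work. The context counts, relation labels and function names (`word_overlap`, `cosine`, `candidates`, `CONTEXTS`) are hypothetical toy data chosen purely for illustration.

```python
import math
from collections import Counter


def word_overlap(s1: str, s2: str) -> float:
    """Jaccard overlap of the word types in two sentences (a simple lexical similarity function)."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse context-count vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: counts of (grammatical relation, co-occurring word) contexts.
CONTEXTS = {
    "send": Counter({"obj:document": 3, "obj:message": 4, "to:printer": 2, "to:colleague": 3}),
    "print": Counter({"obj:document": 5, "obj:page": 2, "to:printer": 4}),
    "email": Counter({"obj:message": 5, "obj:document": 1, "to:colleague": 4}),
    "charge": Counter({"obj:phone": 4, "obj:battery": 3}),
}


def candidates(word: str, core_lexicon: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Map a user's word to its k most distributionally similar core-lexicon words."""
    scored = [(c, cosine(CONTEXTS[word], CONTEXTS[c])) for c in core_lexicon]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(word_overlap("Mary likes John", "John likes Mary"))  # 1.0
print([w for w, _ in candidates("send", ["print", "email", "charge"])])  # ['email', 'print']
```

With these toy counts, send is ranked closest to email and then to print, mirroring the printing/emailing example, while charge shares no contexts with it and scores zero. The first call also shows a limitation of bag-of-words overlap: "Mary likes John" and "John likes Mary" receive a perfect score.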
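The comparison of grammatical dependency relations and the use of hand-listed templates can likewise be sketched in Python, under deliberately simplified assumptions. The relations below are hand-written, lemmatised (relation, head, dependent) triples standing in for shallow-parser output; the labels `subj`, `obj`, `passive_subj` and `by_agent`, the passive-to-active mapping and the regular-expression templates are all hypothetical and do not reproduce RASP's grammatical-relation scheme or the policy templates of Weeds et al. (2004).

```python
import re
from typing import FrozenSet, Iterable, Optional, Tuple

# A grammatical relation as (relation, head lemma, dependent lemma).
Relation = Tuple[str, str, str]

# Hypothetical labels: passive relations are rewritten as their active counterparts.
PASSIVE_TO_ACTIVE = {"passive_subj": "obj", "by_agent": "subj"}


def normalise(relations: Iterable[Relation]) -> FrozenSet[Relation]:
    """Rewrite passive relations so that active and passive sentences compare as sets."""
    return frozenset(
        (PASSIVE_TO_ACTIVE.get(rel, rel), head, dep) for rel, head, dep in relations
    )


def same_relations(r1: Iterable[Relation], r2: Iterable[Relation]) -> bool:
    """True if two sentences yield the same normalised set of relations."""
    return normalise(r1) == normalise(r2)


# Hand-written relations standing in for shallow-parser output.
MARY_LIKES_JOHN = {("subj", "like", "mary"), ("obj", "like", "john")}
JOHN_LIKES_MARY = {("subj", "like", "john"), ("obj", "like", "mary")}
JOHN_IS_LIKED_BY_MARY = {("passive_subj", "like", "john"), ("by_agent", "like", "mary")}


def conditional_parts(sentence: str) -> Optional[Tuple[str, str]]:
    """Match the templates 'If C1 then C2' and 'C2 if C1'; return (C1, C2) or None."""
    s = sentence.strip().rstrip(".").lower()
    m = re.fullmatch(r"if (.+?),? then (.+)", s)
    if m:
        return m.group(1), m.group(2)
    m = re.fullmatch(r"(.+?) if (.+)", s)
    if m:
        return m.group(2), m.group(1)
    return None


print(same_relations(JOHN_IS_LIKED_BY_MARY, MARY_LIKES_JOHN))  # True
print(same_relations(JOHN_IS_LIKED_BY_MARY, JOHN_LIKES_MARY))  # False
print(conditional_parts("If C1 then C2") == conditional_parts("C2 if C1"))  # True
```

After normalisation, set equality accepts the passive sentence as a paraphrase of the active sentence with the same roles and rejects the one with the roles reversed, even though all three sentences contain the same words. Each template must still be written by hand, which is why this approach is practical only when the number of sentence styles is limited.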