<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3206"> <Title>Scaling Web-based Acquisition of Entailment Relations</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Background and Motivations </SectionTitle> <Paragraph position="0"> This section provides a qualitative view of prior work, emphasizing the goal of building a full-scale paraphrase resource. As there are still no standard benchmarks, current quantitative results cannot be compared consistently.</Paragraph> <Paragraph position="1"> The central idea in most paraphrase acquisition work is to find linguistic structures, here termed templates, that share the same anchors. Anchors are lexical elements describing the context of a sentence. Templates that are extracted from different sentences and connect the same anchors in those sentences are assumed to paraphrase each other. For example, the sentences &quot;Yahoo bought Overture&quot; and &quot;Yahoo acquired Overture&quot; share the anchors {X=Yahoo, Y=Overture}, suggesting that the templates 'X buy Y' and 'X acquire Y' paraphrase each other. Algorithms for paraphrase acquisition address two problems: (a) finding matching anchors and (b) identifying template structure, as reviewed in the next two subsections.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Finding Matching Anchors </SectionTitle> <Paragraph position="0"> The prominent approach for paraphrase learning searches for sentences that share common sets of multiple anchors, assuming they describe roughly the same fact or event. To facilitate finding many matching sentences, highly redundant comparable corpora have been used. These include multiple translations of the same text (Barzilay and McKeown, 2001) and corresponding articles from multiple news sources (Shinyama et al., 2002; Pang et al., 2003; Barzilay and Lee, 2003).
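The shared-anchor principle described above can be sketched as follows. This is a minimal illustration only: the occurrence list is invented toy data standing in for the output of a real parser-based extraction pipeline, and `paraphrase_candidates` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy occurrences: (template, anchor set) pairs that a real system would
# extract from parsed sentences; the data here is purely illustrative.
occurrences = [
    ("X buy Y", frozenset({"Yahoo", "Overture"})),
    ("X acquire Y", frozenset({"Yahoo", "Overture"})),
    ("X acquire Y", frozenset({"Google", "YouTube"})),
    ("X purchase Y", frozenset({"Google", "YouTube"})),
    ("X born in Y", frozenset({"Mozart", "1756"})),
]

def paraphrase_candidates(occs):
    """Group templates by shared anchor set; templates that connect the
    same anchors in different sentences are proposed as paraphrases."""
    by_anchors = defaultdict(set)
    for template, anchors in occs:
        by_anchors[anchors].add(template)
    pairs = set()
    for templates in by_anchors.values():
        for t1 in templates:
            for t2 in templates:
                if t1 != t2:
                    pairs.add(tuple(sorted((t1, t2))))
    return pairs

print(paraphrase_candidates(occurrences))
```

Under this toy data, 'X buy Y' and 'X acquire Y' are paired through {Yahoo, Overture}, while 'X born in Y' finds no partner, mirroring the paper's example.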
While comparable corpora facilitate accuracy, we assume they cannot serve as a sole resource due to their limited availability. Avoiding the need for a comparable corpus, Glickman and Dagan (2003) developed statistical methods that match verb paraphrases within a regular corpus.</Paragraph> <Paragraph position="1"> Their limited-scale results (several hundred verb paraphrases obtained from a 15-million-word corpus) suggest that much larger corpora are required. Naturally, the largest available corpus is the Web.</Paragraph> <Paragraph position="2"> Since exhaustive processing of the Web is not feasible, Duclaye et al. (2002) and Ravichandran and Hovy (2002) attempted bootstrapping approaches, which resemble the mutual bootstrapping method for Information Extraction of Riloff and Jones (1999). These methods start with a known set of anchors for a target meaning. For example, the anchor set {Mozart, 1756} is given as input in order to find paraphrases for the template 'X born in Y'. Web searching is then used to find occurrences of the input anchor set, yielding new templates that presumably specify the same relation as the original one (&quot;born in&quot;). These new templates are in turn exploited to find new anchor sets, which are processed like the initial set {Mozart, 1756}. The overall procedure thus iterates, inducing templates from anchor sets and vice versa.</Paragraph> <Paragraph position="3"> The limitation of this approach is that it requires one input anchor set per target meaning. Preparing such input for all possible meanings in broad domains would be an enormous task. As explained below, our method avoids this limitation by finding all anchor sets automatically in an unsupervised manner.</Paragraph> <Paragraph position="4"> Finally, Lin and Pantel (2001) present a notably different approach that matches single anchors separately.
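The iterative bootstrapping loop described earlier (anchor sets inducing templates, and templates inducing new anchor sets) can be sketched over a toy in-memory corpus. The corpus records and the two search helpers are illustrative assumptions standing in for real Web queries; none of the names below come from the cited systems.

```python
# Toy "Web": each record pairs a template with the anchor set it connects.
# Both the records and the helper functions stand in for real Web search.
CORPUS = [
    ("X born in Y", ("Mozart", "1756")),
    ("X was born in Y", ("Mozart", "1756")),
    ("X was born in Y", ("Beethoven", "1770")),
    ("Y is the birth year of X", ("Beethoven", "1770")),
]

def templates_for(anchor_set):
    """Find templates occurring with a given anchor set (one 'search')."""
    return {t for t, a in CORPUS if a == anchor_set}

def anchor_sets_for(template):
    """Find anchor sets occurring with a given template (one 'search')."""
    return {a for t, a in CORPUS if t == template}

def bootstrap(seed_anchors, iterations=3):
    """Alternate between inducing templates from anchor sets and anchor
    sets from templates, starting from a single seed anchor set."""
    anchors, templates = {seed_anchors}, set()
    for _ in range(iterations):
        for a in list(anchors):
            templates |= templates_for(a)
        for t in list(templates):
            anchors |= anchor_sets_for(t)
    return templates

print(bootstrap(("Mozart", "1756")))
```

Starting from the single seed {Mozart, 1756}, the loop reaches {Beethoven, 1770} through a shared template and thereby discovers a third paraphrase template, illustrating how one anchor set per target meaning seeds the whole process.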
They restrict the allowed structure of templates to paths in dependency parses that connect two anchors. For each candidate template, the algorithm constructs two feature vectors representing its co-occurrence statistics with each of the two anchors. Two templates with similar vectors are proposed as paraphrases (termed an inference rule).</Paragraph> <Paragraph position="5"> Matching single anchors relies on the general distributional similarity principle and, unlike the other methods, does not require redundant sets of multiple anchors. Consequently, a much larger number of paraphrases can be found in a regular corpus. Lin and Pantel report experiments for 9 templates, in which their system extracted on average 10 correct inference rules per input template from 1 GB of news data. Yet this method also suffers from certain limitations: (a) it identifies only templates with pre-specified structures; (b) accuracy seems more limited, due to the weaker notion of similarity; and (c) coverage is limited to the scope of the available corpus.</Paragraph> <Paragraph position="6"> To conclude, several approaches exhaustively process different types of corpora, obtaining output at varying scales. The Web, on the other hand, is a huge and promising resource, but current Web-based methods suffer from serious scalability constraints.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Identifying Template Structure </SectionTitle> <Paragraph position="0"> Paraphrasing approaches learn different kinds of template structures. Notable algorithms are presented by Pang et al. (2003) and Barzilay and Lee (2003), who learn linear patterns within similar contexts, represented as finite state automata.
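The single-anchor matching of the previous subsection can be illustrated with per-slot filler vectors. Cosine similarity is used here as a simplification (Lin and Pantel's method uses a mutual-information-based similarity measure), and the co-occurrence counts are invented for illustration.

```python
import math

# Invented co-occurrence counts: for each template, the words observed in
# its X slot and in its Y slot (one feature vector per anchor slot).
slot_fillers = {
    "X buy Y":     {"X": {"Yahoo": 3, "Google": 2}, "Y": {"Overture": 3, "YouTube": 2}},
    "X acquire Y": {"X": {"Yahoo": 2, "Google": 3}, "Y": {"Overture": 2, "YouTube": 3}},
    "X born in Y": {"X": {"Mozart": 4}, "Y": {"1756": 4}},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def template_similarity(t1, t2):
    """Average the per-slot vector similarities of two templates."""
    return sum(cosine(slot_fillers[t1][s], slot_fillers[t2][s]) for s in ("X", "Y")) / 2

print(template_similarity("X buy Y", "X acquire Y"))
print(template_similarity("X buy Y", "X born in Y"))
```

With these toy counts, 'X buy Y' and 'X acquire Y' score high because their slots attract the same fillers, while 'X born in Y' scores zero, which is the signal used to propose inference rules.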
Three classes of syntactic template learning approaches are presented in the literature: learning of predicate-argument templates (Yangarber et al., 2000), learning of syntactic chains (Lin and Pantel, 2001), and learning of sub-trees (Sudo et al., 2003). The last approach is the most general with respect to template form; however, its processing time increases exponentially with template size.</Paragraph> <Paragraph position="1"> In conclusion, state-of-the-art approaches still learn templates of limited form and size, thus restricting the generality of the learning process.</Paragraph> </Section> </Section> </Paper>