<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0412"> <Title>PhraseNet: Towards Context Sensitive Lexical Semantics</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Progress in natural language understanding research requires significant progress in lexical semantics and in the development of lexical semantic resources. In a broad range of natural language applications, from prepositional phrase attachment (Pantel and Lin, 2000; Stetina and Nagao, 1997) and co-reference resolution (Ng and Cardie, 2002) to text summarization (Saggion and Lapalme, 2002), semantic information is a necessary component of inference, because it provides the level of abstraction needed for robust decisions.</Paragraph> <Paragraph position="1"> Research supported by NSF grants IIS-99-84168, ITR-IIS-00-85836 and an ONR MURI award.</Paragraph> <Paragraph position="2"> Names of authors are listed alphabetically.</Paragraph> <Paragraph position="3"> Inferring that the prepositional phrase in &quot;They ate a cake with a fork&quot; has the same grammatical function as the one in &quot;They ate a cake with a spoon&quot;, for example, depends on knowing that &quot;cutlery&quot; and &quot;tableware&quot; are hypernyms of both &quot;fork&quot; and &quot;spoon&quot;. However, the noun &quot;fork&quot; has five senses listed in WordNet, and each of them has several different hypernyms. Choosing the correct one is a context sensitive decision.</Paragraph> <Paragraph position="4"> WordNet (Fellbaum, 1998), a manually constructed lexical reference system, provides a lexical database along with semantic relations among the lexemes of English and is widely used in NLP tasks today. However, WordNet is organized at the word level, and at this level English words are highly ambiguous.
Stand-alone words may have several meanings and take on relations (e.g., hypernyms, hyponyms) that depend on those meanings. Consequently, there are very few success stories of using WordNet automatically in natural language applications. In many cases, reported (and unreported) problems stem from the fact that WordNet enumerates all the senses of polysemous words; attempts to use this resource automatically often yield noisy and non-uniform information (Brill and Resnik, 1994; Krymolowski and Roth, 1998).</Paragraph> <Paragraph position="5"> PhraseNet is designed on the assumption that, by and large, semantic ambiguity in English disappears once the local context of words is taken into account. It uses WordNet as an important knowledge source and is generated automatically from WordNet and machine-learning-based processing of large English corpora. It enriches a WordNet synset with its contextual information and refines its relational structure, including relations such as hypernym, hyponym, antonym and synonym, by keeping only those links that respect contextual constraints. However, PhraseNet is not just a functional extension of WordNet. It is an independent lexical semantic system equipped with user interfaces and access functions that will allow researchers and practitioners to use it in applications.</Paragraph> <Paragraph position="6"> As stated before, PhraseNet is built on the assumption that linguistic context is an indispensable factor affecting the perception of semantic proximity between words.</Paragraph> <Paragraph position="7"> In its current design, PhraseNet defines &quot;context&quot; hierarchically, with three abstraction levels. The first is abstract syntactic skeletons, such as [(S) → (V) → (DO) → (IO) → (P) → (N)], which stands for Subject, Verb, Direct Object, Indirect Object, Preposition and Noun (Object) of the Preposition, respectively. The second is syntactic skeletons whose components are enhanced by semantic abstraction, such as [Peop → send → Peop → gift → on → Day]. The third is concrete syntactic skeletons taken from real sentences, such as [they → send → mom → gift → on → Christmas].</Paragraph> <Paragraph position="8"> Intuitively, &quot;candle&quot; and &quot;cigarette&quot; would score poorly on semantic similarity without any contextual information. Their occurrence in sentences such as &quot;John tried to light a candle/cigarette&quot;, however, may highlight their connection with the process of burning.</Paragraph> <Paragraph position="9"> PhraseNet captures such constraints from contextual structures extracted automatically from natural language corpora and enumerates word lists with their hierarchical contextual information. Several abstractions are made while extracting the context in order to filter out superfluous information and support generalization.</Paragraph> <Paragraph position="10"> The basic unit in PhraseNet is a conset: a word in its context, together with all relations associated with it. In the lexical database, consets are chained together via their similar or hierarchical contexts. Because it lists every context extracted from large corpora as well as all the generalized contexts derived from those attested sentences, PhraseNet will have many more consets than WordNet has synsets. However, the organization of PhraseNet respects syntactic structure together with the distinction between the senses of each word in its corresponding contexts.</Paragraph> <Paragraph position="11"> For example, rather than linking all hypernyms of a polysemous word to a single word token, PhraseNet connects the hypernym of each sense to the target word in every context that instantiates that sense.
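The following Python sketch illustrates the three context levels and shows how a conset can restrict hypernym lookup to the sense its context selects. It is a toy under stated assumptions, not PhraseNet's implementation. The semantic class mapping (they and mom to Peop, Christmas to Day), the sense inventory for "fork" and the rule that a (verb, preposition) pair selects a sense are all hypothetical toy data. The data are not WordNet's actual content. The names Conset, generalize, all_hypernyms and conset_hypernyms are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical semantic classes for the middle abstraction level (toy data).
SEMANTIC_CLASS = {"they": "Peop", "mom": "Peop", "Christmas": "Day"}
# Slot labels of the abstract skeleton, in the order given in the text.
ROLES = ("S", "V", "DO", "IO", "P", "N")


def generalize(skeleton):
    """Return (concrete, semantic, abstract) views of a concrete skeleton."""
    semantic = tuple(SEMANTIC_CLASS.get(w, w) for w in skeleton)
    abstract = tuple(f"({role})" for role in ROLES[: len(skeleton)])
    return tuple(skeleton), semantic, abstract


# Hypothetical toy sense inventory; NOT WordNet's actual senses or hypernyms.
SENSE_HYPERNYMS = {
    ("fork", "utensil"): ["cutlery", "tableware"],
    ("fork", "farm_tool"): ["hand_tool"],
    ("fork", "branching"): ["division"],
}

# Hypothetical rule: a (verb, preposition) context selects one sense.
CONTEXT_SENSE = {("eat", "with"): "utensil", ("dig", "with"): "farm_tool"}


@dataclass(frozen=True)
class Conset:
    """Simplified conset: a word plus a reduced context (verb, preposition)."""
    word: str
    verb: str
    prep: str


def all_hypernyms(word):
    """Word-level view: pool the hypernyms of every sense of the word."""
    return [h for (w, _sense), hs in SENSE_HYPERNYMS.items() if w == word for h in hs]


def conset_hypernyms(conset):
    """Context-level view: keep only hypernyms of the sense the context selects."""
    sense = CONTEXT_SENSE.get((conset.verb, conset.prep))
    if sense is None:
        # No known constraint for this context: fall back to the word-level view.
        return all_hypernyms(conset.word)
    return SENSE_HYPERNYMS.get((conset.word, sense), [])


if __name__ == "__main__":
    print(generalize(["they", "send", "mom", "gift", "on", "Christmas"]))
    print(all_hypernyms("fork"))
    print(conset_hypernyms(Conset("fork", "eat", "with")))
```

In this toy setting, the (eat, with) context of "They ate a cake with a fork" selects only the utensil sense. The pooled list of four hypernyms therefore shrinks to two. A context that the rule does not cover falls back to the word-level list.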
While in WordNet every word has an average of 5.4 hypernyms, in PhraseNet the average number of hypernyms of a word in a conset is 1.51.</Paragraph> <Paragraph position="12"> In addition to querying WordNet semantic relations to disambiguate consets, PhraseNet also maintains frequency records of each word in its context to help differentiate consets, and it uses a defined similarity between contexts in this process.</Paragraph> <Paragraph position="13"> Several access functions are built into PhraseNet that retrieve information relevant to a word and its context. When queried with words together with their contextual information, the system tends to output more relevant semantic information because of the constraints imposed by their syntactic contexts.</Paragraph> <Paragraph position="14"> PhraseNet is still in the preliminary stages of development and experimentation, and much functionality is still missing. Even so, we believe that it is an important effort towards building a context sensitive lexical semantic resource that will be of much value to NLP researchers as well as linguists and language learners.</Paragraph> <Paragraph position="15"> The rest of this paper is organized as follows. Sec. 2 presents the design principles of PhraseNet. Sec. 3 describes the construction of PhraseNet and the current stage of the implementation. Sec. 4 describes an application that provides a preliminary experimental evaluation. Sec. 5 discusses related work on lexical semantic resources, and Sec. 6 discusses future directions for PhraseNet.</Paragraph> </Section> </Paper>