<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0704">
  <Title>Semantic Pattern Learning Through Maximum Entropy-based WSD technique</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Full parsing
</SectionTitle>
    <Paragraph position="0"> The analyzer used for this work is the Conexor's FDG Parser (Pasi Tapanainen and Timo J&amp;quot;arvinen, 1997). This parser tries to provide a build dependency tree from the sentence. When this is not possible, the parser tries to build partial trees that often result from unresolved ambiguity. One visual example of this dependency trees is shown in Figure 1 where the parsing tree of sentence (1) is illustrated.</Paragraph>
    <Paragraph position="1"> (1) The minister gave explanations to the Government.</Paragraph>
    <Paragraph position="2"> As seen in Figure 2, the analyzer assigns to each word a text token (second column), a base form (third column) and functional link  names, lexico-syntactic function labels and parts of speech (fourth column). Figure 1 shows the parsing tree related to this output. These elements are enough for the pattern extraction method to be applied to NLP tasks.</Paragraph>
    <Paragraph position="3"> Regarding to the evaluation of the parser, the authors report an average precision and recall of 95% and 88% respectively in the detection of the correct head. Furthermore, they report a precision rate between 89% and 95% and a recall rate between 83% and 96% in the selection of the functional dependencies.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 WSD based on Maximum Entropy
</SectionTitle>
    <Paragraph position="0"> A WSD module is applied to this parser's output, in order to select the correct sense of each entry.</Paragraph>
    <Paragraph position="1"> Maximum Entropy(ME) modeling is a framework for integrating information from many heterogeneous information sources for classification (Manning and Sch&amp;quot;utze, 1999). This WSD system is based on conditional ME probability models. The system implements a supervised learning method consisting of the building of word sense classifiers through training on a semantically tagged corpus. A classifier obtained by means of a ME technique consist of a set of parameters or coefficients estimated by an optimization procedure. Each coefficient associates a weight to one feature observed in the training data. A feature is a function that gives information about some characteristic in a context associated to a class. The basic idea is to obtain the probability distribution that maximizes the entropy, that is, maximum ignorance is assumed and nothing apart of training data is considered. As advantages of ME framework, knowledge-poor features applying and accuracy can be mentioned; ME framework allows a virtually unrestricted ability to represent problem-specific knowledge in the form of features (Ratnaparkhi, 1998).</Paragraph>
    <Paragraph position="2"> Let us assume a set of contexts a2 and a set of classes a3 . The function a4a6a5a8a7 a2 a9 a3 that performs the classification in a conditional probability model a10 chooses the class with the highest conditional probability: a4a11a5a13a12a15a14a17a16a19a18 a20a22a21a24a23a26a25a27a20a29a28a31a30 a10a32a12a33a4a35a34a14a36a16 . The features have the form expressed in equation (1), where a4a37a10a38a12a15a14a36a16 is some observable characteristic1. The conditional probability a10a32a12a33a4a35a34a14a36a16 is defined as in equation (2) where a39a41a40 are the parameters or weights of each feature, and a42a43a12a15a14a36a16 is a constant to ensure that the sum of probabilities for each possible class in this context is equal to 1.</Paragraph>
    <Paragraph position="4"> The features defined on the present system are, 1This is the kind of features used in the system due to it is required by the parameter estimation procedure, but the ME approach is not limited to binary funtions.</Paragraph>
    <Paragraph position="5"> basically, collocations of content words and POS tags of function words around the target word.</Paragraph>
    <Paragraph position="6"> With only this information the system obtains results comparable to other well known methods or systems. For training, DSO sense tagged English corpus (Hwee Tou Ng and Hian Beng Lee, 1996) is used. The DSO corpus is structured in files containing tagged examples of some word. The tags correspond to the correct sense in WordNet 1.5 (FellBaum, 1998). The examples were extracted from articles of the Brown Corpus and Wall Street Journal.</Paragraph>
    <Paragraph position="7"> The implemented system has three main modules: the Feature Extractor (FE), the Generalized Iterative Scaling (GIS), and the Classification module. Each word has its own ME model, that is, there will be a distinct classifier for each one. The FE module automatically defines the features to be observed on the training corpus depending on the classes (senses) defined in Word-Net for a word. The GIS module performs the parameter estimation. Finally, the Classification module uses this set of parameters in order to disambiguate new occurrences of the word.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Evaluation and results
</SectionTitle>
      <Paragraph position="0"> Some evaluation results over a few terms of the aforementioned corpus are presented in Table 1.</Paragraph>
      <Paragraph position="1"> The system was trained with features that inform of content words in the sentence context ( a81a83a82 a72 ,</Paragraph>
      <Paragraph position="3"> tags (a10a90a82 a72 , a10a90a82a56a84 , a10a90a82a56a85 , a10 a86 a72 , a10 a86 a84 , a10 a86 a85 ). For each word, the training set is divided in 10 folds, 9 for training and 1 for evaluation; ten tests were accomplished using a different fold for evaluation in each one (10-fold cross-validation). The accuracy results are the average accuracy on the ten tests for a word.</Paragraph>
      <Paragraph position="4"> Results comparison with previous work is difficult because there is different approaches to the WSD task (knowledge based methods, supervised and unsupervised statistical methods...) (Mihalcea and Moldovan, 1999) and many of them focus on a different set of words and sense definitions.</Paragraph>
      <Paragraph position="5"> Furthermore, the training corpus seems to be critical to the application of the learning to a specific  In the experiment presented here, the selection of the target words and the corpus used are the same that (Escudero et al., 2000a) where a Boosting method is proposed. In this paper a comparison between some WSD methods is shown.</Paragraph>
      <Paragraph position="6"> Boosting is the most successful method with a 68.1 % accuracy. Our method obtains lower accuracy but this is a first implementation and a better feature selection is expected to improve our results.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Semantic Pattern Learning
</SectionTitle>
    <Paragraph position="0"> Once the WSD phase has been performed, the semantic pattern extraction module can be executed. This module extracts head word pairs with subject-verb, verb-DObj and verb-IObj roles in the sentence and convert them into patterns formed by ontological concepts extracted from EuroWordNet.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 EuroWordNet's ontology
</SectionTitle>
      <Paragraph position="0"> EuroWordNet (Vossen, 2000) is a multilingual lexical database representing semantic relations among basic concepts for West European languages. In our case, we are going to work with isolated WordNets, it means, we won't take advantage of its multilingual feature, although we will use the ontology defined on it.</Paragraph>
      <Paragraph position="1"> EuroWordnet's ontology consists of 63 higher-level concepts and distinguishes three types of entities:a91 null 1stOrderEntity: any concrete entity (publicly) perceivable by the senses and located at any point in time, in a three-dimensional space, e.g.: vehicle, animal, substance, ob-</Paragraph>
      <Paragraph position="3"> 2ndOrderEntity: any Static Situation (property, relation) or Dynamic Situation, which cannot be grasped, heard, seen, felt as an independent physical thing. They can be located in time and occur or take place rather than exist, e.g.: happen, be, have, begin, end, cause, result, continue, occur..</Paragraph>
      <Paragraph position="4">  3rdOrderEntity: any unobservable proposition which exists independently of time and space. They can be true or false rather than real. They can be asserted or denied, remembered or forgotten, e.g.: idea, thought, information, theory, plan.</Paragraph>
      <Paragraph position="5"> These ontological concepts, associated to each synset from EuroWordNet, give semantic properties to these synsets that can be used, as we will see in the nexts sections, for improving the information source in Natural Language Processing tasks.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 The Learning Process
</SectionTitle>
      <Paragraph position="0"> From each clause, the module extracts the verb and (if exists) its subject, its direct object and its indirect object. With these elements, three possible pairs can be formed using the verb and the noun head of the aforementioned syntactic components. The verb head and the noun head are looked up in EuroWordNet's ontology using the correct sense previously selected. This query generates three possible ontological pairs that define, for each clause, the semantic concept associated to the main syntactic elements.</Paragraph>
      <Paragraph position="1"> Sentence (2) corresponds to a fragment extracted from a training corpus in English.</Paragraph>
      <Paragraph position="2"> (2) The ministera92 gavea93 explanationsa94 to the Governmenta94 .</Paragraph>
      <Paragraph position="3"> As shown in section 2, the output of the parser generates the next functional entities: Verb: give Subject head: minister D.Obj. head: explanations I.Obj. head: Government The superscripts indicate the correct sense in EuroWordNet for each word. After consulting EuroWordNet the semantic patterns formed are: Subj|V: Human,Occupation|Communication V|DObj: Communication|Agentive,Mental V|IObj: Communication|Group,Human These patterns will be stored in their corresponding files in order to be consulted later by the NLP task.</Paragraph>
      <Paragraph position="4"> This process is completely automatic and the error rate in the pattern extraction come from the aforementioned errors in the WSD and parsing phases.</Paragraph>
      <Paragraph position="5"> This strategy defined just as it has been done is, in principle, a little bit naive. Obviously, this is the single basis for the approach, but depending on the application, it can be combined with more sophisticated methods to improve its effectiveness. In this way, it is possible to make more elaborated combinations of ontological concepts to form new branches in the ontology defined by EuroWordNet.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Applying the method to anaphora resolution
</SectionTitle>
    <Paragraph position="0"> resolution Since the aforementioned semantic patterns reveal the semantic behaviour of the main textual elements, this Natural Language learning process can be applied to any task that involves text understanding. null One possible application in this way could be the anaphora resolution problem, one of the most active research areas in Natural Language Processing. null The comprehension of anaphora is an important process in any NLP system, and it is among the toughest problems to solve in Computational Linguistics and NLP. According to Hirst (Hirst, 1981): &amp;quot;Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.&amp;quot; The reference to an entity is generally called an anaphor (e.g. a pronoun), and the entity to which the anaphor refers is its referent or antecedent. For instance, in the sentence &amp;quot;Johna40 ate an apple. Hea40 was hungry&amp;quot;, the pronoun he is the anaphor and it refers to the antecedent John.</Paragraph>
    <Paragraph position="1"> Traditionally, some of the most relevant approaches to solve anaphora have been those called poor-knowledge approaches. They use limited knowledge (lexical, morphological and syntactic information sources) for the detection of the correct antecedent. These proposals have report high success rates for English (89.7%) (Mitkov, 1998) and for Spanish (83%) (Ferr'andez et al., 1999).</Paragraph>
    <Paragraph position="2"> Taking this basis, it is possible to improve the results of a resolution method adding other sources such us semantic, pragmatic, world-knowledge or indeed statistical information.</Paragraph>
    <Paragraph position="3"> We have explored the use of semantic information extracted from an ontology and its application to the anaphora resolution proccess. This additional source has give good results on restricted texts (Azzam et al., 1998). Nevertheless, its application on unrestricted texts has not been so satisfactory, mainly due to the lack of adequate and available lexical resources. Due to this, we consider that the pattern learning can complement the semantic source in order to establish additional criteria in the antecedent selection. In addition, we believe that an adequate selection of patterns can improve the success rate in anaphora resolution on unrestricted texts.</Paragraph>
    <Paragraph position="4"> Each pattern contributes a compatibility feature between two syntactic elements. The whole set of patterns is a knowledge tool that can be consulted in order to define the compatibility between a pronoun and a candidate according to their syntactic role (subject, direct object and indirect object) and their relation with the verb. So, looking up the concepts associated to the antecedents of the pronoun and the verb, and using the syntactic relation between the pronoun and its verb, the semantic patterns can provide a compatibility degree to help the selection of the antecedent. A method oriented to anaphora resolution that uses these kinds of patterns extracted from two ontologies is detailed in (Saiz-Noeda and Palomar, 2000).</Paragraph>
    <Paragraph position="5"> The benefit of this approach is shown in a classical example shown in (3).</Paragraph>
    <Paragraph position="6">  (3) [The monkey]a95 climbed [the tree]  to get [a coconut]a96 when [the sun]a97 was rising. Ita96 was ripe.</Paragraph>
    <Paragraph position="7"> In this example, there are four possible antecedents of the pronoun 'it'. Basing the resolution only in morpho-syntactic information, it is not possible to solve it correctly. None of the candidates would be rejected regarding to their morphological features (all of them are masculine and singular). The classical approaches would determine that 'the monkey', for having the same sub-ject role as the pronoun, or 'the sun', for being the closest to the pronoun, could be the correct antecedent. Nevertheless, it is clear that the correct one in this case is 'the coconut'. Only a semantic pattern applied to this method could give additional information to solve it correctly.</Paragraph>
    <Paragraph position="8"> If we would extract ontological concepts for all the candidates, we would be able to compare the compatibility degree with the pronoun. One possible output could be the one in next table: Subject concept verb monkey animal be ripe tree plant be ripe coconut fruit be ripe sun star be ripe Examining this table it is easy to notice that, when applying this additional information, the suggestion of the system would be the correct antecedent, mainly based on a good previous pattern learning.</Paragraph>
    <Paragraph position="9"> This pronoun resolution system with additional information provided by the semantic patterns has been evaluated on a corpus formed by a set of texts containing news regarding the common topics in a newspaper (national, international, sports, society, economy, . . . ). Results obtained in the preliminary evaluation of this pronoun resolution reveal a success rate of 79.3% anaphors correctly solved. Although it has not been mentioned before, it is very important to have in mind that this method provides a fully automatic anaphora resolution process. Methods previously mentioned apply the resolution process over supervised steps to achieve such high rates. When the process is automated, the success rate decrease dramatically up to less than 55% (Mitkov, 2001).</Paragraph>
  </Section>
class="xml-element"></Paper>