<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0828">
  <Title>TALP System for the English Lexical Sample Task</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
3 Features
</SectionTitle>
    <Paragraph position="0"> We have divided the features of the system into four categories: local, topical, knowledge-based and syntactic features. The first section of Table 1 shows the local features. The basic aim of these features is to model the information carried by the words surrounding the target word. All these features are extracted from a 3-word window centred on the target word, and they also encode the position of each of their components. To obtain the part of speech and lemma of each word, we used FreeLing. Most of these features have been doubled, with one version for the lemma and another for the word form.</Paragraph>
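The local-feature extraction described above can be sketched in Python as follows. This is a minimal illustration, not the TALP implementation. It assumes that the 3-word window means three tokens on each side of the target and that tokens arrive already tagged and lemmatized (for example by FreeLing) as (form, lemma, part-of-speech) triples. The function name `local_features` and the feature-string format are hypothetical, and only the `form` and `locat` features of Table 1 are covered.

```python
from typing import List, Tuple

# Hypothetical token representation: (form, lemma, part_of_speech).
Token = Tuple[str, str, str]


def local_features(tokens: List[Token], target: int, half_width: int = 3) -> List[str]:
    """Sketch of 'form' and 'locat'-style local features.

    Assumes the 3-word window spans three tokens on each side of the target;
    the feature-string encoding is hypothetical.
    """
    form, _, _ = tokens[target]
    feats = [f"form={form}"]
    start = max(0, target - half_width)
    end = min(len(tokens), target + half_width + 1)
    for i in range(start, end):
        if i == target:
            continue
        offset = i - target  # position relative to the target word
        word, lemma, pos = tokens[i]
        feats.append(f"locat_pos[{offset:+d}]={pos}")
        # Doubled: one feature for the word form and one for the lemma.
        feats.append(f"locat_form[{offset:+d}]={word}")
        feats.append(f"locat_lemma[{offset:+d}]={lemma}")
    return feats


# Illustrative, hand-tagged sentence (tags are made up for the example).
sentence = [
    ("The", "the", "DT"),
    ("banks", "bank", "NNS"),
    ("raised", "raise", "VBD"),
    ("interest", "interest", "NN"),
    ("rates", "rate", "NNS"),
]
print(local_features(sentence, target=2))
```

Keeping the signed offset inside each feature name is one simple way to make the same word at different positions yield distinct features.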
    <Paragraph position="1"> Three types of topical features are shown in the second section of Table 1. Topical features try to capture non-local information from the words of the context. For each type, two overlapping, redundant sets of topical features are considered: one extracted from a 10-word window and another from the whole example.</Paragraph>
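A minimal sketch of the two overlapping topical feature sets, plus the `sbig` bigrams, might look like the following. It assumes that the 10-word window means ten lemmas on each side of the target and that the target itself is left out of the bags. `topical_features` and the feature prefixes are hypothetical names.

```python
from typing import List, Set


def topical_features(lemmas: List[str], target: int, half_width: int = 10) -> Set[str]:
    """Sketch of 'topic' and 'sbig'-style topical features.

    Assumes the 10-word window spans ten lemmas on each side of the target
    and that the target is excluded from the bags; the feature-string
    encoding is hypothetical.
    """
    start = max(0, target - half_width)
    window = lemmas[start:target] + lemmas[target + 1:target + half_width + 1]
    context = lemmas[:target] + lemmas[target + 1:]
    # Two overlapping, redundant bags: the window words are a subset
    # of the whole-example words, but each bag gets its own prefix.
    feats = {f"topic_win={w}" for w in window}
    feats |= {f"topic_all={w}" for w in context}
    # Bigrams over the whole example.
    feats |= {f"sbig={a}_{b}" for a, b in zip(lemmas, lemmas[1:])}
    return feats


example = "the bank raise the interest rate again".split()
print(sorted(topical_features(example, target=1, half_width=2)))
```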
    <Paragraph position="2"> The third section of Table 1 presents the knowledge-based features. These features have been obtained using the knowledge contained in the Multilingual Central Repository (MCR) of the MEANING project (Atserias et al., 2004). For each example, the feature extractor obtains all the nouns in the context, all their synsets and their associated semantic information: SUMO labels, domain labels, WordNet Lexicographic Files and EuroWordNet Top Ontology labels. We also assign to each label a weight that depends on the number of labels assigned to each noun and on their relative frequencies in the whole of WordNet. For each kind of semantic knowledge, the program sums these weights per label and finally selects the labels with the highest weights.</Paragraph>
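The weighting and selection step can be illustrated with the sketch below. The text does not give the exact weighting formula, so this version is an assumption: each label of a noun receives its relative frequency divided by the number of labels of that noun, weights are summed per label, and the `k` best labels are kept. The function `select_labels`, its parameters and the example labels are hypothetical. In the described system, such a step would be run once per kind of semantic knowledge (SUMO, domains, Lexicographic Files, Top Ontology).

```python
from collections import Counter
from typing import Dict, List


def select_labels(noun_labels: Dict[str, List[str]],
                  rel_freq: Dict[str, float],
                  k: int = 1) -> List[str]:
    """Return the k labels with the highest summed weight.

    Hypothetical weighting: each label of a noun gets
    rel_freq[label] / number_of_labels_of_that_noun, with a default
    relative frequency of 1.0 for labels missing from rel_freq.
    """
    totals: Counter = Counter()
    for labels in noun_labels.values():
        if not labels:
            continue
        share = 1.0 / len(labels)
        for label in labels:
            totals[label] += share * rel_freq.get(label, 1.0)
    return [label for label, _ in totals.most_common(k)]


# Illustrative domain labels for the nouns of one example.
domains = {
    "bank": ["economy", "geography"],
    "interest": ["economy"],
    "rate": ["economy", "quality"],
}
print(select_labels(domains, rel_freq={}, k=1))  # → ['economy']
```

Splitting a unit weight across a noun's labels keeps highly ambiguous nouns from dominating the vote.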
    <Paragraph position="3"> Table 1, first section: local features.
      Feat. | Description
      form | form of the target word
      locat | all parts of speech / forms / lemmas in the local context
      coll | all collocations of two parts of speech / forms / lemmas
      coll2 | all collocations of a form/lemma and a part of speech (and the reverse)
      first | form/lemma of the first noun / verb / adjective / adverb to the left/right of the target word</Paragraph>
    <Paragraph position="4"> Table 1, second section: topical features.
      Feat. | Description
      topic | bag of forms/lemmas
      sbig | all form/lemma bigrams of the example
      comb | forms/lemmas of consecutive (or non-consecutive) pairs of the open-class words in the example</Paragraph>
    <Paragraph position="5"> Table 1, third section: knowledge-based features.
      Feat. | Description
      f sumo | first SUMO label
      a sumo | all SUMO labels
      f semf | first WN semantic file label
      a semf | all WN semantic file labels
      f tonto | first EWN Top Ontology label
      a tonto | all EWN Top Ontology labels
      f magn | first domain label
      a magn | all domain labels</Paragraph>
    <Paragraph position="6"> Table 1, fourth section: syntactic features.
      Feat. | Description
      tgt mnp | syntactic relations of the target word from Minipar
      rels mnp | all syntactic relations from Minipar
      yar noun | NounModifier, ObjectTo, SubjectTo for nouns
      yar verb | Object, ObjectToPreposition, Preposition for verbs
      yar adjs | DominatingNoun for adjectives</Paragraph>
    <Paragraph position="7"> Finally, the last section of Table 1 describes the syntactic features, which are extracted using two different tools: Dekang Lin's Minipar and Yarowsky's dependency pattern extractor. It is worth noting that the set of features presented is highly redundant. For this reason, a feature selection process has been applied, which is detailed in the next section.</Paragraph>
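The Minipar-based features (`tgt mnp` and `rels mnp`) can be sketched as below. The sketch assumes the parse has already been converted into hypothetical (head, relation, dependent) index triples. This is not Minipar's real output format, and the relation names in the example are made up. Yarowsky's pattern-based features (`yar noun`, `yar verb`, `yar adjs`) are not covered.

```python
from typing import List, Set, Tuple

# Hypothetical dependency triple: (head_index, relation, dependent_index).
# This is not Minipar's actual output format.
Dep = Tuple[int, str, int]


def syntactic_features(deps: List[Dep], lemmas: List[str], target: int) -> Set[str]:
    """Sketch of 'rels mnp'- and 'tgt mnp'-style features."""
    feats: Set[str] = set()
    for head, rel, dep in deps:
        # All syntactic relations of the sentence.
        feats.add(f"rels={lemmas[head]}:{rel}:{lemmas[dep]}")
        # Only the relations in which the target word takes part.
        if head == target:
            feats.add(f"tgt={rel}:{lemmas[dep]}")
        elif dep == target:
            feats.add(f"tgt={lemmas[head]}:{rel}")
    return feats


lemmas = ["bank", "raise", "rate"]
deps = [(1, "subj", 0), (1, "obj", 2)]
print(sorted(syntactic_features(deps, lemmas, target=0)))
# → ['rels=raise:obj:rate', 'rels=raise:subj:bank', 'tgt=raise:subj']
```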
  </Section>
</Paper>