<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0720"> <Title>Genetic Algorithms for Feature Relevance Assignment in Memory-Based Language Processing</Title>
<Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Memory-Based Language Processing </SectionTitle>
<Paragraph position="0"> Memory-based language processing (MBLP) (Daelemans, van den Bosch, and Zavrel, 1999) is based on the idea that language acquisition should be seen as the incremental storage of exemplars of specific tasks, and language processing as analogical reasoning on the basis of these stored exemplars. These exemplars take the form of a vector of, typically, nominal features, describing a linguistic problem and its context, and an associated class symbol representing the solution to the problem. A new instance is categorized on the basis of its similarity with a memory instance and its associated class.</Paragraph>
<Paragraph position="1"> * Research funded by CELE, S.AI.L Trust V.Z.W., Ieper, Belgium.</Paragraph>
<Paragraph position="2"> The basic algorithm we use to calculate the distance between two items is a variant of IB1 (Aha, Kibler, and Albert, 1991). IB1 does not solve the problem of modeling the difference in relevance between the various sources of information. In an MBLP approach, this can be overcome by means of feature weighting.</Paragraph>
<Paragraph position="3"> The IB1-IG algorithm uses information gain to weight the cost of a feature-value mismatch during comparison. IGTREE is a variant in which an oblivious decision tree is created with features as tests, and in which the tests are ordered according to the information gain of the associated features. In this case, the accuracy of the trained system depends heavily on a good feature ordering. For all variants of MBLP discussed here, feature selection can also improve both accuracy and efficiency by discarding some features altogether because they are irrelevant, or even counter-productive, for learning to solve the task. In our experiments we will use a relevance assignment method that differs radically from information-theoretic measures: genetic algorithms.</Paragraph> </Section>
<Section position="3" start_page="0" end_page="103" type="metho"> <SectionTitle> 2 Genetic Algorithms for Assigning Relevance </SectionTitle>
<Section position="1" start_page="0" end_page="103" type="sub_section">
<Paragraph position="0"> In the experiments, we linked our memory-based learner TIMBL 1 to PGAPACK 2. During the weighting experiments a gene corresponds to a specific real-valued feature weight (we indicate this by including GA in the algorithm name, i.e. IB1-GA and GATREE, cf. IB1-IG and IGTREE).</Paragraph>
<Paragraph position="1"> 1 The TIMBL software and algorithms are described in more detail in (Daelemans et al., 1999).</Paragraph>
<Paragraph position="2"> 2 A software environment for evolutionary computation developed by D. Levine, Argonne National Laboratory, available from ftp://ftp.mcs.anl.gov/pub/pgapack/ </Paragraph>
<Paragraph position="3"> In the case of selection, the string is composed of binary values indicating the presence or absence of a feature (we call this GASEL). The fitness of the strings is determined by running the memory-based learner with each string on a validation set, and returning the resulting accuracy as the fitness value for that string. Hence, both weighting and selection with the GA are instances of a wrapper approach, as opposed to a filter approach such as information gain (Kohavi and John, 1995).</Paragraph>
<Paragraph position="4"> For comparison, we include two popular classical wrapper methods: backward elimination selection (BASEL) and forward selection (FOSEL). Forward selection starts from an empty set of features; backward selection begins with the full set of features. At each step, the feature whose addition yields the highest accuracy increase (or, for BASEL, whose deletion yields the lowest accuracy decrease) is selected, until improvement stalls (resp. performance drops).</Paragraph>
<Paragraph position="5"> During the morphology experiment the population size was 50, but for prediction of unknown words it was set to 16, because the larger data set was computationally more demanding. The populations were evolved for a maximum of 200 generations, or stopped when no change had occurred for over 50 generations. Parameter settings for the genetic algorithm were kept constant: a two-point crossover probability of 0.85, a mutation rate of 0.006, an elitist replacement strategy, and tournament selection.</Paragraph>
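<Paragraph position="6"> To make the wrapper set-up concrete, the following is a minimal sketch in Python. It is not the TIMBL/PGAPACK code used in the paper: a plain 1-nearest-neighbour classifier with a weighted overlap distance stands in for the memory-based learner, the GA loop re-implements in simplified form the settings reported above (tournament selection, two-point crossover with probability 0.85, mutation rate 0.006, elitist replacement), and all function names are illustrative.</Paragraph>

```python
import random

def weighted_overlap_distance(x, y, weights):
    """IB1-style overlap distance between two instances of nominal
    features: every mismatching feature adds its weight."""
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def fitness(weights, subtrain, validation):
    """Wrapper fitness: accuracy of a 1-nearest-neighbour classifier on
    the validation set, with mismatch costs given by the chromosome."""
    correct = 0
    for features, gold in validation:
        _, nearest_class = min(
            ((weighted_overlap_distance(features, f, weights), c)
             for f, c in subtrain),
            key=lambda pair: pair[0])
        correct += nearest_class == gold
    return correct / len(validation)

def tournament(pop, fits, k=2):
    """Tournament selection: return the fittest of k random individuals."""
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]

def two_point_crossover(a, b):
    """Swap the segment between two random cut points."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(chrom, rate):
    """Redraw each real-valued gene with a small probability.
    (For GASEL the genes would be 0/1 and mutation would flip bits.)"""
    return [random.random() if random.random() < rate else g for g in chrom]

def evolve_weights(subtrain, validation, n_features, pop_size=50,
                   generations=200, p_cross=0.85, p_mut=0.006):
    """GA feature weighting in the spirit of IB1-GA: real-valued genes,
    tournament selection, two-point crossover, elitist replacement.
    The paper's early stop after 50 unchanged generations is omitted
    here for brevity."""
    pop = [[random.random() for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(w, subtrain, validation) for w in pop]
        elite = max(range(pop_size), key=lambda i: fits[i])
        new_pop = [pop[elite][:]]            # elitism: keep the best as-is
        while len(new_pop) < pop_size:
            a, b = tournament(pop, fits), tournament(pop, fits)
            if random.random() < p_cross:
                a, b = two_point_crossover(a, b)
            new_pop += [mutate(a, p_mut), mutate(b, p_mut)]
        pop = new_pop[:pop_size]
    return max(pop, key=lambda w: fitness(w, subtrain, validation))
```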
</Section>
<Section position="2" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 2.1 Data </SectionTitle>
<Paragraph position="0"> The first task 3 we consider is the prediction of which diminutive suffix a Dutch noun should take on the basis of its form. There are five different possible suffix forms (the classes). There are 12 features which contain information (stress and segmental information) about the structure of the last three syllables of a noun. The data set contains 3949 such instances.</Paragraph>
<Paragraph position="1"> The second data set 4 is larger and contains 65275 instances. The task we consider here is part-of-speech (morpho-syntactic category) tagging of unknown words. The features used here are the coded POS tags of the two words before and the two words after the focus word to be tagged, the last three letters of the focus word, and information on hyphenation and capitalisation. There are 111 possible classes (part-of-speech tags) to predict.</Paragraph>
<Paragraph position="2"> 4 ... corpus of English.</Paragraph> </Section>
<Section position="3" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 2.2 Method </SectionTitle>
<Paragraph position="0"> We used 10-fold cross-validation in all experiments. Because the wrapper methods get their evaluation feedback directly from accuracy measurements on the data, we further split the training file for each fold into a 2/3 sub-training set and a 1/3 validation set. The settings obtained in this way are then tested on the test set of that fold.</Paragraph> </Section> </Section>
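<Paragraph position="1"> The evaluation protocol can be summarized in a short sketch, again under stated assumptions: `wrapper_search` is any of the wrapper methods above (for instance the GA sketch `evolve_weights`), and `evaluate` is a hypothetical helper that applies the found settings to the full fold-training data and scores the held-out test fold; both names are illustrative, not from the paper.</Paragraph>

```python
import random

def cross_validate(data, wrapper_search, evaluate, n_folds=10, seed=0):
    """10-fold cross-validation as described above: within each fold the
    training portion is split 2/3 sub-train / 1/3 validation; the wrapper
    method searches weights or feature sets using validation accuracy as
    feedback, and the resulting settings are scored once on the test fold."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        train = [inst for j, fold in enumerate(folds) if j != i
                 for inst in fold]
        cut = 2 * len(train) // 3
        subtrain, validation = train[:cut], train[cut:]
        settings = wrapper_search(subtrain, validation)  # e.g. GA, FOSEL, BASEL
        scores.append(evaluate(settings, train, test))
    return sum(scores) / n_folds
```
</Paper>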