Named Entity Recognition with Long Short-Term Memory

1 Introduction

In this paper, Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) is applied to named entity recognition, using data from the Reuters Corpus, English Language, Volume 1, and the European Corpus Initiative Multilingual Corpus 1.

LSTM is an architecture and training algorithm for recurrent neural networks (RNNs), capable of remembering information over long time periods during the processing of a sequence.

LSTM was applied to an earlier CoNLL shared task, namely clause identification (Hammerton, 2001), although its performance was significantly below that of other methods: LSTM achieved an f-score of 50.42 on the test data, where other systems' f-scores ranged from 62.77 to 80.44. However, not all of the training data was used in training the LSTM networks. Better performance has since been obtained when the complete training set was used (Hammerton, unpublished), yielding an f-score of 64.66 on the test data.

2 Representing lexical items

An efficient method of representing lexical items is needed. Hammerton (2001; unpublished) employed lexical space (Zavrel and Veenstra, 1996) representations of the words, which are derived from their co-occurrence statistics. Here, however, a different approach is used: a SARDNET (James and Miikkulainen, 1995), a self-organising map (SOM) for sequences, is trained to form representations of the words, and the resulting representations reflect the morphology of the words.

James and Miikkulainen (1995) provide a detailed description of how SARDNET operates. Briefly, SARDNET operates in a similar manner to the standard SOM. It consists of a set of inputs and a set of map units, where each map unit contains a set of weights equal in size to the number of inputs. When an input is presented, the map unit whose weights are closest to the input vector is chosen as the winner. When a sequence is processed, this winning unit is taken out of the competition for subsequent inputs. The activation of a winning unit is set to 1 when it is first chosen and is then multiplied by a decay factor (here set to 0.9) for each subsequent input in the sequence. At the beginning of a new sequence, all map units are made available again for the first input. Thus, once a sequence of inputs has been presented, the map units activated as winners indicate which inputs were presented, and the activation levels of those units indicate the order of presentation. An advantage of SARDNET is that it generalises naturally to novel words.
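This winner-removal-plus-decay scheme is concrete enough to sketch in code. The following Python is a minimal, hypothetical illustration of the SARDNET encoding pass, not the authors' implementation; the function name and `char_vecs` (a mapping from characters to input vectors) are assumptions, and the map is treated as a flat array of units.

```python
import numpy as np

def sardnet_encode(word, weights, char_vecs, decay=0.9):
    """Encode a character sequence as a SARDNET activation pattern (sketch).

    weights: (n_units, n_inputs) array of map-unit weight vectors.
    char_vecs: dict mapping each character to an input vector (assumed).
    """
    n_units = weights.shape[0]
    assert len(word) <= n_units, "a word longer than the map cannot be represented"
    activations = np.zeros(n_units)            # all units start inactive
    available = np.ones(n_units, dtype=bool)   # units still in the competition

    for ch in word:
        x = char_vecs[ch]
        # Winner = the still-available unit whose weights are closest to the input.
        dists = np.linalg.norm(weights - x, axis=1)
        dists[~available] = np.inf
        winner = int(np.argmin(dists))
        # Earlier winners decay by 0.9 per subsequent input, so the final
        # activation levels encode the order of presentation.
        activations *= decay
        activations[winner] = 1.0
        # The winner is taken out of the competition for the rest of the word.
        available[winner] = False

    return activations
```

The returned vector is zero except at the units that won for some character, with higher activations for characters presented later; this is what lets a fixed-size vector represent a variable-length sequence.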
The resulting representations are real-valued vectors whose dimensionality is the size of the map layer in the SARDNET (enough to represent words of up to length n, where n is the size of the map). A SARDNET was trained over a single presentation of all the distinct words that appear in the training and development data for English, and a separate SARDNET was trained on all the distinct words appearing in the training data for German. The generalisation of the map to novel words was just as good with the German map as with the English map, suggesting that training the map only on the English training data would have made little difference to performance. Initially, the neighbourhood was set to cover the whole SARDNET and the learning rate was set to 0.4. As each word was presented, the neighbourhood and learning rate were reduced in linear increments, so that at the end of training the learning rate was zero and the neighbourhood was 1. Both the English and German experiments used a SARDNET with 64 units.
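To make the schedule concrete, here is a minimal single-pass training sketch in the same style. It assumes a one-dimensional map topology, a flat (all-or-nothing) neighbourhood, and the standard SOM weight update applied per character; the paper specifies none of these beyond the 64-unit map, the 0.4 initial learning rate, and the linear decay, so treat the rest as illustrative.

```python
import numpy as np

def train_sardnet(words, char_vecs, n_units=64, lr0=0.4, seed=0):
    """Train a SARDNET over a single presentation of each distinct word (sketch)."""
    n_inputs = len(next(iter(char_vecs.values())))
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-0.1, 0.1, size=(n_units, n_inputs))

    n_words = len(words)
    for t, word in enumerate(words):
        # Linear schedules: learning rate 0.4 -> 0, neighbourhood whole map -> 1.
        frac = t / max(n_words - 1, 1)
        lr = lr0 * (1.0 - frac)
        radius = (n_units - 1) * (1.0 - frac) + 1

        available = np.ones(n_units, dtype=bool)  # competition resets per word
        for ch in word:
            x = char_vecs[ch]
            dists = np.linalg.norm(weights - x, axis=1)
            dists[~available] = np.inf
            winner = int(np.argmin(dists))
            available[winner] = False
            # Move the winner and its neighbours (units within `radius`
            # positions on the assumed 1-D map) towards the input.
            mask = np.abs(np.arange(n_units) - winner) <= radius
            weights[mask] += lr * (x - weights[mask])

    return weights
```

A map trained this way would then be used with an encoding pass like the `sardnet_encode` sketch above to produce the fixed-size word representations fed to the LSTM.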