<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2132"> <Title>A Multi-Neuro Tagger Using Variable Lengths of Contexts</Title> <Section position="2" start_page="0" end_page="802" type="intro"> <SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Words are often ambiguous in terms of their part of speech (POS). POS tagging disambiguates them, i.e., it assigns to each word the correct POS in the context of the sentence.</Paragraph>
<Paragraph position="1"> Several kinds of POS taggers using rule-based (e.g., Brill et al., 1990), statistical (e.g., Merialdo, 1994), memory-based (e.g., Daelemans, 1996), and neural network (e.g., Schmid, 1994) models have been proposed for some languages.</Paragraph>
<Paragraph position="2"> The tagging accuracy of these models has reached 95%, in part through the use of very large amounts of training data (e.g., 1,000,000 words in Schmid, 1994). For many other languages (e.g., Thai, which we deal with in this paper), however, such corpora have not yet been prepared and a large amount of training data is not available. It is therefore important to construct a practical tagger using as little training data as possible.</Paragraph>
<Paragraph position="3"> In most of the statistical and neural network models proposed so far, the length of the context used for tagging is fixed and has to be selected empirically. In addition, all words in the input are regarded as having the same relevance in tagging. An ideal model would be one in which the context length can be selected automatically as needed in tagging and the words used in tagging can be given different relevances. A simple but effective solution is to introduce a multi-module tagger composed of multiple modules (basic taggers) with fixed but different context lengths in their inputs and a selector (a selecting rule) to obtain the final answer. The tagger should also have a set of weights reflecting the different relevances of the input elements.
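A minimal sketch, in Python (not the paper's implementation), of the two ingredients of this scheme: an information-gain weight for each input position, computed as the average reduction in the entropy of the tag when that position's POS is known (Quinlan, 1993), and a longest-context-priority selector over the basic taggers' answers. The toy samples, the `(context_length, tag, confidence)` answer format, and the 0.5 confidence threshold are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of tag labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(samples, pos):
    """Average reduction in tag entropy when the POS at input
    position `pos` is known (Quinlan, 1993).
    `samples` is a list of ((POS features), tag) pairs."""
    tags = [tag for _, tag in samples]
    # Partition the samples by the POS value at this position.
    split = {}
    for feats, tag in samples:
        split.setdefault(feats[pos], []).append(tag)
    remainder = sum(len(part) / len(samples) * entropy(part)
                    for part in split.values())
    return entropy(tags) - remainder

def select_by_longest_context(answers):
    """Longest-context-priority selector over basic taggers:
    given (context_length, tag, confidence) answers, return the
    tag of the longest-context tagger that is confident enough
    (threshold 0.5 is an illustrative choice), falling back to
    the single most confident answer."""
    for length, tag, conf in sorted(answers, reverse=True):
        if conf >= 0.5:
            return tag
    return max(answers, key=lambda a: a[2])[1]

if __name__ == "__main__":
    # Toy training data: ((POS of left word, POS of right word), tag).
    samples = [
        (("DET", "NOUN"), "ADJ"),
        (("DET", "NOUN"), "ADJ"),
        (("NOUN", "VERB"), "NOUN"),
        (("NOUN", "VERB"), "NOUN"),
    ]
    # Here the left context fully determines the tag, so its gain
    # equals the full tag entropy (1 bit).
    print(information_gain(samples, 0))   # → 1.0
    # Taggers with context lengths 1-3; the length-3 answer is not
    # confident, so the length-2 answer wins.
    print(select_by_longest_context(
        [(1, "NOUN", 0.9), (2, "VERB", 0.6), (3, "ADJ", 0.3)]))  # → VERB
```

In the tagger itself these gains would serve as fixed multiplicative weights on the input elements rather than as a feature-selection criterion, reflecting that the target word is more relevant than its context words.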
If we construct such a multi-module tagger with statistical methods (e.g., n-gram models), however, the size of the n-gram table would be extremely large, as mentioned in Sec. 4.4. On the other hand, in memory-based models such as IGTree (Daelemans, 1996), the number of features used in tagging is in fact variable, up to a maximum length (i.e., the number of features spanning the tree), and the different relevances of the different features are taken into account in tagging. Tagging by this approach, however, may be computationally expensive if the maximum length is large. In practice, the maximum length was set to 4 in Daelemans's model, which can therefore be regarded as a model using a fixed context length.</Paragraph>
<Paragraph position="4"> This paper presents a multi-neuro tagger that is constructed from multiple neural networks, each of which can be regarded as a single-neuro tagger with a fixed but distinct context length in its input. The tagger performs POS tagging with different context lengths based on longest-context priority. Given that the target word is more relevant than any of the words in its context and that the words in the context may have different relevances in tagging, each element of the input is weighted with its information gain, i.e., a number expressing the average reduction in the information entropy of the training set when the POS of that element is known (Quinlan, 1993). By using the trained results (weights) of the single-neuro taggers with short inputs as initial weights for those with long inputs, the training time for the latter can be greatly reduced, and the cost of training a multi-neuro tagger is almost the same as that of training a single-neuro tagger.</Paragraph> </Section> </Paper>