<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0102">
<Title>Non-locality all the way through: Emergent Global Constraints in the Italian Morphological Lexicon</Title>
<Section position="4" start_page="0" end_page="1" type="metho"> <SectionTitle> 3 SOMs </SectionTitle>
<Paragraph position="0"> SOMs can project input tokens, represented as data points of an n-dimensional input space, onto a generally two-dimensional output space (the map grid), where similar input tokens are mapped onto nearby output units. Each output unit in the map is associated with a distinct prototype vector, whose dimensionality is equal to the dimensionality of input vectors. As we shall see, a prototype vector is an approximate memory trace of recurring inputs, and plays the role of linking its corresponding output unit to a position in the input space. Accordingly, each output unit takes two positions: one in the input space (through its prototype vector) and one in the output space (its co-ordinates on the map grid).</Paragraph>
<Paragraph position="1"> SOMs were originally conceived of as computer models of somatotopic brain maps. This explains why output units are also traditionally referred to as neurons. Intuitively, a prototype vector represents the memorised input pattern to which its associated neuron is most sensitive. Through learning, neurons gradually specialise in being selectively associated with specific input patterns. Moreover, memorised input patterns tend to cluster on the map grid so as to reflect natural classes in the input space.</Paragraph>
<Paragraph position="2"> These interesting results are obtained through iterative unsupervised exposure to input tokens. At each learning step, a SOM is exposed to a single input token and goes through the following two stages: a) competitive neuron selection, and b) adaptive adjustment of prototype vectors. As we shall see in more detail in the remainder of this section, both stages are local and incremental in some crucial respects.</Paragraph>
<Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.1 Stage 1: competitive selection </SectionTitle>
<Paragraph position="0"> Let $v_x$ be the $n$-dimensional vector representation of the current input. At this stage, the distance between each prototype vector and $v_x$ is computed. The output unit $b$ that happens to be associated with the prototype vector $v_b$ closest to $v_x$ is selected as the best-matching unit. More formally:

$$\|v_x - v_b\| = \min_i \{\|v_x - v_i\|\},$$

where $\|v_x - v_b\|$ is also known as the quantization error scored by $v_b$ relative to $v_x$. Intuitively, this is to say that, although $b$ is the map neuron reacting most sensitively to the current stimulus, $b$ is not (yet) perfectly attuned to $v_x$.</Paragraph>
<Paragraph position="1"> Notably, the quantization error is a local distance function, as it involves two vector representations at a time. Hence, competitive selection is blind to general structural properties of the input space, such as the comparative role of each dimension in discriminating input tokens. This makes competitive selection prone to errors due to accidental or spurious similarity between the input vector and SOM prototype vectors.</Paragraph>
</Section>
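As a concrete illustration of Stage 1, the following is a minimal Python/NumPy sketch of competitive selection. The array layout (one prototype vector per row) and the names find_bmu, prototypes and v_x are our own illustrative assumptions, not part of the original model description.

```python
import numpy as np

def find_bmu(prototypes: np.ndarray, v_x: np.ndarray) -> tuple[int, float]:
    """Competitive selection: return the index b of the best-matching unit
    and the quantization error ||v_x - v_b|| it scores on the input.

    prototypes: (n_units, n_dims) array, one prototype vector per row.
    v_x:        (n_dims,) vector representation of the current input token.
    """
    # Euclidean distance between the input and every prototype vector.
    # The computation is purely local: no global structural property of
    # the input space (e.g. per-dimension discriminative power) is used.
    dists = np.linalg.norm(prototypes - v_x, axis=1)
    b = int(np.argmin(dists))
    return b, float(dists[b])
```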
<Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.2 Stage 2: adaptive adjustment </SectionTitle>
<Paragraph position="0"> After the winner unit $b$ is selected at time $t$, the SOM locally adapts prototype vectors to the current stimulus. Vector adaptation applies locally, within a kernel area of radius $r$, centred on the position of $b$ on the map grid. Both $v_b$ and the prototype vectors associated with $b$'s kernel units are adjusted to make them more similar to $v_x$. To decrease the distance between each prototype vector $v_i$ in $b$'s kernel and the input vector $v_x$, the following adaptive function is used:

$$v_i(t+1) = v_i(t) + h_{bi}(t)\,[v_x(t) - v_i(t)],$$

where $h_{bi}(t)$ is the neighbourhood kernel centred around the winner unit $b$ at time $t$, a non-increasing function of both time and the distance on the map grid between unit $i$ and the winner unit $b$. As learning time progresses, $h_{bi}$ decreases, and prototype vector updates become less sensitive to input conditions, according to the following:

$$h_{bi}(t) = \alpha(t)\,\exp\left(-\frac{\|r_b - r_i\|^2}{2\sigma^2(t)}\right),$$

where $r_b$ and $r_i$ are, respectively, the positions of $b$ and of its kernel neurons on the map grid, and $\alpha(t)$ is the learning rate at time $t$, a monotonically decreasing function of $t$. The interaction of these functions simulates effects of memory entrenchment and of the proto-typicality of early input data.</Paragraph>
<Paragraph position="1"> This local mode of operation marks a notable difference between SOMs and other classical projection techniques such as Vector Analysis or Multi-dimensional Scaling, which typically work on the basis of global constraints on the overall distribution of input data (e.g. by finding the space projection that maximizes data variance/co-variance).</Paragraph>
</Section>
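The adjustment step can be sketched as follows, assuming the Gaussian neighbourhood kernel given above (the simulations in Section 5 control the kernel through a radius r, so the width parameter below stands in for whatever schedule an implementation uses). The function name adapt and the argument layout are illustrative.

```python
import numpy as np

def adapt(prototypes: np.ndarray, grid_pos: np.ndarray, v_x: np.ndarray,
          b: int, alpha: float, sigma: float) -> np.ndarray:
    """Adaptive adjustment: v_i(t+1) = v_i(t) + h_bi(t) * (v_x - v_i(t)).

    prototypes: (n_units, n_dims) prototype vectors at time t.
    grid_pos:   (n_units, 2) co-ordinates r_i of each unit on the map grid.
    alpha:      learning rate alpha(t), decreasing over training time.
    sigma:      neighbourhood width at time t, also decreasing over time.
    """
    # h_bi depends only on grid distance to the winner b, so the update is
    # local in space; alpha and sigma shrink over time, so it is also
    # incremental: early inputs leave the most entrenched memory traces.
    d2 = np.sum((grid_pos - grid_pos[b]) ** 2, axis=1)
    h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))
    return prototypes + h[:, None] * (v_x - prototypes)
```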
<Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.3 Summary </SectionTitle>
<Paragraph position="0"> The dynamic interplay between locality and incrementality makes SOMs plausible models of neural computation and data compression. Their sensitivity to frequency effects in the distribution of input data allows the researcher to carefully test their learning behaviour in different time-bound conditions. Learning makes output units increasingly more reactive to already experienced stimuli and thus gradually more competitive for selection. If an output unit is repeatedly selected by systematically occurring input tokens, it becomes associated with a more and more faithful vector representation of a stimulus or class of stimuli, eventually becoming an attractor for its neighbouring area on the map.</Paragraph>
<Paragraph position="1"> As a result, the most parsimonious global organisation of input data emerges that is compatible with a) the size of the map grid, b) the dimensionality of output units and c) the distribution of input data.</Paragraph>
<Paragraph position="2"> These intriguing dynamics persuaded us to use SOMs to simulate the emergence of non-local lexical constraints from local patterns of interconnectivity between vector representations of full word forms. The Italian verb system offers particularly rich material to put this hypothesis to the challenging test of a computer simulation.</Paragraph>
</Section> </Section>
<Section position="5" start_page="1" end_page="1" type="metho"> <SectionTitle> 4 The Italian Verb System </SectionTitle>
<Paragraph position="0"> The Italian conjugation is a complex inflectional system, with a considerable number of classes of regular, subregular and irregular verbs exhibiting different probability densities (Pirrelli, 2000; Pirrelli and Battista, 2000). Traditional descriptive grammars (e.g. Serianni, 1988) identify three main conjugation classes (or, more simply, conjugations), characterised by a distinct thematic vowel (TV), which appears between the verb root and the inflectional endings. First conjugation verbs have the TV -a- (parl-a-re 'speak'), second conjugation verbs have the TV -e- (tem-e-re 'fear'), and third conjugation verbs the TV -i- (dorm-i-re 'sleep'). The first conjugation is by far the largest class of verbs (73% of all verbs listed in De Mauro et al., 1993), and almost all of its members are regular. Only very few 1st conjugation verbs have irregularly inflected forms: andare 'go', dare 'give', stare 'stay' and fare 'do, make'. It is also the only truly productive class: neologisms and foreign loan words all fall into it. The second conjugation has far fewer members (17%), which are for the most part irregular (around 95%). The third conjugation is the smallest class (10%). It is mostly regular (around 10% of its verbs are irregular) and only partially productive.</Paragraph>
[Table 1: TYPE | EXAMPLE | ENGLISH GLOSS (header row only; table body not recovered)]
<Paragraph position="1"> Besides this macro-level of paradigmatic organisation, Italian subregular verbs also exhibit ubiquitous patterns of stem alternation, whereby a change in paradigm slot triggers a simultaneous change of verb stem and inflectional ending, as illustrated in Table 1 for the present indicative active. Pirrelli and Battista (2000) show that phenomena of Italian stem alternation, far from being accidental inconsistencies of Italian morphophonology, define stable and strikingly convergent patterns of variable stem formation (Aronoff, 1994) throughout the entire verb system. These patterns partition subregular Italian verbs into equivalence micro-classes. In turn, this can be interpreted as suggesting that inter-class consistency plays a role in learning and may have exerted a convergent pressure in the history of the Italian verb system. If a speaker has heard a verb only in ambiguous inflections (i.e. inflections that are indicators of more than one verb micro-class), (s)he will need to guess in order to produce unambiguous forms. Guesses are made on the basis of frequently attested verb micro-classes (Albright, 2002).</Paragraph>
</Section>
<Section position="7" start_page="1" end_page="2" type="metho"> <SectionTitle> 5 Computer simulations </SectionTitle>
<Paragraph position="0"> The present experiments were carried out using the SOM toolbox (Vesanto et al., 2000), developed at the Neural Networks Research Centre of Helsinki University of Technology. The toolbox partly forced some standard choices in the training protocol, as discussed in more detail in the following sections. In particular, we complied with Kohonen's view of SOM training as consisting of two successive phases: a) rough training and b) fine-tuning. The implications of this view will be discussed in more detail later in the paper.</Paragraph>
<Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.1 Input data </SectionTitle>
<Paragraph position="0"> Our input data are inflected verb forms written in standard Italian orthography.
Since Italian orthography is, with a handful of exceptions, consistently phonological, we expect to replicate the same results with phonologically transcribed verb forms.</Paragraph>
<Paragraph position="1"> Forms are incrementally sampled from a training data set, according to their probability densities in a free text corpus of about 3 million words. Input data cover a fragment of Italian verb inflection, including, among others, present indicative active, future indicative active, infinitive and past participle forms, for a total of 10 different inflections. The average length of training forms is 8.5 characters, with a maximum of 18.</Paragraph>
<Paragraph position="2"> Following Plunkett and Marchman (1993), we assume that the map is exposed to a gradually growing lexicon. At epoch 1, the map learns inflected forms of the 5 most frequent verb types. At each ensuing epoch, five more verb types are added to the training data, according to their rank in a list of decreasingly frequent verb types. As an overall learning session consists of 100 epochs, the map is eventually exposed to a lexicon of 500 verb types, each seen in ten different inflections.</Paragraph>
<Paragraph position="3"> Although forms are sampled according to their corpus distributions, we hypothesise that the range of inflections in which verb tokens are seen by the map remains identical across verb types. This is done to throw paradigmatic effects into sharper relief, and responds to the (admittedly simplistic) assumption that the syntactic patterns forming the linguistic input to the child do not vary across verb types.</Paragraph>
<Paragraph position="4"> Each input token is localistically encoded as an 8*16 matrix of values drawn from the set {1, -1}. Column vectors represent characters, and rows give the random encoding of each character, ensuring maximum independence of character vector representations. The first eight columns in the matrix represent the first eight (left-aligned) characters of the form in question. The remaining eight columns stand for the eight (right-aligned) final characters of the input form.</Paragraph>
</Section>
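A sketch of this encoding, under explicit assumptions the paper leaves open: the per-character codebook below is a hypothetical random one (the paper only requires that character codes be maximally independent), and the PAD filler for unused positions is our own choice.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyzàèéìòù'"   # assumed character inventory
# One fixed random column of 8 values in {1, -1} per character.
CODEBOOK = {c: rng.choice([1.0, -1.0], size=8) for c in ALPHABET}
PAD = -np.ones(8)  # assumed filler column for unused positions

def encode(form: str) -> np.ndarray:
    """Localist encoding: an 8x16 matrix whose columns 0-7 hold the first
    eight (left-aligned) characters and whose columns 8-15 hold the last
    eight (right-aligned) characters of the input form."""
    left = [CODEBOOK.get(c, PAD) for c in form[:8]]
    left += [PAD] * (8 - len(left))                  # pad short forms
    right = [CODEBOOK.get(c, PAD) for c in form[-8:]]
    right = [PAD] * (8 - len(right)) + right         # right-align endings
    return np.column_stack(left + right)             # shape (8, 16)

v_x = encode("parlare").flatten()  # a 128-dimensional input vector
```

Note that forms longer than sixteen characters lose their middle portion under this scheme, which is compatible with a maximum attested length of 18.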
<Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.2 Training protocol </SectionTitle>
<Paragraph position="0"> At each training epoch, the map is exposed to a total of 3000 input tokens. As the range of different inflected forms from which input tokens are sampled is fairly limited (especially at early epochs), forms are repeatedly shown to the map.</Paragraph>
<Paragraph position="1"> Following Kohonen (1995), a learning epoch consists of two phases. In the first, rough training phase, the SOM is exposed to the first 1500 tokens. In this phase, the values of the learning rate a and of the neighbourhood kernel radius r are made to vary as linearly decreasing functions of the training epoch, from a = 0.1 and r = 20 (epoch 1) to a = 0.02 and r = 10 (epoch 100). In the second, fine-tuning phase of each epoch, a is kept at 0.02 and r at 3.</Paragraph>
</Section>
<Section position="3" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 5.3 Simulation 1: Critical transitions in lexical organisation </SectionTitle>
<Paragraph position="0"> Figures 1 and 2 contain snapshots of the Italian verb map taken at the beginning and the end of training (epochs 1 and 100). The snapshots are Unified distance matrix (U-matrix; Ultsch and Siemon, 1990) representations of the Italian SOM, used to visualise distances between neurons. In a U-matrix representation, the distance between adjacent neurons is calculated and presented as different colourings between adjacent positions on the map. A dark colouring between neurons signifies that their corresponding prototype vectors are close to each other in the input space. Dark colourings thus highlight areas of the map whose units react consistently to the same stimuli. A light colouring between output units, on the other hand, corresponds to a large distance (a gap) between their corresponding prototype vectors. In short, dark areas can be viewed as clusters, and light areas as chaotically reacting cluster separators. This type of pictorial presentation is useful when one wants to inspect the state of knowledge developed by the map through learning.</Paragraph>
<Paragraph position="1"> For each epoch, we took two such snapshots: i) one of the prototype vector dimensions representing the initial part of a verb form (approximately, its verb root; Figures 1.a and 2.a), and ii) one of the prototype vector dimensions representing its final part (approximately, its inflectional endings; Figures 1.b and 2.b).</Paragraph>
<Paragraph position="2"> Data storage on a Kohonen map is a dynamic process whereby i) output units tend to become consistently more reactive to classes of input data, and ii) vector prototypes which are adjacent in the input space tend to cluster in topologically connected subareas of the map. Self-organisation is thus an emergent property, based on local (both in time and space) principles of prototype vector adaptation. At the outset, the map is a tabula rasa, i.e. it has no notion whatsoever of Italian inflectional morphology. This has two implications. First, before training sets in, output units are associated with randomly initialised sequences of characters. Second, prototype vectors are randomly associated with map neurons, so that two contiguous neurons on the map may be sensitive to very different stimulus patterns.</Paragraph>
<Paragraph position="3"> Figure 1 shows that, after the first training epoch, the map started by organising memorised input patterns lexically, grouping them around their roots (those of the five verb types seen at epoch 1). Each root is an attractor of lexically related stimuli that nonetheless exhibit fairly heterogeneous endings (see Figure 1.b).</Paragraph>
<Paragraph position="4"> At learning epoch 100, on the other hand, the topological organisation of the verb map is the mirror image of that at epoch 1 (Figures 2.a and 2.b). In the course of learning, root attractors are gradually replaced by ending attractors. Accordingly, vector prototypes that used to cluster around their lexical root now appear to stick together by morpho-syntactic categories such as tense, person and number. One can conceive of each connected dark area of map 2.b as a slot in an abstract inflectional paradigm, potentially associated with many forms that share an inflectional ending but differ in their roots.</Paragraph>
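The U-matrix visualisation described at the top of this section is straightforward to reproduce from trained prototype vectors. A minimal sketch, assuming a rectangular grid with prototypes indexed as weights[row, col, :] and a 4-neighbourhood (one common choice among several):

```python
import numpy as np

def u_matrix(weights: np.ndarray) -> np.ndarray:
    """For each unit, average the input-space distance to its grid
    neighbours. Low values correspond to the dark cluster areas of
    Figures 1-2, high values to the light gaps separating them."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = [
                np.linalg.norm(weights[i, j] - weights[i + di, j + dj])
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            um[i, j] = np.mean(dists)
    return um
```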
<Paragraph position="5"> The main reason for this morphological organisation to emerge at a late learning stage rests in the distribution of training data. At the beginning, the map is exposed to a small set of verbs, each of which is inflected in 10 different forms. Forms with the same ending tend to be fewer than forms with the same root. As the verb vocabulary grows (say, to the order of 50 different verbs), however, the principles of morphological (as opposed to lexical) organisation allow for more compact and faithful data storage, as reflected by a significant reduction in the map's average quantization error (Figure 3). Many different forms can be clustered around comparatively few endings, and the latter eventually win out as local paradigmatic attractors.</Paragraph>
<Paragraph position="6"> Figure 4 (overleaf) is a blow-up of the map area associated with infinitive and past participle endings. The map shows the content of the last three characters of each prototype vector. Since past participle forms occur in free texts more often than infinitives, they tend to take up a proportionally larger area of the map (due to the so-called magnification factor). Interestingly enough, past participles ending in -ato occupy one third of the whole picture, attesting to the prominent role played by regular first conjugation verbs in past participle inflection.</Paragraph>
<Paragraph position="7"> Another intriguing feature of the map is the way the comparatively connected area of the past participle is carved out into tightly interconnected micro-areas, corresponding to subregular verb forms (e.g. corso 'run', scosso 'shaken' and chiesto 'asked'). Rather than lying outside the morpho-phonological realm (as exceptions to the "TV + to" default rule), subregular forms of this kind seem here to draw the topological borders of the past participle domain, thus defining a continuous chain of morphological family resemblances. Finally, by analogy-based continuity, the map comes to develop a prototype vector for the non-existing (but paradigmatically consistent) past participle ending -eto. (While Italian regular 1st and 3rd conjugation verbs present a thematic vowel in their past participle endings, -ato and -ito respectively, regular 2nd conjugation past participles, TV -e-, end, somewhat unexpectedly, in -uto.) This "spontaneous" over-generalization is the by-product of graded, overlapping morpheme-based memory traces.</Paragraph>
<Paragraph position="8"> In general, stem frequency may have had a retarding effect on the critical transition from a lexical to a paradigm-based organisation. For the same reason, high-frequency forms are eventually memorised as whole words, as they can successfully counteract the root-blurring effect produced by the chaotic overlay of past participle forms of different verbs, which are eventually attracted to the same map area. This turns out to be the case for very frequent past participles such as stato 'been' and fatto 'done'. As a final point, a more detailed analysis of memory traces in the past participle area of the map is likely to highlight significant stem patterns in the subregular micro-classes. If confirmed, this should provide fresh evidence supporting the existence of prototypical morphophonological stem patterns consistently selecting specific subregular endings (Albright, 2002).</Paragraph>
</Section>
<Section position="4" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 5.4 Simulation 2: Second level map </SectionTitle>
<Paragraph position="0"> A SOM projects n-dimensional data points onto grid units of reduced dimensionality (usually 2). We can take advantage of this data compression to train a new SOM with complex representations consisting of the output units of a previously trained SOM. The newly trained SOM is a second-level projection of the original data points.</Paragraph>
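A sketch of how the verb type vectors used below can be assembled, reusing the illustrative helpers sketched earlier (encode from Section 5.1 and find_bmu from Section 3.1); grid_pos, mapping unit indices to grid co-ordinates, is likewise an assumed helper.

```python
import numpy as np

def verb_type_vector(forms: list[str], prototypes: np.ndarray,
                     grid_pos: np.ndarray) -> np.ndarray:
    """Encode one verb type by the grid co-ordinates of the best-matching
    units of its ten inflected forms on the first-level map.

    forms: the ten inflected forms of a single verb type.
    Returns a 20-dimensional vector (10 forms x 2 grid co-ordinates),
    suitable as a training token for the second-level SOM.
    """
    coords = []
    for form in forms:
        b, _ = find_bmu(prototypes, encode(form).flatten())
        coords.extend(grid_pos[b])  # (row, col) of the winning unit
    return np.asarray(coords, dtype=float)
```

These second-level tokens would then be fed to a fresh SOM trained exactly as in Section 3.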
<Paragraph position="1"> To test the consistency of the paradigm-based organisation of the map in Figure 2, we trained a novel SOM with verb type vectors. Each such vector encodes all 10 inflected forms of the same verb type through the co-ordinates of their best-matching units in the map grid of Figure 2.</Paragraph>
<Paragraph position="2"> Besides the main conjugation areas, we can identify other micro-areas, somewhat orthogonal to them. The most significant such micro-class (circled by a dotted line) contains so-called [g]-inserted verbs (Pirrelli, 2000; Fanciullo, 1998), whose forms exhibit a characteristic [g]/Ø stem alternation, as in vengo/venite 'I come, you come (plur.)' and tengo/tenete 'I have/keep, you have/keep (plur.)'. The class straddles the 2nd and 3rd conjugation areas, thus pointing to a convergent phenomenon affecting a portion of the verb system (the present indicative and subjunctive) where the distinction between 2nd and 3rd conjugation inflections is considerably (but not completely) blurred. All in all, Italian verbs appear to fall not only into equivalence classes based on the selection of inflectional endings (traditional conjugations), but also into homogeneous micro-classes reflecting processes of variable stem formation.</Paragraph>
<Paragraph position="3"> Identification of the appropriate micro-class is a crucial problem in Italian morphology learning. Our map appears to be in a position to tackle it reliably.</Paragraph>
<Paragraph position="4"> Note, finally, the very particular position of the verb stare 'stay' on the grid. Although stare is a 1st conjugation verb, it selects some 2nd conjugation endings (e.g. stessimo 'that we stayed (subj.)' and stette '(s)he stayed'). This is captured in the map, where the verb is located halfway between the 1st and 2nd conjugation areas.</Paragraph>
</Section> </Section>
</Paper>