<?xml version="1.0" standalone="yes"?> <Paper uid="C73-1016"> <Title>GERT J. VAN DER STEEN A TREATMENT OF INDEPENDENT SEMANTIC</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> A TREATMENT OF INDEPENDENT SEMANTIC COMPONENTS 1. A TREATMENT OF INDEPENDENT SEMANTIC COMPONENTS </SectionTitle> <Paragraph position="0"> To distinguish things, we use terms which characterize them for us.</Paragraph> <Paragraph position="1"> For two balls it may be their color; for two people it may be their height, or their manner of speaking. In order to illustrate the differences in meaning for many words, J. J. KATZ and J. A. FODOR (1963) proposed the use of &quot;semantic characteristics&quot;. They give an example for the meanings of man and ball: man - ... - (physical object) - (human) - (adult) - (male); ball_1 - ... - (social activity) - (large) - (assembly); ball_2 - ... - (physical object). D. BOLINGER (1965) proposed systematizing these characteristics with hierarchic structures, so that the meaning of the word bachelor could be represented by a row of characteristics (fig. 1).</Paragraph> <Paragraph position="2"> Fig. 1. [Hierarchic tree of characteristics for bachelor; among the legible labels: human, animal, male, adult, young, unmated, nubile, educand, military, hierarchic, inferior, dependent, proximate. The tree layout itself is not recoverable.]</Paragraph> <Paragraph position="3"> Here the meaning of a word is given by referring it to other words. These words, in their turn, can be referenced by other words. There is the feeling that from here endless references will originate. Let us suppose that there are a number of elementary characteristics which cannot be expressed in other characteristics. We shall call them e_1 ... e_I.
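The discrete characteristics above behave like set membership: the common part of two meanings is the intersection of their characteristic sets. A minimal sketch, using the marker strings of the Katz-Fodor example (the Python rendering is, of course, a modern illustration, not part of the original work):

```python
# Characteristic sets in the style of the Katz-Fodor example above.
man = {"physical object", "human", "adult", "male"}
ball_1 = {"social activity", "large", "assembly"}  # ball, the dance
ball_2 = {"physical object"}                       # ball, the round object

# The common part of two meanings is the intersection of their sets.
print(man & ball_2)  # {'physical object'}
print(man & ball_1)  # set()
```

The graded vector model developed next replaces this all-or-nothing membership by an intensity per characteristic.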
The question whether they correspond with any existing word or expression we leave open, just as the limitation of I. We shall represent the meaning of a word by the intensity of the presence of the specific characteristics e_i. If we construct a model in an I-dimensional vector-space with unit-vectors e_1 ... e_I, we may represent the meaning of a word or expression W by the vector W = w_1 e_1 + w_2 e_2 + ... + w_I e_I, with w_i ≥ 0 for i = 1, ..., I. The common in the meaning of two words is the sum of the common in each of the basis characteristics.</Paragraph> <Paragraph position="4"> In our model this is, for the vectors V = v_1 e_1 + ... + v_I e_I and W = w_1 e_1 + ... + w_I e_I, the sum Σ_{i=1}^{I} min(v_i, w_i).</Paragraph> <Paragraph position="6"> For I = 2 refer to fig. 2.</Paragraph> <Paragraph position="7"> Fig. 2. [The common part of two vectors, shown for I = 2; the drawing itself is not recoverable.]</Paragraph> <Paragraph position="8"> For the determination of the norm of the vectors we consider that the common of V and W is determined via their characteristics. Our consciousness can evaluate the factors v_i and w_i only one by one. Therefore we put as norm:</Paragraph> <Paragraph position="10"> Σ_{i=1}^{I} min(v_i, w_i) (1), by definition called the measure of association between V and W.</Paragraph> <Paragraph position="11"> (min is the minimum-function, e.g. min(5, 7) = 5).</Paragraph> <Paragraph position="12"> To test this model we designed two tests (G. J. VAN DER STEEN, 1971). In the first test, individuals are asked to write down 12 words, starting with the word bird, and, relative to the associations between them, to indicate the measure of the association. This has to be a number between 0 and 10; '0' for: &quot;no association&quot;, '10' for: &quot;synonym&quot;. For an example: see table 1. For the words W_1 to W_12 in our model, we use the equations:</Paragraph> <Paragraph position="14"> Σ_{i=1}^{I} min(w_{n1,i}, w_{n2,i}) = v_{n1,n2} for 1 ≤ n1 ≤ N-1, n1 < n2 ≤ N (here N = 12), wherein the numbers v_{n1,n2} are given. From these equations the unknowns w_{n,i} (i = 1, ..., I; n = 1, ..., N) have to be solved.
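Formula (1), the measure of association as a component-wise minimum summed over the characteristics, can be sketched as follows (only the formula comes from the text; the word vectors here are invented for illustration):

```python
def association(v, w):
    """Measure of association (formula 1): the common of V and W is the
    sum over all characteristics of the shared intensity min(v_i, w_i)."""
    assert len(v) == len(w)
    return sum(min(vi, wi) for vi, wi in zip(v, w))

# Two hypothetical word vectors over I = 4 characteristics,
# with intensities on the 0..10 scale of the first test.
bird = [8, 0, 3, 0]
sparrow = [7, 0, 5, 2]

print(association(bird, sparrow))  # min(8,7)+min(0,0)+min(3,5)+min(0,2) = 10
```

Note that the association of a word with itself is simply the sum of its own intensities, so the measure doubles as a norm of the vector.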
At the same time, the number of characteristics I has to be determined. An upper limit for I is the number of equations: each association then runs over a separate characteristic.</Paragraph> <Paragraph position="15"> We determine I and the unknowns w_{n,i} by an iteration-process. Suppose that the factors w_{n,i} (1 ≤ n ≤ N; 1 ≤ i < I_1) are determined,</Paragraph> <Paragraph position="17"> with v^{(1)}_{n1,n2} = v_{n1,n2} and v^{(I_1+1)}_{n1,n2} = v^{(I_1)}_{n1,n2} - min(w_{n1,I_1}, w_{n2,I_1}) = Σ_{i=I_1+1}^{I} min(w_{n1,i}, w_{n2,i}). Let us denote the sum of all v's in step I_1+1 with S, so S = Σ_{n1 < n2} v^{(I_1+1)}_{n1,n2}.</Paragraph> <Paragraph position="19"> To minimize I we try to solve the system with S as small as possible. By successively assuming that a specific w_{n,I_1} is the smallest of all w_{n,I_1}'s, we can determine for each of the suppositions the sum S.</Paragraph> <Paragraph position="20"> We now choose the w_{n,I_1} which belongs to the smallest sum S. If there are more sums S with this value, then more refined criteria are available. Suppose this is w_{n1,I_1}. In the equation min(w_{n1,I_1}, w_{n2,I_1}) + v^{(I_1+1)}_{n1,n2} = v^{(I_1)}_{n1,n2} we then choose w_{n1,I_1} = v^{(I_1)}_{n1,n2}. Therewith v^{(I_1+1)}_{n1,n2} = 0; w_{n2,I_1} will be determined later.</Paragraph> <Paragraph position="21"> In all equations wherein w_{n1,I_1} appears, v^{(I_1+1)}_{n1,n2} can now be determined. In the remaining equations we apply the same process till at last one equation remains, for instance min(w_{n3,I_1}, w_{n4,I_1}) + v^{(I_1+1)}_{n3,n4} = v^{(I_1)}_{n3,n4}. Here we choose w_{n3,I_1} = w_{n4,I_1} = v^{(I_1)}_{n3,n4}. Therewith our iteration step for I_1 has ended. When all v^{(I_1+1)}_{ni,nj} = 0, then I = I_1 and the whole iteration-process has come to an end.</Paragraph> <Paragraph position="22"> The process is illustrated in table 2 for 4 words with associations v_{n1,n2} (randomly</Paragraph> <Paragraph position="24"> chosen). 
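The bookkeeping of the iteration — subtract, for each determined characteristic, its explained contribution min(w_{n1,i}, w_{n2,i}) from the given associations, and stop once every residual is zero — can be sketched like this (the greedy choice of the smallest w is not shown; the 3-word values are invented, not those of table 2):

```python
def residuals(v, columns):
    """Residual associations after the given characteristic columns have
    been determined: each v_{n1,n2} minus the explained min-contributions."""
    return {
        (n1, n2): val - sum(min(col[n1], col[n2]) for col in columns)
        for (n1, n2), val in v.items()
    }

def finished(v, columns):
    """The iteration ends when all residuals are 0; the number of
    columns is then the number of characteristics I."""
    return all(r == 0 for r in residuals(v, columns).values())

# Given associations for 3 words (invented values).
v = {(1, 2): 2, (1, 3): 2, (2, 3): 5}
col1 = {1: 2, 2: 4, 3: 7}  # first characteristic: explains 2, 2, 4
col2 = {1: 0, 2: 3, 3: 1}  # second characteristic: explains 0, 0, 1

print(residuals(v, [col1]))       # {(1, 2): 0, (1, 3): 0, (2, 3): 1}
print(finished(v, [col1, col2]))  # True: I = 2 suffices here
```

The sum of the printed residuals is the quantity S that each iteration step tries to make as small as possible.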
With s_j we denote the sum S which belongs to a w_j that appears in a line with the lowest v.</Paragraph> <Paragraph position="25"> With some small modifications this solution-scheme can also be used if some equations have been deleted at the beginning; in other words: when some associations are not given. This is illustrated in table 3, with the same v's as in table 2, except for v_{1,3}, which is omitted. The small modification concerns the calculation of s_j: we divide s_j by one less than the number of lines in which w_j appears.</Paragraph> <Paragraph position="26"> We now try our model by omitting some of the given associations from one individual. According to the foregoing method we determine the number of characteristics I and the vector-representations of the words. From them we calculate the omitted associations with the aid of formula (1). For the discrepancies between the thus predicted and the omitted associations we can determine a statistical estimate.</Paragraph> <Paragraph position="27"> If the associations are randomly given, then the mean and the standard-deviation do indeed agree with their calculated values. If the associations are given by test-persons, these numbers are significantly lower.</Paragraph> <Paragraph position="28"> There are interesting discrepancies if associations are left out which express an extra aspect of meaning. In a specific case the words bird, leg, table and chair were given, among others. If the association between leg and bird was left out, an association of 0 was predicted, as it should be. The number of characteristics was decreased by one.</Paragraph> <Paragraph position="29"> The evaluation of associations between given words by test-persons is subjective. This, however, plays no role here: the relations between a number of consciousness-contents are concerned. If the test-individual is not consistent in his evaluations, then the discrepancies between the given and the predicted associations become greater. 
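The leave-one-out check described above — solve for the vectors without one association, then predict it back through formula (1) — reduces to the following sketch (the "solved" representations are invented; they merely mimic the bird/leg observation, they are not the paper's data):

```python
def predict(vectors, a, b):
    """Predict the association of two words from their solved
    vector-representations via formula (1)."""
    return sum(min(x, y) for x, y in zip(vectors[a], vectors[b]))

# Hypothetical solved representations over 3 characteristics.
vectors = {
    "bird":  [6, 0, 2],
    "leg":   [0, 5, 0],
    "table": [0, 4, 0],
}

print(predict(vectors, "leg", "bird"))   # 0: no shared characteristic
print(predict(vectors, "leg", "table"))  # 4: a shared aspect of meaning
```

When the omitted pair shares no characteristic, as with leg and bird here, the model can only predict 0, which is exactly the behaviour reported in the text.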
In practice there seems to be a good correlation between consistency in evaluation and the intelligence-level of the test-person.</Paragraph> <Paragraph position="30"> On a more reliable level are the extended observations of the language expressions of a test-individual. For this purpose a second test was designed. As source material 20 pages of the novel De verliezers by Anna Blaman were taken. With the aid of the programming language SNOBOL a frequency-table was made for all words in that piece of the text. From it, 20 words with a high frequency were chosen which are relevant with regard to each other. Then it was determined how many times each of the 20 words was found together with each of the other words in the same sentence. This number was taken as a measure of the associations between the words. If we leave out the 0-associations and predict them by the method of the first test, our model will not be unreliable if it predicts the value 0.</Paragraph> <Paragraph position="31"> Indeed, it appears that 0's are predicted, except in associations between nouns and the words mine and your, which give values 2 and 3, and some randomly distributed exceptions. The method of prediction of 0-associations was chosen to avoid the rather crude measure of association. This measure was used because of the absence of a well-defined method to detect coherent subphrases. If two words never occur together in a phrase, they will certainly never appear in the same sub-phrase.</Paragraph> <Paragraph position="32"> It will be interesting to try this model on the common usage of languages. The obtained vector-representations can be transferred to other characteristic systems by means of matrix-manipulations. A further extension lies in the determination of the representation of words in different natural languages, with a mutual comparison and eventual transformation of the representations. 
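The corpus procedure of the second test — frequency count, selection of high-frequency words, and same-sentence co-occurrence as the (admittedly crude) association measure — was carried out in SNOBOL; a minimal Python equivalent (the sample text and the top_n value are invented for illustration):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(text, top_n):
    """For the top_n most frequent words, count how often each pair
    appears together in the same sentence."""
    sentences = [s.split() for s in text.lower().split(".") if s.strip()]
    freq = Counter(w for s in sentences for w in s)
    chosen = {w for w, _ in freq.most_common(top_n)}
    pairs = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s) & chosen), 2):
            pairs[(a, b)] += 1
    return pairs

text = "the bird sat on the table. the bird left. a table stood there."
print(cooccurrence(text, top_n=3))  # ('bird', 'the') co-occur in 2 sentences
```

Splitting on "." is a deliberate simplification: as the text notes, no well-defined method for detecting coherent subphrases was available, which is why the sentence was used as the co-occurrence window.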
The measure of association is critical here. It should be refined by using more knowledge about the syntactic structure of the sentences.</Paragraph> <Paragraph position="32"> A further restriction lies in the number of developed characteristics.</Paragraph> <Paragraph position="33"> For the first test this was approximately 13, for the second approximately 76. Of these 76, the first 20 were the most relevant. The remaining characteristics served more to compensate for several small discrepancies. By changing to a larger amount of language information the number of relevant characteristics will naturally increase.</Paragraph> </Section> </Paper>