<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1018">
  <Title>regier@cogsci.Berkeley.EDU * TR &quot;Above&quot; Figure 1: Learning to Associate Scenes with Spatial Terms</Title>
  <Section position="7" start_page="141" end_page="144" type="evalu">
    <SectionTitle>
5 Details
</SectionTitle>
    <Paragraph position="0"> The system described in this section learns perceptually-grounded semantics for spatial terms using the</Paragraph>
    <Paragraph position="2"> quiekprop 3 algorithm \[Fahlman, 1988\], a variant on back-propagation \[Rumelhart and McClelland, 1986\].</Paragraph>
    <Paragraph position="3"> This presentation begins with an exposition of the representation used, and then moves on to the specific network architecture, and the basic ideas embodied in it.</Paragraph>
    <Paragraph position="4"> The weakening of evidence from implicit negative instances is then discussed.</Paragraph>
    <Section position="1" start_page="142" end_page="142" type="sub_section">
      <SectionTitle>
5.1 Representation of the LM and TR
</SectionTitle>
      <Paragraph position="0"> As mentioned above, the representation scheme for the LM comprises the following: * A bitmap in which those pixels corresponding to the interior of the LM are the only ones set.</Paragraph>
      <Paragraph position="1"> * The z, y coordinates of several &amp;quot;key points&amp;quot; of the LM, where z and y each vary between 0.0 and 1.0, and indicate the location of the point in question as a fraction of the width or height of the image.</Paragraph>
      <Paragraph position="2"> The key points currently being used are the center of mass (CoM) of the LM, and the four corners of the LM's bounding box (UL: upper left, UR: upper right, LL: lower left, LR: lower right).</Paragraph>
      <Paragraph position="3"> The (punctate) TR is specified by the z, V coordinates of the point.</Paragraph>
      <Paragraph position="4"> The activation of an output node of the system, once trained for a particular spatial concept, represents the appropriateness of using the spatial term in describing the TR's location, relative to the LM.</Paragraph>
    </Section>
    <Section position="2" start_page="142" end_page="142" type="sub_section">
      <SectionTitle>
5.2 Architecture
</SectionTitle>
      <Paragraph position="0"> Figure 6 presents the architecture of the system. The eight spatial terms mentioned above are learned simultaneously, and they share hidden-layer representations.</Paragraph>
      <Paragraph position="1">  Consider the right-hand part of the network, which receives input from the LM interior map. Each of the three nodes in the cluster labeled &amp;quot;I&amp;quot; (for interior) has a receptive field of five pixels.</Paragraph>
      <Paragraph position="2"> When a TR location is specified, the values of the five neighboring locations shown in the LM interior map, centered on the current TR location, are copied up to the five input nodes. The weights on the links between these five nodes and the three nodes labeled &amp;quot;I&amp;quot; in the layer above define the receptive fields learned. When the TR position changes, five new LM interior map pixels will be &amp;quot;viewed&amp;quot; by the receptive fields formed. This allows the system to detect the LM interior (or a border between interior and exterior) at a given point and to bring that to bear if that is a relevant semantic feature for the set of spatial terms being learned.</Paragraph>
      <Paragraph position="3">  The remainder of the network is dedicated to computing parameterized regions. Recall that a parameterized region is much the same as any other region which might be learned by a perceptron, except that the lines 3Quickprop gets its name from its ability to quickly converge on a solution. In most cases, it exhibits faster convergence than that obtained using conjugate gradient methods \[Fahlman, 1990\].</Paragraph>
      <Paragraph position="4"> which define the relevant half-planes are constrained to go through specific points. In this case, these are the key points of the LM.</Paragraph>
      <Paragraph position="5"> A simple two-input perceptron unit defines a line in the z, tt plane, and selects a half-plane on one side of it. Let wffi and w v refer to the weights on the links from the z and y inputs to the pereeptron unit. In general, if the unit's function is a simple threshold, the equation for such a line will be zw~ + wy = O, (1) i.e. the net input to the perceptron unit will be herin = actor. + yltO~. (2) Note that this line always passes through the origin: (0,0).</Paragraph>
      <Paragraph position="6"> If we want to force the line to pass through a particular point (zt,yt) in the plane, we simply shift the entire coordinate system so that the origin is now at (zt, yt). This is trivially done by adjusting the input values such that the net input to the unit is now ,,et,,, = (x - x,)w, + (V - V,)w,. (3) Given this, we can easily force lines to pass through the key points of an LM, as discussed above, by setting (zt, V~) appropriately for each key point. Once the system has learned, the regions will be parameterized by the coordinates of the key points, so that the spatial concepts will be independent of the size and position of any particular LM.</Paragraph>
      <Paragraph position="7"> Now consider the left-hand part of the network. This accepts as input the z, y coordinates of the TR location and the LM key points, and the layer above the input layer performs the appropriate subtractions, in line with equation 3. Now each of the nodes in the layer above that is viewing the TR in a different coordinate system, shifted by the amount specified by the LM key points.</Paragraph>
      <Paragraph position="8"> Note that in the BB cluster there is one node for each corner of the LM's bounding-box, while the CoM cluster has three nodes dedicated to the LM's center of mass (and thus three lines passing through the center of mass). This results in the computation, and through weight updates, the learning, of a parameterized region.</Paragraph>
      <Paragraph position="9"> Of course, the hidden nodes (labeled 'T') that receive input from the LM interior map are also in this hidden layer. Thus, receptive fields and parameterized regions are learned together, and both may contribute to the learned semantics of each spatial term. Further details can be found in \[Regier, 1990\].</Paragraph>
    </Section>
    <Section position="3" start_page="142" end_page="144" type="sub_section">
      <SectionTitle>
5.3 Implementing &quot;Weakened&quot; Mutual Exclusivity
</SectionTitle>
      <Paragraph position="0"> Now that the basic architecture and representations have been covered, we present the means by which the evidence from implicit negative instances is weakened. It is assumed that training sets have been constructed using mutual exclusivity as a guiding principle, such that each negative instance in the training set for a given spatial term results from a positive instance for some other term.</Paragraph>
      <Paragraph position="1">  * Evidence from implicit negative instances is weakened simply by attenuating the error caused by these implicit negatives.</Paragraph>
      <Paragraph position="2"> * Thus, an implicit negative instance which yields an error of a given magnitude will contribute less to the weight changes in the network than will a positive instance of the same error magnitude.</Paragraph>
      <Paragraph position="3"> This is done as follows: Referring back to Figure 6, note that output nodes have been allocated for each of the spatial terms to be learned. For a network such as this, the usual error term in back-propagation is</Paragraph>
      <Paragraph position="5"> where j indexes over output nodes, and p indexes over input patterns.</Paragraph>
      <Paragraph position="6"> We modify this by dividing the error at each output node by some number/~j,p, dependent on both the node and the current input pattern.</Paragraph>
      <Paragraph position="8"> The general idea is that for positive instances of some spatial term, f~j,p will be 1.0, so that the error is not attenuated. For an implicit negative instance of a term, however, flj,p will be some value Atten, which corresponds to the amount by which the error signals from implicit negatives are to be attenuated.</Paragraph>
      <Paragraph position="9"> Assume that we are currently viewing input pattern p, a positive instance of &amp;quot;above&amp;quot;. 'then the target value for the &amp;quot;above&amp;quot; node will be 1.0, while the target values for all others will be 0.0, as they are implicit negatives.</Paragraph>
      <Paragraph position="10"> Here, flabove,p = 1.0, and fll,p = Atten, Vi ~ above.</Paragraph>
      <Paragraph position="11"> The value Atten = 32.0 was used successfully in the experiments reported here.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>