Knowledge Extraction and Recurrent Neural Networks: An Analysis of an Elman Network trained on a Natural Language Learning Task

3. Results

3.1. Graphical cluster analysis of the network having two hidden units

Graphical cluster analysis for the 2-hidden-unit case is shown in Figure 1. Clusters are labelled with the current input. There is marked separation of the clusters representing the high-frequency inputs NN, VB, /S, PR and IN, whereas the clusters representing low-frequency inputs overlap. Although only 51% of the training set was learned by the network, there is evidence of further clustering based on the current and previous inputs. For example, Figure 2 shows cluster formation when NN is the current input and the previous input is AIL, NN, PR, PS, VB or /S. The PR,NN sub-cluster can be broken down further into sub-sub-clusters representing the three input sequences /S,PR,NN, IN,PR,NN and VB,PR,NN.

The label attached to the input categories is the current input producing that activation. The activations tend to be clustered according to the input. Clusters representing high-frequency categories such as NN, VB and /S are more dispersed and are broken into sub-clusters that represent both the current and previous inputs.

The performance of FSAs having 6 to 22 states is displayed in Table 2. The second column gives the total number of transitions permitted by the FSA, and the third column gives the percent prediction score on the training data. The best score is 60%, which compares with the 69% of the training data learned by the original Elman network from which the hidden-unit activations were obtained. The total prediction score tends to increase with the number of states.

The fourth column of Table 2 gives the percentage of the 642 transitions in the data that are not permitted by the FSAs. The number of missing transitions is small: in all but two cases it is less than 2%. When a missing transition occurs, the FSA defaults to a 'rescue' state. The percentages of correct predictions for the non-missing transitions are shown in the rightmost column of Table 2; in most cases they differ little from the total scores, simply because there are so few missing transitions.

In the transition diagram for the 8-state FSA, transitions drawn with thick arrows have a frequency count greater than 20 and transitions drawn with thin arrows have a frequency count of 5 to 20; transitions with a frequency count of less than 5 are not shown, to preserve clarity.

The states have been numbered in sequence according to the average word position of their associated inputs. For example, states 2, 6 and 8 all occur following input of the NN category, but they are distinguished in the cluster analysis by the NN having an average word position in the sentence of 3.1, 5.7 and 6.5 respectively.

The states with the highest correct prediction rates, S7 and S8, are associated with the ends of sentences. S7 is reached when the last category in a sentence is predicted to be NN, and S8 occurs when the end of sentence is predicted.
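The extraction procedure that these results presuppose can be summarised in a short sketch. The code below is a minimal illustration under stated assumptions rather than the authors' implementation: hidden-unit activations are assumed to have been recorded for every input token, scikit-learn's KMeans stands in for whatever clustering method was actually used, and the function and variable names (extract_fsa, score_fsa, rescue_state) are invented for the example. It clusters the activations into a fixed number of FSA states, builds transition and prediction tables from the resulting state sequence, and sends any transition not permitted by the FSA to a designated 'rescue' state, which yields the quantities reported in Table 2.

# A minimal sketch, not the authors' original code, of extracting and
# scoring an FSA from the hidden-unit activations of a trained Elman
# network. KMeans is an assumed stand-in for the clustering step.
from collections import Counter, defaultdict
from sklearn.cluster import KMeans


def extract_fsa(hidden_states, inputs, targets, n_states, rescue_state=0):
    """Cluster hidden activations into FSA states and build transition and
    prediction tables.

    hidden_states : (T, H) array-like of hidden-unit activations, one row per token
    inputs        : length-T sequence of input categories (e.g. 'NN', 'VB', '/S')
    targets       : length-T sequence of the next category to be predicted
    """
    states = KMeans(n_clusters=n_states, n_init=10).fit_predict(hidden_states)

    transitions = defaultdict(Counter)   # (previous state, input) -> next states
    predictions = defaultdict(Counter)   # state -> predicted categories
    prev = rescue_state
    for state, x, y in zip(states, inputs, targets):
        transitions[(prev, x)][state] += 1
        predictions[state][y] += 1
        prev = state

    # Deterministic FSA: keep the most frequent next state for each
    # (state, input) pair and the most frequent prediction for each state.
    fsa = {key: ctr.most_common(1)[0][0] for key, ctr in transitions.items()}
    counts = {key: sum(ctr.values()) for key, ctr in transitions.items()}
    predict = {s: ctr.most_common(1)[0][0] for s, ctr in predictions.items()}
    return fsa, counts, predict


def score_fsa(fsa, predict, inputs, targets, rescue_state=0):
    """Run the FSA over the data; return (prediction accuracy, proportion of
    missing transitions), the two quantities reported in Table 2."""
    correct = missing = 0
    state = rescue_state
    for x, y in zip(inputs, targets):
        if (state, x) in fsa:
            state = fsa[(state, x)]
        else:                      # transition not permitted by the FSA:
            missing += 1           # default to the 'rescue' state
            state = rescue_state
        correct += int(predict.get(state) == y)
    return correct / len(inputs), missing / len(inputs)

Pruning, as examined next, would then amount to deleting from fsa every key whose count in counts falls below a threshold (for example 5, as in Table 3) before calling score_fsa again.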
Many of the transitions in the FSAs occur with low frequency and can be pruned with minimal loss of performance. For example, the FSA with ten states has 45 permitted transitions. When transitions having a frequency of less than 5 are pruned, the number of missing transitions jumps from 1.7% to 10.28%, but the prediction score drops only slightly, from 53% to 51% (Table 3).

Finally, we look at the effect of the state chosen as the 'rescue' state for the FSA having 10 states (Table 4). The default is the state closest to the beginning of the sentence, in this case state 2. The percentage of correct predictions is higher in only two other cases, namely when state 7 or state 9 is used as the 'rescue' state. Changing the 'rescue' state also changes the number of transitions that the FSA does not recognise; however, only in the case of rescue state 8 is this number smaller than for rescue state 2. It is apparent that a decrease in the number of missing transitions does not necessarily lead to a higher score.

3.3. Weight initialisation using domain knowledge

Setting links (see Section 2.4 of the methods for a definition of this term) between the hidden layer and the NN and /S input units has a beneficial effect during the early stages of network training. As indicated by the faster initial decrease in prediction error, the optimum number of set links from the NN and /S inputs was 5 or 8 (Figure 3). Here '1, 5 and 8 links' means that this number of hidden units has a set link to each of the NN and /S input units.
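The weight-initialisation scheme of Section 3.3 can likewise be sketched in a few lines. This is an assumed illustration, not the authors' code: it takes a one-hot input layer in which the NN and /S categories occupy known unit indices, uses an arbitrary value of 1.0 for a set link and a small uniform range for the remaining weights, and the layer sizes in the example call are placeholders; the precise definition of a set link is the one given in the methods (Section 2.4), which this excerpt does not reproduce.

# A minimal sketch (assumed details, not the authors' code) of initialising
# the input-to-hidden weights of an Elman network so that a chosen number of
# hidden units start with a 'set' link to the NN and /S input units, while
# every other weight receives the usual small random initialisation.
import numpy as np


def init_input_weights(n_inputs, n_hidden, set_inputs, n_set_links,
                       set_value=1.0, rng=None):
    """Return an (n_hidden, n_inputs) weight matrix in which the first
    n_set_links hidden units have a set link (weight = set_value) to each
    input unit index listed in set_inputs."""
    rng = np.random.default_rng() if rng is None else rng
    weights = rng.uniform(-0.1, 0.1, size=(n_hidden, n_inputs))
    for unit in set_inputs:
        weights[:n_set_links, unit] = set_value
    return weights


# Illustrative call: the input-layer size, hidden-layer size and the indices
# of the NN and /S units are placeholders, not figures from the paper;
# 5 set links is one of the two best-performing settings reported above.
W_in = init_input_weights(n_inputs=9, n_hidden=10,
                          set_inputs=[0, 1], n_set_links=5)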