<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1065">
<Title>Automatic Labeling of Semantic Roles</Title>
<Section position="6" start_page="0" end_page="8" type="evalu">
<SectionTitle> 5 Results </SectionTitle>
<Paragraph position="0"> Results for different methods of combining the probability distributions described in the previous section are shown in Table 4. The linear interpolation method simply averages the probabilities given by each of the distributions: P(r|constituent) = Σ_i λ_i P_i(r), where each P_i conditions on one of the feature sets of Table 2 and the λ_i sum to one.</Paragraph>
<Paragraph position="2"> The results shown in Table 4 reflect equal values of λ for each distribution defined for the relevant conditioning event (but excluding distributions for which the conditioning event was not seen in the training data).</Paragraph>
<Paragraph position="3"> Here pt denotes phrase type, gf grammatical function, h head word, and t the target word, or predicate.</Paragraph>
<Paragraph position="5"> The variable gf is only defined for noun phrases. The roles defined for the removing frame in the motion domain are: Agent, Theme, CoTheme ("... had been abducted with him"), and Manner.</Paragraph>
<Paragraph position="6"> Other schemes for choosing values of λ, including giving more weight to distributions for which more training data was available, were found to have relatively little effect. We attribute this to the fact that the evaluation depends only on the ranking of the probabilities rather than their exact values.</Paragraph>
<Paragraph position="8"> In the "backoff" combination method, a lattice was constructed over the distributions in Table 2 from more specific conditioning events to less specific, as shown in Figure 3. The less specific distributions were used only when no data was present for any more specific distribution. As before, probabilities were combined with both linear interpolation and a geometric mean.</Paragraph>
<Paragraph position="9"> The final system performed at 80.4% accuracy, which can be compared to the 40.9% achieved by always choosing the most probable role for each target word, essentially chance performance on this task. Results for this system on test data, held out during development of the system, are shown in Table 5.</Paragraph>
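To make the two combination schemes concrete, here is a minimal sketch in Python of equal-λ linear interpolation and the backoff lattice. The toy distributions, conditioning tuples, and single-distribution lattice levels are invented for illustration; the paper's Table 2 and Figure 3 define the actual distributions and lattice.

```python
from collections import defaultdict

# Each distribution maps a conditioning-event tuple to P(role | event).
# Names, tuples, and numbers below are hypothetical.
distributions = {
    "P(r|t)":    {("bite",): {"Agent": 0.6, "Theme": 0.4}},
    "P(r|pt,t)": {("NP", "bite"): {"Agent": 0.7, "Theme": 0.3}},
    "P(r|h,t)":  {("dog", "bite"): {"Agent": 0.9, "Theme": 0.1}},
}

def linear_interpolation(features, names):
    """Average P(r | event) over the named distributions whose
    conditioning event was actually seen in training (equal lambdas)."""
    seen = [distributions[n][features[n]]
            for n in names if features[n] in distributions[n]]
    scores = defaultdict(float)
    for dist in seen:
        for role, p in dist.items():
            scores[role] += p / len(seen)
    return dict(scores)

def backoff(features, lattice):
    """Walk the lattice from more specific to less specific distributions,
    using a level only when no more specific level had training data."""
    for level in lattice:
        names = [n for n in level if features[n] in distributions[n]]
        if names:
            return linear_interpolation(features, names)
    return {}

features = {"P(r|t)": ("bite",), "P(r|pt,t)": ("NP", "bite"),
            "P(r|h,t)": ("cat", "bite")}          # head word unseen in training
lattice = [["P(r|h,t)"], ["P(r|pt,t)"], ["P(r|t)"]]
scores = backoff(features, lattice)               # falls back to P(r|pt,t)
print(max(scores, key=scores.get))                # -> Agent
```

Because the evaluation depends only on the ranking of the probabilities, the exact λ weights matter less than which distributions contribute, consistent with the observation above.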
<Section position="1" start_page="8" end_page="8" type="sub_section">
<SectionTitle> 5.1 Discussion </SectionTitle>
<Paragraph position="0"> It is interesting to note that looking at a constituent's position relative to the target word along with active/passive information performed as well as reading grammatical function off the parse tree. A system using grammatical function, along with the head word, phrase type, and target word, but no passive information, scored 79.2%. A similar system using position rather than grammatical function scored 78.8%, nearly identical performance. However, using head word, phrase type, and target word without either position or grammatical function yielded only 76.3%, indicating that while the two features accomplish a similar goal, it is important to include some measure of the constituent's syntactic relationship to the target word. Our final system incorporated both features, giving a further, though not significant, improvement. As a guideline for interpreting these results, with 8176 observations, the threshold for statistical significance at p < .05 is a 1.0% absolute difference in performance.</Paragraph>
<Paragraph position="1"> Use of the active/passive feature made a further improvement: our system using position but no grammatical function or passive information scored 78.8%; adding passive information brought performance to 80.5%.</Paragraph>
<Paragraph position="2"> Roughly 5% of the examples were identified as passive uses.</Paragraph>
<Paragraph position="3"> Head words proved to be very accurate indicators of a constituent's semantic role when data was available for a given head word, confirming the importance of lexicalization shown in various other tasks. While the distribution P(r|h,t) can only be evaluated for 56.0% of the data, it gets 86.7% of those cases correct, without use of any of the syntactic features.</Paragraph>
</Section>
<Section position="2" start_page="8" end_page="8" type="sub_section">
<SectionTitle> 5.2 Lexical Clustering </SectionTitle>
<Paragraph position="0"> In order to address the sparse coverage of lexical head word statistics, an experiment was carried out using an automatic clustering of head words of the type described in (Lin, 1998). A soft clustering of nouns was performed by applying the co-occurrence model of (Hofmann and Puzicha, 1998) to a large corpus of observed direct object relationships between verbs and nouns. The clustering was computed from an automatically parsed version of the British National Corpus, using the parser of (Carroll and Rooth, 1998). The experiment was performed using only frame elements with a noun as head word. This allowed a smoothed estimate of P(r|h,nt,t) to be computed as P(r|h,nt,t) = Σ_c P(r|c,nt,t) P(c|h), summing over the automatically derived clusters c to which a nominal head word h might belong.</Paragraph>
<Paragraph position="3"> This allows the use of head word statistics even when the head word h has not been seen in conjunction with the target word t in the training data. While the unclustered nominal head word feature is correct for 87.6% of cases where data for P(r|h,nt,t) is available, such data was available for only 43.7% of nominal head words. The clustered head word alone correctly classified 79.7% of the cases where the head word was in the vocabulary used for clustering; 97.9% of instances of nominal head words were in the vocabulary. Adding clustering statistics for NP constituents into the full system increased overall performance from 80.4% to 81.2%.</Paragraph>
</Section>
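A minimal sketch of the cluster-smoothed estimate above, P(r|h,nt,t) = Σ_c P(r|c,nt,t) P(c|h), assuming the soft memberships P(c|h) come from a co-occurrence model of the kind cited; the head word, cluster names, and all probabilities here are invented.

```python
from collections import defaultdict

# P(c|h): hypothetical soft cluster memberships for a nominal head word h.
p_cluster_given_head = {
    "abductee": {"person_cluster": 0.8, "object_cluster": 0.2},
}

# P(r|c,nt,t): role probability given cluster, phrase type, and target word.
p_role_given_cluster = {
    ("person_cluster", "NP", "abduct"): {"Theme": 0.7, "Agent": 0.3},
    ("object_cluster", "NP", "abduct"): {"Theme": 0.5, "Agent": 0.5},
}

def smoothed_role_distribution(h, nt, t):
    """Sum over clusters c of P(r|c,nt,t) * P(c|h)."""
    scores = defaultdict(float)
    for c, p_c in p_cluster_given_head.get(h, {}).items():
        for role, p_r in p_role_given_cluster.get((c, nt, t), {}).items():
            scores[role] += p_r * p_c
    return dict(scores)

print(smoothed_role_distribution("abductee", "NP", "abduct"))
# {'Theme': 0.66, 'Agent': 0.34} (up to float rounding)
```

The payoff is coverage: the estimate is defined whenever h was seen in the clustering vocabulary, even if the pair (h, t) never occurred in training.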
<Section position="3" start_page="8" end_page="8" type="sub_section">
<SectionTitle> 5.3 Automatic Identification of Frame Element Boundaries </SectionTitle>
<Paragraph position="0"> The experiments described above have used human-annotated frame element boundaries; here we address how well the frame elements can be found automatically. Experiments were conducted using features similar to those described above to identify constituents in a sentence's parse tree that were likely to be frame elements. The system was given the human-annotated target word and the frame as inputs, whereas a full language understanding system would also identify which frames come into play in a sentence, essentially the task of word sense disambiguation. The main feature used was the path from the target word through the parse tree to the constituent in question, represented as a string of parse tree nonterminals linked by symbols indicating upward or downward movement through the tree, as shown in Figure 4. For example, the path from the frame element "He" to the target word "ate" can be represented as NP ↑ S ↓ VP ↓ V, with ↑ indicating upward movement in the parse tree and ↓ downward movement.</Paragraph>
<Paragraph position="1"> The other features used were the identity of the target word and the identity of the constituent's head word. The probability distributions calculated from the training data were P(fe|path), P(fe|path,t), and P(fe|h,t), where fe indicates the event that the parse constituent in question is a frame element, path the path through the parse tree from the target word to the parse constituent, t the identity of the target word, and h the head word of the parse constituent. By varying the probability threshold at which a decision is made, one can plot a precision/recall curve, as shown in Figure 5. P(fe|path,t) performs relatively poorly due to fragmentation of the training data (recall that only about 30 sentences are available for each target word).</Paragraph>
<Paragraph position="2"> While the lexical statistic P(fe|h,t) alone is not useful as a classifier, using it in linear interpolation with the path statistics improves results. Note that this method can only identify frame elements that have a corresponding constituent in the automatically generated parse tree. For this reason, it is interesting to calculate how many true frame elements overlap with the results of the system, relaxing the criterion that the boundaries must match exactly. Results for partial matching are shown in Table 6.</Paragraph>
<Paragraph position="3"> When the automatically identified constituents were fed through the role labeling system described above, 79.6% of the constituents which had been correctly identified in the first stage were assigned the correct role in the second, roughly equivalent to the performance when assigning roles to constituents identified by hand.</Paragraph>
</Section>
</Section>
</Paper>
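A rough sketch of the path feature and the thresholded P(fe|path) decision described in Section 5.3. nltk's Tree is used here purely as a convenient parse-tree container; the toy sentence, probability table, and threshold are invented.

```python
from nltk import Tree

UP, DOWN = "↑", "↓"

def path_feature(tree, const_pos, target_pos):
    """Nonterminal path from a constituent up to the lowest common
    ancestor and down to the target node, e.g. 'NP↑S↓VP↓V'.
    Both positions must point at nonterminal nodes of the tree."""
    # The lowest common ancestor is the longest common prefix
    # of the two tree positions.
    k = 0
    while (k < min(len(const_pos), len(target_pos))
           and const_pos[k] == target_pos[k]):
        k += 1
    label = lambda pos: tree[pos].label()
    # Upward leg: the constituent, its ancestors, the common ancestor.
    up = [label(const_pos[:i]) for i in range(len(const_pos), k - 1, -1)]
    # Downward leg: from just below the ancestor down to the target.
    down = [label(target_pos[:i]) for i in range(k + 1, len(target_pos) + 1)]
    return DOWN.join([UP.join(up)] + down)

tree = Tree.fromstring(
    "(S (NP (PRP He)) (VP (V ate) (NP (DT some) (NN pancakes))))")
path = path_feature(tree, (0,), (1, 0))     # the NP node and the V node
print(path)                                 # NP↑S↓VP↓V

# Decision rule: call the constituent a frame element when P(fe|path)
# clears a cutoff; sweeping the cutoff traces a precision/recall curve
# like Figure 5. The probabilities below are made up.
p_fe_given_path = {"NP↑S↓VP↓V": 0.85, "NP↑VP↓V": 0.60}
threshold = 0.7
print(p_fe_given_path.get(path, 0.0) >= threshold)   # True
```

As the section notes, a classifier built on such paths can only propose frame elements that correspond to some constituent of the automatic parse, which motivates the partial-match evaluation in Table 6.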