<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0803"> <Title>SENSEVAL-3 TASK Automatic Labeling of Semantic Roles</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 The Senseval-3 Task </SectionTitle>
<Paragraph position="0"> This Senseval-3 task calls for the development of systems to meet the same objectives as the Gildea and Jurafsky study. The data for this task is a sample of the FrameNet hand-annotated data. Systems are evaluated using precision and recall of frame elements and the overlap of a system's frame element positions with those identified in the FrameNet data.</Paragraph>
<Paragraph position="1"> The basic task for Senseval-3 is: Given a sentence, a target word, and its frame, identify the frame elements within that sentence and tag them with the appropriate frame element name.</Paragraph>
<Paragraph position="2"> The FrameNet project has just released a major revision (FrameNet 1.1) of its database, with 487 frames using 696 distinctly named frame elements (although it is not guaranteed that frame elements with the same name have the same meaning). This release includes 132,968 annotated sentences (mostly taken from the British National Corpus). The Senseval-3 task used 8,002 of these sentences, selected randomly from 40 frames (themselves selected randomly) having at least 370 annotations (out of the 100 frames having the most annotations).1</Paragraph>
<Paragraph position="3"> Participants were provided with a training set that identified, for each of the 40 frames, the lexical unit identification number (which equates to a file name) and a sentence identification number. They were also provided with the answers, i.e., the frame element names and their beginning and ending positions. Since the training set was much larger than the test set, participants were required to use the FrameNet 1.1 dataset to obtain the full sentence, its target word, and the tagged frame elements.</Paragraph>
<Paragraph position="4"> For the test data, participants were provided, for each frame, with sentence instances that identified the lexical unit, the lexical unit identification number, the sentence identification number, the full sentence, and a specification of the target along with its start and end positions.</Paragraph>
<Paragraph position="5"> Participants were required to submit their answers in a text file, with one answer per line. Each line was to identify the frame name and the sentence identifier, followed by all the frame elements, with their start and end positions, that the system was able to identify. For example, for the sentence However, its task is made much more difficult by the fact that derogations granted to the Welsh water authority allow <Agent>it</> to <Target>pump</> <Fluid>raw sewage</> <Goal>into both those rivers</>. the correct answer would identify the frame name and sentence identifier, followed by the Agent, Fluid, and Goal frame elements with their start and end character positions (a small sketch of this conversion appears below).</Paragraph>
<Paragraph position="6"> The sentences provided to participants were not presegmented (as defined in the Gildea and Jurafsky study); this was left to the participants' systems.</Paragraph>
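<Paragraph> To illustrate the submission format described above, the following is a minimal sketch (not part of the task materials) that recovers start and end character positions from the inline-tagged example sentence and prints an answer line; the frame name Example_frame and the sentence identifier 0000000 are placeholders, since the actual values for this sentence are not reproduced here.

import re

# Illustrative sketch: convert inline <FE>...</> tags into the inclusive
# (start, end) character offsets used in answer lines.
TAGGED = ("However, its task is made much more difficult by the fact that "
          "derogations granted to the Welsh water authority allow <Agent>it</> "
          "to <Target>pump</> <Fluid>raw sewage</> <Goal>into both those rivers</>.")

def untag(tagged):
    """Return (plain_sentence, {name: (start, end)}) for inline <Name>...</> tags."""
    plain, spans, pos = [], {}, 0
    for piece in re.split(r"(<[^>]+>[^<]*</>)", tagged):
        m = re.match(r"<([^>]+)>([^<]*)</>$", piece)
        if m:
            name, text = m.group(1), m.group(2)
            spans[name] = (pos, pos + len(text) - 1)  # inclusive end offset
            plain.append(text)
            pos += len(text)
        else:
            plain.append(piece)
            pos += len(piece)
    return "".join(plain), spans

sentence, spans = untag(TAGGED)
# The target word is not a frame element, so it is excluded from the answer line.
fes = " ".join("%s (%d,%d)" % (name, s, e) for name, (s, e) in spans.items() if name != "Target")
print("Example_frame.0000000 " + fes)
</Paragraph>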
<Paragraph position="7"> The FrameNet dataset contains considerable information that was tagged by the FrameNet lexicographers. Participants could use (and were strongly encouraged to use) any and all of the FrameNet data in developing and training their systems. In the test, participants could use any of this data, but were strongly encouraged to use only the data available in the sentence itself and in the frame that is identified. (This corresponds to the &quot;more difficult task&quot; identified by Gildea and Jurafsky.) Participants could submit two runs, one using the additional data (the non-restrictive case) and one without it (the restrictive case); these were scored separately.</Paragraph>
<Paragraph position="8"> FrameNet recognizes the permissibility of &quot;conceptually salient&quot; frame elements that have not been instantiated in a sentence; these are called null instantiations (see Johnson et al. for a fuller description). An example occurs in the following sentence (sentID=&quot;1087911&quot;) from the Motion frame: &quot;I went and stood in the sitting room doorway, but I couldn't get any further -- my legs wouldn't move.&quot; In this case, the FrameNet taggers considered the Path frame element to be an indefinite null instantiation (INI). Frame elements so designated for a particular sentence appear to be core frame elements, but not all core frame elements missing from a sentence have been designated as null instantiations. The correct answer for this case, based on the tagging, is as follows: Motion.1087911 Theme (82,88) Path (0,0). Participants were instructed to identify null instantiations in their submissions by giving a (0,0) value for the frame element's position (see the parsing sketch below).2 Participants were told in the task description that null instantiations would be analyzed separately.3</Paragraph>
<Paragraph position="9"> Footnotes: 1 The FrameNet Explorer provides several facilities for examining the FrameNet data: by frame, frame element, and lexical unit. For each unit, a user can explore a frame's elements, associated lexical units, frame-to-frame relations, frame and frame element definitions, lexical units and their definitions, and all sentences. 2 This turned out to be an incorrect method, since some frame elements (notably &quot;I&quot; at the beginning of a sentence) would have a position of (0,0), i.e., the beginning and ending positions are both 0. Such instances in the test set were identified and handled separately to distinguish them from null instantiations. 3 No analysis of null instantiations has yet been performed.</Paragraph>
<Paragraph position="10"> For this Senseval task, participants were allowed to download the training data at any time; the 21-day restriction on submission of results after downloading the training data was waived since this is a new Senseval task and the dataset is very complex. Participants could work with the training data as long as they wished. The 7-day restriction on submitting results after downloading the test data still applied.</Paragraph>
<Paragraph position="11"> In general, FrameNet frames contain many frame elements (perhaps an average of 10), most of which are not instantiated in a given sentence. Systems were not penalized if they returned more frame elements than those identified by the FrameNet taggers.</Paragraph>
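<Paragraph> The following is a minimal, illustrative sketch (not part of the task materials) of reading one submitted answer line under the conventions just described: the line names the frame and sentence identifier, then each frame element with its inclusive start and end character positions, and a (0,0) span marks a null instantiation. The helper name parse_answer_line is assumed for illustration only.

import re

# Illustrative only: parse a submitted answer line such as
# "Motion.1087911 Theme (82,88) Path (0,0)".
# A (0,0) span is treated as a null instantiation; as footnote 2 notes, this
# convention collides with frame elements that genuinely start and end at
# character 0 (e.g. a sentence-initial "I"), which the organizers handled separately.
FE_RE = re.compile(r"(\S+) \((\d+),(\d+)\)")

def parse_answer_line(line):
    head, _, rest = line.partition(" ")
    frame, sent_id = head.split(".", 1)
    elements = []
    for name, start, end in FE_RE.findall(rest):
        start, end = int(start), int(end)
        elements.append({
            "name": name,
            "span": (start, end),
            "null_instantiation": (start, end) == (0, 0),
        })
    return frame, sent_id, elements

print(parse_answer_line("Motion.1087911 Theme (82,88) Path (0,0)"))
</Paragraph>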
<Paragraph position="12"> For the 8,002 sentences in the test set, only 16,212 frame elements constituted the answer set.</Paragraph>
<Paragraph position="13"> In scoring the runs, each frame element (not a null instantiation) returned by a system was counted as an item attempted. If the frame element was one that had been identified by the FrameNet taggers, the answer was scored as correct. In addition, however, the scoring program required that the frame element boundaries identified in the system's answer overlap with the boundaries identified by FrameNet.</Paragraph>
<Paragraph position="14"> An additional measure of system performance was the degree of overlap. If a system's answer coincided exactly with FrameNet's start and end positions, the system received an overlap score of 1.0. If not, the overlap score was the number of overlapping characters divided by the length of the FrameNet span (i.e., end - start + 1).4 The number attempted was the number of non-null frame elements generated by a system. Precision was computed as the number of correct answers divided by the number attempted. Recall was computed as the number of correct answers divided by the number of frame elements in the test set.</Paragraph>
<Paragraph position="15"> Overlap was the average overlap score of all correct answers. The percent Attempted was the number of frame elements generated divided by the number of frame elements in the test set, multiplied by 100. If a system returned frame elements not identified in the test set, its precision would be lower.</Paragraph>
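<Paragraph> The scoring arithmetic described above can be summarized in a short sketch (illustrative only, not the official scoring program); it assumes the system and gold answers are given as mappings from (frame, sentence id, frame element name) to inclusive (start, end) spans, with null instantiations already removed.

def overlap_score(sys_span, gold_span):
    # Exact match scores 1.0; otherwise the number of overlapping characters
    # divided by the length of the FrameNet span (end - start + 1).
    (s1, e1), (s2, e2) = sys_span, gold_span
    if sys_span == gold_span:
        return 1.0
    shared = max(0, min(e1, e2) - max(s1, s2) + 1)
    return shared / (e2 - s2 + 1)

def score(system, gold):
    # system, gold: dicts mapping (frame, sent_id, fe_name) to (start, end) spans.
    attempted = len(system)                      # non-null frame elements returned
    correct, overlaps = 0, []
    for key, sys_span in system.items():
        gold_span = gold.get(key)
        if gold_span is not None:
            ov = overlap_score(sys_span, gold_span)
            if ov > 0:                           # name matches and boundaries overlap
                correct += 1
                overlaps.append(ov)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    avg_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0
    pct_attempted = 100.0 * attempted / len(gold) if gold else 0.0
    return precision, recall, avg_overlap, pct_attempted
</Paragraph> </Section> </Paper>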