<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0803"> <Title>SENSEVAL-3 TASK: Automatic Labeling of Semantic Roles</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 2 Results </SectionTitle> <Paragraph position="0"> Eight teams submitted 20 runs. Three teams submitted runs only for the restricted case (no prior knowledge about frame boundaries). The other five teams each submitted at least two runs, with one team submitting 8 runs and another submitting 4 runs.</Paragraph> <Paragraph position="1"> Four of these five teams submitted both a restricted run and an unrestricted run (in which frame boundaries were provided, i.e., the task was a classification task of identifying the applicable frame element).</Paragraph> <Paragraph position="2"> The results for the classification task are shown in Table 1. The average precision over all these runs is 0.803 and the average recall is 0.757. The overlap in each run is almost identical to the precision; it differs slightly, possibly because of small positional errors in either the FrameNet data or the sentence strings provided in the test data.</Paragraph> <Paragraph position="3"> For the restricted runs, the average precision is 0.595 and the average recall is 0.481. The average overlap is noticeably lower than the precision, reflecting the additional difficulty these runs faced in identifying the frame element boundaries.</Paragraph> <Paragraph position="4"> the start and end positions.</Paragraph> <Paragraph position="5"> In both cases, the percent attempted is quite high, except for one system in the restricted runs. This indicates that systems were able to identify potential frame elements in a large percentage of cases. Systems were allowed to return any number of frame elements for a sentence, so a system could identify more frame elements than the FrameNet taggers had. For example, run 08a asserted many more frame elements than were identified in the answer key.
As a result, its percent attempted was much higher than 100 percent. For the other runs, the number of asserted frame elements that do not appear in the answer key is unknown. Asserting frame elements that are not in the answer key raises a run's percent attempted and lowers its precision.</Paragraph> </Section></Paper>