<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1126"> <Title>Information Extraction from Single and Multiple Sentences</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Event level analysis </SectionTitle> <Paragraph position="0"> After transforming each data set into the common format it was found that there were 276 events listed in the MUC data and 248 in the Soderland set. Table 1 shows the number of matches for each data set following the matching process described in Section 3.2. The counts under the "MUC data" and "Soderland data" headings list the number of events which fall into each category for the MUC and Soderland data sets respectively, along with corresponding percentages of that data set. It can be seen that 112 (40.6%) of the MUC events are fully covered by the second data set, and 108 (39.1%) partially covered.</Paragraph> <Paragraph position="1"> Table 1 shows that there are 108 events in the MUC data set which partially match with the Soderland data but that 118 events in the Soderland data set record partial matches with the MUC data. This occurs because the matching process allows more than one Soderland event to be partially matched onto a single MUC event. Further analysis showed that the difference was caused by MUC events which were partially matched by two events in the Soderland data set. In each case one event contained details of the move type, person involved and post title, and another contained the same information without the post title. This is caused by the style in which the newswire stories which make up the MUC corpus are written, where the same event may be mentioned in more than one sentence but without the same level of detail. For example, one text contains the sentence "Mr. Diller, 50 years old, succeeds Joseph M.
Segel, who has been named to the post of chairman emeritus." which is later followed by "At that time, it was announced that Diller was in talks with the company on becoming its chairman and chief executive upon Mr. Segel's scheduled retirement this month." Table 1 also shows that there are 56 events in the MUC data which fall into the nomatch category. Each of these corresponds to an event in one data set with no corresponding event in the other. The majority of the unmatched MUC events were expressed in such a way that there was no corresponding event listed in the Soderland data. The events shown in Figure 1 are examples of this. As mentioned in Section 2.2, a sentence must contain a minimum amount of information to be marked as an event in Soderland's data set: either the name of an organisation and post, or the name of a person changing position and whether they are entering or leaving. In Figure 1 the first sentence lists the organisation and the fact that executives were leaving. The second sentence lists the names of the executives and their positions. Neither of these sentences contains enough information to be listed as an event under Soderland's representation; consequently the MUC events generated from these sentences fall into the nomatch category.</Paragraph> <Paragraph position="2"> It was found that there were eighteen events in the Soderland data set which were not included in the MUC version. This is unexpected since the events in the Soderland corpus should be a subset of those in the MUC corpus. Analysis showed that half of these corresponded to spurious events in the Soderland set which could not be matched onto events in the text. Many of these were caused by problems with the BADGER syntactic analyser (Fisher et al., 1995) used to pre-process the texts before the manual analysis stage in which the events were identified. Mistakes in this pre-processing sometimes caused the texts to read as though the sentence contained an event when it did not.
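The full/partial/nomatch categorisation described above can be sketched in a few lines. This is a hypothetical reconstruction, not the paper's exact procedure from Section 3.2: events are modelled as dictionaries over four fields (`type`, `person`, `post`, `organisation`), a candidate that agrees on every field the MUC event specifies is a full match, agreement on some but not all fields is a partial match, and several Soderland events may partially match one MUC event.

```python
# Illustrative sketch of the event-matching scheme: field names and the
# agreement rule are assumptions, not the paper's exact definitions.
FIELDS = ("type", "person", "post", "organisation")

def classify(muc_event, soderland_events):
    """Return 'full', 'partial', or 'nomatch' for one MUC event."""
    best = "nomatch"
    specified = [f for f in FIELDS if muc_event.get(f) is not None]
    for cand in soderland_events:
        shared = sum(1 for f in specified if cand.get(f) == muc_event[f])
        if shared == len(specified):
            return "full"      # every specified field agrees
        if shared > 0:
            best = "partial"   # some, but not all, fields agree
    return best

muc = {"type": "succession_in", "person": "Diller",
       "post": "chairman", "organisation": "QVC"}
cands = [{"type": "succession_in", "person": "Diller",
          "post": None, "organisation": None}]
print(classify(muc, cands))  # partial: type and person agree, post does not
```

Note that the loop keeps scanning after a partial hit, which is what allows two Soderland events to each partially match the same MUC event, as observed in the analysis above.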
We examined the MUC texts themselves to determine whether there was an event rather than relying on the pre-processed output.</Paragraph> <Paragraph position="3"> Of the remaining nine events it was found that the majority (eight) of these corresponded to events in the text which were not listed in the MUC data set. These were not identified as events in the MUC data because of the strict guidelines, for example that historical events and non-permanent management moves should not be annotated. Examples of these event types include "... Jan Carlzon, who left last year after his plan for a merger with three other European airlines failed." and "Charles T. Young, chief financial officer, stepped down voluntarily on a 'temporary basis pending conclusion' of the investigation." The analysis also identified one event in the Soderland data which appeared to correspond to an event in the text but was not listed in the MUC scenario template for that document. It could be argued that these nine events should be added to the set of MUC events and treated as full matches.</Paragraph> <Paragraph position="4"> However, the MUC corpus is commonly used as a gold standard in IE evaluation and it was decided not to alter it. Analysis indicated that one of these nine events would have been a full match and eight partial matches.</Paragraph> <Paragraph position="5"> It is worth commenting that the analysis carried out here found errors in both data sets.</Paragraph> <Paragraph position="6"> There appeared to be more of these in the Soderland data, but this may be because the event structures are much easier to interpret and so errors can be more readily identified.
It is also difficult to interpret the MUC guidelines in some cases and it is sometimes necessary to make a judgement over how they apply to a particular event.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Event Field Analysis </SectionTitle> <Paragraph position="0"> A more detailed analysis can be carried out by examining the matches between each of the four fields in the event representation individually. There are 1,094 fields in the MUC data. Although there are 276 events in that data set, seven of them do not mention a post and three omit the organisation name. (Organisation names are omitted from the template when the text mentions an organisation description rather than its name.) Table 4.2 lists the number of matches for each of the four event fields across the two data sets. Each of the pairs of numbers in the main body of the table refers to the number of matching instances of the relevant field and the total number of instances in the MUC data.</Paragraph> <Paragraph position="1"> The column headed "Full match" lists the MUC events which were fully matched against the Soderland data and, as would be expected, all fields are matched. The column marked "Partial match" lists the MUC events which are matched onto Soderland fields via partially matching events. The column headed "Nomatch" lists the event fields for the 56 MUC events which are not represented at all in the Soderland data.</Paragraph> <Paragraph position="2"> Of the total 1,094 event fields in the MUC data, 727 (66.5%) can be found in the Soderland data. The rightmost column lists the percentage of each field for which there was a match. The counts for the type and person fields are the same since the type and person fields are combined in Soderland's event representation and hence can only occur together.
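The field totals quoted above can be checked directly from the stated event counts: 276 events with four fields each would give 1,104 fields, and subtracting the seven missing posts and three missing organisation names leaves the 1,094 fields reported, of which 727 match.

```python
# Sanity check of the field counts reported in Section 4.2.
events = 276
total_fields = events * 4 - 7 - 3   # seven events omit the post, three the organisation
matched = 727
print(total_fields)                              # 1094
print(round(100 * matched / total_fields, 1))    # 66.5% overall field match rate
```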
These figures also show that there is a wide variation between the proportion of matches for the different fields, with 76.8% of the person and type fields being matched but only 43.2% of the organisation fields.</Paragraph> <Paragraph position="3"> This difference between fields can be explained by looking at the style in which the texts forming the MUC evaluation corpus are written. It is very common for a text to introduce a management succession event near the start of the newswire story, and this event almost invariably contains all four event fields. For example, one story starts with the following sentence: "Washington Post Co. said Katharine Graham stepped down after 20 years as chairman, and will be succeeded by her son, Donald E. Graham, the company's chief executive officer." Later in the story further succession events may be mentioned, but many of these use an anaphoric expression (e.g. "the company") rather than explicitly mentioning the name of the organisation in the event. For example, this sentence appears later in the same story: "Alan G. Spoon, 42, will succeed Mr. Graham as president of the company." Other stories may only mention the name of the person in the succession event, for example "Mr. Jones is succeeded by Mr. Green", and this explains why some of the organisation fields are also absent from the partially matched events.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Discussion </SectionTitle> <Paragraph position="0"> From some perspectives it is difficult to see why there is such a difference between the number of events which are listed when the entire text is viewed compared with considering single sentences. After all, a text comprises an ordered list of sentences and all of the information the text contains must be in these.
However, as we have seen, it is possible for individual sentences to contain information which is difficult to connect with the rest of the event description when a sentence is considered in isolation.</Paragraph> <Paragraph position="1"> The results presented here are, to some extent, dependent on the choices made when representing events in the two data sets. The events listed in Soderland's data require a minimal amount of information to be contained within a sentence for it to be marked as containing information about a management succession event, although it is difficult to see how any less information could be viewed as representing even part of such an event.</Paragraph> </Section> </Section> </Paper>