<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1079">
  <Title>Message Understanding Conference - 6: A Brief History</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 The MUC Evaluations
</SectionTitle>
    <Paragraph position="0"> We have just completed the sixth in a series of Message Understanding Conferences, which have been organized by NRAD, the RDT&amp;E division of the Naval Command, Control and Ocean Surveillance Center (formerly NOSC, the Naval Ocean Systems Center) with the support of DARPA, the Defense Advanced Research Projects Agency.</Paragraph>
    <Paragraph position="1"> This paper looks briefly at the history of these Conferences and then examines the considerations which led to the structure of MUC-6} The Message Understanding Conferences were initiated by NOSC to assess and to foster research on the automated analysis of military messages containing textual information. Although called &amp;quot;conferences&amp;quot;, the distinguishing characteristic of the MUCs are not the conferences themselves, but the evaluations to which participants must submit in order to be permitted to attend the conference. For each MUC, participating groups have been given sample messages and instructions on the type of information to be extracted, and have developed a system to process such messages.</Paragraph>
    <Paragraph position="2"> Then, shortly before the conference, participants are given a set of test messages to be run through their system (without making any changes to the system); the output of each participant's system  sundheim@poj ke. nosc. mil is then evaluated against a manually-prepared answer key.</Paragraph>
    <Paragraph position="3"> The MUCs are remarkable in part because of the degree to which these evaluations have defined a prograin of research and development. DARPA has a number of information science and technology programs which are driven in large part, by regular evaluations. The MUCs are notable, however, in that they in large part have shaped the research program in information extraction and brought it to its current state}</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="466" type="metho">
    <SectionTitle>
2 Early History
</SectionTitle>
    <Paragraph position="0"> MUC-1 (1987) was basically exploratory; each group designed its own format for recording the information in the document, and there was no formal evaluation. By MUC-2 (1989), the task had crystalized as one of template filling. One receives a description of a class of events to be identiffed in the text; for each of these events one must fill a template with information about the event.</Paragraph>
    <Paragraph position="1"> The template has slots for information about the event, such as the type of event, the agent, the time and place, the effect, etc. For MUC-2, the template had 10 slots. Both MUC-1 and MUC-2 involved sanitized forms of military messages about naval sightings and engagements.</Paragraph>
    <Paragraph position="2"> The second MUC also worked out the details of the primary evaluation measures, recall and precision. To present it in simplest terms, suppose the answer key has Nke~ filled slots; and that a system fills Neor,.~t slots correctly and Nin~or,,~t incorrectly (with some other slots possibly left unfilled). Then Ncorrect recall Nkey null 2There were, however, a number of individual rescm'eh efforts in information extraction underway be\[bre the first MUC, including the work on information formatting of medieM narrative by Sager at New York University; the formatting of naval equipment failure reports by Marsh at the Naval Research Laboratory; and the DBG work by Logieon for RADC.</Paragraph>
    <Paragraph position="3">  For MUC-3 (1991), tile task shifted to reports of terrorist events ill Central and South America, as reported in articles provided by the Foreign Broadcast Information Service, and the template becmne somewhat more complex (18 slots). This same task was used for MUC-4 (1992), with a further small increase in template complexity (24 slots).</Paragraph>
    <Paragraph position="4"> MUC-5 (1993), which was conducted as part of the Tipster program, a represented a substantial fllrther jump in task complexity. Two tasks were involved, international joint ventures and electronic circuit fabrication, in two hmgnages, English and Japanese. The joint venture task required 11 templates with a total of 47 slots for the output double tile number of slots defined for MUC-4 and the task documentation was over 40 pages long.</Paragraph>
    <Paragraph position="5"> One innovation of MUC-5 was the use of a nested template structure. In earlier MUCs, each event had been represented as a single temi)late * in effect, a single record in a data l)ase, with a large nuinber of attributes. This format proved awkward when an event had several participmlts (e.g., several victims of a terrorist attack) and one wanted to record a set of facts about each participant. This sort of information (:ould be ranch more easily recorded in the hierarchical structure introduced for MUC-5, in which there was a single template for an event, which pointed to a list of temI(lates, one for each particii)mlt in tile event;. 4 3 MUC-6: initial goals 1)ARI)A convened a meeting of Tipster participants and government representatives in Decca&gt; bet' 1993 to define goals and tasks tot MUC-6) Among the goals which were identified were * demonstrating taskqndependent component technologies of information extraction which would be immediately useflfl * encouraging work to make information extractioil systems in&lt;)re portable * encouraging work on &amp;quot;deeper understanding&amp;quot; aTipster is a U.S. Govermnent program of research and development in the areas of inibrmation retrieval and information extraction.</Paragraph>
    <Paragraph position="6"> 4In fact the MUC-5 structure wa~s much (nor(; complex, because there were separate temt)lates for products, time, activities of organizations, etc. '~The representatives of the resear(:h community were Jim Cowie, lS(.alph Grishman (commit;tee chair), Jerry Hobbs, Paul Jacobs, Lea Schubert, Carl Weir, and Ralph Weischedel. The government people attending wcre George Doddington, Donna Harman, Boyan Onyshkevych, John Prangc, Bill Schultheis, and Beth Sundheim.</Paragraph>
    <Paragraph position="7"> Each of these can been seen in part as a reaction to the trends in the prior MUCs. The MUC-5 tasks, in particular, had been quite complex and a great effort had been invested by the government in preparing the training and test data and by the participants in adapting their systems for these tasks. Most participants worked on the tasks for 6 months; a few (the Tipster contractors) had been at work on the tasks tbr consi(lerably longer. While the performance of solne systems was quite impressive (the best got 57% recall, 64% precision overall, with 73% recall and 74% t)recision on the 4 &amp;quot;(:or(;&amp;quot; template types), tile question naturally arose as to whether there were many apl)lieations tbr which art investment of one or several developers over half-&gt;year (or more) could be justified. Furthermore, while so much effort had been expended, a large portion was specific to tire particular tasks. It wasn't clear whether much progress was being made on the underlying technologies which would be needed for hetter understanding.</Paragraph>
    <Paragraph position="8"> To address these goals, the meeting formulated an ambitious menu of tasks for MUC-6, with the idea that individual participants could choose a subset of these tasks. We consider the three goals in the three sections below, and describe the tasks which were developed to address each goal.</Paragraph>
  </Section>
  <Section position="5" start_page="466" end_page="466" type="metho">
    <SectionTitle>
4 Short-term subtasks
</SectionTitle>
    <Paragraph position="0"> The first goal was to identit~y, from the component technologies being developed for information extraction, flmctions which would be of 1)ractical use, would be largely domain indet)endent, and couhl in the near term be performed automatically with high ac('uracy. To meet this goal the con&gt; mittce developed the &amp;quot;named entity&amp;quot; task, which t(asically involves identifying the names of all the people, organizations, and geographic locations in a text.</Paragraph>
    <Paragraph position="1"> The final task specification, which also involved time, currency, and percentage expressions, used SGML markup to identify the names in a text.</Paragraph>
    <Paragraph position="2"> Figure 1 shows a sample sentence with named entity annotations. The tag ENAMEX (&amp;quot;entity name expression&amp;quot;) is used for both people and organization names; the tag NUNEX (&amp;quot;numeric expression&amp;quot;) is used for currency and I)ercentages.</Paragraph>
  </Section>
  <Section position="6" start_page="466" end_page="467" type="metho">
    <SectionTitle>
5 Portability
</SectionTitle>
    <Paragraph position="0"> The second goal was 1;o focus on portability in the inibrmation extraction task the ability to rapidly retarget a system to extract; information about a different class of events. The comnfittee felt that it was important to demonstrate that useful extraction systems eouht be created in a few weeks. To meet this goal, we decided that the infbrmation extraction task for MUC-6 wouhl have to involve a relatively simple template, more like MUC-2 than MUC-5; this was duhbed &amp;quot;mini- null MUC&amp;quot;. In keeping with |;he hierarchical template structure introduced in MUC-5, it was envisioned |;hat the inini-MUC would have an event-level template pointing to templates representing |;he partieitmnts in the event (people, orgmfizations, products, e.tc.), me(liated perhaps by a &amp;quot;relational&amp;quot; level template.</Paragraph>
    <Paragraph position="1"> To further increase portability, a proposal was made to standardize the lowest-level tenlplates (for peoph',, orgaIfizations, etc.), since these basic (:lasses are involved in a wide variety of actions. In this way, MUC participants could develop code for these low-level telnplates once, and then use them with many different types of events. These low-level t;emptates were named &amp;quot;telnplate elements&amp;quot;. As the specification finally deveh)ped, tit(; reinplate element for orgalfizations had six slots, for the inaximal organization nalne, any aliases, the type, a descriptive noun phrase, the locale (inost specific location), and country. Slots are tilh,d only if inforlnation is explicitly given in the text (or, ill the ease of the country, can be inDrred Doln an explicit locale). The text We are striving to have a strong renewed creative partnership with Coca-Cola,&amp;quot; Mr. Dooner says. However, o(lds of that hapt;ening are slim since word from Coke headquarters in Atlanta is that...</Paragraph>
    <Paragraph position="2"> wouht yiehl an organization telnplate elenmnt with live of these six slots filled:</Paragraph>
    <Paragraph position="4"> plate 5 from article 9402240\]33).</Paragraph>
    <Paragraph position="5"> Ever on the lookout for additional ewfluation measm'es, the committee decide, d to nlake the creation of telnI)late eh,ments tbr all the people and organizations in a text a separate MUC task. lake the named entil;y task, this was also seen as a potential demonstration of the ability of systelns 1;o pertbrm a useflfl, relatively dolnain independent task with near-term extraction te(:hnoh)gy (although it was recognized as being more dillicult than named entity, since it required merging information from several places in the text). The old-style MUC information extraction task, based on a description of a particular (:lass of events (a &amp;quot;scenario&amp;quot;) was called the &amp;quot;scenario template&amp;quot; task. A sample scenario template is shown in the appendix.</Paragraph>
  </Section>
  <Section position="7" start_page="467" end_page="467" type="metho">
    <SectionTitle>
6 Measures of deep understanding
</SectionTitle>
    <Paragraph position="0"> Another concern which was noted about the MUCs is that tile systenls we.re tending towards relatively shallow understanding techlfiques (based IIrimarily on local pa.ttern inatching), and that not enough work was being done to build up the mechanisms needed for deeper understanding. Therefore, tile committee, with strong encouragement front I)AII.PA, included three MUC tasks which were intended to measure, aspex:ts of the internal processing of an inforlnation extra(:lion or hmguage understanding systenL These three tasks, which were collectively called Se- null mEwfl (&amp;quot;Senmntic Ewfluation&amp;quot;) were: * Coreference: the systent would have to mark coreferential noun t)hrases (the initial SlmCification envisioned marking set-subsel; and part-whole relations, ill addition to identity relations) * Word sense disambiguation: for each ope.n (:lass word (noun, verb, a, djective, adverb) in the text, the systein would have to determine its sense using the Wordlmt class|Ileal|on (its &amp;quot;synset&amp;quot;, in Wordnet termii~of ogy) * Predicate-argument structure: the sys- null tem wouhl have to create a tree interrelating the constituellts of the sentence, using sonm set of gralnma.tical flnmtional relations The committee recognized that, in seh;eting sneh internal measures, it, was inaking sortie presumI) tion regarding the structures and decisions which an analyzer should make in understanding a docllmellt. Not everyone would share these pre, sumplions, lint participants in the next MU(J would be free 1;o enter the infornlation extraction evaluation and skip some or all of these internal ewdua-Lions. Language understanding technology might develop in ways very diIii?rent from those imagined by the committee, and these internal evaluations might turn ollt t() t)e irrelevant distractions. However, froln the current perslmctive of tnost of the eolnmittec, @ese seenmd fairly \])asic aspects of unde, rstanding, and so an experinmnt in evahlating them (and encouraging improvem(mt in them)  The committee, had l)ropos(;(t a ve.ry anll)itious I)rogrmn of cvahu~l;ions. Wc now had to r(xhlce.</Paragraph>
    <Paragraph position="1"> these I)roi)osals to (let;ailed spe.cifi(:ations. The. first step was t;o (lo some ma,mlal te.xl; anuol:a.-lion for the fore ~,asks named em;ity mM the Selnt,;val triad whi(:h were quit(: (tifii!r(!nt from what had be(m l;rie(l before, lh'M! sp(~(:ifi(:ations were prepared for ca(:h task, and in the sl)ring of \] 994 a grou I) of vohmt(~ers (most;ly vel;(n:ans ()f ear-. lier MU(Js) annol:~mxl a shorl: newst)~p(w m'tM(', using ('.ach set of specifi(:ations.</Paragraph>
    <Paragraph position="2"> Prot)lems arose with ea.(:h of t;he SemEval tasks. * For corefcren('e., ther(', were problems i(hull;i\[ying i)art-whoh~ and sei;-sul)s(C/ rela.tions, mM distinguishing the, two; a decision wa.s lm;er made to limit ourselv(;s I;() i(lenLi(;y rela.I;ions. (r) bbr sens(' I:~gging, l;h(; ~l.llllOl;tl, t,()l'S forum that, in some cases Wordn(,t made very \[ine dis1;incl;ions a,nd thai; making l;hese (list, incti(ms consistently ill l;agging was very ditticulI;.</Paragraph>
    <Paragraph position="3"> e For predicate-argumealt sl;ru(;l;llr(',, pracl;ically every new CoIIS\[;Ill(;l; 1)(;y()lI(l simple clauses and noun l)hrases r;tise(l new issues which had I;o t)e toilet:lively r(:solve(l.</Paragraph>
    <Paragraph position="4"> Beyon(l th(;se in(lividuM t)rolflenls, il; was fell: l;hal; l;he menu was simply (;oo anfl)il, ious, mM l;hal; w('. would do t)('.l:t('x by (:on(:entrat, ing on out; (',lemenl: of the Sem(;v;fl l,riad for MUC-6; at a. me('.l;ing hehl in .hllm 1994, a decision was mad(; to go with coref('xea,(:('.. In i/arl;, this r(~tl(w.l;est a feeling that the t)rol)lems wi@ Lh(', (:()refl',ren(:(~ Sl)(X&gt; ili(:a.I;ion w('.re l:he mosl; mn(mable l:o soluli(m, lilt, also re.fle(:i;ed a. (:onvicl;ion I;hal; (:or(ff('r(m(:(~ idea&gt; t:ilication had 1)een, &amp;nd would re, main, critical 1;o success ill inforina.t;iou cxl;r~mi;ion, au(1 st) it wgts \[IIlpor\[;~l,ll\[; 1;o (?llC()llrtl,~(~ a, dvtl~ltc(;s in (:or(',\[ k m(;nc(;, tin contrasl;, mosl; (;xt, rat:l;ion sysl;ems did nol; buil(t fltll t)redi(:ate-atrgument sl;ru(:l;ures, and word-sense (lismnbigual;ion played a relal;iv('ly stnall role ill exl;ra(:l;ion (particularly since (;xl;l';t(&gt; l;ion sysl;ems ol)erated in a narrow domain).</Paragraph>
    <Paragraph position="5"> 'Phe (:or('~h'a'(;n(:('~ task, like. the nam(xl entil;y l;ask, was a.nnotal;ed using SGMI, n()tal;i()tl. A C{\]REF tag has mt ID ai;l;ri|)ul;(' whi(:h i(lenl;ifies l;he tagged noult 1)hrase or l)ron(mn, ll; tn;ty also ha.vc a.n at,l;ril)ut;(' of the \[orm REF--n, which indi(:al,es thai; this lfln'ase is (:or(,fe, r(mtiM with I;he 1)hrasc wit;h I1) n. Figure, 2 shows an (;x(:('rt)I; fl'om ;m m'tMe, ann(/l;al;c(t \[or (;orefereal(;e. (;</Paragraph>
    <Paragraph position="6"> 6 The TYPE and MIN attributes which appear in the actual annotation have been omitted here for the sake of readability.</Paragraph>
    <Paragraph position="1">  The next st;(; 1) was the IWel)axal;hm of a substa.nt;ia.1 Lra.illing corpllS for LII(~ l;wo novel t, asks which renmined (nmued Clll;il;y &amp;lid COI'(~,f(~,FO,1IC(Q. SRA Col l)orat:ion kindly provided tools which aided in t;he a nnol;ation process. Again a sl;alwa.rt grtml) of vt)luui;e(w a.nn()i;alx)rs was assenfl)led; 7 each was 1)to vide(l with 25 m't;i(:lcs from 1;h('. Wall Street .\]ourna.l. There. was SOlUe overlap b(!Lween t;hc arLi(:les assigned, s() t, haL we could IIIO&amp;Slll'(! ~;}1( ~. consistency of a.mloi;m;ion /w.|:weeu silx~s. This amloi,ation w~s (lone. in I.he winter o\[ 1994-95.</Paragraph>
    <Paragraph position="2"> A major role o\[ the. mmol;aLion l)ro(:e.ss was Lo i(lemify and res()lv(~ l)r(fl)h!ms wil;h l;he task Sl)(X&gt; ifi(:a.tions. For na.nied cnl;iifies, this was rel~tl;ively st, rtdght, forwar(\[. For COI'(~\[(~I'(;I/(',(;, it proved r(',markat)ly (lifli(:ult to f()rmutat;e guitl(,lines which were reasonal)ly comI/lel;(~ a, nd &lt;:onsist, ent.. s RomM 3: dry rml ()nee the t;ask sl)e(:ifica.l;ions seemed r(~asonably stM)l(b Nlbd) ()rg;ufiz(~(l a &amp;quot;(lry run&amp;quot; a full-s(:al(~ r(~hearsal for MUC-6, I)ul; with all result:s r('4)ori;ed a.nouymously. The dry run Ix)ok t)l;u:e in April 1995, wil;h a s(:enario iuvolving labor union (:()n.</Paragraph>
    <Paragraph position="3"> l,ra,c.t; n(~gotia.l:i(ms. ()f 0m sil;es whi(:h we.re involved in t;he annot;ation l/r()(:('~s,q, t;en 1)arl:i(:ipatx~(l in lhe dry run. Results of t;h(~ dry run were r(&gt; l)()rWxl n.I, l;he Tit)sl:er I)hase II 12--mout;h me(Mug in May 1995.</Paragraph>
  </Section>
  <Section position="9" start_page="467" end_page="471" type="metho">
    <SectionTitle>
8 The formal evaluation
</SectionTitle>
    <Paragraph position="0"> The MUC6 formal ewflu;ttion was /mhl in ,~{(q)l:emt)ex 1995. The s(:(;nario (l(~finil;ion w;L,q dist, ribuIxxt at, t;he t)egimling ()\[' S(q)tember I l;he test data was disiaibut, cd four we('.ks late.r, with re sult;s due by (,he end ()\[' th(; w('.ek. The ,qcena.rio involv(M (;h;l,II~O,S ill COI'|)OF;I,I;(~ (LK(~CIII;iv(; II\],%II;/,~Cm('.n(; p('a,~onn(~l. The. (;valua.1;i(m reel; mmly ()I 1;t1('. f~oals which had /)(~en set, 1)y th('~ iniLial planning (:onfer(mt:e. in l)e(:emlmr ()f 1993.</Paragraph>
    <Paragraph position="1"> There were (;va\]u;Lti(ms for \[our t, asks: 1HIIII(RI entit;y, (:orel'('.re.n(:e, 1;eml)lat(, c, lt!inenI;, }l, ll(t s(;c-nmio I;e, mt~lm;(u Ttmre w('r(; 16 t)m'ti(;ipmfl;s; 11.5 1)arti(:it)al;e(l in the nmne(l ent, it, y task, 7 in (',oref(~l'O, ll(~(~,, \] 1 ill t(',ml)lat;(; elemenl;, an(l 9 in s(:enari() l,(;mi)lal;(,,.</Paragraph>
    <Paragraph position="2"> Name(l eni;ity was inl;(mdcd to b(; a siml)h~ task on whi(:h syst, ems coul(t (lernoustrat, e a high level of 1)(!rforumn(:e ... high enough for imme(lim;e use. Our su(:(;(;ss iu I;his t, ask (~x(:(;(~(le(l our &gt;l'he annol;;)A;ion groups were from BBN, Brall(t(fis Univ., t~he Univ. of Durham, Lo(:kheed-Marl;in, New Mexico Sl;ai;e Univ., Nlbd), New York Univ., PRC, l;he, Univ. of l)(mnsylwmia, SAIC (San /)iego), SRA, SR\[, the Univ. of Shefliehl, SouLhe, rn Metlmdisl; Univ., mr(1 Ultisys.</Paragraph>
    <Paragraph position="3"> SAs exl)e, rienced (:Oml)ut~tional linguists, we 1)rol)ably should ha,re kuown 1)el;l;(',r l;han to l;hink this wa.s an easy t~ask.</Paragraph>
    <Paragraph position="4">  expectations. The majority of sites had recall and precision over 90%; the highest-scoring system had a recall of 96% and a precision of 97%.</Paragraph>
    <Paragraph position="5"> Although one must keep in mind the somewhat limited range of texts in the test set (all are from the Wall Street Journal, in particular), the results are excellent. A couple of these systems have been commercialized, and several are being incorporated into government text-processing systems.</Paragraph>
    <Paragraph position="6"> Given this level of performance, there is probably little point in repeating this task with the same ground rules in a future MUC (although there might be interest in processing monoease text and in performing comparable tasks oil a more varied corpus and for languages other than English).</Paragraph>
    <Paragraph position="7"> The template element task, while superficially similar to named entities - ~ it is also based on identifying people and organizations ~ is significantly more difficult. One has to identify descriptions of entities (&amp;quot;a distributor of kumquats&amp;quot;) as well as names. If an entity is mentioned several times, possibly using descriptions or different forms of a name, these need to be identified together; there should be only one template element for each entity in an article. Consequently, the scores were appreciably lower, ranging across most systems from 65 to 75% in recall, and from 75% to 85% in precision. The top-scoring system had 75% recall, 86% precision. Systems did particularly poorly in identifying descriptions; the highest-scoring system had 38% recall and 51% precision for descriptions.</Paragraph>
    <Paragraph position="8"> There seemed general agreement that having prepared code for template elements in advance did make it easier to port a system to a new seenario in a few weeks. This factor, and the room that exists for improvement in performance, suggest that including this task in a future MUC may be worthwhile.</Paragraph>
    <Paragraph position="9"> The goal for scenario templates mini-MUC -- was to demonstrate that effective information extraction systems could be created in a few weeks. This too was successful. Although it is difficult to meaningfully compare results on different scenarios, the scores obtained by most systems after a few weeks (40% to 50% recall, 60% to 70% precision) were comparable to the best scores obtained in prior MUCs. The highest performance overall was 47% recall and 70% precision.</Paragraph>
    <Paragraph position="10"> One can observe an increasing convergence of methods tbr information extraction. Most of the systems participating in MUC-6 employed a cascade of finite-state pattern recognizers, with the earlier pattern sets recognizing entities, and the later sets recognizing scenario-specific patterns. This convergence may be one reason for tile bunching of scores for this task -- most systems fell in a rather narrow range in both recall and precision.</Paragraph>
    <Paragraph position="11"> The results of this MUC provide valuable positive testimony on behalf of information extra(&gt; tion, but further improvement in both portability and performance is needed tbr many applications.</Paragraph>
    <Paragraph position="12"> With respect to port~bility, custoiners would like to have systems which can be ported in a t'ew hours, or at most a few days, by someone with less expertise than a system developer. How this might be tested in the context of a MUC is not entirely clear. For one thing, most sites spent several days just studying the scenario description and annotated corpus, in order to understand tile scenario definition, before coding began. Perhaps a micro-MUC 9 with an even simpler template structure, is needed to push the limits of port, ability. Getting systems which can be custonfized by others is also a tall order, given the complexity and variety of knowledge sources needed for a typical MUC information extraction task.</Paragraph>
    <Paragraph position="13"> With respect to performance, tile bunching of scores suggests that many sites were able to solve a common set of &amp;quot;easy&amp;quot; problems, but were stymied in processing messages which involved &amp;quot;hard&amp;quot; problems. Whether this is true, and just what the hard problems are, will require more extensive analysis of the results of MUC-6. Are the shortcomings due primarily to a lack of coverage in the basic patterns, to a lack of background knowledge in the domain, to failures in coreference, or something else? We. may hope that the failings are primarily in one area, so that we may concentrate our energies there, but more likely the failings will be in many areas, and broad improvements in extraction engines will be needed to improve performance. null  Pushing improvements in the underlying technology was one of tlm goals of SemEval and its current survivor, eoreference.. Much of tile energy for the current round, however, went into honing the definition of the task. Philosol)hers of language have been arguing over reference and coreferencc for centuries, so we should not have been surprised that it would t)e so hard to prepare a precise and consistent definition. Additional work on the definition will he necessary, and it may be necessary to narrow the task fllrther. Despite these distractions, a few interesting early results were ol)tained regarding eoreference methods; we may hot)e that, once the task specification settles down, the availability of coreferenceaimotated corpora and the chance for glory ill fltrther evaluations will ein'ourage more work in this area.</Paragraph>
    <Paragraph position="14"> Appendix: Sample Scenario Template Shown below is a set of templates for the MUC-6 scenario template task. Tile scenario involved changes in corporate executive management personnel. ~br the text; McCann has initiated a new so-called global collaborative system, (:omposed of world-wide account directors paired with creative partners. In addition, P(&gt; ter Kim was hired from WPP Grout)'s .I.</Paragraph>
    <Paragraph position="15"> Walter Thompson last; Septenfl)er as vice chairman, chief strategy officer, worhlwide. null the following templates were to be generated:</Paragraph>
    <Paragraph position="17"> POST: &amp;quot;vice chairman, chief strategy officer, world-wide&amp;quot;</Paragraph>
    <Paragraph position="19"> Although we cannot explain al\] tile details of the template here, a few highlights shouht be noted. For each executive post; one generates a SUCCESSION_EVENT template, which contains refl~rences to the ORGANIZATION template for the organization involved, and the IN_AND OUT template for the activity involving that post (if an article describes a person leaving and a per-son start;ing the same job, there will be two IN_AND_OUT templates). The IN_AND_OUT template contains references to the tmnt)lates fl)r the PERSON and tbr the ORGANIZATI()N from which the person came (if he/she is starting a new job). The PERSON and ORGANIZATION templates are the &amp;quot;temt)late element&amp;quot; templates, which are invariant across scenarios.</Paragraph>
  </Section>
class="xml-element"></Paper>