<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1030">
  <Title>The Message Understanding Conference Scoring Software User's Manual</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
OBJ_STATUS: OPTIONAL
</SectionTitle>
    <Paragraph position="0"> Key objects differ from response objects in a few respects: a slot may be marked &amp;quot;optional&amp;quot; by placing a slash character (&amp;quot;/&amp;quot;) before the very first fill of the object. If the response object includes the optional slot, then the response fill and object fill are compared like any other fills. If the response object doesn't have the optional slot, no points are scored against it.</Paragraph>
    <Paragraph position="1"> a slot may contain &amp;quot;alternative&amp;quot; fills, separated by a slash character as the first non-blank character on a line. The response fill is matched with whichever of the alternatives gives the best &amp;quot;f&amp;quot;-score for that fill.</Paragraph>
    <Paragraph position="2"> the entire object may be marked optional, by including the &amp;quot;status&amp;quot; slot, with a fill of &amp;quot;optional&amp;quot;. If an optional object is aligned with a response object, it is scored like any other object. But if no response object aligns with the optional object, no points are scored against the response.</Paragraph>
    <Paragraph position="3"> Template File Caveats The information extraction task descriptions often include a BNF which describes the different types of objects in the task. The scorer makes some some further assumptions about the format of template files which are not specified in the BNF's: all objects from a document should be grouped in one place in the template file.</Paragraph>
    <Paragraph position="4"> an object's header should be on its own line.</Paragraph>
    <Paragraph position="5"> if a line has a slot name, the name should be the first non-blank token on the line.</Paragraph>
    <Paragraph position="6"> there should be only one fill per line.</Paragraph>
    <Paragraph position="7"> a line containing a fill may have &amp;quot;link information&amp;quot; at the end of the line: SLOT_NAME: &amp;quot;a slot fill&amp;quot; ##392#404#textsfilename This is a pair of pound signs (&amp;quot;##&amp;quot;) followed by the &amp;quot;start offset&amp;quot; of the fill, then a single pound sign followed by the &amp;quot;end offset&amp;quot; of the fill, then another single pound sign, followed by the name of the texts file. None of the offset information is used in scoring, but it may be used in later versions of the scorer to highlight portions of the texts file. At present the scorer reads the start offset and end offset, but ignores the name of the texts file. The texts file name should not contain any pound signs.</Paragraph>
    <Paragraph position="8"> comments may be inserted into the template files on lines that have a pound sign or a semicolon as the very first character on a line.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SGML Task Files
</SectionTitle>
    <Paragraph position="0"> The coreference and named entity tasks involve adding Standard Generalized Markup Language (SGML) to the the texts file to create the key and response files.</Paragraph>
    <Paragraph position="1"> The Scoring Software's View of SGML SGML is a very flexible and powerful language for adding structure to computer documents. The MUC scoring software recognizes a subset of SGML when it scores the coreference and named entity tasks. This discussion is a (very) simplified description of SGML.</Paragraph>
    <Paragraph position="2"> An SGML tag is a character string inserted into a text file. Tags usually come in pairs, consisting of an open tag and a close tag. A pair of tags enclose a section of the text. For example, here is a piece of text, then the same text with some SGML tags added.</Paragraph>
    <Paragraph position="3"> Be glad you don't work  Open tags start with an open angle bracket, and are followed immediately by the generic identifier for that type of tag. Next come a sequence of attribute definitions for that type of tag. The end of an open tag is the close angle bracket. Close tags start with an open angle bracket, then a slash and the same generic identifier as close tag. Close tags don't have attributes.</Paragraph>
    <Paragraph position="4"> In the above example, the three tag pairs have generic identifiers ADVICE, STRUCTURE, BODY, and LOC. Only the BODY tag has an attribute, named TYPE, with a value of WATER.</Paragraph>
    <Paragraph position="5"> Conversion of SGML tags to MUC objects In all MUC tasks, the texts file already has some SGML tags. In the coreference and named entity tasks, the annotators and systems add more tags to the texts to create the keys and responses. The scoring software converts the tags (together with the text they enclose) into objects which have the same internal structure as the objects for the information extraction tasks.</Paragraph>
    <Paragraph position="6"> For example, here's some text marked up with TIMEX tags, which were part of the MUC6 named entity task. &lt;TIMEX TYPE=&amp;quot;DATE&amp;quot; ALT=&amp;quot;fiscal 1994&amp;quot;&gt;the first six months of fiscal 1994&lt;/TIMEX&gt; The scorer would convert the text into an object which in a template file would look like this:</Paragraph>
    <Paragraph position="8"> TEXT: &amp;quot;the first six months of fiscal 1994&amp;quot; /&amp;quot;fiscal 1994&amp;quot;</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SGML task caveats
</SectionTitle>
    <Paragraph position="0"> In the coreference and named entity tasks, there are some things to be careful of when you are preparing keys or responses. One thing is to not delete or insert any characters outside of the SGML tags. Doing this almost always confuses the scoring software and lowers the score. To see if you've changed anything you shouldn't have, you can use the unix &amp;quot;sed&amp;quot; command, or something similar, as in this example with the coreference tags: unix% sed 's/&lt;COREF[^&gt;]*&gt;//g' rsp  |sed 's/&lt;\/COREF[^&gt;]*&gt;//g' &gt;rsp.notags unix% diff texts rsp.notags The sed command above removes the COREF tags from the responses file (named rsp), and then compares what's left to the original texts file (named texts). The diff command will then show what part of the original texts file has been changed.</Paragraph>
    <Paragraph position="1"> Output File Formats The MUC scoring software prints several reports to show how the key and response compared. There is a score report, which only shows &amp;quot;the numbers.&amp;quot; There's also report summary, which shows in more detail how the key and response objects were aligned. For the coreference task, there is a &amp;quot;partitions&amp;quot; file, which shows how the key and response equivalence classes compared. And there is a &amp;quot;map history&amp;quot; file, which gives a detailed, if not very readable, description of how the objects were aligned.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Report Summary Files
</SectionTitle>
      <Paragraph position="0"> The &amp;quot;report summary&amp;quot; files show how the fills and objects of the keys and responses align. There are three types of report summary files: one for the coreference task, one for the named entity task, and one for the information extraction tasks.</Paragraph>
      <Paragraph position="1">  ENAMEX cor cor PERSON PERSON &amp;quot;Consuela Washington&amp;quot; &amp;quot;Consuela Washington&amp;quot; ENAMEX cor inc PERSON PERSON &amp;quot;John Dingell&amp;quot; &amp;quot;Washington&amp;quot; ENAMEX cor inc PERSON PERSON &amp;quot;Carter&amp;quot; &amp;quot;Tim Wirth&amp;quot; TIMEX cor cor DATE DATE &amp;quot;01/19/93&amp;quot; &amp;quot;01/19/93&amp;quot; ENAMEX mis mis PERSON &amp;quot;Washington&amp;quot; &amp;quot;&amp;quot; ENAMEX spu spu ORGANIZATION &amp;quot;&amp;quot; &amp;quot;Exchange&amp;quot; ENAMEX spu spu ORGANIZATION &amp;quot;&amp;quot; &amp;quot;Old Executive Office&amp;quot; The named entity report summary file gives a one-line-per-object-pair description of how the objects were aligned. Each line has seven fields. The first is the generic identifier of the tag which defines the object. The second and third contain three-letter abbreviations for how the key and response objects or fills compared. The abbreviations are: cor Correct. The key and response fills agree.</Paragraph>
      <Paragraph position="2"> inc Incorrect. The key and response fills disagree.</Paragraph>
      <Paragraph position="3"> mis Missing. There was a key fill but no response fill.</Paragraph>
      <Paragraph position="4"> spu Spurious. There was a response fill but no key fill.</Paragraph>
      <Paragraph position="5"> opt Optional. There was a key object but no response object, but the key object was marked &amp;quot;optional&amp;quot;. The key object's fills are also counted as &amp;quot;optional&amp;quot;.</Paragraph>
      <Paragraph position="6"> The fourth and fifth fields are the key and response TYPE fills, if there are any. The sixth and seventh fields are the key and response TEXT fields. If the key contained more than one TEXT fill (through use of the ALT attribute), the one that was aligned with the response fill is the one shown.</Paragraph>
      <Paragraph position="7"> If you are interested in seeing all alternatives, you can specify that you want to use the information-extraction-style report summary files. Just include the line :use_IE_report_summary yes somewhere in the configuration file.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Information Extraction Report Summaries
</SectionTitle>
      <Paragraph position="0"> This is a portion of a information extraction task (TE, TR, or ST) report summary file:</Paragraph>
      <Paragraph position="2"> The information extraction report summaries files have four columns. The first column shows the result of the pairing on that line. Upper case values are for object comparisons, and lower case values are for single fill comparisons.</Paragraph>
      <Paragraph position="3"> Possible values are: cor Correct. The key and response agree.</Paragraph>
      <Paragraph position="4"> inc Incorrect. The key and response disagree.</Paragraph>
      <Paragraph position="5"> mis Missing. There was a key but no response.</Paragraph>
      <Paragraph position="6"> spu Spurious. There was a response but no key.</Paragraph>
      <Paragraph position="7"> opt Optional. There was a key but no response, and the key object or slot was marked optional. uns Unscored. The object or slot isn't scored. rem Removed. This is for pointers to optional objects. If a key pointer points to an optional key object that was not aligned with any response object, the fill is &amp;quot;removed,&amp;quot; and doesn't count toward the score. The second column shows the name of the slots for the key and response object for the line (and the lines following if there are multiple fills in the slot).</Paragraph>
      <Paragraph position="8"> The third and fourth columns show the key and response object records, respectively. Coreference &amp;quot;Partition&amp;quot; Files For the coreference task, there is an extra report generated, which shows the COREF objects' equivalence classes, and how they are partitioned by the comparison between keys and responses. Key equivalence classes are surrounded by star characters (*****), and response equivalence classes by equal signs (=====). Here is a portion of a partition file that gives one key equivalence class from a MUC 6 document.  have, in order from left to right, the start offset of the noun phrase in the texts file.</Paragraph>
      <Paragraph position="9"> the end offset of the noun phrase in the texts file.</Paragraph>
      <Paragraph position="10"> the ID of the key COREF object.</Paragraph>
      <Paragraph position="11"> the ID of the key COREF object to which this object points (or &amp;quot;NULL&amp;quot; if the object has no REF attribute). the ID of the response COREF object aligned with the key coref object. the noun phrase that was marked up to create the object.</Paragraph>
      <Paragraph position="12"> In the &amp;quot;missing&amp;quot; objects' lines, the fields are the same except the response object's ID is, of course, missing. Note that there are blank lines between some of the COREF object lines. These show the partitions of the key equivalence class by the response. While the key ties together every noun phrase between the stars, the response doesn't, so there are &amp;quot;breaks&amp;quot; in the equivalence class. These breaks are what are counted to get the recall error. The precision error is got from the response equivalence classes in a symmetric manner.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Map History Files
</SectionTitle>
      <Paragraph position="0"> The &amp;quot;map history&amp;quot; output file is meant primarily for other computer programs to read. It consists of one large Tclstyle list. Each element of this list is itself a list which corresponds to one &amp;quot;document&amp;quot; from the keys and/or responses file. The document lists also contain lists, and this nesting of lists continues on down to the single fill level. Lists in the hierarchy consist of attribute name/attribute value pairs. Attribute names start with a hyphen.</Paragraph>
      <Paragraph position="1"> The Hierarchy of Map History Lists In hierarchy order, the attributes are:  single-fill tallies for all objects of one type in one document.</Paragraph>
      <Paragraph position="2"> clean_fill How a single string fill looks when it is compared to another fill. Leading and trailing whitespace has been trimmed, certain substrings have been removed, and all intertoken whitespaces are changed to single space characters. For example, the string &amp;quot;a corporation that manages the Seaport&amp;quot; would be changed to &amp;quot;that manages the seaport&amp;quot; (depending on how the scorer is configured), because the premodifier &amp;quot;a&amp;quot; and the corporate designator &amp;quot;corporation&amp;quot; are both removed, and all characters are made lowercase. docnum the string identifying the document in the texts file doctallies the totals of the (in order) possible, actual, correct, partial, incorrect, missing, spurious, and noncommittal single-fill &amp;quot;tallies&amp;quot; for the entire document.</Paragraph>
      <Paragraph position="3"> doc_section in the SGML tasks (named entity and coreference), the name SGML tags which enclose the object in the texts document, e.g. &amp;quot;HEADLINE&amp;quot; or &amp;quot;TEXT&amp;quot;.</Paragraph>
      <Paragraph position="4"> fill the fill as it appeared in the key or response file (with one exception: in the coreference task's REF fill, this is how the REF attribute would look if it were written as a template object pointer). key_obj_id The identification string of the key object of the pair.</Paragraph>
      <Paragraph position="5"> key_obj_optional Whether the object in the key is marked optional. This attribute has no value following it in the list; its presence alone means the key was marked optional.</Paragraph>
      <Paragraph position="6"> key_obj_rep_id Almost always the same as the key_obj_id. In the scenario template task of past MUC's, there have been objects in the key that are &amp;quot;identical&amp;quot;. All objects are put in equivalence classes (different from the equivalence classes of the coreference task), so that pointers to any object in an equivalence class are still counted correct, even though they don't point to exactly the same object.</Paragraph>
      <Paragraph position="7"> key_obj_end_offset in the SGML tasks (named entity and coreference), the position in the texts file, measured from the beginning of the file, where the close tag for the object is.</Paragraph>
      <Paragraph position="8"> key_obj_start_offset in the SGML tasks (named entity and coreference), the position in the texts file, measured from the beginning of the file, where the open tag for the object is.</Paragraph>
      <Paragraph position="9"> key_single_fill a list describing the single fill from the key.</Paragraph>
      <Paragraph position="10"> key_slot_optional whether the slot was marked optional in the key. This attribute has no value associated with it. If the attribute name is there, it means the slot was marked optional.</Paragraph>
      <Paragraph position="11"> multi_fill_pairs the list describing how the key slot fill alternatives were aligned with the response alternatives. (When scoring system responses, there should be only one response alternative. For interannotator comparisons, both key and response may have many alternatives.) multi_fill_tallies the tallies for the single fills in this pairing of alternatives (see multi_fill_pairs). obj_pair_status How the objects of a pair compared at the object-level; correct, incorrect, etc. obj_pair_tallies the tallies for the single fills in this pair of objects.</Paragraph>
      <Paragraph position="12"> obj_pairs a list describing how objects of one type were aligned.</Paragraph>
      <Paragraph position="13"> rsp_obj_id The identification string of the rsp object of the pair.</Paragraph>
      <Paragraph position="14"> rsp_obj_optional Whether the object in the response is marked optional.</Paragraph>
      <Paragraph position="15"> rsp_obj_rep_id Almost always the same as the rsp_obj_id. In the scenario template task of past MUC's, there have been objects in response that are identical. All objects are put in equivalence classes (different from the equivalence classes of the coreference task), so that pointers to any object in an equivalence class are still counted correct, even though they don't point to exactly the same object.</Paragraph>
      <Paragraph position="16"> rsp_single_fill a list describing the single fill from the response.</Paragraph>
      <Paragraph position="17"> single_fill_pair_status A three-character abbreviation for how the two single fills in a pair compared. single_fill_pair_tallies Another way for writing the single_fill_pair_status, that is compatible with all other tallies up the hierarchy. single_fill_pairs the list describing how one list of key single fills (possibly from many alternatives) was aligned with one list of response single fills.</Paragraph>
      <Paragraph position="18"> slot_name the name of the slots which are paired here.</Paragraph>
      <Paragraph position="19"> slot_pairs the list describing how the two objects' slots compared.</Paragraph>
      <Paragraph position="20"> slot_tallies the tallies for the single fills in this slot's comparison.</Paragraph>
      <Paragraph position="21"> type the type of the single fill (set fill, string fill, or pointer fill).</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Score Files
Information Extraction Score Report
</SectionTitle>
      <Paragraph position="0"> Figure shows one page from a scores file for the MUC-6 scenario template task. There is one page of scores for each document in the task, plus one page for the totals over all documents. Each page is divided into four sections. The first section shows the &amp;quot;text filtering&amp;quot; or &amp;quot;relevance&amp;quot; scores. These have to do with judging whether each document is even relevant to the scenario the NLP system should be looking for. The second section gives the object scores, which shows how the keys and response agree at the object level. The third section shows how well the keys and responses agree at the slot fill level. Only the slot scores determine the final scores, which are the last thing on a page.</Paragraph>
      <Paragraph position="1"> The template element and template relation score reports are identical to the scenario template score reports, except that they have no text filtering section.</Paragraph>
      <Paragraph position="2">  The report has several parts: subtask scores Each named entity tag contains an attribute categorizing the marked-up text. This section shows how well the response did for each category.</Paragraph>
      <Paragraph position="3"> section scores Each document is already marked up with SGML even before the keys and responses are made. This section summarizes how the response did for each &amp;quot;section&amp;quot; of the SGML document.</Paragraph>
      <Paragraph position="4"> object scores Tallies at the object level. These tallies don't contribute to the final score at the bottom of the page. slot scores Tallies at the slot level. It is the slot level tallies which are used to determine the final score.  The scoring software has three configuration files, that you use to specify how the keys and responses are compared. The reason there are three files is partly historical and partly because parsing some of the configuration options differs a little. In future versions the three files will probably coalesce into one file.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Main Configuration File Format
</SectionTitle>
      <Paragraph position="0"> You must specify the name of the main configuration file on the command line when you invoke the scorer. The configuration file tells the scorer how to compare the keys and responses. It consists of a list of options. Each option is specified by a colon (&amp;quot;:&amp;quot;) as the first character of a line, followed immediately (no spaces) by the name of the option. After some more spaces come the value or values of the option. Values are separated by spaces. Values which themselves contain spaces must be enclosed in single or double quotes. The current options are: class_defs The strings which declare the scorer objects' types and give their score report names and mapping order. This is a required entry in the configuration file. Each class_def string is a quadruple of tokens:  1. the name of the class 2. the name of the class that you want to appear in the score report 3. either &amp;quot;scored&amp;quot; or &amp;quot;unscored,&amp;quot; depending on whether you want the object-level scoring to count this class of object.Note that this value doesn't affect whether the fills within the objects are scored. Slot-level scoring is specified in the slot_defs option, described below.</Paragraph>
      <Paragraph position="1"> 4. the map threshold. The f-score for each slot of an object is calculated and multiplied by that slot's map  weight. The weighted f-scores are them summed, and if they exceed the object's map threshold, then the response and key objects are deemed similar enough to be aligned. For the past couple of MUC's, the threshold has been set to 0 and the map weights have be made really big, so that if the two objects agree in just one fill of one slot, they may be aligned.</Paragraph>
      <Paragraph position="2"> Here is an example of the class_def option, for the named entity task: :class_defs &amp;quot;enamex enamex scored 0&amp;quot; &amp;quot;numex numex scored 0&amp;quot; &amp;quot;timex timex scored 0&amp;quot; The class def strings should be in the order that you want the classes of objects aligned. For the named entity, template element and coreference tasks, this order is unimportant. But for the template relation and scenario template tasks, the pointer fills are judged correct or incorrect based on whether or not the objects they point to are aligned. So the aligning should always start with objects that contain no pointer fills, and proceed to objects whose only pointer fills reference objects without pointer fills, etc. (See the section on how the TR and ST tasks are scored, below.) content_name In the scenario template task, the name of the slot in the &amp;quot;template&amp;quot; object (see the template_name option below) which must have fills if the document is relevant to the ST task, and which must not have fills if the document is not relevant to the ST task. Default: &amp;quot;content&amp;quot;.</Paragraph>
      <Paragraph position="3"> corporate_designators A list of substrings which will be removed from string fills before they are compared. As the name implies, it's usually a list of strings like &amp;quot;corporation&amp;quot;, &amp;quot;ltd&amp;quot;, etc. Note that if you want to remove substrings that themselves have postmodifiers (see below), you must specify the substrings with postmodifiers changed to spaces, and all resulting spaces in the corporate designator string squashed into one. For instance, if you don't want the string &amp;quot;S.A. DE C.V.&amp;quot; to affect stringfill comparisons, it should go into the configuration file as &amp;quot;S A DE C V&amp;quot;, with one space between the &amp;quot;A&amp;quot; and the &amp;quot;DE&amp;quot;. (But for the coreference task, you should not take out the postmodifiers.) Default: the empty list.</Paragraph>
      <Paragraph position="4"> doc_section_groups Used in the Named Entity task, to group doc_section (see below) scores. In MET 2, the documents in the texts file have various SGML formats. For example, in some documents there is a HEADLINE tag, but in other documents, the tag is called HL. To get a score for all document sections which are the same semantically, but differ in their tags, you can &amp;quot;group&amp;quot; the similar tags, by putting, for example, &amp;quot;Headline HEADLINE HL&amp;quot; as one value for this option. The first token in a value string is what you want to call the group. The rest of the tokens are the name of doc sections. You must also specify the rest of the tokens in the &amp;quot;doc_sections&amp;quot; option described below. If this option is not in the configuration file, the doc_sections scores are used. If it's specified, the scorer only gives the tallies for the names given. Default: None; uses doc_sections instead. doc_sections The names of the SGML sections that should be parsed for coreference or named entity objects, and which will be used to report &amp;quot;document section scores&amp;quot; in the named entity task. The default is this list of sections: &lt;DOC&gt;, &lt;DATELINE&gt;, &lt;DD&gt;, &lt;HEADLINE&gt;, and &lt;TEXT&gt;. Note that as long as the documents are enclosed by &lt;DOC&gt;, all of the objects will be parsed by default. Having tags that don't really occur in the documents won't hurt anything. If one section is nested in another section, it is the innermost section which will be reported for the score. For example, if there are HEADLINE's inside the TEXT, the objects will be considered to be inside the HEADLINE.</Paragraph>
      <Paragraph position="5"> dump_map_history Whether or not to print the map history report. Default: &amp;quot;no&amp;quot;. If anything else, the map history will be printed. equatable_objects In the scenario template task, which objects may possibly be identical. In MUC 6, the &amp;quot;IN_AND_OUT&amp;quot; objects were like this. Default: no objects.</Paragraph>
      <Paragraph position="6"> key_file The name of the keys file. Default: &amp;quot;keys&amp;quot;.</Paragraph>
      <Paragraph position="7"> map_history_file The name of the map history output file. Default: &amp;quot;map_history&amp;quot; muc_base_directory A string that is prepended to the names of all filename options. This allows you to give the absolute pathname of all filenames without a lot of typing. Defaults to the empty string.</Paragraph>
      <Paragraph position="8"> ne_subtask_names A list of strings, each with three tokens. The first token is the object type. The second token is the slot name. The third token is the fill value. The tallies for all fills of that value in that slot in that type of object will be reported in the NE subtask section of the score report. Default: the following strings: &amp;quot;enamex type organization&amp;quot; &amp;quot;enamex type person&amp;quot; &amp;quot;enamex type location&amp;quot; &amp;quot;enamex type other&amp;quot; &amp;quot;timex type date&amp;quot; &amp;quot;timex type time&amp;quot; &amp;quot;timex type other&amp;quot; &amp;quot;numex type money&amp;quot; &amp;quot;numex type percent&amp;quot; &amp;quot;numex type other&amp;quot; optional_status_slot The name of the slot in all objects that you use to specify that an objects is optional, by putting the string &amp;quot;OPTIONAL&amp;quot; or &amp;quot;OPT&amp;quot; as the slot's only fill.</Paragraph>
      <Paragraph position="9"> partition_file The name of the partition output file for the coreference task.</Paragraph>
      <Paragraph position="10"> postmodifiers A list of strings that are changed to spaces in stringfills before the stringfills are compared. Usually used so that punctuation marks don't affect the comparisons. Default: the empty list.</Paragraph>
      <Paragraph position="11"> premodifiers A list of tokens that are removed from the beginning of stringfills before they are compared. Usually used so that the words &amp;quot;a,&amp;quot; &amp;quot;an,&amp;quot; and &amp;quot;the&amp;quot; don't affect the scoring. report_field_separator A character string that is printed between the fields of the information extraction-style &amp;quot;report summary&amp;quot; files. The default is the vertical bar (&amp;quot;|&amp;quot;).</Paragraph>
      <Paragraph position="12"> report_summary_file The name of the report summary file. Default: &amp;quot;report_summary&amp;quot;.</Paragraph>
      <Paragraph position="13"> response_file The name of the responses file. Default: &amp;quot;responses&amp;quot;.</Paragraph>
      <Paragraph position="14"> score_report_file The name of the scores file. Default: &amp;quot;scores&amp;quot;.</Paragraph>
      <Paragraph position="15"> scoring_method One of either &amp;quot;key2response&amp;quot; or &amp;quot;key2key&amp;quot;. &amp;quot;Key2response&amp;quot; is the default. Key2key is used for interannotator comparisons.</Paragraph>
      <Paragraph position="16"> scoring_task One of &amp;quot;coreference&amp;quot;, &amp;quot;named_entity&amp;quot;, &amp;quot;template_element&amp;quot;, &amp;quot;template_relation&amp;quot;, and &amp;quot;scenario_template.&amp;quot; There is no default for this option. It must specified.</Paragraph>
      <Paragraph position="17"> sgml_ALT_slot In the named entity task, the name of the slot whose contents will be moved into the TEXT slot (see sgml_TEXT_slot below), as an alternative to the contents got from the text between the SGML tags. sgml_DOCNUM_gid The name of the SGML tag which identifies the section which holds the document numbers. Default: DOCNO. Note that every document in the keys or responses file must have this section. The document number is simply every digit [0-9] in the specified document section.</Paragraph>
      <Paragraph position="18"> sgml_DOC_gid The name of the SGML tags which enclose one entire document. Default: &amp;quot;DOC&amp;quot;. sgml_ID_slot In the coreference task, the name of the attribute of the tags for the task which give the unique identification string for the object. Default: &amp;quot;ID&amp;quot;.</Paragraph>
      <Paragraph position="19"> sgml_MIN_slot In the coreference task, the name of the attribute which holds the &amp;quot;head&amp;quot; of the noun phrases enclosed in the coreference tags. Default: &amp;quot;MIN&amp;quot;.</Paragraph>
      <Paragraph position="20"> sgml_REF_slot In the coreference task, the name of the attribute which holds the pointer some other &amp;quot;identical&amp;quot; object in the document. Default: &amp;quot;REF&amp;quot;.</Paragraph>
      <Paragraph position="21"> sgml_TEXT_slot In the named entity and coreference tasks, the name of the slot into which the text between the open and close tags goes. Default: &amp;quot;TEXT&amp;quot;.</Paragraph>
      <Paragraph position="22"> sgml_TYPE_slot In the named entity task, the name of the slot for the categorization subtask. Default: &amp;quot;TYPE&amp;quot;. sgml_alternative_separator In the named entity and coreference tasks, the character which separates alternatives within attribute values. Note that for the current tasks, this is only relevant for the keys. Default: the vertical bar character, (&amp;quot;|&amp;quot;). sgml_attribute_quote_char In the named entity and coreference tasks, if a tag's attribute value is a string which contains a double quote, the scorer's parser will become confused. This option contains the character which has been substituted for the double quote in the keys or responses file. (Again, in the current tasks, this will only affect how the keys are prepared, since the response don't have attributes that might contain quotes.) Default: the &amp;quot;star&amp;quot; character (&amp;quot;*&amp;quot;). slot_defs The list of slot definitions. This is a required entry in the configuration file. Each slot definition consists of six tokens:  1. The name of the class of object to which the slot belongs.</Paragraph>
      <Paragraph position="23"> 2. The name of the slot.</Paragraph>
      <Paragraph position="24"> 3. The name of the slot that you want printed in the score report file.</Paragraph>
      <Paragraph position="25"> 4. Either &amp;quot;scored&amp;quot; or &amp;quot;unscored,&amp;quot; depending on whether you want the fills of this slot to be scored. 5. The map weight. See the entry for the class_def option for an explanation of this number. 6. The slot type; either &amp;quot;set&amp;quot;, &amp;quot;string&amp;quot;, or &amp;quot;pointer&amp;quot; (you may put anything here for a pointer slot. The scorer  only looks to see that it isn't &amp;quot;set&amp;quot; or &amp;quot;string&amp;quot;).</Paragraph>
      <Paragraph position="26"> Here's an example of the slot_defs option for the named entity class: :slot_defs &amp;quot;enamex text text scored 4 string&amp;quot; &amp;quot;enamex type type scored 4 set&amp;quot; &amp;quot;enamex status status unscored 4 set&amp;quot; &amp;quot;enamex alt alt unscored 4 string&amp;quot; &amp;quot;timex text text scored 4 string&amp;quot; &amp;quot;timex type type scored 4 set&amp;quot; &amp;quot;timex status status unscored 4 set&amp;quot; &amp;quot;timex alt alt unscored 4 string&amp;quot; &amp;quot;numex text text scored 4 string&amp;quot; &amp;quot;numex type type scored 4 set &amp;quot; &amp;quot;numex status status unscored 4 set&amp;quot; &amp;quot;numex alt alt unscored 4 string&amp;quot; stringfill_correct_comparison one of &amp;quot;ORIG&amp;quot;, &amp;quot;STRAIGHTENED&amp;quot;, or &amp;quot;CLEAN&amp;quot;. Which part of a pair of stringfills is compared to see if they match. If ORIG, the original stringfills are compared. If STRAIGHTENED, some massaging is performed: Whitespaces are trimmed before and after the fills, and all whitespaces between the tokens are turned into single spaces. If CLEAN, the premodifiers, postmodifiers, and corporate designators strings (see the option descriptions for these last three) are removed from the string. Default: CLEAN stringfill_partial_comparison If the stringfills don't match correctly, the comparison used to see if partial credit is given for the match. See stringfill_correct_comparison for the possible values. In additions to the three values listed there, you may specifiy NONE (the default) if you want no partial credit given.</Paragraph>
      <Paragraph position="27"> template_name The name of the &amp;quot;template&amp;quot; object used in the scenario template object. This object has a &amp;quot;content&amp;quot; slot (see the &amp;quot;content_name&amp;quot; option) whose filling or leaving empty determines whether document is relevant to the scenario. For scoring the text-filtering part of the task, only one one template object per document will be checked for content. Default &amp;quot;TEMPLATE&amp;quot;.</Paragraph>
      <Paragraph position="28"> use_IE_report_summary Defaults to &amp;quot;no&amp;quot;. If anything else, the one-line-per-object report summaries used for the named entity and coreference tasks will be replaced with the template-object-record-style report summaries used in the Information Extraction tasks. (This option doesn't affect the TE, TR, or ST tasks). Calculation of Scores Template Element (TE) Scoring The methods for scoring the Template Element, Template Relation, Scenario Template, and Named Entity tasks are very similar. From the standpoint of calculating scores, The template element (TE) task is the basic task of these four. This section will explain how TE is scored, and subsequent sections will tell how the NE, TR, and ST tasks can be seen as extensions to TE scoring.</Paragraph>
      <Paragraph position="29"> Simply put, the final score for the four tasks is found by aligning the key objects with the response objects and then comparing the objects' single fills. Structures are aligned at each level of the object/slot/multi-fill/single-fill structure hierarchy. However, it is the single-fill alignments that we count to get the score. The result of aligning one key single fill to one response single fill (or of leaving one key or response single fill unaligned) is called a tally. There are six kinds of tallies:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ACT
</SectionTitle>
    <Paragraph position="0"> Intuitively, information extraction systems often sacrifice precision for recall, or vice versa. If a system is tuned to &amp;quot;catch everything&amp;quot; (good recall), it often catches more than it should (bad precision). And if it tries to be conservative (good precision), it tends to miss some information (bad recall). When evaluating responses, then, one has to be careful about comparing one response from a system tuned for high recall to another response from a system tuned for high precision. van Rijsbergen's F-measure is used to combine recall and precision measures into one measure.</Paragraph>
    <Paragraph position="1"> The formula for F is  When aligning two multi-fills, the scoring software pairs all single-fills of the multi-fills. For example, if the key multi-fill has three single-fills, and the response multi-fill has two multi-fills, then the scorer creates six pairs of single-fills. Each single-fill pair has an F-score associated with it. The scorer sorts these single-fill pairs by F-score in decreasing order. It then proceeds down the sorted list, picking out pairs of single-fills for which neither single-fill has been chosen yet, and adding them to the final alignment for that pair of multi-fills. Any key or response single fills left over (in our example, there would be a key single fill left) is tallied as missing or spurious.</Paragraph>
    <Paragraph position="2"> A key slot is aligned with a response slot when the two slots have the same name. The lone multi-fill in the response slot is aligned with the multi-fill in the key slot that results in the best multi-fill-to-multi-fill F-score. Any leftover multi-fills in the key slot are unscored, and are tallied as &amp;quot;noncommittal&amp;quot;.</Paragraph>
    <Paragraph position="3"> Key objects are aligned with response objects of the same object &amp;quot;type&amp;quot; or &amp;quot;class&amp;quot; To choose which objects are paired, the scorer first generates all possible pairs of objects in the class. The F-score for each pair of objects is calculated from the way the objects' single-fills align. The weighted F-score is also calculated, by multiplying each slot-pair's F-score by the mapping weight of that slot, and summing the factors. The object pairs are sorted by (unweighted) F-score in decreasing order. Then the scorer proceeds down the sorted list, picking out pairs of objects for which neither single-fill has been chosen yet, and for which the weighted F-score exceeds the threshold for that type of object.</Paragraph>
    <Paragraph position="4"> If any objects are left over after this, the scorer looks for any key objects which are marked &amp;quot;optional&amp;quot;. The single fills of these objects are tallied as non-committal. If any key objects are left after this, their single-fills are tallied as missing. The single fills of any leftover response objects are tallied as spurious.</Paragraph>
    <Paragraph position="5"> When all classes of objects have been aligned, the tallies are summed, and the resulting measures are calculated.</Paragraph>
    <Paragraph position="6"> Template Relation (TR) and Scenario Template (ST) Scoring For the TR and ST tasks, the scoring proceeds just as in TE scoring, but the order of alignment of objects is important. It is helpful to look at the classes of objects in a TR or ST task as vertices of a topological graph. If one type of object has a slot containing pointers to another type of object, then the graph has a directed edge from the first class to the pointed-to class: When comparing a key pointer fill to a response pointer fill, the only way the scorer can compare the pointers is by looking to see if the objects to which they point have already been aligned by the scorer. If they have, and if the object pointed to by the key pointer is aligned to the object pointed to by the response pointer, then the pointers are tallied as correct.</Paragraph>
    <Paragraph position="7"> Since pointer correctness is defined in this way, the directed graph cannot have any directed cycles in it. Further, the scorer has to align the objects so that any pointed-to objects must already have been aligned. So in the above figure, the order of mapping could be D-B-C-A or D-C-B-A. Any other order would confuse the scorer.</Paragraph>
    <Paragraph position="8"> The only other difference between the TR and ST task and the TE task is the existence of implicitly optional objects in the key. In TR, a &amp;quot;relation&amp;quot; object that points to an optional &amp;quot;template element&amp;quot; object is optional, whether it's marked optional or not. And in ST, an object is implicitly optional if the only pointers pointing to that object are in optional slots or in one one multi-fill of a slot, but not in another multi-fill of the same slot (ie, there is an alternative multi-fill in the slot that doesn't point to the object).</Paragraph>
    <Paragraph position="9"> Named Entity (NE) Task Scoring The Named Entity task is scored like the Template Element task, except that the objects which are aligned must come from SGML elements in the same position of the original text file. For instance, if in the key the name &amp;quot;Bill Clinton&amp;quot; is tagged in the first paragraph of an article, and in the response &amp;quot;Bill Clinton&amp;quot; is tagged in the tenth paragraph, the objects will not be aligned, even if they would give an F-score of 100%.</Paragraph>
    <Paragraph position="10"> Coreference (CO) Task Scoring The scoring of the Coreference task is very different from that of the other four tasks. Rather than counting single fills, the CO algorithm compares equivalence classes of objects in the key with equivalence classes of objects in the the response. For a detailed explanation, see A Model-Theoretic Coreference Scoring Scheme, by Mark Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman in the MUC-6 Proceedings.</Paragraph>
    <Paragraph position="11"> For more website information contact: Ellen Voorhees For more evaluation information contact: Nancy Chinchor Last updated: Tuesday, 08-Mar-05 15:19:11 Date created: Friday, 12-Jan-01</Paragraph>
  </Section>
class="xml-element"></Paper>