<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1011">
  <Title>GE-CMU: DESCRIPTION OF THE SHOGUN SYSTEM USED FOR MUC-5</Title>
  <Section position="4" start_page="109" end_page="110" type="metho">
    <SectionTitle>
SYSTEM OVERVIEW
</SectionTitle>
    <Paragraph position="0"> The TIPSTER/SHOGUN system as configured for the 24-month/MUC-5 benchmark has roughly the same components as earlier versions of the system, but the system now performs linguistic analysis entirely using a finite-state pattern matcher, instead of LR parsing or chart-style parsing, both of which were part of the configuration in MUC-4.</Paragraph>
    <Paragraph position="1"> Figure 1 shows the basic components of the SHOGUN system, using our own names for modules, where applicable, along with the labels used in Jerry Hobbs' paper &amp;quot;The Generic Information Extraction System&amp;quot;. The core components of SHOGUN are a subset of the modules that Hobbs describes. However, the system differs from other current extraction systems in the use of the finite-state analyzer and the way that corpus-based knowledge is integrated into the lexico-syntactic rules.</Paragraph>
    <Paragraph position="2">  Because many of the MUC-5 systems now perform much the same type of pre-processing, name recognition, and post-processing that SHOGUN has, we will concentrate here on linguistic analysis, including parsing and lexical disambiguation, which were the main research areas of our work on SHOGUN. About half of the MUC-5 systems still use linguistic analysis driven by &amp;quot;traditional&amp;quot; phrase structure rules, traditional in the sense that there is a clearly separable syntactic component whose knowledge consists mainly of rules for recognizing grammatical constituents based on word categories (like noun, verb) and word order. SHOGUN differs from all these systems in that it no longer has any purely syntactic component, and uses finite-state rules in place of phrase structure rules.</Paragraph>
    <Paragraph position="3"> The remaining systems divide roughly into those that emphasize pattern matching and those that emphasize fragment parsing. The fragment parsing systems, notably BBN's, work fairly close to the way our MUC-4 system did, taking advantage of partial parses by using a combination of syntactic and domain knowledge to guide the combination of syntactic chunks. The difference between this approach and SHOGUN's current processing is that fragment parsing is still a largely syntax-first method, while pattern matching tends to introduce specialized domain and corpus knowledge by combining this knowledge with syntactic knowledge in the system's declarative representation.</Paragraph>
    <Paragraph position="4"> By this coarse characterization, the &amp;quot;pattern matching&amp;quot; group of systems includes, for example, SRI and Unisys as well as GE-CMU. We also consider UMass to be in this category, because their linguistic analysis emphasizes lexical and conceptual knowledge rather than constituent structure.</Paragraph>
    <Paragraph position="5"> Among these approaches, we believe the main differentiator is not in the basic processing algorithms but in the way that knowledge ends up getting assigned to various system components. If there is one noteworthy trend among the MUC systems as they have evolved over time, it is that they have become more knowledge-based, especially emphasizing more corpus-based and lexical knowledge as well as automated knowledge acquisition methods. Within the emerging &amp;quot;generic&amp;quot; model, the main difference among systems is thus in the content of their knowledge bases. Here, the distinguishing characteristic of SHOGUN is probably the degree to which the system still includes sentence-level knowledge, assigning linguistic and conceptual roles much the way the TRUMP/TRUMPET combination did but using more detailed, lexically-driven knowledge.</Paragraph>
    <Paragraph position="6"> Many of the sentence-level rules, for example, include groupings like start a facility and organization noun phrase, which combine traditional syntactic phrases with lexical or domain knowledge.</Paragraph>
    <Paragraph position="7"> As systems continue to become still broader in scope and more accurate, it is likely that the way knowledge is acquired will become the main differentiator.</Paragraph>
    <Paragraph position="8"> The rest of this paper will discuss the overall results of SHOGUN on MUC-5 and describe how the system handles some of the system walkthrough examples. The analysis of the examples will highlight some of these characteristics and demonstrate the system's actions in various stages of processing.</Paragraph>
  </Section>
  <Section position="5" start_page="110" end_page="110" type="metho">
    <SectionTitle>
OVERALL RESULTS
</SectionTitle>
    <Paragraph position="0"> The SHOGUN system did very well on MUC-5. The team's specific goals were to achieve results on the MUC-5/TIPSTER tasks that were above the level of the simpler MUC-4 task, to attain comparable performance across languages and domains, and to reduce customization time as much as possible. In addition, the aim was to produce near-human accuracy at a throughput orders of magnitude faster than human beings. These goals seemed rather ambitious, but SHOGUN reached all of them.</Paragraph>
    <Paragraph position="1"> The following is a summary of SHOGUN's performance on all the official metrics. We put error rate first and F-measure last in this table because these are the only ones that can be used for overall system comparison (the goal being low error rate and high F-measure).</Paragraph>
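The relation between the recall, precision, and F-measure figures quoted throughout this paper can be sketched as follows. This is a minimal illustration of the standard balanced F-measure, F = 2PR/(P+R), on the 0-100 scale used here; the function name is ours. It reproduces the TEXTRACT figures reported later in this paper (recall 60, precision 68, giving an F-measure of roughly 63.8).

```python
def f_measure(recall, precision):
    """Balanced F-measure (beta = 1) on the 0-100 scale used in MUC-5 scoring."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)

# TEXTRACT's reported JJV figures: recall 60, precision 68 give about 63.8
print(round(f_measure(60, 68), 1))
```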
    <Paragraph position="2">  While it is very difficult to compare results across domains and languages, it is clear that this shows substantial progress, as the MUC-5 tasks are certainly much harder and more detailed than MUC-4. In addition, the average improvement between the TIPSTER 18-month benchmark and the current point was over 20%, and there is certainly more room for further improvement. Thus we are confident that our current methods and algorithms support continued progress toward high accuracy.</Paragraph>
    <Paragraph position="3"> While it seems that there is substantial variation among the scores on the different language-domain pairs, this variation is reasonable given the differences among the tasks and the variations in the test samples. The EME result is worse than the others, but the EME MUC-5 test set seemed to be a very difficult one for our system. In fact, the system on a blind test using the same configuration scored 9 error rate points better in EME than on the test reported above. We are not sure what accounts for this variability in EME, which is much greater than on the other domain-language pairs.</Paragraph>
    <Paragraph position="4"> With respect to achieving human performance, it is not clear where good human performance falls on these scales, but we are close. At the TIPSTER 12-month test, a study of trained human analysts placed individual analysts between 70 and 80 in F-measure. However, this test used a somewhat more generous scoring algorithm than the current one (there have been a number of important changes to the scoring since the 12-month point), and did not separate the analysts' work from the preparation of the &amp;quot;ideal&amp;quot; answers; it is important in a blind test that the human subject have no impact on the answer key, because there are many texts that involve fine-grained interpretation.</Paragraph>
    <Paragraph position="5"> The results on Japanese are, on average, somewhat higher than the English results. This is consistent with our tests. We attribute this to the fact that the Japanese tests are considerably easier than the English (a factor that is somewhat difficult to weigh, given that none of our system developers know Japanese). Some of the influences that make the Japanese easier are greater homogeneity in the text sources (for example, EME includes very different sources from EJV, while JJV and JME are quite consistent in style), shorter stories with fewer distinct events in Japanese, far fewer new joint venture companies in Japanese, and an emphasis in Japanese on research and sales rather than production (production activities are more difficult to assign to codes in the template design).</Paragraph>
    <Paragraph position="6"> In addition to the SHOGUN system, the GE-CMU team ran only the Japanese benchmarks using a system called TEXTRACT, which was developed in parallel to SHOGUN by Tsuyoshi Kitani, a visiting researcher at CMU from NTT Data. TEXTRACT, like SHOGUN, emphasizes lexically-driven pattern matching, and the two systems share a Japanese tagging/segmentation program from NTT Data, called MAJESTY.</Paragraph>
    <Paragraph position="7"> While there is little else that is directly shared between the two systems, additions to TEXTRACT's knowledge base were incrementally adapted, in functionality, to SHOGUN's knowledge base in JJV, so it is not surprising that the systems had similar performance on this set. TEXTRACT generally had better performance on company name recognition than SHOGUN, and a somewhat more effective method of splitting events. SHOGUN had better coverage of industry types and products (based, we think, on the heavy use of statistically-based training), and had higher recall (but lower precision) in JME.</Paragraph>
    <Paragraph position="8"> Figure 3 shows the results of both systems on the recall/precision scale on the various MUC-5 sets .</Paragraph>
  </Section>
  <Section position="6" start_page="110" end_page="115" type="metho">
    <SectionTitle>
ANALYSIS OF WALKTHROUGH MESSAGES
</SectionTitle>
    <Paragraph position="0"> Overview of Examples The examples are in many ways typical of the TIPSTER-SHOGUN system. These are relatively easy messages, but the problems the system encountered are illustrative. In the English message, the system made a few minor mistakes, some of which may even have been matters of fine-grained interpretation, and had an error rate of 15 for EJV0592. This is much better than the average message; on the whole, the EJV performance is pulled down by &amp;quot;tangled tie-up&amp;quot; messages in which the system has a great deal of difficulty determining who is doing what with whom.</Paragraph>
    <Paragraph position="1"> JJV0002 was much harder, because it requires information to be split across two tie-ups. The system correctly determined that there were two tie-ups (which it did not do when it ran this message at the 12-month point), but it failed to recognize &amp;quot;Toukyou kaijou&amp;quot; as an alias for &amp;quot;Toukyou kaijou kasai hoken&amp;quot;, and as a result ended up getting a whole bunch of aliases and entity pointers wrong. In addition, SHOGUN made the very typical mistake of almost getting the product service information but losing most of the points anyway. In this case, the Japanese text says that the tie-up will be selling a new product called &amp;quot;hyu-man&amp;quot;. SHOGUN correctly spots this and assumes that whatever &amp;quot;hyu-man&amp;quot; is will be wholesale sales with code 50. The analyst infers from the context that &amp;quot;hyu-man&amp;quot; is an insurance product, so the actual industry type is &amp;quot;finance&amp;quot; rather than &amp;quot;sales&amp;quot;. Finally, the answer key contains an error in the string fill, so SHOGUN gets scored completely wrong on this object.</Paragraph>
    <Paragraph position="2"> We emphasize these minor mistakes because it helps to show, for one thing, how hard it is to get extremely high accuracy, and, for another, the relative effects of easy and hard objects. SHOGUN was, by far, the most accurate system in determining industry information, probably because our efforts on automated knowledge acquisition used this object as a test case for both English and Japanese. However, the net effect of the industry object in SHOGUN was a reduction in error of 0.2 in English and 1.2 in Japanese over what the system would have produced by leaving the product service slot blank. This is because potentially spurious information on hard objects and slots dilutes the good scores produced on the easier objects and slots. Hence it is very difficult to show improvement by getting more information; the easiest improvements are to get higher and higher performance on the &amp;quot;critical&amp;quot; slots and objects.</Paragraph>
    <Paragraph position="3"> In addition, the system made many technical errors with the location and alias slots, some of which are illustrated here. Often these were due to bugs, but there are many other problems. The location slot(s) proved much more difficult than expected, because many forms of subtle inferences often affect location information, such as inferring that one site subsumes another or inferring location by process of elimination (especially in Japanese).</Paragraph>
    <Paragraph position="4"> We will now show, very briefly, the results of each stage in processing of SHOGUN on the EJV and JJV examples.</Paragraph>
    <Paragraph position="6"> Where a company name is marked in pre-processing, this means that the name is &amp;quot;learned&amp;quot; rather than recognized as a known name. In JJV0002, Daiwashouken is a known name, so it is not marked above.</Paragraph>
    <Paragraph position="7"> Linguistic analysis Linguistic analysis uses the same pattern matcher and same knowledge base notation as pre-processing, but relies on a mixture of syntactic and lexical information to perform sentence-level interpretation. For example, the following is one rule for marking verb phrases with activity information in English:</Paragraph>
    <Paragraph position="9"> In linguistic analysis, the pattern matcher annotates the text, much like it does during pre-processing, but these annotations can be very close to the roles that portions of text will play in the template. For example, where pre-processing finds company names and organization descriptions, sentence analysis will often find partners and ventures.</Paragraph>
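The flavor of such finite-state annotation can be sketched as follows. This is an illustrative toy, not SHOGUN's actual rule notation: rules here are assumed to be sequences of token/category predicates, and a matching window of tagged tokens is annotated with a template-relevant label, in the spirit of groupings like start a facility and organization noun phrase.

```python
# Hypothetical sketch of finite-state annotation: each rule is a label plus a
# sequence of predicates over (token, category) pairs; any window of tokens
# matching the sequence is annotated with the label.
def matches(pred, token, cat):
    want_tok, want_cat = pred
    tok_ok = want_tok is None or token.lower() == want_tok
    cat_ok = want_cat is None or cat == want_cat
    return tok_ok and cat_ok

def annotate(tagged, rules):
    """tagged: list of (token, category); rules: list of (label, [predicates])."""
    spans = []
    for label, preds in rules:
        for i in range(len(tagged) - len(preds) + 1):
            window = tagged[i:i + len(preds)]
            if all(matches(p, tok, cat) for p, (tok, cat) in zip(preds, window)):
                spans.append((label, i, i + len(preds)))
    return spans

rules = [
    # a "facility and organization noun phrase" style grouping (made-up rule):
    ("facility-org-np", [(None, "det"), (None, "org-name"), ("plant", "noun")]),
]
tagged = [("the", "det"), ("Toshiba", "org-name"), ("plant", "noun")]
print(annotate(tagged, rules))  # one span labelled facility-org-np over all three tokens
```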
    <Paragraph position="10"> The following are examples of this analysis from the walkthrough messages:</Paragraph>
    <Paragraph position="12"> TRUMPET then takes these pieces of semantic interpretation and tries to map them onto a final template, applying domain constraints, reference resolution, and heuristics for merging and splitting information from multiple sentences and paragraphs.</Paragraph>
    <Section position="1" start_page="115" end_page="115" type="sub_section">
      <SectionTitle>
Discourse Processing
</SectionTitle>
      <Paragraph position="0"> Before producing the final template, SHOGUN must take all the references to objects and events and try to resolve them. Often the resolution of object references affects the resolution of event references, because the objects become the only tie-in from one description of an event to the next.</Paragraph>
      <Paragraph position="1"> The discourse processing knowledge of the system is considerably more developed in English than in Japanese. This is a case where it was difficult to do all the experiments we would have liked because our developers were not bilingual, and discourse cues in Japanese are often fairly subtle.</Paragraph>
      <Paragraph position="2"> In EJV0592, the system correctly resolves most of the event and object references, but still does badly on the location and activity site slots because it assumes that the location of the joint venture company is the location of the production activity, and it fails to guess that &amp;quot;Kaohsiung&amp;quot; is in Taiwan. In addition, there is a very subtle inference here that the production of clubs in Japan is not an additional location for the production of clubs by the Taiwan unit; SHOGUN treats both Japan and Taiwan as production bases.</Paragraph>
      <Paragraph position="3">  In order to process Japanese, the SHOGUN system uses a morphological analyzer called MAJESTY developed at NTT Data. As part of our early efforts in the Joint Venture domain, Tsuyoshi Kitani of NTT Data (who was then a visiting scientist at Carnegie Mellon) wrote several AWK scripts to identify Japanese company names in the segmented output. Later, rules for identifying other kinds of text fields including proper names, locations, numbers and times were added. This year, he has extended this set of finite-state rules and augmented it with other modules to perform the entire TIPSTER task on Japanese texts. For the MUC-5 evaluation, we have submitted TEXTRACT's results on the JJV and JME texts as optional scores.</Paragraph>
      <Paragraph position="4"> These were officially scored by the government, and the results appear in the table .</Paragraph>
      <Paragraph position="5">  TEXTRACT is comprised of four major components: preprocessing, pattern matching, discourse processing and template generation. Although only the first of these modules is shared with the SHOGUN system,</Paragraph>
      <Paragraph position="7"> both systems share the basic method of using finite-state pattern matching instead of full natural language parsing.</Paragraph>
      <Paragraph position="8"> In the preprocessor, Japanese text is segmented into primitive words, and these are tagged with parts of speech by a Japanese segmenter called MAJESTY. Then, proper nouns and monetary, numeric and temporal expressions are identified by the proper noun recognizer. The resulting segments are grouped together to provide meaningful sets of segments to the succeeding processes [2]. The pattern matcher searches each sentence for all spans of interest that match defined patterns, such as tie-up relationships and economic activities. In the discourse processor, company names are identified uniquely throughout a text, allowing recognition of company relationships and correct merging of information within a text. Finally, the template generator puts extracted information together to create the required template format.</Paragraph>
      <Paragraph position="9"> The JJV configuration of TEXTRACT has been under development since the spring, and during the TIPSTER 18-month evaluation it achieved a recall of 29 and a precision of 70 (for an F-measure of 40.9). With 5 months of additional work, TEXTRACT now has a recall of 60 and a precision of 68, giving an F-measure of 63.8.</Paragraph>
      <Paragraph position="10"> The JJV TEXTRACT system was ported to the microelectronics domain in three weeks by one person. This was possible because most of the system's modules were shared across both domains (and because identifying company names is a key element of performance in both domains). Most of the development time was spent identifying key expressions from the corpus. The JME configuration of the TEXTRACT system performed about the same as the base SHOGUN system on JME, but had higher precision compared to the higher recall of SHOGUN.</Paragraph>
      <Paragraph position="11"> Our experience with TEXTRACT confirms that finite-state pattern matching allows for very rapid development of high performance text extraction for new domains .</Paragraph>
      <Paragraph position="12"> TEXTRACT: Company name identification throughout a text.</Paragraph>
      <Paragraph position="13"> Unifying multiple references to the same company throughout a text is key to achieving high performance in the joint venture template structure. A notion of &amp;quot;topic companies,&amp;quot; which are the main concern of the sentence, was introduced. Topic companies are identified where subject case markers such as &amp;quot;ga&amp;quot; and &amp;quot;wa&amp;quot; appear. When the subject is missing in a sentence, which is often the case in Japanese, the subject is automatically assumed to be the topic companies taken from the previous sentence.</Paragraph>
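The topic-company carry-over heuristic described above can be sketched as follows; the sentence representation, tags, and romanized marker strings are assumptions for illustration, not TEXTRACT's actual data structures.

```python
# Illustrative sketch of the topic-company heuristic: a company followed by a
# subject case marker becomes the topic; a subjectless sentence inherits the
# topic from the previous sentence. Marker strings here are romanized stand-ins.
SUBJECT_MARKERS = ("ga", "wa")

def track_topics(sentences):
    """Each sentence is a list of (token, tag); company tokens are tagged 'company'.
    Returns the topic-company list assumed for each sentence."""
    topics = []
    current = []
    for sent in sentences:
        found = []
        for i, (tok, tag) in enumerate(sent):
            nxt = sent[i + 1][0] if i + 1 != len(sent) else None
            if tag == "company" and nxt in SUBJECT_MARKERS:
                found.append(tok)
        if found:                     # explicit subject: update the topic
            current = found
        topics.append(list(current))  # subjectless sentence inherits the topic
    return topics

sents = [
    [("NipponSteel", "company"), ("ga", "marker"), ("teikei", "noun")],
    [("seisan", "noun"), ("suru", "verb")],  # no subject: topic carried over
]
print(track_topics(sents))
```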
      <Paragraph position="14"> Company aliases are identified by applying a substring matching algorithm called the longest common subsequence (LCS) method. References of three kinds of company name pronouns, &amp;quot;dousha&amp;quot; (the company), &amp;quot;jisha&amp;quot; (the company itself), and &amp;quot;ryousha&amp;quot; (both companies), are also identified using the topic companies and some heuristic rules.</Paragraph>
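A sketch of the LCS alias method follows. Since the paper does not give the exact acceptance criterion, the sketch assumes a shorter name counts as an alias when the LCS covers it entirely, i.e. every character of the short form appears in order inside the full name; the function names are ours.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def is_alias(short, full):
    # Assumed criterion: the short form is an alias when every character of it
    # appears, in order, inside the full name (the LCS covers it entirely).
    s, f = short.lower(), full.lower()
    return lcs_length(s, f) == len(s)

print(is_alias("Toukyou kaijou", "Toukyou kaijou kasai hoken"))  # True
```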
      <Paragraph position="15"> Every company name in the text, including company aliases and pronouns, is given a unique number by the discourse process. Using topic companies and the unique number, individual pieces of information identified by the preprocessor and the pattern matcher are merged together to generate a relevant template structure.</Paragraph>
      <Paragraph position="16"> TEXTRACT: Analysis of a Walkthrough Message In JJV0002, all five entities were correctly identified by the preprocessor. The pattern matcher also recognized two tie-ups correctly, although the pattern selected from four matched patterns was incorrect in Sentence 2, as shown in the traces below. TEXTRACT found only one tie-up in Sentence 2 because it cannot identify multiple tie-ups in a sentence with the current design.</Paragraph>
      <Paragraph position="17"> Sentence no. = 1</Paragraph>
      <Paragraph position="19"> An alias &amp;quot;Toukyou kaijou&amp;quot; was found by the LCS method as a substring of the entity name &amp;quot;Toukyou Kaijou Kasai Hoken&amp;quot;. References of &amp;quot;ryousha&amp;quot; (both companies) were correctly resolved as &amp;quot;Nisshin Kasai Kaijou Hoken&amp;quot; and &amp;quot;Douwa Kasai Kaijou Hoken&amp;quot;. After the discourse processing, entities were given unique numbers (uniqueid) as follows:</Paragraph>
      <Paragraph position="21"> Industry objects and the product service slot were completely wrong for the following reasons: (1) TEXTRACT did not find the Product/Service 1 string, and (2) although it did spot the Product/Service 2 string, it gave a wrong pointer to Activity1 due to a system bug. Another observation regarding the industry object was that TEXTRACT gave the industry type &amp;quot;sales&amp;quot; with SIC 50 to Product/Service 2, as the SHOGUN system did.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="115" end_page="115" type="metho">
    <SectionTitle>
COMBINING SYSTEMS: SHOGUN + TEXTRACT
</SectionTitle>
    <Paragraph position="0"> For the Japanese Microelectronics domain, the SHOGUN system scored the highest recall, while the TEXTRACT system scored the highest precision. The F-measure and error scores were almost exactly the same.</Paragraph>
    <Paragraph position="1"> We developed a statistical technique to combine these systems in a way that improves the F-measure, and as a by-product we determined the theoretical limits of combining the output of the two systems.</Paragraph>
    <Paragraph position="2"> The combining algorithm works as follows: both SHOGUN and TEXTRACT are run on an input text, and the output templates are given as input to the combiner. The following methods were examined: SHOGUN: this row just shows the scores for the SHOGUN system.</Paragraph>
    <Paragraph position="3"> TEXTRACT: this row shows the scores for the TEXTRACT system.</Paragraph>
    <Paragraph position="4"> Theoretical max: this row shows the scores for a system which chooses perfectly whether SHOGUN or TEXTRACT has the better answer for a particular text.</Paragraph>
    <Paragraph position="5"> Entity weight D=T: this row shows the results of using total entity weight to select the output template, using TEXTRACT output in case of ties.</Paragraph>
    <Paragraph position="6">  Note that the best performing method was the total entity weight, which used statistics from the development corpus for the entity-name slot to determine which output template had more commonly found company names. Intuitively, if an output template has more companies that were associated with correct keys in the development corpus, that template is more likely to be correct. Note also that no knowledge-free combining method gave a better F-measure than either of the two systems alone.</Paragraph>
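The entity-weight selection rule can be sketched as follows. The weight table contents and function names are made up for illustration, but the decision rule (the template with higher total development-corpus weight wins; ties go to TEXTRACT, the D=T convention above) follows the description in the text.

```python
# Sketch of the entity-weight combiner. The weight table maps entity names to
# how often they matched correct keys in the development corpus; these counts
# are invented for illustration.
DEV_WEIGHTS = {"Nisshin Kasai": 4, "Douwa Kasai": 2, "Unknown Co": 0}

def template_weight(entity_names):
    return sum(DEV_WEIGHTS.get(name, 0) for name in entity_names)

def combine(shogun_entities, textract_entities):
    """Pick the output template whose entities carry more development-corpus
    weight; ties go to TEXTRACT."""
    if template_weight(shogun_entities) > template_weight(textract_entities):
        return "SHOGUN"
    return "TEXTRACT"

print(combine(["Nisshin Kasai", "Douwa Kasai"], ["Unknown Co"]))  # SHOGUN
print(combine(["Unknown Co"], ["Unknown Co"]))                    # tie: TEXTRACT
```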
  </Section>
</Paper>