<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1118">
  <Title>Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition</Title>
  <Section position="11" start_page="157" end_page="158" type="relat">
    <SectionTitle>
9 RELATED WORK
</SectionTitle>
    <Paragraph position="0"> M.E. has been successfully applied to many other tasks in computational linguistics. Some recent work for which there are solid, comparable benchmarks is that of Adwait Ratnaparkhi at the University of Pennsylvania, who has achieved state-of-the-art results by applying M.E. to parsing (Ratnaparkhi, 1997a), part-of-speech tagging (Ratnaparkhi, 1996), and sentence-boundary detection (Reynar and Ratnaparkhi, 1997). Other recent work has applied M.E. to language modeling (Rosenfeld, 1994), machine translation (Berger et al., 1996), and reference resolution (Kehler, 1997). M.E. was first applied to named entity recognition at the MUC-7 conference by (Borthwick et al., 1998) and (Mikheev and Grover, 1998).</Paragraph>
    <Paragraph position="1"> Note that part-of-speech tagging is, in many ways, a task very similar to named-entity recognition. Ratnaparkhi's tagger is similar to MENE in that his features look at the surrounding two-word lexical context, but his system makes less use of dictionaries. On the other hand, his system looks at word suffixes and prefixes in the case of unknown words, something we haven't tried with MENE, and it also consults its own output by looking at its previous two tags when making each decision. We do this implicitly through our requirement that the futures we output be consistent, but we found that an attempt to do this more directly, by building a consistency feature into the model, had no effect on our results.</Paragraph>
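The feature set just described (a two-word lexical window on each side, with prefix/suffix features falling back in for rare or unknown words) can be sketched roughly as follows. This is a hypothetical illustration of the general technique, not code from either tagger; the function name, the padding token, and the affix-length cutoff are our own assumptions:

```python
def context_features(tokens, i, rare_words):
    """Sketch of tagger-style features for token i: the surrounding
    two-word lexical window, plus prefix/suffix features when the
    current word is rare/unknown (all choices here are illustrative)."""
    w = tokens[i]
    feats = []
    # two-word lexical context on each side, padding at sentence edges
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        tok = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{offset}]={tok.lower()}")
    # affix features only for unknown (rare) words
    if w.lower() in rare_words:
        for k in range(1, min(5, len(w)) + 1):
            feats.append(f"prefix{k}={w[:k].lower()}")
            feats.append(f"suffix{k}={w[-k:].lower()}")
    return feats
```

Each returned string would serve as a binary feature name in a maximum entropy model; affix features give the model a purchase on morphology exactly when the lexical identity of the word is uninformative.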
    <Paragraph position="2"> At the MUC-7 conference, there were two other interesting systems using statistical techniques, from the Language Technology Group/University of Edinburgh (Mikheev and Grover, 1998) and BBN (Miller et al., 1998). Comparisons with the LTG system are difficult since it was a hybrid model in which the text was passed through a five-stage process, only three stages of which involved maximum entropy, and over half of the system's recall came from the two non-statistical phases. The LTG system demonstrated superior performance on the formal run relative to the MENE-Proteus hybrid system (93.39 vs. 88.80), but it isn't clear whether their advantage came from superior hand-coded rules or superior statistical techniques, because their system is not as easily broken down into separate components as is MENE-Proteus. It is also possible that tighter integration between the statistical and hand-coded components was responsible for some of LTG's relative advantage, but note that MENE-Proteus appears to have an advantage over LTG in terms of portability. We are currently experimenting with porting MENE to Japanese, for instance, and expect that it could be combined with a pre-existing Japanese hand-coded system, but it isn't clear that this could be done with the LTG system. Nevertheless, one of our avenues for future research is to look at tighter multi-system integration methods which won't compromise MENE's essential portability.</Paragraph>
    <Paragraph position="3"> Table 4 gives a comparison of BBN's HMM-based Identifinder (Miller et al., 1998) and NYU's MENE and MENE-Proteus systems on different training and test sets. We are not sure why MENE-Proteus was hurt more badly by the evaluation-time switch from aviation disaster articles to missile/rocket launch articles, but suspect that it may have been due to Identifinder's greater quantity and quality of training data. BBN used 790,000 words of training data to our 321,000. The quality advantage may have come from selecting sentences for their annotators to tag from a larger corpus, chosen so as to increase the variety of the training data.</Paragraph>
    <Paragraph position="4"> When MENE-only and Identifinder are compared with training on the same number of articles and testing on within-domain data, Identifinder still has an edge. We speculate that this is due to the dynamic updating of Identifinder's vocabulary during decoding when person or organization names are recognized, which gives the system a sort of long-distance reference resolution that is lacking in MENE. In addition, BBN's HMM-based system implicitly predicts named entities based on consecutive pairs of words rather than on single words, as is done in MENE, because each type of name has its own bigram language model. In the decoding process, the Viterbi algorithm chooses the sequence of names which yields the highest joint probability of names, words, and features associated with each word.</Paragraph>
    <Paragraph position="5"> In comparing the maximum entropy and HMM-based approaches to named entity recognition, we are hopeful that M.E. will turn out to be the better method in the end. We think it is possible that some of Identifinder's current advantage can be neutralized simply by adding the just-mentioned features to MENE. On the other hand, we have a harder time seeing how some of MENE's strengths could be integrated into an HMM-based system. It is not clear, for instance, how a wide variety of dictionaries could be added to Identifinder, or whether the system could be combined with a hand-coded system as was done with our system and the one from LTG.</Paragraph>
  </Section>
</Paper>