<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1019">
<Title>NYU: DESCRIPTION OF THE JAPANESE NE SYSTEM USED FOR MET-2</Title>
<Section position="6" start_page="1" end_page="1" type="relat">
<SectionTitle> RELATED WORK </SectionTitle>
<Paragraph position="0"> There have been several efforts to apply machine learning techniques to the same task [4] [3] [5] [2]. In this section, we discuss one of the most advanced of these systems, which also most closely resembles our own [2]. A good review of most of the other systems can be found in their paper.</Paragraph>
<Paragraph position="1"> Their system uses the decision tree algorithm and almost the same features as ours. However, there are significant differences between the two systems. The main difference is that they use more than one decision tree, each of which decides whether a particular named entity starts or ends at the current token. In contrast, our system has only one decision tree, which produces probabilities of information about the named entity. In this regard, we are similar to [3], which also uses a probabilistic method in its N-gram based system. This is a crucial difference, and it has important consequences. Because the system of [2] makes multiple decisions at each token, it can assign multiple, possibly inconsistent tags. They solved this problem by introducing two somewhat idiosyncratic methods. One is the distance score, which is used to find an opening and closing pair for each named entity, mainly on the basis of distance information. The other is the tag priority scheme, which chooses a named entity among overlapping candidates of different types according to a priority order of named-entity types. These methods require parameters that must be adjusted whenever they are applied to a new domain. In contrast, our system does not need such methods, because the multiple possibilities are resolved by the probabilistic method.
This is a strong advantage, because no manual adjustment is needed.</Paragraph>
<Paragraph position="2"> The results they reported are not directly comparable to ours, because the texts and the definitions are different.</Paragraph>
<Paragraph position="3"> However, the total F-score of our system is similar to theirs, even though our training data is much smaller.</Paragraph>
</Section>
</Paper>