Story Link Detection and New Event Detection are Asymmetric (N03-2005)

5 Evaluation Metric

TDT system evaluation is based on the number of false alarms and misses produced by a system. In link detection, the system should detect linked story pairs; in new event detection, the system should detect new stories. The detection cost is

$$C_{Det} = C_{Miss} \cdot P_{Miss} \cdot P_{target} + C_{FA} \cdot P_{FA} \cdot P_{non\text{-}target} \qquad (1)$$

where $C_{Miss}$ and $C_{FA}$ are the costs of a miss and a false alarm, $P_{Miss}$ and $P_{FA}$ are the conditional probabilities of a miss and a false alarm, and $P_{target}$ and $P_{non\text{-}target}$ are the a priori target and non-target probabilities, set to 0.02 and 0.98, respectively. The detection cost is normalized by dividing by $\min(C_{Miss} \cdot P_{target},\; C_{FA} \cdot P_{non\text{-}target})$; a perfect system scores 0, and a random baseline scores 1. Equal weight is given to each topic by accumulating error probabilities separately for each topic and then averaging. The minimum detection cost is the decision cost when the decision threshold is set to the optimal confidence score.

6 Differences between LNK and NED

The conditions for false alarms and misses are reversed for the LNK and NED tasks. In the LNK task, incorrectly flagging two stories as being on the same event is a false alarm; in the NED task, the same error causes a true first story to be missed. Conversely, incorrectly labeling two stories that are on the same event as not linked is a miss for LNK, but for NED it may result in a false alarm.

In this section, we analyze the utility of a number of techniques for the LNK and NED tasks in an information retrieval framework. The detection cost in Eqn. 1 assigns a higher cost to false alarms, since $C_{FA} \cdot P_{non\text{-}target}$ is considerably larger than $C_{Miss} \cdot P_{target}$. A LNK system can therefore minimize false alarms by identifying only clearly linked stories, which results in high precision for LNK. In contrast, a NED system will minimize false alarms by identifying all stories that are linked, which translates to high recall for LNK. Based on this observation, we investigated a number of precision- and recall-enhancing techniques for the LNK and NED systems, namely part-of-speech tagging, an expanded stoplist, and normalizing abbreviations and transforming spelled-out numbers into numerals. We also investigated the use of different similarity measures.

6.1 Similarity Measures

The systems developed for TDT primarily use cosine similarity as the similarity measure. In work on text segmentation, better performance was observed with the Hellinger measure (Brants et al., 2002). Table 1 shows that for LNK, the system based on cosine similarity performed better; in contrast, for NED, the system based on Hellinger similarity performed better.

The LNK task requires high precision, which corresponds to a large separation between the on-topic and off-topic similarity distributions; Figure 1 shows this separation for the cosine metric, with higher similarity values for on-topic pairs. Figure 2, which is based on pairs consisting of the current story and its most similar story in the story history, shows a greater separation in this region with the Hellinger metric. For example, at 10% recall, the Hellinger metric has a 71% false alarm rate, compared to 75% for the cosine metric.
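To make the comparison concrete, the following is a minimal sketch of the two measures over bag-of-words representations: cosine similarity between term-frequency vectors, and the Hellinger affinity between the documents' term distributions. The paper's exact term weighting (e.g., any IDF component) is not specified in this section, so raw term frequencies are an assumption here.

```python
import math
from collections import Counter

def cosine_similarity(terms1, terms2):
    """Cosine of the angle between the two term-frequency vectors."""
    v1, v2 = Counter(terms1), Counter(terms2)
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def hellinger_similarity(terms1, terms2):
    """Hellinger affinity sum_w sqrt(P(w|d1) * P(w|d2)):
    1.0 for identical term distributions, 0.0 for disjoint vocabularies."""
    c1, c2 = Counter(terms1), Counter(terms2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    if not (n1 and n2):
        return 0.0
    return sum(math.sqrt((c1[t] / n1) * (c2[t] / n2))
               for t in c1.keys() & c2.keys())

if __name__ == "__main__":
    a = "the train arrived at the station".split()
    b = "the train left the station late".split()
    print(f"cosine:    {cosine_similarity(a, b):.3f}")
    print(f"hellinger: {hellinger_similarity(a, b):.3f}")
```

A story pair is then declared linked (or, for NED, a story declared non-first) when the similarity to the closest prior story exceeds a decision threshold; sweeping that threshold traces out the miss/false-alarm trade-offs shown in Figures 1 and 2.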
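The minimum normalized detection cost reported in Table 1 comes from Eqn. 1. Below is a minimal sketch of that computation, assuming the standard TDT cost settings $C_{Miss} = 1$ and $C_{FA} = 0.1$ (an assumption; Section 5 only fixes the priors at 0.02 and 0.98).

```python
def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1,
                              p_target=0.02, p_non_target=0.98):
    """Normalized TDT detection cost (Eqn. 1): 0 = perfect, 1 = random baseline.
    c_miss and c_fa are the standard TDT settings, assumed here."""
    cost = c_miss * p_miss * p_target + c_fa * p_fa * p_non_target
    return cost / min(c_miss * p_target, c_fa * p_non_target)

def topic_weighted_cost(per_topic_rates):
    """Equal topic weighting: error probabilities are accumulated separately
    per topic, converted to costs, and then averaged.
    per_topic_rates: list of (p_miss, p_fa) pairs, one per topic."""
    costs = [normalized_detection_cost(p_miss, p_fa)
             for p_miss, p_fa in per_topic_rates]
    return sum(costs) / len(costs)
```

Note that with these settings a false alarm contributes with weight $C_{FA} \cdot P_{non\text{-}target} = 0.098$ versus $C_{Miss} \cdot P_{target} = 0.02$ for a miss, which is the asymmetry Section 6 builds on.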
6.2 Part-of-Speech (PoS) Tagging

To reduce confusion among word senses, we tagged each term as one of five categories: adjective, noun, proper noun, verb, or other, and then combined the stem and part of speech to create a "tagged term" (sketched below, after Section 6.3). For example, 'N train' represents the term 'train' when used as a noun. The LNK and NED systems were tested using the tagged terms. Table 2 shows the opposite effect that PoS tagging has on LNK and NED.

6.3 Stop Words

The broadcast news documents in the TDT collection have been transcribed using Automatic Speech Recognition (ASR). There are systematic differences between ASR output and manually transcribed text: for example, "30" is spelled out as "thirty", and "CNN" is represented as three separate tokens "C", "N", and "N". To handle these differences, an "ASR stoplist" was created by identifying terms with statistically different distributions in a parallel corpus of manually and automatically transcribed documents, the TDT2 corpus. Table 3 shows that the ASR stoplist improves the topic-weighted minimum detection cost for LNK but not for NED.

We also performed "enhanced preprocessing" to normalize abbreviations and transform spelled-out numbers into numerals, which improves both precision and recall. Table 3 shows that enhanced preprocessing performs worse than the ASR stoplist for Link Detection but yields the best results for New Event Detection.
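A minimal sketch of the tagged-term construction from Section 6.2. NLTK's tokenizer, Penn Treebank tagger, and Porter stemmer are stand-ins (the paper does not name the tools it used), and every single-letter category label except 'N' is illustrative.

```python
# Requires: pip install nltk, plus the 'punkt' tokenizer and
# perceptron tagger data packages.
import nltk
from nltk.stem import PorterStemmer

_STEMMER = PorterStemmer()

def coarse_tag(ptb_tag):
    """Collapse Penn Treebank tags into the five categories of Section 6.2."""
    if ptb_tag.startswith("NNP"):
        return "P"   # proper noun (label assumed; the paper only shows 'N')
    if ptb_tag.startswith("NN"):
        return "N"   # noun, as in the paper's example 'N train'
    if ptb_tag.startswith("VB"):
        return "V"   # verb
    if ptb_tag.startswith("JJ"):
        return "A"   # adjective
    return "O"       # other

def tagged_terms(text):
    """Combine the coarse PoS category and stem into tagged terms like 'N train'."""
    tokens = nltk.word_tokenize(text)
    return [f"{coarse_tag(tag)} {_STEMMER.stem(token.lower())}"
            for token, tag in nltk.pos_tag(tokens)]

# e.g. tagged_terms("They train new recruits") might yield
#   ['O they', 'V train', 'A new', 'N recruit'],
# distinguishing verbal 'train' from the noun sense.
```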
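And one plausible instantiation of the ASR stoplist from Section 6.3: the paper selects terms whose distributions differ statistically between parallel manual and ASR transcriptions, but does not state which test it used, so the per-term chi-square test and significance threshold below are assumptions.

```python
from collections import Counter
from scipy.stats import chi2_contingency

def build_asr_stoplist(manual_docs, asr_docs, alpha=0.001):
    """Collect terms whose frequencies differ significantly between parallel
    manually transcribed and ASR-transcribed corpora.
    manual_docs, asr_docs: iterables of documents, each a list of terms."""
    manual = Counter(t for doc in manual_docs for t in doc)
    asr = Counter(t for doc in asr_docs for t in doc)
    n_manual, n_asr = sum(manual.values()), sum(asr.values())
    stoplist = set()
    for term in manual.keys() | asr.keys():
        m, a = manual[term], asr[term]
        # 2x2 contingency table: this term vs. all other tokens,
        # manual transcription vs. ASR output.
        table = [[m, n_manual - m], [a, n_asr - a]]
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < alpha:
            stoplist.add(term)
    return stoplist
```

On TDT2-style data, tokens such as "thirty", "C", and "N" would be far more frequent on the ASR side and so would land on the stoplist, which is the behavior Section 6.3 describes.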