<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0117">
  <Title>Posts and Telecommunications yuandong@bupt.edu.cn</Title>
  <Section position="5" start_page="122" end_page="124" type="evalu">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="122" end_page="124" type="sub_section">
      <SectionTitle>
3.1 Open tracks
</SectionTitle>
      <Paragraph position="0"> In the open track, we used a lexicon of 294,382 entries, which included 42,430 MDWs (Morphologically Derived Words) generated from the GKB dictionary, 12,487 PNs, 22,907 LNs, 29,032 ONs, and 10,414 four-character idioms, plus the word lists generated from the training data provided by the second international Chinese Word Segmentation bakeoff and 80,114 GKB words. We also used the training data provided by the last bakeoff to train our trigram word-based language model.</Paragraph>
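The trigram word-based language model mentioned above can be sketched as follows; the function names, interpolation weights, and smoothing scheme are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def train_trigram_lm(sentences):
    """Count uni-, bi-, and trigrams over word sequences with <s> padding."""
    uni, bi, tri = defaultdict(int), defaultdict(int), defaultdict(int)
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for w in padded:
            uni[w] += 1
        for a, b in zip(padded, padded[1:]):
            bi[(a, b)] += 1
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
    return uni, bi, tri

def trigram_prob(uni, bi, tri, a, b, c, l3=0.7, l2=0.2, l1=0.1):
    """Interpolated trigram probability; weights l1..l3 are arbitrary here."""
    total = sum(uni.values())
    p1 = uni[c] / total if total else 0.0
    p2 = bi[(b, c)] / uni[b] if uni[b] else 0.0
    p3 = tri[(a, b, c)] / bi[(a, b)] if bi[(a, b)] else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1
```

In a segmentation decoder, such probabilities would score candidate word sequences; here the sketch only shows the training and lookup side.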
      <Paragraph position="1">  Table 1 presents the results of this track. For comparison, we also include in the table (Row 1) the results of the basic system. Rows 2 through 11 show the relative contribution of each component and resource to the overall word segmentation performance. The second column shows the recall, the third the precision, and the fourth the F-score. The last two columns present the recall of OOV words and the recall of IV words, respectively.</Paragraph>
      <Paragraph position="2">  From Table 1 we can see that the basic system, which participated in the last bakeoff (Row 1), already achieves quite good recall, but its OOV recall is not very good, because it cannot correctly identify unknown words that are absent from the lexicon, such as factoids, named entities (especially nested named entities), and new words (other than factoids, named entities, and words extracted from the training data). In Row 2, we only rewrote the factoid rules according to the MSRA guidelines; OOV recall improves significantly while IV recall falls slightly, which shows that factoid detection affects IV recall. As shown in Table 1, the GKB lexicon yields significant and consistent gains on all measures, because the GKB lexicon is refined and its words conform to the MSRA standard. We also find that the NE post-processor improves OOV recall but slightly reduces IV recall in all experiments; this shows that our named entity recognition has improved over last year's. As shown in Table 1, TBL brings slight but consistent gains at every step to which it is applied. After TBL adaptation, OOV recall stays almost unchanged, since the rules are derived from the training corpus and in theory no OOV word should satisfy their application conditions; IV recall, however, improves, which compensates for the IV recall lost to NE post-processing and factoid detection. It is interesting to compare the performance of the two TBL template sets: the first set is simple, with the default rule-generation threshold of 3 (called TBL in Table 1), and the second is more complicated, with a &amp;quot;0&amp;quot; threshold (called New TBL in Table 1). They generate 1,061 and 12,135 rules, respectively. Our experiments demonstrate that a more precise rule template set with a low threshold always leads to better overall performance, because it covers more situations, although a simple rule template set with a high threshold does better at OOV word recognition.</Paragraph>
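The threshold's effect on rule generation can be illustrated with a minimal, single-template TBL sketch. The data layout and the context template (previous character, current character, predicted tag) are our own assumptions; the paper's template sets are far richer.

```python
from collections import Counter

def extract_rules(sents, threshold=3):
    """sents: list of [(char, predicted_tag, gold_tag)] sequences.
    Candidate rule: in context (prev_char, char, predicted_tag), rewrite
    the tag to the gold tag; keep rules seen at least `threshold` times."""
    counts = Counter()
    for sent in sents:
        for i, (ch, pred, gold) in enumerate(sent):
            if pred != gold:
                prev = sent[i - 1][0] if i > 0 else "<s>"
                counts[((prev, ch, pred), gold)] += 1
    return {ctx: new for (ctx, new), n in counts.items() if n >= threshold}

def apply_rules(rules, sent):
    """Rewrite predicted tags wherever a learned rule's context matches."""
    out = []
    for i, (ch, pred, gold) in enumerate(sent):
        prev = sent[i - 1][0] if i > 0 else "<s>"
        out.append(rules.get((prev, ch, pred), pred))
    return out
```

Lowering the threshold keeps rarer rules, which is why the New TBL set grows to 12,135 rules while the default-threshold set stays at 1,061.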
      <Paragraph position="3">  In this track, we used the People's Daily 2000 corpus (Yu, 2003) to build our lexicon and train our model.</Paragraph>
      <Paragraph position="4"> Considering that organization names are more irregular in form than person and location names, and that they involve many abbreviations and anaphora, TBL adaptation may degrade organization performance; we therefore submitted two results, as shown in Table 2.</Paragraph>
      <Paragraph position="5"> 1+TBL1 means that TBL adapts only the person and location results of the basic system, so the organization performance of the basic system and of 1+TBL1 is identical; 1+TBL2 means that TBL adapts all three types of NE. For comparison, we list (Column 2) the results of the basic system. Rows 2 through 13 show the recall, precision, and F-score of PN, LN, ON, and the total.</Paragraph>
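The difference between 1+TBL1 and 1+TBL2 amounts to restricting which NE types the TBL stage may touch. A minimal sketch, in which the entity representation, function names, and `tbl_fix` callback are hypothetical:

```python
def adapt_entities(entities, tbl_fix, types=("PN", "LN")):
    """Apply the TBL correction only to entities of the selected types;
    others (e.g. ON under the 1+TBL1 setting) pass through unchanged."""
    return [tbl_fix(e) if e["type"] in types else e for e in entities]
```

Passing `types=("PN", "LN", "ON")` reproduces the 1+TBL2 setting.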
      <Paragraph position="6">  To our surprise, the performance listed in Table 2 demonstrates that applying TBL brings a dramatic improvement to all three types of NE, especially to organization performance. The great similarity between the MSRA training and test corpora may explain this. Owing to the inconsistency between the MSRA and PKU standards, the recall, especially of ONs, is not very good. We made some effort at standard adaptation, such as constraining the length and type of candidate words when combining named entities, but the results were still unsatisfactory.</Paragraph>
    </Section>
    <Section position="2" start_page="124" end_page="124" type="sub_section">
      <SectionTitle>
3.2 Closed tracks
</SectionTitle>
      <Paragraph position="0"> In Table 3, basic system (2) denotes a template window size of 2, and basic system (3) a window size of 3. As the table shows, except for precision and OOV recall, the window size of 2 outperforms the window size of 3.</Paragraph>
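The window-size comparison can be made concrete with a character-context feature extractor; the feature naming and the `#` padding symbol are assumptions for illustration.

```python
def window_features(chars, i, window=2):
    """Character context features within +/-window of position i;
    window=2 vs. window=3 mirrors the two basic systems in Table 3."""
    feats = []
    for off in range(-window, window + 1):
        j = i + off
        ch = chars[j] if 0 <= j < len(chars) else "#"  # pad at edges
        feats.append(f"C{off}={ch}")
    return feats
```

A larger window adds two features per position, enlarging the model without necessarily improving overall accuracy, consistent with the comparison above.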
      <Paragraph position="1"> In Table 4, system 6' is the one we submitted in the closed CityU track, but system 6 is better than system 6': in TBL training, we made the mistake of not processing the training data with the factoid tool and lexicon combining.</Paragraph>
      <Paragraph position="2"> We also find that the factoid tool does not improve performance, and that system 6 is not the best configuration (system 3 is).</Paragraph>
      <Paragraph position="3"> Combining separated words according to the training lexicon improved performance on both the MSRA and CityU closed tracks. Meanwhile, TBL worked considerably well in all closed tracks.</Paragraph>
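The lexicon-based recombination step can be sketched as a greedy longest-match merge over adjacent tokens; the function name and the maximum merge span are our assumptions, not the authors' implementation.

```python
def combine_separated(tokens, lexicon, max_span=4):
    """Greedily merge runs of adjacent tokens whose concatenation
    appears in the training lexicon, preferring the longest match."""
    out, i = [], 0
    while i < len(tokens):
        merged = None
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            cand = "".join(tokens[i:i + span])
            if cand in lexicon:
                merged, step = cand, span
                break
        if merged:
            out.append(merged)
            i += step
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Such a pass repairs over-segmentation (a word split into pieces) but cannot fix under-segmentation, which matches its role as a post-processing step here.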
    </Section>
  </Section>
</Paper>