
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0130">
  <Title>Chinese Named Entity Recognition with Conditional Probabilistic Models</Title>
  <Section position="4" start_page="0" end_page="173" type="metho">
    <SectionTitle>
2 Named Entity Recognizer
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="173" type="sub_section">
      <SectionTitle>
2.1 Models
</SectionTitle>
      <Paragraph position="0"> We trained two named entity recognizers based on conditional random field and one based on maximum entropy model. Both conditional random field and maximum entropy models are capable of modeling arbitrary features of the input, thus are well suit for many language processing tasks. However, there exist significant differences between these two models. To apply a maximum entropy model to NER task, we have to first train a maximum entropy classifier to classify each individual word and then build a dynamic programming for sequence decoding.</Paragraph>
      <Paragraph position="1"> While in CRFs, these two steps are integrated together. Thus, in theory, CRFs are superior to maximum entropy models in sequence modeling problem and this will also confirmed in our Chinese NER experiments. The superiority of CRFs on Chinese information processing was also demonstrated in word segmentation (Peng et al.</Paragraph>
      <Paragraph position="2"> 2004). However, the training speed of CRFs is much slower than that of maximum entropy models since training CRFs requires expensive forward-backward algorithm to compute the partition function.</Paragraph>
      <Paragraph position="3">  We used Taku's CRF package  to train the first CRF recognizer, and the MALLET  package with BFGS optimization to train the second CRF recognizer. We used a C++ implementation  of maximum entropy modeling and wrote our own second order dynamic programming for decoding. null</Paragraph>
    </Section>
    <Section position="2" start_page="173" end_page="173" type="sub_section">
      <SectionTitle>
2.2 Features
</SectionTitle>
      <Paragraph position="0"> The first CRF recognizer used the features C  . In addition, the first CRF recognizer used the tag bigram feature, and the second CRF recognizer used word and character cluster features, obtained automatically from the training data only with distributional word clustering (Tishby and Lee, 1993). The maximum entropy recognizer used the following unigram, bigram features, and type  .</Paragraph>
      <Paragraph position="1"> When using the first CRF package, we found the labeling scheme OBIE performs better than the OBIE scheme. In the OBI scheme, the first character of a named entity is labeled as &amp;quot;B&amp;quot;, the remaining characters, including the last character, are all labeled as &amp;quot;I&amp;quot;. And any character that is not part of a named entity is labeled as &amp;quot;O&amp;quot;. In the OBIE scheme, the last character of a named entity is labeled as &amp;quot;E&amp;quot;. The other characters are labeled in the same way as in OBIE scheme. The first CRF recognizer used the OBIE labeling scheme, and the second CRF recognizer used the OBI scheme.</Paragraph>
      <Paragraph position="2"> We tried a window size of seven characters (three characters preceding the current character and three characters following the current character) with almost no difference in performance from using the window size of five characters. When a named entity occurs frequently in the training data, there is a very good chance that it will be recognized when appearing in the testing data. However, for entity names of rare occurrence, they are much harder to recognize in the  ine the testing data to identify the named entities that occur in the training data, and assign them the same label as in the training data. From the training data, we extracted the person names of at least three characters, the place names of at least four characters, and the organization names of at least four characters. We removed from the dictionary the named entities that are also common words. We did not include the short names in the dictionary because they may be part of long names. We produced a run first using one of the NER recognizers, and then replaced the labels of a named entity assigned by a recognizer with the labels of the same named entity in the training data without considering the contexts.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="173" end_page="174" type="metho">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> NER task on MSRA corpus.</Paragraph>
    <Paragraph position="1"> Table 1 presents the official results of five runs in the closed test of the NER task on MSRA corpus. The first two runs, msra_a and msra_b, are produced using the first CRF recognizer; the next two runs, msra_f and msra_g, are produced using the second CRF recognizer which used randomly selected 90% of the MSRA training data. When we retrained the second CRF recognizer with the whole set of the MSRA training data, the overall F-Score is 85.00, precision 90.28%, and recall 80.31%. The last run, msra_r, is produced using the MaxEnt recognizer.</Paragraph>
    <Paragraph position="2"> The msra_a run used the set of basic features with a window size of five characters. Slightly over eight millions features are generated from the MSRA training data, excluding features occurred only once. The training took 321 iterations to complete. The msra_b run is produced from the msra_a run by substituting the labels assigned by the recognizer to a named entity with the labels of the named entity in the training data if it occurs in the training data. For example, in the MSRA training data, the textBi Jia Sak Gu Ju in the sentence Wo Huan Dao Bi Jia Sak Gu Ju Qu Zhan Yang is tagged as a place name. The same entity also appeared in MSRA testing data set. The first CRF recognizer failed to mark the text Bi Jia Sak Gu Ju as  a place name instead it tagged Bi Jia Sak as a per-son name. In post-processing, the textBi Jia Sak Gu Ju in the testing data is re-tagged as a place name. As another example, the person nameZhang Yem Sheng appears both in the training data and in the testing data. The first CRF recognizer failed to recognize it as a person name. In post-processing the text Zhang Yem Sheng is tagged as a person name because it appears in the training data as a person name. The text &amp;quot;Quan Guo Ren Da Xiang Gang Te Bie Hang Zheng Qu Chou Bei Wei Yuan Hui &amp;quot; was correctly tagged as an organization name. It is not in the training data, but the texts &amp;quot;Quan Guo Ren Da &amp;quot;, &amp;quot;Xiang Gang Te Bie Hang Zheng Qu &amp;quot;, and &amp;quot;Chou Bei Wei Yuan Hui &amp;quot; are present in the training data and are all labeled as organization names. In our postprocessing, the correctly tagged organization name is re-tagged incorrectly as three organization names. This is the main reason why the performance of the organization name got much worse than that without post-processing.</Paragraph>
    <Paragraph position="3">  ken down by entity type.</Paragraph>
    <Paragraph position="4"> Table 2 presents the performance of the msra_a run by entity type. Table 3 shows the performance of the msra_b run by entity type. While the post-processing improved the performance of person name recognition, but it degraded the performance of organization name recognition.</Paragraph>
    <Paragraph position="5"> Overall the performance was worse than that without post-processing. In our development testing, we saw large improvement in organization name recognition with post-processing.</Paragraph>
    <Paragraph position="6">  NER task on CITYU corpus.</Paragraph>
    <Paragraph position="7"> Table 4 presents the official results of four runs in the closed test of the NER task on CITYU corpus. The first two runs, msra_a and msra_b, are produced using the first CRF recognizer; the next two runs, msra_f and msra_g, are produced using the second CRF recognizer. The system configurations are the same as used on the MSRA corpus. The cityu_b run is produced from cityu_a run with post-processing, and the cityu_g run produced from cityu_f run with post-processing.</Paragraph>
    <Paragraph position="8"> We used the whole set of CITYU to train the first CRF model, and 80% of the CITYU training data to train the second CRF model. No results on full training data are available at the time of submission. null All the runs we submitted are based characters.</Paragraph>
    <Paragraph position="9"> We tried word-based approach but found it was not as effective as character-based approach.</Paragraph>
  </Section>
  <Section position="6" start_page="174" end_page="175" type="metho">
    <SectionTitle>
4 Discussions
</SectionTitle>
    <Paragraph position="0"> Table 4 is shows the confusion matrix of the labels. The rows are the true labels and the columns are the predicated labels. An entry at row x and column y in the table is the number of characters that are predicated as y while the true label is x. Ideally, all entries except the diagonal should be zero.</Paragraph>
    <Paragraph position="1"> The table was obtained from the result of our development dataset for MSRA data, which are the last 9,364 sentences of the MSRA training data (we used the first 37,000 sentences for training in the model developing phase). As we can see, most of the errors lie in the first column, indicating many of the entities labels are predicated as O. This resulted low recall for entities. Another major error is on detecting the beginning of ORG (B-O). Many of them are mislabeled as O and beginning of location (B-L), resulting low recall and low precision for ORG.</Paragraph>
    <Paragraph position="2">  A second interesting thing to notice is the numbers presented in Table 2. They may suggest that person name recognition is more difficult  than location name recognition, which is contrary to what we believe, since Chinese person names are short and have strict structure and they should be easier to recognize than both location and organization names. We examined the MSRA testing data and found out that 617 out 1,973 person names occur in a single sentence as a list of person names. In this case, simple rule may be more effective. When we excluded the sentence with 617 person names, for person name recognition of our msra_a run, the F-score is 90.74, precision 93.44%, and recall 88.20%.</Paragraph>
    <Paragraph position="3"> Out of the 500 person names that were not recognized in our msra_a run, 340 occurred on the same line of 617 person names.</Paragraph>
  </Section>
class="xml-element"></Paper>