<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0406">
  <Title>Identifying non-referential it: a machine learning approach incorporating linguistically motivated patterns</Title>
  <Section position="9" start_page="45" end_page="46" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Training and testing data were generated from our corpus using the the 25 features described in the previous section. Given Evans's success and the limited amount of training data, we chose to also use TiMBL's k-nearest neighbor algorithm (IB1).</Paragraph>
    <Paragraph position="1"> In TiMBL, the distance metric can be calculated in a number of ways for each feature. The numeric features use the numeric metric and the remaining features (lemmas, POS tags) use the default overlap metric. Best performance is achieved with gain ratio weighting and the consideration of 2 nearest distances (neighbors). Because of overlap in the features for various types of non-referential it and sparse data for cleft, weather, and idiomatic it, all types of non-referential it were considered at the same time and the output was a binary classification of each instance of it as referential or nonreferential. The results for our TiMBL classifier (MBL) are shown in Table 7 alongside our results using a decision tree algorithm (DT, described below) and the results from our replication of Evans  (2001). All three systems were trained and evaluated with the same data.</Paragraph>
    <Paragraph position="2"> All three systems perform a binary classification of each instance of it as referential or nonreferential, but each instance of non-referential it was additionally tagged for type, so the recall for each type can be calculated. The recall by type can been seen in Table 8 for our MBL system. Given that the memory-based learning algorithm is using previously seen instances to classify new ones, it makes sense that the most frequent types have the highest recall. As mentioned in Section 2.2, clefts can be difficult to identify.</Paragraph>
    <Paragraph position="3"> Decision tree algorithms seem suited to this kind of task and have been used previously, but C4.5 (Quinlan, 1993) decision tree algorithm did not perform as well as TiMBL on our data, compare the TiMBL results (MBL) with the C4.5 results (DT) in Table 7. This may be because the verb and adjective lemma features (F10-F12) had hundreds of possible values and were not as useful in a decision tree as in the memory-based learning algorithm.</Paragraph>
    <Paragraph position="4"> With the addition of more relevant, generalized grammatical patterns, the precision and accuracy have increased significantly, but the same cannot be said for recall. Because many of the patterns are designed to match specific function words as the right bracket, cases where the right bracket is omitted (e.g., extraposed clauses with no overt complementizers, truncated clefts, clefts with reduced relative clauses) are difficult to match. Other problematic cases include sentences with a lot of intervening  material between it and the right bracket or simple idioms which cannot be easily differentiated. The results for cleft, weather, and idiomatic it may also be due in part to sparse data. When only 2% of the instances of it are of a certain type, there are fewer than one hundred training instances, and it can be difficult for the memory-based learning method to be very successful.</Paragraph>
  </Section>
class="xml-element"></Paper>