
<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2402">
  <Title>Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> We study Spectral's performance in comparison with the algorithms discussed in the previous sections.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Baseline algorithms
</SectionTitle>
      <Paragraph position="0"> We use the following algorithms as baselines: EM, co-training, and co-EM, as established techniques for learning from unlabeled data in general, and the bootstrapping method proposed by Thelen and Riloff (2002) (hereafter TRB; we refer to the authors as TR) as a state-of-the-art bootstrapping method designed for semantic lexicon construction.</Paragraph>
      <Paragraph position="1"> 5.1.1 EM, co-training, and co-EM Naive Bayes classifier. To instantiate EM, co-training, and co-EM, we use a standard Naive Bayes classifier, as it is often used for co-training experiments, e.g., (Nigam and Ghani, 2000; Pierce and Cardie, 2001). As in Nigam and Ghani (2000)'s experiments, we estimate the model parameters, i.e., the class priors P(c) and the class-conditional feature probabilities P(f|c). The underlying naive Bayes assumption is that occurrences of features are conditionally independent of each other, given class labels. The generative interpretation in this case is analogous to that of text categorization, when we regard the features (or contexts) of all the occurrences of a word w as a pseudo-document.</Paragraph>
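To make the pseudo-document view concrete, here is a minimal multinomial Naive Bayes sketch over such pseudo-documents (bags of context features pooled per word). The class, its Laplace-style smoothing, and all names are illustrative assumptions, not the authors' exact estimator.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word pseudo-documents (illustrative sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # Laplace-style smoothing (an assumption)

    def fit(self, pseudo_docs, labels):
        # pseudo_docs: list of Counter({feature: count}), one per word
        # labels: the class of each word
        self.classes = sorted(set(labels))
        self.vocab = {f for d in pseudo_docs for f in d}
        self.log_prior, self.log_cond = {}, defaultdict(dict)
        for c in self.classes:
            docs_c = [d for d, y in zip(pseudo_docs, labels) if y == c]
            self.log_prior[c] = math.log(len(docs_c) / len(pseudo_docs))  # P(c)
            feat_counts = Counter()
            for d in docs_c:
                feat_counts.update(d)
            total = sum(feat_counts.values()) + self.smoothing * len(self.vocab)
            for f in self.vocab:                                          # P(f|c)
                self.log_cond[c][f] = math.log((feat_counts[f] + self.smoothing) / total)
        return self

    def predict_log_proba(self, pseudo_doc):
        # Naive Bayes score: log P(c) + sum over feature occurrences of log P(f|c)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f, n in pseudo_doc.items():
                if f in self.vocab:          # unseen features are simply ignored
                    s += n * self.log_cond[c][f]
            scores[c] = s
        m = max(scores.values())             # normalize via log-sum-exp
        z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
        return {c: v - z for c, v in scores.items()}
```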
      <Paragraph position="2"> We initialize the model parameters (P(c) and P(f|c)) using the labeled examples. The test data is labeled after T iterations. We explore a range of values of T for EM and co-EM, and a wider range for co-training. Analogous to the choice of input vectors for spectral analysis, we hypothesize that using all the unlabeled data for EM and co-EM may rather degrade performance. We therefore feed EM and co-EM with the N most frequent unlabeled words. As for co-training, we let each of the classifiers predict labels for all the unlabeled data and choose a fixed number of words labeled with the highest confidence.</Paragraph>
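A self-training/EM-style loop over the most frequent unlabeled words, built on the Naive Bayes sketch above, might look roughly like the following. It uses hard label assignments for brevity, whereas the standard EM formulation keeps soft posteriors; the function name, the default iteration count, and the default number of unlabeled words are illustrative.

```python
def run_em(labeled_docs, labeled_y, unlabeled_docs, n_unlabeled=1000, n_iters=10):
    """EM-style sketch: initialize on seeds, then re-estimate the model using
    the n_unlabeled most frequent unlabeled words (hard assignments here;
    classic EM would use soft posteriors).  All defaults are illustrative."""
    # keep only the most frequent unlabeled pseudo-documents
    unlabeled = sorted(unlabeled_docs, key=lambda d: sum(d.values()),
                       reverse=True)[:n_unlabeled]
    model = NaiveBayes().fit(labeled_docs, labeled_y)
    for _ in range(n_iters):
        guessed = [max(model.predict_log_proba(d).items(),
                       key=lambda kv: kv[1])[0] for d in unlabeled]
        # re-train on seeds plus the currently guessed labels
        model = NaiveBayes().fit(labeled_docs + unlabeled, labeled_y + guessed)
    return model
```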
      <Paragraph position="3"> Co-training and co-EM require two redundantly sufficient and conditionally independent views of the features. We split the features randomly, as in one of the settings in Nigam and Ghani (2000). We also tested left context vs. right context (not reported in this paper) and found that the random split performs slightly better.</Paragraph>
      <Paragraph position="5"> To study the potential best performance of the baseline methods, we explore the parameters described above and report the best results.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Implementation of Spectral
</SectionTitle>
      <Paragraph position="0"> In principle, spectral vectors can be used with any linear classifier. In our experiments, we use a standard centroid-based classifier with cosine as the similarity measure.</Paragraph>
      <Paragraph position="1"> For comparison, we also test count vectors (with and without tf-idf weighting) with the same centroid-based classifier.</Paragraph>
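A centroid-based classifier with cosine similarity, usable with spectral vectors, raw count vectors, or tf-idf vectors alike, can be sketched as follows (an illustrative implementation, not the authors' code).

```python
import numpy as np

class CentroidClassifier:
    """Each class is the normalized mean of its training vectors; a test
    vector receives the class of the most cosine-similar centroid."""

    def fit(self, X, y):
        # X: (n_words, dim) array of word vectors; y: class labels
        self.classes_ = sorted(set(y))
        cents = []
        for c in self.classes_:
            m = X[np.asarray(y) == c].mean(axis=0)
            cents.append(m / (np.linalg.norm(m) + 1e-12))
        self.centroids_ = np.vstack(cents)
        return self

    def predict(self, X):
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
        sims = Xn @ self.centroids_.T        # cosine similarity to each centroid
        return [self.classes_[i] for i in sims.argmax(axis=1)]
```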
      <Paragraph position="2"> Spectral has two parameters: the number of input vectors N and the subspace dimensionality k. We set N = 1000 and choose k based on observations on a corpus disjoint from the test corpora, and use these settings for all the experiments.</Paragraph>
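One way the spectral step could be realized, assuming a word-by-feature count matrix as described in Section 5.4: compute a low-dimensional subspace from the count vectors of the most frequent words via SVD and project every word onto it. The row normalization and the default dimensionality below are assumptions, not taken from the paper.

```python
import numpy as np

def spectral_vectors(count_matrix, word_freq, n_input=1000, dim=50):
    """Sketch of the spectral step: derive a dim-dimensional feature subspace
    from the n_input most frequent words and represent every word by its
    projection onto that subspace (normalization and dim are assumptions)."""
    order = np.argsort(-np.asarray(word_freq))
    top = count_matrix[order[:n_input]].astype(float)
    # length-normalize rows so very frequent words do not dominate the fit
    top /= np.linalg.norm(top, axis=1, keepdims=True) + 1e-12
    _, _, vt = np.linalg.svd(top, full_matrices=False)
    basis = vt[:dim]                                   # (dim, n_features)
    return count_matrix.astype(float) @ basis.T        # spectral vectors for all words
```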
      <Paragraph position="3">  Cf. Naive Bayes classifiers (NB) trained with 7500 seeds produce 62.9% on average over five runs with random training/test splits.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Target Classes and Data
</SectionTitle>
      <Paragraph position="0"> Following previous semantic lexicon studies, we evaluate on the classification of lemma-form nouns. As noted by several authors, accurate evaluation on a large number of proper nouns (without context) is extremely hard, since the judgment requires real-world knowledge. We therefore focus on non-proper head nouns. To generate the training/test data, we extracted all the non-proper nouns that appeared at least twice as the head word of a noun phrase in the AP newswire articles (25K documents), using a statistical syntactic chunker and a lemmatizer. This resulted in approximately 10K words, which were manually annotated with six classes: five target classes (persons, organizations, geo-political entities (GPE), locational entities, and facilities) and 'others'.</Paragraph>
      <Paragraph position="1"> The assumed distribution was that of general newspaper articles. The definitions of the classes follow the annotation guidelines for ACE (Automatic Content Extraction; http://www.nist.gov/speech/index.htm). Our motivation for choosing these classes is the availability of such independent guidelines. The breakdown of the 10K words is as follows.</Paragraph>
      <Paragraph position="2"> Person: 1347 (13.8%); Facility: 238 (2.4%); Location: 145 (1.5%); GPE: 17 (0.2%); Organization: 136 (1.4%); Others: 7871 (80.7%). The majority (80.7%) are labeled as Others. The most populous target class is Person (13.8%). The reason for GPE's small population is that geo-political entities are typically referred to by their names or pronouns rather than by common nominals. We measure precision and recall and combine them into the F-measure with equal weight.</Paragraph>
      <Paragraph position="3"> The chance performance is extremely low since target classes are very sparse. Random choice would result in F-measure=6.3%. Always proposing Person would produce F=23.1%.</Paragraph>
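These chance figures can be reproduced from the class breakdown above if precision and recall are taken over the five target classes, which is an assumption on our part, though it is consistent with the reported numbers.

```python
# Reproducing the chance baselines from the class breakdown in Section 5.3,
# assuming precision/recall are computed over the five target classes only.
counts = {'Person': 1347, 'Facility': 238, 'Location': 145,
          'GPE': 17, 'Organization': 136, 'Others': 7871}
total = sum(counts.values())                 # ~10K head nouns
target_total = total - counts['Others']      # words belonging to a target class

def f1(p, r):
    return 2 * p * r / (p + r)

# Always proposing "Person": precision ~13.8%, recall over target classes ~71.5%
p_person = counts['Person'] / total
r_person = counts['Person'] / target_total
print(round(100 * f1(p_person, r_person), 1))   # ~23 (reported: 23.1)

# Uniform random choice over the 6 classes
p_rand = (target_total / total) * (1 / 6) / (5 / 6)
r_rand = 1 / 6
print(round(100 * f1(p_rand, r_rand), 1))       # ~6.3 (reported: 6.3)
```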
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Features
</SectionTitle>
      <Paragraph position="0"> The types of feature extractors used in our experiments are essentially the same as those used in TR's experiments, which exploit syntactic constructions such as subject-verb, verb-object, NP-pp-NP (pp is a preposition), and subject-verb-object. In addition, we exploit syntactic constructions shown to be useful by other studies: lists and conjunctions (Roark and Charniak, 1998) and adjacent words (Riloff and Shepherd, 1997).</Paragraph>
      <Paragraph position="1"> We count feature occurrences (the number of times each feature co-occurs with a word w) in the unannotated corpus. All the tested methods are given exactly the same data points.</Paragraph>
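A sketch of turning the extracted (word, feature) occurrences into the word-by-feature count matrix that all methods consume. The feature extraction itself (chunking and pattern matching) is outside the scope of the sketch, and the example feature string is made up.

```python
from collections import Counter, defaultdict
import numpy as np

def build_count_matrix(pairs):
    """pairs: iterable of (word, feature) occurrences from the unannotated
    corpus, e.g. ('court', 'subj-of:rule').  Returns a dense word-by-feature
    count matrix plus the row (word) and column (feature) orderings."""
    counts = defaultdict(Counter)
    for word, feat in pairs:
        counts[word][feat] += 1
    words = sorted(counts)
    feats = sorted({f for c in counts.values() for f in c})
    f_idx = {f: j for j, f in enumerate(feats)}
    M = np.zeros((len(words), len(feats)))
    for i, w in enumerate(words):
        for f, n in counts[w].items():
            M[i, f_idx[f]] = n
    return M, words, feats
```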
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.5 High-frequency seed experiments
</SectionTitle>
      <Paragraph position="0"> Prior semantic lexicon studies (e.g., TR) note that the choice of seeds is critical: seeds should be high-frequency words so that the methods are provided with plenty of feature information to bootstrap with. In practice, this can be achieved by first extracting the most frequent words from the target corpus and manually labeling them for use as seeds.</Paragraph>
      <Paragraph position="1"> To simulate this practical situation, we split the above 10K words into a labeled set and an unlabeled set by choosing the N most frequent words as the labeled set, for several values of N (including 300 and 500). The labels of the unlabeled set are hidden from the methods. Note that approximately 80% of the seeds are negative examples ('Others'). As we assume that the test data is known at the time of training, we use the unlabeled set as both unlabeled data and test data.</Paragraph>
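The high-frequency seed split can be sketched as follows; the helper name is illustrative and n_seeds stands for one of the explored seed-set sizes.

```python
def high_frequency_split(words, freqs, labels, n_seeds=300):
    """Use the n_seeds most frequent words as labeled seeds; the remaining
    words form the unlabeled (and test) set, with their labels hidden."""
    order = sorted(range(len(words)), key=lambda i: -freqs[i])
    seed_idx, rest_idx = order[:n_seeds], order[n_seeds:]
    seeds = [(words[i], labels[i]) for i in seed_idx]
    unlabeled = [words[i] for i in rest_idx]           # labels hidden from methods
    gold = {words[i]: labels[i] for i in rest_idx}     # used only for scoring
    return seeds, unlabeled, gold
```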
      <Paragraph position="2"> 5.5.1 AP-corpus high-frequency seed results. Overall F-measure results on the AP corpus are shown in Figure 1. The columns of the figure are roughly sorted in descending order of performance. Spectral significantly outperforms the others. The algorithms that exploit unlabeled data outperform those that do not. Tf-idf and Count perform poorly on this task. Although TRB's performance was better with a smaller number of seeds in this particular setting, it showed different trends in other settings.</Paragraph>
      <Paragraph position="3"> Spectral trained with 300 or 500 labeled examples (and 1000 unlabeled examples via spectral analysis) rivals Naive Bayes classifiers trained with 7500 labeled examples (which produce 62.9% on average over five runs with random training/test splits).</Paragraph>
      [Figure 3 caption fragment: ... average over five runs with different seeds. Numbers in parentheses are 'random-seed performance' minus 'high-frequency-seed performance' (from Figure 1).]
      <Paragraph position="5"> Also note that the reported numbers for TRB, co-training, co-EM, and EM are the best performance among the explored parameter settings (described in Section 5.1.1), whereas Spectral's parameters were determined once on a corpus disjoint from the test corpora and used for all the experiments (Section 5.2).</Paragraph>
      <Paragraph position="6"> 5.5.2 WSJ-corpus high-frequency seed results. Figure 2 shows the results of Spectral and the best-performing baseline algorithms when features are extracted from a different corpus (Wall Street Journal, 36K documents). We use the same 10K words as the labeled/unlabeled word set, discarding 501 words that do not occur in this corpus. Spectral outperforms the others. Furthermore, Spectral trained with 300 or 500 seeds rivals Naive Bayes classifiers trained with 7500 seeds on this corpus (averaged over five runs with random training/test splits).</Paragraph>
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.6 Random-seed experiments
</SectionTitle>
      <Paragraph position="0"> To study performance dependency on the choice of seeds, we made labeled/unlabeled splits randomly. Figure 3 shows results of Spectral and the best-performing baseline algorithms. The average results over five runs using different seeds are shown.</Paragraph>
      <Paragraph position="1"> All the methods except Spectral exhibit the same tendency: performance on random seeds is lower than on high-frequency seeds, and the degradation is larger when the number of seeds is small. This is not surprising, since a small number of randomly chosen seeds provides much less information (corpus statistics) than high-frequency seeds. However, Spectral's performance does not degrade on randomly chosen seeds. We presume that this is because it learns from unlabeled data independently of the seeds.</Paragraph>
      [Figure 4 caption: Choice of input vectors for spectral analysis. AP corpus, random seeds. Using count vectors from high-, medium-, and low-frequency words (1000 each), and from all 10K words.]
    </Section>
    <Section position="7" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.7 Choice of input vectors for spectral analysis
</SectionTitle>
      <Paragraph position="0"> Recall that our basic idea is to use vectors with small estimation errors to achieve better subspace approximation.</Paragraph>
      <Paragraph position="1"> This idea led to applying spectral analysis to the most frequent words. We confirm the effectiveness of this strategy in Figure 4. 'Medium' and 'Low' in the figure compute the subspaces from 1000 words with medium frequency (68 to 197) and with low frequency (2 on average), respectively. Clearly, standard Spectral ('High': computing subspace from the most frequent 1000 words; frequency  a4a2a1 ) outperforms the others. When all the vectors are used (as LSI does), performance degrades to below Medium. 'Low' gains almost no benefits from spectral analysis. The results are in line with our prediction.</Paragraph>
    </Section>
  </Section>
</Paper>