<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0104">
<Title>Automatic Extraction of Systematic Polysemy Using Tree-cut</Title>
<Section position="6" start_page="24" end_page="26" type="evalu">
<SectionTitle>4 Evaluation</SectionTitle>
<Paragraph position="0">To test our method, we chose 5 combinations of WordNet noun Top categories (which we call top relation classes) and extracted the cluster pairs that have more than 3 overlapping words. We then evaluated those pairs in two respects: related vs. unrelated relations, and automatic vs. manual clusters.</Paragraph>
<Section position="1" start_page="24" end_page="25" type="sub_section">
<SectionTitle>4.1 Related vs. Unrelated Clusters</SectionTitle>
<Paragraph position="0">Of the cluster pairs we extracted automatically, not all are systematically related; some are unrelated, homonymous relations. These are essentially false positives for our purposes. Table 1 shows the number of related and unrelated relations among the extracted cluster pairs.</Paragraph>
<Paragraph position="1">Although the results vary among category combinations, the ratio of related pairs is rather low: less than 60% on average. There are several reasons for this. First, some pairs have spurious relations. For example, in the ARTIFACT-GROUP class, the pair [LUMBER, SOCIAL_GROUP] was extracted. The words common to the two clusters are "picket", "board" and "stock". This relation is obviously homonymous.</Paragraph>
<Paragraph position="2">Second, some clusters obtained by tree-cut are rather abstract, so pairing two abstract clusters can produce an unrelated pair. For example, in the ARTIFACT-MEASURE class, the pair [INSTRUMENTALITY, LINEAR_UNIT] was selected. The words common to these two clusters include "yard", "foot" and "knot" (see Figure 4). Here, the concept INSTRUMENTALITY is very general (at depth 1) and contains many (polysemous) words, so matching this cluster with another abstract cluster is likely to yield a pair that has just enough overlapping words but whose relation is not systematic. In the case of [INSTRUMENTALITY, LINEAR_UNIT], the situation is even worse, because the concept LINEAR_UNIT in MEASURE represents a collection of terms that were chosen arbitrarily over the history of the English language.</Paragraph>
</Section>
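As a concrete illustration of the extraction step above (pairing two clusters when they share more than 3 words), here is a minimal sketch using NLTK's WordNet interface. This is not the authors' implementation: the helper names, the use of lemma names as "words", and the synsets standing in for the tree-cut clusters are assumptions made for the example.

    # Sketch of the overlap test behind the cluster pairs in Section 4.1.
    # Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    def words_under(root):
        # All lemma names in the WordNet subtree rooted at `root`.
        synsets = {root} | set(root.closure(lambda s: s.hyponyms()))
        return {lemma.name() for s in synsets for lemma in s.lemmas()}

    def overlap(cluster_a, cluster_b):
        # Words shared by two clusters -- the evidence for pairing them.
        return words_under(cluster_a) & words_under(cluster_b)

    # Hypothetical stand-ins for the [INSTRUMENTALITY, LINEAR_UNIT] pair:
    instrumentality = wn.synset('instrumentality.n.03')
    linear_unit = wn.synset('linear_unit.n.01')

    shared = overlap(instrumentality, linear_unit)
    if len(shared) > 3:        # the paper's extraction threshold
        print(sorted(shared))  # the paper reports "yard", "foot", "knot" among these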
<Section position="2" start_page="25" end_page="26" type="sub_section">
<SectionTitle>4.2 Automatic vs. Manual Clusters</SectionTitle>
<Paragraph position="0">To compare the cluster pairs our method extracted automatically with manually extracted clusters, we use WordNet cousins. The cousin relation is relatively new in WordNet, and its coverage is still incomplete. However, it gives us a good measure of whether our automatic method discovered systematic relations that correspond to human intuitions.</Paragraph>
<Paragraph position="1">A cousin relation in WordNet is defined between two synsets (currently in the noun trees only), and it indicates that the senses of a word that appear in both of the (sub)trees rooted by those synsets are related.* The cousins were manually extracted by the WordNet lexicographers. Table 2 shows the number of cousins listed for each top relation class and the number of cousins our automatic method recovered (in the 'Auto' column). As the table shows, the total recall is over 80% (27/33 ≈ 0.82).</Paragraph>
<Paragraph position="2">* Actually, cousin is one of three relations that indicate the grouping of related senses of a word; the others are sister and twin. In this paper, we use cousin to refer to all relations listed in the "cousin.tps" file (available in a WordNet distribution).</Paragraph>
<Paragraph position="3">In the right three columns of Table 2, we also show a breakdown of the recovered cousins: whether each one was an exact match, or was more general or more specific than the corresponding WordNet cousin. From this, we can see that more than half of the recovered cousins were more general than the WordNet cousins. That is partly because some WordNet cousins have only one or two common words. For example, the WordNet cousin [PAINTING, COLORING_MATERIAL] in ARTIFACT-SUBSTANCE has only one common word, "watercolor". Such a minor relation tends to be lost in our tree generalization procedure. However, the main reason is the difficulty mentioned earlier in the paper: the problem of applying the tree-cut technique to a bushy tree when the data is sparse.</Paragraph>
<Paragraph position="4">In addition to the WordNet cousins, our automatic extraction method discovered several interesting relations. Table 3 shows some examples.</Paragraph>
</Section>
</Section>
</Paper>
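The exact / more-general / more-specific breakdown of Table 2 can also be sketched in code. Assuming (the paper does not spell this out) that "more general" means each automatic cluster root dominates the corresponding cousin synset in the hypernym hierarchy, a hedged NLTK version might look like the following; the function names and the example synsets are illustrative only.

    # Sketch of the match classification in Section 4.2.
    from nltk.corpus import wordnet as wn

    def dominates(a, b):
        # True if synset `a` is `b` itself or an ancestor of `b` via hypernyms.
        return a == b or a in set(b.closure(lambda s: s.hypernyms()))

    def match_type(auto_pair, cousin_pair):
        # Classify an automatic pair against a manually listed cousin pair.
        if all(a == c for a, c in zip(auto_pair, cousin_pair)):
            return 'exact'
        if all(dominates(a, c) for a, c in zip(auto_pair, cousin_pair)):
            return 'more general'   # automatic clusters sit above the cousin synsets
        if all(dominates(c, a) for a, c in zip(auto_pair, cousin_pair)):
            return 'more specific'
        return 'no match'

    # Hypothetical example around the [PAINTING, COLORING_MATERIAL] cousin:
    cousin = (wn.synset('painting.n.01'), wn.synset('coloring_material.n.01'))
    auto = (wn.synset('artifact.n.01'), wn.synset('coloring_material.n.01'))
    print(match_type(auto, cousin))  # -> 'more general'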