<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1218">
  <Title>A Clustering Algorithm for Chinese Adjectives and Nouns</Title>
  <Section position="4" start_page="124" end_page="125" type="intro">
    <SectionTitle>
2 Concepts
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="124" end_page="124" type="sub_section">
      <SectionTitle>
2.1 Problem Description
</SectionTitle>
      <Paragraph position="0"> Our problem can be described as follows: given the set of adjectives A, the set of nouns N, and the collocation instances between them, our system constructs a partition $P_N$ over N and a partition $P_A$ over A, which contain sets of nouns and sets of adjectives respectively. Both partitions must satisfy the condition that words in the same set (called a cluster) have similar semantic distribution environments.</Paragraph>
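To make the inputs concrete, here is a minimal Python sketch. It assumes (the paper does not say this) that each collocation instance is stored as an (adjective, noun) pair and that the distribution environment of a cluster is the set of words of the other part of speech collocated with at least one of its members. The function names `noun_env` and `adj_env` and the toy data are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Assumed representation: one collocation instance = (adjective, noun).
Instance = Tuple[str, str]


def noun_env(noun_cluster: Set[str], instances: Set[Instance]) -> FrozenSet[str]:
    """Adjectives collocated with at least one noun of the cluster (assumed reading)."""
    return frozenset(a for a, n in instances if n in noun_cluster)


def adj_env(adj_cluster: Set[str], instances: Set[Instance]) -> FrozenSet[str]:
    """Nouns collocated with at least one adjective of the cluster (assumed reading)."""
    return frozenset(n for a, n in instances if a in adj_cluster)


# Hypothetical toy data; English words stand in for Chinese adjectives and nouns.
C = {("red", "apple"), ("sweet", "apple"), ("red", "car"), ("fast", "car")}
print(sorted(noun_env({"apple"}, C)))       # ['red', 'sweet']
print(sorted(adj_env({"red", "fast"}, C)))  # ['apple', 'car']
```

Keeping the instances in a set also recovers A and N directly as the first and second components of the stored pairs.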
    </Section>
    <Section position="2" start_page="124" end_page="124" type="sub_section">
      <SectionTitle>
2.2 Partitions and Clusters
</SectionTitle>
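      <Paragraph position="0"> Let S be a set and $S_i \subseteq S$ ($i = 1, 2, \ldots, n$). If (1) $\bigcup_{i=1}^{n} S_i = S$, and</Paragraph>
      <Paragraph position="2"> (2) $S_i \cap S_j = \emptyset$ for all $i, j = 1, 2, \ldots, n$ with $i \neq j$, then $P_S = \{S_1, S_2, \ldots, S_n\}$ is a partition over S.</Paragraph>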
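The definition can be checked mechanically. This sketch takes condition (1) to be the standard covering requirement that the union of the $S_i$ equals S; the function name `is_partition` is hypothetical, and empty blocks are accepted because the text does not say whether they are allowed.

```python
from typing import Hashable, Iterable, Set


def is_partition(blocks: Iterable[Set[Hashable]], universe: Set[Hashable]) -> bool:
    """Return True if the blocks S_i are subsets of S, pairwise disjoint, and cover S."""
    covered: Set[Hashable] = set()
    for block in blocks:
        if not block.issubset(universe):
            return False  # every S_i must be a subset of S
        if covered.intersection(block):
            return False  # condition (2) fails: two blocks share an element
        covered |= block
    return covered == universe  # assumed condition (1): the union of the S_i is S


print(is_partition([{1, 2}, {3}], {1, 2, 3}))     # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))  # False: 2 lies in two blocks
print(is_partition([{1}, {3}], {1, 2, 3}))        # False: 2 is not covered
```

Disjointness is tested incrementally against the running union, so each element is examined a constant number of times rather than once per pair of blocks.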
      <Paragraph position="3"> In this paper, we call $A_i \in P_A$ an "adjective cluster" and $N_i \in P_N$ a "noun cluster". The clustering result we seek is the pair of partitions $\langle P_A, P_N \rangle$.</Paragraph>
    </Section>
    <Section position="3" start_page="124" end_page="125" type="sub_section">
      <SectionTitle>
2.3 Distance between Clusters
</SectionTitle>
      <Paragraph position="0"> In order to measure the distance between clusters of the same part of speech, we use the following equations:</Paragraph>
      <Paragraph position="2"> where $\Omega_{A_i}$ is the distribution environment of $A_i$ and is made up of nouns which can be collocated with $A_i$, and $\Omega_{N_i}$ is the distribution environment of $N_i$ and is made up of adjectives which can be collocated with $N_i$. $\Omega_{A_j}$ and $\Omega_{N_j}$ follow similar definitions. This distance is a kind of Euclidean distance.</Paragraph>
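The distance equations themselves did not survive extraction, so the sketch below is only one plausible instantiation, not the paper's formula. It assumes each distribution environment is encoded as a 0/1 indicator vector over the words of the other part of speech and takes the ordinary Euclidean distance between two such vectors, which works out to the square root of the number of words found in exactly one of the two environments. The names `noun_env` and `env_distance` are hypothetical.

```python
import math
from typing import FrozenSet, Set, Tuple

Instance = Tuple[str, str]  # assumed (adjective, noun) representation


def noun_env(cluster: Set[str], instances: Set[Instance]) -> FrozenSet[str]:
    # Omega_{N_i}: adjectives collocated with some noun of the cluster (assumed union reading)
    return frozenset(a for a, n in instances if n in cluster)


def env_distance(env_i: FrozenSet[str], env_j: FrozenSet[str]) -> float:
    """Euclidean distance between the 0/1 indicator vectors of two environments.

    Assumption: this stands in for the lost equation; for binary vectors it
    equals the square root of the size of the symmetric difference.
    """
    vocab = sorted(env_i | env_j)  # words outside both environments contribute 0
    sq = sum((int(w in env_i) - int(w in env_j)) ** 2 for w in vocab)
    return math.sqrt(sq)


C = {("red", "apple"), ("sweet", "apple"), ("red", "car"), ("fast", "car")}
d = env_distance(noun_env({"apple"}, C), noun_env({"car"}, C))
print(d)  # 1.4142135623730951, i.e. sqrt(2): 'sweet' and 'fast' each occur in one environment
```

Because both environments are sets, the same value can be computed directly as `math.sqrt(len(env_i ^ env_j))`; the explicit loop is kept only to show the vector view.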
    </Section>
    <Section position="4" start_page="125" end_page="125" type="sub_section">
      <SectionTitle>
2.4 Collocational Degree
</SectionTitle>
      <Paragraph position="0"> Since clustering may introduce redundant collocations, the concept of "collocational degree" is used to measure the collocational relationship between a cluster and its distribution environment. The collocational degree is defined as the ratio of the collocation instances that actually exist between the cluster and its distribution environment to all possible collocations generated by them. Thus,</Paragraph>
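      <Paragraph position="2"> $$\mathrm{deg}(N_i) = \frac{|\{(a, n) \in C : n \in N_i,\ a \in \Omega_{N_i}\}|}{|N_i| \cdot |\Omega_{N_i}|}$$ where C is the set of all existing instances.</Paragraph>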
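A minimal sketch of this ratio for a noun cluster, under stated assumptions: instances are stored as (adjective, noun) pairs, and the environment is the set of adjectives collocated with at least one noun of the cluster. The function names and the zero-denominator convention are hypothetical. The toy example shows how merging two nouns exposes redundant collocations.

```python
from typing import FrozenSet, Set, Tuple

Instance = Tuple[str, str]  # assumed (adjective, noun) representation


def noun_env(cluster: Set[str], instances: Set[Instance]) -> FrozenSet[str]:
    # Adjectives collocated with some noun of the cluster (assumed union reading)
    return frozenset(a for a, n in instances if n in cluster)


def noun_collocational_degree(cluster: Set[str], instances: Set[Instance]) -> float:
    """Existing instances between the cluster and its environment, divided by
    all |cluster| * |environment| possible adjective-noun pairs."""
    env = noun_env(cluster, instances)
    if not cluster or not env:
        return 0.0  # hypothetical convention: no possible collocations at all
    existing = sum(1 for a, n in instances if n in cluster and a in env)
    return existing / (len(cluster) * len(env))


C = {("red", "apple"), ("sweet", "apple"), ("red", "car"), ("fast", "car")}
print(noun_collocational_degree({"apple"}, C))         # 1.0
print(noun_collocational_degree({"apple", "car"}, C))  # 0.6666666666666666
```

Merging `apple` and `car` implies the pairs `(sweet, car)` and `(fast, apple)`, which are absent from C, so only 4 of the 6 possible collocations exist.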
    </Section>
    <Section position="5" start_page="125" end_page="125" type="sub_section">
      <SectionTitle>
2.5 Redundant Ratio
</SectionTitle>
      <Paragraph position="0"> After the collocational degree of each cluster is obtained, the redundant ratio (denoted r) is calculated to measure the overall quality of the clustering result. We define the redundant ratio as 1 minus the ratio of all existing instances to all possible collocations generated by all clusters (both noun and adjective clusters) and their distribution environments. Thus r is calculated as</Paragraph>
      <Paragraph position="2"/>
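The formula for r is missing from the extracted text. The sketch below is one reading of the prose, labelled as an assumption. For every noun cluster and every adjective cluster it counts the existing instances shared with that cluster's environment and the |cluster| × |environment| possible pairs, sums both counts over all clusters, and returns 1 minus their ratio. Summing existing instances per cluster (rather than counting C once) keeps r between 0 and 1; whether the paper does the same is not confirmed. All names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Instance = Tuple[str, str]  # assumed (adjective, noun) representation


def _counts(cluster: Set[str], instances: Set[Instance], noun_side: bool) -> Tuple[int, int]:
    """Return (existing, possible) collocations for one cluster and its environment."""
    idx = 1 if noun_side else 0  # tuple position of the cluster's own words
    touching = [inst for inst in instances if inst[idx] in cluster]
    env = {inst[1 - idx] for inst in touching}  # the cluster's distribution environment
    return len(touching), len(cluster) * len(env)


def redundant_ratio(noun_clusters: Iterable[Set[str]], adj_clusters: Iterable[Set[str]],
                    instances: Set[Instance]) -> float:
    # Assumption: existing instances are summed per cluster, so each instance
    # counts once on the noun side and once on the adjective side.
    existing = possible = 0
    for clusters, noun_side in ((noun_clusters, True), (adj_clusters, False)):
        for cluster in clusters:
            e, p = _counts(cluster, instances, noun_side)
            existing += e
            possible += p
    return 1.0 - existing / possible if possible else 0.0


C = {("red", "apple"), ("sweet", "apple"), ("red", "car"), ("fast", "car")}
print(redundant_ratio([{"apple", "car"}], [{"red"}, {"sweet", "fast"}], C))      # about 0.333
print(redundant_ratio([{"apple"}, {"car"}], [{"red"}, {"sweet"}, {"fast"}], C))  # 0.0
```

In the first call 8 of the 12 possible collocations exist, so a third of them are redundant; keeping every word in its own cluster generates no redundant collocations at all.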
    </Section>
  </Section>
</Paper>