<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2076">
  <Title>Machine-Learning-Based Transformation of Passive Japanese Sentences into Active by Separating Training Data into Each Input Particle</Title>
  <Section position="4" start_page="587" end_page="589" type="metho">
    <SectionTitle>
3 Machine learning method (support vector machine)
</SectionTitle>
    <Paragraph position="0"> We used a support vector machine as the basis of our machine-learning method. This is because support vector machines have achieved comparatively better results than other methods in many research areas (Kudoh and Matsumoto, 2000; Taira and Haruno, 2001; Murata et al., 2002).</Paragraph>
    <Section position="1" start_page="587" end_page="589" type="sub_section">
      <SectionTitle>
Figure 4 panel labels: Small Margin / Large Margin
</SectionTitle>
      <Paragraph position="0"> Murata et al., 2002).</Paragraph>
      <Paragraph position="1"> Data consisting of two categories were classified by using a hyperplane to divide a space with the support vector machine. When these two categories were, positive and negative, for example, enlarging the margin between them in the training data (see Figure 4  ), reduced the possibility of incorrectly choosing categories in blind data (test data). A hyperplane that maximized the margin was thus determined, and classification was done using that hyperplane. Although the basics of this method are as described above, the region between the margins through the training data can include a small number of examples in extended versions, and the linearity of the hyperplane can be changed to non-linear by using kernel functions. Classification in these extended versions is equivalent to classification using the following discernment function, and the two categories can be classified on the basis of whether the value output by the function is positive or negative (Cristianini and Shawe-Taylor, 2000; Kudoh, 2000):  The open circles in the figure indicate positive examples and the black circles indicate negative. The solid line indicates the hyperplane dividing the space, and the broken lines indicate the planes depicting margins.</Paragraph>
      <Paragraph position="3"> indicates the context of a training datum, and y</Paragraph>
      <Paragraph position="5"> maximizes the value of L(a) in Eq. (3) under the conditions set by Eqs. (4) and (5).</Paragraph>
      <Paragraph position="7"> Although function K is called a kernel function and various functions are used as kernel functions, we have exclusively used the following polynomial function:</Paragraph>
      <Paragraph position="9"> C and d are constants set by experimentation. For all experiments reported in this paper, C was fixed as 1 and d wasfixedas2.</Paragraph>
      <Paragraph position="10"> A set of x</Paragraph>
      <Paragraph position="12"> , and the summation portion of Eq. (1) is only calculated using examples that are support vectors. Equation 1 is expressed as follows by using support vectors.</Paragraph>
      <Paragraph position="14"> The circles on the broken lines in Figure 4 indicate support vectors.</Paragraph>
      <Paragraph position="15">  Support vector machines are capable of handling data consisting of two categories. Data consisting of more than two categories is generally handled using the pair-wise method (Kudoh and Matsumoto, 2000).</Paragraph>
      <Paragraph position="16"> Pairs of two different categories (N(N-1)/2 pairs) are constructed for data consisting of N categories with this method. The best category is determined by using a two-category classifier (in this paper, a support vector machine  is used as the two-category classifier), and the correct category is finally determined on the basis of &amp;quot;voting&amp;quot; on the N(N-1)/2 pairs that result from analysis with the two-category classifier.</Paragraph>
      <Paragraph position="17"> The method discussed in this paper is in fact a combination of the support vector machine and the pair-wise method described above.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="589" end_page="590" type="metho">
    <SectionTitle>
4 Features (information used in classification)
</SectionTitle>
    <Paragraph position="0"> The features we used in our study are listed in Table 1, where N is the noun phrase connected to the case particle being analyzed and P is that phrase's predicate. We used the Japanese syntactic parser KNP (Kurohashi, 1998) to identify N, P, parts of speech, and syntactic relations. [Footnote: We used Kudoh's TinySVM software (Kudoh, 2000) as the support vector machine.]</Paragraph>
    <Paragraph position="1"> [Footnote] The category number indicates a semantic class of words. A Japanese thesaurus, the Bunrui Goi Hyou (NLRI, 1964), was used to determine the category number of each word. This thesaurus is an 'is-a' hierarchy in which each word has a category number: a 10-digit number that encodes seven levels of the hierarchy. The top five levels are expressed by the first five digits, the sixth level by the next two digits, and the seventh level by the last three digits.</Paragraph>
    <Paragraph position="2"> [Footnote] Kondo et al. constructed a rich dictionary for Japanese verbs (Kondo et al., 2001). It defines the types and characteristics of verbs. We refer to it as VDIC.</Paragraph>
    <Paragraph position="3"> In the experiments conducted in this study, we selected features. We used the following procedure to select them.</Paragraph>
    <Paragraph position="4"> * Feature selection We first used all the features for learning. We next deleted only one feature from all the features for learning. We did this for every feature. We decided to delete features that would make the most improvement. We repeated this until we could not improve the rate of accuracy. null 5 Method of separating training data into each input particle We developed a new method of separating training data into each input (source) particle that uses machine learning for each particle. For example, when we identify a target particle where the source particle is ni, we use only the training data where the source particle is ni. When we identify a target particle where the source particle is ga, we use only the training data where the source particle is ga.</Paragraph>
    <Paragraph position="5"> Frequently occurring target case particles are very different in source case particles. Frequently occurring target case particles in all source case particles are listed in Table 2. For example, when ni is a source case particle, frequently occurring  target case particles are ni or ga. In contrast, when ga is a source case particle, a frequently occurring target case particle is wo.</Paragraph>
    <Paragraph position="6"> In this case, it is better to separate training data into each source particle and use machine learning for each particle. We therefore developed this method and confirmed that it was effective through experiments (Section 6).</Paragraph>
  </Section>
class="xml-element"></Paper>