<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2023">
  <Title>Improved-Edit-Distance Kernel for Chinese Relation Extraction</Title>
  <Section position="3" start_page="0" end_page="132" type="intro">
    <SectionTitle>
V,(people) and
IBM
</SectionTitle>
    <Paragraph position="0"> (organization). The relation between them is person-affiliation.</Paragraph>
    <Paragraph position="1"> Relation extraction (RE) can usually be regarded as a classification problem: all candidate entity pairs are found in a text, and each pair is then classified according to whether or not it holds a relation of interest.</Paragraph>
    <Paragraph position="2"> Early on, a number of manually engineered systems were developed for the RE problem (Aone and Ramos-Santacruz, 2000). Automatic learning methods (Miller et al., 1998; Soderland, 1999) do not require someone on hand with detailed knowledge of how the RE system works or how to write rules for it.</Paragraph>
    <Paragraph position="3"> Usually, a machine learning method represents NLP objects as feature vectors in a feature extraction step; such methods are called feature-based learning methods. In many cases, however, data cannot easily be represented explicitly as feature vectors. For example, in most NLP problems, feature-based representations are inherently local, because it is computationally infeasible to generate features involving long-range dependencies. Moreover, finding suitable features for a particular problem is heuristic work, and acquiring them can take a lot of time.</Paragraph>
    <Paragraph position="4"> Unlike the feature-based learning methods, kernel-based methods do not need to extract features from the original text. Instead, they retain the original representation of the objects and use the objects in algorithms only by computing a kernel (similarity) function between pairs of objects.</Paragraph>
    <Paragraph position="5"> The kernel-based methods then use existing learning algorithms that have a dual form, e.g. the Voted Perceptron (Freund and Schapire, 1998) or SVM (Cristianini and Shawe-Taylor, 2000), as the kernel machine that performs the classification task.</Paragraph>
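    <Paragraph> To illustrate what a dual-form learner looks like, the following is a minimal sketch of a plain kernel perceptron. It is a simpler relative of the Voted Perceptron and SVM cited above, not either of those algorithms, and it is not taken from the paper. The names overlap_kernel, train_kernel_perceptron and predict, and the toy word-overlap kernel itself, are assumptions made for illustration. The point is that the learner touches the training objects only through kernel evaluations.

```python
from typing import Callable, List, Sequence

# A kernel maps a pair of token sequences to a similarity score.
Kernel = Callable[[Sequence[str], Sequence[str]], float]


def overlap_kernel(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy kernel (illustrative assumption): number of distinct shared words."""
    return float(len(set(a).intersection(b)))


def train_kernel_perceptron(
    xs: List[Sequence[str]],
    ys: List[int],
    kernel: Kernel,
    epochs: int = 10,
) -> List[float]:
    """Return one dual coefficient (mistake count) per training example."""
    alphas = [0.0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            # The decision score is a weighted sum of kernel values only.
            score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alphas, xs, ys))
            if y * score > 0:
                continue  # classified correctly, no update
            alphas[i] += 1.0
            mistakes += 1
        if mistakes == 0:
            break  # converged on the training set
    return alphas


def predict(
    x: Sequence[str],
    xs: List[Sequence[str]],
    ys: List[int],
    alphas: List[float],
    kernel: Kernel,
) -> int:
    """Classify x as +1 or -1 using the learned dual coefficients."""
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alphas, xs, ys))
    return 1 if score > 0 else -1
```

    Because the learner never inspects the objects directly, any other similarity function over strings or parse structures could be swapped in for overlap_kernel without changing the training loop.</Paragraph>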
    <Paragraph position="6"> Haussler (1999) and Watkins (1999) each proposed a new kernel method based on discrete structures. Lodhi et al. (2002) used string kernels to solve the text classification problem. Zelenko et al. (2003) used kernel methods to extract relations from text: they defined the kernel function over a shallow parse representation of the text and used it in conjunction with the SVM and Voted Perceptron learning algorithms to extract person-affiliation and organization-location relations.</Paragraph>
    <Paragraph position="7"> As mentioned above, the discrete structure kernel methods are more suitable for RE problems than the feature-based methods. However, string-based kernel methods consider only word forms, not their semantics, and shallow-parser-based kernel methods require a shallow parser. Because the performance of shallow parsers is still not high enough, especially for Chinese text, we cannot depend on them completely. To cope with these problems, we propose the Improved-Edit-Distance (IED) algorithm for calculating the kernel (similarity) function. It takes into account the semantic similarity between words in the two strings as well as some structural information about the strings.</Paragraph>
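    <Paragraph> The general idea of letting word similarity soften an edit distance can be sketched as follows. This is not the IED algorithm itself, which this introduction does not define. It is a standard word-level edit distance in which the substitution cost is assumed to be one minus a word similarity score. The function names, the toy synonym table, its score of 0.8, and the normalisation into a similarity in [0, 1] are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def toy_word_similarity(a: str, b: str) -> float:
    """Hypothetical word similarity in [0, 1]; a real system might use a thesaurus."""
    if a == b:
        return 1.0
    synonyms = {frozenset(("company", "firm")): 0.8}  # made-up example entry
    return synonyms.get(frozenset((a, b)), 0.0)


def soft_edit_distance(s: Sequence[str], t: Sequence[str], sim: WordSim) -> float:
    """Word-level edit distance; substituting similar words costs less than 1."""
    prev = [float(j) for j in range(len(t) + 1)]  # distance from empty prefix of s
    for i, ws in enumerate(s, start=1):
        curr = [float(i)]  # distance from s[:i] to empty prefix of t
        for j, wt in enumerate(t, start=1):
            substitute = prev[j - 1] + (1.0 - sim(ws, wt))
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def edit_similarity(
    s: Sequence[str], t: Sequence[str], sim: WordSim = toy_word_similarity
) -> float:
    """Map the distance to a similarity in [0, 1] (assumed normalisation)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - soft_edit_distance(s, t, sim) / longest
```

    With these toy settings, two four-word strings that differ only in "company" versus "firm" score about 0.95, whereas a purely form-based edit distance would treat the two words as unrelated. A similarity built this way is not guaranteed to be a valid positive semi-definite kernel, so this sketch makes no such claim.</Paragraph>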
    <Paragraph position="8"> The rest of the paper is organized as follows. In Section 2, we introduce kernel-based machine learning algorithms and their application to natural language processing problems. In Section 3, we formalize the relation extraction problem as a machine learning problem. In Section 4, we present a novel kernel method, the IED kernel method. Section 5 describes the experiments and results on a particular relation extraction problem. In Section 6, we discuss why the IED-based kernel method yields better results than other methods. Finally, in Section 7, we give conclusions and comments on future work.</Paragraph>
  </Section>
</Paper>