<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1052">
  <Title>Extracting Relations with Integrated Information Using Kernel Methods</Title>
  <Section position="3" start_page="419" end_page="419" type="intro">
    <SectionTitle>
2 Kernel Methods
</SectionTitle>
    <Paragraph position="0"> Many machine learning algorithms involve only the dot product of vectors in a feature space, in which each vector represents an object in the object domain. Kernel methods (Muller et al., 2001) can be seen as a generalization of feature-based algorithms, in which the dot product is replaced by a kernel function (or kernel) Ps (X,Y) between two vectors, or even between two objects. Mathematically, as long as Ps (X,Y) is symmetric and the kernel matrix formed by Ps is positive semi-definite, it forms a valid dot product in an implicit Hilbert space. In this implicit space, a kernel can be broken down into features, although the dimension of the feature space could be infinite.</Paragraph>
    <Paragraph position="1"> Normal feature-based learning can be implemented in kernel functions, but we can do more than that with kernels. First, there are many well-known kernels, such as polynomial and radial basis kernels, which extend normal features into a high order space with very little computational cost.</Paragraph>
    <Paragraph position="2"> This could make a linearly non-separable problem separable in the high order feature space. Second, kernel functions have many nice combination properties: for example, the sum or product of existing kernels is a valid kernel. This forms the basis for the approach described in this paper. With these combination properties, we can combine individual kernels representing information from different sources in a principled way.</Paragraph>
    <Paragraph position="3"> Many classifiers can be used with kernels. The most popular ones are SVM, KNN, and voted perceptrons. Support Vector Machines (Vapnik, 1998; Cristianini and Shawe-Taylor, 2000) are linear classifiers that produce a separating hyperplane with largest margin. This property gives it good generalization ability in high-dimensional spaces, making it a good classifier for our approach where using all the levels of linguistic clues could result in a huge number of features. Given all the levels of features incorporated in kernels and training data with target examples labeled, an SVM can pick up the features that best separate the targets from other examples, no matter which level these features are from. In cases where an error occurs in one processing result (especially deep processing) and the features related to it become noisy, SVM may pick up clues from other sources which are not so noisy. This forms the basic idea of our approach. Therefore under this scheme we can overcome errors introduced by one processing level; more particularly, we expect accurate low level information to help with less accurate deep level information.</Paragraph>
  </Section>
class="xml-element"></Paper>