<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1109"> <Title>Discriminative Slot Detection Using Kernel Methods</Title> <Section position="3" start_page="1" end_page="2" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 2.1 Information Extraction </SectionTitle> <Paragraph position="0"> The major task of IE is to find the elements of an event in text and combine them to form templates or populate databases. Most of these elements are named entities (NEs) involved in the event. To determine which entities in a text are involved, we need to find reliable clues around each entity. The extraction procedure starts with text preprocessing, ranging from tokenization and part-of-speech tagging to NE identification and parsing. Traditional approaches analyze the results of deep preprocessing in various ways to find patterns. Here we propose to use support vector machines to identify clues automatically from the outputs of different levels of preprocessing.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 2.2 Support Vector Machine </SectionTitle> <Paragraph position="0"> For a two-class classifier with separable training data, given a set of $n$ labeled vector examples $(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)$, $y_i \in \{+1, -1\}$, a support vector machine (Vapnik, 1998) produces the separating hyperplane with the largest margin among all the hyperplanes that correctly classify the examples. Suppose that all the examples satisfy the following constraint: $y_i(\langle W, X_i \rangle + b) \geq 1$. It is easy to see that the margin between the two bounding hyperplanes $\langle W, X \rangle + b = \pm 1$ is $2/\|W\|$. So maximizing the margin is equivalent to minimizing $\|W\|^2$ subject to the separation constraint above. In machine learning theory, this margin relates to the upper bound of the VC-dimension of a support vector machine. 
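The margin geometry above can be illustrated with a small numerical sketch. The data and the hyperplane $(W, b)$ here are hand-chosen toy values, not learned by an SVM; the sketch only checks the separation constraint and computes the resulting margin $2/\|W\|$:

```python
import numpy as np

# Toy linearly separable data in 2-D: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0],    # positive class
              [0.0, 0.0], [-1.0, 0.0]])  # negative class
y = np.array([1, 1, -1, -1])

# A hand-chosen separating hyperplane <W, x> + b = 0.
# (An SVM would learn the W, b that maximize the margin.)
W = np.array([0.5, 0.5])
b = -1.0

# Separation constraint: y_i * (<W, X_i> + b) >= 1 for every example.
scores = y * (X @ W + b)
assert np.all(scores >= 1.0)

# Margin between the bounding hyperplanes <W, x> + b = +1 and -1.
margin = 2.0 / np.linalg.norm(W)
print(round(margin, 4))  # -> 2.8284, i.e. 2 / sqrt(0.5)
```

Shrinking $\|W\|$ while keeping the constraint satisfied widens this margin, which is exactly the SVM objective described above.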
Increasing the margin reduces the VC-dimension of the learning system, thus increasing its generalization capability. So a support vector machine produces a classifier with optimal generalization capability. This property enables SVMs to work in high-dimensional vector spaces.</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 2.3 Kernel SVM </SectionTitle> <Paragraph position="0"> The vectors in an SVM are usually feature vectors extracted by some procedure from the original objects, such as images or sentences. Since the only operation used in an SVM is the dot product between two vectors, we can replace this operation by a function $\phi(S_i, S_j)$ on the object domain; in our case, $S_i$ and $S_j$ are sentences. Mathematically this is still valid as long as $\phi(S_i, S_j)$ satisfies Mercer's condition (footnote 1: the kernel matrix must be positive semi-definite). The function $\phi(S_i, S_j)$ is often referred to as a kernel function, or just a kernel.</Paragraph> <Paragraph position="1"> Kernel functions provide a way to compute the similarity between two objects without explicitly transforming them into feature vectors.</Paragraph> <Paragraph position="2"> Kernels have the following closure properties: 1. If $K_1(x, y)$ and $K_2(x, y)$ are kernels on $X \times Y$ and $a, b > 0$, then $a K_1(x, y) + b K_2(x, y)$ is a kernel on $X \times Y$.</Paragraph> <Paragraph position="3"> 2. If $K_1(x, y)$ and $K_2(x, y)$ are kernels on $X \times Y$, then $K_1(x, y) \times K_2(x, y)$ is a kernel on $X \times Y$.</Paragraph> <Paragraph position="4"> 3. If $K_1(x, y)$ is a kernel on $X \times Y$ and $K_2(u, v)$ is a kernel on $U \times V$, then</Paragraph> <Paragraph position="6"> $K((x, u), (y, v)) = K_1(x, y) \, K_2(u, v)$ is a kernel on $(X \times U) \times (Y \times V)$. When we have kernels representing information from different sources, these properties enable us to incorporate them into one kernel. 
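The first two closure properties can be checked numerically: if two Gram matrices over the same objects are positive semi-definite, their positively weighted sum and their elementwise (Schur) product remain positive semi-definite. The sketch below uses toy linear-kernel Gram matrices from random features, not the paper's sentence kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

def gram(F):
    """Gram matrix of a linear kernel on feature rows F: PSD by construction."""
    return F @ F.T

def is_psd(K, tol=1e-8):
    """A symmetric matrix is PSD iff all of its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# Two kernel (Gram) matrices over the same five objects,
# e.g. computed from two different feature representations.
K1 = gram(rng.standard_normal((5, 3)))
K2 = gram(rng.standard_normal((5, 4)))

a, b = 0.7, 1.3                 # any positive weights

combo_sum = a * K1 + b * K2     # property 1: weighted sum of kernels
combo_prod = K1 * K2            # property 2: elementwise (Schur) product

print(is_psd(combo_sum), is_psd(combo_prod))  # -> True True
```

The product case is the Schur product theorem; the sum case follows directly from linearity of the quadratic form. This is what licenses combining kernels built from different information sources into a single valid kernel.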
General-purpose kernels such as the RBF or polynomial kernels (Muller et al., 2001), which map features nonlinearly into a higher-dimensional space, can also be applied either to the combined kernel or to each component kernel individually.</Paragraph> </Section> </Section> </Paper>