<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1807">
  <Title>A Knowledge Based Approach to Identification of Serial Verb Construction in Chinese-to-Korean Machine Translation System</Title>
  <Section position="4" start_page="2" end_page="4" type="metho">
    <SectionTitle>
5. Identification of SVCs
</SectionTitle>
    <Paragraph position="0"> To recognize SVCs, we divide the identification process into two stages. The general categories of SVCs are identified at the analysis stage, and the subcategories of separate-event SVCs are detected at the transfer stage.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
5.1 Analysis Stage
</SectionTitle>
      <Paragraph position="0"> To recognize the five general categories of SVCs, we use two resources: the Grammatical Knowledge Base of Contemporary Chinese (GKBCC) and a verb list with valency information (VLVI) (Zhu et al., 1995). A pivot SVC can be detected simply by checking the verb in GKBCC. The remaining types of SVCs must be handled more carefully. There are two potentially ambiguous SVC structures: Case 1: NP V1 V2 (NP2); Case 2: NP V1 NP1 V2 (NP2), where NP, NP1, and NP2 are noun phrases. The algorithms for the two cases are illustrated in Figure 2 and Figure 3. In Figure 3, the test 'V1 takes NP &amp; VP' checks whether the verb Tou Ting ('eavesdrop') can take either a noun phrase or an object clause as its object. The test 'satisfy valency' checks whether NP1 can be the subject of V2: the second verb Xi Huan ('like') takes a human subject, and Wai Guo Ren ('foreigner') can be the subject of Xi Huan, so the sentence is classified as an object case. In the other sentence, since Gong Yuan ('park') cannot be the subject of the verb Duan Lian ('exercise'), it is determined as a subject</Paragraph>
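The two tests described for Figure 3 can be sketched as a small decision function for Case 2. This is a minimal sketch, not the authors' algorithm: the toy lexicon stands in for GKBCC and VLVI, the semantic classes ('human', 'place'), the pivot verb entry "rang", the extra verb "qu" ('go'), and the name classify_case2 are all hypothetical, and "subject" is only a placeholder label for the non-object outcome, whose full name is cut off in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical VLVI-style valency entry for a verb."""
    takes_np_and_vp: bool         # can take an NP or an object clause as its object
    subject_class: Optional[str]  # semantic class required of its subject, if any


# Toy stand-ins for GKBCC (pivot verbs) and VLVI (valency); not real entries.
PIVOT_VERBS: Set[str] = {"rang"}
VERBS: Dict[str, VerbEntry] = {
    "tou_ting": VerbEntry(takes_np_and_vp=True, subject_class="human"),
    "qu": VerbEntry(takes_np_and_vp=True, subject_class="human"),
    "xi_huan": VerbEntry(takes_np_and_vp=True, subject_class="human"),
    "duan_lian": VerbEntry(takes_np_and_vp=False, subject_class="human"),
}
NOUN_CLASS: Dict[str, str] = {"wai_guo_ren": "human", "gong_yuan": "place"}


def classify_case2(v1: str, np1: str, v2: str) -> str:
    """Classify an 'NP V1 NP1 V2 (NP2)' pattern with the two tests from the text."""
    if v1 in PIVOT_VERBS:
        return "pivot"  # pivot SVCs need only a GKBCC lookup
    # First test: can V1 take an NP or a clause (VP) as its object?
    takes_np_and_vp = VERBS[v1].takes_np_and_vp
    # Second test ('satisfy valency'): can NP1 be the subject of V2?
    required = VERBS[v2].subject_class
    satisfies_valency = required is None or NOUN_CLASS.get(np1) == required
    if takes_np_and_vp and satisfies_valency:
        return "object"   # NP1 V2 (NP2) is read as the object clause of V1
    return "subject"      # placeholder label for the non-object reading


print(classify_case2("tou_ting", "wai_guo_ren", "xi_huan"))  # object
print(classify_case2("qu", "gong_yuan", "duan_lian"))        # subject
```

In the second call the valency test is what fails, because the toy lexicon marks Gong Yuan as a place while Duan Lian requires a human subject.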
    </Section>
    <Section position="2" start_page="2" end_page="4" type="sub_section">
      <SectionTitle>
5.2 Transfer Stage
</SectionTitle>
      <Paragraph position="0"> Simultaneous separate events are easily recognized by the lexical marker Zhao attached to the first verb. We also use a simple heuristic based on lexical pattern information to detect circumstantial separate events.</Paragraph>
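The marker check for simultaneous separate events reduces to a lookup on the token after V1. This sketch assumes that the marker romanized as Zhao is the aspect particle 着 written directly after V1, and that input arrives as a segmented token list; the function name is_simultaneous and both example sentences are illustrative, not taken from the paper.

```python
from typing import List

# Assumption: "Zhao" in the text romanizes the aspect particle 着 (usually read "zhe").
SIMULTANEOUS_MARKER = "着"


def is_simultaneous(tokens: List[str], v1_index: int) -> bool:
    """True if the token immediately after the first verb is the marker 着."""
    nxt = v1_index + 1
    return len(tokens) > nxt and tokens[nxt] == SIMULTANEOUS_MARKER


print(is_simultaneous(["笑", "着", "说"], 0))      # True: "say while smiling"
print(is_simultaneous(["去", "公园", "锻炼"], 0))  # False
```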
      <Paragraph position="1"> The resource used in this stage is a Chinese thesaurus called Tongyi-ci-cilin (Mei, 1983).</Paragraph>
      <Paragraph position="2"> The remaining separate-event SVCs are processed carefully with the help of the thesaurus. If V2 is related to the concept of interruption, the sentence is assigned to the transitional separate events. The most difficult and most frequent cases are the restrictive separate events and the quasi-coordinative separate events.</Paragraph>
      <Paragraph position="3"> The key idea behind using the thesaurus is the observation that when the verb V2 is restricted by V1, the concept of V2 may also be restricted by the concept of V1. To formalize this, we define the relations RSTV, RSTL, and RSTM as follows. Definition 2: RSTV = {(V1,V2) where V1 and V2 are the first and second verbs in a given SVC sentence, and V2 is semantically restricted by</Paragraph>
      <Paragraph position="5"> the low-level concept of the first verb and the low-level concept of the second verb in the Chinese thesaurus, respectively, and CL2 is semantically restricted by CL1: (CL1,CL2) [?]</Paragraph>
      <Paragraph position="7"> are the middle-level concept of the first verb and the middle-level concept of the second verb in the Chinese thesaurus, respectively, and ML2 is semantically restricted by ML1: (ML1,ML2) [?] (ML2,ML1)}. The relations RSTV, RSTL, and RSTM are neither symmetric nor reflexive. Based on this definition, we derive the following heuristic: if (V1,V2) ∈ RSTV, then (CL1,CL2) ∈ RSTL.</Paragraph>
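The heuristic can be read as a check over a set of observed verb pairs: every pair in RSTV should map, through the verbs' low-level thesaurus codes, to a pair in RSTL. This sketch assumes each verb has a single low-level code (real verbs may have several senses); the verb names, the codes other than Hj20, and the function name heuristic_holds are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def heuristic_holds(rstv: Set[Pair], rstl: Set[Pair], low_code: Dict[str, str]) -> bool:
    """Check that for every (V1, V2) in RSTV, (CL1, CL2) is in RSTL."""
    return all((low_code[v1], low_code[v2]) in rstl for v1, v2 in rstv)


# Hypothetical toy data: verbs mapped to made-up low-level codes.
low_code = {"v_a": "Hj20", "v_b": "Hi11"}
rstv = {("v_a", "v_b")}
print(heuristic_holds(rstv, {("Hj20", "Hi11")}, low_code))  # True
print(heuristic_holds(rstv, {("Hi11", "Hj20")}, low_code))  # False: wrong direction
```

Because the relations are not symmetric, the direction of each pair matters, which is why the second call fails.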
      <Paragraph position="9"> The thesaurus has a three-level hierarchy. For example, H, Hj, and Hj20 correspond to a top-level concept, the narrower middle-level concept, and the narrowest low-level concept, respectively.</Paragraph>
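If every low-level code has the shape of the example Hj20 (an uppercase letter for the top level, a lowercase letter for the middle level, and two digits for the low level), the higher-level concepts are simply prefixes of the low-level code. This is a minimal sketch under that assumption; the helper name concept_levels is hypothetical.

```python
import re
from typing import Tuple

# Assumed code shape, generalized from the example H / Hj / Hj20.
CODE_RE = re.compile(r"^[A-Z][a-z]\d{2}$")


def concept_levels(low_code: str) -> Tuple[str, str, str]:
    """Split a low-level code into its (top, middle, low) concept codes."""
    if not CODE_RE.match(low_code):
        raise ValueError(f"unexpected thesaurus code: {low_code!r}")
    return low_code[:1], low_code[:2], low_code


print(concept_levels("Hj20"))  # ('H', 'Hj', 'Hj20')
```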
      <Paragraph position="10"> All three examples at the top of Table 5 satisfy the condition that if (V1,V2) ∈ RSTV</Paragraph>
      <Paragraph position="12"> RSTM. If this condition always holds, we can use the middle-level concept relation to detect restrictive separate events, which increases the applicability of our rules. Moreover, RSTM is easily represented as a 21 × 21 adjacency matrix (Sahni, 1998): M is a square matrix whose rows and columns are indexed by the middle-level concepts, where M(i,j) = 1 if concept j is semantically restricted by concept i; otherwise M(i,j) = 0 and (i,j) ∉ RSTM.</Paragraph>
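The adjacency-matrix representation of RSTM can be sketched as a list of lists indexed by concept position. The concept list below is a short hypothetical placeholder (the 21 verb-related middle-level concepts are not enumerated in this section), and build_rstm_matrix and restricts are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple


def build_rstm_matrix(concepts: List[str], pairs: Iterable[Tuple[str, str]]) -> List[List[int]]:
    """Build M with M[i][j] = 1 iff concept j is semantically restricted by concept i."""
    index: Dict[str, int] = {c: k for k, c in enumerate(concepts)}
    n = len(concepts)
    m = [[0] * n for _ in range(n)]
    for c1, c2 in pairs:
        m[index[c1]][index[c2]] = 1
    return m


def restricts(m: List[List[int]], concepts: List[str], c1: str, c2: str) -> bool:
    """True iff (c1, c2) is in RSTM according to the matrix."""
    return m[concepts.index(c1)][concepts.index(c2)] == 1


# Hypothetical 3-concept example; the paper's matrix would be 21 x 21.
concepts = ["Hi", "Hj", "Ie"]
m = build_rstm_matrix(concepts, [("Hj", "Hi")])
print(restricts(m, concepts, "Hj", "Hi"))  # True
print(restricts(m, concepts, "Hi", "Hj"))  # False
```

A dense matrix suits the middle level because 21 × 21 is only 441 cells, and each lookup is a direct index once the concept positions are known.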
    </Section>
  </Section>
  <Section position="5" start_page="4" end_page="6" type="metho">
    <SectionTitle>
RSTV RSTL RSTM
</SectionTitle>
    <Paragraph position="0"> However, the last example shows that the condition does not always hold: both (Hi,Hj) and (Hj,Hi) ∈ RSTM.</Paragraph>
    <Paragraph position="1"> This violates the definition of RSTM. Hence, we cannot use the middle-level concept adjacency matrix directly, and the low-level concept matrix is too large to be practical.</Paragraph>
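The violation described here, where a pair of middle-level concepts is related in both directions, can be detected mechanically by scanning the matrix for entries set both ways. This is a minimal sketch on a hypothetical 2 × 2 matrix; symmetric_violations is an illustrative name.

```python
from typing import List, Tuple


def symmetric_violations(m: List[List[int]], concepts: List[str]) -> List[Tuple[str, str]]:
    """Return concept pairs (a, b), a listed before b, with both M[a][b] and M[b][a] set."""
    n = len(concepts)
    return [
        (concepts[i], concepts[j])
        for i in range(n)
        for j in range(i + 1, n)
        if m[i][j] == 1 and m[j][i] == 1
    ]


concepts = ["Hi", "Hj"]
m = [[0, 1],
     [1, 0]]  # both (Hi, Hj) and (Hj, Hi) marked, mirroring the last example in the text
print(symmetric_violations(m, concepts))  # [('Hi', 'Hj')]
```

For scale, a dense low-level matrix over about 500 concepts would need 500 × 500 = 250,000 cells, compared with 21 × 21 = 441 at the middle level.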
    <Paragraph position="2"> Our solution is a frame with multi-level concepts. The frame consists of three parts: the middle-level concept adjacency matrix, the low-level concept adjacency lists, and a collocation list of serial verbs that always appear together.</Paragraph>
    <Paragraph position="3"> In our solution, the exceptional cases are covered by either the collocation verb lists or the low-level concept adjacency lists, and the remaining frequent cases are captured by the middle-level concept adjacency matrix. As a result, the low-level concept matrix is sparse, which leads us to adopt adjacency lists rather than an adjacency matrix for the low-level concepts.</Paragraph>
    <Paragraph position="4"> The Chinese thesaurus contains 21 verb-related middle-level concepts and about 500 verb-related low-level concepts.</Paragraph>
    <Paragraph position="5"> The frame is searched in the following order: the collocation lists, the low-level concept lists, and the middle-level concept matrix. In the collocation lists, if V1 and V2 belong to the collocation list of the restrictive separate events, such as Zhuo Na Gui An ('arrest and bring to justice'), or to that of the quasi-coordinative separate events, such as Li An Zhen Cha ('file a case and investigate'), the sentence is assigned to the restrictive or the quasi-coordinative case, respectively. In the low-level concept lists and the middle-level concept matrix, if matching succeeds, meaning that V2 is semantically restricted by V1, the restrictive case is assigned; otherwise, the quasi-coordinative case is assigned. The detailed process for identifying the subcategories of separate events is shown in Figure 4.</Paragraph>
  </Section>
</Paper>