<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4008">
  <Title>Automatic Construction of an English-Chinese Bilingual FrameNet</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 FrameNet and HowNet
</SectionTitle>
    <Paragraph position="0"> FrameNet and HowNet are ontologies with different structures and different definitions of semantic roles and relations. FrameNet is a collection of lexical entries grouped by frame semantics. Each lexical entry represents an individual word sense and is associated with semantic roles and some annotated sentences. Lexical entries with the same semantic roles are grouped into a &quot;frame&quot;, and the semantic roles are called &quot;frame elements&quot;. For example: Frame: Cause_harm. Frame elements: agent, body_part, cause, event, instrument, iterations, purpose, reason, result, victim, ... Lexical entries in the &quot;cause_harm&quot; frame: bash.v, batter.v, bayonet.v, beat.v, belt.v, bludgeon.v, boil.v, break.v, bruise.v, buffet.v, burn.v, ...</Paragraph>
    <Paragraph position="1"> An annotated sentence for the lexical entry &quot;beat.v&quot;: [agent I] lay down on him and beat [victim at him] [means with my fists].</Paragraph>
    <Paragraph position="2"> HowNet is a Chinese ontology with a graph structure of word senses called &quot;concepts&quot;. Each concept contains seven fields, including lexical entries in Chinese, an English gloss, POS tags for the word in Chinese and English, and a definition of the concept that includes its category and semantic relations (Dong and Dong, 2000). For example, one translation for &quot;beat.v&quot; is Da :  (Dorr et al. 2002) uses a manual seed mapping of semantic roles between FrameNet and LVD. In this paper, we propose a method for automatically linking English FrameNet lexical entries to HowNet concepts, resulting in a bilingual FrameNet. We use two bilingual English-Chinese lexicons, as well as HowNet and FrameNet. In Sections 3.1 to 3.3 below, we use the example FrameNet lexical entry &quot;beat.v&quot; in the &quot;cause_harm&quot; frame to illustrate the main steps of our algorithm in Figure 1.</Paragraph>
    <Paragraph position="3"> For each lexical entry l in FrameNet: Find translations T1 of l in the HowNet translations.</Paragraph>
    <Paragraph position="4"> Find translations T2 of l in the LDC dictionary.</Paragraph>
    <Paragraph position="5"> Combine T1 and T2 as T: T = T1 ∪ T2. Link l to all HowNet concepts LC whose W_C field is in T: LC = {c | c.W_C ∈ T}, where c is any HowNet concept.</Paragraph>
    <Paragraph position="6"> For each frame F in FrameNet: Group together, as FC, all the HowNet concepts that are linked to the lexical entries in F: FC = {c | link(c, l) = true and l ∈ F}.</Paragraph>
    <Paragraph position="7"> Compute the frequency of HowNet categories in FC.</Paragraph>
    <Paragraph position="8"> Select the top 3 HowNet categories as the valid categories VA for frame F.</Paragraph>
    <Paragraph position="9"> For each HowNet category a: If the similarity score between a and one of the top 3 categories is greater than the threshold t, i.e. Sim(a, ta) &gt; t, where ta is any of the top 3 categories:</Paragraph>
    <Paragraph position="10"> Add a to VA: VA = VA ∪ {a}.</Paragraph>
    <Paragraph position="11"> For each lexical entry l in frame F: For each HowNet concept c linked to l: If the category of c is not in VA, prune this link.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Baseline mapping based on bilingual lexicon
</SectionTitle>
      <Paragraph position="0"> We first use the bilingual lexicons from HowNet and the LDC dictionary to create all possible mappings between FrameNet lexical entries and HowNet concepts whose part-of-speech (POS) tags are the same. Here we assume that the syntactic classification of the majority of FrameNet lexical entries (i.e. verbs and adjectives) is semantically motivated and is mostly preserved across languages. For example, &quot;beat&quot; can be translated into {Chui, Bai, Chong Ji, Chu Shou, Nan Dao, Pian Qu, Ying, Zhan Bai, ...} in HowNet and {Da, Da Bai, Dao, Qiao Da, Ying, ...} in the LDC English-Chinese dictionary. &quot;beat.v&quot; is then linked to all HowNet concepts whose Chinese word/phrase is one of the translations and whose part of speech is verb (&quot;v&quot;). Figure 2 shows some examples of HowNet concepts that are linked to &quot;beat.v&quot;.</Paragraph>
      <Paragraph position="1"> Figure 2. Partial initial alignment of &quot;beat.v&quot; to HowNet concepts with 144 candidate links</Paragraph>
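      <Paragraph position="2"> As a minimal sketch of the baseline mapping, assuming a simplified concept record with only three fields, the linking step could be written in Python as follows. The Concept type, the baseline_links function and the toy data are illustrative assumptions; they are not part of HowNet, FrameNet or the LDC dictionary.</Paragraph>
      <Paragraph position="3"><![CDATA[
```python
from typing import NamedTuple


class Concept(NamedTuple):
    """Hypothetical, simplified HowNet concept record (assumption)."""
    w_c: str       # Chinese word/phrase (the W_C field)
    pos: str       # part-of-speech tag of the Chinese word
    category: str  # HowNet category taken from the concept definition


def baseline_links(t1, t2, pos, concepts):
    """Link an entry to every concept whose W_C is in T1 | T2 and whose POS matches."""
    translations = set(t1) | set(t2)  # T = T1 union T2
    return [c for c in concepts if c.w_c in translations and c.pos == pos]


# Toy data only: romanized stand-ins, not real HowNet or LDC content.
concepts = [
    Concept("Da", "v", "beat"),
    Concept("Da", "v", "associate"),
    Concept("Chui", "v", "beat"),
    Concept("Ying", "n", "win"),
]
hownet_translations = {"Chui", "Ying"}  # plays the role of T1
ldc_translations = {"Da", "Ying"}       # plays the role of T2

links = baseline_links(hownet_translations, ldc_translations, "v", concepts)
print([(c.w_c, c.category) for c in links])
```
]]></Paragraph>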
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Disambiguation by semantic contexts in both languages
</SectionTitle>
      <Paragraph position="0"> At this stage, each FrameNet lexical entry has links to multiple HowNet concepts and categories. For example, &quot;beat.v&quot; in the &quot;cause_harm&quot; frame is linked to &quot;Da&quot; in both the &quot;beat&quot; category and the &quot;associate&quot; category (as in &quot;Da Dian Hua/make a phone call&quot;). We need to choose the correct HowNet concept (word sense). Many word sense disambiguation algorithms use contextual words in a sentence as disambiguating features. In this work, we instead use contextual lexical entries from the same semantic frame, as illustrated below: To disambiguate between the above two candidate categories, we make use of the other lexical entries in &quot;cause_harm&quot;, such as &quot;Chui&quot;, and their linked categories in HowNet, such as &quot;beat&quot; again. Each target HowNet category receives one vote from each candidate link. In our example, &quot;beat&quot; receives two votes (from &quot;Da&quot; and from &quot;Chui&quot;), and &quot;associate&quot; only one (from &quot;Da&quot;). We choose the HowNet category with the most votes, together with its constituent concepts, as the valid word sense links to the source FrameNet lexical entry. Consequently, &quot;beat.v&quot; in &quot;cause_harm&quot; is linked to all HowNet concepts that are translations of &quot;beat&quot;, that are verbs, and that also belong to the HowNet category &quot;beat&quot; (vs. &quot;associate&quot;). In our example, Figure 3 shows the top 14 examples of HowNet concepts belonging to two HowNet categories, &quot;beat&quot; and &quot;damage&quot;, that are linked to the &quot;cause_harm&quot; frame in FrameNet. Only the concepts in the top N categories are considered correctly linked to the lexical entries in the &quot;cause_harm&quot; frame. We heuristically chose N to be three in our algorithm.</Paragraph>
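      <Paragraph position="1"> The voting scheme can be sketched in Python as below, under the assumption that the candidate links of one frame are stored as a dictionary from lexical entry to (Chinese word, HowNet category) pairs. The function names, the data layout and the pairing of &quot;bash.v&quot; with &quot;Chui&quot; are hypothetical and serve only to reproduce the two-votes-versus-one example.</Paragraph>
      <Paragraph position="2"><![CDATA[
```python
from collections import Counter


def top_categories(frame_links, n=3):
    """Give each candidate link one vote and return the n most frequent categories."""
    votes = Counter(
        category
        for linked in frame_links.values()
        for _word, category in linked
    )
    return [category for category, _count in votes.most_common(n)]


def prune(frame_links, valid):
    """Keep only the links whose HowNet category is among the valid categories."""
    return {
        entry: [(word, cat) for word, cat in linked if cat in valid]
        for entry, linked in frame_links.items()
    }


# Toy candidate links for one frame (assumed layout, romanized stand-ins).
cause_harm = {
    "beat.v": [("Da", "beat"), ("Da", "associate")],
    "bash.v": [("Chui", "beat")],
}

# N is set to 1 here only because the toy frame has two categories in total.
valid = top_categories(cause_harm, n=1)
pruned = prune(cause_harm, valid)
print(valid, pruned)
```
]]></Paragraph>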
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Compensating links by HowNet taxonomy structure
</SectionTitle>
      <Paragraph position="0"> Using frame context alone in the above step can effectively prune incorrect links, but it also prunes some correct links whose HowNet categories are not among the top three categories but are similar to them. In this step, we aim to recover such pruned links. We introduce the category similarity score, which is based on the HowNet taxonomy distance (Liu and Li, 2002):</Paragraph>
      <Paragraph position="2"> Here d is the path length from category1 to category2 in the taxonomy, and α is an adjusting parameter that controls the curvature of the similarity score. We set α = 1.6 in our work, following the experimental results in (Liu and Li, 2002). If the similarity between a category p and one of the top three categories is higher than a threshold t, the category p is also considered a valid category for the frame.</Paragraph>
      <Paragraph position="3"> In our example, some valid categories, such as &quot;firing|She Ji&quot;, are not selected in the previous step even though they are related to the &quot;cause_harm&quot; frame. Based on the HowNet taxonomy, the similarity score between &quot;firing|She Ji&quot; and &quot;beat|Da&quot; is 1.0, which we consider high. Hence, &quot;firing|She Ji&quot; is also chosen as a valid category, and the concepts in this category are linked to the &quot;beat.v&quot; lexical entry in the &quot;cause_harm&quot; frame. However, using taxonomy distance can cause errors, such as Da in the &quot;weave&quot; category being aligned to &quot;beat.v&quot; in</Paragraph>
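      <Paragraph position="4"> The recovery step can be sketched in Python as follows. The similarity equation is not reproduced in the text above, so the sketch assumes the form alpha / (d + alpha), which is how the Liu and Li (2002) score is commonly stated; this form, the child-to-parent taxonomy layout, the function names, the toy category labels and the threshold value are all assumptions made for illustration.</Paragraph>
      <Paragraph position="5"><![CDATA[
```python
def path_length(a, b, parent):
    """Number of edges between categories a and b in a tree of child -> parent links."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a, up_b = ancestors(a), ancestors(b)
    depth_in_b = {node: i for i, node in enumerate(up_b)}
    for i, node in enumerate(up_a):
        if node in depth_in_b:  # first common ancestor
            return i + depth_in_b[node]
    raise ValueError("categories are not in the same taxonomy tree")


def similarity(a, b, parent, alpha=1.6):
    """Assumed score alpha / (d + alpha); equals 1.0 when d is 0."""
    return alpha / (path_length(a, b, parent) + alpha)


def expand_valid(top, all_categories, parent, threshold):
    """Add every category whose similarity to some top category exceeds the threshold."""
    valid = set(top)
    for cat in all_categories:
        if any(similarity(cat, t, parent) > threshold for t in top):
            valid.add(cat)
    return valid


# Toy taxonomy with placeholder labels; not the real HowNet hierarchy.
parent = {"A1": "A", "A2": "A", "B1": "B", "A": "root", "B": "root"}
categories = ["root", "A", "B", "A1", "A2", "B1"]

valid = expand_valid(["A1"], categories, parent, threshold=0.4)
print(sorted(valid))
```
]]></Paragraph>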
    </Section>
  </Section>
</Paper>