<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1116"> <Title>Automatic Semantic Role Assignment for a Tree Structure</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Example-based Probabilistic Models for Assigning Semantic Roles </SectionTitle> <Paragraph position="0"> The idea behind example-based approaches is that semantic roles are preserved across instances of the same event frame. For a target sentence, if we can find identical examples in the training corpus, we can assign each constituent of the target sentence the same semantic role as in the examples. However, the recurrence of exactly the same surface structure for a sentence is very rare, i.e. the probability of finding identical example sentences in a corpus is very low. In fact, by observing the structures of parsed trees, we find that most semantic roles are uniquely determined by the semantic relations between phrasal heads and their arguments/modifiers, and these semantic relations are in turn determined by the syntactic categories and semantic classes of the related words. For example: Original sentence: Wo Men 'we' Du 'all' Xi Huan 'like' Hu Die 'butterflies'.</Paragraph> <Paragraph position="1"> We all like butterflies.</Paragraph> <Paragraph position="2"> In Figure 2, Xi Huan 'like' is the sentential head; Wo Men 'we' and Hu Die 'butterflies' are the arguments; Du 'all' is the modifier. As a result, the semantic role 'experiencer' of Wo Men 'we' is deduced from the relation between Wo Men 'we' and Xi Huan 'like', since the event frame of Xi Huan 'like' has the two arguments experiencer and goal, and the experiencer usually takes the subject position. The semantic roles of Hu Die 'butterflies' and Du 'all' are assigned in the same way. 
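The relation-to-role mapping just described can be sketched as a lookup table. This is a hypothetical simplification: the experiencer/goal labels for the two arguments come from the event frame of Xi Huan 'like' as stated above, while the table and function names are purely illustrative.

```python
# Hypothetical sketch: semantic roles preserved by head-dependent
# relations, for the Figure 2 sentence. The experiencer/goal labels
# follow the event frame of Xi Huan 'like' described in the text;
# all names here are illustrative, not the paper's implementation.

ROLE_OF_RELATION = {
    # (head word, dependent word, position of dependent): role
    ("Xi Huan", "Wo Men", "before"): "experiencer",  # 'like' + 'we'
    ("Xi Huan", "Hu Die", "after"): "goal",          # 'like' + 'butterflies'
}

def role_for(head, dependent, position):
    """Look up the semantic role determined by a head-dependent relation."""
    return ROLE_OF_RELATION.get((head, dependent, position))
```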
For the task of automatic role assignment, once phrase boundaries and phrasal heads are known, the semantic relations can be resolved by looking for similar head-argument/modifier pairs in the training data.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Example Extraction </SectionTitle> <Paragraph position="0"> Extracting head-argument/modifier examples from the Sinica Treebank is trivial, since phrase boundaries and semantic roles, including the phrasal head, are labeled. The extracted examples are pairs of head word and target word. The target word is represented by the head of the argument/modifier, since the semantic relations hold between the phrasal head and the head of the argument/modifier phrase. An extracted word pair includes the following features.</Paragraph> <Paragraph position="1"> Target word: The head word of the argument/modifier.</Paragraph> <Paragraph position="2"> Target POS: The part-of-speech of the target word.</Paragraph> <Paragraph position="3"> Target semantic role: The semantic role of the constituent that contains the target word as phrasal head.</Paragraph> <Paragraph position="4"> Head word: The phrasal head.</Paragraph> <Paragraph position="5"> Head POS: The part-of-speech of the head word.</Paragraph> <Paragraph position="6"> Phrase type: The type of the phrase which contains the head word and the constituent containing the target word. Position: Whether the target word appears before or after the head word.</Paragraph> <Paragraph position="7"> The examples we extracted from Figure 2 are listed below.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Probabilistic Model for Semantic Role Assignment </SectionTitle> <Paragraph position="0"> It is possible that conflicting examples (or ambiguous role assignments) occur in the training data. We would like to assign the most probable roles. 
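The extracted example record of Section 2.1 can be sketched as follows. The seven fields follow the text; the flattened tree input and the POS tags are hypothetical simplifications, not the Sinica Treebank's actual encoding.

```python
from dataclasses import dataclass

# Sketch of the extracted example record (Section 2.1). The seven
# fields follow the text; the flattened input and the POS tags are
# hypothetical simplifications of the Sinica Treebank encoding.

@dataclass(frozen=True)
class Example:
    target_word: str   # head word of the argument/modifier
    target_pos: str    # its part-of-speech
    target_role: str   # role of the constituent it heads
    head_word: str     # the phrasal head
    head_pos: str      # part-of-speech of the head word
    phrase_type: str   # phrase containing head and constituent
    position: str      # 'before' or 'after' the head word

def extract_examples(phrase_type, head, dependents):
    """head: (word, pos); dependents: (word, pos, role, position) tuples."""
    head_word, head_pos = head
    return [Example(w, p, r, head_word, head_pos, phrase_type, pos)
            for w, p, r, pos in dependents]

# The Figure 2 sentence, flattened by hand (roles for the two
# arguments as stated in the text; POS tags illustrative):
examples = extract_examples(
    "S", ("Xi Huan", "VK"),
    [("Wo Men", "Nh", "experiencer", "before"),
     ("Hu Die", "Na", "goal", "after")])
```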
The probability of each semantic role for a constituent with different feature combinations is estimated from the extracted examples.</Paragraph> <Paragraph position="2"> Due to the sparseness of the training data, it is not possible to have example feature combinations that match all input cases. Therefore similar examples are matched instead. A back-off process is carried out to relax feature constraints during example matching. We evaluate the performance of various feature combinations to see which are best suited for semantic role assignment.</Paragraph> <Paragraph position="3"> We choose four different feature combinations.</Paragraph> <Paragraph position="4"> Each has relatively high accuracy. The four classifiers are backed off in sequence. If none of the four classifiers is applicable, a baseline model that assigns the most common semantic role of the target word is applied.</Paragraph> <Paragraph position="5"> if # of (h,h_pos,t,t_pos,pt,position) > threshold</Paragraph> <Paragraph position="7"> h: the head word; h_pos: part-of-speech of the head word; t: the target word; t_pos: part-of-speech of the target word; pt: the phrase type.</Paragraph> <Paragraph position="8"> if # of (h,h_pos,t_pos,pt,position) > threshold</Paragraph> <Paragraph position="10"/> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. Experiments </SectionTitle> <Paragraph position="0"> We adopt the Sinica Treebank as both training and testing data. It contains about 40,000 parsed sentences. We use 35,000 sentences as training data and the remaining 5,000 as testing data. Table 2 shows the coverage of each classifier, its accuracy, and the performance of each individual classifier without the back-off process. Table 3 shows the combined performance of the four classifiers after back-off in sequence. The baseline algorithm is a simple unigram approach that assigns the most common role for the target word. 
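The back-off sequence can be sketched as below. Only the first two feature combinations are given explicitly in this section; the last two levels and the threshold value are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Minimal sketch of the back-off role assignment (Section 2.2).
# The first two feature combinations are the ones given in the text;
# the last two levels and the threshold value are assumptions.

BACKOFF_LEVELS = [
    ("h", "h_pos", "t", "t_pos", "pt", "position"),  # classifier 1
    ("h", "h_pos", "t_pos", "pt", "position"),       # classifier 2
    ("t", "t_pos", "pt", "position"),                # assumed level
    ("t_pos", "pt", "position"),                     # assumed level
]
THRESHOLD = 0  # actual value not given in the text

def train(examples):
    """examples: iterable of (feature dict, role). Returns per-level counts."""
    counts = [defaultdict(Counter) for _ in BACKOFF_LEVELS]
    for feats, role in examples:
        for level, keys in enumerate(BACKOFF_LEVELS):
            counts[level][tuple(feats[k] for k in keys)][role] += 1
    return counts

def assign_role(counts, feats, most_common_role):
    """Back off through the levels; fall back to the baseline role."""
    for level, keys in enumerate(BACKOFF_LEVELS):
        dist = counts[level].get(tuple(feats[k] for k in keys))
        if dist and sum(dist.values()) > THRESHOLD:
            return dist.most_common(1)[0][0]
    return most_common_role
```

The key design point is that the levels are ordered from most to least specific, so a decision is always made by the most reliable classifier whose feature combination is attested often enough in the training examples.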
Because the accuracy of each of the four classifiers is considerably high, instead of using a linear combination of probabilities we use the most reliable classifier for each feature combination.</Paragraph> <Paragraph position="1"> the baseline (the most common semantic roles)</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Error Analyses </SectionTitle> <Paragraph position="0"> Although the accuracy of the back-off model is considerably higher than that of the baseline model, there is still room for improvement. After analyzing the errors, we drew the following conclusions.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Method Accuracy Backoff 90.29% Baseline 68.68% </SectionTitle> <Paragraph position="0"> a) Semantic head vs. syntactic head A semantic role for a prepositional phrase (PP) is mainly determined by the syntactic head of the PP, i.e. the preposition, and the semantic head of the PP, i.e. the head word of the DUMMY-argument of the PP. For example, in Figure 3, the two sentences are almost the same; only the contents of the PPs differ. Obviously, the semantic role of the PP (Zai 'in' Yin Ni 'Indonesia') is location, and the semantic role of the PP (Zai 'in' Jin Nian 'this year') is time. Therefore the semantic roles of the two PPs should be determined only within the scope of the PP, independently of the matrix verb.</Paragraph> <Paragraph position="1"> b) Complex structures Complex structures are always the hardest part of semantic role assignment. For example, sentences in the passive voice are typical complex structures. In Figure 4, the semantic role of Hu Die 'butterflies' is not solely determined by the head verb Xi Yin 'attracted' and the word itself. 
Instead, we should detect the presence of passive voice and then reverse the roles of subject and object.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Refined Models </SectionTitle> <Paragraph position="0"> Chen &amp; Huang (1996) studied the task of semantic role assignment during Chinese sentence parsing. They concluded that semantic roles are determined by the following four parameters.</Paragraph> <Paragraph position="1"> 1. Syntactic and semantic categories of the target word, 2. Case markers, i.e. prepositions and postpositions, 3. Phrasal head, and 4. Sub-categorization frame and its syntactic patterns.</Paragraph> <Paragraph position="2"> Therefore head-modifier/argument examples resolve only most semantic role assignments. Some complex cases need the other parameters to determine their semantic roles. For instance, the argument roles of Bei sentences (passive sentences) should be determined by all four parameters. The refined model contains two parts: one is the refinement of the extracted features, which provides more precise information, and the other is the improvement of the back-off process to deal with special semantic role assignments.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Refinement of Feature Extraction </SectionTitle> <Paragraph position="0"> The refinement of feature extraction focuses on two different cases: one is feature extraction for case-marked structures, such as PPs and GPs (postpositional phrases), and the other is the identification of general semantic classes for synonyms.</Paragraph> <Paragraph position="1"> The features of PPs/GPs include two different feature types: internal and external features. 
The internal features of such phrases consist of the phrasal head and the Dummy-head; the external features are the heads (main verbs) of the phrases containing the target phrases.</Paragraph> <Paragraph position="2"> The identification of general semantic classes for synonyms is crucial for solving data sparseness problems. Some types of words are very productive, such as numbers, DMs (determinative-measure compounds), and proper names. They need to be classified into different semantic classes. We use some heuristics to classify them into specific word classes. For example, we label 1 Gong Jin 'one kilogram' and 2 Gong Jin 'two kilograms' with their canonical form Mou Gong Jin 'n kilograms'; Di Yi Tian 'the first day' and Di Er Tian 'the second day' as Di Mou Tian 'the nth day'; and Zhang San 'Zhang San' and Li Si 'Li Si' as personal names, etc. With this method, we can increase the number of matched examples and largely resolve the problem of occurrences of unknown words.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Dependency Decisions and Refined Back-off Processes </SectionTitle> <Paragraph position="0"> The refined back-off model aims to solve semantic role assignment for certain special structures. Using only head-modifier features could result in decisions made with insufficient information. As illustrated before, the semantic role of Hu Die 'butterflies' in Figure 4 would be 'agent' judging from the head-argument feature alone. But in fact the passive-voice feature Bei 'passive' tells us that the subject role of Hu Die 'butterflies' should be the semantic role 'goal' instead of the usual role 'agent'.</Paragraph> <Paragraph position="1"> Therefore we enhance our back-off process by adding some dependency decisions. The dependency conditions include special grammatical usages such as the passive form, quotations, topicalized sentences, etc. 
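The dependency decision for the passive marker described above can be sketched as follows. This is a hedged illustration: the function, its inputs, and the example word list are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of one dependency decision: the passive marker
# Bei reverses the subject priority, so 'goal' takes the subject
# position instead of 'agent'. Names and inputs are illustrative.

def subject_role(default_role, words):
    """default_role: role the subject would receive from head-argument
    features alone; words: the sentence's word list."""
    if "Bei" in words and default_role == "agent":
        return "goal"  # passive: the subject is the goal, not the agent
    return default_role
```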
In the refined back-off process, we first detect which dependency condition holds and then resolve it using the dependency features. For example, if the feature word Bei 'passive' occurs in a sentence, we know that the subject priority of the semantic roles should be reversed: 'goal' will take the subject position instead of 'agent' ('goal' appears before 'agent').</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Experiment Results </SectionTitle> <Paragraph position="0"> The experiments were carried out for the refined back-off model with the same training and testing data as in the previous experiments.</Paragraph> <Paragraph position="1"> Table 5 shows that the refined back-off model gains 2.4% in accuracy over the original back-off model. However, most of the improvement is due to the refined feature extraction and the canonical representation of certain word classes. Only a small part of the improvement is attributable to the decision making on cases of structural dependency.</Paragraph> </Section> </Section> </Paper>