<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1116"> <Title>Automatic Semantic Role Assignment for a Tree Structure</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> For natural language understanding, fine-grained semantic role assignment is one of the key processing steps, because it establishes the semantic relations between constituents. The senses of constituents and the relations among them constitute the core meaning of a sentence.</Paragraph> <Paragraph position="1"> Conventionally, there are two kinds of methods for semantic role assignment: one uses only statistical information (Gildea and Jurafsky, 2002), and the other combines statistical information with grammar rules (Gildea and Hockenmaier, 2003). However, using only grammar rules to assign semantic roles can lead to low coverage. On the other hand, the performance of statistical methods depends on selecting significant features. A data-driven strategy is suitable for assigning semantic roles in general texts. We use the Sinica Treebank as our information resource because it contains texts from various domains, including politics, society, and literature, and because it is a Chinese treebank in which a semantic role is assigned to each constituent (Chen et al., 2003). It uses 74 abstract semantic roles, including thematic roles such as 'agent', 'theme', and 'instrument'; secondary roles such as 'location', 'time', and 'manner'; and roles for modifiers of nouns, such as 'quantifier', 'predication', and 'possessor'. The design of our role assignment algorithm is based on different decision features, such as head-argument/modifier relations, case markers, and sentence structures. 
The algorithm labels the semantic roles of parsed sentences using example-based probabilistic models.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Sinica Treebank </SectionTitle> <Paragraph position="0"> The Sinica Treebank has been developed by the Chinese Knowledge Information Processing (CKIP) group at Academia Sinica and has been publicly released since 2000. Version 2.0 of the Sinica Treebank contains 38,944 structural trees and 240,979 Chinese words. Each structural tree is annotated with words, their parts of speech, syntactic structure brackets, and semantic roles. In conventional structural trees, only syntactic information is annotated. However, for Chinese, identifying word relations is very important and yet difficult with purely syntactic constraints (Xia et al., 2000). Thus, partial semantic information, i.e., the semantic role of each constituent, was annotated in the Chinese structural trees. The grammatical constraints are expressed in terms of the linear order of semantic roles and their syntactic and semantic restrictions. Below is an example sentence from the Sinica Treebank.</Paragraph> <Paragraph position="1"> Original sentence (romanized): Ta yao Zhang San jian qiu.</Paragraph> <Paragraph position="2"> He asks Zhang San to pick up the ball.</Paragraph> <Paragraph position="3"> In the Sinica Treebank, not only the semantic relations of verbal predicates but also modifier-head relations are marked. There are 74 different semantic roles; that is, semantic role assignment must establish the semantic relation between each phrasal head and its arguments/modifiers by choosing among 74 roles. The set of semantic roles used in the Sinica Treebank is listed in the appendix.</Paragraph> </Section> </Section> </Paper>