File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/c04-1104_abstr.xml
Size: 3,939 bytes
Last Modified: 2025-10-06 13:43:17
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1104"> <Title>Subcategorization Acquisition and Evaluation for Chinese Verbs</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a technique for acquiring subcategorization frames (SCFs) for Chinese verbs and an experiment that applies it. SCF hypotheses are generated using linguistic heuristic information and then filtered with statistical methods. Evaluation on the acquisition of 20 multi-pattern verbs shows that our experiment achieved precision and recall similar to those of previous research. In addition, a simple application of the acquired lexicon to a PCFG parser indicates the considerable potential of subcategorization information in NLP.</Paragraph> <Paragraph position="1"> Credits: This research is sponsored by the National Natural Science Foundation (Grant Nos. 60373101 and 60375019) and the High-Tech Research and Development Program (Grant No. 2002AA117010-09).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Introduction </SectionTitle> <Paragraph position="0"> Since Brent (1991), a considerable amount of research in both traditional and computational linguistics has focused on verb lexicons that specify subcategorization information. In traditional linguistics, subcategorization theories describing the syntactic behavior of verbal predicates have been refined much more systematically, e.g.</Paragraph> <Paragraph position="1"> (Korhonen 2001). In automatic acquisition and its applications, researchers have made substantial progress not only for English, e.g. (Briscoe and Carroll 1997) and (Korhonen 2003), but also for many other languages, such as German (Schulte im Walde 2002), Czech (Sarkar and Zeman 2000), and Portuguese (Gamallo et al. 2002).</Paragraph>
<Paragraph position="2"> However, theoretical research on Chinese verbs is generally limited to case grammar, valency, some theories of semantic computation, and a few papers on the manual acquisition or prescriptive design of syntactic patterns.</Paragraph> <Paragraph position="3"> Because these studies were originally motivated by other goals, the syntactic and semantic generalizations they produce are not consistent enough to meet the descriptive granularity that SCFs require (Han and Zhao 2004). The only work on automatic acquisition of Chinese SCFs, by Han and Zhao (2004), predefines 152 general frames for all Chinese verbs, but that experiment is not based on a real corpus. After observing and analyzing a large number of subcategorization phenomena in a real Chinese corpus, the People's Daily (January to June 1998), we removed from Han and Zhao's predefined set 15 SCFs that are in fact close variants of others. Using the resulting frame set and linguistic rules from (Zhao 2002) as heuristic information, we then generated SCF hypotheses from the same corpus and statistically filtered them into a Chinese verb SCF lexicon. As far as we know, this is the first attempt at automatic Chinese SCF acquisition based on a real corpus.</Paragraph> <Paragraph position="4"> The rest of this paper is organized as follows. Section 2 describes a comprehensive system that builds verb SCF lexicons from a large real corpus, its operating principles, and the knowledge encoded in our SCFs. Section 3 analyzes the acquired lexicon with two experiments: one evaluates the acquisition results for 20 verbs with multiple syntactic patterns against a manual gold standard; the other checks the performance of the lexicon when applied in a PCFG parser. Section 4 compares this research with related work.
Finally, Section 5 summarizes our present achievements and shortcomings and outlines possible directions for future work.</Paragraph> </Section> </Section> </Paper>