<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1106"> <Title>Character-Sense Association and Compounding Template Similarity: Automatic Semantic Classification of Chinese Compounds</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 5. Conclusions and Further Remarks </SectionTitle> <Paragraph position="0"> In this paper I have proposed a character-based model of sense determination for Chinese compounds using compounding template similarity.</Paragraph> <Paragraph position="1"> Footnote 10: VC (transitive action/activity) and VA (intransitive action/activity) are the two most dominant types of two-character verbs in the corpus, accounting for roughly 44% and 27% respectively. These statistics exclude VH (intransitive state) verbs, because they generally correspond to adjectives in English and are indeed categorized as adjectives in HowNet.</Paragraph> <Paragraph position="2"> Based on this model, I have implemented a deep semantic classification system for V-V compounds, which assigns compounds to the deep-level (level-3 and level-4) classes of the CILIN taxonomy. The evaluation yields a fairly satisfactory precision for the top-ranked predicted semantic class (about 38% in the outside test and 61% in the inside test), compared with a baseline of about 18%. The results also show a high rate of inclusion of the correct answer among the top-3 ranked classes, which suggests that the present non-contextual system could in the future be combined with a WSD module that uses context information.
Although the model has so far been tested only on a partial system for V-V compounds, it can be extended to other compound types, such as V-N and N-N, once the character-sense association network is also established for N characters.</Paragraph> <Paragraph position="3"> The model proposed in this paper has the following advantages: (1) It uses a compounding-template similarity measure to retrieve potential synonyms for sense approximation, which avoids the inherent difficulty of head determination in head-oriented approaches and thus makes it possible to handle exocentric compounds. (2) It establishes a character-sense association network, which allows the discovery of latent character senses, latent synonymy, and latent polysemy, thereby compensating for the incompleteness of the MRD in use. (3) It carries out deep semantic classification rather than only shallow classification into general, vague categories. (4) It requires only a simple, idealized dictionary format, which makes conversion from a general MRD straightforward and allows the system to be enhanced easily by adding a new MRD.</Paragraph> <Paragraph position="4"> However, as noted in the discussion of classification errors, the performance of the model depends heavily on the productivity of the compounding semantic templates of the target compounds. Correctly predicting the semantic class of a compound with an unproductive semantic template is undoubtedly very difficult, because T-similar compounds are sparse. Remedying this effect therefore remains a challenging task for future work. Another essential task is to generalize the present character-based model so that it applies to compounds whose component morphemes consist of more than one character. Finally, automatic lexical translation of unknown Chinese compounds will also be pursued in future work.
This task can be carried out within exactly the same structure as the present model, since the only difference will be switching the working dictionary dicox from dico2 to dico1 in Module-B. A pilot experiment has already shown encouraging results.</Paragraph> </Section> </Paper>