File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/05/p05-2024_evalu.xml
Size: 4,346 bytes
Last Modified: 2025-10-06 13:59:27
<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2024"> <Title>Corpus-Oriented Development of Japanese HPSG Parsers</Title> <Section position="7" start_page="142" end_page="143" type="evalu"> <SectionTitle> 6 Experiments </SectionTitle> <Paragraph position="0"> Because the aim of our research is to construct a Japanese parser that can extract semantic information from real-world texts, we evaluated our parser in terms of its coverage and semantic-role identification accuracy. We also compared the accuracy of our parser with that of an existing statistical dependency analyzer, in order to investigate the necessity of further improvements to our disambiguation model.</Paragraph> <Paragraph position="1"> The following experiments were conducted using the EDR Japanese corpus. An HPSG grammar was extracted from 51951 sentences of the corpus, and the same set of sentences was used as a training set for the disambiguation model. (We could not use the entire corpus for the experiments because of limited computational resources.) Of the training set, 47767 sentences (91.9%) were successfully converted into an HPSG treebank, from which we extracted lexical entries.</Paragraph> <Paragraph position="2"> When we constructed a lexicon from the extracted lexical entries, we reserved the lexical entry templates of infrequent words as default templates for unknown words of each POS, in order to achieve sufficient coverage.</Paragraph> <Paragraph position="3"> The threshold for 'infrequent' words was determined to be 30 on the basis of preliminary experiments.</Paragraph> <Paragraph position="4"> We used 2079 EDR sentences as a test set. (Another set of 2078 sentences was used as a development set.) The test set was also converted into an HPSG treebank, and the conversion was successful for 1913 sentences. 
(We will call the obtained HPSG treebank the &quot;test treebank.&quot;) As features of the log-linear model, we extracted the POS of the head, the template name of the head, the surface string of the head and its ending, the punctuation contained in the phrase, and the distance between the heads of the daughters, from each sign in the derivation trees. These features are used in combination.</Paragraph> <Paragraph position="5"> The coverage of the parser on the test set was 95.3% (1982/2079). Though this is still below the coverage achieved by SLUNG (Mitsuishi et al., 1998), our grammar has richer information that enables semantic analysis, which is lacking in SLUNG.</Paragraph> <Paragraph position="6"> We evaluated the parser in terms of its accuracy in identifying the semantic roles of the arguments of verbs.</Paragraph> <Paragraph position="7"> For each phrase that is in a complement-head relation with some VP, a semantic role is assigned according to the type of the complement-head structure. The performance of our parser on the test treebank was 63.8%/57.8% in precision/recall of semantic roles.</Paragraph> <Paragraph position="8"> As most studies on syntactic parsing of Japanese have focused on bunsetsu-based dependency analysis, we also attempted an evaluation in this framework. In order to evaluate our parser by bunsetsu dependency, we converted the phrase structures of EDR and the output of our parser into dependency structures over the right-most content word of each bunsetsu. Bunsetsu boundaries of the EDR sentences were determined by using simple heuristic rules. The dependency accuracies and the sentential accuracies of our parser and Kanayama et al.'s analyzer are shown in Table 2. (Sentences for which parsing failed are not counted when calculating accuracies.)
Our results were still significantly lower than those of Kanayama et al., which are the best reported dependency accuracies on EDR.</Paragraph> <Paragraph position="9"> This experiment revealed that the accuracy of our parser requires further improvement, although our grammar achieved high coverage. Our expectation is that incorporating grammar rules for complex structures that are ignored in the current implementation (e.g. control, relative clause, and coordination constructions) will improve the accuracy of the parser. In addition, we should investigate whether the semantic analysis our parser provides can contribute to the performance of more application-oriented tasks such as information extraction.</Paragraph> </Section> </Paper>
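<!--
The measures reported in this section (coverage, precision/recall of semantic roles, and dependency/sentential accuracy) can be illustrated with a minimal Python sketch. It is not the authors' evaluation script. The function names, the representation of a semantic role as a (sentence id, phrase, role) triple, the encoding of a dependency structure as a list of head indices with -1 for the root, and all toy data are assumptions made for illustration. Only the counts 1982 and 2079 come from the text above.

```python
from typing import List, Optional, Set, Tuple


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def precision_recall(gold: Set[tuple], predicted: Set[tuple]) -> Tuple[float, float]:
    """Precision and recall of predicted items measured against gold items."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def dependency_accuracies(
    gold: List[List[int]],
    parsed: List[Optional[List[int]]],
) -> Tuple[float, float]:
    """Return (dependency accuracy, sentential accuracy).

    Each sentence is a list of head indices, one per bunsetsu (assumed encoding).
    A parsed entry of None marks a parse failure; such sentences are skipped,
    mirroring the note that failure sentences are not counted.
    """
    total_deps = correct_deps = total_sents = correct_sents = 0
    for gold_heads, parsed_heads in zip(gold, parsed):
        if parsed_heads is None:
            continue  # parse failure: excluded from both accuracies
        matches = sum(1 for g, p in zip(gold_heads, parsed_heads) if g == p)
        total_deps += len(gold_heads)
        correct_deps += matches
        total_sents += 1
        correct_sents += int(matches == len(gold_heads))
    if total_deps == 0 or total_sents == 0:
        return 0.0, 0.0
    return correct_deps / total_deps, correct_sents / total_sents


# Coverage figure quoted in the section: 1982 parsed out of 2079 test sentences.
print(percentage(1982, 2079))  # 95.3

# Toy data, invented purely for illustration.
gold_roles = {(0, "kare-ga", "agent"), (0, "hon-o", "object"), (1, "gakkou-ni", "goal")}
pred_roles = {(0, "kare-ga", "agent"), (0, "hon-o", "goal")}
print(precision_recall(gold_roles, pred_roles))

gold_deps = [[1, 2, -1], [1, -1], [2, 2, -1]]
parsed_deps = [[1, 2, -1], None, [1, 2, -1]]
print(dependency_accuracies(gold_deps, parsed_deps))
```

Skipping failed sentences, as in dependency_accuracies, means the reported accuracies describe only the sentences the parser could analyze, so they are best read together with the coverage figure.
-->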