<?xml version="1.0" standalone="yes"?> <Paper uid="P03-2033"> <Title>A Debug Tool for Practical Grammar Development</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Experiments and Discussion </SectionTitle> <Paragraph position="0"> We have applied willex to rental-XTAG, an HPSG-style grammar converted from the XTAG English grammar (The XTAG Research Group, 2001) by grammar conversion (Yoshinaga and Miyao, 2001) [footnote 1]. The corpus consists of MEDLINE abstracts with tags based on a slightly modified version of GDA-DTD [footnote 2] (Hasida, 2003). The corpus is &quot;partially parsed&quot;; the attachments of prepositional phrases are annotated manually.</Paragraph> <Paragraph position="1"> The tags do not always specify the correct structures according to rental-XTAG (i.e., the grammar assumed by the tags differs from rental-XTAG), so we prepared a POS/label conversion table. With POS/label conversion tables, we can use tagged corpora based on grammars that differ from the one the parser assumes.</Paragraph> <Paragraph position="2"> We investigated 208 sentences (average 24.2 words) from 26 abstracts. Of these, 73 sentences were parsed successfully with correct results.
Thus, the coverage was 35.1%.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Qualitative Evaluation </SectionTitle> <Paragraph position="0"> A user gave willex three main pieces of positive feedback. First, the function for restricting partial results was helpful because it leaves human debuggers fewer results to check. Second, the function for manually deleting incorrect partial results was useful because in some cases the tags do not specify POSs/labels. Third, human debuggers could use the recording function to make notes for careful analysis later.</Paragraph> <Paragraph position="1"> However, willex also received some negative feedback: locating the cause of a parsing failure in a sentence was found to be somewhat troublesome. Also, willex loses accuracy if the human debuggers themselves have trouble understanding the correct syntactic structure of a sentence [footnote 3].
Footnote 1: Since XTAG and rental-XTAG generate equivalent parse results for the same input, debugging rental-XTAG means debugging XTAG itself.</Paragraph> <Paragraph position="2"> Footnote 2: GDA has no tags that specify prepositional phrases, so we added &lt;prep&gt; and &lt;prepp&gt;.</Paragraph> <Paragraph position="3"> Footnote 3: Thus, we divided the process of identifying grammar defects into two steps. First, a non-expert roughly classifies parsing errors and records temporary notes. Then, the non-expert shows typical example sentences from each class to experts and identifies grammar defects based on the experts' inference. Here, we can make use of the recording function of willex.
We found from these evaluations that the functions of willex can be used effectively, though more automation is needed.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Quantitative Evaluation </SectionTitle> <Paragraph position="0"> Figure 3 shows the decrease in the number of partial parse trees obtained by using the tagged corpus. (Data for 10 of the 208 sentences are shown.) The graph shows that the tagged corpus reduced the human workload.</Paragraph> <Paragraph position="1"> [Figure 3. Axes: number of partial results; length of a sentence (number of words). Legend: without any info.; with chunk info.]</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Defects of rental-XTAG </SectionTitle> <Paragraph position="0">
the defects of rental-XTAG    #
no lexical entry    62
cannot handle reduced relative    35
cannot handle V-V coordination    22
Adjective does not post-modify NP    9
cannot parse &quot;, but not&quot;    4
cannot handle objective to-infinitive    3
&quot;, which ...&quot; does not post-modify NP    3
cannot handle reduced as-relative clause    2
cannot parse &quot;greater than&quot; (&quot;&gt;&quot;)    2
misc.    17
</Paragraph> <Paragraph position="1"> From this table, we infer that (1) the lack of lexical entries, (2) the inability to parse reduced relatives, and (3) the inability to parse coordinations of verbs are serious problems of rental-XTAG.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Conflicts Between the Modified GDA and rental-XTAG </SectionTitle> <Paragraph position="0"> Conflicts between rental-XTAG and the grammar on which the modified GDA is based cause parsing failures. Statistics on the conflicts are shown in Table 3.</Paragraph> <Paragraph position="2">
adjectival phrase verbal phrase    36
bracketing except &quot;,&quot;    10
bracketing of &quot;,&quot;    8
treatment of omitted words    2
misc.    5
These conflicts cannot be resolved by a simple POS/label conversion table.
One solution is to insert a preprocessing module that deletes and moves the tags that cause conflicts.</Paragraph> <Paragraph position="3"> We regard these conflicts not as grammar defects but as differences between the grammars, to be absorbed in the conversion phase.</Paragraph> </Section> </Section></Paper>