<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1115"> <Title>Using String-Kernels for Learning Semantic Parsers</Title> <Section position="8" start_page="918" end_page="919" type="evalu"> <SectionTitle> 4.2 Results </SectionTitle> <Paragraph position="0"> Figure 6 shows the results on the CLANG corpus. KRISP performs better than either version of SILT and performs comparably to WASP. Although SCISSOR gives lower precision at lower recall values, it gives much higher maximum recall.</Paragraph> <Paragraph position="1"> However, we note that SCISSOR requires more supervision for the training corpus in the form of semantically annotated syntactic parse trees for the training sentences. CHILL could not be run beyond 160 training examples because its Prolog implementation runs out of memory. For 160 training examples it gave 49.2% precision with 12.67% recall. [Figure 7: Results on the GEOQUERY corpus.] KRISP achieves higher precision than WASP on this corpus. Overall, the results show that KRISP performs better than deterministic rule-based semantic parsers like CHILL and SILT and performs comparably to other statistical semantic parsers like WASP and SCISSOR.</Paragraph> <Section position="1" start_page="918" end_page="918" type="sub_section"> <SectionTitle> 4.3 Experiments with Other Natural Languages </SectionTitle> <Paragraph position="0"> [Figure 8: Results for different natural languages.]</Paragraph> <Paragraph position="1"> The GEOQUERY corpus is also available in three other natural languages: Spanish, Turkish and Japanese. Since KRISP's learning algorithm does not use any natural-language-specific knowledge, it is directly applicable to other natural languages. Figure 8 shows that KRISP performs competently on the other languages as well.</Paragraph> </Section> <Section position="2" start_page="918" end_page="919" type="sub_section"> <SectionTitle> 4.4 Experiments with Noisy NL Sentences </SectionTitle> <Paragraph position="0"> Any real-world application in which semantic parsers would be used to interpret the natural language of a user is likely to face noise in the input. If the user is interacting through spontaneous speech and the input to the semantic parser is coming from the output of a speech recognition system, then there are many ways in which noise can creep into the NL sentences: interjections (like um's and ah's), environment noise (like door slams, phone rings, etc.), out-of-domain words, grammatically ill-formed utterances, etc. (Zue and Glass, 2000).</Paragraph> <Paragraph position="1"> As opposed to the other systems, KRISP's string-kernel-based semantic parsing does not use hard-matching rules and should thus be more flexible and robust to noise. We tested this hypothesis by running experiments on data that was artificially corrupted with simulated speech recognition errors.</Paragraph> <Paragraph position="2"> Interjections, environment noise, etc. are likely to be recognized as real words by a speech recognizer. To simulate this, after every word in a sentence, with some probability Padd, an extra word is added, chosen with probability proportional to its word frequency in the British National Corpus (BNC), a good representative sample of English. A speech recognizer may sometimes completely fail to detect words, so with probability Pdrop a word is dropped. A speech recognizer could also introduce noise by confusing a word with a high-frequency, phonetically close word. We simulate this type of noise by substituting a word in the corpus with another word, w, with probability p^ed(w) * P(w), where p is a parameter, ed(w) is w's edit distance (Levenshtein, 1966) from the original word, and P(w) is w's probability, proportional to its word frequency. The edit distance that measures closeness between words is character-based rather than phonetic, but this should not make a significant difference in the experimental results.</Paragraph>
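<Paragraph> The corruption procedure above lends itself to a short sketch. The following Python is a minimal illustration under stated assumptions, not the authors' implementation: the tiny word-frequency table stands in for the full BNC unigram counts, the function names are ours, and each candidate substitute is tested independently, which is one plausible reading of the per-word substitution probability p^ed(w) * P(w).

```python
import random

# Hypothetical stand-in for BNC unigram counts; the actual experiment
# draws word frequencies from the full British National Corpus.
BNC_FREQ = {"the": 6187267, "of": 2941444, "and": 2682863, "ball": 8723}
TOTAL = sum(BNC_FREQ.values())
VOCAB = list(BNC_FREQ)
WEIGHTS = [BNC_FREQ[w] / TOTAL for w in VOCAB]  # P(w): relative word frequency

def edit_distance(a, b):
    """Character-based Levenshtein distance, ed(w) in the text."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def corrupt(sentence, p_add, p_drop, p):
    """Apply the three simulated recognition errors to a tokenized sentence."""
    noisy = []
    for word in sentence:
        # Substitution: replace the word by w with probability p^ed(w) * P(w).
        for w, p_w in zip(VOCAB, WEIGHTS):
            if w != word and random.random() < (p ** edit_distance(w, word)) * p_w:
                word = w
                break
        # Deletion: drop the word with probability p_drop.
        if random.random() >= p_drop:
            noisy.append(word)
        # Insertion: after the word, add an extra frequency-weighted word
        # with probability p_add.
        if random.random() < p_add:
            noisy.append(random.choices(VOCAB, weights=WEIGHTS)[0])
    return noisy

# Example: noise level 3 of 4 under the linear schedule described below,
# i.e. p_add = p_drop = 0.1 * 3/4 and p = 0.01 * 3/4.
print(corrupt("our player has the ball".split(), 0.075, 0.075, 0.0075))
```

The parameter values in the usage line follow the noise-level schedule given below: Padd and Pdrop run linearly from 0 to 0.1, and p from 0 to 0.01, across levels 0 to 4.</Paragraph>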
[Figure 9: Results on the CLANG corpus with increasing amounts of noise in the test sentences.] <Paragraph position="3"> Figure 9 shows the results on the CLANG corpus with increasing amounts of noise, from level 0 to level 4. Noise level 0 corresponds to no noise. The noise parameters Padd and Pdrop were varied linearly from 0 at level 0 to 0.1 at level 4, and the parameter p was varied linearly from 0 at level 0 to 0.01 at level 4. We show the best F-measure (the harmonic mean of precision and recall, 2PR/(P+R)) for each system at each noise level. As can be seen, KRISP's performance degrades gracefully in the presence of noise, while the other systems' performance degrades much faster, verifying our hypothesis. In this experiment only the test sentences were corrupted; we get qualitatively similar results when both the training and test sentences are corrupted. The results are also similar on the GEOQUERY corpus.</Paragraph> </Section> </Section> </Paper>