<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1047"> <Title>Towards a Noise-Tolerant, Representation-Independent Mechanism for Argument Interpretation</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> Our evaluation consisted of an automated experiment in which the system interpreted noisy versions of its own arguments. These arguments were generated from different sub-nets of its domain BN, and they were distorted at the BN level and at the NL level. At the BN level, we changed the beliefs in the nodes, and we inserted and deleted nodes and arcs. At the NL level, we distorted the wording of the propositions in the resulting arguments. All these distortions were performed for BNs of different sizes (3, 5, 7 and 9 arcs). (Footnote 2: We are implementing a more principled model for sentence comparison, which yields more accurate probabilities.)</Paragraph> <Paragraph position="1"> Our measure of performance is the edit distance between the original BN used to generate an argument and the BN produced as the interpretation of this argument. For instance, two BNs that differ in one arc (an arc of the original BN is replaced by a different arc in the interpretation) have an edit distance of 2 (one addition and one deletion), while a perfect match has an edit distance of 0.</Paragraph> <Paragraph position="2"> Overall, our results were as follows. Our system produced an interpretation in 86% of the 5400 trials. In 75% of the 5400 cases, the generated interpretation had an edit distance of 3 or less from the original BN, and in 50% of the cases, the interpretation matched the original BN perfectly. Figure 3 depicts the frequency of edit distances for the different BN sizes under all noise conditions. We plotted edit distances of 0, ..., 9 and ≥10, plus the category NI, which stands for &quot;No Interpretation&quot;. As shown in Figure 3, an edit distance of 0 has the highest frequency, and performance deteriorates as BN size increases. 
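As a minimal sketch of the edit-distance measure and the Figure 3 categories described above (an illustration, not the authors' implementation), the Python code below assumes that a BN can be compared through its set of directed arcs alone, ignoring beliefs; the names edit_distance, bin_label and frequency_table, and the "10+" label for the top bin, are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional, Tuple

# Assumption: a BN is compared only through its directed arcs,
# each arc being a (parent, child) pair of node names.
Arc = Tuple[str, str]


def edit_distance(original: FrozenSet[Arc], interpretation: FrozenSet[Arc]) -> int:
    """Number of arc additions plus arc deletions separating two BNs."""
    # Arcs present in exactly one of the two BNs must be added or deleted,
    # so the distance is the size of the symmetric difference.
    return len(original ^ interpretation)


def bin_label(distance: Optional[int]) -> str:
    """Map one trial outcome to a Figure 3 category (hypothetical labels)."""
    if distance is None:
        return "NI"  # the system produced no interpretation
    if distance >= 10:
        return "10+"  # all large distances share one bin
    return str(distance)


def frequency_table(outcomes: Iterable[Optional[int]]) -> Counter:
    """Count how many trials fall into each category."""
    return Counter(bin_label(d) for d in outcomes)


if __name__ == "__main__":
    original = frozenset({("A", "B"), ("B", "C")})
    # The interpretation replaces arc B->C with arc A->C.
    interpretation = frozenset({("A", "B"), ("A", "C")})
    print(edit_distance(original, interpretation))  # 2
    print(frequency_table([0, 0, 2, 12, None]))
```

Under this arc-set representation, replacing one arc by another yields a distance of 2, in line with the example given for the measure.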
Still, for BNs of 7 arcs or less, the vast majority of the interpretations have an edit distance of 3 or less. Only for BNs of 9 arcs does the number of NIs exceed the number of perfect matches.</Paragraph> <Paragraph position="3"> We also tested each kind of noise separately, keeping the other kinds of noise at 0%. All distortion levels were between 0% and 40%. We performed 1560 trials for word noise, arc noise and node insertions, and 2040 trials for belief noise, which warranted additional observations. Figures 4, 5 and 6 show the recognition accuracy of our system (in terms of average edit distance) as a function of the percentage of arc, belief and word noise, respectively, for the different BN sizes (in arcs). Our system's performance for node insertions is similar to that obtained for belief noise (the graph was omitted owing to space limitations). Our results show that the two main factors that affect recognition performance are BN size and word noise, while the average edit distance remains stable for belief and arc noise, as well as for node insertions (the only exception occurs for 40% arc noise and 9-arc BNs). Specifically, for arc noise, belief noise and node insertions, the average edit distance was 3 or less for all noise percentages, while for word noise, the average edit distance was higher for several combinations of word-noise percentage and BN size. Further, performance deteriorated as the percentage of word noise increased.</Paragraph> <Paragraph position="4"> The impact of word noise on performance reinforces our intention to implement a more principled sentence comparison procedure (Section 4.1), with the expectation that it will improve this aspect of our system's performance.</Paragraph> </Section></Paper>