<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2109"> <Title>Backward Beam Search Algorithm for Dependency Analysis of Japanese</Title> <Section position="6" start_page="756" end_page="759" type="evalu"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> This section reports our experiments and evaluations. We use the Kyoto University Corpus (version 2) (Kurohashi et al., 1997), a hand-created Japanese corpus with POS-tags, bunsetsu segments and dependency information.</Paragraph> <Paragraph position="1"> The sentences in the articles from January 1, 1994 to January 8, 1994 (7,960 sentences) are used for the training of the ME model, and the sentences in the articles of January 9, 1994 (1,246 sentences) are used for the evaluation.</Paragraph> <Paragraph position="2"> The sentences in the articles of January 10, 1994 are kept for future evaluations.</Paragraph> <Section position="1" start_page="756" end_page="757" type="sub_section"> <SectionTitle> 4.1 Basic Result </SectionTitle> <Paragraph position="0"> This section shows the evaluation result of our system on the Kyoto University corpus. The beam search width is set to 1; in other words, the system runs deterministically. Here, 'dependency accuracy' is the percentage of correctly analyzed dependencies out of all dependencies. 'Sentence accuracy' is the percentage of the sentences in which all the dependencies are analyzed correctly. Table 2 shows the dependency accuracy and sentence accuracy for beam widths 1 through 20. The difference is very small, but the best accuracy is obtained when the beam width is 11 (for the dependency accuracy), and 2 and 3 (for the sentence accuracy). This shows that there are cases where the analysis with the highest product of probabilities is not correct, but the analysis decided at each stage is correct. 
This is a very interesting result of our experiment, and it is related to assumption 4 regarding Japanese dependency, mentioned earlier.</Paragraph> <Paragraph position="1"> This suggests that when we analyze a Japanese sentence backwards, we can do it deterministically without great loss of accuracy. Table 3 shows where the analysis with beam width 1 appears among the analyses with beam width 200. It shows that most deterministic analyses appear as the best analysis in the non-deterministic analyses. Also, among the deterministic analyses which are correct (503 sentences), 498 sentences (99.0%) have the same analysis at the best rank in the 200-beam-width analyses. (These are followed by 3 sentences at the second rank, and 1 sentence each at the third and fifth ranks.)</Paragraph> <Paragraph position="2"> Length: the length of the input sentence in segments. W: the beam search width. C: the candidate list; C for each segment keeps the top W partial analyses from that segment to the last segment. &lt;Initial Operation&gt; The second segment from the end depends on the last segment. This analysis is stored in C[Length-1].</Paragraph> <Paragraph position="3"> &lt;Inductive Operation&gt; Assume the analysis up to the (M+1)-th segment has been finished. For each candidate 'c' in C[M+1], do the following operation. Compute the possible dependencies of the M-th segment compatible with 'c'. For each dependency, create a new candidate 'd' by adding the dependency to 'c'. Calculate the probability of 'd'. If C[M] has fewer than W entries, add 'd' to C[M]; else if the probability of 'd' &gt; the probability of the least probable entry of C[M], replace this entry by 'd'; else ignore 'd'. When the operation finishes for all candidates in C[M+1], proceed to the analysis of the (M-1)-th segment. Repeat the operation until the first segment is analyzed. 
The best analysis for the sentence is the best candidate in C[1].</Paragraph> <Paragraph position="4"> These rank results mean that in most cases the analysis with the highest probability at each stage also has the highest probability as a whole. This is related to assumption 4. The best analysis with the left context and the best analysis without the left context are the same 95% of the time in general, and 99% of the time if the analysis is correct. These numbers are much higher than in our human experiment mentioned in the earlier footnote (note that the number here is the percentage in terms of sentences, and the number in the footnote is the percentage in terms of segments). It means that we may get good accuracy even without left contexts in analyzing Japanese dependencies.</Paragraph> </Section> <Section position="2" start_page="757" end_page="758" type="sub_section"> <SectionTitle> 4.3 N-Best accuracy </SectionTitle> <Paragraph position="0"> As we can generate N-best results, we measured N-best sentence accuracy. Figure 3 shows the N-best accuracy. N-best accuracy is the percentage of the sentences which have the correct analysis among their top N analyses. By setting a large beam width, we can observe N-best accuracy. The results shown are for a beam width of 20. When we set N = 20, 78.5% of the sentences have the correct analysis in the top 20 analyses. If we have an ideal system for finding the correct analysis among them, which may use semantic or context information, we can have a very accurate analyzer.</Paragraph> <Paragraph position="1"> We can make two interesting observations from the result. The accuracy of the 1-best analysis is about 40%, which is more than half of the accuracy of the 20-best analysis. 
This shows that although the system is not perfect, the computation of the probabilities is probably good enough to find the correct analysis at the top rank.</Paragraph> <Paragraph position="2"> The other point is that the accuracy is saturated at around 80%. Improvement over 80% seems very difficult even if we use a very large beam width W. (If we set W to the number of all possible combinations, which means almost L! for sentence length L, we can get 100% N-best accuracy, but this is not worth considering.) This suggests that we have missed something important. In particular, from our investigation of the result, we believe that coordinate structure is one of the most important factors to improve the accuracy. This remains one area of future work.</Paragraph> </Section> <Section position="3" start_page="758" end_page="759" type="sub_section"> <SectionTitle> 4.4 Speed of the analysis </SectionTitle> <Paragraph position="0"> Based on the formal algorithm, the analysis time can be estimated as proportional to the square of the input sentence length. Figure 4 shows the relationship between the analysis time and the sentence length when we set the beam width to 1. We use a Sun Ultra10 machine and the process size is about 8 MB.</Paragraph> <Paragraph position="1"> We can see that the actual analyzing time almost follows the quadratic curve. The average analysis time is 0.03 second and the average sentence length is 10 segments. The analysis time for the longest sentence (41 segments) is 0.29 second. We have not optimized the program in terms of speed, and there is room to shrink the process size.</Paragraph> </Section> </Section> </Paper>