<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0707">
<Title>Probabilistic Models for PP-attachment Resolution and NP Analysis</Title>
<Section position="6" start_page="0" end_page="0" type="evalu">
<SectionTitle> 7 Discussion </SectionTitle>
<Paragraph position="0"> There are two main conclusions we can draw from the preceding results. The first is that the results are disappointing, insofar as we were not able to clearly outperform our baseline. The second is that the best results are achieved with the complete model integrating subcategorization information.</Paragraph>
<Paragraph position="1"> With respect to our model, the difference between experiment 1 and experiment 2 shows that the closest sister brings valuable information for establishing the best parse of the chain of nuclei. Even though this information was derived from ambiguous configurations, the extraction heuristics we used do capture actual dependencies, which validates our assumptions 6 and 7. The integration of subcategorization frame information in experiments 3 and 4 does not improve the results, indicating that most of the information is already carried by bigram lexical statistics in the corresponding version of the general model. Furthermore, the results obtained with subcategorization information alone for parsing V N P sequences do not compare well with an approach based solely on bigram statistics, thus validating the hypothesis behind most work in probabilistic parsing that world knowledge can be approximated, to a certain extent, by bigram statistics. The main jump in performance is achieved with the use of semantic classes. All the experiments involving semantic classes yield results above the baseline, thus indicating the well-foundedness of models making use of them. Even though our semantic resource is incomplete (out of the 70000 different tokens our corpus comprises, only 20000 have an entry in our semantic lexicon; this large number of tokens is explained by the fact that the lexicon used for tokenization and tagging integrates many multi-word expressions which are not part of the semantic lexicon), its coverage is still sufficient to constrain word distributions and partly solve the data sparseness problem. The results obtained in previous work relying on semantic classes are above ours (around 0.82 for (Brill and Resnik, 1994) and 0.77 for (Lauer and Dras, 1994)), but a direct comparison is difficult inasmuch as only three-word sequences (V N P for (Brill and Resnik, 1994) and N N N for (Lauer and Dras, 1994)) were used for evaluation in those works, and the language studied is English. However, it may well be the case that the semantic resource we use does not compare well, in terms of coverage and homogeneity, with WordNet, the semantic resource usually used.</Paragraph>
<Paragraph position="2"> Several choices we made in the course of developing our model and estimating its parameters now need to be assessed more carefully in light of these first results. First of all, our choice to stick with (almost) accurate information, while it leads to good results for estimating the probability of generating the preposition of a nucleus given its parent nucleus, may well lead us to rely too often on the smoothing parameters alone when estimating other probabilities. This may well be the case for the probability in (12), where bigram statistics extracted with a windowing approach may prove better suited to the task. Furthermore, Laplace smoothing, even though appealing from a theoretical point of view since it can be formalized as assuming a prior over our distributions, may not be fully adequate when the denominator is always low compared to the normalizing constraint, a situation we encounter for equation (12).</Paragraph>
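To make this concrete, consider a schematic add-one (Laplace) estimate with purely illustrative counts (the exact parameterization of equation (12) may differ):

$$\hat{P}(w \mid h) = \frac{c(h,w) + 1}{c(h) + V}$$

where $c(h)$ is the frequency of the conditioning event, $c(h,w)$ the joint frequency, and $V$ the number of outcomes over which the distribution is normalized. With $c(h) = 5$ and $V = 20000$, an attested pair with $c(h,w) = 3$ receives $4/20005 \approx 2.0 \times 10^{-4}$, while an unattested pair receives $1/20005 \approx 5.0 \times 10^{-5}$: a ratio of only 4 to 1, and never more than $c(h)+1$ to 1, since both estimates are dominated by the uniform term $1/V$.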
<Paragraph position="3"> This may result in over-smoothing and thus prevent our model from accurately discriminating between alternative parses. Lastly, (Lauer and Dras, 1994) use a prior over the graphs defined by parse trees to score the different parses. We have assumed a uniform prior over graphs, but the results obtained with our baseline clearly indicate that we should weight them differently.</Paragraph>
</Section>
</Paper>