<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1016">
  <Title>Active Learning for Statistical Natural Language Parsing</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have examined three entropy-based uncertainty scores to measure the usefulness of a sample in improving a statistical model. We also define a distance between sentences of natural languages. Based on this distance, we are able to quantify concepts such as sentence density and the homogeneity of a corpus. Sentence clustering algorithms are also developed with the help of these concepts. Armed with uncertainty scores and sentence clusters, we have developed sample selection algorithms that achieve significant savings in labeling cost: we have shown that we can reach the same level of parsing accuracy with one-third of the training data required by random selection. While we have shown the importance of both the confidence score and modeling the distribution of the sample space, it is not clear whether this is the best way to combine or reconcile the two. It would be desirable to have a single number to rank candidate sentences. We also want to test the algorithms developed here on other domains (e.g., the Wall Street Journal corpus). Improving the speed of sentence clustering is also worthwhile.</Paragraph>
  </Section>
</Paper>