<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1016"> <Title>Active Learning for Statistical Natural Language Parsing</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming.</Paragraph> <Paragraph position="1"> This paper presents a method to reduce this demand using active learning, which selects which samples to annotate instead of blindly annotating the whole training corpus.</Paragraph> <Paragraph position="2"> Sample selection for annotation is based upon representativeness and usefulness. A model-based distance is proposed to measure the difference between two sentences and their most likely parse trees. Based on this distance, the active learning process analyzes the sample distribution by clustering and calculates the density of each sample to quantify its representativeness. Furthermore, a sentence is deemed useful if the existing model is highly uncertain about its parses, where uncertainty is measured by various entropy-based scores.</Paragraph> <Paragraph position="3"> Experiments are carried out on the shallow semantic parser of an air travel dialog system.</Paragraph> <Paragraph position="4"> Our results show that, for about the same parsing accuracy, we need to annotate only a third of the samples compared to the usual random selection method.</Paragraph> </Section> </Paper>