File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1035_intro.xml
Size: 3,321 bytes
Last Modified: 2025-10-06 14:02:06
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1035"> <Title>Classifying Ellipsis in Dialogue: A Machine Learning Approach</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The phenomenon of sluicing (bare wh-phrases that exhibit a sentential meaning) constitutes an empirically important construction which has been understudied from both theoretical and computational perspectives. Most theoretical analyses (e.g. Ross, 1969; Chung et al., 1995) focus on embedded sluices considered out of any dialogue context. They rarely look at direct sluices, i.e. sluices used in queries to request further elucidation of quantified parameters (e.g. (1a)). With a few isolated exceptions, these analyses also ignore a class of uses we refer to (following (Ginzburg and Sag, 2001) (G&S)) as reprise sluices. These are used to request clarification of reference of a constituent in a partially understood utterance, as in (1b).</Paragraph> <Paragraph position="1"> (1) a. Cassie: I know someone who's a good kisser. Catherine: Who? [KP4, 512] (see footnote 1) b. Sue: You were getting a real panic then. Angela: When? [KB6, 1888]</Paragraph> <Paragraph position="2"> Our corpus investigation shows that the combined set of direct and reprise sluices constitutes more than 75% of all sluices in the British National Corpus (BNC). In fact, they make up approximately 33% of all wh-queries in the BNC. (Footnote 1: This notation indicates the British National Corpus file (KP4) and the sluice sentence number (512).)</Paragraph> <Paragraph position="3"> In previous work (Fernández et al., to appear), we implemented G&S's analysis of direct sluices as part of an interpretation module in a dialogue system. 
In this paper we apply machine learning techniques to extract rules for sluice classification in dialogue.</Paragraph> <Paragraph position="4"> In Section 2 we present our corpus study of classifying sluices into dialogue types and discuss the methodology we used in this study.</Paragraph> <Paragraph position="5"> Section 3 analyses the distribution patterns we identify and considers possible explanations for these patterns. In Section 4 we identify a number of heuristic principles for classifying each sluice dialogue type and formulate these principles as probability-weighted Horn clauses. In Section 5, we then use the predicates of these clauses as features to annotate our corpus samples of sluices, and run two machine learning algorithms on these data sets. The first machine learner used, SLIPPER, extracts optimised rules for identifying sluice dialogue types that closely resemble our Horn clause principles. The second, TiMBL, uses a memory-based machine learning procedure to classify a sluice by generalising over similar environments in which the sluice occurs in a training set. Both algorithms performed well, yielding similar success rates of approximately 90%. This suggests that the features in terms of which we formulated our heuristic principles for classifying sluices were well motivated, and that both learning algorithms are well suited to the task of dialogue act classification for fragments on the basis of these features. Finally, we present our conclusions and future work in Section 6.</Paragraph> </Section></Paper>