File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-0708_intro.xml
Size: 2,601 bytes
Last Modified: 2025-10-06 14:01:00
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0708"> <Title>Memory-Based Learning for Article Generation</Title> <Section position="4" start_page="0" end_page="43" type="intro"> <SectionTitle> * Visiting CSLI, Stanford University (2000). </SectionTitle> <Paragraph position="0"> † Visiting CSLI, Stanford University (1999-2000).</Paragraph> <Paragraph position="1"> hour. However, given the input sentences, it is not clear how to decide not to generate an article for the subject NP in the output sentence.</Paragraph> <Paragraph position="2"> Another important application is in the field known as augmentative and alternative communication (AAC). In particular, people who have lost the ability to speak sometimes use a text-to-speech generator as a prosthetic device. But most disabilities that affect speech, such as stroke or amyotrophic lateral sclerosis (ALS or Lou Gehrig's disease), also cause some more general motor impairment, which means that prosthesis users cannot achieve a text input rate comparable to normal typing speeds even if they are able to use a keyboard. Many have to rely on a slower physical interface (headstick, head-pointer, eye-tracker, etc.). We are attempting to use a range of NLP technology to improve text input speed for such users. Article choice is particularly important for this application: many AAC users drop articles and resort to a sort of telegraphese, but this degrades comprehension of synthetic speech and contributes to its perception as unnatural and robot-like. Our particular goal is to be able to use an article generator in conjunction with a symbolic generator for AAC (Copestake, 1997; Carroll et al., 1999).</Paragraph> <Paragraph position="3"> In this paper we investigate the use of corpus data to collect statistical generalizations about article use in English so as to be able to generate articles automatically. 
We use data from the Penn Treebank as input to a memory-based learner (TiMBL 3.0; Daelemans et al., 2000) that predicts whether to generate the, a/an, or no article. We discuss a variety of lexical, syntactic and semantic features that play an important role in automated article generation, and compare our results with those of other researchers. The paper is structured as follows. Section 2 relates our work to that of others. Section 3 introduces the features we use. Section 4 introduces the learning method we use. We discuss our results in Section 5 and suggest some directions for future research, then conclude with some final remarks in Section 6.</Paragraph> </Section> </Paper>