<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1096">
<Title>Wordform- and class-based prediction of the components of German nominal compounds in an AAC system</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> N-gram language modeling techniques have been successfully embedded in a number of natural language processing applications, including word predictors for augmentative and alternative communication (AAC). N-gram-based techniques rely crucially on the assumption that the large majority of words to be predicted have also occurred in the corpus used to train the models.</Paragraph>
<Paragraph position="1"> Productive word formation by compounding in languages such as German, Dutch, the Scandinavian languages and Greek, where compounds are commonly written as single orthographic words, is problematic for this assumption.</Paragraph>
<Paragraph position="2"> Productive compounding implies that a sizeable number of new words will constantly be added to the language. Such words cannot, in principle, be contained in any already existing training corpus, no matter how large. Moreover, the training corpus itself is likely to contain a sizeable number of newly formed compounds that, as such, will have an extremely low frequency, causing data sparseness problems.</Paragraph>
<Paragraph position="3"> New compounds, however, differ from other types of new/rare words in that, while they are rare, they can typically be decomposed into more common smaller units (the words that were put together to form them). For example, in the corpus we analyzed, Abend 'evening' and Sitzung 'session', the two components of the German compound Abendsitzung 'evening session', are much more frequent words than the compound itself. Thus, a natural way to handle productively formed compounds is to treat them not as primitive units, but as the concatenation of their components (see the sketch later in this section).</Paragraph>
<Paragraph position="4"> A model of this sort will be able to predict newly formed compounds that never occurred in the training corpus, as long as they can be analyzed as the concatenation of constituents that did occur in the training corpus. Moreover, a model of this sort avoids the specific type of data sparseness problem caused by newly formed compounds in the training corpus, since it collects statistics based on their (typically more frequent) components.</Paragraph>
<Paragraph position="5"> Building upon previous work (Spies, 1995; Carter et al., 1996; Fetter, 1998; Larson et al., 2000), Baroni et al. (2002) reported encouraging results obtained with a model in which two-element German nominal compounds are predicted by treating them as the concatenation of a modifier (left element) and a head (right element).</Paragraph>
<Paragraph position="6"> Here, we report on further improvements to this model, obtained by adding a class-based bigram term to head prediction. As far as we know, this is the first time that semantic classes automatically extracted from the training corpus have been used to enhance compound prediction, independently of the domain of application of the prediction model.</Paragraph>
<Paragraph position="7"> Moreover, we present the results of preliminary experiments we conducted on the integration of compound prediction and simple word prediction within the AAC word prediction task.</Paragraph>
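As an informal illustration of the splitting idea, the following Python sketch scores a two-element compound from component statistics rather than whole-word counts. This is a minimal sketch over invented toy counts, not the authors' implementation; in particular, the class-based head term uses the standard class-bigram decomposition P(class_h | class_m) * P(head | class_h) merely as a stand-in for the model detailed in section 4, and all identifiers, counts, and classes are hypothetical.

```python
from collections import Counter

# Toy component counts, as if obtained by splitting every two-element
# compound in a training corpus into modifier + head (invented numbers).
mod_counts = Counter({"Abend": 120, "Morgen": 95, "Arbeits": 200})
head_counts = Counter({"Sitzung": 140, "Essen": 60, "Zeitung": 80})
mod_total = sum(mod_counts.values())

# Word-based bigram counts over (modifier, head) pairs (invented).
pair_counts = Counter({("Arbeits", "Sitzung"): 25, ("Morgen", "Essen"): 12})

# A hypothetical hard clustering of component words into semantic classes,
# standing in for classes induced automatically from the training corpus.
word2class = {"Abend": "TIME", "Morgen": "TIME", "Arbeits": "WORK",
              "Sitzung": "EVENT", "Essen": "EVENT", "Zeitung": "MEDIA"}
class_pair_counts = Counter({("TIME", "EVENT"): 40, ("WORK", "EVENT"): 30})
class_counts = Counter({"TIME": 215, "WORK": 200, "EVENT": 200, "MEDIA": 80})

def p_head_given_mod(head: str, mod: str, lam: float = 0.7) -> float:
    """Interpolate a word bigram with a class-based bigram term."""
    p_word = pair_counts[(mod, head)] / mod_counts[mod] if mod_counts[mod] else 0.0
    cm, ch = word2class[mod], word2class[head]
    p_class = (class_pair_counts[(cm, ch)] / class_counts[cm]) * \
              (head_counts[head] / class_counts[ch])
    return lam * p_word + (1 - lam) * p_class

def p_compound(mod: str, head: str) -> float:
    """P(compound) ~ P(modifier) * P(head | modifier)."""
    return (mod_counts[mod] / mod_total) * p_head_given_mod(head, mod)

# 'Abendsitzung' never occurred as a whole word, and the pair
# ("Abend", "Sitzung") has zero count, but the class-based term still
# gives it mass because TIME modifiers often combine with EVENT heads.
print(p_compound("Abend", "Sitzung"))
```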
<Paragraph position="8"> The remainder of this paper is organized as follows. In section 2, we describe the AAC word prediction task. In section 3, we describe the basic properties of German compounds. In section 4, we present our split compound prediction model, focusing on the new class-based head prediction component. In section 5, we report the results of simulations run with the enhanced compound prediction model. In section 6, we report on our preliminary experiments with the integration of compound and simple word prediction. Finally, in section 7, we summarize the main results we obtained and indicate directions for further work.</Paragraph>
<SectionTitle> 2 Word prediction for AAC </SectionTitle>
<Paragraph position="9"> Word prediction systems based on n-gram statistics are an important component of AAC devices, i.e., software and possibly hardware typing aids for disabled users (Copestake, 1997; Carlberger, 1998).</Paragraph>
<Paragraph position="10"> Word predictors provide the user with a prediction window, i.e., a menu that, at any time, lists the most likely next-word candidates, given the input that the user has typed up to the current character.</Paragraph>
<Paragraph position="11"> If the word that the user intends to type next is in the prediction window, the user can select it from there. Otherwise, the user keeps typing letters until the target word appears in the prediction window (or until she finishes typing the word).</Paragraph>
<Paragraph position="12"> The (percentage) keystroke savings rate (ksr) is a standard measure used in AAC research to evaluate word predictors. The ksr can be thought of as the percentage of keystrokes that a "perfect" user would save by employing the relevant word predictor to type the test set, over the total number of keystrokes that are needed to type the test set without using the word predictor.</Paragraph>
<Paragraph position="13"> Usually, the ksr is defined by</Paragraph>
<Paragraph position="14"> ksr = (1 - (ki + ks) / kn) * 100 </Paragraph>
<Paragraph position="15"> where ki is the number of input characters actually typed, ks is the number of keystrokes needed to select among the predictions presented by the model, and kn is the number of keystrokes that would be needed if the whole text were typed without any prediction aid. Typically, the user will need one keystroke to select among the predictions, and thus we assume that ks equals 1.(1) The ksr is influenced not only by the quality of the prediction model but also by the size of the prediction window. In our simulations, we use a 7-word prediction window (a small worked example of the ksr computation is given at the end of this section).</Paragraph>
<Paragraph position="16"> The ksr is not a function of perplexity, but it is generally true that there is an inverse correlation between ksr and perplexity (Carlberger, 1998).</Paragraph>
<Paragraph position="17"> (1) In the split compound model, the user needs one keystroke to select the modifier and one keystroke to select the head.</Paragraph>
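To make the definition concrete, here is a small worked example of the ksr formula in Python. The numbers are invented for illustration only and are not taken from the paper's experiments.

```python
def ksr(ki: int, ks: int, kn: int) -> float:
    """Percentage keystroke savings rate: ksr = (1 - (ki + ks) / kn) * 100."""
    return (1 - (ki + ks) / kn) * 100

# Hypothetical example: the test text needs kn = 100 keystrokes when typed
# in full. With the predictor, the user types ki = 55 characters and spends
# ks = 15 keystrokes selecting words from the prediction window (one
# selection keystroke per predicted word).
print(ksr(ki=55, ks=15, kn=100))  # -> 30.0, i.e. 30% of keystrokes saved
```
</Section>
</Paper>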