File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/04/c04-1022_relat.xml
Size: 2,384 bytes
Last Modified: 2025-10-06 14:15:39
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1022"> <Title>Automatic Learning of Language Model Structure</Title> <Section position="7" start_page="0" end_page="0" type="relat"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> Various previous studies have investigated the feasibility of using units other than words for language modeling (e.g. (Geutner, 1995; Çarki et al., 2000; Kiecza et al., 1999)). However, in all of these studies words were decomposed into linear sequences of morphs or morph-like units, using either linguistic knowledge or data-driven techniques. Standard language models were then trained on the decomposed representations. The resulting models essentially express statistical relationships between morphs, such as stems and affixes. For this reason, a context larger than that provided by a trigram is typically required, which quickly leads to data-sparsity. In contrast to these approaches, factored language models encode morphological knowledge not by altering the linear segmentation of words but by encoding words as parallel bundles of features.</Paragraph> <Paragraph position="1"> The general possibility of using multiple conditioning variables (including variables other than words) has also been investigated by (Dupont and Rosenfeld, 1997; Gildea, 2001; Wang, 2003; Zitouni et al., 2003). Mostly, the additional variables were general word classes derived by data-driven clustering procedures, which were then arranged in a backoff lattice or graph similar to the present procedure. All of these studies assume a fixed path through the graph, which is usually obtained by an ordering from more specific probability distributions to more general distributions. Some schemes also allow two or more paths to be combined by weighted interpolation. 
FLMs, by contrast, allow different paths to be chosen at run-time, they support a wider range of combination methods for probability estimates from different paths, and they offer a choice of different discounting options at every node in the backoff graph. Most importantly, however, the present study is to our knowledge the first to describe an entirely data-driven procedure for identifying the best combination of parameter choices. The success of this method will facilitate the rapid development of FLMs for different tasks in the future.</Paragraph> </Section> </Paper>