File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/h94-1119_metho.xml
Size: 4,522 bytes
Last Modified: 2025-10-06 14:13:56
<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1119"> <Title>RESEARCH IN NATURAL LANGUAGE PROCESSING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> RESEARCH IN NATURAL LANGUAGE PROCESSING </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> The main objective is to develop robust methods for the understanding and generation of both written and spoken human language, including but not limited to English. Penn is pursuing development of: (1) New mathematical and computational frameworks which are highly constrained, yet adequate to allow a simple, concise description of complex linguistic phenomena. These new frameworks are tested by the explicit encoding within each framework of a wide range of phenomena across a diverse set of human languages. (2) Both statistical and symbolic learning methods which automatically extract and effectively utilize the implicit linguistic knowledge in the Penn Treebank and the corpora of the Linguistic Data Consortium. These techniques have been tested against the performance of the best current methods.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> * In a lexicalized grammar such as the lexicalized tree-adjoining grammar (LTAG), each lexical item is associated with one or more elementary trees (structures), called supertags. We have developed techniques to eliminate or substantially reduce the supertag assignment ambiguity by using local lexical dependencies and their distribution, prior to parsing. After this step, only the substitutions and adjoinings need to be explicitly indicated to complete parsing. Preliminary experiments on short fragments show a success rate of 88%, with experiments continuing on full sentences from WSJ material. 
</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> * The Information-Based Intonation Synthesis (IBIS) </SectionTitle> <Paragraph position="0"> spoken reply system has been extended with a richer semantics for the assignment of stress on the basis of contrast in the domain of discourse. Synthesis of spoken responses as speech waves bearing an intonation contour appropriate to the context of utterance has thereby been considerably improved.</Paragraph> <Paragraph position="1"> * A weakly supervised symbolic learning algorithm called Error Based Transformation Learning has been developed that matches or beats the performance of the best standard methods for a range of key language analysis tasks. This method has also been used for part-of-speech tagging for several languages other than English with very good results.</Paragraph> <Paragraph position="2"> * A new algorithm for word-sense determination performs at least as well as existing algorithms while using only a window of five words around the target word, as opposed to 100 words for the existing methods.</Paragraph> </Section> <Section position="5" start_page="0" end_page="476" type="metho"> <SectionTitle> PLANS </SectionTitle> <Paragraph position="0"> Apply part-of-speech disambiguation strategies to the disambiguation of lexical category assignments to words in a combinatory categorial parser.</Paragraph> <Paragraph position="1"> Port the IBIS spoken response generator to the larger domain involved in the task of critiquing medical diagnosis by an expert system.</Paragraph> <Paragraph position="2"> Explore statistical morphology induction, lexical disambiguation, and language modeling with stochastic dependency grammars.</Paragraph> <Paragraph position="3"> Test the XTAG system on a corpus and build a TAG-parsed corpus to serve as the basis for statistical experiments with the TAG grammar and parser.</Paragraph> <Paragraph position="4"> Contribute to 
a model of limited processing for discourse, using LDC corpora as the basis for an empirical analysis of bottom-up cues to discourse structure, such as variation in the forms of referring expressions, and prosodic marking by topline and baseline variation.</Paragraph> <Paragraph position="5"> Develop part-of-speech taggers and morphological learners for a range of languages other than English. Develop the 'strategic' or discourse-planning component of the spoken reply system.</Paragraph> </Section> </Paper>