<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1115">
  <Title>THE PENMAN PROJECT ON KNOWLEDGE-BASED MACHINE TRANSLATION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE PENMAN PROJECT ON
KNOWLEDGE-BASED MACHINE TRANSLATION
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The joint development, together with the ULTRA project at New Mexico State University and the Center for Machine Translation at Carnegie Mellon University, of an integrated knowledge-based machine-aided translation system called PANGLOSS. The ISI-specific work includes the development of English sentence generation and sentence planning capabilities and the construction of an Ontology of concepts to act as the semantic lexicon for all modules of the system as a whole. In addition, we continue to enhance Penman's existing generation technology, to collect and develop ancillary knowledge sources and software (such as grammars or bilingual dictionaries and lexicons for German, Japanese, Spanish, and Chinese), and to maintain and distribute Penman.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> During the past year, the generation component of PANGLOSS was installed; PANGLOSS was tested during the first DARPA MT evaluation. This work necessitated the development of code to transfer the output of New Mexico's ULTRA parser to a form suitable for Penman.</Paragraph>
    <Paragraph position="1"> More recently, Penman Project members have been working on the semi-automated construction and acquisition of an Ontology for PANGLOSS. A high-level taxonomy of the basic concepts required for the processing of ULTRA, the CMU software, and Penman was synthesized out of several sources; this 400-odd node taxonomy we call the Ontology Base (OB). Current work involves migrating wordsense names from LDOCE into WordNet using several automatic techniques and then taxonomizing fragments of WordNet under the OB; at the present time, approx. 11,000 concepts have been so taxonomized and another 10,000 are awaiting final placement.</Paragraph>
    <Paragraph position="2"> Our goal is an Ontology organized under the OB of approx. 50,000 items. Toward this goal we acquired WordNet from Princeton and an online copy of Roget's thesaurus.</Paragraph>
    <Paragraph position="3"> To prepare the Ontology to support the processing of other languages, we have been collecting multilingual resources of various types. We have acquired an online Japanese-English dictionary (approx. 50,000 entries with phrases) and several Chinese-English online dictionaries (approximately equal total size), and are in the process of acquiring the Collins bilingual Spanish-English dictionary. We have also established X-windows based display capabilities for Japanese and Chinese, including a Japanese emacs editor and dictionary access interface.</Paragraph>
    <Paragraph position="4"> In other work, the core mapping engine of the Sentence Planning module of PANGLOSS has been constructed and is currently being debugged. The Sentence Planner converts representations of texts written in the Pangloss Interlingua into SPL expressions suitable for Penman.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="421" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> Three principal efforts are planned for the coming year: the construction of the 50,000-node Ontology, the development of English, Japanese, and Spanish lexicons associated with the Ontology, and the development and implementation of several microtheories for use in sentence planning.</Paragraph>
    <Paragraph position="1"> The main problem in Ontology construction is the automated acquisition under Ontology nodes of semantic information, as used during semantic analysis and lexical selection. A number of methods of extracting such information from dictionaries, text corpora, and other resources are being developed, as well as a system to assist the acquisition of remaining information by humans. A problem in automatically constructing lexicons of various languages is the association of a wordsense in a dictionary with its correct Ontology item (if such exists) or the creation of a new Ontology item and its correct placement in the Ontology. Variations of the algorithms used for associating LDOCE wordsenses with WordNet items will be used for this task, operating on the bilingual dictionaries we have collected.</Paragraph>
    <Paragraph position="2"> The main problems facing the Sentence Planner are the development of microtheories for lexical selection, reference (including pronominalization), and theme development, to ensure high quality and coherent output.</Paragraph>
  </Section>
</Paper>