<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1124"> <Title>IN-DEPTH KNOWLEDGE-BASED MACHINE TRANSLATION</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> IN-DEPTH KNOWLEDGE-BASED MACHINE TRANSLATION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> The goal is to develop an integrated knowledge-based machine-aided translation system, called PANGLOSS, in collaboration with the Center for Machine Translation (CMT) at CMU and the Computing Research Laboratory (CRL) at New Mexico State University. The ISI part of the collaboration is focused initially on providing the system's output capabilities, primarily in English and then in other languages, including (some of) German, Chinese, and Japanese. Additional tasks are the maintenance and continued distribution of the Penman sentence generator and text planner and the development of ancillary knowledge sources and software.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> Members of the project have participated in several aspects of the design and setting up of PANGLOSS and in the overall MT effort. Three major efforts are: 1. Incorporation of language generation: In the first-year version of PANGLOSS, the ULTRA analyzer of CRL is linked to the Penman generator, both embedded in the Translator's Workstation (TWS), which includes several browsing, editing, and other user facilities. A process for converting ULTRA output to Penman input has been developed and is being debugged. 
Approximately 80 ULTRA output sentences (each with approximately 13 variant parses) have been used as a test suite; at present the conversion+Penman system produces roughly 25% correct throughput, 35% identifiable errors (which will be trapped and sent to the user for correction), 15% Penman grammar shortcomings, and 25% miscellaneous problems, mostly involving representational inconsistencies. Current work is focusing on extending the grammar, developing ways of interacting with the user, and ironing out the inconsistencies. Also, work on acquiring the system substrate to support PANGLOSS at ISI has been performed, including software acquisition and various licensing requirements.</Paragraph> <Paragraph position="1"> 2. Interlingua construction: The PANGLOSS Interlingua Committee recently began constructing an Interlingua, using as a starting point the terminologies developed by the three partners, namely ONTOS, the ontology developed at the CMT; IR, the Intermediate Representation terminology used at the CRL; and the Penman Upper Model. Both ONTOS and IR have already been used to support Interlingual machine translation, while variants of the Upper Model suited for German, Japanese, and Chinese are under construction at GMD/IPSI (Germany) and the University of Sydney (Australia). An initial specification of the Interlingua has been developed. A set of issues to be addressed next, including the notation and the substrate for the Interlingua, has been drawn up.</Paragraph> <Paragraph position="2"> 3. Committee work: An overall MT Coordinating Committee (MTCC) has been formed, and a set of specialized committees with specific tasks has been created under its supervision. The first MTCC meeting will be held at IBM Yorktown Heights on February 26-27. 
The machine translation effort's Evaluation Committee has produced three documents: one outlining the general methodology of evaluation, one describing the particulars of the upcoming MT system evaluation, and one describing the particulars of the Dry Run evaluation of the IBM system CANDIDE held in February.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> Version Alpha of PANGLOSS will be completed, tested, and evaluated. An early test with real-world users will take place in March.</Paragraph> <Paragraph position="1"> The aspects in which Penman needs strengthening to handle the needs of the domain will be addressed, including portions of the grammar and the lexicon. A lexicon of at least 5,000 words will be in place in Penman by June. The kinds of human assistance possible during input preparation and generation will be worked out in detail and incorporated in the TWS. Penman and its ancillary resources will be embedded into the TWS.</Paragraph> <Paragraph position="2"> The first version of the PANGLOSS interlingua, including an initial domain model, will be put in place.</Paragraph> <Paragraph position="3"> The MT system evaluation will be organized and will take place around the end of April or mid-May.</Paragraph> </Section> </Paper>