<?xml version="1.0" standalone="yes"?>
<Paper uid="C73-1022">
  <Title>JIM MATHIAS COOPERATIVE FILE IMPROVEMENT AND USE OF A COMPUTER-BASED CHINESE/ENGLISH DICTIONARY</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
JIM MATHIAS
COOPERATIVE FILE IMPROVEMENT AND USE
OF A COMPUTER-BASED CHINESE/ENGLISH DICTIONARY
</SectionTitle>
    <Paragraph position="0"> The CETA (Chinese-English Translation Assistance) Group is an independent organization formed to coordinate the development of Chinese-to-English translation aids and data analysis techniques. It began as an ad hoc body of individuals from State, Commerce, Labor, the Office of Education, Defense, Intelligence, the Voice of America, the Foreign Service Institute, the Defense Language Institute, the National Science Foundation and the Library of Congress. As interest spread to the scholarly community, academic participation broadened to include 43 US and international universities. CETA is developing a computer-based Chinese-English dictionary of current standard terms. It is also exploring related topics such as computer processing of Chinese research data, machine translation, and use of the CETA Dictionary file in an on-line computer aid system.</Paragraph>
    <Paragraph position="1"> Research and development in computing at United States universities has made it possible to generate Chinese characters by computer. Using this capability, CETA printed a 90,000-term dictionary file of Chinese-English entries and developed a cooperative international process for refining and enriching the file. This process is called the CETA File Improvement System. It is founded on government/academic/private cooperation and is designed both to edit existing material and to add new material. It improves the file collectively by sharing linguistic tasks widely and by using computers to store the data and process changes.</Paragraph>
    <Paragraph position="2"> Thus far, thirty-seven government and forty-three academic linguists and language specialists have committed themselves to reviewing and improving the file. In return, they receive the printed copy of the dictionary plus change pages as these are generated. Over 51,000 suggested improvements have been submitted and evaluated and are awaiting update. The File Improvement System proceeds in cycles, each applying progressively stricter standards of review.</Paragraph>
    <Paragraph position="3"> The file will be reprinted on three- to five-year cycles, with change pages issued in the interim so that participants always have the most current material.</Paragraph>
    <Paragraph position="4"> When CETA examined the problem of producing a dictionary, it concluded that significant results could be achieved only by sharing the many tasks involved. The problem was daunting. However, the prospect of improving dictionaries without waiting 20 years for new editions was a strong incentive. The CETA Group issued a hard copy of the 90,000-term Chinese-English listing, called The CETA Computer-Based Chinese-English Dictionary. It was produced as a &quot;living&quot; file that could be changed constantly. It was printed by computer, and the principal advantages were the ability to print Chinese characters without typesetting and economy of effort in manipulating the data. The computer could sort the file in different sequences, make corrections or additions at will, extract particular subsets, and produce a hard-copy image of file materials. In short, it was possible to take the computer-produced manuscript and give parts of it to volunteers to review, correct, or supplement with new information. It was also possible to develop methods that let reviewers prepare changes easily and let CETA evaluate those changes and then update the manuscript file.</Paragraph>
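The file operations described above include re-sorting, extracting subsets, and dividing the manuscript among volunteer reviewers. A minimal Python sketch can illustrate them. The paper does not describe CETA's record layout or programs. The `Entry` fields, the function names, and the sample entries below are hypothetical.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Entry:
    # Hypothetical record layout; the paper does not give CETA's field names.
    term: str          # Chinese term
    romanization: str  # romanized reading
    gloss: str         # English meaning
    source: str        # code for the dictionary the entry came from


def sorted_by(entries: Iterable[Entry], field: str) -> list[Entry]:
    """Return the entries ordered by one field, e.g. 'romanization' or 'gloss'."""
    return sorted(entries, key=lambda e: getattr(e, field))


def extract(entries: Iterable[Entry], keep: Callable[[Entry], bool]) -> list[Entry]:
    """Return the subset of entries for which keep(entry) is true."""
    return [e for e in entries if keep(e)]


def review_batches(entries: Iterable[Entry], size: int) -> Iterator[list[Entry]]:
    """Yield consecutive slices of the file, one slice per volunteer reviewer."""
    it = iter(entries)
    while batch := list(islice(it, size)):
        yield batch


# Illustrative sample data, not taken from the CETA file.
FILE = [
    Entry("词典", "cidian", "dictionary", "A"),
    Entry("翻译", "fanyi", "translation", "B"),
    Entry("汉字", "hanzi", "Chinese character", "A"),
]
```

For example, `sorted_by(FILE, "romanization")` yields a romanization-ordered listing. `review_batches(FILE, 2)` splits the same file into two review assignments.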
    <Paragraph position="5"> The first review cycle, which checked the file for gross errors and duplication, has been completed. Reviewers received a set of instructions to guide their review of the dictionary material and their preparation of changes or additions. Briefly, processing improvements to the CETA Dictionary takes six steps:

1. receipt of suggestions for change or addition;
2. preparation for keypunching;
3. computer generation of a prooflist showing both the original and the changed entries;
4. manual review of the prooflist;
5. computer selection of approved changes;
6. update of the computer dictionary file.

These steps ensure that every change to the master file is examined at least once. Questionable changes can be held for later review so that they do not delay updates. As a mechanism, the improvement system runs smoothly, and under ideal conditions the computer file can be changed in a matter of minutes. Conditions are usually less than ideal. Even so, the file can still be updated and current information provided within a few months. The usual cycles are 10 years to build a dictionary and 20 years to reissue it.</Paragraph>
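The change-processing steps above can be sketched as follows. This is a minimal sketch that assumes a hypothetical record in which each keypunched change names an entry and gives a new English gloss, or none for a deletion. `prooflist` pairs each change with the original entry for manual review. `update` applies only approved changes and returns the held ones for a later pass. All names and sample data are illustrative, not CETA's actual programs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    APPROVED = "approved"
    HELD = "held"          # questionable: kept back for a later review pass
    REJECTED = "rejected"


@dataclass(frozen=True)
class Change:
    # Hypothetical change record, as it might look after keypunching.
    term: str                 # key of the entry to add, modify, or delete
    new_gloss: Optional[str]  # None means "delete this entry"


def prooflist(master: dict[str, str], changes: list[Change]) -> list[tuple[Change, Optional[str]]]:
    """Pair each change with the original gloss (None for a new entry) for manual review."""
    return [(c, master.get(c.term)) for c in changes]


def update(master: dict[str, str], changes: list[Change],
           verdicts: list[Verdict]) -> tuple[dict[str, str], list[Change]]:
    """Apply approved changes only; return the new file and the held changes."""
    new_file = dict(master)  # copy, so the old master file stays untouched
    held = []
    for change, verdict in zip(changes, verdicts):
        if verdict is Verdict.APPROVED:
            if change.new_gloss is None:
                new_file.pop(change.term, None)
            else:
                new_file[change.term] = change.new_gloss
        elif verdict is Verdict.HELD:
            held.append(change)
        # REJECTED changes are simply dropped.
    return new_file, held


MASTER = {"词典": "dictionary", "翻译": "translate"}
CHANGES = [
    Change("翻译", "translation; to translate"),  # modification
    Change("汉字", "Chinese character"),          # addition
    Change("词典", None),                          # deletion judged questionable
]
new_file, held = update(MASTER, CHANGES, [Verdict.APPROVED, Verdict.APPROVED, Verdict.HELD])
```

The master is copied rather than modified in place. As a result, the previous version remains available if an update has to be backed out.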
    <Paragraph position="6"> CETA has now received and prepared for update a total of 51,000 changes to the 90,000-term file. Because additions outnumber deletions, the new file will be a few percent larger.</Paragraph>
    <Paragraph position="8"> Fig. 1. Computer Printed Chinese Characters.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <Paragraph position="0"> More important, the most serious errors will have been removed. The file will then be ready for the next cycle, which will emphasize:

- further enrichment of the lexical content;
- addition of grammatical information;
- incorporation of restrictive and stylistic labels;
- identification of agglutinated phrases.

The second printing of the dictionary manuscript will include Pin Yin romanization with tones and telecode numbers, as well as the customary English gloss and source information. The character vector file has been significantly updated and can now draw approximately 10,500 characters. It will continue to be updated through the dictionary review cycles (see Fig. 1, Computer Printed Chinese Characters).</Paragraph>
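The paper does not say how the character vector file encodes characters. As a minimal sketch, assume that each character is stored as a list of strokes. Each stroke is a polyline on a small integer grid, and drawing means rasterizing its line segments. Here Bresenham's line algorithm (a standard technique, not necessarily the one CETA used) marks cells in a text grid that stands in for a printer or plotter. The stroke coordinates for 十 are hypothetical.

```python
def draw_line(grid: list[list[str]], x0: int, y0: int, x1: int, y1: int) -> None:
    """Mark every grid cell on the segment from (x0, y0) to (x1, y1) (Bresenham's algorithm)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx + dy
    while True:
        grid[y0][x0] = "#"
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:   # step in x
            err += dy
            x0 += sx
        if dx >= e2:   # step in y
            err += dx
            y0 += sy


def render(strokes: list[list[tuple[int, int]]], size: int = 7) -> list[str]:
    """Draw each stroke (a polyline of grid points) onto a size-by-size text raster."""
    grid = [["."] * size for _ in range(size)]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            draw_line(grid, x0, y0, x1, y1)
    return ["".join(row) for row in grid]


# Hypothetical vector record for 十 (shí, "ten"): one horizontal and one vertical stroke.
SHI = [[(0, 3), (6, 3)], [(3, 0), (3, 6)]]
```

Printing the rows of `render(SHI)` one per line shows the character as a cross of `#` cells. Because the glyph is stored as vectors rather than as a fixed bitmap, the same record could be redrawn at any grid size.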
    <Paragraph position="1"> The file will also serve as the core of an on-line computer aid. Prototype computer-aid functions have been developed that illustrate how a computer file can be used interactively to help a translator. They accept input by telecode and by romanization; graphic input is simulated. A cathode ray tube displays the following:

- the Chinese characters;
- their romanizations (Pin Yin, Wade-Giles, Yale);
- the radical number plus residual stroke count;
- the English meaning of the whole string;
- the meanings of its segments.

An automatic segmenting function has also been developed. It breaks a string of characters into single characters and into segments of consecutive characters, so that meaning can be synthesized from the component parts.</Paragraph>
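The paper does not specify how the automatic segmenting function works. One common way to break a string into multi-character segments is forward maximum matching against the dictionary, sketched below under that assumption. The lexicon and its glosses are hypothetical. Like the prototype display described above, the sketch returns meanings both for the segments and for the single characters.

```python
# Hypothetical mini-lexicon; glosses are illustrative.
LEXICON = {
    "汉": "Chinese (Han)",
    "英": "English",
    "词": "word",
    "典": "standard work",
    "汉英": "Chinese-English",
    "词典": "dictionary",
}


def segment(text: str, lexicon: dict[str, str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon entry
    of at most max_len characters, falling back to a single character."""
    segments = []
    i = 0
    while len(text) > i:
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                segments.append(piece)
                i += n
                break
    return segments


def glosses(text: str, lexicon: dict[str, str]) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Meanings for the segments and for the single characters; '?' marks a gap in the lexicon."""
    by_segment = [(s, lexicon.get(s, "?")) for s in segment(text, lexicon)]
    by_char = [(c, lexicon.get(c, "?")) for c in text]
    return by_segment, by_char
```

Greedy longest matching is simple. However, it can mis-segment strings in which a shorter first match would give a better overall split, so it serves here only as an illustration.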
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <Paragraph position="0"> CETA hopes to test this system further with a refined database. The aim is to evaluate its potential for shared access by a broad government and academic community.</Paragraph>
    <Paragraph position="1"> CETA started with a poor dictionary, but it was machine-readable. Many good dictionaries are not machine-readable and are therefore difficult to change or consolidate. CETA is bringing such dictionaries together by a unique method: the voluntary cooperation of interested government and academic scholars and language specialists. Participants receive three rewards:

1) awareness of contributing to a worthwhile effort;
2) an up-to-date hard copy of the CETA computer file containing the latest contributions of all participants;
3) use of the CETA Secretariat Office to search out and exchange information of common concern.

The only cost is a willingness to share in the work of CETA.</Paragraph>
  </Section>
</Paper>