<?xml version="1.0" standalone="yes"?>
<Paper uid="C69-4201">
  <Title>A Universal Graphic Character Writer</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
I. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This paper describes a system that simulates the function of a typewriter for all languages, owing to its unique way of coding and displaying graphic characters. Characters of all languages, including alphabets, are encoded on a grid. For the internal representation in memory, however, only the X-Y coordinates of each straight line segment are recorded, which provides sufficient information to reconstruct the character.</Paragraph>
    <Paragraph position="1"> The test program for this system is written in CDC 3600 FORTRAN (a variation of FORTRAN IV) and generates a plot tape to be used on a CalComp plotter for producing these characters. With some minor modifications, this program can be run on any computer and output on any available plotter. The advantage of using the plotter is that it directly produces a clear hard copy at a very reasonable cost. The natural languages tested in the program so far are Chinese and English. French, German, Hindi, Hebrew, Italian, Japanese, Korean, Russian, and Spanish are to be tested in the near future.</Paragraph>
    <Paragraph position="2"> II. CHARACTER REPRESENTATION Different sizes of grids may be used to define the coordinates representing each character. For non-alphabetic languages, a 16 x 16 grid has proved sufficient for a good recording of the character. For alphabetic languages, a 5 x 8 grid is adequate to accommodate all the letters of the languages. Each grid point is assigned a pair of values according to its relative position in the grid, that is, its coordinates. The rows of grid points are numbered from the bottom up, from 0 to 15 or 0 to 7, giving the Y coordinate. The columns of grid points are numbered from left to right, from 0 to 15 or 0 to 4, giving the X coordinate. The character is fitted into the grid with one restriction: all the starting points, turning points, and ending points of a line or line segment must lie on grid points. The coordinates of these grid points are recorded and stored in memory for later retrieval. A curved stroke is treated as many short straight line segments. A line is defined here as having exactly one starting point and one ending point, with zero or more turning points between them.</Paragraph>
    <Paragraph position="3"> Coordinates of a line should always be recorded in sequence: the starting point, the turning point or points, and the ending point. However, the order of the coordinate groups of the lines is immaterial for the character representation.</Paragraph>
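    <Paragraph> As an illustration only, the representation described above might be sketched in Python as follows. The test program itself is written in CDC 3600 FORTRAN; the names `GRID_SIZES` and `validate_character` and the two sample letters are assumptions made for this sketch and do not come from the paper. A character is a list of lines, and each line is an ordered list of (X, Y) grid points: the starting point, any turning points, then the ending point.

```python
# Hypothetical sketch: a character as a list of continuous lines,
# each line an ordered list of (x, y) grid points.

# Grid sizes named in the text, as (columns, rows).
GRID_SIZES = {"non_alphabetic": (16, 16), "alphabetic": (5, 8)}


def validate_character(lines, grid):
    """Return True if every line has a starting and an ending point
    and every point lies on a grid point of the chosen grid."""
    columns, rows = GRID_SIZES[grid]
    for line in lines:
        # A line needs at least a starting point and an ending point.
        if len(line) in (0, 1):
            return False
        for x, y in line:
            # X runs left to right from 0, Y runs bottom up from 0.
            if x not in range(columns) or y not in range(rows):
                return False
    return True


# Made-up letter "L" on the 5 x 8 grid: one line with one turning point.
LETTER_L = [[(0, 7), (0, 0), (4, 0)]]

# Made-up letter "T": two lines without turning points. The order of the
# two lines does not matter, only the order of points inside each line.
LETTER_T = [[(0, 7), (4, 7)], [(2, 7), (2, 0)]]
```

Under these assumptions, `validate_character(LETTER_L, "alphabetic")` accepts the letter, while a point such as (5, 0) would be rejected on the 5 x 8 grid because the X coordinate only runs from 0 to 4.</Paragraph>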
    <Paragraph position="4"> The character representation method used by Hayashi, Duncan and Kuno of Harvard University is not quite the same as that described above. Instead of recording the starting point, turning points and ending point of a continuous line, they virtually recorded the</Paragraph>
    <Paragraph position="6"> starting point and ending point of every line segment. That is, a turning point in a continuous line is used twice: as the ending point of the previous line segment and as the starting point of the following line segment. This method does simplify the programming task of generating the character, but it also increases the number of coordinates to be recorded and stored for retrieval, resulting in less efficient character generation. Taking their example of a Chinese character for &quot;BRAVE&quot; (~), thirty-two pairs of coordinates must be stored for character generation. With the method described in the previous paragraph, only twenty-three coordinate pairs are necessary to accomplish the same task. If the English letter '~' is taken as an example, eight coordinate pairs are required by the Harvard method, of which only five are necessary to reconstruct the character once the three repeated coordinate pairs are excluded.</Paragraph>
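    <Paragraph> The difference in storage can be checked with a small count. The sketch below is an assumption-laden illustration in Python, not the authors' FORTRAN: the function names and the `ZIGZAG` shape are made up. If a character has L continuous lines and P coordinate pairs in total, the per-segment method stores 2(P - L) pairs. Taken at face value, the counts quoted above (23 and 32) would correspond to seven continuous lines; that figure is an inference from the counts, not a statement from the source.

```python
# Hypothetical sketch comparing the two storage schemes.


def to_segments(lines):
    """Expand continuous lines into (start, end) pairs, one per
    straight line segment, as in the per-segment (Harvard) scheme."""
    segments = []
    for line in lines:
        for start, end in zip(line, line[1:]):
            segments.append((start, end))
    return segments


def pairs_stored_continuous(lines):
    """Coordinate pairs stored when each continuous line is one unit."""
    return sum(len(line) for line in lines)


def pairs_stored_per_segment(lines):
    """Coordinate pairs stored when every segment keeps both ends."""
    return 2 * len(to_segments(lines))


# Made-up shape: one continuous line with five points (three turning points).
ZIGZAG = [[(0, 7), (1, 0), (2, 4), (3, 0), (4, 7)]]

print(pairs_stored_continuous(ZIGZAG))   # 5
print(pairs_stored_per_segment(ZIGZAG))  # 8
```

The made-up zigzag reproduces the five-versus-eight count given for the English letter: four segments need eight end points, three of which repeat a turning point.</Paragraph>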
    <Paragraph position="7"> [Figure] The Chinese character for &quot;BRAVE&quot; on a 16 x 16 grid with twenty-three pairs of coordinates: (2,14), (12,14), (8,12); (6,13), (9,11); (3,6), (3,11); (13,11), (13,6); (3,9), (13,9); (3,7), (13,7); (8,11), (8,7), (7,4), (5,2), (1,5), (14,5), (13,2), (11,0), ... [Figure] The English letter '~' on a 5 x 8 grid with five pairs of coordinates.</Paragraph>
    <Paragraph position="9"> There are other methods of representing a character. One method uses a cell grid of 256 x 256 or some other size. Each cell is one bit, recording the presence of character strokes, lines or dots as a one bit and their absence as a zero bit.</Paragraph>
    <Paragraph position="10"> Then, by assembling these bits into a pattern, the method can be used for the recognition or generation of a character. This method is employed in almost all commercial character recognition machines to identify English letters, symbols and numerals, most often in printed form.</Paragraph>
    <Paragraph position="11"> However, this is not practical for the non-alphabetic languages because it requires a huge memory to store all these character cell-patterns. For example, one commercial machine actually uses a 2,048 x 2,048 cell grid to recognize an English letter with good precision. In addition, this method is not used for character generation because of the foreseeable programming complications and the time-consuming computer operations.</Paragraph>
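    <Paragraph> The memory argument can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names are made up, and the figure of 4 bits per coordinate (enough for values 0 to 15 on a 16 x 16 grid) is an assumption, since the text does not say how coordinates are packed in memory.

```python
# Hypothetical storage estimate, per character.


def bitmap_bits(side):
    """Bits needed for a side x side cell grid at one bit per cell."""
    return side * side


def coordinate_bits(pairs, bits_per_coordinate=4):
    """Bits needed for a list of coordinate pairs.

    bits_per_coordinate=4 is an assumption: it is the minimum needed
    to hold a value from 0 to 15 on a 16 x 16 grid.
    """
    return pairs * 2 * bits_per_coordinate


print(bitmap_bits(256))     # 65536
print(bitmap_bits(2048))    # 4194304
print(coordinate_bits(23))  # 184
```

Under that packing assumption, the twenty-three coordinate pairs of a Chinese character take 184 bits, against 65,536 bits for a single 256 x 256 cell pattern.</Paragraph>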
    <Paragraph position="12"> Another method is to choose some special codes for identifying the characters. An arbitrary code system can be used for this purpose, such as the telegraphic code for Chinese characters. These are four-decimal-digit codes ranging from 0000 to 9999, representing the 10,000 Chinese characters in use. They are rather arbitrarily assigned, except that characters with certain stroke patterns are grouped together and the sequence is closely associated with the sequence appearing in most Chinese dictionaries. Another good example is the Binary Coded Decimal used in computers to specify the individual English letters, numerals and symbols.</Paragraph>
    <Paragraph position="13"> An alternative is to assign a code according to a certain pattern or arrangement of the strokes, such as the method used by the IBM Sinowriter, which was later revised by Itek Corporation and renamed the Chicoder. It also uses four-digit codes, but the first two digits are alphanumerics and the last two digits are numerals ranging only from one to five. The advantage of this method is that it makes the codes easier for the operator to memorize.</Paragraph>
    <Paragraph position="14"> These two code assignment methods can also be applied to other languages. However, if characters are to be generated directly from those identifying codes, then the coding of those characters is actually in some sort of machine language for system macros, and it could be very tedious and complicated. Therefore, these two methods are not recommended for generating characters but only for identifying characters during the retrieval phase.</Paragraph>
    <Paragraph position="15"> III. CHARACTER GENERATION The codes for identifying different characters in various languages are just different forms of identification, such as the illustrations of letters, numerals and special symbols on the keys</Paragraph>
    <Paragraph position="17"> of every typewriter keyboard. Once the code is recognized, either through the input media or through some internal transformation, the coordinate group associated with this particular code is retrieved for character generation.</Paragraph>
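    <Paragraph> The retrieval step can be pictured as a table lookup from identifying code to coordinate groups. The Python sketch below is only a guess at the shape of such a table: the codes "0001" and "0002", the shapes stored under them, and the names `CHARACTER_TABLE` and `retrieve` are all hypothetical and are not taken from the telegraphic code or from the test program.

```python
# Hypothetical table: identifying code -> coordinate groups of a character.
# Both the codes and the shapes are made up for illustration.
CHARACTER_TABLE = {
    "0001": [[(0, 7), (0, 0), (4, 0)]],
    "0002": [[(0, 7), (4, 7)], [(2, 7), (2, 0)]],
}


def retrieve(code):
    """Return the coordinate groups stored for a four-decimal-digit code,
    or None if no character is stored under that code."""
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected a four-digit decimal code")
    return CHARACTER_TABLE.get(code)


print(len(retrieve("0002")))  # 2
print(retrieve("9999"))       # None
```

The identifying code only selects the entry; the drawing information lives entirely in the coordinate groups that come back.</Paragraph>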
    <Paragraph position="18"> Character generation through programming is basically a procedure of initiating the proper subroutine calls to plot a straight line between two pairs of coordinates at a specified position.</Paragraph>
    <Paragraph position="19"> The difference between character generation through CRT beam display and through plotter pen drawing is a matter of different subroutine calls for activating different hardware output devices.</Paragraph>
    <Paragraph position="20"> The character generation method employed in the test program is elaborated to the extent that once the plotter pen is lowered for drawing, it is kept in that position until the drawing of the current continuous line is completed; it is then lifted and made ready for movement to the starting point of the next continuous line. In the case of a CRT display, once the beam is turned on for displaying a line, it is not turned off until the end of this continuous line is reached. Thus, the coordinate group of a continuous line is treated as a unit for generating purposes. The first pair of coordinates in a coordinate group, i.e., the coordinate pair of the starting point of a line, causes the pen or beam to be moved to the position specified</Paragraph>
    <Paragraph position="22"> by this pair of coordinates, and then to be lowered or turned on at that position. The second pair and all the succeeding pairs of coordinates up to the second-to-last pair, i.e., the coordinate pairs of the turning points, will each activate one movement of the pen or beam to the specified coordinate position, forming a straight line segment of the continuous line. The last pair of coordinates, i.e., the coordinate pair of the ending point, will move the pen or beam to the specified coordinate position and then lift the pen or turn off the beam at that position.</Paragraph>
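    <Paragraph> The procedure just described can be sketched as a function that turns coordinate groups into a list of pen commands. This is a minimal Python illustration, not the FORTRAN test program: the command names "move", "lower", "draw" and "lift" and the function `plot_commands` are assumptions standing in for whatever plotter or CRT subroutine calls a real device would need.

```python
# Hypothetical sketch: coordinate groups -> pen (or beam) commands.


def plot_commands(lines):
    """Return the command sequence for drawing each continuous line
    with a single lowering and a single lifting of the pen."""
    commands = []
    for line in lines:
        start = line[0]
        # Starting point: travel there with the pen up, then lower it.
        commands.append(("move", start))
        commands.append(("lower", start))
        # Turning points and the ending point: one pen movement each,
        # every movement drawing one straight line segment.
        for point in line[1:]:
            commands.append(("draw", point))
        # Ending point reached: lift the pen (or turn the beam off).
        commands.append(("lift", line[-1]))
    return commands


# Made-up line with one turning point.
for command in plot_commands([[(0, 7), (0, 0), (4, 0)]]):
    print(command)
```

For the made-up line above, the sequence is move, lower, draw, draw, lift: the pen stays down across the turning point at (0, 0).</Paragraph>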
    <Paragraph position="23"> The Harvard method is simpler and easier in logic in the sense that a generating unit contains only two pairs of coordinates: the first pair as the starting point of a straight line, and the next, or last, pair as the ending point, which work the same way as indicated in the test program. However, since there are no turning points involved, a non-straight line must be broken into many short straight lines with repeated coordinates to indicate both the ending of the previous line segment and the starting of the following line segment. Thus a non-straight line with N straight line segments will have to be drawn or displayed N times, with the plotter pen moved, lowered, moved again, and lifted every time, or with the CRT beam moved, turned on, moved again, and turned off every time.</Paragraph>
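    <Paragraph> The cost described here can be counted directly. In the Python sketch below, a "pen cycle" means one lowering followed by one lifting of the pen (or one turning on and off of the beam); the term and the function names are assumptions introduced for this illustration.

```python
# Hypothetical count of pen cycles (one lowering plus one lifting).


def pen_cycles_continuous(lines):
    """One cycle per continuous line, however many segments it has."""
    return len(lines)


def pen_cycles_per_segment(lines):
    """One cycle per straight line segment, as when every segment
    is stored and drawn as its own two-point unit."""
    return sum(len(line) - 1 for line in lines)


# Made-up non-straight line with N = 4 straight segments (five points).
BENT_LINE = [[(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)]]

print(pen_cycles_continuous(BENT_LINE))   # 1
print(pen_cycles_per_segment(BENT_LINE))  # 4
```

For a line with N segments the count is N cycles against one, which is the extra pen or beam activity the paragraph above attributes to the per-segment units.</Paragraph>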
  </Section>
</Paper>