<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2106"> <Title>Building a Graphetic Dictionary f Character Look Up Based on Brush Str the Display of Kanji as P</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 History of Kanji and Variations in Kanji Forms </SectionTitle> <Paragraph position="0"> Chinese characters were developed in the second millennium before Christ. Kanji were introduced to Japan in the fourth century AD.</Paragraph> <Paragraph position="1"> There are kanji dictionaries with tens of thousands of kanji, but in Japan there were never more than about 6,000 in actual use.</Paragraph> <Paragraph position="2"> Developments after WW II led to a separation of kanji forms in Japan, China, Taiwan and Korea. Japan and, to an even greater extent, mainland China introduced shortened forms of kanji, while Taiwan, for example, uses the traditional kanji.</Paragraph> <Paragraph position="3"> The Japanese Ministry for Education selected around 2,000 kanji for official and general public use. It also published important to be able to write them by hand too.</Paragraph> <Paragraph position="4"> For learners of Japanese, writing kanji by hand is still one of the best ways to memorize them. To recognize Japanese handwriting, one has to identify the original strokes, which are often joined and hardly seem to be individual strokes. On computer screens, strokes are also often almost unrecognizable.</Paragraph> <Paragraph position="5"> Most lexica give no information on stroke order, and those that do cover only a small number of characters.</Paragraph> <Paragraph position="6"> Normal kanji lexica contain little material on stroke forms and their variations in kanji components. Ordinary paper lexica are of little help if one is able to recognize only parts of a kanji. 
One has to recognize the whole kanji in order to be able to look it up.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Kanji Strokes, Stroke Groups and </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Path Data </SectionTitle> <Paragraph position="0"> In this project, kanji are treated as graphic information and are therefore analysed in several ways. This new data is combined with existing data.</Paragraph> <Paragraph position="1"> A very abstract and basic way to analyse kanji is a graphetic approach, which leads to the recognition of graphemes.</Paragraph> <Paragraph position="2"> Graphemes are the smallest meaning-distinguishing units. In the case of kanji this can be stroke length (as in and ), angle of the stroke (as in , and ), stroke direction (as in and ), or ending of a stroke (as in and ).</Paragraph> <Paragraph position="3"> A more concrete analysis, which also takes the act of writing into account, uses strokes as the basic units of kanji. A stroke is a graphical element that can be drawn, e.g. with a brush or a pencil, without interruption.</Paragraph> <Paragraph position="4"> Most kanji consist of more than one stroke.</Paragraph> <Paragraph position="5"> Our analysis of strokes uses 25 basic forms of brush strokes for kanji. It considers stroke direction, bending of the strokes, stroke endings (blunt or with a short bend) and so on. The stroke forms are numbered, and every stroke of a kanji is assigned the number of its stroke form.</Paragraph> <Paragraph position="6"> Strokes can be grouped together not only to build full kanji but also to form smaller units that occur frequently in kanji. We call these smaller units grapheme elements. Many kanji dictionaries use a subset of such grapheme elements to classify characters (bushu, Engl. &quot;radicals&quot;). 
For the time being, our analysis of the grapheme elements mostly uses existing kanji or given radicals.</Paragraph> <Paragraph position="7"> The data concerning stroke forms, grapheme elements and relative position can, of course, be used for kanji look up.</Paragraph> <Paragraph position="8"> To display the collected information concerning a kanji and its components, graphical data is needed. In our case, this data is created with vector graphics software (Adobe Illustrator), in which kanji strokes are represented by paths. The stroke order is identical to the order in which the paths were entered.</Paragraph> <Paragraph position="9"> To allow for later review and to have more flexible data, numbers for the stroke order are put beside the strokes.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Possible Applications of the Data </SectionTitle> <Paragraph position="0"> The data presented here allows new ways to look up kanji:</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 An Example: Automatic Animation of Kanji Strokes </SectionTitle> <Paragraph position="0"> The path data created with Illustrator was exported to the Scalable Vector Graphics (SVG) format.</Paragraph> <Paragraph position="1"> SVG is an application of XML proposed by the World Wide Web Consortium. It provides a clear description of the graphical data that is well suited to the task at hand.</Paragraph> <Paragraph position="2"> The graphical description of a kanji consists mostly of an ordered list of strokes. In SVG, we represent a stroke by a path element. For instance, the first stroke of the kanji</Paragraph> <Paragraph position="4"> &lt;path d=&quot;M21.38,19.75c3.31,1.47,8.54,6.05,9.37,8.34&quot;/&gt; The d attribute of the path element contains the path data in a compact form. This data is a list of drawing commands that an SVG renderer will execute to draw the path. 
The path data for every stroke consists of a sequence of cubic Bézier curves, parametric curves each defined by four control points. Several paths can be grouped together under a group element, which allows the association of groups of paths (i.e., lists of strokes) with every grapheme element of a kanji. It is then possible to deal directly with grapheme elements in the graphic representation of the kanji, in order to highlight such elements (as in figure 2) or to link them to other SVG files; e.g. clicking on the left component of would link it to the kanji ( mizu , &quot;water&quot;), which is this component's standard form.</Paragraph> <Paragraph position="5"> The SVG data available so far is static.</Paragraph> <Paragraph position="6"> Our goal is to present it in a dynamic fashion, showing strokes one by one, in the order and the direction in which they should be drawn.</Paragraph> <Paragraph position="7"> We add an animate child element to every path in the static SVG file to create its animated counterpart. The animate element controls the moment at which the path is drawn, and the shape it should take.</Paragraph> <Paragraph position="8"> Unfortunately, SVG has no special command for drawing a path progressively. A solution is to divide every path into several smaller ones and to draw the segments one after another, giving the impression of an invisible pen drawing the kanji. Our division strategy is to segment every curve in a path into a fixed number of elements. That number of elements is set to a power of two, because dividing a Bézier curve into two is very easy. 
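As a minimal Python sketch of this halving strategy (not the authors' implementation): each cubic Bezier curve is split at its midpoint with the de Casteljau construction, and repeating the split depth times yields 2**depth pieces. The function names, the depth parameter, and the reading of the example stroke as a move followed by one relative cubic curve (converted here to absolute control points) are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cubic = Tuple[Point, Point, Point, Point]


def midpoint(a: Point, b: Point) -> Point:
    """Return the point halfway between a and b."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def split_in_half(curve: Cubic) -> Tuple[Cubic, Cubic]:
    """Split a cubic Bezier curve at t = 0.5 (de Casteljau construction)."""
    p0, p1, p2, p3 = curve
    p01, p12, p23 = midpoint(p0, p1), midpoint(p1, p2), midpoint(p2, p3)
    p012, p123 = midpoint(p01, p12), midpoint(p12, p23)
    mid = midpoint(p012, p123)  # point on the original curve at t = 0.5
    return (p0, p01, p012, mid), (mid, p123, p23, p3)


def subdivide(curve: Cubic, depth: int) -> List[Cubic]:
    """Split a curve into 2**depth consecutive pieces by repeated halving."""
    pieces = [curve]
    for _ in range(depth):
        pieces = [half for piece in pieces for half in split_in_half(piece)]
    return pieces


def progressive_path_data(curve: Cubic, depth: int) -> List[str]:
    """Build path data strings that each draw one more piece of the curve."""
    pieces = subdivide(curve, depth)
    drawn = "M{:.2f},{:.2f}".format(*pieces[0][0])
    frames = []
    for _, c1, c2, end in pieces:
        drawn += " C{:.2f},{:.2f} {:.2f},{:.2f} {:.2f},{:.2f}".format(*c1, *c2, *end)
        frames.append(drawn)
    return frames


# Example stroke from the text, with the relative control points made absolute
# (assumed reading of the path data: start point, then one relative cubic curve).
stroke = ((21.38, 19.75), (24.69, 21.22), (29.92, 25.80), (30.75, 28.09))
frames = progressive_path_data(stroke, depth=2)  # four progressively longer paths
```

Joining the frames with semicolons gives one possible value list for an animate element; whether the original tool built its values in exactly this way is not stated in the text.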
Longer strokes consist of more curves than shorter ones, so it takes more time to draw them; the distribution of the control points along the curves makes the animation look quite natural.</Paragraph> <Paragraph position="9"> In the end, an animation is controlled by two parameters: the number of segments into which a curve is split and the time between the drawing of two strokes. Modifying these values will make the drawing slower or faster, and more or less smooth.</Paragraph> <Paragraph position="10"> The first stroke of our example kanji now looks as shown below. The animation starts at time 0, lasts for 0.45 seconds, and iterates over the values given by the values attribute. The d attribute of the parent path element takes these successive values over time.</Paragraph> <Paragraph position="11"> &lt;path d=&quot;&quot;&gt; &lt;animate attributeName=&quot;d&quot;</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Future Work </SectionTitle> <Paragraph position="0"> Based on the existing data, it is easy to develop further data concerning variations in stroke order or kanji form.</Paragraph> <Paragraph position="1"> The stroke descriptions in particular could be used for better graphical character recognition. They may even lead to software that can recognize incorrect input and explain to the user how to correct it. So far we deal only with Japanese kanji, but, of course, the same approach could be used for other characters such as hiragana and katakana or the traditional and the shortened Chinese characters.</Paragraph> </Section> </Section> </Paper>