<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3016">
  <Title>Portable Translator Capable of Recognizing Characters on Signboard and Menu Captured by Built-in Camera</Title>
  <Section position="4" start_page="0" end_page="61" type="metho">
    <SectionTitle>
2 System design
</SectionTitle>
    <Paragraph position="0"> Figure 1 overviews the system architecture. After the user takes a picture by the built-in camera of a PDA, the picture is sent to a controller in a remote server. At the server side, the picture is sent to the OCR module which usually outputs many character candidates. Next, the word recognizer identifies word sequences in the candidates up to the number specified by the user. Recognized words are sent to the language translator.</Paragraph>
    <Paragraph position="1"> The PDA is linked to the server via wireless com- null munication. The current OCR software is Windows-based while the other components are Linux programs. The PDA uses Windows.</Paragraph>
    <Paragraph position="2"> We also implemented the system for mobile phones using the i-mode and FOMA devices provided by NTT-DoCoMo.</Paragraph>
  </Section>
  <Section position="5" start_page="61" end_page="63" type="metho">
    <SectionTitle>
3 Each component
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
3.1 Appearance-based full search OCR
</SectionTitle>
      <Paragraph position="0"> Research into the recognition of characters in natural scenes has only just begun (Watanabe et al., 1998; Haritaoglu, 2001; Yang et al., 2002; Wu et al., 2004). Many conventional approaches first extract character regions and then classify them into each character category. However, these approaches often fail at the extraction stage, because many pictures are taken under less than desirable conditions such as poor lighting, shading, strain, and distortion in the natural scene. Unless the recognition target is limited to some specific signboard (Wu et al., 2004), it is hard for the conventional OCR techniques to obtain sufficient accuracy to cover a broad range of recognition targets.</Paragraph>
      <Paragraph position="1"> To solve this difficulty, Kusachi et al. proposed a robust character classifier (Kusachi et al., 2004).</Paragraph>
      <Paragraph position="2"> The classifier uses appearance-based character reference pattern for robust matching even under poor capture conditions, and searches the most probable Figure 2: Many character candidates raised by appearance-based full search OCR: Rectangles denote regions of candidates. The picure shows that candidates are identified in background regions too.</Paragraph>
      <Paragraph position="3"> region to identify candidates. As full details are given in their paper (Kusachi et al., 2004), we focus here on just its characteristic performance.</Paragraph>
      <Paragraph position="4"> As this classifier identifies character candidates from anywhere in the picture, the precision rate is quite low, i.e. it lists a lot of wrong candidates. Figure 2 shows a typical result of this OCR. Rectangles indicate erroneous candidates, even in background regions. On the other hand , as it identifies multiple candidates from the same location, it achieves high recall rates at each character position (over 80%) (Kusachi et al., 2004). Hence, if character positions are known, we can expect that true characters will be ranked above wrong ones, and greater word recognition accuracies would be achieved by connecting highly ranked characters in each character position.</Paragraph>
      <Paragraph position="5"> This means that location estimation becomes important. null</Paragraph>
    </Section>
    <Section position="2" start_page="61" end_page="62" type="sub_section">
      <SectionTitle>
3.2 Word recognition
</SectionTitle>
      <Paragraph position="0"> Modern PDAs are equipped with styluses. The direct approach to obtaining character location is for the user to indicate them using the stylus. However, pointing at all the locations is tiresome, so automatic estimation is needed. Completely automatic recognition leads to extraction errors so we take the middle approach: the user specifies the beginning and ending of the character string to be recognized and translated. In Figure 3, circles on both ends of the string denote the user specified points. All the locations of characters along the target string are estimated from these two locations as shown in Figure  specified by the user with stylus. All the character locations (four locations) are automatically estimated. null  Once the user has input the end points, assumed to lie close to the centers of the end characters, the automatic location module determines the size and position of the characters in the string. Since the characters have their own regions delineated by rectangles and have x,y coordinates (as shown in Figure 2), the module considers all candidates and rates the arrangement of rectangles according to the differences in size and separation along the sequences of rectangles between both ends of the string. The sequences can be identified by any of the search algorithms used in Natural Language Processing like the forward Dynamic Programming and backward A* search (adopted in this work). The sequence with the highest score, least total difference, is selected as the true rectangle (candidate) sequence. The centers of the rectangles are taken as the locations of the characters in the string.</Paragraph>
      <Paragraph position="1">  The character locations output by the automatic location module are not taken as specifying the correct characters, because multiple character candidates are possible at the same location. Therefore, we identify the words in the string by the probabilities of character combinations. To increase the accuracy, we consider all candidates around each estimated location and create a character matrix, an example of which is shown in Figure 4. At each location, we rank the candidates according to their OCR scores, the highest scores occupy the top row.</Paragraph>
      <Paragraph position="2"> Next, we apply an algorithm that consists of similar character matching, similar word retrieval, and word sequence search using language model scores</Paragraph>
      <Paragraph position="4"> are bound to each estimated location to make the matrix. Bold characters are true.</Paragraph>
      <Paragraph position="5"> (Nagata, 1998).</Paragraph>
      <Paragraph position="6"> The algorithm is applied from the start to the end of the string and examines all possible combinations of the characters in the matrix. At each location, the algorithm finds all words, listed in a word dictionary, that are possible given the location; that is, the first location restricts the word candidates to those that start with this character. Moreover, to counter the case in which the true character is not present in the matrix, the algorithm identifies those words in the dictionary that contain characters similar to the characters in the matrix and outputs those words as word candidates. The connectivity of neighboring words is represented by the probability defined by the language model. Finally, forward Dynamic Programming and backward A* search are used to find the word sequence with highest probability. The string in the Figure 3 is recognized as &amp;quot;a26a28a27a28a29a31a30 .&amp;quot;</Paragraph>
    </Section>
    <Section position="3" start_page="62" end_page="63" type="sub_section">
      <SectionTitle>
3.3 Language translation
</SectionTitle>
      <Paragraph position="0"> Our system currently uses the ALT-J/E translation system which is a rule-based system and employs the multi-level translation method based on constructive process theory (Ikehara et al., 1991). The string in Figure 3 is translated into &amp;quot;Emergency telephones.&amp;quot; null As target language pairs will increased in future, the translation component will be replaced by statistical or corpus based translators since they offer quicker development. By using this client-server architecture on the network, we can place many task specific translation modules on server machines and flexibly select them task by task.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="63" end_page="63" type="metho">
    <SectionTitle>
4 Preliminary evaluation of character recognition
</SectionTitle>
    <Paragraph position="0"> recognition Because this camera base system is primarily for inputting character sets, we collected 19 pictures of signboards with a 1.2 mega pixel CCD camera for a preliminary evaluation of word recognition performance. Both ends of a string in each picture were specified on a desk-top personal computer for quick performance analysis such as tallying up the accuracy. Average string length was five characters. The language model for word recognition was basically a word bigram and trained using news paper articles.</Paragraph>
    <Paragraph position="1"> The base OCR system returned over one hundred candidates for every picture. Though the average character recall rate was high, over 90%, wrong candidates were also numerous and the average character precision was about 12%.</Paragraph>
    <Paragraph position="2"> The same pictures were evaluated using our method. It improved the precision to around 80% (from 12%). This almost equals the precision of about 82% obtained when the locations of all characters were manually indicated (Table1). Also the accuracy of character location estimation was around 95%. 11 of 19 strings (phrases) were correctly recognized.</Paragraph>
    <Paragraph position="3"> The successfully recognized strings consisted of characters whose sizes were almost the same and they were evenly spaced. Recognition was successful even if character spacing almost equaled character size. If a flash is used to capture the image, the flash can sometimes be seen in the image which can lead to insertion error; it is recognized as a punctuation mark. However, this error is not significant since the picture taking skill of the user will improve with practice.</Paragraph>
  </Section>
class="xml-element"></Paper>