<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1176">
  <Title>Automatic Construction of Japanese KATAKANA Variant List from Large Corpus</Title>
  <Section position="3" start_page="1" end_page="1" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Several studies have addressed the problem of Japanese spelling variation. Shishibori and Aoe (1993) proposed a method for generating Japanese KATAKANA variants by using replacement rules, such as (be) - (ve) and (chi) - (tsi).</Paragraph>
    <Paragraph position="1"> Here, &quot;-&quot; represents &quot;substitution.&quot; For example, applying these rules to &quot; (Venezia)&quot; generates three variant spellings: &quot;,&quot; &quot;,&quot; and &quot;.&quot; Kubota et al. (1993) extracted Japanese KATAKANA variants by first transforming KATAKANA words into directed graphs based on rewrite rules and then checking whether the directed graphs contain the same labeled path. Part of their rewrite rules is shown in Table 2. For example, applying these rules to &quot; (Kuwait)&quot; generates the variants &quot;aac,&quot; &quot;bac,&quot; and &quot;dac.&quot;</Paragraph>
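    <Paragraph> As an illustration of rule-based variant generation in general, the following is a minimal sketch and not the implementation of either cited method. The rule table, the function name generate_variants, and the KATAKANA strings are our own assumptions chosen to mirror the (be) - (ve) and (chi) - (tsi) example; the sketch handles only rules whose left-hand side is a single character.

```python
from itertools import product

# Hypothetical replacement rules, assumed for illustration only:
# (be) may be rewritten as (ve), and (chi) as (tsi).
RULES = {"ベ": "ヴェ", "チ": "ツィ"}


def generate_variants(word, rules=RULES):
    """Return every spelling obtained by optionally applying a rule
    at each position where its left-hand side occurs."""
    # Each character contributes itself, plus its replacement if a rule matches.
    options = [(ch, rules[ch]) if ch in rules else (ch,) for ch in word]
    spellings = {"".join(choice) for choice in product(*options)}
    spellings.discard(word)  # the input spelling itself is not a variant
    return sorted(spellings)


# Assumed KATAKANA spelling of "Venezia" containing (be) and (chi).
variants = generate_variants("ベネチア")
print(variants)
```

Under these assumed rules, the input contains one (be) and one (chi), so the call produces three variants, which agrees with the count in the Venezia example. Because the rules look only at the word itself, such a generator has no way to tell whether two spellings actually share a meaning.</Paragraph>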
    <Paragraph position="3"> Both Shishibori and Aoe (1993) and Kubota et al. (1993) applied their replacement or rewrite rules only to the words themselves and did not consider the contexts in which the words appear. As a result, they wrongly decide that &quot;&quot; is a variant of &quot;.&quot; Here, &quot;&quot; represents &quot;wave&quot; and &quot;&quot; represents &quot;web.&quot; In our method, we decide whether &quot;&quot; and &quot;&quot; convey the same meaning by using a semantic similarity based on their contexts.</Paragraph>
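    <Paragraph> To make the idea of a context-based check concrete, the following is a minimal sketch under our own assumptions and is not the similarity measure defined in this paper. It assumes a tokenized corpus, a fixed context window, and cosine similarity over raw co-occurrence counts; the names context_vector and cosine, the window size, and the toy English corpus are all hypothetical.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (assumed): "wave" and "web" stand in for the two KATAKANA words.
corpus = [
    ["big", "wave", "hit", "beach"],
    ["surf", "wave", "near", "beach"],
    ["open", "web", "page", "browser"],
    ["web", "page", "link"],
]
wave_ctx = context_vector(corpus, "wave")
web_ctx = context_vector(corpus, "web")
similarity = cosine(wave_ctx, web_ctx)
print(similarity)
```

In this toy corpus the two words share no context words, so their similarity is zero and a threshold on it would reject the pair as variants, even though a spelling rule alone might accept it.</Paragraph>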
  </Section>
</Paper>