<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1098">
  <Title>A PARSER COPING WITH SELF-REPAIRED JAPANESE UTTERANCES AND LARGE CORPUS-BASED EVALUATION</Title>
  <Section position="3" start_page="0" end_page="593" type="metho">
    <SectionTitle>
RELATED WORKS
</SectionTitle>
    <Paragraph position="0"> (Hindle 1983) and (Langer 1990) proposed parsers coping with self-repaired utterances.</Paragraph>
    <Paragraph position="1"> However, they assumed that the interruption point had already been detected. Hindle suggested that prosodic cues can be used for detection, but it is not clear whether they always succeed.</Paragraph>
    <Paragraph position="2"> Langer suggested that editing expressions can be used, but they are not always present in self-repairs. Recently, (Shriberg, Bear, and Dowding 1992) proposed a pattern-matching method and used it in the GEMINI system (Dowding et al. 1993). This is similar to our method, but the corpus used (MADCOW 1992) is less spontaneous than ours: subjects pressed a button to begin speaking to the system. (Nakatani and Hirschberg 1993) proposed a speech-first method that relies mainly on prosodic cues. We also think prosodic cues are important, but we think people rely mainly on linguistic cues, because they can understand self-repaired utterances in transcripts. All these works deal with English.</Paragraph>
    <Paragraph position="3"> (Langer also treats German.) Because there are many syntactic differences between these languages and Japanese (e.g., left-branching vs. right-branching), it is not clear whether their approaches are applicable to Japanese.</Paragraph>
  </Section>
  <Section position="4" start_page="593" end_page="593" type="metho">
    <SectionTitle>
OUTLINE OF SERUP
</SectionTitle>
    <Paragraph position="0"> Fig. 1 shows the outline of SERUP. The Normal Parser is a parser that parses well-formed utterances. When the Normal Parser fails to parse an utterance, the utterance is passed to the SR-reconstructor, which detects a self-repair in it and translates it into a well-formed version. The translated utterance is returned to the Normal Parser and parsed again. Because an utterance can contain two or more self-repairs, translation is repeated until the Normal Parser succeeds in parsing or translation fails. In the latter case, the utterance has another kind of ill-formedness, or a self-repair that the SR-reconstructor cannot cope with.</Paragraph>
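    <Paragraph> The control flow described above can be sketched as follows. This is a minimal illustration and not the SERUP implementation: the function names, the max_rounds guard, and the two toy stand-ins for the Normal Parser and the SR-reconstructor are assumptions introduced here.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Callable, List, Optional

Tokens = List[str]


def parse_with_self_repair(
    tokens: Tokens,
    normal_parse: Callable[[Tokens], Optional[object]],
    reconstruct: Callable[[Tokens], Optional[Tokens]],
    max_rounds: int = 10,  # assumed safety bound, not from the paper
) -> Optional[object]:
    """Alternate between the normal parser and the SR-reconstructor."""
    for _ in range(max_rounds):
        tree = normal_parse(tokens)
        if tree is not None:
            return tree  # the utterance is (now) well-formed
        repaired = reconstruct(tokens)
        if repaired is None:
            return None  # translation failed: no usable clue was found
        tokens = repaired  # one self-repair removed; parse again
    return None


def toy_normal_parse(tokens: Tokens) -> Optional[object]:
    """Toy stand-in: 'parses' only if no token is immediately repeated."""
    for left, right in zip(tokens, tokens[1:]):
        if left == right:
            return None
    return ("S", tuple(tokens))


def toy_reconstruct(tokens: Tokens) -> Optional[Tokens]:
    """Toy stand-in: remove the first immediately repeated token."""
    for i in range(len(tokens) - 1):
        if tokens[i] == tokens[i + 1]:
            return tokens[:i] + tokens[i + 1:]
    return None


result = parse_with_self_repair(
    ["to", "to", "the", "the", "hall"], toy_normal_parse, toy_reconstruct
)
```
]]></Paragraph>
    <Paragraph> In this toy run the utterance contains two repetitions, so the reconstructor is called twice before the parser accepts the result.</Paragraph>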
    <Paragraph position="1"> There are two main problems in translation. One is to determine the interruption point, and the other is to determine the reparandum. If these two problems can be solved, the translation is carried out as follows.</Paragraph>
    <Paragraph position="2">  1. Remove editing expressions such as &quot;er&quot;, &quot;no&quot;, and &quot;I mean&quot;.</Paragraph>
    <Paragraph position="3"> 2. Replace the reparandum with the repair part.</Paragraph>
    <Paragraph position="4"> For more details on SERUP, see (Sagawa, Ohnishi, and Sugie 1993).</Paragraph>
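    <Paragraph> The two translation steps can be illustrated with a short sketch. It assumes that the interruption point and the start of the reparandum are already known, and that the utterance is a list of word tokens. The function names, the small list of editing expressions, and the English example utterance are hypothetical and chosen only for illustration.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import List, Sequence, Tuple

# Illustrative editing expressions only; a real list would be language-specific.
EDITING_EXPRESSIONS: Tuple[Tuple[str, ...], ...] = (
    ("er",),
    ("no",),
    ("i", "mean"),
)


def strip_editing_expressions(tokens: Sequence[str]) -> List[str]:
    """Step 1: drop editing expressions found at the start of `tokens`."""
    i = 0
    while True:
        for expr in EDITING_EXPRESSIONS:
            window = tuple(t.lower() for t in tokens[i:i + len(expr)])
            if window == expr:
                i += len(expr)
                break
        else:
            # No editing expression starts at position i: the repair begins here.
            return list(tokens[i:])


def translate(
    tokens: Sequence[str], reparandum_start: int, interruption: int
) -> List[str]:
    """Step 2: replace tokens[reparandum_start:interruption] with the repair.

    Everything after the interruption point, minus leading editing
    expressions, is taken to be the repair followed by the rest of the
    utterance, so dropping the reparandum puts the repair in its place.
    """
    before = list(tokens[:reparandum_start])
    after = strip_editing_expressions(tokens[interruption:])
    return before + after


utterance = ["a", "flight", "to", "Boston", "er", "I", "mean", "to", "Denver"]
repaired = translate(utterance, reparandum_start=2, interruption=4)
```
]]></Paragraph>
    <Paragraph> Here the reparandum is &quot;to Boston&quot; and the repair is &quot;to Denver&quot;. A word such as &quot;no&quot; can also be ordinary content at the start of a repair, so a practical system would need more than this simple prefix match.</Paragraph>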
  </Section>
  <Section position="5" start_page="593" end_page="595" type="metho">
    <SectionTitle>
CLUES TO TRANSLATION
</SectionTitle>
    <Paragraph position="0"> In this chapter, we describe a classification of self-repaired utterances. They are classified by the clues that can be used to determine an interruption point and a reparandum.</Paragraph>
    <Paragraph position="1"> Table 1 shows the classification. Categories printed in italics have no clue, i.e., SERUP fails to parse utterances in those categories. With repetition: A self-repair is mostly made to repair a word or a phrase just before an interruption (Levelt 1988), so the words or phrases around an interruption are in the same category. For example, in [Ex.1] the speaker repairs the prepositional phrase &quot;from green left to pink&quot; to &quot;from blue left to pink&quot;. It is rare that he/she just repairs the noun &quot;green&quot; to &quot;blue&quot;. In such self-repairs, a repetition of a word or a phrase often exists. In self-repairs that are intended to correct an error (such as [Ex.1]), words or phrases around the error may be repeated.</Paragraph>
    <Paragraph position="2"> In [Ex.1], &quot;from&quot; and &quot;left to pink&quot; are repeated. In self-repairs that are intended to add some information to the item just mentioned, the item may be repeated, as in [Ex.3]. [Ex.3] I want a flight, one way flight (from (Shriberg, Bear, and Dowding 1992)) In this example, the word &quot;flight&quot; is repeated. A repetition is made with the same constituent or with an item in the same category, such as &quot;orange&quot; with &quot;apple&quot;.</Paragraph>
    <Paragraph position="3"> There are four possible structures around an interruption of a self-repair with a repetition. Fig. 2 shows them.</Paragraph>
    <Paragraph position="4">  A is the case of a simple repetition. B, C, and D are cases in which some words occur between the repeated items. In cases B and C, the positions of the repeated items directly indicate where the interruption occurs and which part is the reparandum, but in case D they do not.</Paragraph>
    <Paragraph position="5"> SERUP can cope with cases A, B, and C.</Paragraph>
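    <Paragraph> As an illustration of the simplest case only (case A, a simple repetition), the sketch below looks for a word sequence that is immediately repeated. It is a simplification under stated assumptions: it matches identical strings only, whereas the text also allows repetition by an item of the same category, and it does not attempt cases B to D. The function name and the max_len bound are hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import List, Optional, Tuple


def find_simple_repetition(
    tokens: List[str], max_len: int = 4
) -> Optional[Tuple[int, int]]:
    """Find an immediately repeated word sequence (longest candidate first).

    Returns (reparandum_start, interruption) such that
    tokens[reparandum_start:interruption] is the first copy and the
    second copy starts right at the interruption point, or None.
    """
    n = len(tokens)
    for size in range(min(max_len, n // 2), 0, -1):
        for start in range(0, n - 2 * size + 1):
            first = tokens[start:start + size]
            second = tokens[start + size:start + 2 * size]
            if first == second:
                return start, start + size
    return None


utterance = ["I", "want", "to", "go", "to", "go", "home"]
span = find_simple_repetition(utterance)
if span is not None:
    reparandum_start, interruption = span
    # Dropping the first copy leaves the repair in its place.
    repaired = utterance[:reparandum_start] + utterance[interruption:]
```
]]></Paragraph>
    <Paragraph> Searching longer sequences first prefers a repeated phrase over a repeated single word when both are present.</Paragraph>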
    <Paragraph position="6"> With syntactic break: A self-repair comes with an interruption of the utterance. Because an interruption may occur anywhere in an utterance (even within a word), a self-repaired utterance can contain a syntactic break.</Paragraph>
    <Paragraph position="7"> If this break can be detected, we can identify the interruption point.</Paragraph>
    <Paragraph position="8"> Same fragment repetition: When a speaker interrupts an utterance within a word, a fragment of the interrupted word is left. He/she sometimes starts the repair with a word that begins with the same fragment, as in [Ex.4].</Paragraph>
    <Paragraph position="9"> [Ex.4] ten, tenji tanntou no kata to This can be treated as a case A repetition, but to investigate within-word interruptions, we treated it as a separate category.</Paragraph>
    <Paragraph position="10"> In this case, the interruption point is just after the repeated fragment. If within-word interruptions are made only to repair the interrupted word, the reparandum can be identified as the repeated fragment. With unknown word: Sometimes the fragment that is left can be detected as an unknown word. For example, if the word &quot;ketueki&quot; (blood) is interrupted and the fragment &quot;ketue&quot; is left, this fragment can be detected because there is no Japanese word &quot;ketue&quot;.</Paragraph>
    <Paragraph position="11"> In this case, the interruption point is just after the unknown word, and the reparandum can be determined if the same condition as in the above case is satisfied.</Paragraph>
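    <Paragraph> The two within-word clues above can be sketched as simple token checks. This is an illustrative sketch, not the detection procedure used in SERUP: the function names are hypothetical, the lexicon is a toy set, and words are assumed to be already segmented into romanized tokens.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import AbstractSet, List, Optional


def find_fragment_repetition(tokens: List[str]) -> Optional[int]:
    """Index of a token that is a proper prefix of the token after it."""
    for i in range(len(tokens) - 1):
        fragment, following = tokens[i], tokens[i + 1]
        if (
            fragment
            and len(fragment) < len(following)
            and following.startswith(fragment)
        ):
            return i
    return None


def find_unknown_word(tokens: List[str], lexicon: AbstractSet[str]) -> Optional[int]:
    """Index of the first token that is not in the lexicon."""
    for i, token in enumerate(tokens):
        if token not in lexicon:
            return i
    return None


def remove_fragment(tokens: List[str], index: int) -> List[str]:
    """Treat tokens[index] as the reparandum; the interruption point is index + 1."""
    return tokens[:index] + tokens[index + 1:]


# Same fragment repetition, using the first words of the example in the text.
ex4 = ["ten", "tenji", "tanntou"]
i = find_fragment_repetition(ex4)

# Unknown word, using a toy one-word lexicon (an assumption for this sketch).
toy_lexicon = {"ketueki"}
j = find_unknown_word(["ketue", "ketueki"], toy_lexicon)
```
]]></Paragraph>
    <Paragraph> Both checks return the position of the suspected fragment, so under the condition stated above the same removal step applies to either clue.</Paragraph>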
    <Paragraph position="12"> With isolated word: A fragment left by a within-word interruption is not always detected as a same fragment repetition or an unknown word. For example, the fragment &quot;hon&quot; can be left when &quot;hontou&quot; (real) is interrupted, but this string can also be a word meaning &quot;book&quot;.</Paragraph>
    <Paragraph position="13"> But such a word is always &quot;isolated&quot;, that is, both of the two subtrees in fig'l fail.</Paragraph>
    <Paragraph position="14"> In this case, the interruption point is just after the isolated word, and the reparandum can be determined if the same condition as in the above case is satisfied.</Paragraph>
    <Paragraph position="15"> Without repetition of a stem: Because Japanese inflectional morphology is complicated, speakers often make inflection errors. To repair such errors, a speaker often starts the repair without repeating the stem, as in [Ex.5], in contrast to [Ex.6], where the stem is repeated.</Paragraph>
    <Paragraph position="16"> [Ex.5] itada i, keru no ka [Ex.6] itada i, itada keru no ka In these examples, &quot;itada&quot; is a stem, and the speaker first tries to say &quot;itada ita&quot; or &quot;itada i te&quot; and then changes to &quot;itada keru&quot;. In the case of [Ex.6], the repetition of the stem can be used as a clue. In the case of [Ex.5], the existence of an affix without a stem indicates the interruption point and the reparandum.</Paragraph>
    <Paragraph position="17"> Fresh start: A fresh start is a repair with a completely different utterance. The fragment of the utterance before the interruption is ignored. SERUP tries to detect a fresh start if none of the possible clues is found. It tries to parse the utterance without its first word, and repeats this trial until parsing succeeds.</Paragraph>
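    <Paragraph> The fresh-start trial described above can be sketched as a loop that drops one leading word at a time. The function name and the toy parser are hypothetical, and stopping when no words remain is an assumption added here so that the loop always terminates.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Callable, List, Optional, Tuple


def detect_fresh_start(
    tokens: List[str],
    normal_parse: Callable[[List[str]], Optional[object]],
) -> Optional[Tuple[int, object]]:
    """Drop leading words one by one until the remainder parses.

    Returns (start, tree), where tokens[:start] is the abandoned
    fragment and tree is the parse of tokens[start:], or None if no
    non-empty suffix can be parsed.
    """
    for start in range(1, len(tokens)):
        tree = normal_parse(tokens[start:])
        if tree is not None:
            return start, tree
    return None


def toy_normal_parse(tokens: List[str]) -> Optional[object]:
    """Toy stand-in for the Normal Parser: accepts one fixed utterance."""
    if tokens == ["could", "you", "help", "me"]:
        return ("S", tuple(tokens))
    return None


found = detect_fresh_start(
    ["I", "want", "could", "you", "help", "me"], toy_normal_parse
)
```
]]></Paragraph>
    <Paragraph> Because the shortest abandoned fragment is tried first, the sketch keeps as much of the utterance as the parser will accept.</Paragraph>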
    <Paragraph position="18">  Others: SERUP cannot cope with utterances in any of the following categories.</Paragraph>
    <Paragraph position="19"> Changed to well-formed: A self-repaired utterance is occasionally parsed successfully as a well-formed utterance whose meaning the speaker does not intend. For example, in [Ex.7], the fragment &quot;kyou&quot; of the word &quot;kyousan&quot; (cosponsorship) is treated as the word &quot;kyou&quot; (today), and the utterance is parsed successfully, but with the meaning &quot;cosponsor today&quot;. [Ex.7] kyou, kyousan suru Some of these utterances can be detected as errors by a semantic interpreter. We also think prosodic cues can be used effectively, because the fragment &quot;kyou&quot; and the word &quot;kyou&quot; are pronounced differently. So far, SERUP cannot cope with such utterances, because it uses a well-formed-first method.</Paragraph>
    <Paragraph position="20"> Dividing word: In [Ex.8], the speaker starts the repair within a word.</Paragraph>
    <Paragraph position="21"> [Ex.8] junji, bi ni desu ne The speaker tries to say &quot;junbi ni desu ne&quot;, but makes a lexical error, &quot;junji&quot;. The speaker then starts the repair with the fragment &quot;bi&quot; of &quot;junbi&quot;, instead of the complete word &quot;junbi&quot;. This is a very rare case.</Paragraph>
    <Paragraph position="22"> Repetition with different category: Speakers occasionally repair with a word of a different category. A human listener can draw some inference and find the relation between the words, but automatic detection is difficult. Ambiguous repair: In [Ex.9], it is ambiguous what kind of self-repair is made.</Paragraph>
    <Paragraph position="23"> [Ex.9] apointo wo, ni, er, suuzitu tyuu ni The speaker may be repairing the particle &quot;wo&quot; with &quot;ni&quot;, or repairing a fragment &quot;ni&quot; of the word &quot;nisanniti&quot;, which has the same meaning as &quot;suuzitu&quot; (some days). We cannot resolve this ambiguity automatically.</Paragraph>
  </Section>
  <Section position="6" start_page="595" end_page="595" type="metho">
    <SectionTitle>
LARGE CORPUS-BASED
ANALYSIS
</SectionTitle>
    <Paragraph position="0"> To investigate the effectiveness of SERUP, we analyzed a large corpus called ADD (Ehara et al. 1990). ADD contains one million words of telephone dialogues about registration for an international conference. ADD was created at ATR Interpreting Telephony Laboratories. There are 1,082 self-repairs in the corpus. We investigated which categories these self-repairs belong to. Table 1 shows the result.</Paragraph>
  </Section>
</Paper>