<?xml version="1.0" standalone="yes"?>
<Paper uid="E91-1019">
  <Title>AUTOMATIC LEARNING OF WORD TRANSDUCERS FROM EXAMPLES</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
EXPERIMENTS
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Morphological analysis
</SectionTitle>
      <Paragraph position="0"> As a preliminary experiment, the morphological analysis automaton was learned on a set of 738 French words ending with the morpheme &amp;quot;isme&amp;quot;, each associated with its decomposition into two morphemes, the first being a noun or an adjective. For example, the training set contained the pair &lt;&amp;quot;athlétisme&amp;quot;, &amp;quot;athlète+isme&amp;quot;&gt;. With an automaton of only 400 states, the correct decomposition was found among the 10 most probable outputs for 97.6% of the training examples.</Paragraph>
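      <Paragraph> The top-10 figure above can be checked with a short sketch (a hypothetical illustration, not the paper's code; `toy_candidates` stands in for the learned automaton's ranked n-best output list):

```python
def top_k_accuracy(examples, candidates, k=10):
    """Fraction of (word, gold) pairs whose gold decomposition
    appears among the k most probable outputs for the word."""
    hits = sum(1 for word, gold in examples if gold in candidates(word)[:k])
    return hits / len(examples)

# Toy stand-in for the automaton's ranked output list (hypothetical).
def toy_candidates(word):
    table = {"athlétisme": ["athlète+isme", "athlét+isme"]}
    return table.get(word, [])

examples = [("athlétisme", "athlète+isme")]
print(top_k_accuracy(examples, toy_candidates))  # 1.0
```
</Paragraph>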
      <Paragraph position="1"> Grapheme-to-phoneme transcription. The case of grapheme-to-phoneme transcription is a straightforward application of the transduction model. String w is the graphemic form, e.g. &amp;quot;abstenir&amp;quot;, and w' is its transcription into phonemes, e.g. &amp;quot;apsteniR&amp;quot; or &amp;quot;absteniR&amp;quot;. Here the training set may feature pairs &lt;w, w'&gt; and &lt;w, w''&gt; where w' ≠ w''. (Footnote 1: we are aware that a more precise assessment of the method would use a test set different from the training set. We plan to perform such a test in the near future.)</Paragraph>
      <Paragraph position="3"> The automaton was learned on a set of 1170 acronyms associated with their phonemic forms, described in a coarse phonemic alphabet in which, for example, open and closed /o/ are not distinguished. Acronyms raise an interesting problem in that some should be spelled letter by letter (&amp;quot;ACL&amp;quot;) whereas others may be pronounced as words (&amp;quot;COLING&amp;quot;). This experiment was thus intended to show that the model can take its input into account as a whole.</Paragraph>
      <Paragraph position="4"> With an automaton of only 400 states, more than 50% of the training examples were correctly transcribed when only the most probable output was considered. This figure may be improved by increasing the number of states, at the cost of a much longer learning phase.</Paragraph>
    </Section>
  </Section>
</Paper>