<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1046">
  <Title>Word Boundary Identification from Phoneme Sequence Constraints in Automatic Continuous Speech Recognition</Title>
  <Section position="8" start_page="228" end_page="229" type="evalu">
    <SectionTitle>
5. Results II
</SectionTitle>
    <Paragraph position="0"> The statistics on the automatically inserted # boundaries are shown in Table IV.</Paragraph>
    <Paragraph position="1">  application of the morphology, expansion and elimination rules. The results show that 645/1411 (45.7%) of the target word boundaries were correctly detected. This is an increase of around 9% compared with the result obtained prior to the application of the rules described in the preceding section. A total of 24 # boundaries were inserted at inappropriate points, either because of the presence of reduced forms in the utterances that we had not derived by rule, or because of lexical items that had not been included in the word-lexicon. All 21 inserted # symbols that corresponded to morpheme boundaries were inserted medially in compounds (e.g. how#ever, there#fore), while all automatically inserted # symbols that had occurred at stem/inflectional suffix boundaries (e.g. /s..i..m..#..z/ for seems) were converted to M or M? symbols using the morphology rules described above.</Paragraph>
    <Paragraph position="2"> An approximate measure of the probability of a word boundary being incorrectly inserted can be made as follows. Firstly, since it was our intention that the algorithm should insert # symbols not only between words but also within compounds, the target number of boundaries to be identified can be considered to be 1411 (the number of word boundaries in the utterances) plus 78 (the number of boundaries occurring within compounds), i.e. 1489. Of these (see Table IV), 645 + 21 = 666 (44.7%) were correctly inserted. The probability of a word boundary being incorrectly inserted, either as a result of a reduced form which was not derived by rule, or because of the omission of a word from the word-lexicon, is given by: (36) (24/(666 + 24) x 100)%</Paragraph>
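The arithmetic behind the figures quoted above can be checked with a short sketch. It is an illustration only, not part of the original system: the counts are those stated in the text, while the variable names and the `percent` helper are hypothetical and introduced purely for this example.

```python
# Counts as stated in the text (Table IV itself is not reproduced here).
WORD_BOUNDARIES = 1411      # word boundaries in the utterances
COMPOUND_BOUNDARIES = 78    # boundaries occurring within compounds
CORRECT_WORD = 645          # '#' symbols correctly placed at word boundaries
CORRECT_COMPOUND = 21       # '#' symbols placed medially in compounds
INCORRECT = 24              # '#' symbols inserted at inappropriate points


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Detection rate against word boundaries only.
word_rate = percent(CORRECT_WORD, WORD_BOUNDARIES)

# Detection rate when compound-internal boundaries also count as targets.
target = WORD_BOUNDARIES + COMPOUND_BOUNDARIES
correct = CORRECT_WORD + CORRECT_COMPOUND
overall_rate = percent(correct, target)

# Expression (36): share of incorrect insertions among the correct and
# incorrect '#' insertions counted above.
false_insertion = percent(INCORRECT, correct + INCORRECT)

print("target boundaries:", target)
print("correctly inserted:", correct)
print("word-boundary detection rate (%):", word_rate)
print("overall detection rate (%):", overall_rate)
print("incorrect insertion rate (%):", false_insertion)
```

Under these counts, expression (36) evaluates to roughly 3.5%, and the two detection rates reproduce the 45.7% and 44.7% quoted in the text.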
  </Section>
</Paper>