File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/w97-0117_metho.xml
Size: 8,042 bytes
Last Modified: 2025-10-06 14:14:40
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0117"> <Title>A Natural Language Correction Model for Continuous Speech Recognition 1</Title> <Section position="5" start_page="169" end_page="171" type="metho"> <SectionTitle> 4. DETAILED ALGORITHM </SectionTitle> <Paragraph position="0"> In this section we discuss the major steps in the C-Box algorithm. Some further details are discussed in subsequent sections.</Paragraph>
<Paragraph position="1"> Step 1. The process is started by collecting a sufficient amount of training text data. The training data must contain parallel text samples: a manually verified true transcription, and the output of the ASR system on the same voice input. We estimate that about a day's worth of dictation is sufficient to initiate the training process, some 300 reports or about 150 KBytes of text. In addition, a fair-sized text corpus of historical, fully verified transcriptions is required. In the experiments described in this paper we used about 1000 reports (400 KBytes) of parallel text, plus an additional corpus of 5881 correct transcriptions (3.25 MBytes in total). The training was performed on 800 reports, and testing on 200 reports.</Paragraph>
<Paragraph position="2"> Step 2. The parallel text samples are aligned, sentence by sentence, on the word level. This is best achieved using content-word islands, then expanding onto the remaining words, while using some common-sense heuristics in deciding the replacement string correspondences: misaligned sections, while different in print, are nonetheless close &quot;phonetic variants&quot;. We use character-level alignment for support, especially if the replacement scope is uncertain. Additionally, we allow for known spelling variants to align, e.g., &quot;1&quot; for &quot;one&quot;, or &quot;Xray&quot; for &quot;X - ray&quot; or &quot;X-ray&quot;, etc.
Step 3. When a misalignment is detected, the system lexicon is automatically checked for missing words, i.e., we check to see if any given transcription error arose because the speaker had used some word or words that are unknown to the speech recognition system. This is accomplished by looking up the misaligned words in a broad-coverage medical terminology lexicon 1. Since such words will eventually be entered into the SRS lexicon, thus eliminating these transcription errors, we decided not to generate correction rules for them at this time.</Paragraph>
1. We used the UMLS Lexicon and Metathesaurus, available from the National Library of Medicine, as well as a commercial Radiology/Oncology spell-checker.
<Paragraph position="3"> Step 4. Misaligned sections give rise to preliminary context-free string replacement &quot;rules&quot;, L = R, where L is a section in the ASR output, and R is the corresponding section of the true manual transcription. For example, the replacement rules there were made = the remainder and this or = the support are obtained by aligning the following two sentences:
s1: THERE WERE MADE OF THIS OR LINES AND TUBES ARE UNCHANGED IN POSITION.</Paragraph>
<Paragraph position="6"> s2: THE REMAINDER OF THE SUPPORT LINES AND TUBES ARE UNCHANGED IN POSITION.</Paragraph>
<Paragraph position="7"> Context-free &quot;rules&quot; are derived from perceived differences in alignment of a specific pair of sentences, but they would not necessarily apply elsewhere.</Paragraph>
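The paper gives no pseudocode for Steps 2 through 4, so a minimal sketch may help; it is an illustration under stated assumptions, not the authors' implementation. Python's difflib.SequenceMatcher stands in for the content-word-island and character-level alignment described above, and the function name candidate_rules is ours.

```python
# A minimal sketch of Steps 2-4, assuming the parallel sentences are already
# paired.  difflib.SequenceMatcher approximates the content-word-island
# alignment described in the paper; names here are illustrative only.
from difflib import SequenceMatcher

def candidate_rules(asr_words, true_words):
    """Yield preliminary context-free rules (L, R), one per misaligned section."""
    matcher = SequenceMatcher(a=asr_words, b=true_words, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            continue                          # aligned section, no rule needed
        left = " ".join(asr_words[a1:a2])     # section of the ASR output (L)
        right = " ".join(true_words[b1:b2])   # corresponding true text (R)
        if left and right:                    # skip pure insertions/deletions
            yield left, right

s1 = "THERE WERE MADE OF THIS OR LINES AND TUBES ARE UNCHANGED IN POSITION".split()
s2 = "THE REMAINDER OF THE SUPPORT LINES AND TUBES ARE UNCHANGED IN POSITION".split()

for L, R in candidate_rules(s1, s2):
    print(f"{L} = {R}")
# -> THERE WERE MADE = THE REMAINDER
# -> THIS OR = THE SUPPORT
```

On the sentence pair from Step 4 this recovers exactly the two preliminary rules quoted above; the real system would additionally consult character-level alignment and spelling-variant tables before accepting a correspondence.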
<Paragraph position="8"> Step 5. The candidate rules derived in the previous step are validated by observing their applicability across the training collection (which consists of the parallel text training data as well as a much larger radiology text corpus). To do so, we collect all occurrences of string L in the ASR output (e.g., &quot;there were made&quot;), and determine how many times the rule L = R is supported in the parallel sample. This gives rise to the support set for a given rule, which we will denote SP(L = R). The remaining occurrences of L constitute the refute set, RF(L = R).</Paragraph>
<Paragraph position="9"> We also consider substrings of L for validation purposes, e.g.:
s1: AT THE TRACK OF TUMOR ON CHANGE IN ...</Paragraph>
<Paragraph position="10"> s2: ENDOTRACHEAL TUBES ARE UNCHANGED IN ...</Paragraph>
<Paragraph position="11"> can be used to validate the rule on change = unchanged, even though the misaligned sections are much longer here. In fact, one way of proceeding is to work with shorter-L rules first. This in turn leads to breaking down some unwieldy and unlikely candidate rules, such as the one that may arise from aligning two or more consecutive transcription errors together, as illustrated by the above example.</Paragraph>
<Paragraph position="12"> The initial validation is based on the estimated distribution of L within the SP and RF sets, and it may look as follows:</Paragraph>
<Paragraph position="13"> FUSION: |SP| = 43, |RF| = 8, validity = 43/(43+8) = 0.84; in 41 of the 43 supporting cases the correction is EFFUSION.</Paragraph>
<Paragraph position="14"> This says that the word FUSION has been mistakenly generated 43 times in the training sample of the SRS output, and this constitutes at least 84% of its total observed occurrences. That is, there were 8 other correct occurrences of FUSION within the corpus of all valid reports (i.e., 43/(43+8) = 0.84), and at best all of these would translate correctly when read into the SRS. Out of the 43 cases, 41 are mistranscriptions of the word EFFUSION; therefore, the context-free rule fusion = effusion could be proposed. This rule is not perfect, since it will potentially miscorrect other occurrences of FUSION, but its application can be expected to reduce the overall transcription error rate on similar data samples.</Paragraph>
<Paragraph position="15"> Step 6. In many cases, clear-cut context-free rules like the one given above are hard to come by.</Paragraph>
<Paragraph position="16"> Clearly, rules with validity weights of 50% or less are useless, but even those below 75-80% may be of little value. One possible way to produce higher-quality rules is to specialize them by adding context. In other words, we refine rules by fitting them more closely to the existing evidence. This can be done by identifying contrasting features within the text surrounding L's occurrences that could help us better differentiate between the SP and RF sets. If such a feature or features are found, they are used to restate L = R as a context-sensitive rule, XLY = XRY, where X and Y are context features. Context is added one element at a time, on either side of L: initially words, then possibly some non-terminals. The revised rules are re-validated, and the cycle is repeated until no further progress is possible.</Paragraph>
<Paragraph position="17"> As an example, consider the following pair of sentences:
s1: PORTABLE FROM VIEW OF THE CHEST.</Paragraph>
<Paragraph position="18"> s2: PORTABLE FRONTAL VIEW OF THE CHEST.</Paragraph>
<Paragraph position="19"> The misalignment gives rise to the context-free correction rule from = frontal. As we validate this rule against the training corpus we find some supporting evidence, but also many cases where the rule cannot apply, like the one below:
... ARE UNCHANGED IN POSITION FROM THE PRIOR EXAMINATION.</Paragraph>
<Paragraph position="20"> However, adding a one-word context of the word VIEW, i.e., replacing the context-free rule with the context-sensitive from view = frontal view, produces a very good correction rule. More advanced possibilities include adding non-terminals to the rules, as in had CD horn = at CD hours, where CD stands for any number (here, a word tagged with the CD part of speech).</Paragraph>
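A compact sketch of Steps 5 and 6 may likewise be useful; it is an illustration under stated assumptions, not the authors' code. Here an occurrence of L counts as supported when applying L = R moves the ASR sentence closer to its true transcription (a rough proxy for the alignment-based support test), only one-word right contexts are searched, and all identifiers and the 0.75 cutoff are our own choices.

```python
# A minimal sketch of Steps 5-6, under the assumptions stated above.
# validity(L, R) = |SP| / (|SP| + |RF|); per Step 6, rules scoring below
# roughly 0.75 would be specialized with context or discarded.
from difflib import SequenceMatcher

def similarity(a, b):
    """Word-level similarity of two sentences, in [0, 1]."""
    return SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()

def support_refute(L, R, pairs):
    """Return (|SP|, |RF|) for the rule L = R over parallel (asr, true) pairs."""
    sp = rf = 0
    for asr, true in pairs:
        if L not in asr:                 # word-boundary handling elided here
            continue
        if similarity(asr.replace(L, R, 1), true) > similarity(asr, true):
            sp += 1                      # rule fixes this occurrence: support set
        else:
            rf += 1                      # rule would miscorrect it: refute set
    return sp, rf

def validity(L, R, pairs):
    sp, rf = support_refute(L, R, pairs)
    return sp / (sp + rf) if sp + rf else 0.0

def specialize(L, R, pairs):
    """One Step 6 refinement pass: try each one-word right context Y observed
    after L and keep the context-sensitive rule 'L Y = R Y' that validates best."""
    best_rule, best_score = (L, R), validity(L, R, pairs)
    contexts = set()
    for asr, _ in pairs:
        words, lhs = asr.split(), L.split()
        for i in range(len(words) - len(lhs)):
            if words[i:i + len(lhs)] == lhs:
                contexts.add(words[i + len(lhs)])    # word right after L
    for Y in contexts:
        score = validity(f"{L} {Y}", f"{R} {Y}", pairs)
        if score > best_score:
            best_rule, best_score = (f"{L} {Y}", f"{R} {Y}"), score
    return best_rule, best_score

pairs = [
    ("PORTABLE FROM VIEW OF THE CHEST",
     "PORTABLE FRONTAL VIEW OF THE CHEST"),
    ("TUBES ARE UNCHANGED IN POSITION FROM THE PRIOR EXAMINATION",
     "TUBES ARE UNCHANGED IN POSITION FROM THE PRIOR EXAMINATION"),
]
print(specialize("FROM", "FRONTAL", pairs))  # -> (('FROM VIEW', 'FRONTAL VIEW'), 1.0)
```

On this two-sentence demo the context-free rule from = frontal validates at only 0.5, while the specialized from view = frontal view reaches 1.0, mirroring the refinement discussed above. A fuller version would also try left contexts and non-terminal (part-of-speech) features, iterating until validity stops improving.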
<Paragraph position="21"> Step 7. The above process may lead to alignment re-adjustments, as suggested in Step 5. Upon re-alignment, additional rules may be postulated, while other rules may be invalidated. This requires another cycle of rule validation, which, again, is repeated until no further progress is possible, that is, until no further changes to the rule set result. The final set of context-sensitive correction rules is then generated.</Paragraph> </Section> </Paper>