<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1042">
<Title>INTEGRATED TECHNIQUES FOR PHRASE EXTRACTION FROM SPEECH</Title>
<Section position="6" start_page="231" end_page="232" type="evalu">
<SectionTitle> 4. RESULTS </SectionTitle>
<Paragraph position="0"> This approach was first applied in the Gisting system (Rohlicek et al., 1992), where the goal was to extract flight IDs from off-the-air recordings of ATC communications.</Paragraph>
<Paragraph position="1"> In this application, the input is extremely noisy and recognition performance is generally quite poor. We report the general word recognition accuracy and flight ID recognition accuracy both for the combined phrase-structure and n-gram language model (as described in Section 2) and for n-grams alone. The training data consists of 2090 transcribed controller transmissions. The test data consists of 469 transmissions of average length 16. The results are presented for controller transmissions where the start and end times of the transmissions are known.</Paragraph>
<Paragraph position="2"> As shown in Table 1, the overall word accuracy improved only slightly (70% to 72%), which was expected since we modeled only a small portion of the domain.</Paragraph>
<Paragraph position="3"> However, the best result was in the fraction of flight IDs detected, where we halved the miss rate from 11%.</Paragraph>
<Paragraph position="4"> The next set of experiments focused on comparing general word accuracy with word accuracy in the targeted portion of the domain (i.e., the portion covered by the grammar). Using a different ATC dataset (still operational data, but recorded in the tower rather than off the air), we compared bi-grams with our combined rule-based and n-gram approach. The grammar covered approximately 68% of the training data. We tested not only the overall word accuracy, but also the word accuracy in those portions of the text that were modeled by the grammar.</Paragraph>
<Paragraph position="5"> As shown in Table 2, not only was there an improvement in the overall word score using the integrated rather than the bi-gram language model, but the improvement in accuracy in the targeted portion of the domain was also much greater with the integrated approach.</Paragraph>
<Paragraph position="6"> Our third set of experiments focused on the information extraction portion of the system. We evaluated the ability of the parser to extract two kinds of commands from the output of recognition. In these experiments, we took truth to be the performance of the parser on the transcribed text, since we did not have truth annotated for these phrases in our test data. (It has been our experience in working with flight IDs, which were annotated, that in the ATC domain the phrases are regular enough that the parser will extract nearly 100% of the information in the targeted categories. The errors that occur are generally caused by restarts, speech errors, or transcription errors.) Using the same training and test conditions as the first set of experiments described above (see footnote 1), we extracted phrases for tower clearances, using the grammar partially shown above (Figure 2), and for direction orders, which generally consisted of a direction to turn and a heading. The test set consisted of 223 controller utterances, and we scored as correct only exact matches, where the same referent object was found and all of the fields matched exactly. Results are shown in Table 3.</Paragraph>
<Paragraph position="7"> We observe that the precision and recall for direction orders are drastically better than those for tower clearances, even though the grammars for the two are very similar in size. One difference, which we would like to explore further, is that the direction orders grammar was part of the language model used for recognition, whereas tower clearances were not modelled by the phrase grammar, only by the n-gram. To know whether this was a factor, we need to compare the actual word recognition accuracy for these two phrase types.</Paragraph>
<Paragraph position="8"> Footnote 1: One difference was that these tests were done on recognition results after automatic segmentation and classification according to pilot and controller, which generally decreases recognition accuracy.</Paragraph>
<Paragraph position="9"> In looking at the results for tower clearances, we found that although the exact match score was very low, there were many partial matches where, for example, the runway and/or the action type (takeoff, land, etc.) were found correctly even though the entire tower clearance was not recognized. To take these partial matches into account, we rescored the precision and recall, counting each individual piece of information (runway, action, and clearance), so that an exact match receives a score of 3 and partial matches score 1 or 2. Using this measure, we obtained significantly improved performance: precision 64.4 and recall 63.8.</Paragraph>
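The paper describes the exact-match scoring and the partial-credit rescoring only in prose. The following Python sketch illustrates how such scoring could be computed; the record layout, the field names (referent, runway, action, clearance), the referent-based alignment, and the toy data are assumptions for illustration, not the authors' implementation.

```python
# Sketch of strict exact-match scoring and partial-credit rescoring for
# extracted tower-clearance records. Reference records are taken from the
# parser run on the transcript (the "truth" used in the paper); hypothesis
# records come from the parser run on recognition output.

FIELDS = ("runway", "action", "clearance")   # assumed field names

def exact_counts(ref_records, hyp_records):
    """Strict scoring: a hypothesis record counts as correct only if a
    reference record with the same referent has every field identical."""
    ref_by_id = {r["referent"]: r for r in ref_records}
    correct = 0
    for h in hyp_records:
        r = ref_by_id.get(h["referent"])
        if r is not None and all(r[f] == h[f] for f in FIELDS):
            correct += 1
    return correct, len(hyp_records), len(ref_records)

def partial_counts(ref_records, hyp_records):
    """Partial-credit rescoring: each field is scored separately, so an
    exact match is worth 3 points and partial matches 1 or 2."""
    ref_by_id = {r["referent"]: r for r in ref_records}
    correct = 0
    for h in hyp_records:
        r = ref_by_id.get(h["referent"])
        if r is not None:
            correct += sum(1 for f in FIELDS if r[f] == h[f])
    return correct, 3 * len(hyp_records), 3 * len(ref_records)

def precision_recall(correct, hyp_total, ref_total):
    precision = correct / hyp_total if hyp_total else 0.0
    recall = correct / ref_total if ref_total else 0.0
    return precision, recall

if __name__ == "__main__":
    # Toy data: one exact match and one partial match (2 of 3 fields).
    reference = [
        {"referent": "UA123", "runway": "22L", "action": "takeoff", "clearance": "cleared"},
        {"referent": "DL45",  "runway": "4R",  "action": "land",    "clearance": "cleared"},
    ]
    hypothesis = [
        {"referent": "UA123", "runway": "22L", "action": "takeoff", "clearance": "cleared"},
        {"referent": "DL45",  "runway": "4R",  "action": "hold",    "clearance": "cleared"},
    ]
    p, r = precision_recall(*exact_counts(reference, hypothesis))
    print("exact   precision %.1f  recall %.1f" % (100 * p, 100 * r))
    p, r = precision_recall(*partial_counts(reference, hypothesis))
    print("partial precision %.1f  recall %.1f" % (100 * p, 100 * r))
```

On the toy data the strict scheme gives 50.0/50.0 while the partial-credit scheme gives 83.3/83.3, mirroring how the rescoring in the paper credits correctly extracted fields even when the whole clearance is not recognized.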
<Paragraph position="10"> These results highlight one of the main advantages of this approach: even with errorful input, useful information can be found.</Paragraph>
</Section>
</Paper>