<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1021"> <Title>Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts</Title> <Section position="8" start_page="7" end_page="8" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> The results of this study suggest that Maximum Entropy modeling is a promising approach to abbreviation disambiguation, both as an avenue of research and as a practical implementation for text normalization tasks involving abbreviations. Several observations can be made about the results. First, the accuracy results on the small pilot sample of 6 abbreviations as well as the larger sample of 69 abbreviations are quite encouraging given that the training of the ME models is largely unsupervised.</Paragraph> <Paragraph position="1"> The one exception is the database of acronyms/abbreviations and their expansions, which has to be compiled by hand. However, once such a list is compiled, any amount of data can be used for training with no manual annotation.</Paragraph> <Paragraph position="2"> Another observation is that using section-level context does not appear to benefit abbreviation expansion disambiguation in this case. These results, however, are by no means conclusive. It is entirely possible that using section headings as indicators of discourse context will prove beneficial on a larger corpus of data with more than 69 abbreviations.</Paragraph> <Paragraph position="3"> The abbreviation/acronym database in the UMLS tends to be more comprehensive than most practical applications would require. 
For example, the Mayo Clinic regards the proliferation of abbreviations and acronyms with multiple meanings as a serious patient safety concern and makes efforts to ensure that only &quot;approved&quot; abbreviations (which tend to have lower ambiguity) are used in clinical practice; this would also make their normalization easier and more accurate. It may still be necessary to combine the UMLS abbreviation list with a particular clinic's list in order to avoid missing occasional abbreviations that occur in the text but have not made it onto the clinic's approved list. This issue also remains to be investigated.</Paragraph> </Section> </Paper>