<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1311"> <Title>Lyudmila</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methods </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Overview of NLP System </SectionTitle> <Paragraph position="0"> MedLEE is composed of several modules, each of which processes and transforms the text in accordance with a particular aspect of language until a final structured output form is obtained. The structured output consists of primary units of clinical information (i.e. findings, procedures, and medications), along with corresponding modifiers (e.g. body locations, degree, certainty).</Paragraph> <Paragraph position="1"> Figure 1 shows a simplified version of the structured output generated by processing the sentence &quot;there is evidence of severe pulmonary congestion with question mild consolidation changes&quot;.</Paragraph> <Paragraph position="2"> The output represents two primary clinical findings, congestion and changes.</Paragraph> <Paragraph position="3"> The first finding has a body location modifier lung, stemming from pulmonary, a certainty modifier high, stemming from evidence of, and a degree modifier high, stemming from severe. In the second finding, the certainty modifier moderate corresponds to question, the degree modifier low corresponds to mild, and the descriptor corresponds to consolidation. Values for degree and certainty modifiers were automatically mapped to a small set of values in order to facilitate subsequent retrieval. The actual form of output generated by MedLEE is XML, but Figure 1 shows a compatible and more readable form.</Paragraph> <Paragraph position="4"> Below is a brief overview of the system. More detailed descriptions were previously published (Friedman et al., 1994). 
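The simplified structured output described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class name Finding and the helper functions are hypothetical and are not part of MedLEE, whose actual output is XML; the values are those given for the example sentence in Figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Finding:
    """One primary unit of clinical information plus its modifiers (hypothetical layout)."""
    name: str
    modifiers: Dict[str, str] = field(default_factory=dict)


def figure1_output() -> List[Finding]:
    """Return the two findings listed in Figure 1 for the example sentence."""
    return [
        Finding("congestion", {"body_location": "lung", "certainty": "high", "degree": "high"}),
        Finding("changes", {"certainty": "moderate", "degree": "low", "descriptor": "consolidation"}),
    ]


def render(findings: List[Finding]) -> str:
    """Format findings in a simplified 'key: value' layout similar to Figure 1."""
    lines = []
    for finding in findings:
        lines.append(f"finding: {finding.name}")
        for key, value in finding.modifiers.items():
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)
```

Calling render(figure1_output()) yields one "finding:" line per finding followed by its indented modifiers, which makes it easy to see that retrieval can key on the normalized modifier values rather than on the original wording.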
When MedLEE was originally developed, it was intended to be used in conjunction with decision support applications, where high precision was critical. Therefore, it was initially designed to maximize precision and required a complete parse. However, subsequent clinical applications required high recall, and we discovered that flexibility was critical. Currently, MedLEE attempts to find a complete parse and only resorts to partial parsing when a full parse cannot be obtained. When generating the structured output, the method that was used to obtain the parse is saved along with the structured output so that the user can filter findings in or out accordingly. Preprocessor - The preprocessor recognizes sentence boundaries, and also performs lexical lookup in order to recognize and categorize words, phrases, and abbreviations, and to specify their target forms. The lexicon was manually developed with clinical experts because of the need for high precision. In an earlier study, we used the UMLS (Unified Medical Language System) (Lindberg, Humphreys, and McCray, 1993), a controlled vocabulary developed and maintained by the National Library of Medicine, to automatically generate a lexicon. This lexicon was subsequently used by MedLEE instead of the MedLEE lexicon to process a set of reports. Results showed a significant loss of precision (from 93% to 86%) and recall (from 81% to 60%) when using the UMLS lexicon (Friedman, et al., 2001). Terms with ambiguous senses may be disambiguated in this stage based on contextual information. 
The preprocessor can also handle tagged text so that lexical definitions can be specified in the text, bypassing the need for lexical lookup for cases where the text is already tagged.</Paragraph> <Paragraph position="5"> This feature is particularly useful for handling local terminology (such as the names of local facilities), as well as for resolving domain-specific ambiguities.</Paragraph> <Paragraph position="6"> Parser - The parser uses a grammar and lexicon to identify and interpret the structure of the sentence, and to generate an intermediate structure based on grammar specifications. The grammar is a set of rules based on semantic and syntactic co-occurrence patterns. Manual development of rules is costly, and we are currently investigating stochastic methods to help extend the grammar automatically. [Figure 1 - Sample output in simplified form for the sentence &quot;there is evidence of severe pulmonary congestion with question mild consolidation changes&quot;. finding: congestion (body_location: lung; certainty: high; degree: high); finding: changes (certainty: moderate; degree: low; descriptor: consolidation).]</Paragraph> <Paragraph position="7"> Composer - The composer combines multi-word phrases whose parts appear separately in the input sentence, to facilitate later retrieval. For example, the discontiguous words spleen and enlarged in spleen appears enlarged would be mapped to the phrase enlarged spleen so that a subsequent retrieval could look for that phrase rather than the individual components.</Paragraph> <Paragraph position="8"> Encoder - The encoder maps the target terms in the intermediate structure to a standard clinical vocabulary (e.g. 
enlarged spleen is mapped to the preferred vocabulary concept splenomegaly) in the UMLS.</Paragraph> <Paragraph position="9"> Chunker - The chunker increases sensitivity by using alternative strategies to break up and structure the text if the initial parsing effort fails.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Design of Feasibility Study </SectionTitle> <Paragraph position="0"> A two-year crossover design study was conducted independently of this NLP effort (03/01/2001-01/31/2002, 03/01/2002-01/31/2003) in two neonatal intensive care units (NICU) in New York City to study the impact of hand hygiene products on healthcare-acquired infection:
* NICU-A: a 40-bed unit that cares for acutely ill neonates, including those requiring surgery for complex congenital anomalies and extracorporeal membrane oxygenation
* NICU-B: a 50-bed unit associated with a large infertility treatment practice
A trained infection control practitioner (ICP), using the CDC National Nosocomial Infection Surveillance System (NNIS) definitions, performed the surveillance for infections in both units. Cases were reviewed manually, including analysis of computerized radiology, pathology and microbiology reports as well as chart reviews and interviews with patient care providers. The diagnosis of infection was validated with the physician co-investigator from each unit.</Paragraph> <Paragraph position="1"> As part of this study, we evaluated the feasibility of using the NLP system (MedLEE) to automatically identify potential cases of healthcare-associated pneumonia in neonates. The NLP system was not changed, but medical logic rules that accessed the NLP output had to be developed. The rules were developed by a medical expert based on modifications to a previous rule to detect pneumonia in adults (Hripcsak et al., 1995). 
Modifications were made in accordance with the CDC NNIS definition of healthcare-associated pneumonia in neonates. The final rule was then adapted to function properly with the output generated by MedLEE. For example, the rule looks for 38 different findings or modifier-finding combinations, such as pneumatocele and persistent opacity, and then filters out findings that are not applicable because they occur with certain modifiers (e.g. no, rule out, cannot evaluate, resolved; 62 modifiers in total). The automated monitoring system therefore consists of two components: a) the MedLEE NLP system, and b) medical rules that access the output generated by MedLEE. In this first phase, the medical expert defined the rules broadly, to identify reports consistent with pneumonia (and not only healthcare-associated pneumonia), with the intention of continuing the effort if performance in identifying all forms of pneumonia was satisfactory. This means that the automated system could not differentiate between pneumonia and healthcare-associated pneumonia at this point.</Paragraph> <Paragraph position="2"> There were no probabilities associated with findings or combinations of findings. The second phase of the study will use the results presented in this work to refine the rules in order to differentiate between healthcare-associated and other types of pneumonia. All chest radiograph reports of neonates admitted to NICU-A were processed using the automated monitoring system. To better assess true performance, no corrections were made to the reports despite misspellings and even the inclusion of other types of reports in the same electronic files as the chest radiograph reports. 
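As an illustration of how a medical rule might access the structured output when a report is processed, the following Python sketch flags a report when it contains a target finding that is not cancelled by an excluding modifier. It is a sketch under stated assumptions, not the rule used in the study: the dictionary layout and function names are hypothetical, and only the examples named in the text are listed, not the full sets of 38 targets and 62 modifiers.

```python
from typing import Optional, Set, Tuple

# Targets are (required modifier or None, finding name) pairs.
# Only the two examples named in the text; the real rule has 38 entries.
TARGETS: Set[Tuple[Optional[str], str]] = {
    (None, "pneumatocele"),
    ("persistent", "opacity"),
}

# Only the four examples named in the text; the real rule has 62 entries.
EXCLUDING_MODIFIERS: Set[str] = {"no", "rule out", "cannot evaluate", "resolved"}


def finding_matches(finding: dict) -> bool:
    """True if the finding is a target and carries no excluding modifier."""
    name = finding["name"]
    modifiers = set(finding.get("modifiers", []))
    if modifiers.intersection(EXCLUDING_MODIFIERS):
        return False
    for required_modifier, target_name in TARGETS:
        if name == target_name and (required_modifier is None or required_modifier in modifiers):
            return True
    return False


def report_is_flagged(findings: list) -> bool:
    """A report is flagged if any of its findings matches (no probabilities involved)."""
    return any(finding_matches(finding) for finding in findings)
```

The rule is purely Boolean, in line with the statement that no probabilities were associated with findings, so a single matching finding is enough to flag a report.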
For instance, it is not uncommon for a neonate to have a combined chest-abdomen radiograph.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> During the 2 years of the study, of the 1,688 neonates admitted to NICU-A, 1,277 had a total of 7,928 chest radiographs. Based on the experts' evaluation, only 7 neonates had healthcare-associated pneumonia at some point during the hospital stay. Cases were definitively confirmed by cultures. These patients had a total of 168 chest radiographs, but only 13 of them were positive, because the patients contracted pneumonia at some point after their admission.</Paragraph> <Paragraph position="1"> The automated system found the presence of pneumonia in 125 chest radiographs that were associated with 82 patients, including 6 of the 7 patients identified by the experts. The missed case was a neonate with cardiac problems, and the chest radiograph did not show findings of healthcare-associated pneumonia. A pulmonary biopsy performed subsequently showed findings which were consistent with healthcare-associated pneumonia.</Paragraph> <Paragraph position="2"> For healthcare-associated pneumonia, the sensitivity (recall) of the automated system was 85.7%, while specificity (true negative rate) was 94.1%, and the positive predictive value (precision) was only 7.32%.</Paragraph> <Paragraph position="3"> One of the authors (EAM), who is a board certified pediatric intensive care physician, manually analyzed the false positive cases (i.e. errors in precision), and found that several of the false positive cases actually had radiographic findings corresponding to pneumonia. 
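The sensitivity and positive predictive value reported above can be reproduced at the patient level from the counts given in this section (7 expert-confirmed cases, 6 of them detected, 82 patients flagged). The Python sketch below shows the arithmetic; function and variable names are illustrative. Specificity is not recomputed because the number of true negatives is not stated explicitly.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Recall: fraction of real cases that the system detected."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Precision: fraction of flagged patients that were real cases."""
    return true_positives / (true_positives + false_positives)


# Patient-level counts taken from the text.
EXPERT_CONFIRMED_CASES = 7
DETECTED_CASES = 6
FLAGGED_PATIENTS = 82

tp = DETECTED_CASES
fn = EXPERT_CONFIRMED_CASES - DETECTED_CASES
fp = FLAGGED_PATIENTS - DETECTED_CASES

print(f"sensitivity: {100 * sensitivity(tp, fn):.1f}%")              # 85.7%
print(f"PPV: {100 * positive_predictive_value(tp, fp):.2f}%")        # 7.32%
```

Under this reading, 76 of the 82 flagged patients were not expert-confirmed cases of healthcare-associated pneumonia, which is what drives the low positive predictive value.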
Other false positive cases require expert review of the entire patient chart to determine whether or not healthcare-associated pneumonia was present.</Paragraph> <Paragraph position="4"> The expert reviewer (EAM) also encountered several occurrences of a missed abbreviation (&quot;BPD&quot;). Another common error was the misspelling of terms.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> Natural language processing has the potential to extract valuable data from narrative reports. The significance is that a vast amount of NLP structured data could then be exploited by automated tools, such as decision support systems. Automated alerts (Dexter et al., 2001; Hripcsak et al., 1990; Kuperman et al., 1999; Rind et al., 1994) require coded clinical data to do an intelligent analysis of patient status or condition. An automated tool that notifies appropriate personnel about patients with a particular condition or infection facilitates a timely and adequate response, including treatment, medication prophylaxis, and isolation.</Paragraph> <Paragraph position="1"> Conditions such as healthcare-associated pneumonia carry significant rates of morbidity and mortality. Surveillance of respiratory infection in these patients is a challenge, especially in neonates admitted to neonatal intensive care units.</Paragraph> <Paragraph position="2"> Isolated positive cultures alone do not distinguish between bacterial colonization and respiratory infection. Surveillance based on radiology and laboratory findings can be valuable as a complement to daily manual chart review and clinical rounds.</Paragraph> <Paragraph position="3"> An NLP system cannot be used in a clinical environment without an infrastructure to support its use. 
At the NYPH, a clinical event monitor (Hripcsak et al., 1996) based on the Arden Syntax for Medical Logic Modules (MLMs) (Hripcsak et al., 1990; Hripcsak et al., 1994) provides clinical decision support. When a clinical event occurs (such as the uploading of a radiograph report), appropriate medical logic modules are triggered based on the type of event. However, in order to be used by the monitoring system, narrative data must be coded.</Paragraph> <Paragraph position="4"> We envision the integration and use of this automated NLP system to facilitate surveillance of healthcare-associated pneumonia in a real clinical environment. An additional issue is that the data from the NLP system has to be represented in a way that can be manipulated by the clinical information system, and easily retrieved by the medical rules. Therefore it is not enough to evaluate an NLP system in isolation from a clinical application.</Paragraph> <Paragraph position="5"> The NLP system may perform very well in isolation, but the rules that access the data may be very complex. They may involve complex inferencing, or may be difficult to write because of the representation generated by the NLP system.</Paragraph> <Paragraph position="6"> For healthcare-associated pneumonia, sensitivity (recall) and specificity (rate of true negatives) were appropriate for the clinical application (85.7% and 94.1% respectively), but the positive predictive value (precision) was low (7.32%), as expected in this phase. Low precision was primarily due to the broad rule that was used to detect pneumonia, and was not due to the NLP system itself. This rule now needs to be refined to detect only healthcare-associated pneumonia, and to distinguish among radiograph findings moderately or highly suggestive of healthcare-associated pneumonia. 
That would require substantial effort involving manual chart review by an expert.</Paragraph> <Paragraph position="7"> Additional data from other sources, such as laboratory results, should also be combined with radiograph findings to add precision to the automated system. This will be done in future work, along with a further evaluation. The data from NICU-B was reserved as a test set for this purpose.</Paragraph> <Paragraph position="8"> The MedLEE system was not adapted in any way for this effort. Additionally, the rules were based on expert knowledge but there was no training of the rules because of the sparseness of the data. One type of NLP error was caused by a missed abbreviation, BPD. A straightforward solution would be to include the abbreviation in the lexicon, but this would create problems because of the ambiguous nature of the abbreviation. BPD has multiple meanings, including bronchopulmonary dysplasia, borderline personality disorder, biparietal diameter, bipolar disorder, and biliopancreatic diversion, among others. This is not surprising since abbreviations are known to be highly ambiguous (Aronson and Rindflesch, 1994; Nadkarni, Chen, and Brandt, 2001), and are widespread in clinical text. In chest radiographs of neonates, BPD generally denotes bronchopulmonary dysplasia, a condition that predisposes the patient to respiratory infection. In other types of radiology reports, for instance abdominal echography, BPD generally means biparietal diameter, a measure of gestational age. Word sense disambiguation is a difficult problem, which is widely discussed in the computational linguistics literature. A review of methods for word sense disambiguation is presented by Ide and Veronis (1998). 
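The domain dependence of BPD described above can be made concrete with a small lookup keyed by report domain. This Python sketch is only an illustration and not a MedLEE component: the table, the domain labels, and the function name are assumptions, and the two senses shown are the ones given in the text.

```python
from typing import Dict, Optional

# Hypothetical table: report domain, then abbreviation, then preferred sense.
SENSES_BY_DOMAIN: Dict[str, Dict[str, str]] = {
    "neonatal chest radiograph": {"BPD": "bronchopulmonary dysplasia"},
    "abdominal echography": {"BPD": "biparietal diameter"},
}


def disambiguate(abbreviation: str, domain: str) -> Optional[str]:
    """Return the sense for this domain, or None if the domain or abbreviation is unknown."""
    return SENSES_BY_DOMAIN.get(domain, {}).get(abbreviation)
```

Returning None for an unknown domain, instead of guessing a default sense, keeps an ambiguous abbreviation from being silently encoded with the wrong meaning.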
In the clinical setting, an important part of the solution will involve identifying the particular domain and using special-purpose, domain-specific disambiguators that tag ambiguous abbreviations and specify their appropriate sense prior to parsing, based on the domain and other contextual information. Defining the appropriate domain granularity will be important, but may be a difficult task because the granularity may vary with the abbreviation. For example, in the case of radiographic reports, the domain could comprise all chest x-rays, only chest x-rays of neonates, or a specific type of report.</Paragraph> <Paragraph position="9"> In this study, we wanted to first evaluate the feasibility of automated surveillance based on NLP in a real clinical situation. The situation that presented itself was important but only involved a small population of positive cases. The results that were obtained are not meant to be definitive but to expose the issues associated with the use of an automated system that uses NLP in a real environment. This study established a relationship with clinicians who need this technology. It is this collaboration that is critical for furthering the use and validation of NLP in the clinical domain. In this study, for instance, upon reviewing our results, the infection control practitioner felt she may have missed some cases when following her typical manual surveillance, and would welcome the assistance of an automated system, even if it generated a manageable number of false positives (false alerts). We do not know what that number should be, but estimate that a few false positives per week would be acceptable.</Paragraph> <Paragraph position="10"> In that case, the 82 false positives, accounting for 2 years of cases, would be very acceptable. 
This would need further study.</Paragraph> <Paragraph position="11"> Routine surveillance of infectious diseases in hospitals is generally accomplished by manual review of charts and clinical rounds by the ICPs. In case of suspected infection, the data are collected using surveillance protocols that target inpatients at high risk of infection. The CDC NNIS definition for healthcare-associated pneumonia is a 2-page written protocol with two different criteria. It is well known that interpretation of guidelines and protocols varies among health care providers, even within the same institution. A recent study on surveillance of ventilator-associated pneumonia (VAP) in very-low-weight infants, which retrospectively compared VAP surveillance diagnoses made by the hospital ICPs with those made by a panel of experts given the same clinical, laboratory, and radiologic data, corroborates the variation among experts (Cordero, et al., 2000). An accurate NLP system, which codes reports consistently, should improve data collection for surveillance.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> Surveillance of infectious disease is critical for health care but manual methods are costly, inconsistent, and error prone. An automated system using natural language processing would be an invaluable tool that could be used to improve surveillance, including surveillance of emerging infectious diseases and biothreats. We performed a feasibility study in conjunction with an infectious disease control study to detect the presence of healthcare-associated pneumonia in neonates. The results showed that an automated system consisting of NLP and clinical rules could be used for automated surveillance. 
Further work will include refinement of the rules, further evaluation, integration with the clinical environment, and identification of other surveillance applications.</Paragraph> </Section> </Paper>