<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2087"> <Title>Argumentative Feedback: A Linguistically-motivated Term Expansion for Information Retrieval</Title> <Section position="5" start_page="676" end_page="677" type="metho"> <SectionTitle> 3 Data and metrics </SectionTitle> <Paragraph position="0"> To test our hypothesis, we used the OHSUMED collection (Hersh et al., 1994), originally developed for the TREC topic detection track, which is the most popular information retrieval collection for evaluating information search in library corpora. Alternative collections (cf. (Savoy, 2005)), such as the French Amaryllis collection, are usually smaller and/or not appropriate to evaluate our argumentative classifier, which can only process English documents. Other MEDLINE collections, which can be regarded as similar in size or larger, such as the TREC Genomics 2004 and 2005 collections are unfortunately more domain-specific since information requests in these collection are usually targeting a particular gene or gene product.</Paragraph> <Paragraph position="1"> Among the 348,566 MEDLINE citations of the OHSUMED collection, we use the 233,455 records provided with an abstract. An example of a MEDLINE citation is given in Table 1: only Title, Abstract, MeSH and Chemical (RN) fields of MEDLINE records were used for indexing. Out of the 105 queries of the OHSUMED collection, only 101 queries have at least one positive relevance judgement, therefore we used only this subset for our experiments. The sub-set has been randomly split into a training set (75 queries), which is used to select the different parameters of our retrieval model, and a test set (26 queries), used for our final evaluation.</Paragraph> <Paragraph position="2"> As usual in information retrieval evaluations, the mean average precision, which computes the precision of the engine at different levels (0%, 10%, 20%... 100%) of recall, will be used in our experiments. The precision of the top returned Title: Computerized extraction of coded findings from free-text radiologic reports. Work in progress.</Paragraph> <Paragraph position="3"> Abstract: A computerized data acquisition tool, the special purpose radiology understanding system (SPRUS), has been implemented as a module in the Health Evaluation through Logical Processing Hospital Information System.</Paragraph> <Paragraph position="4"> This tool uses semantic information from a diagnostic expert system to parse free-text radiology reports and to extract and encode both the findings and the radiologists' interpretations. These coded findings and interpretations are then stored in a clinical data base. The system recognizes both radiologic findings and diagnostic interpretations. Initial tests showed a true-positive rate of 87% for radiographic findings and a bad data rate of 5%. Diagnostic interpretations are recognized at a rate of 95% with a bad data rate of 6%. 
<Section position="6" start_page="677" end_page="679" type="metho"> <SectionTitle> 4 Methods </SectionTitle>
<Paragraph position="0"> To test our experimental hypothesis, we use the Rocchio algorithm as baseline. In addition, we also provide the score obtained by the engine before the feedback step. This measure is necessary to verify that feedback is useful for querying the OHSUMED collection and to establish a strong baseline. While Rocchio selects the features to be added to the original queries based on purely statistical analysis, we propose to base our feature expansion also on argumentative criteria. That is, we overweight features appearing in sentences classified by the argumentative categorizer into a particular argumentative category.</Paragraph>
<Section position="1" start_page="677" end_page="677" type="sub_section"> <SectionTitle> 4.1 Retrieval engine and indexing units </SectionTitle>
<Paragraph position="0"> The easyIR system is a standard vector-space engine (Ruch, 2004), which computes state-of-the-art tf.idf and probabilistic weighting schemas. All experiments were conducted with pivoted normalization (Singhal et al., 1996a), which has recently shown some effectiveness on MEDLINE corpora (Aronson et al., 2006).</Paragraph>
<Paragraph position="1"> Query and document weightings are provided in Equation (1): the dtu formula is applied to the documents, while the dtn formula is applied to the query; t is the number of indexing terms, df_j the number of documents in which the term t_j appears, and nt_i the number of unique indexing terms in document D_i; pivot and slope are constants (fixed at pivot = 0.14, slope = 146).</Paragraph>
<Paragraph position="2"> dtu: $w_{ij} = \frac{\left(\ln(\ln(tf_{ij})+1)+1\right) \cdot idf_j}{(1-slope) \cdot pivot + slope \cdot nt_i}$, dtn: $w_{ij} = idf_j \cdot \left(\ln(\ln(tf_{ij})+1)+1\right)$ (1)</Paragraph>
<Paragraph position="3"> As already observed in several linguistically-motivated studies (Hull, 1996), common stemming methods do not perform well on MEDLINE collections (Abdou et al., 2006); therefore, indexing units are stored in the inverted file using a simple S-stemmer (Harman, 1991), which handles the most frequent plural forms of the English language (-ies, -es and -s) while excluding endings such as -aies, -eies, -ss, etc. This simple normalization procedure performs better than other stemmers and better than no stemming; both the weighting of Equation (1) and the stemming rules are sketched in code below. We also use a slightly modified standard stopword list of 544 items, from which strings such as a are removed: a stands for alpha in chemistry and is relevant in biomedical expressions such as vitamin a.</Paragraph>
</Section>
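As referenced above, here is a minimal sketch of the dtu/dtn weighting of Equation (1). The constants are the paper's; the idf formulation (ln(N/df)) is our assumption, since the paper does not spell it out, and all function and variable names are ours.

```python
import math

PIVOT, SLOPE = 0.14, 146.0  # constants as fixed in the paper

def idf(df_j, n_docs):
    # standard inverse document frequency; assumed, not given in the paper
    return math.log(n_docs / df_j)

def dtu(tf_ij, df_j, n_docs, nt_i):
    """Document-side weight with pivoted unique normalization;
    nt_i is the number of unique indexing terms in the document."""
    return (math.log(math.log(tf_ij) + 1) + 1) * idf(df_j, n_docs) / (
        (1 - SLOPE) * PIVOT + SLOPE * nt_i)

def dtn(tf_ij, df_j, n_docs):
    """Query-side weight (no length normalization)."""
    return idf(df_j, n_docs) * (math.log(math.log(tf_ij) + 1) + 1)
```

The S-stemmer can be sketched in the same spirit; the rules below follow Harman's (1991) published description (-ies to -y, -es to -e, -s dropped, with the exception endings the text mentions) and are not copied from easyIR itself.

```python
def s_stem(word):
    """S-stemmer: strip the most frequent English plural endings."""
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]
    return word
```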
<Section position="2" start_page="677" end_page="678" type="sub_section"> <SectionTitle> 4.2 Argumentative categorizer </SectionTitle>
<Paragraph position="0"> The argumentative classifier ranks and categorizes abstract sentences according to their argumentative classes. To implement our argumentative categorizer, we rely on four binary Bayesian classifiers, which use lexical features, and a Markov model, which models the logical distribution of the argumentative classes in MEDLINE abstracts. A comprehensive description of the classifier, with feature selection and comparative evaluation, can be found in (Ruch et al., 2005a). The harmonic mean between the recall and precision scores (or F-score) is in the range of 85% for the combined system.</Paragraph>
<Paragraph position="1"> To train the classifier, we obtained 19,555 explicitly structured abstracts from MEDLINE. A conjunctive query was used to combine the following four strings: PURPOSE:, METHODS:, RESULTS:, CONCLUSION:. From the original set, we retained 12,000 abstracts for training our categorizer, and 1,200 were used for fine-tuning and evaluating the categorizer, following removal of the explicit argumentative markers. An example of an abstract, structured with explicit argumentative labels, is given in Table 2. The per-class performance of the categorizer is given by a contingency matrix in Table 3.</Paragraph>
<Paragraph position="2"> Table 2: Example of an abstract structured with explicit argumentative labels.</Paragraph>
<Paragraph position="3"> Abstract: PURPOSE: The overall prognosis for patients with congestive heart failure is poor. Defining specific populations that might demonstrate improved survival has been difficult [...] PATIENTS AND METHODS: We identified 11 patients with severe congestive heart failure (average ejection fraction 21.9 +/- 4.23% (+/- SD)) who developed spontaneous, marked improvement over a period of follow-up lasting 4.25 +/- 1.49 years [...] RESULTS: During the follow-up period, the average ejection fraction improved in 11 patients from 21.9 +/- 4.23% to 56.64 +/- 10.22%. Late follow-up indicates an average ejection fraction of 52.6 +/- 8.55% for the group [...] CONCLUSIONS: We conclude that selected patients with severe congestive heart failure can markedly improve their left ventricular function in association with complete resolution of heart failure [...]</Paragraph>
</Section>
<Section position="3" start_page="678" end_page="678" type="sub_section"> <SectionTitle> 4.3 Rocchio feedback </SectionTitle>
<Paragraph position="0"> Various general query expansion approaches have been suggested; in this paper we compare ours with Rocchio's. In this latter case, the system was allowed to add m terms, extracted from the k best-ranked abstracts returned by the original query. Each new query was derived by applying the following formula (Equation 2): $Q' = a \cdot Q + \frac{b}{k} \sum_{i=1}^{k} D_i$ (2)</Paragraph>
<Paragraph position="1"> Q' denotes the new query built from the previous query Q, D_i the weighted term vector of the i-th best-ranked document, and w_ij denotes the indexing term weight attached to the term t_j in the document D_i. By direct use of the training data, we determined the optimal values of our model: m = 10, k = 15. In our experiments, we fixed a = 2.0, b = 0.75. Without feedback, the mean average precision of the evaluation run is 0.3066; the Rocchio feedback (mean average precision = 0.353) represents an improvement of about 15% (cf. Table 5), which is statistically significant (p < 0.05).</Paragraph>
</Section>
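A minimal sketch of this Rocchio expansion step, assuming a sparse {term: weight} representation for queries and documents; the representation and the term-selection details are our illustration, while the parameter values are those tuned in the paper.

```python
from collections import defaultdict

ALPHA, BETA, M, K = 2.0, 0.75, 10, 15  # a, b, m, k as fixed in the paper

def rocchio_expand(query_weights, top_docs):
    """query_weights: {term: weight} for Q; top_docs: list of
    {term: weight} dicts for the best-ranked abstracts.
    Returns the expanded query Q' as a {term: weight} dict."""
    new_q = defaultdict(float)
    for t, w in query_weights.items():
        new_q[t] += ALPHA * w
    for doc in top_docs[:K]:                # k best-ranked abstracts
        for t, w in doc.items():
            new_q[t] += (BETA / K) * w
    # keep the original query terms plus the m best-weighted new terms
    new_terms = sorted((t for t in new_q if t not in query_weights),
                       key=new_q.get, reverse=True)[:M]
    return {t: new_q[t] for t in list(query_weights) + new_terms}
```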
<Section position="4" start_page="678" end_page="679" type="sub_section"> <SectionTitle> 4.4 Argumentative selection for feedback </SectionTitle>
<Paragraph position="0"> To apply our argumentation-driven feedback strategy, we first have to classify the top-ranked abstracts into our four argumentative moves: PURPOSE, METHODS, RESULTS, and CONCLUSION. For the argumentative feedback, different m and k values are recomputed on the training queries, depending on the argumentative category we want to over-weight. The basic segment is the sentence; therefore each abstract is split into a set of sentences before being processed by the argumentative classifier. The sentence splitter simply applies a set of regular expressions to locate sentence boundaries; its precision equals 97% on MEDLINE abstracts. In this setting, only one argumentative category is attributed to each sentence, which makes the decision model binary.</Paragraph>
<Paragraph position="1"> Table 4 shows the output of the argumentative classifier when applied to an abstract. To determine the respective value of each argumentative content for feedback, the argumentative categorizer parses each top-ranked abstract. These abstracts are then used to generate four groups of sentences, each group corresponding to a unique argumentative class, so that each argumentative index contains sentences classified into one of the four argumentative classes; a sketch of this grouping is given below. Because argumentative classes are equally distributed in MEDLINE abstracts, each index contains approximately a quarter of the top-ranked abstracts collection.</Paragraph>
<Paragraph position="2"> Table 4: Output of the argumentative categorizer when applied to an argumentatively structured abstract after removal of explicit markers. For each row, the attributed class is followed by the score for the class, followed by the extracted text segment. The reader can compare this categorization with the argumentative labels provided in the original abstract (PMID 12404725).</Paragraph>
<Paragraph position="3"> CONCLUSION (00160116) The highly favorable pathologic stage (RI-RII, 58%) and the fact that the majority of patients were alive and disease-free suggested a more favorable prognosis for this type of renal cell carcinoma.</Paragraph>
<Paragraph position="4"> METHODS (00160119) Tumors were classified according to well-established histologic criteria to determine stage of disease; the system proposed by Robson was used.</Paragraph>
<Paragraph position="5"> METHODS (00162303) Of 250 renal cell carcinomas analyzed, 36 were classified as chromophobe renal cell carcinoma, representing 14% of the group studied.</Paragraph>
<Paragraph position="6"> PURPOSE (00156456) In this study, we analyzed 250 renal cell carcinomas to a) determine the frequency of CCRC at our Hospital and b) analyze the clinical and pathologic features of CCRCs.</Paragraph>
<Paragraph position="7"> PURPOSE (00167817) Chromophobe renal cell carcinoma (CCRC) comprises 5% of neoplasms of renal tubular epithelium. CCRC may have a slightly better prognosis than clear cell carcinoma, but outcome data are limited.</Paragraph>
<Paragraph position="8"> RESULTS (00155338) Robson staging was possible in all cases: 10 patients were stage I, 11 stage II, 10 stage III, and five stage IV.</Paragraph>
</Section> </Section>
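As referenced above, here is a minimal sketch of the sentence splitting and argumentative grouping. The `classify_sentence` callable stands in for the Bayesian/Markov categorizer, and the boundary regular expression is illustrative only; the paper's actual rule set is not given.

```python
import re

CLASSES = ("PURPOSE", "METHODS", "RESULTS", "CONCLUSION")

# illustrative boundary rule: split after end punctuation before a capital
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(abstract):
    """Split an abstract into sentences on simple punctuation cues."""
    return [s for s in _BOUNDARY.split(abstract.strip()) if s]

def build_argumentative_indexes(top_abstracts, classify_sentence):
    """Group sentences of the top-ranked abstracts by predicted class."""
    groups = {c: [] for c in CLASSES}
    for abstract in top_abstracts:
        for sentence in split_sentences(abstract):
            groups[classify_sentence(sentence)].append(sentence)
    return groups

# Feedback then draws expansion terms from a single group, e.g.
# groups["PURPOSE"], with the category-specific m and k values tuned
# on the training queries.
```
</Paper>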