<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1101">
  <Title>Adapting a synonym database to specific domains</Title>
  <Section position="5" start_page="4" end_page="4" type="metho">
    <SectionTitle>
3 Sample application
</SectionTitle>
    <Paragraph position="0"> The described methodology was applied to the aviation domain. We used the Aviation Safety Information System (ASRS) corpus (h'etp://asrs. arc.nasa.gov/) as our aviation specific corpus. The resulting domain-specific database is being used in an IR application that retrieves documents relevant to user defined queries, expressed as phrase patterns, and identifies portions of text that are instances of the relevant phrase patterns.</Paragraph>
    <Paragraph position="1"> The application makes use of Natural Language Processing (NLP) techniques (tagging and partial parsing) to annotate documents.</Paragraph>
    <Paragraph position="2"> User defined queries are matched against such annotated corpora. Synonyms are used to expand occurrences of specific words in such queries. In the following two sections we describe how the pruning process was performed and provide some results.</Paragraph>
    <Section position="1" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
3.1 Adapting WordNet to the aviation domain
</SectionTitle>
      <Paragraph position="0"> A vocabulary of relevant query terms was made available by a user of our IR application and was used in our ranking of synonymy relations. Manual pruning was performed on the 1000 top-ranking terms, with which 6565 synsets were associated overall. The manual pruning task was split between two human evaluators. The evaluators were programmers on our staff. They were native English speakers who were acquainted with our IR application and with the goals of the manual pruning process, but had no specific training or background in lexicographic or WordNet-related tasks. For each of the 1000 terms, the evaluators were provided with a sample of (at most) 100 sentences in which the relevant word occurred in the ASRS corpus.</Paragraph>
      <Paragraph position="1"> groups of synsets referring to the same head term) were submitted to both evaluators (576 synsets overall), in order to check the rate of agreement of their evaluations. The evaluators were allowed to leave synsets unanswered, when the synsets only contained the head term (and at least one other synset in the cluster had been deemed correct). Leaving out the cases when one or both evaluators skipped the answer, there remained 418 synsets for which both answered. There was agreement in 315 cases (75%) and disagreement in 103 cases (25%). A sample of senses on which the evaluators disagreed is shown in (11). In each case, the term being evaluated is the first in the synset.</Paragraph>
      <Paragraph position="2"> (11) a. {about, around} (in the area or vicinity) b. {accept, admit, take, take on} (admit into a group or community) null c. {accept, consent, go for} (give an affirmative reply to) d. {accept, swallow} (tolerate or accommodate oneself to) e. {accept, take} (be designed to hold or take) f. {accomplished, effected, established} null (settled securely and unconditionally) null g. {acknowledge, know, recognize} (discern) h. {act, cognitive operation, cognitive process, operation, process} (the performance of some composite cognitive activity) i. {act, act as, play} (pretend to have certain qualities or state of mind) j. {action, activeness, activity} (the state of being active) k. {action, activity, natural action, natural process} (a process existing in or produced by nature (rather than by the intent of human beings)) It should be noted that the 'yes' and 'no' answers were not evenly distributed between the evaluators. In 80% of the cases of disagreement, it was evaluator A answering 'yes' and evaluator B answering 'no'. This seems to suggest than one of the reasons for disagreement was a different degree of strictness in evaluating. Since the evaluators matched a sense against an entire corpus (represented by a sample of occurrences), one common situation may have been that a sense did occur, but very rarely. Therefore, the evaluators may have applied different criteria in judging how many occurrences were needed to deem a sense correct. This discrepancy, of course, may compound with the fact that the differences among WordNet senses can sometimes be very subtle.</Paragraph>
      <Paragraph position="3"> Automatic pruning was performed on the entire WordNet database, regardless of whether candidates had already been manually checked or not. This was done for testing purposes, in order to check the results of automatic pruning against the test set obtained from manual pruning. Besides associating ASRS frequencies with all words in synsets and glosses, we also computed frequencies for collocations (i.e. multi-word terms) appearing in synsets. The input to automatic pruning was constituted by 10352 polysemous terms appearing at least once in ASRS the corpus. Such terms correspond to 37494 (term, synset) pairs. Therefore, the latter was the actual number of pruning candidates that were ranked.</Paragraph>
      <Paragraph position="4"> The check of WordNet senses against ASRS senses was only done unidirectionally, i.e. we only checked whether WordNet senses were attested in ASRS. Although it would be interesting to see how often the appropriate, domain-specific senses were absent from WordNet, no check of this kind was done. We took the simplifying assumption that Word-Net be complete, thus aiming at assigning at least one WordNet sense to each term that appeared in both WordNet and ASRS.</Paragraph>
    </Section>
    <Section position="2" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
3.2 Results
</SectionTitle>
      <Paragraph position="0"> In order to test the automatic pruning performance, we ran the ranking procedure on a test set taken from the manually checked files. This file had been set apart and had not been used in the preliminary tests on the automatic pruning algorithm. The test set included 350 clusters, comprising 2300 candidates. 1643 candidates were actually assigned an evaluation during manual pruning. These were used for the test. We extracted the 1643 relevant items from our ranking list, then we incrementally computed precision and recall in terms of the items that had been manually checked by our human evaluators. The results are shown in figure 1. As an example of how this figure can be interpreted, taking into consideration the top 20% of the ranking list (along the X axis), an 80% precision (Y axis) means that 80% of the items encountered so far had been removed in manual pruning; a 27% recall (Y axis) means that 27% of the overall manually removed items have been encountered so far.</Paragraph>
      <Paragraph position="1"> The automatic pruning task was intentionally framed as a ranking problem, in order to leave open the issue of what pruning threshold would be optimal. This same approach was taken in the IR application in which the pruning procedure was embedded. Users are given the option to set their own pruning threshold (depending on whether they focus more on precision or recall), by setting a value specifying what precision they require. Pruning is performed on the top section of the ranking list that guarantees the required precision, according to the correlation between precision and amount of pruning shown in figure 1.</Paragraph>
      <Paragraph position="2"> A second test was designed to check whether there is a correlation between the levels of confidence of automatic and manual pruning. For this purpose we used the file that had been manually checked by both human evaiuators. We took into account the candidates that had been removed by at least one evaluator: the candidates that were removed by both evaluators were deemed to have a high level of confidence, while those removed by only one evaluator were deemed to have a lower level of confidence. Then we checked whether the two classes were equally distributed in the automatic pruning ranking list, or whether higher confidence candidates tended to be ranked higher than lower confidence ones. The results are shown in figure 2, where the automatic pruning recall for each class is shown. For any given portion of the ranking list higher confidence candidates (solid lines) have a significantly higher recall than lower confidence candidates (dot null ted line).</Paragraph>
      <Paragraph position="3"> Finally, table 3 shows the result of applying the described optimization techniques alone, i.e. without any prior pruning, with respect to the ASRS corpus. The table shows how many synsets and how many word-senses are contained in the full Wordnet database and in its optimized version. Note that such reduction does not involve any loss of accuracy.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>