<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0313"> <Title>A Corpus-Based Approach for Building Semantic Lexicons</Title> <Section position="4" start_page="118" end_page="121" type="metho"> <SectionTitle> 3 Experimental Results </SectionTitle> <Paragraph position="0"> We performed experiments with five categories to evaluate the effectiveness and generality of our approach: energy, financial, military, vehicles, and weapons. The MUC-4 development corpus (1700 texts) was used as the text corpus (MUC-4 Proceedings, 1992). We chose these five categories because they represented relatively different semantic classes, they were prevalent in the MUC-4 corpus, and they seemed to be useful categories.</Paragraph> <Paragraph position="1"> For each category, we began with the seed word lists shown in Figure 1. We ran the bootstrapping algorithm for eight iterations, adding five new words to the seed word list after each cycle. After the final iteration, we had ranked lists of potential category words for each of the five categories. The top 45 words from each ranked list are shown in Figure 2. (Note that some of these words are not nouns, such as boarded and U.S.-made. Our parser tags unknown words as nouns, so unknown words are sometimes mistakenly selected for context windows.)</Paragraph> <Paragraph position="2"> While the ranked lists are far from perfect, one can see that there are many category members near the top of each list. It is also apparent that a few additional heuristics could be used to remove many of the extraneous words. For example, our number processor failed to remove numbers with commas (e.g., 2,000), and the military category contains several ordinal numbers (e.g., 10th, 3rd, 1st) that could easily be identified and removed. But the key question is whether the ranked list contains many true category members. Since this is a subjective question, we set up an experiment involving human judges.</Paragraph> <Paragraph position="3"> For each category, we selected the top 200 words from its ranked list and presented them to a user. We presented the words in random order so that the user had no idea how our system had ranked them. This was done to minimize contextual effects (e.g., seeing five category members in a row might make someone more inclined to judge the next word as relevant). Each category was judged by two people independently; the judges were members of our research group but not the authors. The judges were asked to rate each word on a scale from 1 to 5 indicating how strongly it was associated with the category. Since category judgements can be highly subjective, we gave them guidelines to help establish uniform criteria. The instructions that were given to the judges are shown in Figure 3.</Paragraph>
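The paper gives no pseudocode for this procedure, so the following is only a minimal sketch of the loop described above, written in Python. The function `score_candidates` is a hypothetical stand-in for the paper's corpus statistics (scoring each noun by how strongly it co-occurs with the current category words in their context windows), and `plausible_word` implements the two filtering heuristics suggested above; all names are ours, not the authors'.

```python
import re

def plausible_word(word):
    """Heuristic filters suggested in Section 3: drop numbers written
    with commas (e.g., "2,000") and ordinal numbers (e.g., "10th")."""
    if re.fullmatch(r"\d{1,3}(,\d{3})+", word):
        return False
    if re.fullmatch(r"\d+(st|nd|rd|th)", word, re.IGNORECASE):
        return False
    return True

def bootstrap(seed_words, corpus, score_candidates,
              iterations=8, words_per_cycle=5):
    """Bootstrapping loop as described in the text: eight iterations,
    adding the five top-scoring new words to the seed list each cycle,
    then returning a ranked list of the remaining candidates."""
    category_words = list(seed_words)
    for _ in range(iterations):
        scores = score_candidates(category_words, corpus)  # {word: score}
        ranked = sorted(scores, key=scores.get, reverse=True)
        new_words = [w for w in ranked
                     if w not in category_words and plausible_word(w)]
        category_words.extend(new_words[:words_per_cycle])
    final_scores = score_candidates(category_words, corpus)
    return sorted(final_scores, key=final_scores.get, reverse=True)
```

The judges in the experiment saw the top 200 entries of the list such a loop returns.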
<Paragraph position="4"> Energy: Limon-Covenas a oligarchs spill staples poles Limon Barrancabermeja Covenas 200,000 barrels oil Bucaramanga pipeline prices electric pipelines towers Cano substation transmission rates pylons pole infrastructure transfer gas fuel sale lines companies power tower price gasoline industries insurance Arauca stretch inc industry forum nationalization supply electricity controls
Financial: monetary fund nationalization attractive circulation suit gold branches manager bank advice invested banks bomb_explosion investment invest announcements content managers insurance dollar savings product employee accounts goods currency reserves amounts money shops farmers maintenance
Figure 2 (excerpt): The top-ranked words for the Energy and Financial categories.</Paragraph> <Paragraph position="5"> CRITERIA: On a scale of 0 to 5, rate each word's strength of association with the given category using the following criteria. We'll use the category ANIMAL as an example.
5: CORE MEMBER OF THE CATEGORY: If a word is clearly a member of the category, then it deserves a 5. For example, dogs and sparrows are members of the ANIMAL category.
4: SUBPART OF MEMBER OF THE CATEGORY: If a word refers to a part of something that is a member of the category, then it deserves a 4. For example, feathers and tails are parts of ANIMALS.
3: STRONGLY ASSOCIATED WITH THE CATEGORY: If a word refers to something that is strongly associated with members of the category, but is not actually a member of the category itself, then it deserves a 3. For example, zoos and nests are strongly associated with ANIMALS.
2: WEAKLY ASSOCIATED WITH THE CATEGORY: If a word refers to something that can be associated with members of the category, but is also associated with many other types of things, then it deserves a 2. For example, bowls and parks are weakly associated with ANIMALS.
1: NO ASSOCIATION WITH THE CATEGORY: If a word has virtually no association with the category, then it deserves a 1. For example, tables and moons have virtually no association with ANIMALS.
0: UNKNOWN WORD: If you do not know what a word means, then it should be labeled with a 0.
IMPORTANT! Many words have several distinct meanings. For example, the word "horse" can refer to an animal, a piece of gymnastics equipment, or it can mean to fool around (e.g., "Don't horse around!"). If a word has ANY meaning associated with the given category, then only consider that meaning when assigning numbers. For example, the word "horse" should be judged as a 5 for the category ANIMAL.
Figure 3: Instructions given to the judges.</Paragraph>
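As a small illustration of the judging setup described above (the top 200 words, presented in random order so the system's ranking cannot be inferred), here is a sketch; the function name and the optional seed parameter are our assumptions, not part of the paper.

```python
import random

def prepare_judging_sheet(ranked_list, n=200, seed=None):
    """Take the top-n words from a category's ranked list and shuffle
    them so a judge cannot infer how the system ranked them."""
    words = list(ranked_list[:n])
    random.Random(seed).shuffle(words)
    return words
```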
<Paragraph position="6"> We asked the judges to rate the words on a scale from 1 to 5 because different degrees of category membership might be acceptable for different applications. Some applications might require strict category membership; for example, only words like gun, rifle, and bomb should be labeled as weapons. But from a practical perspective, subparts of category members might also be acceptable. For example, if a cartridge or trigger is mentioned in the context of an event, then one can infer that a gun was used. And for some applications, any word that is strongly associated with a category might be useful to include in the semantic lexicon. For example, words like ammunition or bullets are highly suggestive of a weapon. In the UMass/MUC-4 information extraction system (Lehnert et al., 1992), the words ammunition and bullets were defined as weapons, mainly for the purpose of selectional restrictions.</Paragraph> <Paragraph position="7"> The human judges estimated that it took them approximately 10-15 minutes, on average, to judge the 200 words for each category. Since the instructions allowed the judges to assign a zero to a word if they did not know what it meant, we manually removed the zeros and assigned ratings that we thought were appropriate. We considered ignoring the zeros, but some of the categories would have been severely impacted. For example, many of the legitimate weapons (e.g., M-16 and AR-15) were not known to the judges. Fortunately, most of the unknown words were proper nouns with relatively unambiguous semantics, so we do not believe that this process compromised the integrity of the experiment.</Paragraph> <Paragraph position="8"> Finally, we graphed the results from the human judges. We counted the number of words judged as 5's by either judge, the number judged as 5's or 4's by either judge, the number judged as 5's, 4's, or 3's by either judge, and the number judged as 5's, 4's, 3's, or 2's by either judge. We plotted the results after each 20 words, stepping down the ranked list, to see whether the words near the top of the list were more highly associated with the category than words farther down. We also wanted to see whether the number of category words leveled off or whether it continued to grow. The results from this experiment are shown in Figures 4-8. With the exception of the Energy category, we were able to find 25-45 words that were judged as 4's or 5's for each category. This was our strictest test, because only true category members (or subparts of true category members) earned those ratings. Although this might not seem like a lot of category words, 25-45 words is enough to produce a reasonable core semantic lexicon. For example, the words judged as 5's for each category are shown in Figure 9.</Paragraph> <Paragraph position="9"> Figure 9 illustrates an important benefit of the corpus-based approach. By sifting through a large text corpus, the algorithm can find many relevant category words that a user would probably not enter in a semantic lexicon on their own. For example, suppose a user wanted to build a dictionary of Vehicle words. Most people would probably define words such as car, truck, plane, and automobile. But it is doubtful that most people would think of words like gunships, fighter, carrier, and ambulances. The corpus-based algorithm is especially good at identifying words that are common in the text corpus even though they might not be commonly used in general.</Paragraph>
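The tallies plotted in Figures 4-8 can be reproduced mechanically. The sketch below assumes each word in the ranked list carries the two judges' ratings (with zeros already replaced by hand, as described above); the function name and data layout are our assumptions.

```python
def cumulative_counts(judged_words, step=20):
    """judged_words: ranked list of (word, rating1, rating2) triples on
    the 1-5 scale of Figure 3. Returns, for each rating threshold, the
    cumulative counts sampled every `step` words: how many words so far
    were rated at or above that threshold by EITHER judge."""
    points = {5: [], 4: [], 3: [], 2: []}
    counts = {5: 0, 4: 0, 3: 0, 2: 0}
    for i, (_, r1, r2) in enumerate(judged_words, start=1):
        best = max(r1, r2)  # "judged as a 5 by either judge", etc.
        for threshold in counts:
            if best >= threshold:
                counts[threshold] += 1
        if i % step == 0:
            for threshold in points:
                points[threshold].append(counts[threshold])
    return points
```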
<Paragraph position="10"> As another example, specific types of weapons (e.g., M-16, AR-15, M-60, or M-79) might not even be known to most users, but they are abundant in the MUC-4 corpus.</Paragraph> <Paragraph position="11"> If we consider all the words rated as 3's, 4's, or 5's, then we were able to find about 50-65 words for every category except Energy. Many of these words would be useful in a semantic dictionary for the category. For example, some of the words rated as 3's for the Vehicle category include: flight, flights, aviation, pilot, airport, and highways.</Paragraph> <Paragraph position="12"> Most of the words rated as 2's are not specific to the target category, but some of them might be useful for certain tasks. For example, some words judged as 2's for the Energy category are: spill, pole, tower, and fields. These words may appear in many different contexts, but in texts about Energy topics they are likely to be relevant and probably should be defined in the dictionary. Therefore we expect that a user would keep some of these words in the semantic lexicon but would probably be very selective.</Paragraph> <Paragraph position="13"> Finally, the graphs show that most of the acquisition curves displayed positive slopes even at the end of the 200 words. This implies that more category words would likely have been found if the judges had reviewed more than 200 words. The one exception, again, was the Energy category, which we will discuss in the next section. The size of the ranked lists ranged from 442 words for the financial category to 919 for the military category, so it would be interesting to know how many category members would have been found if we had given the entire lists to our judges.</Paragraph> </Section> <Section position="12" start_page="121" end_page="122" type="metho"> <SectionTitle> 4 Selecting Categories and Seed Words </SectionTitle> <Paragraph position="0"> When we first began this work, we were unsure about what types of categories would be amenable to this approach, so we experimented with a number of different categories. Fortunately, most of them worked fairly well, but some of them did not. We do not claim to understand exactly what types of categories will work well and which ones will not, but our early experiences did shed some light on the strengths and weaknesses of this approach.</Paragraph> <Paragraph position="1"> In addition to the previous five categories, we also experimented with categories for Location, Commercial, and Person. The Location category performed very well using seed words such as city, town, and province. We didn't formally evaluate this category because most of the category words were proper nouns and we did not expect that our judges would know what they were. But it is worth noting that this category achieved good results, presumably because location names often cluster together in appositives, conjunctions, and nominal compounds. For the Commercial category, we chose seed words such as store, shop, and market. Only a few new commercial words were identified, such as hotel and restaurant. In retrospect, we realized that there were probably few words in the MUC-4 corpus that referred to commercial establishments. (The MUC-4 corpus mainly contains reports of terrorist and military events.)</Paragraph>
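The paper proposes no automatic test for corpus representation, but the observation here and in the next paragraph suggests a cheap pre-flight diagnostic: check whether the seed words themselves occur often enough in the corpus before bootstrapping. This sketch, including the frequency floor, is our own illustration, not the authors' method.

```python
from collections import Counter

def underrepresented_seeds(seed_words, corpus_tokens, min_freq=10):
    """Flag seed words that occur fewer than `min_freq` times in the
    corpus. If most seeds are flagged, the category is probably too
    poorly represented for the feedback loop to stay on track."""
    freq = Counter(corpus_tokens)
    return [w for w in seed_words if freq[w] < min_freq]
```

For the Commercial category, for example, one might expect seeds like store, shop, and market to be flagged in the MUC-4 corpus.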
<Paragraph position="2"> The relatively poor performance of the Energy category was probably due to the same problem. If a category is not well-represented in the corpus, then it is doomed, because inappropriate words become seed words in the early iterations and quickly derail the feedback loop.</Paragraph> <Paragraph position="3"> The Person category produced mixed results. Some good category words were found, such as rebel, advisers, criminal, and citizen. But many of the words referred to organizations (e.g., FMLN), groups (e.g., forces), and actions (e.g., attacks). Some of these words seemed reasonable, but it was hard to draw a line between specific references to people and concepts like organizations and groups that may or may not consist entirely of people. The large proportion of action words also diluted the list. More experiments are needed to better understand whether this category is inherently difficult or whether a more carefully chosen set of seed words would improve performance.</Paragraph> <Paragraph position="4"> Energy: oil electric gas fuel power gasoline electricity petroleum energy CEL
Financial: monetary fund gold bank invested banks investment invest dollar currency money economies loans billion debts millions IMF commerce wealth inflation million market funds dollars debt
Military: infantry brigade regiment brigades division ranks deserter troops commander corporal GN Navy Bracamonte soldier units patrols cavalry detachment officer patrol garrisons army paratroopers Atonal garrison battalion unit militias lieutenant
Vehicle: C-47 A-37 tank pickup Cessna aircraft Boeing_727 airplane plane truck airplanes gunships fighter carrier tanks planes La_Aurora helicopters helicopter automobile jeep car boats trucks motorcycles ambulances train buses ships cars bus ship vehicle vehicles
Weapon: AK-47 M-16 carbines AR-15 TNT rifles 9-mm grenades machineguns dynamite revolvers rifle submachineguns M-60 pistols pistol M-79 grenade mortars gun mortar submachinegun cannon RPG-7 firearms guns bomb machinegun weapons car_bombs car_bomb artillery tanks arms
Figure 9: Words judged as 5's for each category.</Paragraph> <Paragraph position="5"> More experiments are also needed to evaluate different seed word lists. The algorithm is clearly sensitive to the initial seed words, but the degree of sensitivity is unknown. For the five categories reported in this paper, we arbitrarily chose a few words that were central members of each category. Our initial seed words worked well enough that we did not experiment with them very much, but we did perform a few experiments varying the number of seed words. In general, we found that additional seed words tend to improve performance, but the results were not substantially different using five seed words or using ten. Of course, there is also a law of diminishing returns: using a seed word list containing 60 category words is almost like creating a semantic lexicon for the category by hand!</Paragraph> </Section> </Paper>