File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/p04-3017_metho.xml
Size: 11,777 bytes
Last Modified: 2025-10-06 14:09:06
<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3017"> <Title>Finding Anchor Verbs for Biomedical IE Using Predicate-Argument Structures</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> Several studies have addressed the automatic acquisition of extraction rules. Sudo et al. (2003) acquired subtrees derived from dependency trees as extraction rules for IE in general domains. One problem of their system is that dependency trees cannot treat non-local dependencies, and thus the rules acquired from such constructions are partial. Hatzivassiloglou and Weng (2002) used the frequency of collocation of verbs and topical nouns, together with verb occurrence rates in several domains, to obtain anchor verbs for biological interaction. They used only POSs and word positions to detect relations between verbs and topical nouns.</Paragraph> <Paragraph position="1"> Their performance was 87.5% precision and 82.4% recall. One of the causes of errors they reported is the failure to detect verb-noun relations.</Paragraph> <Paragraph position="2"> To avoid these problems, we decided to use PASs obtained by full parsing to get precise relations between verbs and their arguments. These precise relations will improve precision. In addition, PASs obtained by full parsing can treat non-local dependencies, and thus recall will also improve.</Paragraph> <Paragraph position="3"> The sentence below is an example that illustrates the advantage of full parsing. The gerund activating takes a non-local semantic subject, IL-4. In full parsing based on Head-Driven Phrase Structure Grammar (HPSG) (Sag and Wasow, 1999), the subject of the whole sentence and the semantic subject of activating are shared, and thus we can extract the subject of activating.</Paragraph> <Paragraph position="4"> IL-4 may mediate its biological effects by activating a tyrosine-phosphorylated DNA binding protein. 
3 Anchor Verb Finding by PASs By using PASs, we extract candidates for anchor verbs from a sentence in the following steps: 1. Obtain all PASs of a sentence by a full parser. The PASs correspond not only to verbal phrases but also to other phrases such as prepositional phrases.</Paragraph> <Paragraph position="5"> 2. Select PASs which take one or more topical nouns as arguments.</Paragraph> <Paragraph position="6"> 3. From the PASs selected in Step 2, select PASs which include one or more verbs.</Paragraph> <Paragraph position="7"> 4. Extract a core verb, which is the innermost verbal predicate, from each of the chosen PASs.</Paragraph> <Paragraph position="8"> In Step 1, we use a probabilistic HPSG parser developed by Miyao et al. (2003, 2004). PASs obtained by the parser are illustrated in Figure 1 (see footnote 1). Bold words are predicates. Arguments of the predicates are described in ARGn (n = 1, 2, ...). MODIFY denotes the modified PAS. Numbers in squares denote shared structures. Examples of core verbs are illustrated in Figure 2. We regard all arguments in a PAS as arguments of the core verb.</Paragraph> <Paragraph position="9"> Extraction of candidates for anchor verbs from the sentence in Figure 1 proceeds as follows. Here, regions and molecules are topical nouns.</Paragraph> <Paragraph position="10"> In Step 1, we obtain all the PASs, (a), (b) and (c), in Figure 1.</Paragraph> <Paragraph position="11"> (Footnote 1: Here, named entities are regarded as chunked, and thus internal structures of noun phrases are not illustrated.) Next, in Step 2, we check each argument of (a), (b) and (c). (a) is discarded because it does not have a topical noun argument (see footnote 2). (b) is selected because its ARG1, regions, is a topical noun. Similarly, (c) is selected because of its ARG1, molecules.</Paragraph> <Paragraph position="12"> Then, in Step 3, we check the POS of each predicate included in (b) and (c). (b) is selected because it has the verb interacts in the shared structure 1, which it shares with (a). 
(c) is discarded because it includes no verbs.</Paragraph> <Paragraph position="13"> Finally, in Step 4, we extract a core verb from (b).</Paragraph> <Paragraph position="14"> (b) includes the shared structure 1 as MODIFY, and the predicate of 1 is the verb interacts, so we extract it.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> We investigated the verbs and their arguments extracted by the PAS method and by POS pattern matching, which is less expressive in analyzing sentence structures but would be more robust. For topical nouns and POSs, we used the GENIA corpus (Kim et al., 2003), a corpus of annotated abstracts taken from the National Library of Medicine's MEDLINE database. We defined topical nouns as the names tagged as protein, peptide, amino acid, DNA, RNA, or nucleic acid. We chose PASs which take one or more topical nouns as arguments, and substrings matched by POS patterns which include topical nouns. All names tagged in the corpus were replaced by their head nouns in order to reduce the complexity of sentences and thus lighten the task of the parser and the POS pattern matcher.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Implementation of PAS method </SectionTitle> <Paragraph position="0"> We implemented the PAS method on LiLFeS, a unification-based programming system for typed feature structures (Makino et al., 1998; Miyao et al., 2000).</Paragraph> <Paragraph position="1"> The selection in Step 2 described in Section 3 is realized by matching PASs with nine PAS templates. Four of the templates are illustrated in Figure 3.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 POS Pattern Method </SectionTitle> <Paragraph position="0"> We constructed a POS pattern matcher with a partial verb chunking function, following Hatzivassiloglou and Weng (2002). 
Because the original matcher has problems in recall (its verb group detector has low coverage) and precision (it does not consider other words to detect relations between verb groups and topical nouns), we implemented our POS pattern matcher as a modified version of the one in Hatzivassiloglou and Weng (2002). (Footnote 2: (a) may be selected if the anaphora (it) is resolved, but we regard anaphora resolution as too hard a task to serve as a subprocess of finding anchor verbs.)</Paragraph> <Paragraph position="2"> Legend of Figure 4: N is a topical noun; VG is a verb group which is accepted by the finite state machine described in Hatzivassiloglou and Weng (2002) or is one of {VB, VBD, VBG, VBN, VBP, VBZ}; o is 0 to 4 tokens which do not include {FW, NN, NNS, NNP, NNPS, PRP, VBG, WP, *}. (Parts in bold letters are added to the patterns of Hatzivassiloglou and Weng (2002).)</Paragraph> <Paragraph position="3"> Figure 4 shows the patterns used in our experiment. The last verb of VG is extracted if all of the Ns are topical nouns. Non-topical nouns are disregarded. Adding candidates for verb groups raises the recall of the obtained relations between verbs and their arguments. Restricting intervening tokens to non-nouns raises precision, although it decreases recall.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Experiment 1 </SectionTitle> <Paragraph position="0"> We extracted the last verbs of POS patterns and the core verbs of PASs, with their arguments, from 100 abstracts (976 sentences) of the GENIA corpus. 
We considered not only the verbs but also tuples of the verbs and their arguments (VAs), in order to estimate the effect of the arguments on semantic filtering.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Results </SectionTitle> <Paragraph position="0"> The numbers of VAs extracted from the 100 abstracts using POS patterns and PASs are shown in</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Experiment 2 </SectionTitle> <Paragraph position="0"> For the first 10 abstracts (92 sentences), we manually investigated whether the extracted VAs are syntactically or semantically correct. The investigation was based on two criteria: appropriateness, based on whether the extracted verb can be used as an anchor verb, and correctness, based on whether the syntactic analysis is correct, i.e., whether the arguments were extracted correctly.</Paragraph> <Paragraph position="1"> Based on human judgment, the verbs that represent interactions, events, and properties were selected as semantically appropriate for anchor verbs, and the others were treated as inappropriate. For example, identified in "We identified ZEBRA protein." is not appropriate and is discarded.</Paragraph> <Paragraph position="2"> We did not consider non-topical noun arguments for the POS pattern method, whereas we considered them for the PAS method. Thus the decision on correctness is stricter for the PAS method.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Results </SectionTitle> <Paragraph position="0"> The results of the manual investigation of VAs extracted from the 10 abstracts using POS patterns and PASs are shown in Tables 2 and 3, respectively.</Paragraph> <Paragraph position="1"> POS patterns extracted more VAs (98) than PASs (75), but much of the increment came from incorrect POS pattern matching. 
By POS patterns, 43 VAs (44%) were extracted based on incorrect analysis. On the other hand, by PASs, 20 VAs (27%) were extracted incorrectly. Thus the ratio of VAs extracted by syntactically correct analysis is larger for the PAS method.</Paragraph> <Paragraph position="2"> The POS pattern method extracted 38 VAs of verbs not extracted by the PAS method, and 7 of them are correct. For the PAS method, the corresponding numbers are 11 and 4, respectively. Thus the increments tend to be caused by incorrect analysis, and the tendency is greater in the POS pattern method.</Paragraph> <Paragraph position="3"> Since not all verbs that take topical nouns are appropriate as anchor verbs, automatic filtering is required. In the filtering phase, which we leave as future work, we can use semantic classes and frequencies of arguments of the verbs. Results with syntactically incorrect arguments will have an adverse effect on filtering because they express incorrect relationships between verbs and arguments. Since the numbers of extracted VAs after excluding the ones with incorrect arguments are the same (55) for the PAS and POS pattern methods, it can be concluded that the precision of the PAS method is higher. Although there are a few (7) correct VAs which were extracted by the POS pattern method but not by the PAS method, we expect that the number of such verbs can be reduced by using a larger corpus.</Paragraph> <Paragraph position="4"> Examples of appropriate VAs extracted by only one method are as follows: (A) is correct and (B) incorrect, both extracted only by the POS pattern method, and (C) is correct and (D) incorrect, both extracted only by the PAS method. Bold words are extracted verbs or predicates, and italic words are their extracted arguments. (A) This delay is associated with down-regulation of many erythroid cell-specific genes, including alpha- and beta-globin, band 3, band 4.1, and .... (B) ... show that several elements in the ... 
region of the IL-2R alpha gene contribute to IL-1 responsiveness, ....</Paragraph> <Paragraph position="5"> (C) The CD4 coreceptor interacts with non-polymorphic regions of ... molecules on non-polymorphic cells and contributes to T cell activation.</Paragraph> <Paragraph position="6"> (D) Whereas activation of the HIV-1 enhancer following T-cell stimulation is mediated largely through binding of the ... factor NF-kappa B to two adjacent kappa B sites in ....</Paragraph> </Section> </Section> </Paper>
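The four candidate-extraction steps of Section 3 can be sketched as follows. This is only a minimal illustration in Python, not the LiLFeS implementation described in Section 4.1. The PAS class, the function names, the reading of "innermost" as "deepest along the MODIFY and argument links", and the toy structures standing in for (a), (b) and (c) are all assumptions made for this example; the actual contents of Figure 1 are not reproduced here. Step 1 (full parsing) is assumed to have been carried out already.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union

# Verb tags as listed in the legend of Figure 4.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


@dataclass
class PAS:
    """Toy predicate-argument structure (hypothetical, not the parser's format)."""
    predicate: str                        # predicate word
    pos: str                              # POS tag of the predicate
    args: Dict[str, Union[str, "PAS"]] = field(default_factory=dict)  # ARG1, ARG2, ...
    modify: Optional["PAS"] = None        # the PAS modified by this one, if any


def head(arg: Union[str, PAS]) -> str:
    """Head word of an argument: the word itself or a nested PAS's predicate."""
    return arg.predicate if isinstance(arg, PAS) else arg


def takes_topical_noun(pas: PAS, topical: Set[str]) -> bool:
    """Step 2: does the PAS take at least one topical noun as an argument?"""
    return any(head(a) in topical for a in pas.args.values())


def verbs_with_depth(pas: PAS, depth: int = 0) -> List[Tuple[int, str]]:
    """Collect (depth, verb) pairs over the PAS and everything it embeds."""
    found = [(depth, pas.predicate)] if pas.pos in VERB_TAGS else []
    children = [a for a in pas.args.values() if isinstance(a, PAS)]
    if pas.modify is not None:
        children.append(pas.modify)
    for child in children:
        found.extend(verbs_with_depth(child, depth + 1))
    return found


def core_verb(pas: PAS) -> Optional[str]:
    """Steps 3 and 4: None if the PAS includes no verb, else the deepest one.

    Taking the deepest verb is this sketch's assumed reading of "innermost".
    """
    verbs = verbs_with_depth(pas)
    return max(verbs, key=lambda dv: dv[0])[1] if verbs else None


def anchor_verb_candidates(pas_list: List[PAS], topical: Set[str]) -> List[str]:
    """Apply Steps 2 to 4 to the PASs produced by Step 1."""
    candidates = []
    for pas in pas_list:
        if not takes_topical_noun(pas, topical):   # Step 2
            continue
        verb = core_verb(pas)                      # Steps 3 and 4
        if verb is not None:
            candidates.append(verb)
    return candidates


# Toy stand-ins for (a), (b) and (c); the words and tags are assumptions.
pas_a = PAS("interacts", "VBZ", {"ARG1": "it"})
pas_b = PAS("with", "IN", {"ARG1": "regions"}, modify=pas_a)
pas_c = PAS("of", "IN", {"ARG1": "molecules"})
```

With regions and molecules as topical nouns, calling anchor_verb_candidates([pas_a, pas_b, pas_c], {"regions", "molecules"}) mirrors the walk-through: (a) is dropped in Step 2, (c) is dropped in Step 3, and interacts is extracted from (b) in Step 4.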