<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1061"> <Title>Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE</Title> <Section position="3" start_page="491" end_page="492" type="metho"> <SectionTitle> 2 Previous work </SectionTitle> <Paragraph position="0"> A representative approach to relation extraction is the system of Zelenko et al. (2003), which attempts to identify binary relations in news text. In that system, each pair of entity mentions of the correct types in a sentence is classified as to whether it is a positive instance of the relation. Consider the binary relation employee of and the sentence &quot;John Smith, not Jane Smith, works at IBM&quot;. The pair (John Smith, IBM) is a positive instance, while the pair (Jane Smith, IBM) is a negative instance. Instances are represented by a pair of entities and their position in a shallow parse tree for the containing sentence. Classification is done by a support-vector classifier with a specialized kernel for that shallow parse representation.</Paragraph> <Paragraph position="1"> This approach -- enumerating all possible entity pairs and classifying each as positive or negative -- is the standard method in relation extraction. The main differences among systems are the choice of trainable classifier and the representation for instances. For binary relations, this approach is quite tractable: if the relation schema is (t1, t2), the number of potential instances is O(|t1| |t2|), where |t| is the number of entity mentions of type t in the text under consideration.</Paragraph> <Paragraph position="2"> One interesting system that does not belong to the above class is that of Miller et al. (2000), who take the view that relation extraction is just a form of probabilistic parsing where parse trees are augmented to identify all relations. Once this augmentation is made, any standard parser can be trained and then run on new sentences to extract new relations. 
Miller et al. show such an approach can yield good results. However, it can be argued that this method will encounter problems when considering anything but binary relations. Complex relations would require a large amount of tree augmentation and most likely result in extremely sparse probability estimates. Furthermore, by integrating relation extraction with parsing, the system cannot consider long-range dependencies due to the local parsing constraints of current probabilistic parsers.</Paragraph> <Paragraph position="3"> The higher the arity of a relation, the more likely it is that entities will be spread out within a piece of text, making long-range dependencies especially important.</Paragraph> <Paragraph position="4"> Roth and Yih (2004) present a model in which entity types and relations are classified jointly using a set of global constraints over locally trained classifiers. This joint classification is shown to improve accuracy of both the entities and relations returned by the system. However, the system is based on constraints for binary relations only.</Paragraph> <Paragraph position="5"> Recently, there have also been many results from the biomedical IE community. Rosario and Hearst (2004) compare both generative and discriminative models for extracting seven relationships between treatments and diseases. Though their models are very flexible, they assume at most one relation per sentence, ruling out cases where entities participate in multiple relations, which is a common occurrence in our data. McDonald et al. (2004a) use a rule-based parser combined with a rule-based relation identifier to extract generic binary relations between biological entities. As in predicate-argument extraction (Gildea and Jurafsky, 2002), each relation is always associated with a verb in the sentence that specifies the relation type. 
Though this system is very general, it is limited by the fact that the design ignores relations not expressed by a verb, such as the employee of relation in &quot;John Smith, CEO of Inc. Corp., announced he will resign&quot;.</Paragraph> <Paragraph position="6"> Most relation extraction systems work primarily on a sentential level and never consider relations that cross sentences or paragraphs. Since current data sets typically only annotate intra-sentence relations, this has not yet proven to be a problem.</Paragraph> </Section> <Section position="4" start_page="492" end_page="492" type="metho"> <SectionTitle> 3 Definitions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="492" end_page="492" type="sub_section"> <SectionTitle> 3.1 Complex Relations </SectionTitle> <Paragraph position="0"> Recall that a complex n-ary relation is specified by a schema (t1, . . . , tn) where ti ∈ T are entity types.</Paragraph> <Paragraph position="1"> Instances of the relation are tuples (e1, . . . , en) where either type(ei) = ti, or ei = ⊥ (missing argument). The only restriction this definition places on a relation is that the arity must be known. As we discuss further in Section 6, this is not required by our methods but is assumed here for simplicity. We also assume that the system works on a single relation type at a time, although the methods described here are easily generalizable to systems that can extract many relations at once.</Paragraph> </Section> <Section position="2" start_page="492" end_page="492" type="sub_section"> <SectionTitle> 3.2 Graphs and Cliques </SectionTitle> <Paragraph position="0"> An undirected graph G = (V, E) is specified by a set of vertices V and a set of edges E, with each edge an unordered pair (u, v) of vertices. G′ = (V′, E′) is a subgraph of G if V′ ⊆ V and E′ = {(u, v) : u, v ∈ V′, (u, v) ∈ E}. A clique C of G is a subgraph of G in which there is an edge between every pair of vertices. 
A maximal clique of G is a clique C = (VC, EC) such that there is no other clique C′ = (VC′, EC′) such that VC ⊂ VC′.</Paragraph> </Section> </Section> </Paper>