<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1017"> <Title>Exploiting Strong Syntactic Heuristics and Co-Training to Learn Semantic Lexicons</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Syntactic structure helps us understand the semantic relationships between words. Given a text corpus, we can use knowledge about syntactic structures to obtain semantic knowledge. For example, Hearst (Hearst, 1992) learned hyponymy relationships by collecting words in lexico-syntactic expressions, such as &quot;NP, NP, and other NPs&quot;, and Roark and Charniak (Roark and Charniak, 1998) generated semantically related words by applying statistical measures to syntactic contexts involving appositives, lists, and conjunctions.</Paragraph> <Paragraph position="1"> Exploiting syntactic structures to learn semantic knowledge holds great promise, but can run into problems. First, lexico-syntactic expressions that explicitly indicate semantic relationships (e.g., &quot;NP, NP, and other NPs&quot;) are reliable, but a lot of semantic information occurs outside these expressions. Second, general syntactic structures (e.g., lists and conjunctions) capture a wide range of semantic relationships. For example, conjunctions frequently join items of the same semantic class (e.g., &quot;cats and dogs&quot;), but they can also join different semantic classes (e.g., &quot;fire and ice&quot;). Some researchers (Roark and Charniak, 1998; Riloff and Shepherd, 1997) have applied statistical methods to identify the strongest semantic associations. This approach has produced reasonable results, but the accuracy of these techniques still leaves much room for improvement. We adopt an intermediate approach that learns semantic lexicons using strong syntactic heuristics, which are both common and reliable. 
We have identified certain types of appositives, compound nouns, and identity (ISA) clauses that indicate specific semantic associations between words. We embed syntactic heuristics in a bootstrapping process and present empirical results demonstrating that this bootstrapping process produces high-quality semantic lexicons. In another set of experiments, we incorporate a co-training (Blum and Mitchell, 1998) mechanism to combine the hypotheses generated by different types of syntactic structures. Co-training produces a synergistic effect across different heuristics, substantially increasing the coverage of the lexicons while maintaining nearly the same level of accuracy.</Paragraph> </Section> </Paper>