File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1208_intro.xml
Size: 4,539 bytes
Last Modified: 2025-10-06 14:03:53
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1208">
<Title>Interpretation of Compound Nominalisations using Corpus and Web Statistics</Title>
<Section position="4" start_page="54" end_page="54" type="intro">
<SectionTitle> 2 Background </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="54" end_page="54" type="sub_section">
<SectionTitle> 2.1 Compound Noun Interpretation </SectionTitle>
<Paragraph position="0"> Compound nouns were seminally and thoroughly analysed by Levi (1978), who hand-constructs a nine-way set of semantic relations that she identifies as broadly defining the observed relationships between the compound head and modifier. Warren (1978) also inspects the syntax of compound nouns to create a somewhat different set of twelve conceptual categories.</Paragraph>
<Paragraph position="1"> Early attempts to automatically classify compound nouns took a semantic approach: Finin (1980) and Isabelle (1984) use role nominals derived from the head of the compound to fill a slot with the modifier. Vanderwende (1994) uses a rule-based technique that scores a compound on possible semantic interpretations, while Jones (1995) implements a graph-based unification procedure over semantic feature structures for the head. Finally, Rosario and Hearst (2001) make use of a domain-specific lexical resource to classify using neural networks and decision trees.</Paragraph>
<Paragraph position="2"> Syntactic classification, using paraphrasing, was first used by Leonard (1984), who applies a prioritised rule-based approach across a number of possible readings. Lauer (1995) employs a corpus-statistical model over a similar paraphrase set based on prepositions. Lapata (2002) and Grover et al. (2005) again use a corpus-statistical paraphrase-based approach, but with verb-argument relations for compound nominalisations, the latter attempting to define the relation as one of subject, direct object, or a number of prepositional objects.</Paragraph>
</Section>
<Section position="2" start_page="54" end_page="54" type="sub_section">
<SectionTitle> 2.2 Web as Corpus Approaches </SectionTitle>
<Paragraph position="0"> Using the World Wide Web for corpus statistics is a relatively recent phenomenon; we present a few notable examples. Grefenstette (1998) analyses the plausibility of candidate translations in a machine translation task through Web statistics, avoiding some data sparseness within that context. Zhu and Rosenfeld (2001) train a language model from a large corpus, and use the Web to estimate low-density trigram frequencies. Keller and Lapata (2003) show that Web counts can obviate data sparseness for syntactic predicate-argument bigrams. They also observe that the noisiness of the Web, while unexplored in detail, does not greatly reduce the reliability of their results. Nakov and Hearst (2005) demonstrate that Web counts can aid in identifying the bracketing in higher-arity noun compounds.
Finally, Lapata and Keller (2005) evaluate the performance of Web counts on a wide range of natural language processing tasks, including compound noun bracketing and compound noun interpretation.</Paragraph>
</Section>
<Section position="3" start_page="54" end_page="54" type="sub_section">
<SectionTitle> 2.3 Confidence Intervals </SectionTitle>
<Paragraph position="0"> Maximum likelihood statistics are not robust when many sparse vectors are under consideration: naively choosing the largest count may not be accurate in contexts where the relative value across samplings is relevant, for example in machine learning. As such, we apply a statistical test with confidence intervals (Kenney and Keeping, 1962), where we compare sample z-scores in a pairwise manner, instead of comparing frequencies globally.</Paragraph>
<Paragraph position="2"> The confidence interval P, for z-score n, is:</Paragraph>
<Paragraph position="3"> P = (1/t) ∫_{−n}^{n} e^{−x²/2} dx</Paragraph>
<Paragraph position="4"> t is chosen to normalise the curve (for the standard normal density, t = √(2π)), and P is strictly increasing on n, so we are only required to find the largest z-score.</Paragraph>
<Paragraph position="5"> Calculating the z-score exactly can be quite costly, so we instead use the binomial approximation to the normal distribution with equal prior probabilities and find that a given z-score Z is:</Paragraph>
<Paragraph position="6"> Z = (f − u) / s</Paragraph>
<Paragraph position="7"> where f is the frequency count, u is the mean in a pairwise test, and s is the standard deviation of the test. A more complete derivation appears in Nicholson (2005).</Paragraph>
</Section>
</Section>
</Paper>
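
To make the test in Section 2.3 concrete, the following Python sketch (not part of the original paper) compares two competing frequency counts by their pairwise z-scores under the binomial approximation with equal prior probabilities, and reports the corresponding confidence value. The function names, the relation labels, and the counts are purely illustrative assumptions; the exact pairing scheme used in the paper may differ.

from math import erf, sqrt

def pairwise_z(f, g):
    # Binomial approximation to the normal with equal prior probabilities:
    # over n = f + g trials, the mean is u = n/2 and the standard deviation
    # is s = sqrt(n)/2, so the z-score of count f in a pairwise test is (f - u) / s.
    n = f + g
    if n == 0:
        return 0.0
    u = n / 2.0
    s = sqrt(n) / 2.0
    return (f - u) / s

def confidence(z):
    # Confidence interval P for z-score z: the area under the standard normal
    # curve between -z and z, which is strictly increasing in |z|.
    return erf(abs(z) / sqrt(2.0))

# Hypothetical Web counts for two competing paraphrases of a compound
# nominalisation (labels and numbers are made up for illustration).
counts = {"direct object": 340, "subject": 120}
(rel_a, f_a), (rel_b, f_b) = counts.items()
z = pairwise_z(f_a, f_b)
winner = rel_a if z > 0 else rel_b
print(winner, round(confidence(z), 3))

Because P is strictly increasing in the z-score, ranking candidates by z already determines the winner; the confidence value itself is only needed when an absolute reliability threshold is required.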