<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0603">
  <Title>Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Noun compound bracketing is an important but understudied problem in language analysis, and it is generally viewed as a necessary step towards noun compound interpretation. Consider the following contrastive pair of noun compounds: (1) liver cell antibody (2) liver cell line. In example (1), an antibody targets a liver cell, while (2) refers to a cell line that is derived from the liver. To make these semantic distinctions accurately, it is useful to begin with the correct grouping of terms, since choosing a particular syntactic structure limits the options left for semantics. Although identical at the part-of-speech (POS) level, these two noun compounds have different syntactic trees. The distinction can be represented as a binary tree or, equivalently, as a binary bracketing: (1b) [ [ liver cell ] antibody ] (left bracketing) (2b) [ liver [ cell line ] ] (right bracketing). In this paper, we describe a highly accurate unsupervised method for making bracketing decisions for noun compounds (NCs). We improve on the current standard approach of using bigram estimates to compute adjacency and dependency scores by introducing the χ² measure for this problem. We also introduce a new set of surface features for querying Web search engines, which prove highly effective. Finally, we experiment with paraphrases for improving prediction statistics. We have evaluated combinations of these features for predicting NC bracketing on two distinct collections, one consisting of terms drawn from encyclopedia text and the other of terms drawn from bioscience text.</Paragraph>
    <Paragraph position="1"> The remainder of this paper describes related work, the word association models, the surface features, the paraphrase features and the results.</Paragraph>
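    <Paragraph position="2"> The bracketing decision introduced above can be illustrated with a minimal Python sketch. It assumes the usual formulation of the two models for a three-word compound w1 w2 w3: the adjacency model compares the association of (w1, w2) with that of (w2, w3), while the dependency model compares (w1, w2) with (w1, w3). The names chi_squared, bracket, toy_assoc and TOY_SCORES are hypothetical, the scores are invented for illustration rather than taken from any corpus or search engine, and resolving ties in favour of left bracketing is an arbitrary choice; the models actually used are described in the later sections.
<![CDATA[
```python
from typing import Callable

# An association function maps a word pair to a score (higher = more strongly associated).
Assoc = Callable[[str, str], float]


def chi_squared(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-squared statistic for a 2x2 contingency table.

    a = #(w1, w2), b = #(w1, not w2), c = #(not w1, w2), d = #(not w1, not w2).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: treat as no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def bracket(w1: str, w2: str, w3: str, assoc: Assoc, model: str = "dependency") -> str:
    """Return a left or right bracketing for the three-word compound w1 w2 w3."""
    if model not in ("adjacency", "dependency"):
        raise ValueError("model must be 'adjacency' or 'dependency'")
    left = assoc(w1, w2)
    # Adjacency compares against (w2, w3); dependency compares against (w1, w3).
    right = assoc(w2, w3) if model == "adjacency" else assoc(w1, w3)
    if left >= right:  # assumption: ties go to left bracketing
        return f"[ [ {w1} {w2} ] {w3} ]"
    return f"[ {w1} [ {w2} {w3} ] ]"


# Hypothetical association scores, invented purely for illustration.
TOY_SCORES = {
    ("liver", "cell"): 5.0,
    ("cell", "antibody"): 1.0,
    ("liver", "antibody"): 2.0,
    ("cell", "line"): 9.0,
    ("liver", "line"): 6.0,
}


def toy_assoc(x: str, y: str) -> float:
    """Look up a toy score; unseen pairs get 0.0."""
    return TOY_SCORES.get((x, y), 0.0)


if __name__ == "__main__":
    for compound in (("liver", "cell", "antibody"), ("liver", "cell", "line")):
        for model in ("adjacency", "dependency"):
            print(model, bracket(*compound, toy_assoc, model))
```
]]> In practice the assoc argument would be backed by corpus or Web counts, for example by filling a 2x2 table of co-occurrence counts and scoring it with chi_squared.</Paragraph>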
  </Section>
</Paper>