<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1049">
<Title>HYPOTHESIZING WORD ASSOCIATION FROM UNTAGGED TEXT</Title>
<Section position="4" start_page="0" end_page="248" type="relat">
<SectionTitle> 2. RELATED WORK </SectionTitle>
<Paragraph position="0"> Some previous work (e.g., \[Weischedel et al., 1990\]) found verb-argument associations from bracketed text, such as that in TREEBANK; this paper and related work, in contrast, hypothesize word associations from untagged text.</Paragraph>
<Paragraph position="1"> \[Hindle 1990\] confirmed that word association ratios can be used to measure similarity between nouns. For example, &quot;ship&quot;, &quot;plane&quot;, &quot;bus&quot;, etc., were automatically ranked as similar to &quot;boat&quot;. \[Resnik 1992\] used a word association ratio to identify noun classes from a pre-existing hierarchy as selectional constraints on the object of a verb.</Paragraph>
<Paragraph position="2"> \[Brown et al. 1992\] prove that, under the assumption of a bi-gram class model, the perplexity of a corpus is minimized when the average mutual information between word classes is maximized. Based on that fact, they cluster words via a greedy search algorithm that finds a local maximum in average mutual information.</Paragraph>
<Paragraph position="3"> Our algorithm considers joint frequencies of pairs of word groups (as \[Brown et al. 1992\] do), in contrast to joint frequencies of word pairs as in \[Church and Hanks, 1990\] and \[Hindle 1990\]. Here a word group means any subset of the whole set of words; for example, &quot;ship&quot;, &quot;plane&quot;, &quot;boat&quot; and &quot;car&quot; may form a word group. The algorithm finds pairs of such word groups. Another similarity to \[Brown et al. 1992\]'s clustering algorithm is the use of greedy search for a pair of word groups that co-occur significantly often, using an evaluation function based on mutual information between classes.</Paragraph>
<Paragraph position="4"> On the other hand, unlike \[Brown et al. 1992\], we assume some automatic syntactic analysis of the corpus, namely part-of-speech analysis and at least finite-state approximations to syntactic dependencies. Moreover, the clustering is done depth first, not breadth first as in \[Brown et al. 1992\]; i.e., clusters are hypothesized one by one, not in parallel.</Paragraph>
</Section>
</Paper>