<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1014">
  <Title>An Unsupervised Approach to Prepositional Phrase Attachment using Contextually Similar Words</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Resources
</SectionTitle>
    <Paragraph position="0"> The input to our algorithm includes a collocation database and a corpus-based thesaurus, both available on the Internet. Below, we briefly describe these resources.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Collocation database
</SectionTitle>
      <Paragraph position="0"> Given a word w in a dependency relationship (such as subject or object), the collocation database is used to retrieve the words that occurred in that relationship with w in a large corpus, along with their frequencies (Lin, 1998a). Figure 1 shows excerpts of the entries in the collocation database for the words eat and salad. The database contains a total of 11 million unique dependency relationships.</Paragraph>
      <Paragraph position="2"> eat: object: almond 1, apple 25, bean 5, beam 1, binge 1, bread 13, cake 17, cheese 8, dish 14, disorder 20, egg 31, grape 12, grub 2, hay 3, junk 1, meat 70, poultry 3, rabbit 4, soup 5, sandwich 18, pasta 7, vegetable 35, ...</Paragraph>
      <Paragraph position="3"> subject: adult 3, animal 8, beetle 1, cat 3, child 41, decrease 1, dog 24, family 29, guest 7, kid 22, patient 7, refugee 2, rider 1, Russian 1, shark 2, something 19, We 239, wolf 5, ...</Paragraph>
      <Paragraph position="4"> salad: adj-modifier: assorted 1, crisp 4, fresh 13, good 3, grilled 5, leftover 3, mixed 4, olive 3, prepared 3, side 4, small 6, special 5, vegetable 3, ...</Paragraph>
      <Paragraph position="5"> object-of: add 3, consume 1, dress 1, grow 1, harvest 2, have 20, like 5, love 1, mix 1, pick 1, place 3, prepare 4, return 3, rinse 1, season 1, serve 8, sprinkle 1, taste 1, test 1, Toss 8, try 3, ...</Paragraph>
      <Paragraph position="6"> Table 1. The 20 most similar words of eat and salad, as given by (Lin, 1998b).
WORD | SIMILAR WORDS (WITH SIMILARITY SCORE)
EAT | cook 0.127, drink 0.108, consume 0.101, feed 0.094, taste 0.093, like 0.092, serve 0.089, bake 0.087, sleep 0.086, pick 0.085, fry 0.084, freeze 0.081, enjoy 0.079, smoke 0.078, harvest 0.076, love 0.076, chop 0.074, sprinkle 0.072, Toss 0.072, chew 0.072
SALAD | soup 0.172, sandwich 0.169, sauce 0.152, pasta 0.149, dish 0.135, vegetable 0.135, cheese 0.132, dessert 0.13, entree 0.121, bread 0.116, meat 0.116, chicken 0.115, pizza 0.114, rice 0.112, seafood 0.11, dressing 0.109, cake 0.107, steak 0.105, noodle 0.105, bean 0.102</Paragraph>
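      <Paragraph position="7"> As an illustration only, the lookup described in this section can be sketched as a nested mapping from a word, to a dependency relation, to the frequencies of the words seen in that relation. The names COLLOCATIONS and collocates are hypothetical and are not the interface of the actual database; the counts are a small subset of the excerpts for eat and salad.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the collocation database:
# word -> dependency relation -> Counter of co-occurring words.
# The counts are a small subset of the excerpts for "eat" and "salad".
COLLOCATIONS = {
    "eat": {
        "object": Counter({"apple": 25, "egg": 31, "meat": 70, "vegetable": 35}),
        "subject": Counter({"child": 41, "dog": 24, "family": 29}),
    },
    "salad": {
        "adj-modifier": Counter({"fresh": 13, "small": 6, "special": 5}),
        "object-of": Counter({"have": 20, "like": 5, "serve": 8}),
    },
}


def collocates(word, relation, db=COLLOCATIONS):
    """Return (collocate, frequency) pairs for `word` in `relation`,
    most frequent first; an unknown word or relation gives an empty list."""
    return db.get(word, {}).get(relation, Counter()).most_common()


if __name__ == "__main__":
    # Words that occurred as the object of "eat", with their frequencies.
    for collocate, frequency in collocates("eat", "object"):
        print(collocate, frequency)
```

A Counter is used here only because it gives a frequency-ordered view for free; any word-to-count mapping would serve the same purpose.</Paragraph>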
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Corpus-based thesaurus
</SectionTitle>
      <Paragraph position="0"> Using the collocation database, Lin (1998b) used an unsupervised method to construct a corpus-based thesaurus consisting of 11839 nouns, 3639 verbs and 5658 adjectives/adverbs. Given a word w, the thesaurus returns a set of words similar to w, along with their similarity to w. For example, the 20 most similar words of eat and salad are shown in Table 1.</Paragraph>
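      <Paragraph position="1"> A minimal sketch of the thesaurus lookup, assuming the thesaurus is held as a mapping from each word to its similar words and their scores. The names THESAURUS and similar_words are hypothetical; the scores are the first few Table 1 values for eat and salad, and the sketch says nothing about how Lin (1998b) computes them.

```python
# Hypothetical stand-in for the corpus-based thesaurus:
# word -> {similar word: similarity score}.
# Scores are the first few Table 1 values for "eat" and "salad".
THESAURUS = {
    "eat": {"cook": 0.127, "drink": 0.108, "consume": 0.101, "feed": 0.094},
    "salad": {"soup": 0.172, "sandwich": 0.169, "sauce": 0.152, "pasta": 0.149},
}


def similar_words(word, k=20, thesaurus=THESAURUS):
    """Return up to k (similar word, score) pairs for `word`,
    highest similarity first; an unknown word gives an empty list."""
    entries = thesaurus.get(word, {})
    ranked = sorted(entries.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # The two words most similar to "salad" in this toy thesaurus.
    print(similar_words("salad", k=2))
```

The default of k=20 mirrors the 20 most similar words shown in Table 1; it is a choice made for this sketch only.</Paragraph>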
    </Section>
  </Section>
</Paper>