<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0601">
  <Title>Challenges for annotating images for sense disambiguation</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related work
</SectionTitle>
    <Paragraph position="0"> The complex relationship between annotations and images has been explored by the library community, who study management practices for image collections, and by the computer vision community, who would like to provide automated image retrieval tools and possibly learn object recognition methods.</Paragraph>
    <Paragraph position="1"> Commercial picture collections are typically annotated by hand (Enser, 1993; Armitage and Enser, 1997; Enser, 2000). Subtle phenomena make this difficult: content and interpretation may differ, so an image of the Eiffel Tower could be annotated with Paris or even love (Armitage and Enser, 1997). The resulting annotations are hard to use (Markkula and Sormunen, 2000); cf. Enser's finding that a specialized indexing language gives only a &quot;blunt pointer to regions of the Hulton collections&quot; (Enser, 1993, p. 35).</Paragraph>
    <Paragraph position="2"> Users of image collections have been well studied. The important points for our purposes are: users request images both by object kind and by individual identity; they request images both by what the images depict and by what they are about; and text associated with images is extremely useful in practice, with newspaper archivists indexing largely on captions (Markkula and Sormunen, 2000).</Paragraph>
    <Paragraph position="3"> The computer vision community has studied methods to predict annotations from images, e.g. (Barnard et al., 2003; Jeon et al., 2003; Blei and Jordan, 2002). The annotations that are predicted most successfully tend to deal with materials whose identity can be determined without shape analysis, such as sky and sea. More complex annotations remain difficult. There is no current theory of word sense in this context, because in most current collections words appear only in their most common sense. Sense is known to be important, however, and image information can disambiguate word senses (Barnard and Johnson, 2005).</Paragraph>
    <Paragraph position="4"> Table 1: Overview of annotated images for three ambiguous query terms, inspired by the WSD literature. For each term, the number of annotated images, the expanded query retrieval terms (taken from askjeeves.com), the senses, their distribution coverage, and rough sample annotation guidelines are provided, with core senses marked in bold.
1. fish (35%): any fish, people holding catch
2. musical instrument (28%): any bass-looking instrument, playing
3. related: fish (10%): fishing (gear, boats, farms), rel. food, rel. charts/maps
4. related: musical instrument (8%): speakers, accessories, works, chords, rel. music
5. unrelated (12%): miscellaneous (above senses not applicable)
6. people (7%): faces, crowds (above senses not applicable)
CRANE (2650 images), 5 expansion terms: crane, construction cranes, whooping crane, sandhill crane, origami cranes
1. machine (21%): machine crane, incl. panoramas
2. bird (26%): crane bird or chick
3. origami (4%): origami bird
4. related: machine (11%): other machinery, construction, motor, steering, seat
5. related: bird (11%): egg, other birds, wildlife, insects, hunting, rel. maps/charts
6. related: origami (1%): origami shapes (stars, pigs), paper folding
7. people (7%): faces, crowds (above senses not applicable)
8. unrelated (18%): miscellaneous (above senses not applicable)
9. karate (1%): martial arts
SQUASH (1948 images), 10 expansion terms: squash + rules, butternut, vegetable, grow, game of, spaghetti, winter, types of, summer
1. vegetable (24%): squash vegetable
2. sport (13%): people playing, court, equipment
3. related: vegetable (31%): agriculture, food, plant, flower, insect, vegetables
4. related: sport (6%): other sports, sports complex
5. people (10%): faces, crowds (above senses not applicable)
6. unrelated (16%): miscellaneous (above senses not applicable)</Paragraph>
    <Paragraph position="5"> [Figure: example CRANE images by sense: (a) machine, (b) bird, (c) origami, (d) karate, (e) rel. to a, (f) rel. to b, (g) rel. to c, (h) people, (i) unrel.] The "related" senses are associated with the semantic field of a core sense, but the core sense itself is visually absent or undeterminable.</Paragraph>
  </Section>
</Paper>