<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0601">
  <Title>Challenges for annotating images for sense disambiguation</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Data set
</SectionTitle>
    <Paragraph position="0"> The data set consists of images retrieved from a web search engine. We deliberately focused on three keywords, which cover a range of phenomena in semantic ambiguity: BASS, CRANE, and SQUASH.</Paragraph>
    <Paragraph position="1"> Table 1 gives an overview of the data set, annotated by one author (CA). Given the ISD task, the webpage from which an image was retrieved was not considered, to avoid bias.</Paragraph>
    <Paragraph position="2"> For each query, 2 to 4 core word senses were distinguished by inspecting the data and using common sense. We chose this approach rather than using ontology senses, which tend to be incomplete or too specific for our purposes. For example, the origami sense of CRANE is not included in WordNet under CRANE, whereas for BASS three different fish senses appear. WordNet contains bird as part of the description for the separate entry origami, and some query expansion terms are hyponyms which occur as separate WordNet entries (e.g. bass guitar, sea bass, summer squash). Images may show multiple objects; as a general strategy, a core sense was preferred if one was depicted.</Paragraph>
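For readers who want to inspect WordNet's sense inventory for the three query terms themselves, the following is a minimal sketch using NLTK (assuming the WordNet corpus has been downloaded); the observations about which senses are or are not covered are the paper's, the code merely lists the synsets.

# List WordNet noun synsets for the three query terms.
# Requires: pip install nltk, then a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for query in ("bass", "crane", "squash"):
    print(query)
    for synset in wn.synsets(query, pos=wn.NOUN):
        print("  ", synset.name(), "-", synset.definition())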
    <Paragraph position="3"> An additional complication is that given that the images are retrieved by a search engine there is no guarantee that they depict the query term, so additional senses were introduced. Thus, for most  core senses, a RELATED label was included for meanings related to the semantic field of a core sense. Also, a PEOPLE label was included since such images may occur due to how people take pictures (e.g. portraits of persons, group pictures, or other representations of people outside core and related senses). An UNRELATED label accounted for images that did not fit other labels, or were irrelevant or undeterminable. In fact, distinguishing between PEOPLE and UNRELATED was not always straightforward. Fig. 1 shows examples of CRANE when sense assignment was quite straightforward. However, distinguishing image senses was often not this clear. In fact, many border-line cases occurred when one could argue for different label assignments. Also, annotation cues are sub-ject to interpretation, and disagreements between judges are expected. They simply reflect that image senses are located on a semantic continuum.</Paragraph>
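To make the resulting label scheme concrete, the sketch below encodes one possible representation of it. The RELATED, PEOPLE, and UNRELATED labels and the rule of preferring a core sense when one is depicted come from the description above; the specific core-sense names are illustrative assumptions, not the authors' exact inventory.

# Hypothetical sense inventory; core-sense names are assumed for illustration.
CORE_SENSES = {
    "bass":   ["fish", "musical instrument"],
    "crane":  ["bird", "machine", "origami"],
    "squash": ["vegetable", "sport"],
}
EXTRA_LABELS = ("RELATED", "PEOPLE", "UNRELATED")

def assign_label(query, depicted):
    """Return one label per image: prefer a core sense if any is depicted,
    otherwise fall back to RELATED or PEOPLE, defaulting to UNRELATED."""
    for sense in CORE_SENSES[query]:
        if sense in depicted:
            return sense
    for label in EXTRA_LABELS[:2]:
        if label in depicted:
            return label
    return "UNRELATED"

# Example: an image retrieved for "crane" showing a bird plus people.
print(assign_label("crane", {"bird", "PEOPLE"}))  # -> "bird"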
    <Paragraph position="4"> 4 Why annotating image senses is hard
In general, annotating images involves special challenges, such as what to annotate and how extensively. We assign each image one sense. Nevertheless, compared to disambiguating a word, annotation involves several additional issues. As noted above, a core sense may not occur, and judgements are characterized by increased subjectivity, with semantics beyond prototypical and peripheral exemplars. Also, the disambiguating context is limited to image contents, rather than collocations of an ambiguous token. Fig. 2 illustrates selected challenging judgement calls on whether or not to assign the bird sense of CRANE, as discussed below.</Paragraph>
    <Paragraph position="5"> Depiction: Images may include man-made, artistic depictions of an object, and the question is whether these count as the object or not, e.g. Fig. 2(a-c).
Gradient changes: Recognition is complicated by objects taking different forms and shapes, cf. the insight by Labov (1973) on gradual categories (function or properties may also play a role (Labov, 1973)). For example, as seen in Fig. 2(d-f), birds change with age; an egg may or may not count as a bird, but a chick does, as does a fledgling.
Partial display: Objects may be rendered only partially. For example, Fig. 2(g-h) show merely feathers or a bird's neck.
Domain knowledge: People may disagree due to differences in domain knowledge; e.g., some non-experts may have a difficult time determining whether similar bird species can be distinguished from a bird crane, cf. Fig. 2(i-j). This also affected the granularity of annotations depending on the keyword; see the example cues in Table 1.
Unusual appearance: Objects may occur in less frequent visual appearance, or lack distinguishing properties. For instance, Fig. 2(k) illustrates how a sunset background masks the birds' color information.
Scale: The distance to objects may render them unclear and influence judgement accuracy, and people may differ in the degree of certainty required for assigning a sense. For example, Fig. 2(l-n) show flying or standing potential cranes at a distance.
Animate: Fig. 2(o-q) raise the question of whether dead, skeletal, or artificial objects count as instantiations or not.
Other factors complicating the annotation task include image crowdedness disguising objects, certain entities having less salience, and lacking or unclear reference to object proportions. Senses may also be etymologically related, blend occasionally, or be guided by cultural interpretations, and so on.</Paragraph>
    <Paragraph position="6"> Moreover, related senses are meant to capture images associated with the semantic field of a core sense. However, because the notion and borders of a semantic field are non-specific, related senses are tricky. Annotators may build associations quite wildly, based on personal experience and opinion, and thus what does or does not count as a related sense may very quickly get out of hand. For instance, a person may reason by association that if bird cranes occur frequently in fields, then an image of a field alone should be marked as related. To avoid this, the guidelines attempted to restrict related senses, as exemplified in Table 1, with some data-driven revisions during the annotation process. However, guidelines are also based on judgement calls. Besides, for abstract concepts like LOVE, differentiating between core and related senses is not really valid. Lastly, an additional complexity of image senses is that, in addition to traditional word senses, images may also capture repeatedly occurring iconographic patterns or senses. As illustrated in Fig. 3, the iconography of flying cranes is quite different from that of standing cranes, as regards motion, shape, identity, and color of figure and ground, respectively. Mixed cases also occur in relation to flight, e.g. when bird cranes are taking off or about to land. Iconographic senses may compare to more complex linguistic structures than nominal categories, e.g. a modified NP or clause, but are represented by image properties.</Paragraph>
    <Paragraph position="7"> A policy for annotating iconographic senses is still lacking. Image groups based on iconographic senses seem to provide increased visual and semantic harmony for the eye, but experiments are needed to confirm how iconographic senses correspond to humans' perception of semantic image similarity, and at what level of semantic differentiation they become relevant for sense assessment.</Paragraph>
    <!-- Figure caption fragment (disagreement examples): ... or other steel structure/elevator? (d) crane or other machine? (e) company related or not? (f) bird or abstract art? (g) crane in background or not? (h) origami-related paper? (i) inside of crane? (and is the inside sufficient to denote the image as a machine crane?) -->
    <Paragraph position="8"> Lastly, considering the challenges of image annotation, it is interesting to look at annotation disagreements. Thus, another author (NL) inspected the CRANE annotations and recorded disagreement candidates, which amounted to 5% of the images. Rejecting or accepting a category label seems easier than independent annotation, but it can still give insights into disagreement tendencies. Several disagreements involved a core category vs. its related label vs. unrelated, rather than two core senses. Also, some disagreement candidates had tiny, fuzzy, partial or peripheral potential sense objects, or lacked distinguishing object features, so interpretation became quite idiosyncratic. The disagreement candidates were discussed together, resulting in 2% being true disagreements, 2% false disagreements (resolved by consensus on CA's labels), and 1% annotation mistakes. Examples of true disagreements are in Fig. 4. Often, both parties could see each other's points, but opted for different interpretations; this confirms that sense boundaries tend to blur, indicating that consistency is challenging and not always guaranteed. As the annotation procedure advances, criteria may evolve and modify the fuzzy sense boundaries.</Paragraph>
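As a quick sanity check on the arithmetic of the disagreement analysis above, the sketch below computes the reported percentage breakdown from raw counts. The image count and the individual counts are hypothetical; only the resulting 5% = 2% + 2% + 1% split mirrors the numbers reported for CRANE.

# Hypothetical bookkeeping for the disagreement analysis; only the percentage
# breakdown (5% candidates = 2% true + 2% false + 1% mistakes) follows the text.
def disagreement_rates(n_images, candidates, true_dis, false_dis, mistakes):
    assert candidates == true_dis + false_dis + mistakes
    pct = lambda count: 100.0 * count / n_images
    return {
        "candidates": pct(candidates),
        "true disagreements": pct(true_dis),
        "false disagreements": pct(false_dis),
        "annotation mistakes": pct(mistakes),
    }

# Example with an assumed set of 1000 CRANE images:
print(disagreement_rates(1000, 50, 20, 20, 10))
# {'candidates': 5.0, 'true disagreements': 2.0, 'false disagreements': 2.0,
#  'annotation mistakes': 1.0}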
  </Section>
</Paper>