File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-0604_concl.xml

Size: 1,137 bytes

Last Modified: 2025-10-06 13:53:42

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0604">
  <Title>Towards a Framework for Learning Structured Shape Models from Text-Annotated Images</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have outlined a framework for creating associated visual and linguistic structured models from images annotated with textual captions. Thus far, we have focused on the important open problem of oversegmentation in images. We have developed a set of extensions to a probabilistic translation model (Brown et al., 1993) that enable us to successfully merge oversegmented regions into coherent objects. Our initial experiments on synthetic data demonstrate that our algorithm can learn a useful translation model between image objects and words, even in the presence of substantial oversegmentation. We are currently varying parameters of our synthetic scene generator to guide further development of the algorithm, and we are also experimenting on real data from the Web.</Paragraph>
  </Section>
</Paper>