<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0607">
  <Title>EBLA: A Perceptually Grounded Model of Language Acquisition</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> While traditional, top-down research fields such as natural language processing (NLP), computational linguistics, and speech recognition and synthesis have made great progress in allowing computers to process natural language, they typically do not address perceptual understanding. In these fields, the meaning and context of a given word are derived solely from other words and the logical relationships among them.</Paragraph>
    <Paragraph position="1"> To illustrate, consider the following Webster's definition of apple: &quot;The fleshy usually rounded and red or yellow edible pome fruit of a tree of the rose family&quot; (Webster's 1989). Using traditional approaches, a computer might be able to determine from such a definition that an apple is &quot;edible,&quot; that it is a &quot;fruit,&quot; and that it is usually &quot;rounded and red or yellow.&quot; But what does it mean to be &quot;rounded and red&quot;? People understand these words because their conceptual representations are grounded in their perceptual experiences. Many more abstract words also have perceptual analogs or can be defined in terms of grounded words. Although any two people are unlikely to share identical representations of a given word, their representations are generally similar enough for that word to convey meaning. If computers can be enabled to ground language in perception, communication between man and machine may ultimately be facilitated.</Paragraph>
    <Paragraph position="2"> This paper details a new software framework, Experience-Based Language Acquisition (EBLA), that acquires a childlike language known as protolanguage in a bottom-up fashion from visually perceived experiences. EBLA uses an integrated computer vision system to watch short videos and to generate internal representations of both the objects and the object-object relations in those videos. It then performs language acquisition by resolving these internal representations to the individual words in protolanguage descriptions of each video. Once it has acquired this grounded protolanguage, EBLA can perform basic scene analysis to generate simple descriptions of what it &quot;sees.&quot; EBLA operates in three primary stages: vision processing, entity extraction, and lexical resolution. In the vision processing stage, EBLA is presented with experiences in the form of short videos, each containing a simple event such as a hand picking up a ball. EBLA processes the individual frames in each video to identify and store information about significant objects. In the entity extraction stage, EBLA aggregates the information from the vision processing stage into internal representations called entities. Entities are defined both for the significant objects in each experience and for the relationships among those objects. Finally, in the lexical resolution stage, EBLA attempts to acquire language for the entities extracted in the second stage, using the protolanguage description of each event. It extracts the individual lexemes (words) from each description and then attempts to generate entity-lexeme mappings using an inference technique called cross-situational learning.</Paragraph>
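The cross-situational learning step can be illustrated with a minimal Python sketch. It assumes each experience has already been reduced to a set of entity IDs plus its protolanguage description, and it resolves lexemes by intersecting candidate sets across experiences and then pruning with a mutual-exclusivity rule. The function name, the entity IDs, and the pruning rule are hypothetical illustrations, not EBLA's published algorithm. Consistent with EBLA having no base lexicon, the candidate table starts empty.

```python
# Hypothetical sketch of cross-situational learning; not EBLA's actual code.
def cross_situational_learning(experiences):
    """Return a dict mapping each lexeme to its remaining candidate entities.

    experiences: list of (entities, description) pairs, where entities is a
    set of entity IDs and description is a protolanguage string.
    """
    candidates = {}  # no base lexicon: the table starts empty
    for entities, description in experiences:
        for lexeme in description.split():
            if lexeme in candidates:
                # Keep only entities present in every experience using this lexeme.
                candidates[lexeme] = candidates[lexeme].intersection(entities)
            else:
                candidates[lexeme] = set(entities)

    # Mutual-exclusivity pruning: an entity claimed by a resolved lexeme
    # (one with a single candidate) is removed from unresolved lexemes.
    changed = True
    while changed:
        changed = False
        taken = {next(iter(c)) for c in candidates.values() if len(c) == 1}
        for lexeme, c in candidates.items():
            if len(c) == 1:
                continue
            narrowed = c.difference(taken)
            if narrowed and narrowed != c:
                candidates[lexeme] = narrowed
                changed = True
    return candidates


# Hypothetical entity IDs: E_ for objects, R_ for object-object relations.
experiences = [
    ({"E_hand", "E_ball", "R_pickup"}, "hand pickup ball"),
    ({"E_hand", "E_cup", "R_pickup"}, "hand pickup cup"),
    ({"E_hand", "E_ball", "R_push"}, "hand push ball"),
]
lexicon = cross_situational_learning(experiences)
print(lexicon["cup"])  # {'E_cup'}
```

In this toy data, intersection alone resolves only "hand"; the pruning loop then lets each resolved word narrow the candidates of the remaining ones until every lexeme maps to a single entity.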
    <Paragraph position="3"> EBLA is not primed with a base lexicon, so it faces the task of bootstrapping its lexicon from scratch.</Paragraph>
    <Paragraph position="4"> While EBLA has so far been evaluated only on short descriptions composed of nouns and verbs, one of the primary goals of this research has been to develop an open system that can potentially learn any perceptually grounded lexeme using a unified approach. The entities recognized by EBLA are generic in nature and consist of clusters of perceptual attributes linked in a database system. Although only twelve basic attributes have been programmed into the current system, both the EBLA software and its database support the addition of other attributes. The database even provides mechanisms for dynamically loading and unloading custom attribute calculators.</Paragraph>
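The attribute extensibility described above can be sketched as a small registry of attribute calculators. This is a minimal Python illustration under stated assumptions: the class name AttributeRegistry, the two example calculators, and the pixel-list representation of an object region are hypothetical, and an in-memory dictionary stands in for the database mechanisms the paper mentions.

```python
# Hypothetical sketch: an in-memory stand-in for EBLA's database-backed
# mechanism for loading and unloading custom attribute calculators.
class AttributeRegistry:
    def __init__(self):
        self._calculators = {}  # attribute name mapped to a calculator function

    def load(self, name, calculator):
        """Register (or replace) a calculator for one perceptual attribute."""
        self._calculators[name] = calculator

    def unload(self, name):
        """Remove a calculator; unknown names are ignored."""
        self._calculators.pop(name, None)

    def describe(self, region):
        """Compute every currently loaded attribute for one object region."""
        return {name: calc(region) for name, calc in self._calculators.items()}


# Hypothetical calculators; an object region is a list of (x, y) pixels.
def area(region):
    return len(region)


def centroid(region):
    n = len(region)
    return (sum(x for x, _ in region) / n, sum(y for _, y in region) / n)


registry = AttributeRegistry()
registry.load("area", area)
registry.load("centroid", centroid)
region = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(registry.describe(region))  # {'area': 4, 'centroid': (1.0, 1.0)}
registry.unload("centroid")
print(registry.describe(region))  # {'area': 4}
```

In this sketch, keeping calculators behind a name-keyed interface is what lets an entity remain a generic cluster of attribute values: adding an attribute changes the contents of the cluster, not the code that builds it.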
  </Section>
</Paper>