<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1049"> <Title>Talking Robots With LEGO MindStorms</Title>
<Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> With this accessible technology, it is possible to create basic but interesting talking robots in limited time (7 weeks). This is relevant not only for future research, but can also serve as a teaching device that has proven to be extremely motivating for students. MindStorms are a staple in robotics education (Yu, 2003; Gerovich et al., 2003; Lund, 1999), but to our knowledge, they have never been used as part of a language technology curriculum.</Paragraph>
<Paragraph position="1"> The paper is structured as follows. We first present the basic setup of the MindStorms system and the software architecture. Then we present the four talking robots built by our students in some detail. Finally, we discuss the most important challenges that had to be overcome in building them, and conclude by speculating on further work in Section 5.</Paragraph> </Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Architecture </SectionTitle>
<Paragraph position="0"> Lego MindStorms robots are built around a programmable microcontroller, the RCX. This unit, which looks like an oversized yellow Lego brick, has three ports each for attaching sensors and motors, an infrared sender/receiver for communication with the PC, and 32 KB of memory to store the operating system, a programme, and data.</Paragraph>
<Paragraph position="1"> Our architecture for talking robots (Fig. 1) consists of four main modules: a dialogue system; a speech client with speech recognition and synthesis capabilities; a module for infrared communication between the PC and the RCX; and the programme that runs on the RCX itself. Each student team had to specify a dialogue, a speech recognition grammar, the messages exchanged between the PC and the RCX, and the RCX control programme. All other components were off-the-shelf systems that we combined into a larger system.</Paragraph>
<Paragraph position="2"> The centrepiece of the setup is the dialogue system. We used the DiaWiz system by CLT Sprachtechnologie GmbH, a proprietary framework for defining finite-state dialogues (McTear, 2002). It has a graphical interface (Fig. 2) that allows the user to draw the dialogue states (shown as rectangles in the picture) and connect them via edges. The dialogue system connects to an arbitrary number of &quot;clients&quot; via sockets. It can send messages to and receive messages from clients in each dialogue state, and thus handles the entire dialogue management. While the CLT system was particularly convenient for us, it could probably be replaced without much effort by a VoiceXML-based dialogue manager.</Paragraph>
<Paragraph position="3"> The client that interacts most directly with the user is a module for speech recognition and synthesis. It parses spoken input by means of a recognition grammar written in the Java Speech Grammar Format (JSGF), and sends an extremely shallow semantic representation of the best recognition result to the dialogue manager as a feature structure. The output side can be configured to either use a speech synthesiser or play back recorded WAV files. Our implementation assumes only that the recognition and synthesis engines are compliant with the Java Speech API.</Paragraph>
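To make this setup more concrete, the following Java sketch shows the skeleton of such a speech client. It is an illustration, not the students' code (which the paper does not reproduce): the class name, the grammar file chess.gram, and the omitted socket handling are hypothetical, and only the javax.speech calls themselves come from the Java Speech API.

    import java.io.FileReader;
    import java.util.Locale;
    import javax.speech.Central;
    import javax.speech.EngineModeDesc;
    import javax.speech.recognition.FinalRuleResult;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.ResultAdapter;
    import javax.speech.recognition.ResultEvent;
    import javax.speech.recognition.RuleGrammar;

    public class SpeechClient extends ResultAdapter {

        // Called with the best recognition result; the JSGF tags of the matched
        // production rules form the shallow semantic representation that gets
        // forwarded to the dialogue manager (socket code omitted).
        public void resultAccepted(ResultEvent e) {
            FinalRuleResult result = (FinalRuleResult) e.getSource();
            String[] tags = result.getTags();  // e.g. piece="pawn", colTo="e", rowTo="4"
            // ... send the tags as a feature structure over the socket ...
        }

        public static void main(String[] args) throws Exception {
            // Create and allocate a JSAPI-compliant recogniser.
            Recognizer rec = Central.createRecognizer(new EngineModeDesc(Locale.getDefault()));
            rec.allocate();

            // Load and enable the recognition grammar ("chess.gram" is hypothetical).
            RuleGrammar grammar = rec.loadJSGF(new FileReader("chess.gram"));
            grammar.setEnabled(true);
            rec.commitChanges();

            rec.addResultListener(new SpeechClient());
            rec.requestFocus();
            rec.resume();
        }
    }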
<Paragraph position="4"> The IR communication module has the task of converting between the high-level messages that the dialogue manager exchanges with the RCX and the low-level representations that are actually sent over the IR link, in such a way that the user need not think about the low-level details. The RCX programme itself is again implemented in Java, using the Lejos system (Bagnall, 2002). Such a programme is typically small (to fit into the memory of the microcontroller), and reacts concurrently to events such as changes in sensor values and messages received over the infrared link, mostly by controlling the motors and sending messages back to the PC.</Paragraph> </Section>
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Some Robots </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Playing Chess </SectionTitle>
<Paragraph position="0"> The first talking robot we present plays chess against the user (Fig. 3). It moves chess pieces on a board by means of a magnetic arm, which it can move up and down in order to grab and release a piece; it can place the arm under a given position by driving back and forth on wheels, and to the right and left on a gear rod.</Paragraph>
<Paragraph position="1"> The dialogue between the human player and the robot is centred around the chess game: the human speaks the move he wants to make, and the robot confirms the intended move and announces check and checkmate. In order to perform the moves for the robot, the dialogue manager connects to a specialised client which encapsulates the GNU Chess system. In addition to computing the moves that the robot will perform, the chess programme is also used in disambiguating elliptical player inputs.</Paragraph>
<Paragraph position="2"> Figure 4 shows the part of the chess dialogue model that accepts a move as a spoken command from the player. The Input node near the top waits for the speech recognition client to report that it has recognised a move command.</Paragraph>
<Paragraph position="4"> The recognition grammar is written in the JSGF format, and its production rules are annotated with tags (in curly brackets) representing a very shallow semantics. The tags for all production rules used in a parse tree are collected into a table.</Paragraph>
<Paragraph position="5"> The dialogue manager then branches depending on the type of the command given by the user. If the command specifies the piece and target square, e.g. &quot;move the pawn to e4&quot;, the recogniser will return a representation like {piece=&quot;pawn&quot; colTo=&quot;e&quot; rowTo=&quot;4&quot;}, and the dialogue will continue in the centre branch. Alternatively, the user can specify the source and target square.</Paragraph>
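As an illustration, a grammar fragment producing such a representation might look as follows in JSGF. This is a reconstruction for expository purposes, not the students' grammar; only the tag names piece, colTo, and rowTo are taken from the representation quoted above.

    #JSGF V1.0;

    grammar chess;

    // "move the pawn to e4" yields the tags {piece="pawn"} {colTo="e"} {rowTo="4"},
    // which the speech client collects into a table for the dialogue manager.
    public <command> = move the <piece> to <col> <row>;

    <piece> = pawn {piece="pawn"} | rook {piece="rook"} | queen {piece="queen"};
    <col>   = a {colTo="a"} | b {colTo="b"} | e {colTo="e"};
    <row>   = one {rowTo="1"} | two {rowTo="2"} | four {rowTo="4"};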
<Paragraph position="6"> If the player confirms that the move command was recognised correctly, the manager sends the move description to the chess client (the &quot;send move&quot; input nodes near the bottom), which can disambiguate the move description if necessary, e.g. by expanding moves of type &quot;move the pawn to e4&quot; to moves of type &quot;move from e2 to e4&quot;. Note that the reference &quot;the pawn&quot; need not be globally unique: if there is only one possible referent that could perform the requested move, the chess client resolves the reference automatically.</Paragraph>
<Paragraph position="8"> The client then sends a message to the RCX, which moves the piece using the robot arm. The client also updates its internal data structures, as well as the GNU Chess representations, computes a countermove of its own, and sends this move as another message to the RCX.</Paragraph>
<Paragraph position="9"> While the dialogue system as it stands already offers some flexibility with regard to move phrasings, there is still plenty of room for improvement. One option is to use even more context information, in order to understand commands like &quot;take it with the rook&quot;. Another is to incorporate recent work on improving recognition results in the chess domain through plausibility inferences (Gabsdil, 2004).</Paragraph> </Section>
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Playing a Shell Game </SectionTitle>
<Paragraph position="0"> Figure 6 introduces Luigi Legonelli. The robot represents a charismatic Italian shell-game player and engages a human player in style: Luigi speaks German with a heavy Italian accent, lets the human player win the first round, and then tries to pull several tricks, either to cheat or to keep the player interested in the game.</Paragraph>
<Paragraph position="1"> Luigi's Italian accent was obtained by feeding transliterated German sentences to a speech synthesiser with an Italian voice. Although the resulting accent sounded authentic, listeners who were unfamiliar with it had trouble understanding it. For demonstration purposes we therefore decided to use recorded speech instead; the Italian student on the team lent his voice to the different sentences uttered by Luigi.</Paragraph>
<Paragraph position="2"> The core of Luigi's dialogue model reflects the progress of game play in a shell game. At the start, Luigi and the player settle on a bet (between 1 and 10 euros), and Luigi shows under which shell the coin is. Then Luigi manipulates the shells (see also below), moving them (and the coin) around the board, and finally asks the player under which shell the player believes the coin is. Upon the player's guess, Luigi lifts the shell indicated by the player, and either loudly exclaims the unfairness of life (if he has lost) or kindly inquires after the player's visual capacities (if the player has guessed wrong). At the end of the turn, Luigi asks the player whether he wants to play again. If the player would like to stop, Luigi tries to persuade him to stay; only if the player is persistent does Luigi end the game and beat a hasty retreat.</Paragraph>
<Paragraph position="3"> (1) rob: &quot;Ciao, my name is Luigi Legonelli. Do you feel like a little game?&quot;
    usr: &quot;Yes ...&quot;
    rob: &quot;The rules are easy. I move da cuppa, you know, cuppa? You look, say where coin is. How much money you bet?&quot;
    usr: &quot;10 Euros.&quot;
    rob: (Luigi moves the cups/shells)
    rob: &quot;So, where is the coin? What do you think, where's the coin?&quot;
    usr: &quot;Cup 1&quot;
    rob: &quot;Mamma mia! You have won! Who told you, where is coin?! Another game? Another game!&quot;
    usr: &quot;No.&quot;
    rob: &quot;Come! Play another game!&quot;
    usr: &quot;No.&quot;
    rob: &quot;Okay, ciao signorina! Police, much police! Bye bye!&quot;</Paragraph>
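The finite-state structure behind this behaviour is worth making explicit. The actual dialogue was drawn graphically in DiaWiz, so the following Java sketch is purely illustrative; the state names and helper methods are hypothetical and simply mirror the game loop described above.

    // Illustrative sketch only: the real dialogue is defined graphically in
    // DiaWiz, not in code. States and transitions mirror Luigi's game loop.
    public class LuigiDialogue {

        enum State { GREET, SETTLE_BET, SHOW_COIN, SHUFFLE, ASK_GUESS,
                     RESOLVE, ASK_REPLAY, PERSUADE, GOODBYE }

        public void run() {
            State state = State.GREET;
            while (state != State.GOODBYE) {
                switch (state) {
                    case GREET:      say("Ciao! Do you feel like a little game?");
                                     state = State.SETTLE_BET; break;
                    case SETTLE_BET: // settle on a bet between 1 and 10 euros
                                     state = State.SHOW_COIN; break;
                    case SHOW_COIN:  // show under which shell the coin is
                                     state = State.SHUFFLE; break;
                    case SHUFFLE:    // message to the RCX: move the shells around
                                     state = State.ASK_GUESS; break;
                    case ASK_GUESS:  say("So, where is the coin?");
                                     state = State.RESOLVE; break;
                    case RESOLVE:    // lift the chosen shell; lament or gloat
                                     state = State.ASK_REPLAY; break;
                    case ASK_REPLAY: state = playerWantsToStop() ? State.PERSUADE
                                                                 : State.SETTLE_BET; break;
                    case PERSUADE:   // try to keep the player; give up only if
                                     // the player is persistent
                                     state = playerWantsToStop() ? State.GOODBYE
                                                                 : State.SETTLE_BET; break;
                    default:         state = State.GOODBYE;
                }
            }
            say("Okay, ciao!");
        }

        private void say(String text) { /* play back the recorded utterance */ }
        private boolean playerWantsToStop() { /* ask the speech client */ return true; }
    }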
<Paragraph position="4"> The shells used in the game are small cups with a metal top (a nail), which enables Luigi to pick them up using a &quot;hand&quot; constructed around a magnet. The magnet has a downward-oriented, U-shaped construction that enables Luigi to pick up two cups at the same time. Cups are then moved around the board by rotating the magnet. By magnetising the nail at the top of the cup, not only the cup but also the coin (touched by the tip of the nail) can be moved. When asked to show whether the coin is under a particular shell, one of Luigi's tricks is to keep the nail magnetised when lifting a cup, thus also lifting the coin and giving the impression that there was no coin under the shell.</Paragraph>
<Paragraph position="5"> The Italian accent, the android shape of the robot, and the 'authentic' behaviour of Luigi all contributed to players genuinely getting engaged in the game. After the first turn, having won, most players acknowledged that this is an amusing Lego construction; when they were tricked at the end of the second turn, they expressed disbelief; and when we showed them that Luigi had deliberately cheated them, astonishment. At that point, Luigi had ceased to be simply an amusing Lego construction and had achieved its goal as an entertainment robot that can immerse people in its game.</Paragraph> </Section>
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Exploring a pyramid </SectionTitle>
<Paragraph position="0"> The robot in Figure 7, dubbed &quot;Indy&quot;, is inspired by the various robots that have been used to explore the Great Pyramids in Egypt (e.g. Pyramid Rover, UPUAUT). It has a digital video camera (webcam) and a lamp mounted on it, and continually transmits images from inside the pyramid. The user, watching the images of the video camera on a computer screen, can control the robot's movements and the angle of the camera by voice.</Paragraph>
<Paragraph position="1"> Human-robot interaction is crucial to the exploration task, as neither the user nor the robot has a complete picture of the environment. The robot is aware of the environment through an (all-round) array of touch sensors, enabling it to detect e.g. openings in walls; the user receives a more detailed picture, but only of the environment straight ahead of the robot (due to the frontal orientation of the camera).</Paragraph>
<Paragraph position="2"> The dialogue model for Indy defines the possible interactions that enable Indy and the user to jointly explore the environment. The user can initiate a dialogue to control the camera and its orientation (by letting the robot turn on the spot, in a particular direction), or to instruct the robot to make particular movements (i.e. turn left or right, stop).</Paragraph> </Section>
<Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Traversing a labyrinth </SectionTitle>
<Paragraph position="0"> A variation on the theme of human-robot interaction in navigation is the robot in Figure 8. Here, the user needs to guide a robot through a labyrinth, specified by thick black lines on a white background. The task that the robot and the human must solve collaboratively is to pick up objects randomly strewn about the maze. The robot is able to follow the black lines; a minimal sketch of such an RCX control programme is given below.</Paragraph>
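As an impression of what the RCX side of this behaviour can look like, here is a minimal Lejos line-following sketch. It assumes the classic josx.platform.rcx API with a light sensor on port 2 and motors on ports A and C; the port assignment, the 40% threshold, and the simple pivot-back steering are all assumptions, and the students' actual programme additionally detected crossings and exchanged IR messages with the PC.

    import josx.platform.rcx.Motor;
    import josx.platform.rcx.Sensor;

    // Minimal line follower for the RCX under Lejos. Port numbers, the light
    // threshold, and the steering strategy are illustrative assumptions.
    public class LineFollower {

        public static void main(String[] args) throws InterruptedException {
            // Configure port 2 as a light sensor in percent mode (type 3,
            // mode 0x80) and switch its LED on.
            Sensor.S2.setTypeAndMode(3, 0x80);
            Sensor.S2.activate();

            while (true) {
                if (Sensor.S2.readValue() < 40) {
                    // Dark reading: on the black line, drive straight ahead.
                    Motor.A.forward();
                    Motor.C.forward();
                } else {
                    // Bright reading: drifted onto white, pivot back to the line.
                    Motor.A.forward();
                    Motor.C.stop();
                }
                Thread.sleep(50);  // poll the light sensor every 50 ms
            }
        }
    }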
<Paragraph position="1"> Both the user and the robot can take the initiative in the dialogue. The robot, which is capable of spotting crossings (and the possibilities to go straight, left, and/or right), can initiate a dialogue asking for directions if the user has not instructed it beforehand; see Example 2.</Paragraph>
<Paragraph position="2"> (2) rob: (The robot arrives at a crossing; it recognises the possibility to go either straight or left; there are no current instructions)
    rob: &quot;I can go left or straight ahead; which way should I go?&quot;
    usr: &quot;Please go right.&quot;
    rob: &quot;I cannot go right here.&quot;
    usr: &quot;Please go straight.&quot;
    rob: &quot;Okay.&quot;</Paragraph>
<Paragraph position="3"> The user can give the robot two different types of directions: in-situ directions (as illustrated in Example 2) or deictic directions (see Example 3 below). This differentiates the labyrinth robot from the pyramid robot described in Section 3.3, which could only handle in-situ directions.</Paragraph>
<Paragraph position="4"> (3) usr: &quot;Please turn left at the next crossing.&quot;
    rob: &quot;Okay.&quot;
    rob: (The robot arrives at a crossing; it recognises the possibility to go either straight or left; it was told to go left at the next crossing)
    rob: (The robot recognises it can go left and does so, as instructed)</Paragraph> </Section> </Section> </Paper>