<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1015"> <Title>Javox: A Toolkit for Building Speech-Enabled Applications</Title> <Section position="5" start_page="105" end_page="107" type="metho"> <SectionTitle> 3 Javox Grammars </SectionTitle> <Paragraph position="0"> The JAVOX infrastructure is not tied to any particular NLP method; in fact, the JAVOX grammar system is the second NLP implementation we have used. It is presented here because it is straightforward, easy to implement, and surprisingly powerful. JAVOX grammars are based on Sun's Java Speech Grammar Format (JSGF) (Sun Microsystems, Inc., 1998). JSGF is a rule-based, speech-recognition grammar, designed to specify acceptable input to a recognizer. In JAVOX grammars, each JSGF rule may be augmented with a fragment of JAVOX Scripting Language (JSL) code - we refer to JAVOX grammars as scriptable grammars. The result of parsing an utterance with a JAVOX grammar is a complete piece of JSL code, which is then interpreted to perform the action specified by the user.</Paragraph> <Paragraph position="1"> The process of speech-enabling an application in JAVOX consists of writing a grammar that contains the language to be used and the corresponding actions to be performed. Building on top of JSGF means that, in many cases, only one file is needed to contain all application-specific information. JSL-specific code is automatically stripped from the grammar at runtime, leaving an ordinary JSGF grammar. This JSGF grammar is sent to a Java-Speech-compliant recognizer as its input grammar. In the current Java implementation, each Java source file (Foo.java) can have an associated JAVOX grammar file (Foo.gram) that contains all the information needed to speak to the application. 
Encapsulating all natural language information in one file also means that porting the application to different languages is far easier than in most SLSs.</Paragraph> <Section position="1" start_page="105" end_page="107" type="sub_section"> <SectionTitle> 3.1 Scriptable Grammars </SectionTitle> <Paragraph position="0"> Since JSGF grammars are primarily speech-recognition grammars, they lack the ability to encode semantic information. They possess only a limited tag mechanism. Tags allow the recognizer to output a canonical representation of the utterance instead of the verbatim recognition. For example, the tag rm may be the output from both delete the file and remove it. public <ACTION> = move [the] <PART> <DIR>; public <PART> = eyes; public <PART> = ( cap | hat ); public <DIR> = up; public <DIR> = down; Grammar 1: A JSGF fragment from the Mr. Potato Head domain.</Paragraph> <Paragraph position="1"> Tags are not implemented in JAVOX grammars; instead, we augment the rules of JSGF with fragments of a scripting language, which can carry much richer semantic information than is possible with tags. TRANSLATOR receives the raw utterance from the recognizer and translates it into the appropriate semantic representation. JAVOX grammars do not mandate the syntax of the additional semantic portion. Though JSL is presented here, TRANSLATOR has also been used to form Prolog predicates and Visual Basic fragments.</Paragraph> <Paragraph position="2"> JSGF rules are private unless they are explicitly declared public. Public rules can be imported by other grammars and can serve as the result of a recognition; a private rule can be used in a recognition, but cannot be the sole result. The five rules in Grammar 1 are from a JSGF-only grammar fragment from the Mr. Potato Head 2 domain (discussed later). Grammar 1 allows eight sentences, such as move the eyes up, move the eyes down, move the cap up, move the cap down, and move cap up. 
Rule names are valid Java identifiers enclosed within angle brackets; the left-hand side (LHS) is everything to the left of the equality sign and the right-hand side (RHS) is everything to the right. JAVOX grammars support the standard constructs available in JSGF, including: Imports: Any grammar file can be imported into other grammar files, though only public rules are exported. This allows for the creation of grammar libraries. When using JSL, Java classes can also be imported.</Paragraph> <Paragraph position="3"> Comments: Grammars can be documented using Java comments: single-line comments (//) and delimited ones (/* until */).</Paragraph> <Paragraph position="4"> Parentheses: Precedence can be modified with parentheses.</Paragraph> <Paragraph position="5"> Alternatives: A vertical bar ( | ) can be used to separate alternative elements, as in the <PART> rule of Grammar 1.</Paragraph> <Paragraph position="6"> Optionals: Optional elements are enclosed within brackets ([ and ]), such as the in Grammar 1's <ACTION> rule.</Paragraph> <Paragraph position="7"> [Footnote 2: Mr. Potato Head is a registered trademark of Hasbro, Inc.] Kleene Star Operator: A postfix Kleene star (*) operator can be used to indicate that the preceding element may occur zero or more times. Plus Operator: A similar postfix operator (+) indicates that the preceding element may appear one or more times.</Paragraph> <Paragraph position="8"> A grammar's rules may be organized however the developer wishes. Some may choose to have one rule per utterance, while others may divide rules down to the part-of-speech level or group them by semantic value. In practice, we tend to write rules grouped by semantic value for nouns and verbs and at the part-of-speech level for function words. Grammar 2 shows the Mr. Potato Head grammar augmented with JSL fragments.</Paragraph> <Paragraph position="9"> The semantic component of each rule is separated from the RHS by a colon and delimited with a brace and colon ({: until :}). 
Using Grammar 2, the parse and translation for Move the cap up are shown in Figure 2.</Paragraph> <Paragraph position="10"> Each rule may have either one semantic fragment or any number of named fields. A single fragment is sufficient when there is a one-to-one correlation between a lexical item and its representation in the program. Occasionally, a single lexical item may require several components to adequately express its meaning within a program. In Grammar 2, there is a one-to-one correlation between the direction of movement and the slideUp and slideDown functions in the <DIR> rules. These functions can also be written as a single slide function, with the direction of the movement given by two parametric variables (cos and sin). In this situation, the direction rule (<DIR.}/F>) needs to be expressed with two values, each known as a named field. The word up may be represented by the named fields cos and sin, with the values 0 and 1 respectively.</Paragraph> <Paragraph position="11"> Another issue in JSL - which does not arise in the syntax-only JSGF - is the need to uniquely identify multiple sub-rules of the same type when they occur in the same rule. For example, in a geometry grammar, two <POINT>s may be needed in a rule to declare a <LINE>, as in: public <LINE> = make a line from <POINT> to <POINT> : ...</Paragraph> <Paragraph position="12"> Uniquely numbering the sub-rules eliminates the ambiguity as to which <POINT> is which. Numbering can be used in both the RHS and the semantic portion of a rule; numbering is not allowed in the LHS of a rule. 
Syntactically, sub-rules are numbered with a series of single quotes3: public <LINE> = make a line from <POINT'> to <POINT''> : ...</Paragraph> </Section> <Section position="2" start_page="107" end_page="107" type="sub_section"> <SectionTitle> 3.2 Javox Scripting Language (JSL) </SectionTitle> <Paragraph position="0"> The JAVOX Scripting Language (JSL) is a stand-alone programming language, developed for use with the JAVOX infrastructure. JSL can be used to manipulate a running Java program and can be thought of as an application-independent macro language.</Paragraph> <Paragraph position="1"> The EXECUTER module interprets JSL and performs the specified actions. The specifics of JSL are not important to understanding JAVOX; for this reason, only a brief summary is presented here.</Paragraph> <Paragraph position="2"> JSL can read or modify the contents of an object's fields (data members) and can execute methods (member functions) on objects. Unlike Java, JSL is loosely typed: type checking is not done until a given method is executed. JSL has its own variables, which can hold objects from the host application; a JSL variable can store an object of any type and no casting is required. JSL supports Java's primitive types, Java's reference types (objects), and Lisp-like lists. [Footnote 3: This representation is motivated by the grammars of (Hipp, 1992).]</Paragraph> <Paragraph position="3"> Though JSL does support Java's primitive types, they are converted into their reference-type equivalents. For example, an integer is stored as a java.lang.Integer and is converted back to an integer when needed.</Paragraph> <Paragraph position="4"> JSL has the standard control flow mechanisms found in most conventional programming languages, including if-else, for and while loops. With the exception of the evaluation of their boolean expressions, these constructs follow the syntax and behavior of their Java counterparts. 
Java requires that if-else conditions and loop termination criteria be boolean values. JSL conditionals are more flexible; in addition to booleans, JSL evaluates non-empty strings as true, empty strings as false, non-zero values as true, zero as false, non-null objects as true, and null as false.</Paragraph> <Paragraph position="5"> In addition to Java's control flow mechanisms, JSL also supports foreach loops, similar to those found in Perl. These loops iterate over both JSL lists and members of java.util.List, executing the associated code block on each item. JSL lists are often constructed by recursive rules in order to handle conjunctions, as seen in Section 5.</Paragraph> </Section> </Section> <Section position="6" start_page="107" end_page="108" type="metho"> <SectionTitle> 4 Infrastructure </SectionTitle> <Paragraph position="0"> The JAVOX infrastructure has been designed to completely separate NLP code from the application's code. The application can still be run without JAVOX, as a typical, non-speech-enabled program - it is only speech-enabled when run with JAVOX.</Paragraph> <Paragraph position="1"> From the application's perspective, JAVOX operates at the systems level and sits between the application and the operating system (virtual machine), as shown in Figure 1. TRANSLATOR interfaces with the speech recognizer and performs all necessary NLP.</Paragraph> <Paragraph position="2"> EXECUTER interfaces directly with the application and performs upcalls into the running program.</Paragraph> <Paragraph position="3"> Java has two key features that make it an ideal test platform for our experimental implementation: reflection and a redefinable loading scheme. Reflection gives a running program the ability to inspect itself, sometimes called introspection. Objects can determine their parent classes; every class is itself an object in Java (an instance of java.lang.Class). 
Methods, fields, constructors, and all class attributes can be obtained from a Class object. So, given an object, reflection can determine its class; given a class, reflection can find its methods and fields. JAVOX uses reflection to (1) map from the JSL-textual representation of an object to the actual instance in the running program; (2) find the appropriate java.lang.reflect.Method objects for an object/method-name combination; and (3) actually invoke the method, once all of its arguments are known.</Paragraph> <Paragraph position="4"> Reflection is very helpful in examining the application program's structure; however, prior to using reflection, EXECUTER needs access to the objects in the running program. To obtain pointers to the objects, JAVOX uses JOIE, a load-time transformation tool (Cohen et al., 1998). JOIE allows us to modify each application class as it is loaded into the virtual machine. The JAVOX transform adds code to every constructor in the application that registers the new object with EXECUTER. Conceptually, the following line is added to every constructor: Executer.register(this).</Paragraph> <Paragraph position="5"> This modification is done as the class is loaded; the compiled copy - on disk - is not changed. This allows the program to still be run without JAVOX, as a non-speech application. EXECUTER can - once it has the registered objects - use reflection to obtain everything else it needs to perform the actions specified by the JSL.</Paragraph> </Section> <Section position="7" start_page="108" end_page="109" type="metho"> <SectionTitle> 5 Example </SectionTitle> <Paragraph position="0"> Our longest running test application has been a Mr. Potato Head program that allows users to manipulate a graphical representation of the classic children's toy. Its operations include those typically found in drawing programs, including moving, recoloring and hiding various pieces of Mr. 
Potato Head.</Paragraph> <Paragraph position="1"> Grammar 3 shows the portion of the application's grammar needed to process the utterance Move the eyes and glasses up. The result of parsing this utterance is shown in Figure 3.</Paragraph> <Paragraph position="2"> Once TRANSLATOR has processed an utterance, it forwards the resulting JSL fragment to EXECUTER.</Paragraph> <Paragraph position="3"> Figure 4 provides a reduced class diagram for the Mr. Potato Head application; the arrows correspond to the first iteration in the following trace. The following steps are performed as the JSL fragment from Figure 3 is interpreted: 1. A new variable - local to EXECUTER - named $iter is created. Any previously-declared variable with the same name is destroyed.</Paragraph> <Paragraph position="4"> 2. The foreach loop starts by initializing the loop variable to the first item in the list: Canvas.eyesObj. This object's name consists of two parts; the steps to locate the actual instance in the application are: (a) The first part of the name, Canvas, is mapped to the only instance of the Canvas class in the context of this application.</Paragraph> <Paragraph position="5"> JAVOX has a reference to the instance because it registered with EXECUTER when it was created, thanks to a JOIE transformation. (b) The second part of the name, eyesObj, is found through reflection. Every instance of Canvas has a field named eyesObj of type BodyPart. This field is the eyesObj for which we are looking.</Paragraph> <Paragraph position="6"> 3. Once eyesObj is located, the appropriate method must be found. We determine - through reflection - that there are two methods in the BodyPart class with the name move, as seen in Figure 4.</Paragraph> <Paragraph position="7"> 4. We next examine the two arguments and determine that both are integers. Had the arguments been objects, fields, or other method calls, this entire procedure would be done recursively on each.</Paragraph> <Paragraph position="8"> 5. 
We examine each possible method and determine that we need the one with two integer arguments, not the one taking a single Point argument.</Paragraph> <Paragraph position="9"> 6. Now that we have the object, the method, and the arguments, the upcall is made and the method is executed in the application. The result is that Mr. Potato Head's eyes move up on the screen.</Paragraph> <Paragraph position="10"> 7. This process is repeated for glassObj and the loop terminates.</Paragraph> <Paragraph position="11"> After this process, both the eyes and glasses have moved up 20 units and EXECUTER waits for additional input. The application continues to accept mouse and keyboard commands, just as it would without speech.</Paragraph> </Section> </Paper>
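The lookup-and-upcall steps traced in Section 5 can be illustrated with plain Java reflection. The sketch below is not the JAVOX implementation: the Registry class, the resolve and call helpers, the overload-matching rule, and the (0, -20) offset used for "up" are all assumptions made for illustration, and Canvas, BodyPart and Point are minimal stand-ins for the application classes of Figure 4. The registration call is written by hand here, whereas the paper describes it being added at load time by JOIE.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the application classes named in the trace.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class BodyPart {
    int x, y;
    public void move(int dx, int dy) { x += dx; y += dy; }
    public void move(Point p) { x += p.x; y += p.y; }
}

class Canvas {
    BodyPart eyesObj = new BodyPart();
    BodyPart glassObj = new BodyPart();
    // Written by hand here; conceptually the line a load-time transform would add.
    Canvas() { Registry.register(this); }
}

// Hypothetical object table: maps a simple class name to its latest instance.
class Registry {
    private static final Map<String, Object> INSTANCES = new HashMap<>();
    static void register(Object o) { INSTANCES.put(o.getClass().getSimpleName(), o); }
    static Object lookup(String className) { return INSTANCES.get(className); }
}

class ReflectiveUpcallSketch {
    // Step 2: "Canvas.eyesObj" -> registered Canvas instance -> field value via reflection.
    static Object resolve(String dottedName) throws ReflectiveOperationException {
        String[] parts = dottedName.split("\\.");
        Object owner = Registry.lookup(parts[0]);
        Field field = owner.getClass().getDeclaredField(parts[1]);
        field.setAccessible(true);
        return field.get(owner);
    }

    // Steps 3-6: find the overload whose parameters fit the arguments, then invoke it.
    static Object call(Object target, String name, Object... args)
            throws ReflectiveOperationException {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && accepts(m.getParameterTypes(), args)) {
                m.setAccessible(true);
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(name);
    }

    // Simplified matching rule (an assumption): same arity, int parameters take Integers.
    static boolean accepts(Class<?>[] types, Object[] args) {
        if (types.length != args.length) return false;
        for (int i = 0; i < types.length; i++) {
            Class<?> t = (types[i] == int.class) ? Integer.class : types[i];
            if (!t.isInstance(args[i])) return false;
        }
        return true;
    }

    public static void main(String[] argv) throws Exception {
        Canvas canvas = new Canvas();
        // Step 7: repeat the lookup and upcall for each item in the list.
        for (String name : new String[] {"Canvas.eyesObj", "Canvas.glassObj"}) {
            call(resolve(name), "move", 0, -20);
        }
        System.out.println("eyes y = " + canvas.eyesObj.y + ", glasses y = " + canvas.glassObj.y);
    }
}
```

Keying the registry by simple class name only works because the example, like the trace, assumes a single Canvas instance; an application with several instances of one class would need some other way to name them.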