File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-0107_metho.xml
Size: 14,455 bytes
Last Modified: 2025-10-06 14:07:21
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0107"> <Title>A Measure of Semantic Complexity for Natural Language Systems</Title> <Section position="3" start_page="0" end_page="42" type="metho"> <SectionTitle> 2 Semantic vs. Syntactic Complexity </SectionTitle> <Paragraph position="0"> The complexity measurement described above must be one that takes into account both the semantic and syntactic complexity of the domain. Semantic complexity is the number of &quot;things&quot; that we can talk about in the domain. This will include all the objects in the domain, the attributes of those objects to which one might refer, and the relationships between the objects that the user can express. Syntactic complexity refers to the variety of ways that the user will be allowed to refer to an object, attribute, or relationship. For example, a domain could include only two boys, but if the user is allowed to refer to them in many ways (e.g., &quot;Bob&quot;, &quot;Jim&quot;, &quot;he&quot;, &quot;they&quot;, &quot;the two boys next to the water cooler at the back of the room&quot;), then the domain is semantically simple but syntactically complex. Likewise, a domain with 100 objects that are each referred to only as Object1, Object2, etc., is semantically complex but syntactically simple.</Paragraph> <Paragraph position="1"> Semantic and syntactic complexities form a trade-off when it comes to building a language processor for a domain. To build a reliable and accurate processor, the domain must be sufficiently restrained.</Paragraph> <Paragraph position="2"> The more syntactic variety allowed the user, the fewer objects allowed in the domain. Conversely, the more objects in the world, the more restricted the user's grammar and vocabulary. This leads to a tendency to consider the two fronts separately, and then consider a complete complexity measure as a combination of both. 
Having measures of syntactic and semantic complexity separately will help to find where the best compromise lies.</Paragraph> <Paragraph position="3"> This paper addresses semantic complexity only. It therefore does not completely define the complexity measure described in the introduction, but hopefully takes a step toward defining such a measure. Syntactic complexity measures such as grammar perplexity (Cole and Zue, 1995) should augment this semantic measure to give a full complexity measure.</Paragraph> </Section> <Section position="4" start_page="42" end_page="43" type="metho"> <SectionTitle> 3 Domain Terms </SectionTitle> <Paragraph position="0"> To analyze a domain's complexity, the domain expert must first specify the domain in which the system will work by determining the objects in the domain, each object's attributes, and the relationships between objects. Consider as an example the small domain of a simple army map, where there are a few objects on the map and the user can display, move, and show or set attributes of them. This example will be used to show how to define a domain using the following terms: Objects are the types of salient things in the domain. They correspond roughly to the subjects and objects of sentences used in the dialog. In the army display domain, the objects will be tanks, troops, bridges, forests, and hills. Notice that a type of object only needs to be specified once at this high level. Bridge is one object in our world, even though the actual program is able to distinguish many different bridges.</Paragraph> <Paragraph position="1"> Attributes of an object are the things that the program needs to know about the object in order to use it in the domain. They correspond roughly to adjectives that describe the object, or things that distinguish one of the objects from the others of that type. In our example, the domain requires the name and position of the bridge and the material of which the bridge is made. 
These three pieces of information include everything the system needs to know about any bridge. In the following figure, the attributes of an object are listed underneath each object type.</Paragraph> <Paragraph position="2"> Classes are objects, attributes, predicates, or other classes that are grouped together. A class can act as an object in the sense that it can have a name and have relationships with other objects.</Paragraph> <Paragraph position="3"> In our example domain, we will want to distinguish objects that can move from those that cannot, i.e., a MobileObject class as a grouping of Tanks and Troops. There are always three trivial classes: the class of all objects, all attributes (of all objects), and all predicates.</Paragraph> <Paragraph position="4"> Predicates are the relationships between the objects in the world. Any meaning that the user can convey using one or more of the objects should be represented by a predicate. They correspond to the relationship words, like the verbs and prepositions in a sentence, and one can usually find the predicates needed from looking at the allowed operations. For the example domain, the following is the list of allowable predicates, in a typical programming language format to distinguish predicates from arguments.</Paragraph> <Paragraph position="5"> Display(Object) [&quot;Display the tanks&quot;]; Move(MobileObject,Object) [&quot;Move Troop at position 100, 400 to the hill&quot;]; Show(Attribute,Object) [&quot;Show the range of sight of Tank 434&quot;]; Set(Object,Attribute,Attribute) [&quot;The forest has an area of 100 square yards.&quot;]. Notice that classes can be written as predicate arguments to mean that any object in the class can be an argument. 
Specifically, the Object type refers to all objects, MobileObject refers to either Tank or Troop, and Attribute refers to any object's attribute.</Paragraph> </Section> <Section position="5" start_page="43" end_page="44" type="metho"> <SectionTitle> 4 Complexity Formulas </SectionTitle> <Paragraph position="0"> Now that the domain is specified, we can analyze its semantics by estimating the number of bits of information conveyed by referring to each different aspect of the domain. This is common in information theory (Ash, 1965); that is, when the user makes a statement, it must be encoded, and the number of bits needed to encode the statement is a measure of its information content. Since the number of bits required to encode a statement in a given domain corresponds directly to the number of salient objects, this information measurement is useful in assigning a semantic complexity measurement.</Paragraph> <Paragraph position="1"> To get a complexity measure for an entire domain, we begin at the lowest level and make counts corresponding to the information content described above. The counts from lower levels are combined to give a higher level count. Specifically, first each attribute value for a specific object is computed, then attribute values are combined to give object values, which are combined to give class values, and so forth until a value for the entire domain is computed.</Paragraph> <Paragraph position="2"> Define B(X) to be the number of bits conveyed by an instance of random variable X, and |X| to be the number of possible values of X. (Possible ways of computing B(X) will be given in the next sections.) The random variable will represent different events, depending on where we are in the complexity analysis, but in general, the variable will represent the specification of possible attributes, objects, classes, or predicates.</Paragraph> <Paragraph position="3"> We start by defining the complexity of a single attribute for a single object. 
We give the formulas for computing the different levels of complexity (attribute level, object level, etc.) and then work through the example domain.</Paragraph> <Paragraph position="4"> The complexity of attribute i for object j, denoted</Paragraph> <Paragraph position="6"> A simple sum is used because identifying one object uniquely corresponds to knowing each of its attributes. Therefore, the sum of the attribute information is the same as the complete object information. Since objects can be grouped together into classes, a class complexity is the number of bits conveyed by distinguishing one type of object from that class, plus the maximum object complexity that occurs in that class: $CC_{class} = B(O) + \max_{obj \in class} OC_{obj}$ where O is the specification of an object in the class. When a member of a class is specified, the amount of information conveyed is equal to the information in the object type specification (B(O)), plus the information conveyed by the actual object itself. The most that this can be is the maximum object complexity in the class. Classes of predicates and attributes are defined in the same way.</Paragraph> <Paragraph position="7"> For each predicate, the complexity is the sum of the complexities of its arguments:</Paragraph> <Paragraph position="9"> This parallels the object complexity, which is the sum of the complexities of its attributes.</Paragraph> <Paragraph position="10"> In general, predicate arguments will be classes. If a single object is the only possibility for an argument rather than a class of objects, then the object complexity can be used. This would be the same as making a class of one object: the class complexity of one object is equal to the complexity of the one member of the class.</Paragraph> <Paragraph position="11"> The entire domain's semantic complexity is then the same as the complexity of the class of all predicates defined for the domain. 
Specifically, for a domain with a set of predicates P, the semantic complexity is</Paragraph> <Paragraph position="13"> where P is the specification of a predicate in the domain.</Paragraph> <Paragraph position="14"> Any statement that the user can make should correspond to some predicate in the domain model. The information given in the sentence is the information given by the predicate specification (B(P)) plus the information given in the arguments to the predicate, which is as much as the greatest predicate complexity.</Paragraph> <Section position="1" start_page="43" end_page="44" type="sub_section"> <SectionTitle> 5 Using Equal Probability Assumptions </SectionTitle> <Paragraph position="0"> Now we find a formula for B(X), the bits of information conveyed when referring to certain parts of the domain. For the army map example, we assume that all objects are equally likely to be referred to, and all attributes, classes, and relationships are also equally likely. So a troop is as likely to be referred to as a tank, or as a forest, etc. Also, a tank on the map is equally likely to be friend, foe, or unknown.</Paragraph> <Paragraph position="1"> Every value for the attributes will be equally likely.</Paragraph> <Paragraph position="2"> Under this assumption, the number of bits of information conveyed by referring to one entity out of v possible entities is $\log_2 v$. That is, for the equally probable case, $B(X) = \log_2 |X|$.</Paragraph> <Paragraph position="3"> Now we fill in the table from Figure 1, beginning with attribute values. A domain expert would decide how many different values are allowed for each attribute. 
In this example, we will specify that Tank's Friend/Foe value is either friend, foe, or unknown: three possibilities.</Paragraph> <Paragraph position="5"> Assuming that there are 128 ID number possibilities, 65,000 positions, and 1,000 possible ranges, and assuming equal probability, we take the base-2 logarithm of each number and fill in the complexity beside each attribute for that object. Following the hierarchy, we now add the attribute complexities to get the complexity of the tank object.</Paragraph> <Paragraph position="6"> Now we have $OC_{tank} = 45$; suppose that in like manner we get $OC_{troop} = 43$. These two types of objects make up the MobileObject class, so now we can compute this complexity:</Paragraph> <Paragraph position="8"> Similar formulas are used for predicate and complete domain complexity measurements, and the rest of the example can be followed from Figure 2.</Paragraph> </Section> </Section> <Section position="6" start_page="44" end_page="44" type="metho"> <SectionTitle> 6 More General Information </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="44" end_page="44" type="sub_section"> <SectionTitle> Measurement </SectionTitle> <Paragraph position="0"> In most cases, the equal probability assumption will not hold. For example, the bridges in the domain can be made of any of eight materials, but if all of the visible bridges are made of wood, then the Material attribute for Bridge will probably be wood most of the time. In this case, referring to the &quot;wooden bridge&quot; on the map doesn't give much more information than just &quot;bridge.&quot; For this more general case, define B(X) to be $B(X_1, X_2, \ldots, X_n)$ where each $X_i$ is a possible value of X. Also define $p_1, p_2, \ldots, p_n$ to be their associated probabilities. 
Then</Paragraph> <Paragraph position="2"> These probabilities can be determined using frequency counts from sample dialogs, or estimated based on domain knowledge.</Paragraph> </Section> </Section> <Section position="7" start_page="44" end_page="45" type="metho"> <SectionTitle> 7 Future Work </SectionTitle> <Paragraph position="0"> The next step in this research is to obtain several domains that have been built into a dialog system and analyze them. The Circuit Fix-It Shoppe (Smith and Hipp, 1994) has been analyzed, but the results will only be interesting in comparison to other real domains. This comparison will not only help us verify the correctness of the analyses, but also bring up possible situations that the analysis may not cover.</Paragraph> <Paragraph position="1"> Next, we will want to identify a measure of syntactic complexity. This could be related to grammar perplexity. It should take into account vocabulary size, grammar constraints, and the amount of ambiguity in the grammar. We would like to be able to analyze the domains with both the semantic complexity and the syntactic complexity, and see that the results match our intuitions of complexity and the standards of lines of code, reliability, cost of software, and execution time. We would also be interested in observing the correlation between the syntactic and semantic complexities.</Paragraph> </Section> </Paper>