File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/w98-1420_metho.xml
Size: 14,371 bytes
Last Modified: 2025-10-06 14:15:15
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1420"> <Title>A Language-Independent System for Generating Feature Structures from Interlingua Representations</Title> <Section position="4" start_page="189" end_page="191" type="metho"> <SectionTitle> 3 Knowledge Resources </SectionTitle> <Paragraph position="0"> The developed architecture is language-independent; it takes the information about the target language from three knowledge resources: the lexicon, the map-rules, and the syntactic structure representation formalism of the target language. The lexicon, besides its other uses, provides information about the relationship between concept instances and word senses of the target language [Dorr, 1993]. Map-rules define how the content of a TMR is related to the syntactic structure of the target language [Mitamura and Nyberg, 1992]. The last knowledge resource provides the information about the structure of the syntactic representation formalism.</Paragraph> <Paragraph position="1"> The interface between concept instances in a TMR (denoting events and entities) and word senses of the target language is established using the semantic and pragmatic properties of lexemes that are defined in the lexicon. Since nouns denote entities and verbs denote events in a language, each word that belongs to one of these categories is also defined as a concept instance in the lexicon. So, for every TMR frame that is a concept instance, there is a set of candidate lexicon entries that are defined using the same concept. For example, if the previous example is considered, there are at least two candidates for an instantiated HUMAN, namely 'man' and 'child'.</Paragraph> <Paragraph position="2"> The meaning of every noun and verb is defined in the lexicon by constraining the abstraction provided by the parent concept. For example, one sense of 'man' can be defined as 'a male HUMAN whose age is greater than 17'. Such definitions are the major source of information used in lexical selection. 
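A minimal sketch of such lexicon entries, written in Python rather than in the paper's own notation, may help picture the idea. The names LexiconEntry, LEXICON, and candidates_for are hypothetical, the constraint encoding is an assumption, and every age bound other than the lower bound for 'man' is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One word sense: a parent concept narrowed by feature constraints."""
    word: str
    concept: str                                     # parent ontology concept
    constraints: dict = field(default_factory=dict)  # hypothetical encoding


# Illustrative entries only; the ranges are assumptions, not data from the paper.
LEXICON = [
    LexiconEntry("man", "HUMAN", {"gender": "male", "age": (18, 120)}),
    LexiconEntry("child", "HUMAN", {"age": (0, 12)}),
    LexiconEntry("book", "BOOK"),
]


def candidates_for(concept, lexicon=LEXICON):
    """Return the words whose entries are defined using the given concept."""
    return [entry.word for entry in lexicon if entry.concept == concept]
```

With these toy entries, candidates_for("HUMAN") collects 'man' and 'child', the two candidates mentioned for an instantiated HUMAN.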
In addition to meaning definitions, pragmatic properties of word senses can also be defined in the lexicon. For example, the preference of 'guy' over 'man' in informal situations to express a negative attitude can be encoded by attaching the necessary stylistic and attitude requirements to the definition of 'guy'. Note that words belonging to the adjective and adverb categories are not defined as concept instances. Instead, they are represented in TMRs as features of events and entities, and their realizations are achieved through map-rules in generation.</Paragraph> <Paragraph position="3"> The syntactic structure formalism of the target language is represented using a frame-based notation, like feature structures. The developed system uses the syntax formalism through its corresponding tree structures defined in the knowledge resource. The relation between syntactic structure and TMR is described using map-rules. Each map-rule is associated with either a concept from the ontology or a special frame type used in the TMR language to encode certain semantic or pragmatic issues such as aspect, modality and speech-act. Map-rules are utilized to relate thematic roles to their grammatical counterparts, to create specific syntactic features such as tense, voice, and modifiers, and to determine the syntactic connection between events. Map-rules defined for concepts follow the inheritance mechanism in the ontology, and general syntactic properties are determined in parent concepts.</Paragraph> <Paragraph position="4"> Each map-rule mainly provides two types of information: content conditions and update operations. Content conditions should be satisfied by the input TMR before update operations are applied. Since map-rules should be TMR-independent, references to arbitrary frames in the input TMR are not allowed in the definitions of content conditions. 
In fact, only three frames can be referenced in conditions: the current active frame, the current event frame, and the current speech-act frame.</Paragraph> <Paragraph position="5"> Content conditions are defined to check the existence of certain features and/or their values in these frames. Update operations change the constructed syntactic structure of the sentence when they are applied. There are three types of update operations: feature addition such as add(tense, past), frame addition such as add(subject), and frame-to-frame mapping such as map(agent, subject).</Paragraph> </Section> <Section position="5" start_page="191" end_page="192" type="metho"> <SectionTitle> 4 Computational Model </SectionTitle> <Paragraph position="0"> The computational model is designed to process the TMR of a sentence as input and to construct the syntactic structure of that sentence while selecting lexical items for its constituents.</Paragraph> <Paragraph position="1"> To achieve these tasks, the model makes use of the ontology and the knowledge resources developed for the target language. Although lexical selection and syntactic structure construction can work in parallel during TMR processing, they can also be handled in two independent submodules. Lexical selection is activated whenever the TMR frame is a concept instance, and it is based on the semantic and pragmatic properties of the candidate lexemes. Each TMR frame activates its attached map-rules to update the constructed syntactic structure. 
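How a single map-rule might be applied can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the tuple encoding of conditions and update operations, the "mappings" key, the feature name "time", and the function names are all hypothetical. Only the restriction to three referable frames and the three update operation types come from the description in Section 3.

```python
# Conditions may only look at these three frames (see Section 3).
ALLOWED_FRAMES = ("active", "event", "speech_act")


def conditions_hold(conditions, frames):
    """Check (frame_name, feature, expected_value_or_None) triples."""
    for frame_name, feature, expected in conditions:
        if frame_name not in ALLOWED_FRAMES:
            raise ValueError("conditions may only reference: " + ", ".join(ALLOWED_FRAMES))
        frame = frames.get(frame_name, {})
        if feature not in frame:
            return False          # existence check failed
        if expected is not None and frame[feature] != expected:
            return False          # value check failed
    return True


def apply_map_rule(rule, frames, syntax):
    """Apply the rule's update operations to syntax if its conditions hold."""
    if not conditions_hold(rule["conditions"], frames):
        return False
    for op in rule["updates"]:
        kind = op[0]
        if kind == "add_feature":      # e.g. add(tense, past)
            syntax[op[1]] = op[2]
        elif kind == "add_frame":      # e.g. add(subject)
            syntax.setdefault(op[1], {})
        elif kind == "map":            # e.g. map(agent, subject)
            syntax.setdefault("mappings", {})[op[1]] = op[2]
        else:
            raise ValueError("unknown update operation: " + kind)
    return True


# Hypothetical rule and frames, for illustration only.
rule = {
    "conditions": [("event", "time", "past"), ("event", "agent", None)],
    "updates": [("add_feature", "tense", "past"),
                ("add_frame", "subject"),
                ("map", "agent", "subject")],
}
frames = {"active": {}, "event": {"time": "past", "agent": "HUMAN-1"}, "speech_act": {}}
syntax = {}
applied = apply_map_rule(rule, frames, syntax)
```

Keeping conditions as data that can name only the three permitted frames is one way to make a rule independent of any particular TMR.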
Besides these tasks, the model should determine the processing order of TMR frames. So, the main module decides on the processing order and activates the lexical selection and the map-rule application submodules whenever necessary.</Paragraph> <Paragraph position="2"> The architecture is described in Figure 3.</Paragraph> <Section position="1" start_page="191" end_page="192" type="sub_section"> <SectionTitle> 4.1 Lexical Selection Module </SectionTitle> <Paragraph position="0"> Lexical selection is performed for every TMR frame which is a concept instance. Since there is generally more than one candidate lexeme for such a frame, the module should select the word sense that most closely carries the meaning residing in the TMR frame into the target sentence.</Paragraph> <Paragraph position="1"> So, lexical selection in this work is mainly based on the meaning distance between the frame being processed and the candidate lexemes [Temizsoy, 1997]. The distance is calculated by assigning penalties to features that are not matched in the two definitions. After calculating the proximities between the meaning in the TMR frame and the candidate lexemes, the module returns the closest one as the selected word sense. Although proximity of meaning is the major criterion,</Paragraph> <Paragraph position="2"> there are cases in which there is still ambiguity between candidates. In such cases, in addition to the semantic constraints of lexical items, their pragmatic properties are also taken into account. 
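The penalty-based distance can be illustrated with a small sketch. It assumes flat feature dictionaries and uniform penalty values, both of which are simplifications chosen here for illustration; the function names and the sample features are hypothetical, and the actual penalty scheme is the one given in [Temizsoy, 1997].

```python
def distance(frame, definition, extra_penalty=1.0, missing_penalty=1.0, value_penalty=1.0):
    """Sum penalties over features that do not match in the two definitions."""
    total = 0.0
    for feature in set(frame) | set(definition):
        if feature not in frame:
            total += extra_penalty      # lexeme carries meaning absent from the TMR frame
        elif feature not in definition:
            total += missing_penalty    # lexeme leaves part of the TMR meaning uncovered
        elif frame[feature] != definition[feature]:
            total += value_penalty      # both define the feature, but the values differ
    return total


def select_lexeme(frame, candidates):
    """Return the candidate word whose definition is closest to the frame."""
    return min(candidates, key=lambda word: distance(frame, candidates[word]))


# Toy data, invented for illustration.
frame = {"gender": "male", "age": "adult"}
candidates = {
    "man": {"gender": "male", "age": "adult"},
    "child": {"age": "young"},
}
chosen = select_lexeme(frame, candidates)
```

Here 'man' matches every feature and gets distance 0.0, while 'child' is penalized once for the uncovered gender feature and once for the differing age value.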
Lexical selection is achieved in three successive steps: first, the candidates whose subcategorization constraints are not satisfied in the TMR frame are removed from the list (context-dependent selection); then, a distance is assigned to each remaining candidate by comparing the meaning residing in the TMR frame with its definition in the lexicon (context-independent selection); and, if it is still impossible to make a selection based on those calculated distances, the stylistic and pragmatic properties of the candidates are utilized. The architecture of the lexical selection module is shown in Figure 4.</Paragraph> <Paragraph position="4"/> </Section> </Section> <Section position="6" start_page="192" end_page="194" type="metho"> <SectionTitle> SELECTED LEXEME </SectionTitle> <Paragraph position="0"> There are some heuristics that are utilized in calculating the distance between a TMR frame and a lexical item definition, and they can be summarized as follows: A penalty value is assigned to a feature that is in the lexeme definition, but not in the TMR frame, to minimize the introduction of extraneous meaning.</Paragraph> <Paragraph position="1"> Another penalty value is assigned to a feature that is in the TMR frame definition, but not in the lexeme definition, to reduce uncovered meaning.</Paragraph> <Paragraph position="2"> The match between two values from the same domain is proportional to the distance in ordered values and the intersection sizes in ranges.</Paragraph> <Paragraph position="3"> The calculated match is normalized by the domain size of the feature to minimize distances in larger domains.</Paragraph> <Paragraph position="4"> The final distance is weighted by its importance to the overall meaning, such that mismatches in less relevant features have smaller influence over the final proximity.</Paragraph> <Section position="1" start_page="192" end_page="194" type="sub_section"> <SectionTitle> 4.2 Map-Rule Application 
Module </SectionTitle> <Paragraph position="0"> This module collects all the map-rules associated with the TMR frame being processed and updates the constructed syntactic structure for map-rules whose content conditions are satisfied. The map-rules developed for ontology concepts follow the inheritance mechanism provided in the ontology.</Paragraph> <Paragraph position="1"> So, while processing a TMR frame which is a concept instance, this module should traverse the ontology in a bottom-up fashion to apply map-rules that are associated with the ancestor concepts of the concept instance. Note that since a lexical item can require some updates on the syntactic structure, this module also applies the map-rules associated with the selected lexical item. If the processed TMR frame is not a concept instance, the map-rules associated with its frame type are applied to update the constructed syntactic structure.</Paragraph> <Paragraph position="2"> As mentioned, the syntax formalism of the target language is represented as tree structures in which frames are the internal nodes and features are the leaves. Since frames and features in such a representation are used to describe distinct syntactic phenomena, unique names should be given to them. This uniqueness property is utilized to find the place of a feature or a frame directly in the tree structure without traversing. So, feature or frame addition to the constructed tree is achieved by just finding its place, forming a partial tree through traversing the defined tree structure in a bottom-up fashion, and merging that partial tree into the previously constructed syntactic structure. Note that these operations can be done in logarithmic time [Temizsoy, 1997].</Paragraph> <Paragraph position="3"> Some syntactic constructs have the same form although their syntactic realizations are different, like noun phrases. 
So, generally their structure is defined under a common frame which can be the value of various features in the overall structure. For example, noun phrases are the fillers of the grammatical roles subject, direct-object, etc. To utilize such a form, the representation formalism is allowed to have more than one tree in its definition (one for verbal phrases, another for noun phrases, etc.). The tree representing the verbal phrase is taken to be the main one; all constructed child trees should be attached to it. The information about the attachment place of a child tree (the noun phrase is the subject, place, etc.) is obtained from previous frame-to-frame mapping rules such as map(agent, subject).</Paragraph> <Paragraph position="4"> </Paragraph> </Section> <Section position="2" start_page="194" end_page="194" type="sub_section"> <SectionTitle> 4.3 Main Module </SectionTitle> <Paragraph position="0"> The main module is responsible for determining the processing order of the TMR frames in the input. In this work, a depth-first strategy is used for ordering, which is exploited in processing TMRs that have more than one event. Since verbal phrases are represented with the main tree in the syntax formalism, trees constructed for supplementary events should be attached to the tree built for the main event. Since depth-first processing guarantees that all child frames together with their parent frame are processed before the other TMR frames, the algorithm can safely construct the syntactic structures of supplementary events and connect them to the main tree.</Paragraph> <Paragraph position="1"> So, the main module first constructs a processing stack which contains the main event (the scope of the speech-act), relations or special frames (causal, temporal, and textual relations, speech-acts, etc.), and other events in the given order [Temizsoy, 1997]. 
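The depth-first ordering described above can be sketched with an explicit stack. This is only an illustration: the representation of a TMR as a mapping from frame names to child frame names, the function name processing_order, and the toy frame graph are assumptions made here, not the paper's data structures.

```python
def processing_order(frames, roots):
    """Return frame names in depth-first order, starting from roots in the given order."""
    order, seen = [], set()
    stack = list(reversed(roots))          # the first root ends up on top of the stack
    while stack:
        name = stack.pop()
        if name in seen:
            continue                       # a frame is processed only once
        seen.add(name)
        order.append(name)
        # Push children so that they are handled before any remaining frame.
        stack.extend(reversed(frames[name]))
    return order


# Toy frame graph: a main event and one supplementary event, each with child frames.
toy_frames = {
    "GO": ["JOHN", "ISTANBUL"],
    "JOHN": [],
    "ISTANBUL": [],
    "COME": ["PARTY"],
    "PARTY": [],
}
order = processing_order(toy_frames, ["GO", "COME"])
```

In this toy run every child of the main event is visited before the supplementary event is started, which is the property the attachment step relies on.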
After creating the syntactic tree of a supplementary event, the algorithm finds the syntactic relation of that event to the main one. This determines the attachment place of the child tree in the main tree. There are three cases in which events are related to the main one: Another event is used to describe a thematic role of the main event, as in &quot;I want to read a book&quot;. In this example, the phrase 'read a book' is processed individually by the algorithm, and its corresponding constructed tree is attached as the direct-object of the sentence (assuming that map(theme, direct-object) has previously been applied).</Paragraph> <Paragraph position="2"> The connection between two events is a relation (causal relations, conjunctions, etc.), as in &quot;Since John did not study enough, he could not pass the exam&quot;. In this example, first the main event, PASS, is processed, then the frame which defines the relation is taken from the processing stack. Since one of its arguments has not been processed yet (the event STUDY in this example), the algorithm first constructs the tree structure of STUDY, and then applies the syntactic realization of the relation to the constructed trees of PASS and STUDY.</Paragraph> <Paragraph position="3"> Another event is introduced to give some additional information about the main event or one of its components, as in &quot;John, who came to your birthday party last month, went to Istanbul&quot;. In this example, the algorithm first constructs the corresponding tree of GO, then it processes the event COME, and finally finds its relation to GO (definition of subject) and merges its constructed tree into the main one.</Paragraph> </Section> </Section> </Paper>