<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2111">
  <Title>Gilles.Serasset@imag.fr</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Keywords
</SectionTitle>
    <Paragraph position="0"> UNL, interlingua, pivot, deconversion, UNL-French localization, transfer, generation.</Paragraph>
    <Paragraph position="1"> Introduction
The UNL project of network-oriented multilingual communication has proposed a standard for encoding the meaning of natural language utterances as semantic hypergraphs intended to be used as pivots in multilingual information and communication systems. In the first phase (1997-1999), more than 16 partners representing 14 languages worked to build deconverters transforming an (interlingual) UNL hypergraph into a natural language utterance.</Paragraph>
    <Paragraph position="2"> In this project, each partner is free to choose the strategy used to achieve this initial objective. The UNL-French deconverter under development first performs a &quot;localization&quot; operation within the UNL format, and then classical transfer and generation steps, using the Ariane-G5 environment and some UNL-specific tools.</Paragraph>
    <Paragraph position="3"> The use of classical transfer and generation steps in the context of an interlingual project may sound surprising, but it reflects several interesting issues about the status of the UNL language: it was designed as an interlingua, but is used in practice either as a linguistic pivot (disambiguated abstract English) or as a purely semantic pivot.</Paragraph>
    <Paragraph position="4"> After introducing the UNL language, we present the architecture of the UNL-French deconverter, which &quot;generates&quot; from the UNL interlingua by first &quot;localizing&quot; the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in the Ariane-G5 environment and supplemented by some UNL-specific tools.</Paragraph>
    <Paragraph position="5"> Then, we discuss the use of the UNL language as a linguistic or semantic pivot for highly multilingual information systems.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="769" type="metho">
    <SectionTitle>
1 The UNL project and language
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="768" type="sub_section">
      <SectionTitle>
1.1 The project
</SectionTitle>
      <Paragraph position="0"> UNL is a project of multilingual personal networking communication initiated by the United Nations University, based in Tokyo.</Paragraph>
      <Paragraph position="1"> The pivot paradigm is used: the representation of an utterance in the UNL interlingua (UNL stands for &quot;Universal Networking Language&quot;) is a hypergraph where normal nodes bear UWs (&quot;Universal Words&quot;, or interlingual acceptions) with semantic attributes, and arcs bear semantic relations (deep cases, such as agt, obj, goal, etc.). (1) The terms &quot;deconversion&quot; and &quot;enconversion&quot; are specific to the UNL project and are defined below.</Paragraph>
      <Paragraph position="2"> Hypernodes group a subgraph defined by a set of connected arcs. A UW denotes a set of interlingual acceptions (word senses), although we often loosely speak of &quot;the&quot; word sense denoted by a UW.</Paragraph>
      <Paragraph position="3"> Because English is known by all UNL developers, the syntax of a normal UW is: &quot;&lt;English word or compound&gt; ( &lt;list of restrictions&gt; )&quot;, e.g. &quot;look for (icl&gt;action, agt&gt;human, obj&gt;thing)&quot;. Going from a text to the corresponding &quot;UNL text&quot;, or interactively constructing a UNL text, is called &quot;enconversion&quot;, while producing a text from a sequence of UNL graphs is called &quot;deconversion&quot;.</Paragraph>
      <Paragraph position="4"> This departure from the standard terms of analysis and generation is used to stress that this is not a classical MT project, but that UNL is planned to be the preferred source format for representing textual information in the envisaged multilingual network environment.</Paragraph>
      <Paragraph position="5"> The schedule of the project, beginning with deconversion rather than enconversion, also reflects that difference.</Paragraph>
      <Paragraph position="6"> 14 languages were tackled during the first 3-year phase of the project (1997-1999), while many more are to be added in the second phase. Each group is free to reuse its own software tools and/or lingware resources, or to develop directly with tools provided by the UNL Center (UNU/IAS).</Paragraph>
      <Paragraph position="7"> Emphasis is on a very large lexical coverage, so that all groups spend most of their time on the UNL-NL lexicons, and develop tools and methods for efficient lexical development. By contrast, grammars have initially been limited to those necessary for deconversion, and will then be gradually expanded to allow for more naturalness in formulating text to be enconverted.</Paragraph>
    </Section>
    <Section position="2" start_page="768" end_page="769" type="sub_section">
      <SectionTitle>
1.2 The UNL components
</SectionTitle>
      <Paragraph position="0"> The nodes of a UNL utterance are called Universal Words (or UWs). The syntax of a normal UW consists of two parts: a headword and a list of restrictions. Because English is known by all UNL developers, the headword is an English word or compound. The restrictions are given as a list of attribute-value pairs, where attributes are semantic relation labels (like the ones used in the graphs) and values are other UWs (restricted or not). A UW denotes a collection of interlingual acceptions (word senses), although we often loosely speak of &quot;the&quot; word sense denoted by a UW. For example, the unrestricted UW &quot;look for&quot; denotes all the word senses associated with the English compound word &quot;look for&quot;. The restricted UW &quot;look for(icl&gt;action, agt&gt;human, obj&gt;thing)&quot; represents all the word senses of the English word &quot;look for&quot; that are an action, performed by a human, that affects a thing. In this case, this leads to the word sense &quot;look for: to try to find&quot;.</Paragraph>
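      <Paragraph> The UW syntax above (a headword followed by an optional parenthesized list of relation>value restrictions) can be made concrete with a small parser. This is a minimal Python sketch, not the UNL Center's tooling: the UW class and the parse_uw and split_top_level names are hypothetical, and the sketch assumes that a restriction value may itself be a restricted UW, so commas are only split at parenthesis depth zero.

```python
from dataclasses import dataclass, field


@dataclass
class UW:
    """A Universal Word: an English headword plus optional restrictions."""
    headword: str
    # (relation label, value) pairs, e.g. ("icl", "action"); values may be UWs themselves.
    restrictions: list = field(default_factory=list)


def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_uw(text):
    """Parse 'headword(rel>value, ...)' or a bare 'headword' into a UW."""
    text = text.strip()
    if "(" not in text or not text.endswith(")"):
        return UW(text)
    open_idx = text.index("(")
    headword = text[:open_idx].strip()
    restrictions = []
    for item in split_top_level(text[open_idx + 1:-1]):
        # Split at the first '>' only, so a restricted value UW stays intact.
        label, _, value = item.partition(">")
        restrictions.append((label.strip(), value.strip()))
    return UW(headword, restrictions)
```

For example, parse_uw("look for (icl>action, agt>human, obj>thing)") yields the headword "look for" and the restrictions [("icl", "action"), ("agt", "human"), ("obj", "thing")], while the unrestricted "look for" yields an empty restriction list.</Paragraph>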
      <Paragraph position="1"> A UNL expression is a hypergraph (a graph where a node is either simple or recursively contains a hypergraph). The arcs bear semantic relation labels (deep cases, such as agt, obj, goal, etc.).</Paragraph>
      <Paragraph position="3"> Figure 1.1: A UNL graph deconvertible as &quot;Ronaldo has headed the ball into the left corner of the net&quot;.
In a UNL graph, UWs appear with attributes describing what is said from the speaker's point of view. This includes phenomena like speech acts, truth values, time, etc.</Paragraph>
      <Paragraph position="4"> Hypernodes may also be used in UNL expressions.</Paragraph>
      <Paragraph position="5"> [Figure: ... deconverted as &quot;Reckless drivers drink and drive&quot;]
Graphs and subgraphs must contain one special node, called the entry of the graph.</Paragraph>
      <Paragraph position="6"> These hypergraphs are denoted using the UNL language per se. In the UNL language, an expression consists of a set of arcs connecting the different nodes. As an example, the graph presented in figure 1.1 is denoted as:</Paragraph>
      <Paragraph position="8"> ...
mod(corner, left)
Hypernodes are denoted by numbers. The graph contained by a hypernode is denoted as a set of arcs colored by this number, as in:
agt(:01.@entry, driver.@pl)
aoj(reckless, driver.@pl)
and:01(drive, drink.@entry)
Entries of the graph and subgraphs are denoted with the &quot;.@entry&quot; attribute.</Paragraph>
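      <Paragraph> The arc notation, with its hypernode colours (":01") and ".@" attributes, can be read with a short parser. The sketch below is hypothetical (ARC_RE, parse_arc and group_by_colour are illustrative names, not part of any UNL toolkit) and assumes one arc per line, written relation[:colour](source, target); grouping arcs by colour then recovers the top-level graph (colour "") and the contents of each hypernode.

```python
import re
from dataclasses import dataclass

# relation label, optional hypernode colour such as ":01", then "(source, target)".
ARC_RE = re.compile(r"^\s*([a-z]+)(?::(\d+))?\s*\((.*)\)\s*$")


@dataclass
class Node:
    name: str         # a UW, or a ":01"-style reference to a hypernode
    attributes: list  # e.g. ["@entry", "@pl"]


@dataclass
class Arc:
    relation: str
    colour: str       # "" for the top-level graph, else the hypernode number
    source: Node
    target: Node


def split_args(body):
    """Split 'source, target' at the first comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i], body[i + 1:]
    raise ValueError("arc without two arguments: " + body)


def parse_node(text):
    """Separate the '.@attribute' suffixes from the node name."""
    name, *attrs = re.split(r"\s*\.@", text.strip())
    return Node(name.strip(), ["@" + a.strip() for a in attrs])


def parse_arc(line):
    match = ARC_RE.match(line)
    if match is None:
        raise ValueError("not a UNL arc: " + line)
    relation, colour, body = match.groups()
    source, target = split_args(body)
    return Arc(relation, colour or "", parse_node(source), parse_node(target))


def group_by_colour(lines):
    """Map each colour ("" = top level) to the arcs of that (sub)graph."""
    graphs = {}
    for line in lines:
        arc = parse_arc(line)
        graphs.setdefault(arc.colour, []).append(arc)
    return graphs
```

Applied to the three arcs of the &quot;Reckless drivers drink and drive&quot; example, group_by_colour returns two groups: the top-level arcs agt and aoj, and the single arc and:01 that forms the contents of hypernode 01.</Paragraph>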
    </Section>
  </Section>
  <Section position="6" start_page="769" end_page="771" type="metho">
    <SectionTitle>
2 Inside the French deconverter
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="769" end_page="769" type="sub_section">
      <SectionTitle>
2.1 Overview
</SectionTitle>
      <Paragraph position="0"> Deconversion is the process of transforming a UNL graph into one (or possibly several) utterances in a natural language. Any means may be used to achieve this task. Many UNL project partners use a specialized tool called DeCo but, like several other partners, we chose to use our own tools for this purpose.</Paragraph>
      <Paragraph position="1"> One reason is that DeCo performs the deconversion in one step, as in some transfer-based MT systems such as METAL [17]. We prefer a more modular architecture that splits deconversion into 2 steps, transfer and generation, each divided into several phases, most of them written in Ariane-G5.</Paragraph>
      <Paragraph position="2"> Another reason for not using DeCo is that it is not well suited for the morphological generation of inflected languages (several thousand rules are needed for Italian, tens of thousands for Russian, but only about 20 rules and 350 affixes suffice to build an exhaustive GM for French in Sygmor). Last, but not least, this choice allows us to reuse modules already developed for French generation.</Paragraph>
      <Paragraph position="3"> This strategy is illustrated by figure 2.1.</Paragraph>
      <Paragraph position="5"> Figure 2.1: Two possible deconversion strategies.
Using this approach, we segment the deconversion process into 7 phases, as illustrated by figure 2.2.</Paragraph>
      <Paragraph position="6"> The third phase (graph-to-tree) produces a decorated tree which is fed into an Ariane-G5</Paragraph>
    </Section>
    <Section position="2" start_page="769" end_page="771" type="sub_section">
      <SectionTitle>
2.2 Transfer
2.2.1 Validation
</SectionTitle>
      <Paragraph position="0"> When we receive a UNL Graph for deconversion, we first check it for correctness. A UNL graph has to be connected, and the different features handled by the nodes have to be defined in UNL.</Paragraph>
      <Paragraph position="1"> If the graph proves incorrect, an explicit error message is sent back. This validation has to be performed to improve the robustness of the deconverter, as no assumption can be made about the way a graph was created. When a graph proves valid, it is accepted for deconversion.</Paragraph>
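      <Paragraph> The two checks named above (connectivity and defined features) can be sketched as follows. The attribute inventory KNOWN_ATTRIBUTES is a hypothetical, illustrative subset, not the list from the UNL specification. Connectivity is tested on the undirected version of the graph, since arc direction does not matter for it, and problems are returned as messages rather than raised, in the spirit of the explicit error message described in the text.

```python
from collections import defaultdict, deque

# Illustrative subset only; the UNL specification defines many more attributes.
KNOWN_ATTRIBUTES = {"@entry", "@pl", "@past", "@present", "@def", "@indef"}


def validate(arcs, attributes):
    """Return a list of error messages (empty if the graph is acceptable).

    arcs: iterable of (relation, source, target) triples.
    attributes: dict mapping node name -> list of attribute strings.
    """
    errors = []
    # Undirected adjacency: connectivity ignores arc direction.
    neighbours = defaultdict(set)
    for _, src, tgt in arcs:
        neighbours[src].add(tgt)
        neighbours[tgt].add(src)
    nodes = set(neighbours)
    if nodes:
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if seen != nodes:
            unreachable = ", ".join(sorted(nodes - seen))
            errors.append("graph is not connected: unreachable " + unreachable)
    for node, attrs in attributes.items():
        for attr in attrs:
            if attr not in KNOWN_ATTRIBUTES:
                errors.append("undefined attribute " + attr + " on " + node)
    return errors
```

A graph with an isolated pair of arcs, or a node carrying an attribute outside the inventory, produces a non-empty list of messages; a valid graph produces an empty list and would be passed on for deconversion.</Paragraph>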
      <Paragraph position="2"> In order to be correctly deconverted, the graph has to be slightly modified.</Paragraph>
      <Paragraph position="3"> Some lexical units used in the graph may not be present in the French deconversion dictionary. This problem may appear under different circumstances. First, the French dictionary (which is still under development) may be incomplete. Second, the UW may use an unknown notation to represent a known French word sense. Third, the UW may represent a non-French word sense.</Paragraph>
      <Paragraph position="4"> We solve these problems with the same method. Let $w$ be a UW in the graph $G$, and let $D$ be the French dictionary (a set of UWs). We substitute $w$ in $G$ by $w'$ such that $w' \in D$ and $\forall x \in D,\ d(w, w', G) \leq d(w, x, G)$, where $d$ is a pseudo-distance function.</Paragraph>
      <Paragraph position="5"> If several French UWs are at the same pseudo-distance from $w$, $w'$ is chosen at random among them (the default in non-interactive mode).
Some crucial information may be missing, depending on the language of the source utterance (sex, modality, number, determination, politeness, kinship...).</Paragraph>
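      <Paragraph> The substitution rule can be sketched as a minimization over D with random tie-breaking. The paper does not specify the pseudo-distance d, so toy_distance below is a purely hypothetical stand-in (a fixed penalty for a different headword plus the number of differing restrictions, ignoring the graph context), and UWs are modelled as (headword, frozenset of restrictions) pairs.

```python
import random


def toy_distance(w, x, graph):
    """Hypothetical pseudo-distance between UWs w and x.

    UWs are (headword, frozenset of restriction strings). This toy version
    ignores the graph context, which the real d may take into account.
    """
    head_w, restr_w = w
    head_x, restr_x = x
    headword_penalty = 0 if head_w == head_x else 10
    return headword_penalty + len(restr_w ^ restr_x)  # symmetric difference


def localize(w, dictionary, graph, d=toy_distance, rng=random):
    """Substitute w by a dictionary UW at minimal pseudo-distance.

    Ties are broken at random, as in non-interactive mode.
    """
    if w in dictionary:
        return w
    best = min(d(w, x, graph) for x in dictionary)
    candidates = [x for x in dictionary if d(w, x, graph) == best]
    return rng.choice(candidates)
```

With this toy distance, a graph UW "look for(icl>action, obj>thing)" missing from a dictionary that contains "look for(icl>action, agt>human, obj>thing)" and "search(icl>action)" is replaced by the former, which differs by one restriction rather than by its headword.</Paragraph>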
      <Paragraph position="6"> It is in general impossible to solve this problem perfectly and fully automatically, as we know nothing about the document, its context, and its intended usage: FAHQDC (2) is no more possible than FAHQMT on arbitrary texts. We have to rely on necessarily imperfect heuristics.</Paragraph>
      <Paragraph position="7"> However, we can specialize the general French deconverter to produce specialized servers for different tasks and different (target) sublanguages. It is possible to assign priorities not only to various parts of the dictionaries (e.g., specialized vs. general), but also to equivalents of the same UW within a given dictionary. We can then define several user profiles. It is also possible to build a memory of deconverted and possibly post-edited utterances for each specialized French deconversion server.</Paragraph>
      <Paragraph position="8"> After the localization phase, we have to perform the lexical transfer. It would seem natural to do it within Ariane-G5, after converting the graph into a tree. But lexical transfer is context-sensitive, and we want to avoid transferring differently two tree nodes corresponding to one and the same graph node. Each graph node is therefore replaced by a French lexical unit (LU), along with some variables. A lexical unit used in the French dictionary denotes a derivational family (e.g. in English, destroy denotes destroy, destruction, destructible, destructive...; in French, détruire denotes détruire, destruction, destructible, indestructible, destructif, destructeur).</Paragraph>
      <Paragraph position="9"> There may be several possible lexical units for one UW. This happens when there is real synonymy or when different terms are used in different domains to denote the same word sense (3). In that case, we currently choose the lexical unit at random, as we do not have any information on the task the deconverter is used for.</Paragraph>
      <Paragraph position="10"> The same problem also appears because of the strategy used to build the French dictionary. In order to obtain a good coverage from the beginning, we have underspecified the UWs and linked them to different lexical units. This way, we considered a UW as the denotation of a set of word senses in French.</Paragraph>
      <Paragraph position="11"> Hence, we were able to reuse previous dictionaries, and we can use the dictionary even while it is still under development and incomplete. In our first version, we also solve this problem by a random selection of a lexical unit.</Paragraph>
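      <Paragraph> Doing lexical transfer on the graph, before the graph-to-tree conversion, guarantees that tree nodes split from the same graph node receive the same lexical unit. The sketch below illustrates that ordering with plain dicts; the function names and the lexicon contents are hypothetical, and the random choice mirrors the current behaviour described above when several lexical units are possible.

```python
import random


def lexical_transfer(graph_nodes, lexicon, rng=random):
    """Choose one French lexical unit per graph node (not per tree node).

    graph_nodes: dict graph node id -> UW.
    lexicon: dict UW -> list of candidate French lexical units.
    """
    chosen = {}
    for node_id, uw in graph_nodes.items():
        # No information about the task is available: pick a candidate at random.
        chosen[node_id] = rng.choice(lexicon[uw])
    return chosen


def decorate_tree(tree_to_graph, chosen):
    """Give every tree node the LU of the graph node it was created from.

    tree_to_graph: dict tree node id -> graph node id; several tree nodes may
    share one graph node after the graph-to-tree split.
    """
    return {tree_id: chosen[graph_id] for tree_id, graph_id in tree_to_graph.items()}
```

For instance, with a hypothetical lexicon entry {"car(icl>vehicle)": ["voiture", "automobile"]} and a graph node n1 later split into tree nodes t1 and t2, both tree nodes receive whichever lexical unit was drawn once for n1.</Paragraph>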
      <Paragraph position="12">  The subsequent deconversion phases are performed in Ariane-G5. Hence, it is necessary to convert the UNL hypergraph into an Ariane-G5 decorated tree.</Paragraph>
      <Paragraph position="13"> The UNL graph is directed. Each arc is labelled by a semantic relation (agt, obj, ben, con...) and each node is decorated by a UW and a set of features, or is a hypernode. One node is distinguished as the &quot;entry&quot; of the graph. An ARIANE tree is a general (non-binary) tree with decorations on its nodes. Each decoration is a set of variable-value pairs.</Paragraph>
      <Paragraph position="14"> The graph-to-tree conversion algorithm has to maintain the direction and labelling of the graph along with the decoration of the nodes. Our algorithm splits the nodes that are the target of more than one arc, and reverses the direction of as few arcs as possible. An example of such a conversion is shown in figure 2.3.</Paragraph>
      <Paragraph position="16"> Let $\Sigma$ be the set of nodes of $G$, $\Lambda$ the set of labels, $T$ the created tree, and $N$ the set of nodes of $T$.</Paragraph>
      <Paragraph position="17"> The graph $G = \{(a, b, l) \mid a \in \Sigma, b \in \Sigma, l \in \Lambda\}$ is defined as a set of directed labelled arcs. We use an association list $A = \{(n_G, n_T) \mid n_G \in \Sigma, n_T \in N\}$, where we memorize the correspondence between nodes of the graph and nodes of the tree.</Paragraph>
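      <Paragraph> The conversion described in this section can be sketched in Python as follows. Arcs are (a, b, l) triples as in G, and the dict placed plays the role of the association list A, keeping the first tree node created for each graph node. The steps that create the new tree node and remove the processed arc from G are assumptions, since they are not recoverable from the paper's pseudo-code; forward arcs are always tried first, so an arc is reversed only when no forward arc can be attached.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    graph_node: str                 # the graph node this tree node stands for
    label: str                      # relation label ("entry" for the root)
    reversed_arc: bool = False      # True if the arc had to be reversed
    daughters: list = field(default_factory=list)


def graph_to_tree(arcs, entry):
    """Convert a connected UNL graph, given as (a, b, l) arcs, into a tree rooted at entry."""
    remaining = list(arcs)
    root = TreeNode(entry, "entry")
    placed = {entry: root}          # association list A: graph node -> tree node
    while remaining:
        # Prefer an arc whose source is already in the tree: no reversal needed.
        forward = next((arc for arc in remaining if arc[0] in placed), None)
        if forward is not None:
            a, b, l = forward
            new = TreeNode(b, l)
            placed[a].daughters.append(new)
            placed.setdefault(b, new)   # a second arc into b yields a split copy of b
            remaining.remove(forward)
            continue
        # Otherwise attach the source under an already placed target, reversing the arc.
        backward = next((arc for arc in remaining if arc[1] in placed), None)
        if backward is None:
            raise ValueError("non connected graph")
        a, b, l = backward
        new = TreeNode(a, l, reversed_arc=True)
        placed[b].daughters.append(new)
        placed.setdefault(a, new)
        remaining.remove(backward)
    return root
```

On the top-level arcs of the &quot;Reckless drivers drink and drive&quot; example, with :01 as entry, driver becomes a daughter of :01 through agt, and reckless becomes a daughter of driver through a reversed aoj arc.</Paragraph>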
      <Paragraph position="18"> (2) FAHQDC: fully automatic high-quality deconversion. (3) Strictly speaking, the same collection of interlingual word senses (acceptions).</Paragraph>
      <Paragraph position="19"> let $e \in \Sigma$ such that $e$ is the entry of $G$;
$e_T \leftarrow$ new tree-node$(e, \mathit{entry})$;</Paragraph>
      <Paragraph position="21"> if there is $(a, b, l) \in G$ such that $(a, a_T) \in A$ then
    let $a_T \in N$ such that $(a, a_T) \in A$ in add $b_T$ to the daughters of $a_T$;
else if there is $(a, b, l) \in G$ such that $(b, b_T) \in A$ then
    let $b_T \in N$ such that $(b, b_T) \in A$ in add $a_T$ to the daughters of $b_T$;
else exit on error (&quot;non connected graph&quot;);

The purpose of the structural transfer is to transform the tree obtained so far into a Generating Multilevel Abstract (GMA) structure [4].</Paragraph>
      <Paragraph position="22"> In this structure, non-interlingual linguistic levels (syntactic functions, syntagmatic categories...) are underspecified and, if present, are used only as a set of hints for the generation stage.</Paragraph>
    </Section>
    <Section position="3" start_page="771" end_page="771" type="sub_section">
      <SectionTitle>
2.3 Generation
2.3.1 Paraphrase choice
</SectionTitle>
      <Paragraph position="0"> The next phase is in charge of the paraphrase choice. During this phase, decisions are taken regarding the derivation applied to each lexical unit in order to obtain the correct syntagmatic category for each node. The order of appearance and the syntactic functions of each part of the utterance are also decided during this phase.</Paragraph>
      <Paragraph position="1"> The resulting structure is called Unique Multilevel Abstract (UMA) structure.</Paragraph>
      <Paragraph position="2"> The UMA structure still lacks the syntactic sugar used in French to realize the choices made in the previous phase: articles, auxiliaries, and non-connected compounds such as ne...pas, etc. still have to be generated.</Paragraph>
      <Paragraph position="3"> The role of this phase is to create a Unique Multilevel Concrete (UMC) structure. By concrete, we mean that the structure is projective, hence the corresponding French text may be obtained by a standard left-to-right traversal of the leaves and simple morphological and graphemic rules. The result of these phases is a surface French utterance.</Paragraph>
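      <Paragraph> Because the UMC structure is projective, surface text can be read off its leaves from left to right. The sketch below assumes a hypothetical (label, children) tree whose leaves already carry inflected word forms, so morphology is left out, and it applies a single illustrative graphemic rule (elision of je, ne, le, la, de before a vowel), which is far simpler than a real French graphemic component.

```python
# Hypothetical graphemic rule: these words elide before a vowel-initial word.
ELIDABLE = {"je": "j'", "ne": "n'", "le": "l'", "la": "l'", "de": "d'"}
VOWELS = tuple("aeiouéèê")


def leaves(tree):
    """Yield the leaves of a (label, children) tree in left-to-right order."""
    label, children = tree
    if not children:
        yield label
    for child in children:
        yield from leaves(child)


def realize(tree):
    """Read a projective tree's leaves left to right and apply the elision rule."""
    words = list(leaves(tree))
    out = []
    for word, following in zip(words, words[1:] + [""]):
        if word in ELIDABLE and following.startswith(VOWELS):
            out.append(ELIDABLE[word])   # glued to the next word, no space
        else:
            out.append(word + " ")
    return "".join(out).strip()
```

For the tree ("S", [("NP", [("Ronaldo", [])]), ("VP", [("ne", []), ("a", []), ("pas", []), ("marqué", [])])]), realize returns "Ronaldo n'a pas marqué", the discontinuous ne...pas having already been placed in the tree.</Paragraph>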
    </Section>
  </Section>
  <Section position="7" start_page="771" end_page="772" type="metho">
    <SectionTitle>
3 Different uses of the UNL language
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="771" end_page="772" type="sub_section">
      <SectionTitle>
3.1 Hypergraphs vs colored graphs
</SectionTitle>
      <Paragraph position="0"> As presented in section 1.2.3, the syntax of the UNL language is based on the description of a graph, arc by arc. Some of these arcs are &amp;quot;coloured&amp;quot; by a number. This colouring is currently interpreted as hypernodes (nodes containing a graph, rather than a classical UW).</Paragraph>
      <Paragraph position="1"> This interpretation is arbitrary and imposes semantic constraints on a UNL utterance: the subgraph (the set of arcs labeled with the same colour) must be connected, and arcs with different colours cannot be connected to the same node.</Paragraph>
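      <Paragraph> These two constraints can be checked mechanically on a list of coloured arcs. In this illustrative sketch, arcs are assumed to be (colour, source, target) triples with colour "" for the top level, and a hypernode reference such as ":01" is treated as an ordinary node of the graph that contains it, so it does not clash with the nodes inside the hypernode. The function name is hypothetical.

```python
from collections import defaultdict


def is_hypergraph(arcs):
    """Check the constraints the hypernode interpretation puts on coloured arcs.

    arcs: iterable of (colour, source, target); colour "" is the top-level graph.
    """
    nodes_by_colour = defaultdict(set)
    edges_by_colour = defaultdict(list)
    for colour, src, tgt in arcs:
        nodes_by_colour[colour].update((src, tgt))
        edges_by_colour[colour].append((src, tgt))
    # 1. Arcs with different colours must not share a node.
    owner = {}
    for colour, nodes in nodes_by_colour.items():
        for node in nodes:
            if owner.setdefault(node, colour) != colour:
                return False
    # 2. The arcs of each colour must form a connected subgraph.
    for colour, edges in edges_by_colour.items():
        nodes = nodes_by_colour[colour]
        adjacency = defaultdict(set)
        for s, t in edges:
            adjacency[s].add(t)
            adjacency[t].add(s)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for n in adjacency[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        if seen != nodes:
            return False
    return True
```

The arcs of the &quot;Reckless drivers drink and drive&quot; example pass both checks, whereas a colored graph in which one node is touched by arcs of two different colours fails the first one, which is exactly the kind of structure that loosening these constraints would allow.</Paragraph>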
      <Paragraph position="2"> However, when the UNL language is used for a particular kind of application, a different interpretation may be chosen. By adding new semantic constraints to UNL expressions, one may restrict them to trees. Conversely, by loosening these semantic constraints, one may use colored graphs instead of the more restrictive hypergraphs.</Paragraph>
      <Paragraph position="3"> This flexibility of UNL may lead to uses that differ from the computer science point of view (different structures leading to different kinds of methods and applications) as well as from the linguistic point of view (different ways to represent the linguistic content of an utterance). Colored graphs are very useful to represent some utterances like &quot;Christian pulls Gilles' leg&quot;. Using a colored graph, one can represent this utterance with the graph shown in figure 3.1, which is not a hypergraph.</Paragraph>
      <Paragraph position="4"> Figure 3.1: ... however be represented in UNL language.
When using normal hypergraphs, one could only represent the utterance as shown in figure 3.2.</Paragraph>
      <Paragraph position="5"> [Figure 3.2: hypergraph built around the UW &quot;make fun of&quot;, with an agt arc]</Paragraph>
      <Paragraph position="7"> Hence, while keeping backward compatibility with other UNL-based systems, one may develop an entirely new and more powerful kind of application.</Paragraph>
    </Section>
    <Section position="2" start_page="772" end_page="772" type="sub_section">
      <SectionTitle>
3.2 Linguistic vs semantic pivot
</SectionTitle>
      <Paragraph position="0"> The UNL language defines the interface structure to be used by applications (either a hypergraph or a colored graph). However, it does not restrict the choice of the data to be encoded.</Paragraph>
      <Paragraph position="1"> Since the beginning, two possible and valid approaches have been mentioned. During the kickoff meeting of the UNL project, Pr. Tsujii promoted the use of UNL as a linguistic pivot. With this approach, a UNL utterance should be the encoding of the deep structure of a valid English utterance that reflects the meaning of the source utterance. With this approach, the German sentence &quot;Hans schwimmt sehr gern&quot; (&quot;Hans likes swimming very much&quot;) should be encoded as shown in figure 3.3.</Paragraph>
      <Paragraph position="2"> Figure 3.3: A linguistic encoding of &quot;Hans schwimmt sehr gern&quot; (graph not recoverable; legible nodes: like.@entry, Hans, man, swim, much; legible arcs: agt, obj).</Paragraph>
      <Paragraph position="3"> On the other hand, Hiroshi Uchida promotes the use of UNL as a semantic pivot. With this second approach, the same sentence should be encoded as shown in figure 3.4.</Paragraph>
      <Paragraph position="4"> Each approach has its advantages and drawbacks, and the choice between them can only be made with an application in mind. The linguistic approach leads to a better quality in the produced results and is an answer to highly multilingual machine translation projects. With this approach, the UNL graphs can only be produced by people mastering English or by (partially) automatic enconverters.</Paragraph>
      <Paragraph position="5"> With the semantic approach, subtle differences in source utterances (indefiniteness, reflexivity...) cannot be expressed, leading to a lower quality. However, with this approach, UNL encoding is much more natural and easier to perform for a non-English speaker (as the semantic relations and UWs are expressed at the source level). Hence, this approach is suited to multilingual casual communication, where users may express themselves by directly encoding UNL expressions with an appropriate editing tool.</Paragraph>
      <Paragraph position="6"> Conclusion
Working on the French deconverter has led to an interesting architecture where deconversion, in principle a &quot;generation from interlingua&quot;, is implemented as transfer + generation from an abstract structure (UNL hypergraph) produced from a NL utterance. The idea of using UNL for directly creating documents receives here an indirect and perhaps paradoxical support, although it is clear that considerable progress and innovative interface design will be needed to make it practical.</Paragraph>
      <Paragraph position="7"> However, the UNL language proves flexible enough to be used by very different projects. Moreover, with deconverters currently developed for 14 languages, joining the UNL project is really attractive. Let's hope that this effort will help break down language barriers.</Paragraph>
    </Section>
  </Section>
</Paper>