<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1029">
  <Title>References</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Compound Nouns
</SectionTitle>
    <Paragraph position="0"> Compound words pose well-known problems for linguistic description in general, and some additional ones for natural language processing in particular: Identification: how can compounds be distinguished from other words and phrases? Segmentation: what are the components of a compound? In many languages, including German, orthographic convention is such that compounds are written as single units. (Footnote 1: Elements of German compounds may however be separated by the so-called &quot;Fugenzeichen&quot; (s, en, etc.).) Disambiguation: what is the correct analysis of a compound? On the widespread assumption that compounds have a recursive binary structure, any occurrences with more than two basic elements will admit multiple analyses, 2 from which, normally, a single candidate must be selected.</Paragraph>
    <Paragraph position="1"> Interpretation: how can the meaning of a compound word be derived from the meanings of its parts? For many purposes, there is little point in performing any of the other tasks unless this is feasible.</Paragraph>
    <Paragraph position="2"> It is clear that solutions to these four problems may be closely interrelated; ill-formed interpretations may permit unwanted analyses to be filtered out, the correct analysis will constrain possible segmentations, and so on.</Paragraph>
    <Paragraph position="3"> In what follows, we outline an approach to the treatment of compounds within a specific limited application, the automatic translation of Swiss avalanche warning bulletins, which exploits the nature of the texts involved in order to translate compounds efficiently and correctly. We first give a brief overview of the project, paying special attention to phenomena related to compounds and their translation. We then describe ELU, 3 the software employed for translation, and discuss a number of the different treatments that it permits, motivating our choice of analysis with illustrations of their weak and strong points.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="209" type="metho">
    <SectionTitle>
2 Compounds in Avalanche Bulletins
</SectionTitle>
    <Paragraph position="0"> The avalanche bulletins are issued by IFENA (the Federal Institute for the Study of Snow and Avalanches) 4 a number of times a week during the winter season, the exact frequency of their appearance depending on weather conditions. Bulletins are prepared in German, and are translated into the other official Swiss languages, French and Italian, before publication. The current state of affairs, in which the source language is exclusively German, may change in future, and for this reason it has been decided to implement a reversible system (Bouillon and Boesefeldt, 1991a; Bouillon and Boesefeldt, 1991b).</Paragraph>
    <Paragraph position="1"> Avalanche bulletins describe a fixed and specifiable semantic domain, and employ a language restricted in both vocabulary and syntactic variety. They contain a large number of compounds, with differing grammatical  properties. It is the interpretation of compounds, together with the closely related issue of structural disambiguation, which most researchers have addressed (Finin, 1980; Isabelle, 1984; Sparck Jones, 1983). In an application such as avalanche bulletin translation, however, the necessity for a deep or sophisticated analysis of compound meanings is not obvious. On the one hand, the number of compounds appearing in the sub-language is highly restricted (approximately 400); they may therefore be listed exhaustively, and the question of disambiguation does not arise. And on the other, the interpretation of a compound is given by its established translation - the meaning of Lawinengefahr ('danger of avalanches') is just the information necessary to produce the required target language expression danger d'avalanches (or pericolo di valanghe, etc.).</Paragraph>
    <Paragraph position="2"> In the current system, the main difficulty thus consists of establishing a systematic and general relation between single words in German and complex nominal structures in French, without defining the semantic and pragmatic roles of the different parts of the compound.</Paragraph>
    <Section position="1" start_page="209" end_page="209" type="sub_section">
      <SectionTitle>
2.1 Variation in German Compounds
</SectionTitle>
      <Paragraph position="0"> A purely categorial analysis of the German compound nouns contained in the avalanche warning bulletins reveals the following internal combinations of nouns, adjectives, verbs and prepositions:  'explosion above the snow' The 'head' of a compound is its final constituent, which determines its properties (in the examples above, -schnee, -gefahr, -zuwachs, etc.), and may itself be a compound (-südhang, for example). Constituents of a compound are not inflected, although, naturally, the compound as a whole may be.</Paragraph>
      <Paragraph position="1"> There is a very strong tendency to restrict the length of compounds to three elementary constituents; of approximately 400 compound nouns in the bulletins issued over the past few years, only one (Naßschneelawinengefahr: 'danger of wet-snow avalanches') contains four. (Footnote 5: This distribution can also be found in other domains (Boesefeldt, 1989) as well as in general language (Fleischer, 1982).)</Paragraph>
      <Paragraph position="2"> Complex meanings that might be expressed by a fourth element within a compound generally occur in the form of a noun taking the compound as complement (e.g. Abgang von Naßschneelawinen, literally: 'going-down of wet-snow avalanches'), or conventional adjectival modification (e.g. sehr grosse allgemeine Schneebrettgefahr 'very great general danger of snow patches').</Paragraph>
    </Section>
    <Section position="2" start_page="209" end_page="209" type="sub_section">
      <SectionTitle>
2.2 The French Translation
</SectionTitle>
      <Paragraph position="0"> Compound nouns in the German original texts are usually translated into French as complex nominal phrases containing adjectival or prepositional subparts (e.g. Neuschnee: neige fraîche; Schneebrettgefahr: danger de plaques de neige). The details of translation differ in a number of aspects. First of all, the French phrases do not always contain the same syntactic categories as the corresponding German compounds. The noun-noun compound Oberflächenschicht ('surface layer'), for instance, is generally translated by the sequence of noun and postnominal adjective couches superficielles.</Paragraph>
      <Paragraph position="3"> Secondly, prepositions within French translations of German compounds vary from case to case: Schneebrettgefahr is translated as danger de plaques de neige, Schneeraster as grille à neige and Lawinenforschung as recherche sur les avalanches. The use of articles within these PPs also varies: température de la neige ('snow temperature') has the definite article, while quantité de neige ('snow quantity') has none. Even though the use of the different prepositions and articles might contain clues concerning the possible semantic and pragmatic relation between the different elements of the compound (à denoting an instrument, an omitted article an abstract or partitive, etc.), the size of the corpus (only about 1000 words including nouns, verbs, articles etc.) makes it impossible to draw any general conclusion which could be implemented.</Paragraph>
      <Paragraph position="6"> Finally, certain parts of German compounds are systematically omitted from the corresponding French phrase: Schneebrettanriß ('rupture of a snow patch') translates as rupture de plaque, where the word neige does not appear.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="209" end_page="213" type="metho">
    <SectionTitle>
3 The treatment of compounds with ELU
</SectionTitle>
    <Paragraph position="0"> Since 1990, the ELU system has been used at ISSCO for the implementation of a machine translation system for Swiss avalanche warning bulletins (Boesefeldt and Bouillon, 1991; Bouillon and Boesefeldt, 1991a; Bouillon and Boesefeldt, 1991b). ELU is a unification-based grammar development environment in the style of PATR-II (Shieber, 1986), designed especially for experimentation with machine translation.</Paragraph>
    <Paragraph position="1"> ELU is composed of three modules: a parser, a generator and a reversible transfer module. Since it is the last of these that chiefly concerns us here, we shall say nothing about the parser or generator. ELU supports various methods of translation. By writing grammars that share meaning representations, one may implement an interlingua-based system; the result of analysing a source-language text is used directly as the basis for generating the target-language text, and transfer is unnecessary. At the other extreme, ELU transfer rules are capable of performing arbitrary transformations on an input structure, and thus permit whatever variety of transfer the user desires (see footnote 6). A transfer-based approach to translation has been selected for the current project.</Paragraph>
    <Paragraph position="2"> Transfer rules are written which state binary relations over the representations of texts from the source and target languages so as to associate the analysis of one language with the synthesis of another. There are two kinds of transfer rule, treating atomic and complex feature structures. For example, the following two FSs, representing the German phrase die Gefahr and its French equivalent le danger,
      [PRED gefahr, DETYPE definite]
      [PRED danger, DETYPE definite]
    are related by means of the following four rules:
      :TA: gefahr danger
      :TA: definite definite
      :T: pred
      :L1: &lt;* pred&gt; = X
      :L2: &lt;* pred&gt; = Y
      :X: X = Y
      :T: detype
      :L1: &lt;* detype&gt; = X
      :L2: &lt;* detype&gt; = Y
      :X: X = Y
    The transfer rules establish a relation between the pairs of atoms gefahr - danger and definite - definite as the values of the attributes PRED and DETYPE in the two representations. The rules pred and detype abstract away from the particular values which their paths may take; a statement of the form &quot;:X: X = Y&quot; indicates that the success of the rule in which it appears is conditional on the success of other rules involving whatever structures are bound to X and Y during the transfer process. Transfer of a FS thus proceeds recursively, beginning with the root, and, provided that no failure occurs, terminating with the atoms which form the 'leaves' of the FS. For further details, see Estival et al. (1990) and Russell et al. (1991).</Paragraph>
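    <Paragraph> The recursive transfer just described can be pictured with a small Python sketch. It is an illustration under stated assumptions, not ELU code: feature structures are modelled as nested dictionaries, transfer runs in one direction only (ELU's rules are reversible), and the names ATOMIC_RULES, PATH_RULES and transfer are hypothetical.

```python
# Hypothetical model: a feature structure (FS) is a dict; atoms are strings.
# ATOMIC_RULES mirrors the :TA: rules, PATH_RULES the :T: rules (pred, detype).
ATOMIC_RULES = {("gefahr", "danger"), ("definite", "definite")}
PATH_RULES = ["pred", "detype"]


def transfer(source):
    """Return the target FS for `source`, or None if any rule fails."""
    if isinstance(source, str):
        # Atomic case: look for a :TA: pair whose first member is the atom.
        for left, right in sorted(ATOMIC_RULES):
            if left == source:
                return right
        return None
    target = {}
    for attribute, value in source.items():
        if attribute not in PATH_RULES:
            return None  # no :T: rule covers this path
        # The ":X: X = Y" condition: the sub-structure must transfer too.
        sub_target = transfer(value)
        if sub_target is None:
            return None
        target[attribute] = sub_target
    return target


german = {"pred": "gefahr", "detype": "definite"}
print(transfer(german))
```

    A structure containing an atom or path with no matching rule yields None, which corresponds to the failure case mentioned in the text.</Paragraph>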
    <Paragraph position="3"> Two broad classes of solutions suggest themselves for the treatment of compounds in ELU: one, interlingua-oriented (cf. Section 3.2), in which the representations in the two languages are broadly similar, as in the Gefahr - danger example, and the other exploiting more fully the transfer mechanism (cf. Section 3.1). In the former case, it is possible to pass from a German compound to a complex French nominal phrase by employing general transfer rules of the kind required by other aspects of the translation task, while in the latter, different representations are related by complex transfer rules specific to the treatment of compounds.</Paragraph>
    <Section position="1" start_page="210" end_page="211" type="sub_section">
      <SectionTitle>
3.1 The transfer-based approach
</SectionTitle>
      <Paragraph position="0"> At first view, the transfer-based approach seems to be the more natural.</Paragraph>
      <Paragraph position="1"> Footnote 6: 'Codescription' analyses of the type proposed by Kaplan et al. (1989) are also possible.</Paragraph>
      <Paragraph position="2"> In German, compounds can be introduced in the lexicon as simple words and treated in the same way by the grammar. The NP eine Schneebrettgefahr, for example, will be represented by a feature structure similar to that provided for the simple eine Gefahr. The semantics of adjectival modifiers is encoded as elements in a list, since the number of these may vary. Lists are indicated in attribute-value diagrams by '(...)' and in transfer rules by '[...]'; those shown here are empty:  The French grammar constructs the translation of a German compound from individual words, the head of the phrase being lexically specified for its complement and/or modifiers. For example, danger subcategorizes for a PP headed by the preposition de, and containing a NP which lacks an article and whose head noun is one of a class that includes plaques. The representation constructed for the French equivalent danger de plaques de neige will be considerably more complex than the German, with sub-FSs relating to the head (danger) and complement, which itself will be a complex structure with information concerning a head (plaques) and complement (neige). Since the prepositions in these constructions are determined by the lexical properties of the nouns, they convey no information in these cases, and do not appear in the representation; the French grammar accounts for their correct distribution.</Paragraph>
      <Paragraph position="4"> It presents a number of disadvantages. In the first place, certain generalizations which we consider important cannot be taken into account. For example, an element may appear in a number of different compounds, and the same information has to be repeated in each case.</Paragraph>
      <Paragraph position="5"> For every compound including Gebiet ('region'), for example, we have to repeat the fact that the noun Gebiet translates as région:</Paragraph>
      <Paragraph position="7"> Moreover, as this term also exists as a simple word, we also have to add a transfer rule to translate the simple word:
        :TA: gebiet region
      No account is taken of the fact that the elements inside the compounds correspond to isolated words and translate in the same way; moreover, a larger number of transfer rules is required, which, in turn, decreases the efficiency of the system.</Paragraph>
      <Paragraph position="8"> The use of this technique leads to an additional problem. It is possible to add an adjective to a compound which already contains an adjective, e.g. ganze Alpensüdhang ('whole southern slope of the Alps'). This phrase has the following representation:</Paragraph>
      <Paragraph position="10"> The representation of the equivalent French expression tout le versant sud des Alpes, however, is as follows:</Paragraph>
      <Paragraph position="12"> For the transfer between the two languages it is necessary to establish a relation between the elements inside the lists. We cannot be sure, however, that the adjective which is a part of the German compound will always be at the same place inside the list in French. For this reason, it was decided to distinguish within the French description between adjectives which compose in German and adjectives which do not. This distinction, which might seem undesirable at first sight, has proved satisfactory and generalizable for the sublanguage we treat, and could be established quite easily.</Paragraph>
      <Paragraph position="13"> An adjective cannot be found inside a German compound if it is part of a coordination, if it is modified by an adverb, if it has a particular morphological suffix, if it is a participle which is used adjectivally, if it has been given a certain semantic type (see footnote 7; adjectives qualifying snow always compose, adjectives expressing a degree of danger never compose), if there already is another adjective inside the compound, or if the compound contains more than three elements. In all these cases it will therefore become the value of the path MOD : NOM and not the value of the new path MOD : N_COMP.</Paragraph>
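      <Paragraph> The conditions listed above amount to a decision between the two paths, which the following Python sketch makes explicit. It is only one possible reading of the text: every field and parameter name is hypothetical, the semantic-type labels are invented for illustration, and all listed conditions are treated as equally blocking.

```python
from dataclasses import dataclass


@dataclass
class Adjective:
    # All fields are hypothetical stand-ins for the conditions in the text.
    coordinated: bool = False
    adverb_modified: bool = False
    blocking_suffix: bool = False
    participle: bool = False
    semantic_type: str = "snow"  # assumed labels: "snow" or "danger_degree"


def mod_path(adj, compound_has_adjective=False, compound_elements=2):
    """Return 'N_COMP' if the adjective may compose, otherwise 'NOM'."""
    blocked = (
        adj.coordinated
        or adj.adverb_modified
        or adj.blocking_suffix
        or adj.participle
        or adj.semantic_type == "danger_degree"
        or compound_has_adjective
        or compound_elements > 3
    )
    return "NOM" if blocked else "N_COMP"


# An adjective added to a compound that already contains one cannot compose.
print(mod_path(Adjective(), compound_has_adjective=True))
```

      Under this reading, an unmarked snow-type adjective is assigned to MOD : N_COMP, while an adjective joining a compound that already contains an adjective is assigned to MOD : NOM.</Paragraph>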
      <Paragraph position="14"> The representation of the nominal phrase tout le versant sud des Alpes is then modified as follows:</Paragraph>
      <Paragraph position="16"> The attribute MOD has thus been subdivided into two parts, MOD : NOM for adjectives like tout that do not compose and MOD : N_COMP for adjectives like sud that do compose.</Paragraph>
      <Paragraph position="17"> The transfer rules for the transfer between ganze Alpensüdhang and the French tout le versant sud des Alpes can then be simplified and made more general:  This technique has been successfully implemented and was the first step towards the second solution to be presented here, the interlingua-oriented solution.</Paragraph>
    </Section>
    <Section position="2" start_page="211" end_page="213" type="sub_section">
      <SectionTitle>
3.2 An interlingua-oriented approach
</SectionTitle>
      <Paragraph position="0"> In order to obtain the same representation in German and in French, the German compounds have to be assigned in the lexicon a complex representation in which the word is semantically decomposed into head and complement(s). Moreover, certain information concerning the internal structure of the French compounds has to be added to the German representation during the analysis. This process can be done quite easily in ELU by means of macros (see footnote 8). The desired structures need only be defined once, the appropriate values being instantiated in place of the variables.</Paragraph>
      <Paragraph position="1"> Footnote 7: Lexical items are typed according to the contexts in which they may appear.</Paragraph>
      <Paragraph position="2"> Footnote 8: ELU macros resemble a more powerful version of PATR-II 'templates', permitting the use of arguments, and multiple or recursive definitions.</Paragraph>
      <Paragraph position="3">  The NP Alpensüdhang will thus receive the same representation as its French equivalent:  The paths ARGS : MOD : NOM and ARGS : DETYPE in the German representation have been added because they are added by grammar rules constituting French NPs and PPs.</Paragraph>
      <Paragraph position="4"> This technique makes it possible to simplify the transfer rules, which can then be used for the transfer of not only compounds, but also other items:
        :TA: alpen alpes
        :TA: hang versant
        :TA: sud sud
        :TA: definite definite
      The semantic decomposition of the German compounds not only decreases the number of transfer rules, but also makes the transfer more general and coherent, as the different elements of a compound are translated as simple words.</Paragraph>
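      <Paragraph> The benefit of decomposition can be illustrated with a short Python sketch in which one table of atomic rules serves both a compound and a simple word. The feature-structure layout is an assumption pieced together from the path names mentioned in the text (MOD : N_COMP, MOD : NOM, ARGS : DETYPE), and the helper names compound and transfer are hypothetical; this is not ELU macro syntax.

```python
# Atomic rules taken from the text; one table serves simple words and compounds.
ATOMIC = {"alpen": "alpes", "hang": "versant", "sud": "sud", "definite": "definite"}


def compound(head, complement, n_comp):
    """Hypothetical macro: build a decomposed FS for a German compound."""
    return {
        "pred": head,
        "mod": {"n_comp": [n_comp], "nom": []},
        "args": {"pred": complement, "detype": "definite", "mod": {"nom": []}},
    }


def transfer(fs):
    """Translate every atom with ATOMIC; the structure is copied unchanged."""
    if isinstance(fs, str):
        return ATOMIC[fs]  # raises KeyError for an atom without a rule
    if isinstance(fs, list):
        return [transfer(item) for item in fs]
    return {attribute: transfer(value) for attribute, value in fs.items()}


alpensuedhang = compound("hang", "alpen", "sud")
simple_hang = {"pred": "hang"}
print(transfer(alpensuedhang)["pred"], transfer(simple_hang)["pred"])
```

      Because the compound is stored in decomposed form, the rule relating hang and versant is stated once and applies both inside the compound and to the free-standing noun.</Paragraph>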
      <Paragraph position="5"> The first problem of this technique consists of deciding what complex semantic representation to give German compounds. The decomposition could be performed according to the categories in either German or the target language. A disadvantage of the former is that, for the reasons given in section 2.2, transfer rules for irregular cases must be added. The latter approach avoids this difficulty by assigning the noun-noun compound Sonnenlagen ('sunny places'), for example, a representation in which the contribution of the element sonnen- and -lage are classified as modifier and predicate respectively, rather than predicate and argument:  But there is another reason to treat the irregularities between German and French in the German lexicon. Two different words can also be used as synonyms inside and outside of a compound (e.g. Grisons is translated by Graubünden in German, but Grisons nord is translated by Nordbünden and not by Nordgraubünden). Nordbünden has therefore been assigned a representation suitable for the generation of the French nominal phrase, nord des Grisons:  This representation for the German compound, however, does not seem very satisfactory because it would be necessary to write a special atomic transfer rule in addition to the transfer rule for Graubünden:
        :TA: bunden grisons
        :TA: graubunden grisons
      Moreover, this would introduce an unwanted ambiguity when translating from French into German, since the French Grisons would now be related to both Bünden and Graubünden. Nordbünden is therefore assigned a representation in which nord and Graubünden are related as shown here:</Paragraph>
      <Paragraph position="7"> Changes to lexical specifications can be made very easily by altering the value of the relevant path in the lexicon entry, without affecting the surface form of the lexical item.</Paragraph>
      <Paragraph position="8"> Similarly, certain parts of the German compound have to be eliminated from the representation in the case of French nominal phrases in which certain parts of the German compound have been excluded (cf. section 2.2).</Paragraph>
      <Paragraph position="9"> This technique, however, cannot be applied if a compound without the element that is missing in French exists in German because the same French representation would then lead to the generation of two different nominal phrases in German.</Paragraph>
      <Paragraph position="10"> In a number of cases the content of a compound can also be expressed by a complex nominal phrase with only slight stylistic differences (Fleischer, 1982: p. 2021). The semantic representation used for compounds, however, is also used to express complex NPs in German; these constructions are also translated into French nominal phrases. In contrast to the case of adjectives within compounds, which are assigned a special path in the representation (MOD : N_COMP), a complex NP in German, e.g. Abgang von Lawinen ('going-down of avalanches') will get the following representation:  which is the same as the representation for the compound Lawinenabgang, containing exactly the same information. In order to avoid over-generation resulting from the same representation being used for two different syntactic constructions, the possible syntactic structures in German have been restricted.</Paragraph>
      <Paragraph position="11"> Analysis of the corpus of avalanche bulletin texts brought to light two types of case in which the same representation could be obtained for a compound and a complex NP. In the first, a fourth element either appears with a compound in a complex NP or has been included as an element of the compound itself. As both constructions contain exactly the same information, it was decided to eliminate the second possibility, and allow no more than three elements within a compound.</Paragraph>
      <Paragraph position="12"> For example, the word Naßschneelawinengefahr ('danger of wet snow avalanches') which is sometimes used instead of Gefahr von Naßschneelawinen does not appear in the lexicon, and the phrasal version is used instead.</Paragraph>
      <Paragraph position="13"> It was also observed that, according to the degree of lexicalisation, nouns originally derived from a verb such as Abgang can form a part of a compound (e.g. Neuschneeablagerung - 'deposit of new snow') as well as a part of a nominal phrase (e.g. Verfestigung der Schneedecke - 'consolidation of the snow cover'). As no general syntactic rule could be found to deal with this phenomenon, it was decided to treat it on a case-by-case basis according to the occurrence of derived nouns in the avalanche bulletins.</Paragraph>
      <Paragraph position="15"> The restrictions introduced into the German grammar, which are due to general language tendencies as well as to the language used in the avalanche bulletins, thus make it possible to avoid over-generation for German compounds without introducing an additional path for complex NPs. In the case of the sublanguage we are treating, semantic decomposition therefore does not complicate the transfer, as it is carried out according to the requirements of the target language. If phenomena of another language which might be added later contradict the current decomposition, specific transfer rules can be added to the system.</Paragraph>
    </Section>
  </Section>
</Paper>