<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2081">
  <Title>The Automatic Creation of Lexical Entries for a Multilingual MT System</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ULTRA's Lexicons
</SectionTitle>
    <Paragraph position="0"> There are two types of entries related to the specification of a lexical item in the ULTRA system: those for intermediate representation (IR) word sense tokens, and those for the words of the individual languages.</Paragraph>
    <Paragraph position="1"> Currently, there are eight IR word sense categories: entities (often corresponding to nouns), relations (often corresponding to verbs and adjectives), entity specifiers (often corresponding to determiners), relation specifiers (often corresponding to auxiliaries), case relations (often corresponding to prepositions), proposition specifiers (often corresponding to complementizers), proposition modifiers (often corresponding to sentential adverbials), and conjunctions. Each category is associated with a special set of constraints, ranging in number from one for sentential adverbs to nine for relations. The number of lexical categories in the individual language lexicons varies from eight to fourteen. There is no simple correspondence between the language-particular lexical categories and the IR categories, although the gross relationships stated above appear to hold.</Paragraph>
    <!-- Running footer: Actes de COLING-92, Nantes, 23-28 août 1992 / Proc. of COLING-92, Nantes, Aug. 23-28, 1992 -->
    <Paragraph position="2"> All entries take the general form of simple Prolog unit clauses, as in (12): (12) category(Form, F1, F2, ...).</Paragraph>
    <Paragraph position="3"> where F1, F2, and so on are constraints. For language-particular entries, these are generally syntactic constraints associated with an orthographic form, Form, such as the gender of a noun, whether a verb is reflexive, and so on.</Paragraph>
    <Paragraph position="4"> For example, (13) is a simplified and readable version of a Spanish entry for the noun banco.</Paragraph>
    <Paragraph position="5"> (13) noun(banco, thirdsingular, masculine, bank4_1).</Paragraph>
    <Paragraph position="6"> Similarly, (14) is a Spanish entry for the verb ingreso: (14) verb(ingreso, thirdsingular, finite, past, simple, indicative, active, deposit1_3).</Paragraph>
    <Paragraph position="7"> The final argument represents the IR word sense the Spanish form is used to express. This sense token is associated with a sense definition in LDOCE and is used to index the corresponding IR entry.</Paragraph>
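The unit-clause form in (12) is easy to mirror in a small data structure. The following Python sketch is illustrative only and is not part of ULTRA: the `Entry` class and its `to_prolog` method are hypothetical names. It renders entries (13) and (14) as Prolog facts, assuming every argument is already a valid unquoted Prolog atom.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical container for one lexical entry of the form in (12)."""
    category: str                  # e.g. noun, verb
    form: str                      # orthographic form
    constraints: Tuple[str, ...]   # F1, F2, ...; for language entries the last is the IR sense token

    def to_prolog(self) -> str:
        # Assumes every argument is already a plain Prolog atom (no quoting needed).
        args = ", ".join((self.form,) + self.constraints)
        return f"{self.category}({args})."


banco = Entry("noun", "banco", ("thirdsingular", "masculine", "bank4_1"))
ingreso = Entry("verb", "ingreso", ("thirdsingular", "finite", "past", "simple",
                                    "indicative", "active", "deposit1_3"))

print(banco.to_prolog())
print(ingreso.to_prolog())
```

Keeping the sense token as the last constraint matches the convention described above and makes it trivial to retrieve with `entry.constraints[-1]`.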
    <Paragraph position="8"> For IR entries, the features F1, F2, and so on correspond to universal semantic and pragmatic constraints on the word sense, Form, such as the classification of an entity as countable or not, the semantic case structure of a relation, and so on. For example, the IR entry for bank4_1 would look something like: (15) entity(bank4_1, class, countable, institution, abstract_object, economics_banking). while the IR entry for deposit1_3 would look like: (16) relation(deposit1_3, dynamic, placing, agent, patient, human, amount, human, abstract_object, economics_banking).</Paragraph>
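The link between the two kinds of entries can be pictured as a lookup keyed on the sense token. The sketch below is a minimal Python illustration under that reading, not ULTRA's implementation: the dictionary layout and the function name `ir_entry_for` are assumptions, and the data are taken from examples (13) to (16).

```python
# IR entries from (15) and (16), keyed by sense token (hypothetical layout).
IR_LEXICON = {
    "bank4_1": ("entity", ("class", "countable", "institution",
                           "abstract_object", "economics_banking")),
    "deposit1_3": ("relation", ("dynamic", "placing", "agent", "patient", "human",
                                "amount", "human", "abstract_object",
                                "economics_banking")),
}

# Spanish entries from (13) and (14): (category, form, constraints).
SPANISH_LEXICON = [
    ("noun", "banco", ("thirdsingular", "masculine", "bank4_1")),
    ("verb", "ingreso", ("thirdsingular", "finite", "past", "simple",
                         "indicative", "active", "deposit1_3")),
]


def ir_entry_for(language_entry):
    """Return (sense_token, ir_category, ir_constraints) for a language entry.

    The final constraint of a language-particular entry is its IR sense token.
    """
    _category, _form, constraints = language_entry
    sense = constraints[-1]
    ir_category, ir_constraints = IR_LEXICON[sense]
    return sense, ir_category, ir_constraints


for entry in SPANISH_LEXICON:
    sense, ir_category, ir_constraints = ir_entry_for(entry)
    print(entry[1], sense, ir_category, len(ir_constraints))
```

In this toy data the relation entry carries nine constraints and the entity entry five, which is consistent with the range of one to nine constraints per IR category stated earlier.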
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3. The Automatic Construction of Lexical Items
</SectionTitle>
      <Paragraph position="0"> The work on automating lexical entry has drawn upon extensive research at the Computing Research Laboratory on deriving semantic structures automatically from large machine-readable dictionaries [Slator, 1988; Wilks &amp; Slator, 1989; Guthrie et al., 1990]. Much of the core IR lexicon has been derived from the 72,000 word senses in LDOCE. Codings from the dictionary for such properties as semantic category, semantic preferences, and so on have been used, either directly or indirectly, to generate partial specifications of some 10,000 IR tokens for the system.</Paragraph>
      <Paragraph position="1"> The partially automated lexical entry process proceeds in three steps: 1) given a sense in LDOCE, an entry is constructed by automatically extracting information and formatting it as a standardized data structure; 2) any information still unspecified in that structure is provided interactively; and 3) the fully specified data structure is automatically mapped to the corresponding Prolog facts. Step 3) is very straightforward and will not be described here. Below we give a short description of LDOCE and then discuss the techniques we have used to accomplish steps 1) and 2).</Paragraph>
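The three steps can be sketched as a small pipeline. This Python sketch is an illustration built on assumptions rather than the authors' code: the slot names in `ENTITY_SLOTS`, the choice of which slots arrive pre-filled, and the `ask` callback standing in for the interactive step are all hypothetical. Only the shape of the resulting fact follows (15).

```python
# Hypothetical slot names for an entity entry; the paper does not name them.
ENTITY_SLOTS = ("kind", "countability", "semantic_class", "object_type", "domain")


def extract_partial(sense_token, dictionary_codes):
    """Step 1: build a standardized record; slots not found stay None."""
    record = {"sense": sense_token}
    for slot in ENTITY_SLOTS:
        record[slot] = dictionary_codes.get(slot)
    return record


def complete(record, ask):
    """Step 2: fill each unspecified slot by calling ask(slot).

    ask stands in for the interactive prompt.
    """
    filled = dict(record)
    for slot in ENTITY_SLOTS:
        if filled[slot] is None:
            filled[slot] = ask(slot)
    return filled


def to_prolog_fact(record):
    """Step 3: map the fully specified record to a Prolog unit clause."""
    args = [record["sense"]] + [record[slot] for slot in ENTITY_SLOTS]
    if any(arg is None for arg in args):
        raise ValueError("record is not fully specified")
    return "entity(" + ", ".join(args) + ")."


# Assumed example: three slots come from dictionary codes, two from the user.
codes = {"countability": "countable", "semantic_class": "institution",
         "domain": "economics_banking"}
answers = {"kind": "class", "object_type": "abstract_object"}

partial = extract_partial("bank4_1", codes)
full = complete(partial, answers.__getitem__)
print(to_prolog_fact(full))
```

Representing unspecified information as `None` keeps the boundary between steps 1) and 2) explicit: step 3) refuses to emit a fact until every slot has a value.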
    </Section>
  </Section>
</Paper>