<?xml version="1.0" standalone="yes"?>
<Paper uid="W91-0216">
  <Title>For the Lexicon That Has Everything</Title>
  <Section position="3" start_page="181" end_page="182" type="metho">
    <SectionTitle>
3 Entries for Phrases
</SectionTitle>
    <Paragraph position="0"> We have been concerned for several years with the design of entries for phrases; it seems apparent that we need to record the same kind of information for phrases as for single word entries and that they are involved in the same lexical relations as other words and more besides \[Markowitz et al. 1988; Ahlswede et al. 1988\]. Li and Markowitz are concentrating on questions about phrasal verbs. Problems about the kinds of constructions that these verbs take part in have often been discussed in the literature but not resolved. Markowitz has devised several series of examples that we are trying out on every passerby who happens to be a native speaker of English. The data collected so far is chaotic; it suggests that the explanations in the literature are over-simplified. CED contains many thousands of phrasal main entries and many more phrases appear as runons in other entries. We are trying to design programs to translate these phrasal entries into entries in the lexical database.</Paragraph>
  </Section>
  <Section position="4" start_page="182" end_page="182" type="metho">
    <SectionTitle>
4 Arguments for Verbs
</SectionTitle>
    <Paragraph position="0"> Information about appropriate arguments for verbs is an obvious need. We are building a table that indicates for each sense of the verb what cases it takes, how those cases are syntactically realized (as subject, object, or object of a preposition), whether it is obligaory or not, and what are the selection restrictions on the fillers of those case slots. Joanne Dardaine wrote a program to build skeleton entries for the verbs in the Brandeis Verb Lexicon \[Grimshaw and Jackendoff, 1985\]. Then we sit around and argue about additional examples, beginning with verbs that we are using in text generation in a tutoring system for cardiovascular physiology \[Zhang et hi. 1990\] and in an explanation subsystem for an expert system for stroke \[Lee and Evens 1991\]. Grimshaw's new book \[1990\] on argument patterns has been of the greatest help. Given the theoretical disagreements between Fillmore \[1970\], Bresnan \[1982\], and Grimshaw \[1990\], it is not possible to come up with an ideal solution. When in doubt we try to make the finest distinctions we can, in the belief that it will be easier for others to clump our categories together than to divide them further.</Paragraph>
    <Paragraph position="1"> Clearly much of what we are doing for verbs needs to be done for adjectives and adverbs. Much of the necessary research for adverbs has been carried out by Householder's group \[1965\] and by Sven Jacobson and published in very detailed and useful forms \[1964, 1978\]. Jacobson has generously given permission for us &amp;quot;to include this work in our database. Sumali Pin-Ngern Conlon is using the superb computing facilities of the University of Mississippi, where she is now a faculty member, to put this material into machine readable form and to combine it with the information from the Indiana Adverb Lists \[Householder et hi. 1965\]. We are trying to locate and understand more of the research on adjectives such as the work of Ljung at Goteborg, before we start to enhance our adjective tables appropriately.</Paragraph>
  </Section>
  <Section position="5" start_page="182" end_page="182" type="metho">
    <SectionTitle>
5 Sentential Complements
</SectionTitle>
    <Paragraph position="0"> We have split off the problem of sentential complements from other arguments for verbs because we wanted to store this information in separate database tables and because there are there are separate rich sources of information. Yu-Fen Huang has entered the verbs from Wierzbicka's list of speech act verbs. We are trying to find out if CED synonyms of speech act verbs are also speech act verbs and if they sometimes fit into the same speech act classes \[Wierzbicka 1989\] or performative classes, using McCawley's \[1979\] categories.</Paragraph>
    <Paragraph position="1"> Pin-Ngern wrote a program to put Indiana Verb List verbs \[Alexander and Kunz 1964; Bridgeman et hi. 1965\] into tables in the database. Huang is rewriting that program to include further information and trying to correlate Wierzbicka's \[1989\] speech act verbs and the Indiana verbs with their CED homograph and sense numbers.</Paragraph>
  </Section>
  <Section position="6" start_page="182" end_page="183" type="metho">
    <SectionTitle>
6 Sublanguage Information
</SectionTitle>
    <Paragraph position="0"> CED contains quite a lot of information about sublanguage and register (e.g., entries begin &amp;quot;a legal term for&amp;quot; or &amp;quot;a slang name for&amp;quot;). We are trying to figure out how and where to capture this information so that we can study it more effectively and also so that we can figure out to use it to make appropriate subsets of the lexical database.</Paragraph>
    <Paragraph position="1">  Of course, sublanguage affects the syntactic correlates of words as well as the lexical ones. It is clear that we need to relate syntactic information in the lexical database to a given sense and homographic number.</Paragraph>
    <Paragraph position="2"> We are designing tools to help us deliver subsets of the database to potential users. Clearly we need to be able to make subsets on the basis of sublanguage information as well as from word lists given us by people who want data to match. We expect to make this kind of data available in fiat files (unless the user has an Oracle Relational Database Management System). All the attributes currently recorded in the database are also defined in the database. Any user of the database will be provided with this information. We expect that most of these users will need add to information to the data that we give them. So far our lexical data acquisition tools function mainly as SQL forms \[Evens et al. 1989\]. We need to provide flat file versions of these tools.</Paragraph>
  </Section>
  <Section position="7" start_page="183" end_page="183" type="metho">
    <SectionTitle>
7 Tools for Accessing and Building the Database
</SectionTitle>
    <Paragraph position="0"> We a re designing two families of tools, one for building the database and one for accessing it. Database construction tools themselves fall into three categories. One group of tools is intended to collect information from human informants to make it easy to add material to the lexicon for some special purpose or to extend existing information. For example, we have a tool to examine synonyms of verbs on the Indiana Verb Lists that also take sentential complements and add them to the correct lists \[Evens et aL 1989\]. Another group of tools is intended to take explicit information from a source and put it into the right table or tables. The third group of tools, most of which were originally built for sublanguage study, is designed to tackle text, sometimes dictionary definitions, sometimes other text, and extract information from it. These tools make lists of words and phrases and count them and parse text. Frank Rinaldo has built most of these tools and is working on bigger and better ones.</Paragraph>
    <Paragraph position="1"> Our Oracle database expert, Robert Strutz, is working on tools to access the database.</Paragraph>
    <Paragraph position="2"> These tools extract information to be used by a parser or a text generation program. Other tools in this category check the database for missing data and make reports. One tool makes a list of nouns that appear in subsidiary noun tables but not in the main noun table, for example. Still other tools make subsets of the database for different kinds of user specifications.</Paragraph>
  </Section>
  <Section position="8" start_page="183" end_page="184" type="metho">
    <SectionTitle>
8 Current Applications
</SectionTitle>
    <Paragraph position="0"> A small subset of the lexical database, the stroke lexicon \[Ahlswede and Evens, 1988b\], is being used in experiments in information retrieval and text generation. Wang et aL \[1989\] are using thesaurus information to enhance queries in an interactive information retrieval system, which operates as a separate PC program and carries out searches of the stroke literature either independently or in support of an expert system. Lee and Evens \[1991\] are using the stroke lexicon to generate explanations for an expert system for stroke.</Paragraph>
    <Paragraph position="1"> Information about lexical-semantic relations is used in an experiment to make that text cohesive; other lexical information is used to support the basic generation process.</Paragraph>
  </Section>
class="xml-element"></Paper>