<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2195">
  <Title>An Overview of the EDR Electronic Dictionary and the Current Status of Its Utilization</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Structure of the EDR Electronic Dictionary
</SectionTitle>
    <Paragraph position="0"> The EDR Electronic Dictionary is composed of five types of dictionaries (Word, Bilingual, Concept, Co-occurrence, and Technical Terminology), as well as the EDR Corpus. The Japanese Word Dictionary contains 250,000 words, and the English Word Dictionary contains 190,000 words. The Bilingual Dictionary lists the correspondences between headwords in different languages. The Japanese-English Bilingual Dictionary contains 230,000 words, and the English-Japanese Bilingual Dictionary contains 190,000 words.</Paragraph>
    <Paragraph position="1"> The Concept Dictionary contains information on the 400,000 concepts listed in the Word Dictionary and is divided according to information type into the Headconcept Dictionary, the Concept Classification Dictionary, and the Concept Description Dictionary. The Headconcept Dictionary describes information on the concepts themselves. The Concept Classification Dictionary describes the super-sub relations among the 400,000 concepts. The Concept Description Dictionary describes the semantic (binary) relations, such as 'agent,' 'implement,' and 'place,' between concepts that co-occur in a sentence.</Paragraph>
    <Paragraph position="2"> The Co-occurrence Dictionary describes collocational information in the form of binary relations. The Japanese Co-occurrence Dictionary contains 900,000 phrases, and the English Co-occurrence Dictionary contains 460,000 phrases.</Paragraph>
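The following minimal Python sketch shows one way collocational information stored as binary relations might be indexed and queried. Several details are hypothetical and do not reflect the EDR file format: the record layout, the relation labels (`verb-object`, `adjective-noun`), the sample words, and the helper names `build_index` and `collocates`.

```python
from collections import defaultdict

# Hypothetical co-occurrence records: (first word, relation label, second word).
# The relation labels are illustrative assumptions, not EDR's actual label set.
RECORDS = [
    ("drink", "verb-object", "water"),
    ("drink", "verb-object", "coffee"),
    ("strong", "adjective-noun", "coffee"),
]

def build_index(records):
    """Index each binary relation by its first word for fast collocation lookup."""
    index = defaultdict(list)
    for head, relation, dependent in records:
        index[head].append((relation, dependent))
    return index

def collocates(index, word, relation):
    """Return the words that co-occur with `word` under the given relation."""
    return [dep for rel, dep in index.get(word, []) if rel == relation]

index = build_index(RECORDS)
print(collocates(index, "drink", "verb-object"))  # ['water', 'coffee']
```

Indexing by the first word keeps each query to a single dictionary lookup. A real collocation resource would also need a reverse index to look up collocations from the second word.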
    <Paragraph position="3"> The Technical Terminology Dictionary covers the field of information processing and is split into four types of dictionaries: Word, Bilingual, Concept (Classification), and Co-occurrence.</Paragraph>
    <Paragraph position="4"> The linguistic data in the EDR Corpus were obtained by collecting a large number of example sentences and analyzing them at the morphological, syntactic, and semantic levels. The Japanese Corpus contains 220,000 sentences, and the English Corpus contains 160,000 sentences.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="1092" type="metho">
    <SectionTitle>
3 Role of Each Dictionary
</SectionTitle>
    <Paragraph position="0"> This chapter describes the roles of the major subdictionaries of the EDR Electronic Dictionary and gives some examples.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Word Dictionary
</SectionTitle>
      <Paragraph position="0"> The role of the Word Dictionary is to provide part of the morphological, syntactic, and semantic information required for natural language processing.</Paragraph>
      <Paragraph position="1"> Morphological information covers the headword (morpheme) and information on the connectivity of morphemes. It is used in morphological analysis to identify the morphemes of a sentence, and in morphological generation to produce output sentences.</Paragraph>
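A toy segmenter in Python can illustrate how connectivity information constrains morphological analysis. The sketch assumes, hypothetically, that each morpheme carries a left and a right connection attribute. It also assumes a table that lists which (right, left) attribute pairs may stand next to each other. The attribute names, the lexicon, and `segment` are invented for illustration and do not reproduce EDR's actual connectivity codes.

```python
# Toy lexicon: surface form -> list of (left attribute, right attribute).
# Attribute names and entries are hypothetical, not EDR connectivity codes.
LEXICON = {
    "walk": [("STEM-L", "VSTEM")],
    "ed": [("SUFFIX", "END")],
    "s": [("SUFFIX", "END")],
}

# Pairs (right attribute of the left morpheme, left attribute of the right morpheme)
# that are allowed to be adjacent.
CONNECTABLE = {("VSTEM", "SUFFIX")}

def segment(text, prev_right=None):
    """Return every split of `text` into lexicon morphemes whose neighbors are connectable."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        surface = text[:end]
        for left, right in LEXICON.get(surface, []):
            # The first morpheme has no left neighbor; later ones must connect.
            if prev_right is not None and (prev_right, left) not in CONNECTABLE:
                continue
            for rest in segment(text[end:], right):
                results.append([surface] + rest)
    return results

print(segment("walked"))  # [['walk', 'ed']]
print(segment("edwalk"))  # []
```

The second call finds no analysis because the table has no pair ("END", "STEM-L"). A word can be built from valid morphemes and still be rejected because the morphemes are not allowed to join in that order.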
      <Paragraph position="2"> Information on the syntactic level includes parts of speech as well as surface case information and other grammatical attributes. This information is used in syntactic analysis and generation, and provides the basis for the formulation of parsing rules and production rules.</Paragraph>
      <Paragraph position="3"> Semantic information includes concept identifiers.</Paragraph>
      <Paragraph position="4"> The headconcept and the concept explication are provided as accompanying information. The concept identifier is a numerical expression and is the basic constituent of the Concept Dictionary. The headconcept is a representative word that best expresses the concept identified by the concept identifier. The concept explication is a natural-language explanation intended to help humans distinguish one concept from another. Every Word Dictionary record has a concept identifier that links the Word Dictionary to the Concept Dictionary.</Paragraph>
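The following minimal sketch uses simplified record types to show how a concept identifier joins a Word Dictionary record to its concept entry. The class names, field names, identifiers, and explications are invented placeholders, not EDR data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordRecord:
    """Simplified Word Dictionary record (hypothetical field set)."""
    headword: str
    part_of_speech: str
    concept_id: str  # link into the Concept Dictionary

@dataclass(frozen=True)
class ConceptRecord:
    """Simplified Headconcept Dictionary entry (hypothetical field set)."""
    concept_id: str
    headconcept: str
    explication: str

# Invented sample data: identifiers and explications are placeholders, not EDR content.
WORDS = [
    WordRecord("bank", "noun", "1a0f01"),
    WordRecord("bank", "noun", "1a0f02"),
]
CONCEPTS = {
    "1a0f01": ConceptRecord("1a0f01", "bank", "an institution that keeps money"),
    "1a0f02": ConceptRecord("1a0f02", "shore", "the land along the edge of a river"),
}

def senses(headword):
    """Follow each matching Word Dictionary record's concept identifier into the Concept Dictionary."""
    return [CONCEPTS[w.concept_id] for w in WORDS if w.headword == headword]

for concept in senses("bank"):
    print(concept.concept_id, concept.headconcept, "-", concept.explication)
```

A polysemous headword has one Word Dictionary record per sense, and each record points to a different concept. The headconcept and explication therefore let a human tell the senses apart.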
      <Paragraph position="5"> The following is an example of an English Word Dictionary record: Headword: dog; Connectivity: I!',I,N\] , I,',(',N1</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="1091" type="sub_section">
      <SectionTitle>
3.2 Bilingual Dictionary
</SectionTitle>
      <Paragraph position="0"> The Bilingual Dictionary is designed to supply, for machine processing, appropriate correspondence words for the headwords contained in the Word Dictionary. The headword information in the Bilingual Dictionary is a subset of that in the Word Dictionary: headword notations, parts of speech, concept identifiers, headconcepts, and concept explications. The concept identifiers and concept explications are used to identify the meaning of polysemous headwords. Some correspondence words carry additional information describing the constraints under which they are used.</Paragraph>
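Choosing a correspondence word with the help of the concept identifier can be sketched as a lookup keyed by headword plus concept identifier. Everything in this sketch is hypothetical: the romanized Japanese entries, the identifiers, and the treatment of a usage constraint as a plain context string compared by equality.

```python
# Hypothetical English-Japanese entries keyed by (headword, concept identifier).
# Each value lists (correspondence word, usage constraint or None); all data is invented.
BILINGUAL = {
    ("bank", "1a0f01"): [("ginkou", None)],
    ("bank", "1a0f02"): [("kishi", None), ("dote", "embankment")],
}

def translate(headword, concept_id, context=None):
    """Return correspondence words for one sense, keeping constrained words only if the context matches."""
    candidates = BILINGUAL.get((headword, concept_id), [])
    return [word for word, constraint in candidates
            if constraint is None or constraint == context]

print(translate("bank", "1a0f01"))                # ['ginkou']
print(translate("bank", "1a0f02"))                # ['kishi']
print(translate("bank", "1a0f02", "embankment"))  # ['kishi', 'dote']
```

Keying on the concept identifier separates the senses of "bank" before any constraint is checked. Exact string matching of constraints is only a placeholder for whatever condition the real data expresses.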
      <Paragraph position="1"> The following is an example of an English-Japanese Bilingual Dictionary record:</Paragraph>
    </Section>
    <Section position="3" start_page="1091" end_page="1092" type="sub_section">
      <SectionTitle>
3.3 Concept Dictionary
</SectionTitle>
      <Paragraph position="0"> The role of the Concept Dictionary is to provide the data required for computer processing of the semantic contents (concepts) expressed in natural language sentences. Examples include (1) generating appropriate semantic representations for sentences, (2) determining the similarity (equivalence) of semantic contents, and (3) converting a semantic content into a similar (equivalent) content. For this reason, the Concept Dictionary contains three types of subdictionaries: the Headconcept Dictionary, the Concept Classification Dictionary, and the Concept Description Dictionary. In the Concept Dictionary, each concept is uniquely identified by a concept identifier, which is a hexadecimal number. The Headconcept Dictionary contains the concept identifier, the headconcept, and the concept explication. The headconcept is a word whose meaning is close to the meaning of the concept. The concept explication is an explanation that expresses the meaning of the concept. The Concept Classification Dictionary contains the set of pairs of concepts that stand in a super-sub (is_a) relation. For example, the super-concepts of 'school' are 'organization,' 'building,' and 'function,' and its sub-concepts include 'elementary school' and 'university.' The Concept Description Dictionary contains the set of pairs of concepts that stand in semantic relations other than the super-sub relation. The following eight semantic relations are used: object, agent, goal, implement, a-object, place, scene, and cause.</Paragraph>
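As the 'school' example shows, a concept can have several super-concepts, so the classification is not a simple tree. The sketch below stores is_a pairs and walks them upward. Readable concept names stand in for hexadecimal identifiers, and the 'hospital' links are hypothetical additions. Using shared super-concepts as a similarity cue is an illustrative assumption, not a method prescribed by the EDR Concept Dictionary.

```python
# Super-sub (is_a) pairs stored as (sub-concept, super-concept). The 'school' pairs follow
# the example in the text; the 'hospital' pairs are hypothetical additions.
# Readable names stand in for the hexadecimal concept identifiers.
IS_A = [
    ("school", "organization"),
    ("school", "building"),
    ("school", "function"),
    ("elementary school", "school"),
    ("university", "school"),
    ("hospital", "organization"),
    ("hospital", "building"),
]

def supers(concept):
    """Direct super-concepts of a concept."""
    return [sup for sub, sup in IS_A if sub == concept]

def subs(concept):
    """Direct sub-concepts of a concept."""
    return [sub for sub, sup in IS_A if sup == concept]

def ancestors(concept):
    """Every concept reachable upward through is_a links (depth-first, no repeats)."""
    seen, stack = set(), [concept]
    while stack:
        for sup in supers(stack.pop()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def shared_supers(a, b):
    """Illustrative similarity cue: super-concepts the two concepts have in common."""
    return ancestors(a).intersection(ancestors(b))

print(sorted(ancestors("university")))
print(sorted(shared_supers("university", "hospital")))
```

The `seen` set ensures that each super-concept is visited once, even when several paths lead to it.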
    </Section>
  </Section>
  <Section position="5" start_page="1092" end_page="1092" type="metho">
    <SectionTitle>
3.5 EDR Corpus
</SectionTitle>
    <Paragraph position="0"> The EDR Corpus is composed of the record number, sentence information, constituent information, morphological information, syntactic information, and semantic information. The basic role of the EDR Corpus is to use a large number of actual examples to do two things: first, identify the constituents of sentences, and then show how those constituents combine to form the morphological, syntactic, and semantic structure of each sentence. The data in the Concept Description Dictionary and the Co-occurrence Dictionary are extracted from the EDR Corpus.</Paragraph>
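The sketch below illustrates extracting Concept Description data from annotated sentences by counting (concept, relation, concept) triples across corpus records. It assumes, hypothetically, that a record's semantic information can be reduced to such triples. The `CorpusRecord` fields and the sample sentences are invented, and the relation names are taken from the eight relations listed for the Concept Description Dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CorpusRecord:
    """Hypothetical, simplified corpus record: only the fields used below."""
    record_number: int
    sentence: str
    # Semantic information reduced to (concept, relation, concept) triples (an assumption).
    semantic: list = field(default_factory=list)

RECORDS = [
    CorpusRecord(1, "The dog ate the bone.",
                 [("eat", "agent", "dog"), ("eat", "object", "bone")]),
    CorpusRecord(2, "A dog ate meat in the yard.",
                 [("eat", "agent", "dog"), ("eat", "object", "meat"),
                  ("eat", "place", "yard")]),
]

def extract_descriptions(records):
    """Count how often each concept pair is linked by a given relation across the corpus."""
    counts = Counter()
    for record in records:
        counts.update(record.semantic)
    return counts

descriptions = extract_descriptions(RECORDS)
for (head, relation, tail), n in descriptions.most_common():
    print(head, relation, tail, n)
```

Aggregating over many sentences is what turns individual corpus annotations into dictionary-level pairs. The counts could then be used to filter out pairs seen only once.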
    <Paragraph position="1"> These subdictionaries are not independent but are organically connected (Figure 1).</Paragraph>
  </Section>
  <Section position="6" start_page="1092" end_page="1092" type="metho">
    <SectionTitle>
4 The Current Status of Utilization
</SectionTitle>
    <Paragraph position="0"> As mentioned in Chapter 1, we released the first CD-ROM version of the EDR Electronic Dictionary (V1.0) in April 1995, after a nine-year R&amp;D project.</Paragraph>
    <Paragraph position="1"> The dictionary is now being used at many sites for both academic and commercial purposes (Table 1). In fiscal 1995, further refinements and improvements were made, and the revised version (V1.5) has been available since April 1996.</Paragraph>
    <Paragraph position="2"> One of the users, Fujitsu, released a commercial product using the EDR Electronic Dictionary in 1995. The product, called &quot;Denjikai for Windows V2.0,&quot; retrieves word information from various dictionaries, including the EDR Electronic Dictionary.</Paragraph>
  </Section>
</Paper>