<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2171">
  <Title>N-GRAM CLUSTER IDENTIFICATION DURING EMPIRICAL KNOWLEDGE REPRESENTATION GENERATION</Title>
  <Section position="6" start_page="7055" end_page="7055" type="evalu">
    <SectionTitle>
4. PRELIMINARY RESULTS
</SectionTitle>
    <Paragraph position="0"> The entire system, which is discussed in section 2, is currently under development. The stage concerning the identification of correlating paragraphs, which is discussed in section 3, has only recently been implemented.</Paragraph>
    <Paragraph position="1"> For this reason there are a limited number of results to report upon.</Paragraph>
    <Paragraph position="2"> The corpus currently being considered consists of 82 chemical patents containing over half a million words.</Paragraph>
    <Paragraph position="3"> The programs are being run on a Sun™ SPARCstation Classic with 32 megabytes of RAM.</Paragraph>
    <Paragraph position="4"> Table 1 presents an elementary example which is intended to demonstrate the system's scope for improvement as larger corpora are considered. It shows that it is possible to identify paragraphs which sufficiently correlate to provide a strong indication of fundamental concepts within the domain. In this example, a common stage of an experiment is being indicated.</Paragraph>
    <Paragraph position="5">  The results in Table 1 were obtained from analysis of a single patent containing approximately 14,000 words. In the patent, 15 paragraphs contained the 4-gram "This was prepared from" (this is marked by a **4** after the first word of the n-gram), and two of these contained the 9-gram "oxime and 3-methoxycarbonyl-1-vinyloxy-carbonyl-1,2,5,6-tetrahydropyridine and recrystallised from methanol/diethyl ether, mp".</Paragraph>
    <Paragraph position="6"> This **4** was prepared from isopropyl carboxamide oxime **9** and 3-methoxycarbonyl-1-vinyloxy-carbonyl-1,2,5,6-tetrahydropyridine and recrystallised from methanol/diethyl ether, mp 112°C, Rf = 0.28 in dichloromethane/methanol (20:1) on silica.</Paragraph>
    <Paragraph position="7"> This **4** was prepared from phenylacetamide oxime **9** and 3-methoxycarbonyl-1-vinyloxy-carbonyl-1,2,5,6-tetrahydropyridine and recrystallised from methanol/diethyl ether, mp 154-158°C, Rf = 0.63 in dichloro-methane/methanol (20:1) on alumina.</Paragraph>
    <Paragraph position="8"> Table 1: examples of paragraph correlation. Further correlations exist between the two paragraphs, but the system has not identified them because the n-grams involved are either small or contain minor textual differences (e.g. °C, Rf =, and dichloro-methane/methanol (20:1) on).</Paragraph>
    <Paragraph position="9"> Many more examples containing a large number of correlating n-grams can be drawn from the analysis of this single patent, but they are too large and complicated to report in this paper.</Paragraph>
    <Paragraph position="10"> Finally, an interesting result was that a 69-gram was identified which occurred twice within the single patent. It concerned the explanation of a diagram presenting the structure of a compound.</Paragraph>
  </Section>
</Paper>