<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2171"> <Title>N-GRAM CLUSTER IDENTIFICATION DURING EMPIRICAL KNOWLEDGE REPRESENTATION GENERATION</Title> <Section position="7" start_page="7055" end_page="7055" type="concl"> <SectionTitle> 5. CONCLUSIONS </SectionTitle> <Paragraph position="0"> I am not aware of any techniques, within knowledge representation generation research, which are significantly similar to this clustering approach. The novelty is due to the use of n-gram correspondences during the identification of sets of paragraphs containing similar conceptual information, and the employment of these examples to emphasise the fundamental concepts within the domain.</Paragraph> <Paragraph position="1"> For this reason, it could prove to be a rewarding area for further research.</Paragraph> <Paragraph position="2"> Due to the nature of technical documents and technical language, a large quantity of the phrases used are highly structured and standardised. This formalism implies that the n-gram clustering approach will produce effective results during the identification of conceptually similar paragraphs.</Paragraph> <Paragraph position="3"> An essential test will be the assessment of a domain-specific semantic representation created using the correlating paragraphs generated by the system and the tools mentioned in section 2. It will be necessary to evaluate the scope and quality of the representation. One possibility is to compare, using an identical corpus, a representation created by a group of experts with that of the system. The fundamental point to convey is that as larger corpora are analysed the quantity of examples and quality of correlations will improve. The results of further experimentation and analysis will be reported in future publications. Although this knowledge representation generation is the fundamental stage of the process outlined in section 2, it is only a fragment of the entire system. 
An application developed using this process has the potential to be invaluable for domain specialists who wish to identify documents containing similar conceptual information within extremely large knowledge bases.</Paragraph> </Section> </Paper>