<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0901">
  <Title>Comparing Corpora using Frequency Profiling</Title>
  <Section position="6" start_page="4" end_page="4" type="concl">
    <SectionTitle>
4 Conclusions
</SectionTitle>
    <Paragraph position="0"> reliability of the statistical tests (LL, Pearson's χ² and others) under the effects of corpus size, ratio of the corpora being compared and word (or tag) frequency.</Paragraph>
    <Paragraph position="1"> We do not propose a completely automated approach. The tools suggest a group of key items, ranked in decreasing order of significance, that distinguish one corpus from another. The researcher should then investigate occurrences of the significant items in the corpora using standard corpus techniques such as KWIC (key word in context). In this way, the reasons behind their significance can be discovered and explanations sought for the patterns displayed. By this process, we can compare the corpora under investigation and form hypotheses about the language use that they represent.</Paragraph>
  </Section>
</Paper>