<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1217">
  <Title>How Should a Large Corpus Be Built? A Comparative Study of Closure in Annotated Newspaper Corpora from Two Chinese Sources, Towards Building a Larger Representative Corpus Merged from Representative Sublanguage Collections</Title>
  <Section position="3" start_page="116" end_page="116" type="intro">
    <SectionTitle> 2 Overview </SectionTitle>
    <Paragraph position="0"> This work applies the methodology of McEnery and Wilson to examine closure rates in a comparative study of all available tagged Chinese newspaper corpora. First, I define lexical and syntactic closure for this study in section 3.</Paragraph>
    <Paragraph position="1"> Then, section 4 begins the study with an examination of the newspaper texts of the Academia Sinica Balanced Corpus (ASBC). Section 5 extends the study to the newspaper texts of the UPenn Chinese Treebank (CTB). Section 6 presents my findings, and section 7 discusses some implications for future corpus building.</Paragraph>
  </Section>
</Paper>