<?xml version="1.0" standalone="yes"?> <Paper uid="I05-4010"> <Title>Harvesting the Bitexts of the Laws of Hong Kong From the Web</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them down to the subparagraph level by exploiting the numbering system of the legal text hierarchy. The basic methodology and practical techniques are reported in detail. The resulting bilingual corpus, comprising 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specialized domain of Hong Kong law. It is particularly valuable for empirical MT research. This work also lays a foundation for exploring and harvesting larger volumes of English-Chinese bitexts from the Web.</Paragraph> </Section> </Paper>