File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/h94-1005_metho.xml
Size: 1,246 bytes
Last Modified: 2025-10-06 14:13:51
<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1005"> <Title>MULTILINGUAL TEXT RESOURCES AT THE LINGUISTIC DATA CONSORTIUM</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> MULTILINGUAL TEXT RESOURCES AT THE LINGUISTIC DATA CONSORTIUM </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> The Linguistic Data Consortium (LDC) is currently involved </SectionTitle> <Paragraph position="0"> in a major effort to expand its multilingual text resources, in particular for machine translation, message understanding, and information retrieval research. The main sources for data acquisition are governmental and international organizations, newswire services, and diverse publishers.</Paragraph> <Paragraph position="1"> This paper describes some of the research being done to identify potential resources, discusses some of the processes involved in negotiating the broadest possible access to the material for the human language technology research community, and identifies key issues and considerations in transducing the text into common, well-documented formats.</Paragraph> </Section> </Section> </Paper>