<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0310">
  <Title>Bootstrapping Parallel Corpora</Title>
  <Section position="5" start_page="400000" end_page="400000" type="concl">
    <SectionTitle>
4 Discussion and Future Work
</SectionTitle>
    <Paragraph position="0"> In this paper we presented two methods for the automatic creation of additional parallel corpora. Co-training uses a number of different human-translated parallel corpora to create additional data for each of them, leading to modest increases in translation quality. Coaching uses existing resources to create a fully machine-translated corpus, essentially reverse engineering the knowledge present in the human-translated corpora and transferring it to another language. This has significant implications for the feasibility of using statistical translation methods for language pairs for which extensive parallel corpora do not exist.</Paragraph>
    <Paragraph position="1"> These methods would be especially useful if the European Union extended membership to a new country such as Turkey and wanted to develop translation resources for its language. One can imagine that sizable parallel corpora might be available between Turkish and a few EU languages such as Greek and Italian. However, there may be no parallel corpora between Turkish and Finnish.</Paragraph>
    <Paragraph position="2"> Our methods could exploit existing parallel corpora among the current EU languages and use machine translations from Greek and Italian to create a machine translation system between Turkish and Finnish.</Paragraph>
    <Paragraph position="3"> We plan to extend our work by moving from co-training and its variants to another weakly supervised learning method, active learning. Active learning incorporates human translations along with machine translations, which should yield better quality than using machine translations alone. By querying a human translator selectively and judiciously, it should also cost less than creating a parallel corpus entirely by hand. To make the most effective use of the human translator's time we will need to design an effective selection algorithm, something our current research did not address. An effective selection algorithm for active learning is one that chooses the examples that will add the most information to the machine translation system, and therefore minimizes the amount of time a human needs to spend translating sentences.</Paragraph>
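    <Paragraph position="4"> As an illustration only, the selection step described above can be sketched with uncertainty sampling, a common active learning heuristic: ask the human to translate the sentences the current system is least confident about. The paper does not specify a selection algorithm, so everything here is an assumption. The function names are hypothetical, and vocabulary coverage is a toy stand-in for a real confidence score from a translation system.

```python
from typing import Callable, Iterable, List


def vocabulary_coverage(known_words: Iterable[str]) -> Callable[[str], float]:
    """Build a toy confidence scorer (an assumption, not the paper's method).

    The score is the fraction of a sentence's words already seen in the
    existing parallel data; a real system would use a model-based score.
    """
    known = {w.lower() for w in known_words}

    def score(sentence: str) -> float:
        words = sentence.lower().split()
        if not words:
            # An empty sentence offers nothing to learn; rank it last.
            return 1.0
        return sum(w in known for w in words) / len(words)

    return score


def select_for_human_translation(
    sentences: List[str],
    confidence: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Return up to `budget` sentences with the lowest confidence scores."""
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(sentences, key=confidence)
    return ranked[: max(budget, 0)]


if __name__ == "__main__":
    scorer = vocabulary_coverage(["the", "house", "is", "small"])
    pool = ["the house is small", "the parliament adjourned", "small house"]
    print(select_for_human_translation(pool, scorer, budget=1))
```

Low confidence is only a proxy for "adds the most information"; other criteria, such as how representative a sentence is of the untranslated pool, could be substituted by passing a different scoring function.</Paragraph>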
  </Section>
</Paper>