<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2005"> <Title>Tagging Portuguese with a Spanish Tagger Using Cognates</Title> <Section position="8" start_page="38" end_page="39" type="concl"> <SectionTitle> 8 Conclusion </SectionTitle> <Paragraph position="0"> We have shown that a tagging system with a small amount of manually created resources can be successful. We have previously shown that this approach can work for Czech and Russian (Hana et al., 2004; Feldman et al., 2006). Here we have shown its applicability to a new language pair.</Paragraph> <Paragraph position="1"> This can be done in a fraction of the time needed for systems with extensive manually created resources: days instead of years. Three resources are required: (i) a reference grammar (for information about paradigms and closed-class words); (ii) a large amount of text (for learning a lexicon; e.g. newspapers from the internet); (iii) limited access to a native speaker (reference grammars are often too vague, and a quick glance at the results can provide feedback leading to a significant increase in accuracy); however, both of these require only limited linguistic knowledge.</Paragraph> <Paragraph position="2"> In this paper we proposed an algorithm for cognate transfer that effectively projects the source language emission probabilities into the target language. Our experiments use minimal new human effort and show a 21% error reduction over even emissions on a fine-grained tagset.</Paragraph> <Paragraph position="3"> In the near future, we plan to compare the effectiveness (time and price) of our approach with that of the standard resource-intensive approach to annotating a medium-size corpus (around 100K tokens). A resource-intensive system will be more accurate in the labels it offers to the annotator, so the annotator can work faster (there are fewer choices to make and fewer keystrokes required).
On the other hand, creating the infrastructure for such a system is very time-consuming and may not be justified by the intended application.</Paragraph> <Paragraph position="4"> Our ongoing experiments are meant to determine whether training the system on a small corpus of a closely related language is better than training on a larger corpus of a less related language. Some preliminary results (Feldman et al., 2006) suggest that using cross-linguistic features leads to higher precision, especially for source languages whose target-like properties complement each other.</Paragraph> </Section> </Paper>