<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0319"> <Title>An LSA Implementation Against Parallel Texts in French and English</Title> <Section position="7" start_page="3" end_page="4" type="concl"> <SectionTitle> 6. Conclusion </SectionTitle> <Paragraph position="0"> This paper briefly discussed the possibility that the LSA methodology may help identify the difficulty level of the MT and TA tasks. In particular, it was shown that LSA can represent the symmetrical and non-symmetrical relationships that exist among the terms in cross-language document pairs. It was also shown that LSA has some capability to &quot;align&quot; similar terms in order to identify relevant documents in response to a query.</Paragraph> <Paragraph position="1"> Such positive results, obtained with the large, non-homogeneous documents used in this analysis, are very promising and suggest that further research in this area is warranted. Additional work is planned that will separate the documents into smaller syntactic units, which will be analyzed using a very similar approach. However, the words and &quot;documents&quot; represented in the semantic space generated from these very small syntactic units will have very different associations and relationships than they do as part of the larger, non-homogeneous texts.</Paragraph> <Paragraph position="2"> It is believed that by restricting the input texts to very small syntactic units, the LSA methodology will be able to make the proper &quot;alignment&quot; associations between the cross-language word pairs and, as a result, provide some information regarding the types of word pairs that are most difficult for an MT or TA system to align.</Paragraph> <Paragraph position="3"> Finally, the small number of documents used in this analysis raises questions about the optimal number of input texts needed to obtain valid results using the LSA methodology. 
It may be that far fewer input texts are needed than formerly believed in order to &quot;train&quot; an LSA-based system to correlate the appropriate cross-language word pairs.</Paragraph> <Paragraph position="4"> 30 mated pairs (or 60 documents in total)</Paragraph> </Section> </Paper>