<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2810">
  <Title>Finding Similar Sentences across Multiple Languages in Wikipedia</Title>
  <Section position="7" start_page="67" end_page="68" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper we focused on multilingual aspects of Wikipedia. Particularly, we investigated the potential of Wikipedia for generating parallel corpora by applying different methods for identifying similar text across multiple languages. We presented two methods and carried out an evaluation on a sample of Dutch-English Wikipedia pages. The results show that both methods, using simple heuristics, were able to identify similar text between the pair of Wikipedia pages, though they differ in accuracy.</Paragraph>
    <Paragraph position="1"> The bilingual lexicon approach returns fewer incorrect pairs than the MT-based approach. We interpret this as saying that our bilingual lexicon-based method provides a more accurate representation of the aboutness of sentences in Wikipedia than the MT-based approach. Furthermore, the result we obtained on a sample of Wikipedia pages and the output of running the bilingual-based approach on the whole Dutch-English give some indication of the potential of Wikipedia for generating parallel corpora. As to future work, the sentence similarity detection methods that we considered are not perfect. E.g., the MT-based approach relies on rough translations; it is important to investigate the contribution of high-quality translations. The bilingual lexicon approach uses only lexical features; other language-specific sentence features might help improve results.</Paragraph>
  </Section>
</Paper>