<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2005"> <Title>phrasing Based on Parallel Corpus for Normaliza-</Title> <Section position="11" start_page="111" end_page="111" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we study the differences among SMS normalization, general text normalization, spell checking, and text paraphrasing, and investigate the different phenomena found in SMS messages.</Paragraph> <Paragraph position="1"> We propose a phrase-based statistical method to normalize SMS messages. The method produces messages that correspond well to manually normalized messages, achieving a BLEU score of 0.8070 against a baseline score of 0.6958. It also significantly improves SMS translation accuracy, raising the BLEU score from 0.1926 to 0.3770 without adjusting the MT model.</Paragraph> <Paragraph position="2"> These experimental results give a good indication of the feasibility of using this method for the normalization task. We plan to extend the model with a mechanism to handle missing punctuation (which potentially affects MT output and is not handled at the moment) and to use pronunciation information to handle out-of-vocabulary (OOV) words caused by phonetic spelling. A larger data set will also be used to test the robustness of the system, which is expected to lead to more accurate alignment and normalization.</Paragraph> </Section> </Paper>