File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/c00-2119_abstr.xml
Size: 1,014 bytes
Last Modified: 2025-10-06 13:41:37
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2119"> <Title>Using a Broad-Coverage Parser for Word-Breaking in Japanese</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We describe a method of word segmentation in Japanese in which a broad-coverage parser selects the best word sequence while producing a syntactic analysis. This technique is substantially different from traditional statistics- or heuristics-based models which attempt to select the best word sequence before handing it to the syntactic component. By breaking up the task of finding the best word sequence into the identification of words (in the word-breaking component) and the selection of the best sequence (a by-product of parsing), we have been able to simplify the task of each component and achieve high accuracy over a wide variety of data. Word-breaking accuracy of our system is currently around 97-98%.</Paragraph> </Section> </Paper>