<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1028">
  <Title>COMPUTATIONAL TECHNIQUES FOR IMPROVED NAME SEARCH</Title>
  <Section position="7" start_page="209" end_page="209" type="concl">
    <SectionTitle>
5.0 PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> Although building the statistical model is computationally intensive and time-consuming (several hours), the actual classification procedure is very efficient. The average CPU time to classify a query name was under 200 msec on a VAX-11/780. The rule component that generates spelling variants can process 100 query names in about 2-6 CPU seconds, depending on the average name length.</Paragraph>
    <Paragraph position="1"> As for retrieval performance, in a test of 160 query names (including names known to be in the database and spelling variants not known to be in the database), there were 111 hits (69%) using NYSIIS procedures alone and 141 hits (88%) using the front-end language classifier and linguistic rules and sending the expanded query set to NYSIIS.</Paragraph>
    <Paragraph position="2"> In recent work, this technique has been extended to include modeling a database of Slavic surnames. Language classification accuracy based on a combined database of 13000 surnames representing Spanish, Farsi, Vietnamese, Slavic and 'other' names, with combined training data (1000 names from each language group to build each language model) and test data (remaining 8000 names), is 96.8% for Vietnamese, 87.7% for Farsi, 86.9% for Spanish, 86.5% for Slavic, and 82.9% for 'other'.</Paragraph>
  </Section>
</Paper>