<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2024">
  <Title>An Indexing Scheme for Typed Feature Structures</Title>
  <Section position="4" start_page="5" end_page="5" type="metho">
    <SectionTitle>
3 Performance Evaluation
</SectionTitle>
    <Paragraph position="0"> We measured the performance of the ISTFS on an IBM xSeries 330 with a 1.26-GHz Pentium III processor and 4 GB of memory. The data set, consisting of 249,994 TFSs, was generated by parsing 800 bracketed sentences from the Wall Street Journal corpus (the first 800 sentences in Wall Street Journal 00) in the Penn Treebank (Marcus et al., 1993) with the XHPSG grammar (Tateisi et al., 1998). The size of the data set was 151 MB. We also generated two sets of query TFSs, QuerySetA and QuerySetB, by parsing five randomly selected sentences from the Wall Street Journal corpus. Each set had 100 query TFSs. Each element of QuerySetA was the daughter part of a grammar rule. Each element of QuerySetB was the right daughter part of a grammar rule whose left daughter part was instantiated. Table 1 shows the number of data TFSs and the average number of unifiable, more-specific and more-general TFSs for QuerySetA and QuerySetB. The total time for generating the index tables (i.e., the set of paths, the path value table (D_{π,σ}), the unifiability checking table (U_{π,σ}), and the two subsumption checking tables) was 102.59 seconds. The size of the path value table was 972 MB, and the size of the unifiability checking table and the two subsumption checking tables was 13 MB, which is negligible in comparison with that of the path value table. Figure 3 shows how the size of the path value table grows with the size of the data set. As seen in the figure, the table size grows in proportion to the data set size.</Paragraph>
    <Paragraph position="1"> Figures 4, 5 and 6 show the retrieval times for finding unifiable TFSs, more-specific TFSs and more-general TFSs, respectively. In the figures, the X-axis shows the number of index paths that are used for limiting the data set. The ideal time is the unification time when the filtering rate is 100%; our algorithm cannot be more efficient than this optimum. The overall time is the sum of the filtering time and the unification time.</Paragraph>
    <Paragraph position="2"> As illustrated in the figures, using one to ten index paths achieves the best performance. The ISTFS achieved a 2.84-fold speed-up in finding unifiable TFSs for QuerySetA, and a 37.90-fold speed-up in finding unifiable TFSs for QuerySetB.</Paragraph>
    <Paragraph position="3"> In finding unifiable TFSs for QuerySetA, more than 95% of non-unifiable TFSs are filtered out by using only three index paths. In the case of QuerySetB, more than 98% of non-unifiable TFSs are filtered out by using only one index path.</Paragraph>
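    <Paragraph position="4"> The quantities reported in this section can be related by a minimal sketch such as the one below. It assumes that the filtering rate is the share of non-unifiable TFSs removed by the index, that the filter never discards a unifiable TFS, and that a speed-up is measured against retrieval without the index. The function names and all numbers are hypothetical and serve only to show the arithmetic; they are not measurements from this evaluation.

```python
def filtering_rate(total, unifiable, candidates):
    """Share of non-unifiable items that the index filter removed.

    Assumes the filter is sound: every unifiable item is still a candidate.
    """
    non_unifiable = total - unifiable
    if non_unifiable == 0:
        return 1.0  # nothing to filter out
    return (total - candidates) / non_unifiable


def overall_time(filtering_time, unification_time):
    """Overall retrieval time: filtering time plus unification time."""
    return filtering_time + unification_time


def speed_up(baseline_time, filtering_time, unification_time):
    """Ratio of the unindexed baseline time to the indexed overall time."""
    return baseline_time / overall_time(filtering_time, unification_time)


# Made-up numbers for illustration only; not measurements from the paper.
rate = filtering_rate(total=1000, unifiable=50, candidates=88)
gain = speed_up(baseline_time=10.0, filtering_time=0.5, unification_time=2.0)
print(rate, gain)
```

With these illustrative inputs, 912 of the 950 non-unifiable items are removed, and the overall time of 2.5 is compared against a baseline of 10. The ideal time corresponds to the case where the filtering rate is 1.0, so that only unifiable items are passed to unification.</Paragraph>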
  </Section>
  <Section position="5" start_page="5" end_page="5" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> Our approach can be regarded as a variation of path indexing. Path indexing has been extensively studied in the fields of automated reasoning, declarative programming and deductive databases for term indexing (Sekar et al., 2001), and has also been studied in the field of XML databases (Yoshikawa et al., 2001). In path indexing, all existing paths in the database are first enumerated, and then an index for each path is prepared. Other existing algorithms differ from ours in (i) data structures and (ii) query optimization.</Paragraph>
    <Paragraph position="1"> In terms of data structures, our algorithm deals with typed feature structures, while the other algorithms deal with PROLOG terms, i.e., variables and instantiated terms. Since a type matches not only the same type or a variable but also any unifiable type, our problem is much more complicated. On the other hand, in our system, hierarchical relations such as a taxonomy can easily be represented by types. In terms of query optimization, our algorithm dynamically selects index paths to minimize the searching cost. The other algorithms basically take the intersection of the candidates for all paths in a query, or simply limit the length of paths (McCune, 2001). Because such a set of paths often contains many paths that are ineffective for limiting answers, our approach should be more efficient than theirs.</Paragraph>
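    <Paragraph position="2"> The contrast between intersecting the candidates for every query path and using only a few selective paths can be illustrated with a minimal sketch. It is heavily simplified and is not the ISTFS implementation: feature structures are untyped nested dictionaries with atomic values, there is no type hierarchy or structure sharing, and the "most selective paths first" rule is a greedy stand-in for the dynamic path selection discussed above. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def enumerate_paths(fs, prefix=()):
    """Yield (path, atomic value) pairs of a nested-dict feature structure."""
    for feature, value in fs.items():
        path = prefix + (feature,)
        if isinstance(value, dict):
            yield from enumerate_paths(value, path)
        else:
            yield path, value


def build_index(data):
    """Map each path to {value: ids of the items holding that value there}."""
    index = defaultdict(lambda: defaultdict(set))
    for item_id, fs in enumerate(data):
        for path, value in enumerate_paths(fs):
            index[path][value].add(item_id)
    return index


def candidates_for(index, all_ids, path, value):
    """Ids that may still unify: same value at the path, or path absent."""
    by_value = index.get(path, {})
    defined = set().union(*by_value.values())
    return by_value.get(value, set()).union(all_ids - defined)


def unifiable(a, b):
    """Untyped unifiability check without structure sharing."""
    for feature, value in a.items():
        if feature not in b:
            continue
        other = b[feature]
        if isinstance(value, dict) and isinstance(other, dict):
            if not unifiable(value, other):
                return False
        elif value != other:
            return False
    return True


def retrieve(data, index, query, max_paths=None):
    """Filter with at most max_paths index paths, then unify the survivors."""
    all_ids = set(range(len(data)))
    per_path = [candidates_for(index, all_ids, path, value)
                for path, value in enumerate_paths(query)]
    per_path.sort(key=len)  # most selective paths first
    candidates = all_ids
    for ids in per_path[:max_paths]:  # max_paths=None uses every query path
        candidates = candidates.intersection(ids)
    answers = sorted(i for i in candidates if unifiable(data[i], query))
    return answers, len(candidates)


# Toy data, invented for this example.
data = [
    {"cat": "np", "agr": {"num": "sg", "per": "3"}},
    {"cat": "np", "agr": {"num": "pl"}},
    {"cat": "vp", "agr": {"num": "sg"}},
    {"cat": "np"},
]
index = build_index(data)
query = {"cat": "np", "agr": {"num": "sg"}}
print(retrieve(data, index, query))               # all query paths
print(retrieve(data, index, query, max_paths=1))  # one index path only
```

Both calls return the same answers because the final unification check removes any remaining false candidates; only the number of candidates passed to unification differs. Using fewer paths lowers the filtering cost and raises the unification cost, which is the trade-off that path selection tries to balance.</Paragraph>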
  </Section>
</Paper>