<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1508">
  <Title>Stochastic Multiple Context-Free Grammar for RNA Pseudoknot Modeling</Title>
  <Section position="6" start_page="61" end_page="62" type="evalu">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="61" end_page="62" type="sub_section">
      <SectionTitle>
4.1 Data for Experiments
</SectionTitle>
      <Paragraph position="0"> The dataset for the experiments was taken from an RNA family database called &quot;Rfam&quot; (version 7.0) (Griffiths-Jones et al., 2003), a database of multiple sequence alignments and covariance models (Eddy and Durbin, 1994) representing non-coding RNA families. We selected three viral RNA families with pseudoknot annotations: Corona pk 3 (Corona), HDV ribozyme (HDV), and Tombus 3 IV (Tombus) (see Table 2).</Paragraph>
      <Paragraph position="1"> Corona pk 3 has a simple pseudoknotted structure, whereas HDV ribozyme and Tombus 3 IV have more complicated pseudoknotted structures.</Paragraph>
    </Section>
    <Section position="2" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
4.2 Implementation
</SectionTitle>
      <Paragraph position="0"> We specified a particular SMCFG G_s using the secondary structure annotation of each family; the rules were determined from the consensus secondary structure. Probability parameters were estimated from a few selected sequences using the simplest pseudocount method, Laplace's rule (Durbin et al., 1998), which adds one extra count to the observed count of each base configuration in the selected sequences. Note that the inside-outside algorithm was not used in the experiments. The remaining sequences in each alignment were used as test sequences for prediction (see Table 2). We implemented the CYK algorithm with traceback in ANSI C on a machine with an Intel Pentium D 2.80 GHz CPU and 2.00 GB of RAM. A straightforward implementation runs out of memory because of the high-order dynamic programming matrix (recall that the space complexity of the CYK algorithm is O(mn^4)). Because the dynamic programming matrix in our specified model is sparse, we implemented it as a hash table that stores only nonzero probability values (equivalently, finite log probabilities), which made the computation feasible.</Paragraph>
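The two implementation choices described above (add-one pseudocounts and a sparse chart of log probabilities) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' ANSI C code: the names laplace_estimate and SparseLogChart, the flat list of base-pair configurations, the nonterminal name "P", and the chart key layout (nonterminal, i, j, k, l) for a nonterminal deriving two substrings are all hypothetical.

```python
import math


def laplace_estimate(observed, configurations):
    """Laplace's rule: add one pseudocount to the observed count of every
    possible configuration, then normalize the counts to probabilities."""
    counts = {c: observed.get(c, 0) + 1 for c in configurations}
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


class SparseLogChart:
    """CYK chart stored as a hash table (a Python dict).

    Keys are hypothetical tuples (nonterminal, i, j, k, l): a nonterminal
    deriving the two substrings spanning positions i..j and k..l, which is
    where the n**4 factor in the O(mn^4) space bound comes from.
    Only finite log probabilities are stored; a missing key stands for
    probability zero (log probability minus infinity)."""

    def __init__(self):
        self.cells = {}

    def update(self, key, log_p):
        # Zero-probability entries are never stored, keeping the chart sparse.
        if log_p == -math.inf:
            return
        # Viterbi-style CYK keeps the best (maximum) score for each cell.
        self.cells[key] = max(self.cells.get(key, -math.inf), log_p)

    def get(self, key):
        return self.cells.get(key, -math.inf)

    def __len__(self):
        return len(self.cells)


# Usage with hypothetical counts of base-pair configurations.
pair_types = ["AU", "UA", "GC", "CG", "GU", "UG"]
probs = laplace_estimate({"GC": 3, "AU": 1}, pair_types)
print(probs["GC"], probs["AU"], probs["UA"])  # 0.4 0.2 0.1

chart = SparseLogChart()
chart.update(("P", 0, 3, 7, 10), math.log(probs["GC"]))
chart.update(("P", 0, 3, 7, 10), math.log(probs["AU"]))  # lower score, ignored
chart.update(("P", 1, 2, 8, 9), -math.inf)  # zero probability, not stored
print(len(chart))  # 1
```

Unobserved configurations (here UA, CG, GU, UG) still receive a small nonzero probability, which is the point of the pseudocount; the dictionary only grows with cells that are actually reachable, rather than with the full m times n**4 index space.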
    </Section>
    <Section position="3" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
4.3 Tests
</SectionTitle>
      <Paragraph position="0"> We measured prediction accuracy by precision and recall (sensitivity). Precision is the ratio of the number of correctly predicted base pairs to the total number of predicted base pairs; recall is the ratio of the number of correctly predicted base pairs to the total number of base pairs in the trusted annotation. The results are shown in Table 3. A nearly correct prediction (94.4% precision and recall) for Corona pk 3 is shown in Figure 2, where underlined base pairs agree with the trusted ones. The secondary structures predicted by our algorithm agree very well with the trusted structures.</Paragraph>
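The two accuracy measures can be sketched in Python as follows. This assumes base pairs are represented as hypothetical (i, j) position tuples written with the smaller index first, and it returns 0.0 when the denominator is empty; that convention is chosen here for illustration and is not stated in the paper.

```python
def precision_recall(predicted, trusted):
    """Compare predicted base pairs with trusted (annotated) base pairs.
    Each base pair is a tuple (i, j) of sequence positions."""
    predicted = set(predicted)
    trusted = set(trusted)
    correct = len(predicted.intersection(trusted))
    # Precision: correct pairs / all predicted pairs.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall (sensitivity): correct pairs / all trusted pairs.
    recall = correct / len(trusted) if trusted else 0.0
    return precision, recall


# Hypothetical example: two of three predicted pairs are correct,
# and two of four trusted pairs are recovered.
predicted = [(1, 10), (2, 9), (3, 8)]
trusted = [(1, 10), (2, 9), (4, 7), (5, 6)]
p, r = precision_recall(predicted, trusted)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Using sets makes the count of correct pairs independent of the order in which pairs are listed, and duplicate pairs are counted once.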
    </Section>
    <Section position="4" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
4.4 Comparison with PSTAG
</SectionTitle>
      <Paragraph position="0"> We compared the prediction accuracy of our SMCFG algorithm with that of the PSTAG algorithm (Matsui et al., 2005) (see Table 4). As mentioned earlier, PSTAGs were proposed for modeling pairwise alignments of RNA sequences with pseudoknots, and they assign a probability to each alignment of TAG derivation trees. The PSTAG algorithm, based on dynamic programming, computes the most likely alignment for a pair of TAG derivation trees, where one is given in the form of an unfolded sequence and the other is a TAG derivation tree for the known structure. On the same test sets, the SMCFG method achieves higher accuracy than the PSTAG method.</Paragraph>
    </Section>
  </Section>
</Paper>