<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1073">
  <Title>Assessing the Retrieval Effectiveness of a Speech Retrieval System by Simulating Recognition Errors</Title>
  <Section position="3" start_page="370" end_page="371" type="metho">
    <SectionTitle>
2. Test Setting
</SectionTitle>
    <Paragraph position="0"> The experiments are performed by means of the standard information retrieval text collections CRANFIELD, MEDLARS, and CACM [2]. The indexing vocabulary consists of the VCV-, CV-, and VC-features $\varphi_i$ whose inverse document frequency $$\mathrm{idf}(\varphi_i) := \log \frac{n+1}{\mathrm{df}(\varphi_i)+1}$$ is between the lower bound $\mathrm{idf}_{\min} := 1.6$ and an upper bound $\mathrm{idf}_{\max}$, which is chosen such that the indexing vocabulary consists of exactly 1000 features. Every indexing feature $\varphi_i$ and every document $d_j$ is assigned a weight</Paragraph>
    <Paragraph position="2"> Analogously, every indexing feature $\varphi_i$ is assigned a weight</Paragraph>
    <Paragraph position="4"> with respect to a given query $q$. As usual, the documents are presented to the user in decreasing order of the Retrieval Status Values $\mathrm{RSV}(q, d_j)$ that are determined by the cosine measure.</Paragraph>
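The vocabulary selection and the cosine ranking described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not confirm: the idf is taken to be log((n + 1) / (df + 1)) with a natural logarithm, n is taken to be the number of documents, ties at the upper bound are broken by feature name, and the weights are passed in as ready-made dictionaries because their definitions are not reproduced here. The function names `idf`, `select_vocabulary`, `rsv_cosine`, and `rank` are hypothetical.

```python
import math


def idf(n_docs, df):
    """Assumed idf form: log((n + 1) / (df + 1)), natural logarithm."""
    return math.log((n_docs + 1) / (df + 1))


def select_vocabulary(doc_freqs, n_docs, idf_min=1.6, size=1000):
    """Keep the `size` features with the smallest idf that is at least idf_min.

    Returns the chosen features and the implied upper bound idf_max.
    """
    scored = sorted(
        (idf(n_docs, df), feature)
        for feature, df in doc_freqs.items()
        if idf(n_docs, df) >= idf_min
    )
    chosen = scored[:size]
    idf_max = chosen[-1][0] if chosen else None
    return [feature for _, feature in chosen], idf_max


def rsv_cosine(query_weights, doc_weights):
    """Cosine of the angle between the query and document weight vectors."""
    dot = sum(b * doc_weights.get(f, 0.0) for f, b in query_weights.items())
    norm_q = math.sqrt(sum(b * b for b in query_weights.values()))
    norm_d = math.sqrt(sum(a * a for a in doc_weights.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_q * norm_d)


def rank(query_weights, docs):
    """Document names in decreasing order of their retrieval status value."""
    return sorted(
        docs,
        key=lambda name: rsv_cosine(query_weights, docs[name]),
        reverse=True,
    )
```

Calling `select_vocabulary(doc_freqs, n_docs)` with the default arguments mirrors the setting of a lower bound of 1.6 and a vocabulary of 1000 features; the returned `idf_max` is simply the largest idf among the retained features.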
    <Paragraph position="5"> $$\mathrm{RSV}(q, d_j) := \frac{\sum_i a_{ij} \, b_i}{\sqrt{\sum_i a_{ij}^2} \, \sqrt{\sum_i b_i^2}}$$ Here $a_{ij}$ is the weight of $\varphi_i$ in $d_j$ and $b_i$ is its weight with respect to $q$. The recognition errors were simulated in three steps. First, a parser converts a text document into a sequence of indexing features by detecting VCV-, CV-, and VC-features that belong to the indexing vocabulary. Second, the sequence of indexing features is converted into another sequence of indexing features by removing features as follows. For every feature in the input sequence, it is randomly determined whether it is recognized or not. If it is recognized, the feature is included in the output sequence; otherwise, it is removed. The probability that a feature is recognized is equal to the specified detection rate. Third, the reduced sequence of indexing features is converted into a final sequence by adding indexing features as follows. Assume that the original document consists of $k$ word occurrences. According to [8], an average speaker needs approximately $k/170$ minutes, or $\Delta t := k/10200$ hours, for such a document. We then add $f_a \cdot \Delta t$ occurrences of every indexing feature, where $f_a$ denotes the specified number of false alarms per keyword per hour.</Paragraph>
    <Paragraph position="6"> For simplicity, every indexing feature is assumed to have the same detection rate and the same false alarm rate.</Paragraph>
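The second and third simulation steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the parser of the first step is taken as given (the input is already a feature sequence), the speaking time is derived from the stated rate of 170 words per minute, a fractional number of false alarms is rounded to the nearest integer, and the added features are appended at the end, which does not affect a ranking that only counts feature occurrences. The name `simulate_recognition` and its parameters are hypothetical.

```python
import random

WORDS_PER_MINUTE = 170  # speaking rate stated in the text


def simulate_recognition(features, vocabulary, n_words,
                         detection_rate, false_alarms, rng=None):
    """Simulate misses and false alarms on a sequence of indexing features.

    features       -- feature sequence produced by the parser (step 1)
    vocabulary     -- all indexing features
    n_words        -- number of word occurrences k in the original document
    detection_rate -- probability that a feature is recognized
    false_alarms   -- false alarms per keyword per hour
    """
    rng = rng or random.Random()

    # Step 2: keep each feature independently with probability detection_rate.
    kept = [f for f in features if detection_rate > rng.random()]

    # Step 3: speaking time in hours, then false alarms for every feature.
    hours = n_words / WORDS_PER_MINUTE / 60.0
    n_false = round(false_alarms * hours)  # rounding is an assumption
    inserted = [f for f in vocabulary for _ in range(n_false)]

    return kept + inserted
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes a simulation run repeatable, which is convenient when comparing retrieval effectiveness across detection rates.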
  </Section>
</Paper>