<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1416">
  <Title>Generating Multiple-Choice Test Items from Medical Text: A Pilot Study</Title>
  <Section position="4" start_page="0" end_page="112" type="intro">
    <SectionTitle>
2 Multiple-Choice Test Item Generation
</SectionTitle>
    <Paragraph position="0"> An MCTI (multiple-choice test item) such as the one in example (1) typically consists of a question or stem, the correct answer or anchor (in our example, &quot;chronic hepatitis&quot;) and a list of distractors (options b to d):  (1) Which disease or syndrome may progress to cirrhosis if it is left untreated? a) chronic hepatitis b) hepatic failure c) hepatic encephalopathy d) hypersplenism  The MCTI in (1) is based on the following clause from the source text (called the source clause; see section 2.3 below): (2) Chronic hepatitis may progress to cirrhosis if it is left untreated.</Paragraph>
    <Paragraph position="1"> We aim to automatically generate (1) from (2) using our simple Rapid Item Generation (RIG) system, which combines several off-the-shelf components. Following Mitkov et al., we view MCTIG (MCTI generation) as consisting of at least the following tasks: a) Parsing, b) Key-Term Identification, c) Source Clause Selection, d) Transformation to Stem and e) Distractor Selection. These are discussed in the following sections.</Paragraph>
    <Section position="1" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
2.1 Sentence Parsing
</SectionTitle>
      <Paragraph position="0"> Sentence parsing is crucial for MCTIG since the other tasks rely heavily on the parser's output. RIG employs Charniak's (1997) parser, which appeared to be quite robust in the medical domain.</Paragraph>
    </Section>
    <Section position="2" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
2.2 Key-Term Identification
</SectionTitle>
      <Paragraph position="0"> One of our main premises is that an appropriate MCTI should have a key term as its anchor rather than an irrelevant concept. For instance, the concepts &quot;chronic hepatitis&quot; and &quot;cirrhosis&quot; are quite prominent in the source text that example (2) comes from, which in turn means that MCTIs containing these terms should be generated using appropriate sentences from that text.</Paragraph>
      <Paragraph position="1"> RIG uses the UMLS thesaurus as a domain-specific resource to compute an initial set of potential key terms such as &quot;hepatitis&quot; from the source text. As in Mitkov et al., the initial set is enlarged with NPs that have potential key terms as their heads and satisfy certain regular expressions. This step adds terms such as &quot;acute hepatitis&quot; (which was not included in the version of UMLS utilised by our system) to the set.</Paragraph>
      <Paragraph position="2"> The tf.idf method (which Mitkov et al. did not find particularly effective) is used to promote the 30 most prominent potential key terms within the source text for subsequent processing, ruling out generic terms such as &quot;patient&quot; or &quot;therapy&quot;, which are very frequent within a larger collection of medical texts (our reference corpus).</Paragraph>
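      <Paragraph position="3"> As an illustration only, the tf.idf ranking step described above can be sketched as follows. The function name rank_key_terms, the smoothed idf formula and the toy data are assumptions made for this sketch; the text does not specify the exact tf.idf variant used by RIG.
<![CDATA[
```python
import math
from collections import Counter


def rank_key_terms(source_terms, reference_docs, top_n=30):
    """Rank the candidate terms of one source text by tf.idf.

    source_terms: list of candidate term occurrences found in the source text.
    reference_docs: list of sets, one per reference document, holding its terms.
    """
    tf = Counter(source_terms)
    n_docs = len(reference_docs)
    scores = {}
    for term, freq in tf.items():
        # Document frequency of the term in the reference corpus.
        df = sum(1 for doc in reference_docs if term in doc)
        # Smoothed idf (an assumption of this sketch): terms that occur in
        # most reference documents receive a weight close to zero.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[term] = freq * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Toy data, invented for illustration.
source_terms = ["chronic hepatitis"] * 3 + ["patient"] * 5 + ["cirrhosis"] * 2
reference_docs = [
    {"patient", "therapy"},
    {"patient"},
    {"patient", "cirrhosis"},
    {"therapy"},
]
top_terms = rank_key_terms(source_terms, reference_docs, top_n=2)
```
]]>
 In this toy setting &quot;patient&quot; is the most frequent term in the source text, but it is demoted because it occurs in most of the reference documents.</Paragraph>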
    </Section>
    <Section position="3" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
2.3 Source Clause Selection
</SectionTitle>
      <Paragraph position="0"> Mitkov et al. treat a clause in the source text as eligible for MCTIG if it contains at least one key term, is finite and has an SV(O) structure. They acknowledge, however, that this strategy gives rise to many inappropriate source clauses, which was the case in our domain too.</Paragraph>
      <Paragraph position="1"> To address this problem, we implemented a module which filters out inappropriate structures for MCTIG (see Table 1 for examples). This explains why the number of key terms and MCTIs varies among texts (Table 2).</Paragraph>
      <Paragraph position="2"> A source clause eligible for MCTIG is a finite main clause, taken together with all the subordinate clauses that depend on it, which contains an NP headed by a key term and functioning as a subject or object, provided that it satisfies our filters.</Paragraph>
      <Paragraph position="3"> Experimentation during development showed that our module improves source clause selection by around 30% compared to the baseline approach of Mitkov et al.</Paragraph>
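      <Paragraph position="4"> The eligibility definition given in this section can be sketched as a simple predicate. This is a minimal sketch under stated assumptions: the Clause and NounPhrase classes are hypothetical stand-ins for the parser output, and starts_with_pronoun is an invented example filter, not one of the filters listed in Table 1.
<![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class NounPhrase:
    head: str       # head of the NP
    function: str   # grammatical function, e.g. "subject", "object", "modifier"


@dataclass
class Clause:
    text: str
    is_finite: bool
    is_main: bool
    noun_phrases: list = field(default_factory=list)


def is_eligible(clause, key_terms, filters=()):
    """Return True if the clause meets the eligibility definition."""
    if not (clause.is_finite and clause.is_main):
        return False
    # At least one NP must be headed by a key term and act as subject or object.
    has_key_np = any(
        phrase.head in key_terms and phrase.function in ("subject", "object")
        for phrase in clause.noun_phrases
    )
    if not has_key_np:
        return False
    # Each filter is a predicate that returns True for clauses it rejects.
    return not any(rejects(clause) for rejects in filters)


def starts_with_pronoun(clause):
    # Hypothetical filter invented for this sketch; not taken from the paper.
    words = clause.text.split()
    return bool(words) and words[0].lower() in {"it", "this", "they"}


key_terms = {"hepatitis", "cirrhosis"}
source = Clause(
    "Chronic hepatitis may progress to cirrhosis if it is left untreated.",
    is_finite=True,
    is_main=True,
    noun_phrases=[NounPhrase("hepatitis", "subject")],
)
modifier_only = Clause(
    "Hepatitis patients require monitoring.",
    is_finite=True,
    is_main=True,
    noun_phrases=[NounPhrase("hepatitis", "modifier")],
)
pronoun_initial = Clause(
    "It may cause cirrhosis.",
    is_finite=True,
    is_main=True,
    noun_phrases=[NounPhrase("cirrhosis", "object")],
)
```
]]>
 Passing the filters as a sequence of predicates keeps the structural test separate from the domain-specific exclusions, so filters can be added or removed without touching the core check.</Paragraph>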
    </Section>
    <Section position="4" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
2.4 Transformation to Stem
</SectionTitle>
      <Paragraph position="0"> Once an appropriate source clause is identified, it has to be turned into the stem of an MCTI. This involves removing discourse cues such as &quot;however&quot; and substituting the NP headed by the key term such as &quot;chronic hepatitis&quot; in (1) with a wh-phrase such as &quot;which disease or syndrome&quot;.</Paragraph>
      <Paragraph position="1"> The wh-phrase is headed by the semantic type of the key term, as derived from UMLS.</Paragraph>
      <Paragraph position="2"> RIG utilises a simple transformational component which produces a stem via minimal changes in the ordering of the source clause. The filtering module discussed in the previous section disregards the clauses in which the key term functions as a modifier or adjunct. Additionally, most of the key terms in the eligible source clauses appear in subject position, which in turn means that wh-fronting and inversion are performed in just a handful of cases. The following example, again based on the source clause in (2), is one such case: (3) To which disease or syndrome may chronic hepatitis progress if it is left untreated?</Paragraph>
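      <Paragraph position="3"> For the common case in which the key term is in subject position, the transformation can be sketched with plain string operations. This is a minimal sketch, not the RIG component itself: the function to_stem and the cue list are assumptions, only clause-initial cues are handled, and the wh-fronting and inversion needed for cases such as (3) are not covered.
<![CDATA[
```python
import re

# Illustrative cue list; the cues actually handled by RIG are not given in the text.
DISCOURSE_CUES = ("however", "therefore", "moreover")


def to_stem(source_clause, key_np, semantic_type):
    """Build a stem from a clause whose key NP is in subject position."""
    clause = source_clause.strip().rstrip(".")
    # Remove a clause-initial discourse cue and the comma that may follow it.
    cue_pattern = r"^(?:" + "|".join(DISCOURSE_CUES) + r")\b,?\s*"
    clause = re.sub(cue_pattern, "", clause, flags=re.IGNORECASE)
    wh_phrase = "which " + semantic_type.lower()
    # Replace the first occurrence of the key NP. A function is used as the
    # replacement so that the wh-phrase is inserted literally.
    stem = re.sub(re.escape(key_np), lambda match: wh_phrase, clause,
                  count=1, flags=re.IGNORECASE).strip()
    if not stem:
        raise ValueError("empty source clause")
    return stem[0].upper() + stem[1:] + "?"


stem = to_stem(
    "Chronic hepatitis may progress to cirrhosis if it is left untreated.",
    "chronic hepatitis",
    "Disease or Syndrome",
)
```
]]>
 Applied to the source clause in (2), the sketch yields the stem shown in (1).</Paragraph>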
    </Section>
    <Section position="5" start_page="111" end_page="112" type="sub_section">
      <SectionTitle>
2.5 Selection of Appropriate Distractors
</SectionTitle>
      <Paragraph position="0"> MCTIs aim to test the ability of the student to identify the correct answer among several distractors. An appropriate distractor is a concept semantically close to the anchor which, however, cannot serve as the right answer itself.</Paragraph>
      <Paragraph position="1"> RIG computes a set of potential distractors for a key term using the terms with the same semantic type in UMLS (rather than the WordNet coordinates employed by Mitkov et al.). Then, we apply a simple measure of distributional similarity derived from our reference corpus to select the best-scoring distractors. This strategy means that MCTIs with the same answer feature very similar distractors.</Paragraph>
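      <Paragraph position="2"> The two-stage selection described above (restriction to the anchor's semantic type, then ranking by distributional similarity) can be sketched as follows. The text does not name the similarity measure, so cosine similarity over context-count vectors is an assumption of this sketch, as are the function names and the toy data.
<![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(feature, 0) for feature, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_distractors(anchor, semantic_types, context_vectors, k=3):
    """Pick the k terms of the anchor's semantic type most similar to it."""
    anchor_type = semantic_types[anchor]
    candidates = [
        term for term, sem_type in semantic_types.items()
        if sem_type == anchor_type and term != anchor
    ]
    anchor_vec = context_vectors.get(anchor, {})
    # Most similar first; ties are broken alphabetically for determinism.
    ranked = sorted(
        candidates,
        key=lambda t: (-cosine(anchor_vec, context_vectors.get(t, {})), t),
    )
    return ranked[:k]


# Toy data, invented for illustration.
semantic_types = {
    "chronic hepatitis": "Disease or Syndrome",
    "hepatic failure": "Disease or Syndrome",
    "hypersplenism": "Disease or Syndrome",
    "asthma": "Disease or Syndrome",
    "liver": "Body Part, Organ, or Organ Component",
}
context_vectors = {
    "chronic hepatitis": {"liver": 4, "cirrhosis": 3},
    "hepatic failure": {"liver": 5, "cirrhosis": 2},
    "hypersplenism": {"spleen": 3, "liver": 1},
    "asthma": {"lung": 4},
    "liver": {"liver": 1},
}
distractors = select_distractors("chronic hepatitis", semantic_types, context_vectors, k=2)
```
]]>
 Because the ranking depends only on the anchor, two items with the same anchor receive the same distractors in this sketch, which mirrors the behaviour noted above.</Paragraph>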
    </Section>
  </Section>
</Paper>