<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2037">
  <Title>Automatic Detection of Grammar Elements that Decrease Readability</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The check list of grammar elements
</SectionTitle>
    <Paragraph position="0"> The first component of the readability checker is the check list; in this list, we define every Japanese grammar element and its readability level.</Paragraph>
    <Paragraph position="1"> A grammar element is a grammatical phenomenon that affects readability, and its readability level indicates how familiar the grammar element is.</Paragraph>
    <Paragraph position="2"> In Japanese, grammar elements are classified into four categories.</Paragraph>
    <Paragraph position="3"> 1. Conjugation: the form of a verb or an adjective changes depending on the word that follows it.</Paragraph>
    <Paragraph position="4"> 2. Functional word: postpositional particles work as case markers; auxiliary verbs represent tense and modality.</Paragraph>
    <Paragraph position="5"> 3. Sentential pattern: negation, passive form, and questions are represented as special sentence patterns.</Paragraph>
    <Paragraph position="6"> 4. Functional phrase: some idiomatic phrases work functionally, like &quot;not only ... but also ...&quot; in English.</Paragraph>
    <Paragraph position="7"> The Japanese Language Proficiency Test, which is used to measure and certify the Japanese language ability of non-native speakers, includes a grammar section. There are four levels in this test; Level 4 is the elementary level, and Level 1 is the advanced level.</Paragraph>
    <Paragraph position="8"> Test Content Specifications (TCS) (Foundation and Association of International Education, 1994) is intended to serve as a reference guide for compiling the questions of the Japanese Language Proficiency Test. This book lists the grammar elements that can be tested at each level. These lists fit our purpose: they can be used as the check list for the readability checker.</Paragraph>
    <Paragraph position="9"> TCS describes grammar elements in two ways. In the first way, a grammar element is described as a 3-tuple: its name, its patterns, and its example sentences. The following 3-tuple is an example of a grammar element that belongs to Level 4.</Paragraph>
    <Paragraph position="10"> [...] conjugations, functional words and sentential patterns that are defined in this first way. In the second way, a grammar element is described as a pair of its patterns and its examples. The following pair is an example of a grammar element that belongs to Level [...] functional phrases that are defined in this second way. We decided to use this example-based definition for the check list, because the check list should be independent of the implementation of the detector. If the check list depended on the detector's implementation, a change in the implementation would require a change to the check list.</Paragraph>
    <Paragraph position="11"> Each item of the check list is defined as a 3-tuple: (1) readability level, (2) name, and (3) a list of example pairs. There are four readability levels, following the Japanese Language Proficiency Test. An example pair consists of an example sentence and an instance of the grammar element. It is an implicit description of the pattern that detects the grammar element. For example, the check item for 'Adjective (predicative, negative, polite)' is shown as follows, [...] of three morphemes: (1) /hiroku/, an adjective meaning 'large' in renyo form, (2) /nai/, an adjective meaning 'not' in root form, and (3) /desu/, an auxiliary verb that ends a sentence politely. So, this example pair implicitly indicates that the grammar element can be detected by the pattern &quot;Adjective (in renyo form) + nai + desu&quot;.</Paragraph>
    <Paragraph position="12"> All example sentences are taken from TCS.</Paragraph>
    <Paragraph position="13"> Some check items have several example pairs. Table 1 shows the size of the check list.</Paragraph>
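    <Paragraph> The 3-tuple structure of a check item can be sketched as a small data structure. This is only an illustration: the class names ExamplePair and CheckItem, the validation rules, and the romanized example sentence are assumptions made here and do not come from TCS or from the checker itself.</Paragraph>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExamplePair:
    """An example sentence plus the instance of the grammar element inside it."""
    sentence: str
    instance: str


@dataclass
class CheckItem:
    """One check-list entry: (readability level, name, list of example pairs)."""
    level: int   # 1 (advanced) to 4 (elementary), following the test levels
    name: str
    examples: List[ExamplePair] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in (1, 2, 3, 4):
            raise ValueError("readability level must be 1, 2, 3, or 4")
        for pair in self.examples:
            # Assumed sanity check: the instance occurs literally in its sentence.
            if pair.instance not in pair.sentence:
                raise ValueError("instance not found in example sentence")


# Hypothetical romanized example; the real check list uses Japanese sentences.
item = CheckItem(
    level=4,
    name="Adjective (predicative, negative, polite)",
    examples=[ExamplePair("kono heya wa hiroku nai desu", "hiroku nai desu")],
)
```

    <Paragraph> Keeping only examples in the item, with no pattern field, mirrors the decision to keep the check list independent of the detector's implementation.</Paragraph>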
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The grammar elements detector
</SectionTitle>
    <Paragraph position="0"> The check list must be converted into an explicit rule set, because each item of the check list gives no explicit description of its grammar element; it gives only one or more pairs of an example sentence and an instance.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The explicit rule set
</SectionTitle>
      <Paragraph position="0"> Because grammar elements fall into four categories, each rule of the explicit rule set takes one of three types.</Paragraph>
      <Paragraph position="1"> + Type M: a rule detecting a sequence of morphemes.
 + Type B: a rule detecting a bunsetsu.
 + Type R: a rule detecting a modifier-modifiee relationship.
 Type M is the basic type, because almost all grammar elements can be detected by morphological sequential patterns.</Paragraph>
      <Paragraph position="2"> Conversion from a check item to a Type M rule is almost automatic. This conversion process consists of three steps. First, an example sentence of the check item is analyzed morphologically and syntactically. Second, the sentence fragment covered by the target grammar element is extracted based on signs and fixed strings included in the name of the check item. Third, part of the generated rule is relaxed based on part-of-speech tags. For example, the check item of the grammar element whose name is &quot;Adjective (predicative, negative, polite)&quot; is converted to the following rule.</Paragraph>
      <Paragraph position="4"> The function np() declares the rule, and the function Dm() describes a morphological sequential pattern that matches the target. This example means that this grammar element belongs to Level 4 and can be detected by a pattern consisting of three morphemes.</Paragraph>
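      <Paragraph> The relaxation step and the matching of a morphological sequential pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the KNP rule format: the names Morpheme, relax, and find_matches, the part-of-speech labels, the romanized sentence, and the choice to relax only a leading content word are all hypothetical.</Paragraph>

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Morpheme:
    surface: str   # romanized surface form
    pos: str       # part-of-speech tag
    form: str      # conjugation form, "" when not applicable


# A pattern slot is (surface or None, pos, form or None); None means "any".
Slot = Tuple[Optional[str], str, Optional[str]]


def relax(instance: List[Morpheme]) -> List[Slot]:
    """Assumed relaxation: a leading content word keeps only POS and form;
    every other morpheme keeps its surface and POS."""
    slots: List[Slot] = []
    for i, m in enumerate(instance):
        if i == 0 and m.pos in ("adjective", "verb", "noun"):
            slots.append((None, m.pos, m.form))
        else:
            slots.append((m.surface, m.pos, None))
    return slots


def slot_matches(slot: Slot, m: Morpheme) -> bool:
    surface, pos, form = slot
    return ((surface is None or surface == m.surface)
            and pos == m.pos
            and (form is None or form == m.form))


def find_matches(pattern: List[Slot], sentence: List[Morpheme]) -> List[int]:
    """Return the start index of every contiguous match of the pattern."""
    n = len(pattern)
    return [
        start
        for start in range(len(sentence) - n + 1)
        if all(slot_matches(s, m) for s, m in zip(pattern, sentence[start:start + n]))
    ]


# Instance from the check item: hiroku + nai + desu.
instance = [
    Morpheme("hiroku", "adjective", "renyo"),
    Morpheme("nai", "adjective", "root"),
    Morpheme("desu", "auxiliary verb", ""),
]
pattern = relax(instance)

# Hypothetical sentence with a different adjective: "kono hon wa takaku nai desu".
sentence = [
    Morpheme("kono", "adnominal", ""),
    Morpheme("hon", "noun", ""),
    Morpheme("wa", "particle", ""),
    Morpheme("takaku", "adjective", "renyo"),
    Morpheme("nai", "adjective", "root"),
    Morpheme("desu", "auxiliary verb", ""),
]
matches = find_matches(pattern, sentence)
```

      <Paragraph> Because the first slot no longer fixes the surface form, the pattern derived from one example also matches other adjectives in renyo form followed by nai and desu.</Paragraph>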
      <Paragraph position="5"> Type B rules are used to describe grammar elements, such as conjugations, that include no functional words. They are not generated automatically; they are converted by hand from Type M rules that are generated automatically. For example, the rule detecting the grammar element whose name is &quot;Adjective in Root Form&quot; is defined as follows. np( 4, 'Adjective in Root Form',</Paragraph>
      <Paragraph position="7"> The function Db() describes a pattern that matches a bunsetsu consisting of the specified morphemes. This example means that this grammar element belongs to Level 4, and shows the detection pattern of this grammar element.</Paragraph>
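      <Paragraph> The difference between a bunsetsu-level rule and a morpheme-sequence rule can be sketched as follows. This is an illustration only: it assumes that a bunsetsu-level match requires the whole bunsetsu to consist of exactly the specified morphemes, and the names bunsetsu_matches and detect, the tag labels, and the example are hypothetical.</Paragraph>

```python
from typing import List, Tuple

# A morpheme is modeled as (pos, form); a bunsetsu is a list of morphemes.
Morph = Tuple[str, str]


def bunsetsu_matches(pattern: List[Morph], bunsetsu: List[Morph]) -> bool:
    """True when the bunsetsu consists of exactly the specified morphemes."""
    return bunsetsu == pattern


def detect(pattern: List[Morph], sentence: List[List[Morph]]) -> List[int]:
    """Return the indices of the bunsetsus in the sentence that match."""
    return [i for i, b in enumerate(sentence) if bunsetsu_matches(pattern, b)]


ROOT_ADJECTIVE = [("adjective", "root")]

# Hypothetical analysis of "heya ga hiroi": two bunsetsus.
sentence = [
    [("noun", ""), ("particle", "")],   # "heya ga"
    [("adjective", "root")],            # "hiroi"
]
hits = detect(ROOT_ADJECTIVE, sentence)

# A bunsetsu with a trailing functional word, such as "hiroi desu", is not matched.
polite = [[("adjective", "root"), ("auxiliary verb", "")]]
polite_hits = detect(ROOT_ADJECTIVE, polite)
```

      <Paragraph> Matching on the whole bunsetsu is what excludes cases where the adjective is followed by a functional word, which a plain morpheme-sequence pattern would also match.</Paragraph>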
      <Paragraph position="8"> Type R rules are used to describe grammar elements that include modifier-modifiee relationships. The grammar element whose name is &quot;Verb Modified by Adjective&quot; includes a structure in which an adjective modifies a verb. It is impossible to detect this grammar element by a contiguous morphological pattern, because any number of bunsetsus can be inserted between the adjective and the verb. For such a grammar element, we introduce the function Dk(), which takes two arguments: the former is a modifier and the latter is its modifiee.</Paragraph>
      <Paragraph position="9"> np( 4, 'Verb Modified by Adjective',</Paragraph>
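      <Paragraph> Detection of a modifier-modifiee relationship can be sketched over a dependency analysis in which each bunsetsu records the index of the bunsetsu it modifies. This is a minimal illustration, not the behavior of Dk() itself: the names Bunsetsu and find_relations, the part-of-speech labels, and the romanized example are assumptions.</Paragraph>

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bunsetsu:
    text: str       # romanized text of the bunsetsu
    head_pos: str   # part of speech of its content word
    parent: int     # index of the bunsetsu it modifies, -1 for the root


def find_relations(
    sentence: List[Bunsetsu], modifier_pos: str, modifiee_pos: str
) -> List[Tuple[int, int]]:
    """Return (modifier index, modifiee index) pairs with the given POS tags."""
    pairs = []
    for i, b in enumerate(sentence):
        if b.parent == -1:
            continue  # the root bunsetsu modifies nothing
        target = sentence[b.parent]
        if b.head_pos == modifier_pos and target.head_pos == modifiee_pos:
            pairs.append((i, b.parent))
    return pairs


# Hypothetical analysis of "hayaku gakkou e iku": the adjective "hayaku"
# modifies the verb "iku" even though "gakkou e" stands between them.
sentence = [
    Bunsetsu("hayaku", "adjective", 2),
    Bunsetsu("gakkou e", "noun", 2),
    Bunsetsu("iku", "verb", -1),
]
relations = find_relations(sentence, "adjective", "verb")
```

      <Paragraph> Following the dependency link rather than adjacency is what lets the rule tolerate intervening bunsetsus.</Paragraph>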
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 The architecture of the detector
</SectionTitle>
      <Paragraph position="0"> The architecture of the detector is shown in Figure 1.</Paragraph>
      <Paragraph position="1"> The detector uses a morphological analyzer, Juman, and a syntactic analyzer, KNP (Kurohashi and Nagao, 1994). The rule set is converted into a format that KNP can read and is added to the standard rule set of KNP. This addition enables KNP to detect candidate grammar elements. The 'Detection' part selects the final results from these candidates based on preference information given by the rule set.</Paragraph>
      <Paragraph position="2"> Figure 2 shows the grammar elements detected by our detector in the sentence &quot;chizu [...] nakatta.&quot;, which means &quot;Neither a map nor a rough map was distributed.&quot;</Paragraph>
    </Section>
  </Section>
</Paper>