<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2024">
  <Title>Corpus-Oriented Development of Japanese HPSG Parsers</Title>
  <Section position="4" start_page="139" end_page="141" type="metho">
    <SectionTitle>
3 Grammar Design
</SectionTitle>
    <Paragraph position="0"> First, we provide a brief description of some characteristics of Japanese. Japanese is head final, and phrases are typically headed by function words. Arguments of verbs usually have no fixed order (this phenomenon is called scrambling) and are freely omitted. Arguments' semantic relations to verbs are chiefly determined by their head postpositions.</Paragraph>
    <Paragraph position="1"> For example, 'boku/I ga/NOM kare/he wo/ACC koroshi/kill ta/DECL' (I killed him) can be paraphrased as 'kare wo boku ga koroshi ta,' without changing the meaning.</Paragraph>
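The order-independence of role assignment can be sketched as follows. This is a toy illustration (not part of the paper's grammar): the postposition alone determines the semantic relation, so permuting the arguments leaves the interpretation unchanged.

```python
# Toy mapping from postpositions to semantic roles ("ga" = NOM, "wo" = ACC).
ROLE_OF_POSTPOSITION = {"ga": "agent", "wo": "object"}

def semantic_roles(chunks):
    """Map each 'word postposition' chunk to its role, ignoring order."""
    roles = {}
    for chunk in chunks:
        word, postposition = chunk.split()
        roles[ROLE_OF_POSTPOSITION[postposition]] = word
    return roles

# 'boku ga kare wo koroshi ta' and its scrambled paraphrase
# 'kare wo boku ga koroshi ta' receive the same role assignment.
canonical = semantic_roles(["boku ga", "kare wo"])
scrambled = semantic_roles(["kare wo", "boku ga"])
assert canonical == scrambled == {"agent": "boku", "object": "kare"}
```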
    <Paragraph position="2"> The case alternation phenomenon must also be taken into account. Case alternation is caused by special auxiliaries &amp;quot;(sa)se&amp;quot; and &amp;quot;(ra)re,&amp;quot; which are causative and passive auxiliaries, respectively, and the verbs change their subcategorization behavior when they are combined with these auxiliaries.</Paragraph>
    <Paragraph position="3"> The following sections describe the design of our grammar. In particular, the treatment of the scrambling and case alternation phenomena is described in detail.</Paragraph>
    <Section position="1" start_page="139" end_page="140" type="sub_section">
      <SectionTitle>
3.1 Fundamental Phrase Structures
</SectionTitle>
      <Paragraph position="0"> Figure 2 presents the basic structure of signs of our grammar. The HEAD feature specifies phrasal categories, the MOD feature represents restrictions on the left and right modifiees, and the VAL feature encodes valence information. (For the explanation of the BAR feature, see the description of the promotion schema below.) For some types of phrases, additional features are specified as HEAD features. Now, we provide a detailed explanation of the design of the schemata and how the features in Figure 2 work. The following descriptions are also summarized in Table 1. [Table 1 -- schemata and their daughters: specifier-head: PP or NP + postposition, VP + verbal ending, NP + suffix; complement-head: argument (PP/NP) + verb; compound-noun: NP + NP; modifier-head: modifier + head; head-modifier: phrase + punctuation; promotion: promotes chunks to phrases.]</Paragraph>
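The sign structure of Figure 2 can be sketched as a simple record type. The field names and encoding below are ours, not the paper's implementation; they only illustrate how HEAD, MOD, VAL, and BAR partition the information carried by a constituent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Head:
    category: str                 # phrasal category, e.g. "verb", "postposition"
    pform: Optional[str] = None   # surface string of a postposition, if any

@dataclass
class Sign:
    head: Head
    mod: dict = field(default_factory=dict)   # restrictions on left/right modifiees
    val: dict = field(default_factory=dict)   # valence (specifier, complements, ...)
    bar: str = "chunk"                        # "chunk" or "phrase"

# A postposition sign: accepts an NP specifier, carries PFORM "wo".
wo = Sign(Head("postposition", pform="wo"), val={"spr": "NP"})
assert wo.head.pform == "wo" and wo.bar == "chunk"
```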
      <Paragraph position="1"> specifier-head schema Words are first concatenated by this schema to construct basic word chunks. Postpositional phrases (PPs), which consist of postpositions and preceding phrases, are the most typical example of specifier-head structures. For postpositions, we specify a head feature PFORM, with the postposition's surface string as its value, in addition to the features in Figure 2, because differences among postpositions play a crucial role in disambiguating the semantic structures of Japanese. For example, the postposition 'wo' has a PFORM feature whose value is &amp;quot;wo,&amp;quot; and it accepts an NP as its specifier. As a result, a PP such as &amp;quot;kare wo&amp;quot; inherits the PFORM value &amp;quot;wo&amp;quot; from 'wo.' The schema is also used when VPs are constructed from verbs and their endings (or sometimes auxiliaries; see also Section 3.2).</Paragraph>
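The PFORM percolation described above can be sketched with plain dictionaries. This is our simplification, not the grammar's actual feature logic: the schema checks that the head accepts the specifier and percolates the head's features upward.

```python
def specifier_head(specifier, head):
    """Combine a specifier with its head; head features percolate to the mother."""
    assert head["spr"] == specifier["cat"], "specifier not accepted by this head"
    return {"cat": head["cat"], "pform": head.get("pform")}

kare = {"cat": "NP"}
wo = {"cat": "PP", "pform": "wo", "spr": "NP"}

# The PP "kare wo" inherits PFORM "wo" from the postposition.
kare_wo = specifier_head(kare, wo)
assert kare_wo == {"cat": "PP", "pform": "wo"}
```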
      <Paragraph position="2"> complement-head schema This schema is used for combining VPs with their subcategorized arguments (see Section 3.2 for details).</Paragraph>
      <Paragraph position="3"> compound-noun schema Because nouns can be freely concatenated to form compound nouns, a special schema is used for compound nouns.</Paragraph>
      <Paragraph position="4"> modifier-head schema This schema is for modifiers and their heads. Binary structures that cannot be captured by the above three schemata are also considered to be modifier-head structures.</Paragraph>
      <Paragraph position="5"> head-modifier schema This schema is used when the modifier-head schema is not appropriate. In the current implementation, it is used for a phrase and its following punctuation.</Paragraph>
      <Paragraph position="6"> promotion schema This unary schema changes the value of the BAR feature from chunk to phrase. The distinction between these two types of constituents prohibits certain kinds of spurious ambiguity. For example, 'kinou/yesterday koroshi/kill ta/DECL' can be analyzed in two different ways, i.e. '(kinou (koroshi ta))' and '((kinou koroshi) ta).' The latter analysis is prevented by restricting &amp;quot;kinou&amp;quot;'s modifiee to be a phrase and &amp;quot;ta&amp;quot;'s specifier to be a chunk, and by assuming &amp;quot;koroshi&amp;quot; to be a chunk.</Paragraph>
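The way the BAR feature rules out the spurious analysis can be sketched with a few boolean checks. The encoding is a toy of ours: "kinou" modifies only phrases, "ta" takes only chunk specifiers, "koroshi" starts as a chunk, and the promotion schema is the only way to turn a chunk into a phrase.

```python
KINOU_MODIFIEE_BAR = "phrase"   # "kinou" may modify phrases only
TA_SPECIFIER_BAR = "chunk"      # "ta" takes a chunk specifier

def modifier_head(modifiee_bar):
    return modifiee_bar == KINOU_MODIFIEE_BAR

def spec_head(spr_bar):
    return spr_bar == TA_SPECIFIER_BAR

def promote(bar):
    # The unary promotion schema turns a chunk into a phrase.
    return "phrase" if bar == "chunk" else bar

koroshi_bar = "chunk"
# '(kinou (koroshi ta))': "koroshi" (chunk) combines with "ta",
# the result is promoted, then "kinou" modifies the phrase -- licensed.
assert spec_head(koroshi_bar)
assert modifier_head(promote(koroshi_bar))
# '((kinou koroshi) ta)': "kinou" cannot modify the chunk "koroshi".
assert not modifier_head(koroshi_bar)
```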
    </Section>
    <Section position="2" start_page="140" end_page="141" type="sub_section">
      <SectionTitle>
3.2 Scrambling and Case Alternation
</SectionTitle>
      <Paragraph position="0"> Scrambling causes problems in designing a Japanese HPSG grammar, because original HPSG, designed for English, specifies the subcategorization frame of a verb as an ordered list, and the semantic roles of arguments are determined by their order in the complement list.</Paragraph>
      <Paragraph position="1"> Our implementation treats the complement feature as a list of semantic roles. The semantic roles for which verbs subcategorize are agent, object, and goal. Correspondingly, we assume three subtypes of the complement-head schema: the agent-head, object-head, and goal-head schemata. When verbs take their arguments, the arguments receive semantic roles which are permitted by the subcategorization of the verbal signs. We do not restrict the order of application of the three types of complement-head schemata, so that a single verbal lexical entry can accept arguments that are scrambled in arbitrary order. In Figure 3, &amp;quot;kare ga&amp;quot; is a ga-marked PP, so it is analyzed as an agent of &amp;quot;koro(su)&amp;quot;: without &amp;quot;(sa)re,&amp;quot; the verb takes a &amp;quot;ga&amp;quot;-marked PP as an agent and a &amp;quot;wo&amp;quot;-marked PP as an object.</Paragraph>
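The unordered application of the complement-head schemata can be sketched by treating COMPS as a set of roles. This toy encoding is ours: each application saturates whichever role the PP's postposition licenses, so one lexical entry covers any argument order.

```python
ROLE_OF_POSTPOSITION = {"ga": "agent", "wo": "object", "ni": "goal"}

def complement_head(verb, pp_postposition):
    """Saturate one subcategorized role of the verb with a PP, in any order."""
    role = ROLE_OF_POSTPOSITION[pp_postposition]
    assert role in verb["comps"], "role not subcategorized by this verb"
    return {"comps": verb["comps"] - {role}}

korosu = {"comps": {"agent", "object"}}

# Scrambled 'kare wo boku ga korosu': the object is saturated before
# the agent, using the same lexical entry.
vp = complement_head(korosu, "wo")
s = complement_head(vp, "ga")
assert s["comps"] == set()
```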
      <Paragraph position="2"> We consider auxiliaries as a special type of verbs which do not have their own subcategorization frames. They inherit the subcategorization frames of verbs. (The control phenomena caused by auxiliaries are currently unsupported in our grammar.) To capture the case alternation phenomenon, each verb has distinct lexical entries for its passive and causative uses. This distinction is made by binary valued HEAD features, PASSIVE and CAUSATIVE. The passive (causative) auxiliary restricts the value of its specifier's PASSIVE (CAUSATIVE) feature to be plus, so that it can only be combined with properly case-alternated verbal lexical entries.</Paragraph>
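The interaction of the PASSIVE feature with the auxiliary can be sketched as follows. The encoding (and the particular case marking of the passive entry) is ours, for illustration only: the verb carries distinct active and passive entries, and "(ra)re" refuses any entry whose PASSIVE value is not plus.

```python
# Two lexical entries for the same verb; the passive one is case-alternated.
korosu_active = {"passive": False, "comps": {"agent": "ga", "object": "wo"}}
korosu_passive = {"passive": True, "comps": {"agent": "ni", "object": "ga"}}

def apply_rare(verb):
    """The passive auxiliary '(ra)re' requires its specifier's PASSIVE to be plus;
    it has no subcategorization frame of its own and inherits the verb's."""
    if not verb["passive"]:
        raise ValueError("'(ra)re' needs a case-alternated (PASSIVE plus) entry")
    return {"comps": verb["comps"]}

assert apply_rare(korosu_passive)["comps"]["object"] == "ga"
try:
    apply_rare(korosu_active)          # rejected: PASSIVE is minus
except ValueError:
    pass
```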
      <Paragraph position="3"> Figure 4 presents the lexical sign of the passive auxiliary &amp;quot;(ra)re.&amp;quot; Our analysis of an example sentence is presented in Figure 5. Note that the passive auxiliary &amp;quot;re(ta)&amp;quot; requires the value of the PASSIVE feature of its specifier to be plus, and hence &amp;quot;koro(sa)&amp;quot; cannot take the same lexical entry as in Figure 3.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="141" end_page="142" type="metho">
    <SectionTitle>
4 Grammar Extraction from EDR
</SectionTitle>
    <Paragraph position="0"> The EDR Japanese corpus consists of 207,802 sentences, mainly from newspapers and magazines.</Paragraph>
    <Paragraph position="1"> The annotation of the corpus includes word segmentation, part-of-speech (POS) tags, phrase structure annotation, and semantic information.</Paragraph>
    <Paragraph position="3"> The heuristic conversion of the EDR corpus into an HPSG treebank consists of the following steps. A sentence '((kare/NP-he wo/PP-ACC) (koro/VP-kill shi/VP-ENDING ta/VP-DECL))' ([I] killed him) is used to provide examples in some steps. Phrase type annotation Phrase type labels such as NP and VP are assigned to non-terminal nodes.</Paragraph>
    <Paragraph position="4"> Because Japanese is head final, the label of the right-most daughter of a phrase is usually percolated to its parent. After this step, the example sentence will be '((PP kare/NP wo/PP) (VP koro/VP shi/VP ta/VP)).' Assign head features The types of head features of terminal nodes are determined, chiefly from their phrase types. Features specific to some categories, such as PFORM, are also assigned in this step.</Paragraph>
    <Paragraph position="5"> Binarization Phrases for which EDR employs flat annotation are converted into binary structures. The binarized phrase structure of the example sentence will be '((kare wo) ((koro shi) ta)).' Assign schema names Schema names are assigned according to the patterns of phrase structures. For instance, a phrase structure which consists of PP and VP is identified as a complement-head structure, if the VP's argument and the PP are coindexed. In the example sentence, 'kare wo' is annotated as &amp;quot;koro&amp;quot;'s object in EDR, so the object-head schema is applied to the root node of the derivation.</Paragraph>
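The binarization step can be sketched in a few lines. Under the assumption (consistent with the example '((koro shi) ta)') that flat head-final phrases are binarized left-branching, each daughter is folded onto the structure built so far:

```python
from functools import reduce

def binarize(daughters):
    """Left-branching binarization of a flat phrase, as nested pairs."""
    return reduce(lambda left, right: (left, right), daughters)

# The flat VP '(koro shi ta)' becomes '((koro shi) ta)';
# already-binary phrases like '(kare wo)' are unaffected.
assert binarize(["koro", "shi", "ta"]) == (("koro", "shi"), "ta")
assert binarize(["kare", "wo"]) == ("kare", "wo")
```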
    <Paragraph position="6"> Inverse schema application The consistency of the derivation of the obtained HPSG treebank is verified by applying the schemata to each node of the derivation trees in the treebank.</Paragraph>
    <Paragraph position="7"> Lexicon Extraction Lexical entries are extracted from the terminal nodes of the obtained treebank.</Paragraph>
  </Section>
  <Section position="6" start_page="142" end_page="142" type="metho">
    <SectionTitle>
5 Disambiguation Model
</SectionTitle>
    <Paragraph position="0"> We also train disambiguation models for the grammar using the obtained treebank. We employ log-linear models (Berger et al., 1996) for the disambiguation. The probability of a parse t of a sentence s is defined as p(t|s) = (1/Z_s) exp(sum_i lambda_i f_i(t)), where Z_s = sum_{t'} exp(sum_i lambda_i f_i(t')), the f_i are feature functions, the lambda_i are strengths of the feature functions, and t' spans all possible parses of s. We employ Gaussian MAP estimation (Chen and Rosenfeld, 1999) as a criterion for optimizing the lambda_i. An algorithm proposed by Miyao et al. (2002) provides an efficient solution to this optimization problem.</Paragraph>
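The log-linear model can be sketched directly. The feature functions and weights below are invented for illustration (counts of schema applications in a parse); only the functional form p(t|s) ∝ exp(Σᵢ λᵢ fᵢ(t)) comes from the text.

```python
import math

def parse_probability(parse, all_parses, weights, features):
    """p(t|s) = exp(sum_i w_i * f_i(t)) normalized over all parses of s."""
    def score(t):
        return math.exp(sum(w * f(t) for w, f in zip(weights, features)))
    return score(parse) / sum(score(t) for t in all_parses)

# Hypothetical features: counts of schema names appearing in a parse.
features = [lambda t: t.count("complement-head"),
            lambda t: t.count("modifier-head")]
weights = [0.5, -0.2]
parses = [["complement-head", "specifier-head"],
          ["modifier-head", "specifier-head"]]

p0 = parse_probability(parses[0], parses, weights, features)
p1 = parse_probability(parses[1], parses, weights, features)
assert abs(p0 + p1 - 1.0) < 1e-9 and p0 > p1
```

In practice the normalization term Z_s ranges over the full parse forest, which is why the dynamic-programming algorithm of Miyao et al. (2002) is needed for efficient estimation.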
  </Section>
</Paper>