File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/p98-2144_metho.xml

Size: 6,963 bytes

Last Modified: 2025-10-06 14:15:04

<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2144">
  <Title>HPSG-Style Underspecified Japanese Grammar with Wide Coverage</Title>
  <Section position="4" start_page="876" end_page="878" type="metho">
    <SectionTitle>
3 Refinement of our Grammar
</SectionTitle>
    <Paragraph position="0"> Our goal in this section is to improve accuracy without losing coverage. Constraints that improve accuracy can also be represented as TFSs and added to the original grammar components, such as ID schemata, LEs, and LETs.</Paragraph>
    <Paragraph position="1"> The basic idea is that including descriptions of rare linguistic phenomena can make it harder for our system to choose the correct analysis. We therefore abandon the treatment of some rare phenomena. This approach is not always linguistically justified, but it is practical for real-world corpora.</Paragraph>
    <Paragraph position="2"> In this section, we consider some frequent linguistic phenomena and explain how we discarded the treatment of rare phenomena in favor of frequent ones in three cases: (i) the postposition 'wa', (ii) relative clauses and commas, and (iii) nominal suffixes representing time. We abandon the treatment of rare phenomena by introducing additional constraints in feature structures. For (i) and (ii), we introduce 'pseudo-principles', which are unified with ID schemata in the same way as principles are. For (iii), we add feature structures to LEs/LETs.</Paragraph>
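To make "unified with ID schemata" concrete, here is a minimal sketch that models feature structures as untyped nested Python dicts. A pseudo-principle is just another feature structure, and an analysis is pruned when unifying it with a schema instance fails. Types, structure sharing, and the LE/LET machinery of the actual grammar are omitted, and the feature paths in the usage lines (borrowed from the relative-clause pseudo-principle of Section 3.2) are illustrative assumptions, not the authors' implementation.

```python
def unify(a, b):
    """Return the unification of feature structures a and b, or None on a clash.

    Sketch only: feature structures are nested dicts whose atomic values are
    strings such as "+" or "-". Types and structure sharing are ignored.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, b_val in b.items():
            if feat in result:
                merged = unify(result[feat], b_val)
                if merged is None:
                    return None  # a clash below this feature fails the whole unification
                result[feat] = merged
            else:
                result[feat] = b_val
        return result
    # Atomic values (or an atom against a dict) unify only if identical.
    return a if a == b else None


# Hypothetical schema instance: the non-head daughter contains a comma.
schema_instance = {"DTRS": {"NH-DTR": {"TOUTEN": "+"}}}
# Hypothetical encoding of a pseudo-principle as a plain constraint.
pseudo_principle = {"DTRS": {"NH-DTR": {"TOUTEN": "-"}}}

print(unify(schema_instance, pseudo_principle))  # None: this analysis is pruned
print(unify({"DTRS": {"NH-DTR": {"TOUTEN": "-", "WA": "+"}}}, pseudo_principle))
# {'DTRS': {'NH-DTR': {'TOUTEN': '-', 'WA': '+'}}}
```

Treating a failed unification as `None` mirrors the idea that an added constraint never creates analyses; it can only remove those that clash with it.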
    <Section position="1" start_page="876" end_page="877" type="sub_section">
      <SectionTitle>
3.1 Postposition 'Wa'
</SectionTitle>
      <Paragraph position="0"> The main usage of the postposition 'wa' falls into the following two patterns:
[Figure 1: parse trees (a), (b)*, (c), (d)*]
(2) Taro wa iku ga Jiro wa ika nai.
    Taro -TOPIC go but Jiro -TOPIC go -NEG
    'Though Taro goes, Jiro does not go.'
(3) Tokai wa hito ga ookute sawagashii.
    city -TOPIC people -SUBJ many noisy
    'A city is noisy because there are many people.'
Although there are exceptions to the above patterns (e.g., Sentence (4) &amp; Figure 2), they are rarely observed in real-world corpora. Thus, we abandon their treatment.</Paragraph>
      <Paragraph position="1"> (4) Ude wa nai ga, konjo ga aru.
    ability -TOPIC missing but guts -SUBJ exist
    'Though he does not have ability, he has guts.'
To capture this behavior of 'wa', we introduced two binary features, WA and P_WA:
WA (+/-): the phrase contains a 'wa' (+) or no 'wa' (-).</Paragraph>
      <Paragraph position="2"> P_WA (+/-): the PP is (+) or is not (-) marked by 'wa'.</Paragraph>
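The two feature definitions can be read as predicates over a phrase. The sketch below assumes a toy tree representation (a hypothetical `Node` class holding a label and either children or a single word) and computes WA by checking whether any word in the phrase is 'wa', and P_WA by checking whether a PP's final postposition is 'wa' (Japanese PPs are head-final). In the grammar itself these values live in feature structures; this only illustrates the definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node: a label plus children, or a leaf word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

    def words(self) -> List[str]:
        # Collect the leaf words of this phrase from left to right.
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def wa(node: Node) -> str:
    """WA: '+' if the phrase contains a 'wa', '-' otherwise."""
    return "+" if "wa" in node.words() else "-"


def p_wa(node: Node) -> str:
    """P_WA: '+' if the PP is marked by 'wa' (assumed head-final), '-' otherwise."""
    if node.label != "PP":
        raise ValueError("P_WA is defined only for PPs")
    return "+" if node.words()[-1] == "wa" else "-"


# 'Tokai wa' (city -TOPIC) and 'sawagashii' (noisy) from Sentence (3);
# the labels "N", "P", "A" are illustrative.
tokai_wa = Node("PP", [Node("N", word="Tokai"), Node("P", word="wa")])
sawagashii = Node("A", word="sawagashii")
print(wa(tokai_wa), p_wa(tokai_wa), wa(sawagashii))  # + + -
```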
      <Paragraph position="3"> We then introduced a 'pseudo-principle' for 'wa' in disjunctive form, as follows:</Paragraph>
      <Paragraph position="5"> (B) When applying the head-modifier schema, also apply wa_hm, defined by the clauses wa_hm(-, -), wa_hm(-, +), and wa_hm(+, +) (and so on).</Paragraph>
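The disjunctive pseudo-principle (B) amounts to a relation over pairs of +/- values. A minimal sketch, assuming that the first argument of wa_hm is the WA value of the modifier and the second the WA value of the head (the order suggested by the Figure 1(d) walkthrough), enumerates the three listed clauses and rejects the remaining combination.

```python
# (modifier WA, head WA) pairs licensed by pseudo-principle (B).
# The argument order is an assumption inferred from the Figure 1(d) walkthrough.
WA_HM = {("-", "-"), ("-", "+"), ("+", "+")}


def wa_hm(modifier_wa: str, head_wa: str) -> bool:
    """True if a head-modifier combination satisfies clause set (B)."""
    return (modifier_wa, head_wa) in WA_HM


# Print the outcome for every combination of values.
for m in "+-":
    for h in "+-":
        print(f"wa_hm({m}, {h}):", "ok" if wa_hm(m, h) else "fails")
```

Only wa_hm(+, -) fails: under this reading, a modifier containing 'wa' cannot attach to a head that contains none.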
      <Paragraph position="6"> This treatment prunes parse trees like those in Figure 1(b, d), as follows.
Figure 1(b):
1) At the marked node, the head-complement schema should be applied, so (A) of the 'pseudo-principle' should also be applied.
2) Since the phrase 'iku kedo ashita wa ika nai' contains a 'wa', its WA value is +.
3) Since the PP 'Kyou wa' is marked by 'wa', its P_WA value is +.
4) wa_hc therefore fails for these values.</Paragraph>
      <Paragraph position="7"> Figure 1(d):
1) At the marked node, the head-modifier schema should be applied, so (B) of the 'pseudo-principle' should also be applied.
2) Since the phrase 'Tokai wa hito ga ookute' contains a 'wa', its WA value is +.</Paragraph>
      <Paragraph position="8"> 3) Since the phrase 'sawagashii' contains no 'wa', its WA value is -.</Paragraph>
      <Paragraph position="9"> 4) wa_hm(+, -) therefore fails.</Paragraph>
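Pruning can be pictured as filtering candidate head-modifier attachments through the wa_hm relation. The sketch below scores two hypothetical attachments for Sentence (3): the Figure 1(d) configuration, where 'Tokai wa hito ga ookute' (WA +) modifies 'sawagashii' (WA -), and an alternative where only 'hito ga ookute' modifies it. The candidate list, the whitespace tokenization, and the argument order of wa_hm are all assumptions for illustration.

```python
from typing import List, Tuple

# (modifier WA, head WA) pairs licensed by pseudo-principle (B).
# The argument order is an assumption inferred from the Figure 1(d) walkthrough.
WA_HM = {("-", "-"), ("-", "+"), ("+", "+")}


def wa_value(words: List[str]) -> str:
    """WA: '+' if the phrase contains a 'wa', '-' otherwise."""
    return "+" if "wa" in words else "-"


def prune(candidates: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (modifier, head) attachments whose WA values satisfy wa_hm."""
    kept = []
    for modifier, head in candidates:
        pair = (wa_value(modifier.split()), wa_value(head.split()))
        if pair in WA_HM:
            kept.append((modifier, head))
    return kept


# Hypothetical head-modifier attachments for Sentence (3).
candidates = [
    ("Tokai wa hito ga ookute", "sawagashii"),  # Figure 1(d): (+, -)
    ("hito ga ookute", "sawagashii"),           # (-, -)
]
print(prune(candidates))  # [('hito ga ookute', 'sawagashii')]
```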
    </Section>
    <Section position="2" start_page="877" end_page="878" type="sub_section">
      <SectionTitle>
3.2 Relative Clauses and Commas
</SectionTitle>
      <Paragraph position="0"> Relative clauses tend not to contain commas. In Sentence (5), the PP 'Nippon de,' is a complement of the main verb 'atta', not of 'umareta' in the relative clause (Figure 3(a)); if the comma after 'de' were removed, however, 'Nippon de' would preferably be analyzed as a complement of 'umareta' (Figure 3(b)). We therefore abandon the treatment of relative clauses containing a comma.</Paragraph>
      <Paragraph position="2"> [Figure 3(b): correct parse tree for Sentence (5) with the comma removed]</Paragraph>
      <Paragraph position="3"> (5) Nippon de, saikin umareta akachan ni atta.
    Japan -LOC recently be-born-PAST baby -GOAL meet-PAST
    'In Japan, I met a baby who was born recently.'</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="878" end_page="878" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> To treat this tendency of relative clauses, we first introduced the TOUTEN feature (see footnote 7), a binary feature whose value is + if the phrase contains a comma and - if it contains none.</Paragraph>
    <Paragraph position="1"> We then introduced a 'pseudo-principle' for relative clauses as follows: (A) When applying the head-relative schema, also apply [DTRS|NH-DTR|TOUTEN -]. (B) When applying other ID schemata, this pseudo-principle has no effect.</Paragraph>
    <Paragraph position="2"> This ensures that no parse tree can be produced in which a relative clause contains a comma.</Paragraph>
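A minimal sketch of the relative-clause pseudo-principle, assuming phrases are plain strings in which either an ASCII comma (as in the romanized examples) or the Japanese touten character marks a comma. TOUTEN is computed on the non-head daughter, and only the head-relative schema checks it. The schema names and the string representation are illustrative assumptions, not the grammar's actual data structures.

```python
TOUTEN_CHARS = {",", "\u3001"}  # ASCII comma and the Japanese touten


def touten(phrase: str) -> str:
    """TOUTEN: '+' if the phrase contains a comma, '-' otherwise."""
    return "+" if any(ch in TOUTEN_CHARS for ch in phrase) else "-"


def relative_pseudo_principle(schema: str, non_head_dtr: str) -> bool:
    """(A) Under the head-relative schema, require DTRS|NH-DTR|TOUTEN to be '-'.
    (B) Under any other ID schema, impose no constraint.
    Schema names here are hypothetical labels."""
    if schema == "head-relative":
        return touten(non_head_dtr) == "-"
    return True


# Sentence (5): may 'Nippon de, saikin umareta' modify 'akachan' as a relative clause?
print(relative_pseudo_principle("head-relative", "Nippon de, saikin umareta"))  # False
print(relative_pseudo_principle("head-relative", "saikin umareta"))             # True
print(relative_pseudo_principle("head-complement", "Nippon de,"))               # True
```

Under this filter, the comma in Sentence (5) blocks the reading in which 'Nippon de,' sits inside the relative clause, while leaving other schemata untouched.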
    <Section position="1" start_page="878" end_page="878" type="sub_section">
      <SectionTitle>
3.3 Nominal Suffixes Representing Time and Commas
</SectionTitle>
      <Paragraph position="0"> Noun phrases (NPs) with nominal suffixes such as nen (year), gatsu (month), and ji (hour) represent information about time. Such NPs are sometimes used adverbially rather than nominally. In particular, NPs with such a suffix followed by a comma are often used adverbially (Sentence (6) &amp; Figure 4(a)), whereas ordinary NPs followed by a comma are used in coordinate structures (Sentence (7) &amp; Figure 4(b)).</Paragraph>
      <Paragraph position="1"> (6) 1995 nen, jishin ga okita.
    1995 year earthquake -SUBJ occur-PAST
    'An earthquake occurred in 1995.'</Paragraph>
      <Paragraph position="2"> Footnote 7: A touten is a comma in Japanese.</Paragraph>
      <Paragraph position="3"> [Figure 4: parse trees (a) and (b) for Sentences (6) and (7), respectively]
(7) Kyoto, Nara ni itta.
    Kyoto Nara -GOAL go-PAST
    'I went to Kyoto and Nara.'</Paragraph>
      <Paragraph position="4"> To restrict NPs with a nominal time suffix followed by a comma to adverbial use only, we added the following constraint to the LE of the comma that constructs a coordinate structure:</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="878" end_page="878" type="metho">
    <SectionTitle>
[MARK|SYN|LOCAL|N-SUFFIX -]
</SectionTitle>
    <Paragraph position="0"> This prohibits an NP with a nominal suffix from being marked by a comma for coordination.</Paragraph>
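A minimal sketch of the constraint on the coordinating comma, assuming each NP's N-SUFFIX value is '+' exactly when the NP ends in a nominal time suffix. The suffix set contains only the three examples from the text, and the `NP` class and function name are hypothetical simplifications. With the constraint in place, the coordinate analysis of '1995 nen,' is blocked, while 'Kyoto,' can still be coordinated.

```python
from dataclasses import dataclass

# Only the three suffixes mentioned in the text; a real lexicon would list more.
TIME_SUFFIXES = {"nen", "gatsu", "ji"}  # year, month, hour


@dataclass
class NP:
    """Hypothetical NP represented by its words."""
    words: tuple

    @property
    def n_suffix(self) -> str:
        # Assumption: '+' if the NP ends in a nominal time suffix, '-' otherwise.
        return "+" if self.words[-1] in TIME_SUFFIXES else "-"


def coordinating_comma_accepts(marked: NP) -> bool:
    """Constraint on the coordinating comma: MARK|SYN|LOCAL|N-SUFFIX must be '-'."""
    return marked.n_suffix == "-"


print(coordinating_comma_accepts(NP(("1995", "nen"))))  # False: no coordinate reading
print(coordinating_comma_accepts(NP(("Kyoto",))))       # True: 'Kyoto, Nara' is allowed
```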
  </Section>
</Paper>