<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1503">
  <Title>The Parallel Grammar Project</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Project History
</SectionTitle>
    <Paragraph position="0"> The ParGram project began in 1994 with three languages: English, French, and German. The grammar writers worked closely together to solidify the grammatical analyses and conventions. In addition, as XLE was still in development, its abilities grew as the size of the grammars and their needs grew.</Paragraph>
    <Paragraph position="1"> After the initial stage of the project, more languages were added. Because Japanese is typologically very different from the initial three European languages of the project, it represented a challenging case. Despite this typological challenge, the Japanese grammar has achieved broad coverage and high performance within a year and a half. The South Asian language Urdu also provides a widely spoken, typologically distinct language. Although it is of Indo-European origin, it shares many characteristics with Japanese such as verb-finality, relatively free word order, complex predicates, and the ability to drop any argument (rampant pro-drop). Norwegian assumes a typological middle position between German and English, sharing different properties with each of them. Both the Urdu and the Norwegian grammars are still relatively small.</Paragraph>
    <Paragraph position="2"> Each grammar project has different goals, and each site employs grammar writers with different backgrounds and skills. The English, German, and Japanese projects have pursued the goal of having broad coverage, industrial grammars. The Norwegian and Urdu grammars are smaller scale but are experimenting with incorporating different kinds of information into the grammar. The Norwegian grammar includes a semantic projection; their analyses produce not only c- and f-structures, but also semantic structures. The Urdu grammar has implemented a level of argument structure and is testing various theoretical linguistic ideas. However, even when the grammars are used for different purposes and have different additional features, they have maintained their basic parallelism in analysis and have profited from the shared grammar writing techniques and technology.</Paragraph>
    <Paragraph position="3"> Table (1) shows the size of the grammars. The first figure is the number of left-hand side categories in phrase-structure rules which compile into a collection of finite-state machines with the listed number of states and arcs.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Parallelism
</SectionTitle>
    <Paragraph position="0"> Maintaining parallelism in grammars being developed at different sites on typologically distinct languages by grammar writers from different linguistic traditions has proven successful. At project meetings held twice a year, analyses of sample sentences are compared and any differences are discussed; the goal is to determine whether the differences are justified or whether the analyses should be changed to maintain parallelism. In addition, all of the f-structure features and their values are compared; this not only ensures that trivial differences in naming conventions do not arise, but also gives an overview of the constructions each language covers and how they are analyzed. All changes are implemented before the next project meeting. Each meeting also involves discussion of constructions whose analysis has not yet been settled on, e.g., the analysis of partitives or proper names. If an analysis is agreed upon, all the grammars implement it; if only a tentative analysis is found, one grammar implements it and reports on its success. For extremely complicated or fundamental issues, e.g., how to represent predicate alternations, subcommittees examine the issue and report on it at the next meeting. The discussion of such issues may be reopened at successive meetings until a consensus is reached.</Paragraph>
    <Paragraph position="1"> Even within a given linguistic formalism, LFG for ParGram, there is usually more than one way to analyze a construction. Moreover, the same theoretical analysis may have different possible implementations in XLE. These solutions often differ in efficiency or conceptual simplicity and one of the tasks within the ParGram project is to make design decisions which favor one theoretical analysis and concomitant implementation over another.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Parallel Analyses
</SectionTitle>
      <Paragraph position="0"> Whenever possible, the ParGram grammars choose the same analysis and the same technical solution for equivalent constructions. This was done, for example, with imperatives. Imperatives are always assigned a null pronominal subject within the f-structure and a feature indicating that they are imperatives, as in (2).</Paragraph>
      <Paragraph position="1">  (2) a. Jump! Saute! (French) Spring! (German) Tobe! (Japanese) Hopp! (Norwegian) kuudoo! (Urdu) b. PRED jump SUBJ</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUBJ PRED pro
STMT-TYPE imp
</SectionTitle>
    <Paragraph position="0"> Another example of this type comes from the analysis of specifiers. Specifiers include many different types of information and hence can be analyzed in a number of ways. In the ParGram analysis, the c-structure analysis is left relatively free according to language particular needs and slightly varying theoretical assumptions. For instance, the Norwegian grammar, unlike the other grammars, implements the principles in (Bresnan, 2001) concerning the relationship between an X′-based c-structure and the f-structure. This allows Norwegian specifiers to be analyzed as functional heads of DPs etc., whereas they are constituents of NPs in the other grammars. However, at the level of f-structure, this information is part of a complex SPEC feature in all the grammars. Thus parallelism is maintained at the level of f-structure even across different theoretical preferences. An example is shown in (3) for Norwegian and English in which the SPEC consists of a QUANT(ifier) and a POSS(essive) (SPEC can also contain information about DETerminers and DEMONstratives).</Paragraph>
    <Paragraph position="1">  (3) a. alle mine hester (Norwegian)  Interrogatives provide an interesting example because they differ significantly in the c-structures of the languages, but have the same basic f-structure.</Paragraph>
    <Paragraph position="2"> This contrast can be seen between the German example in (4) and the Urdu one in (5). In German, the interrogative word is in first position with the finite verb second; English and Norwegian pattern like German. In Urdu the verb is usually in final position, but the interrogative can appear in a number of positions, including following the verb (5c).</Paragraph>
    <Paragraph position="3">  (4) Was hat John Maria gegeben? (German) what has John Maria give.PerfP 'What did John give to Mary?' (5) a. jon=nee marii=koo kyaa diiyaa? (Urdu) John=Erg Mary=Dat what gave 'What did John give to Mary?' b. jon=nee kyaa marii=koo diiyaa? c. jon=nee marii=ko diiyaa kyaa? Despite these differences in word order and hence in c-structure, the f-structures are parallel, with the interrogative being in a FOCUS-INT and the sentence having an interrogative STMT-TYPE, as in (6).</Paragraph>
    <Paragraph position="4"> (6) PRED give SUBJ,OBJ,OBL FOCUS-INT PRED pro PRON-TYPE int</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUBJ PRED John
OBJ [ ]
OBL PRED Mary
STMT-TYPE int
</SectionTitle>
    <Paragraph position="0"> In the project grammars, many basic constructions are of this type. However, as we will see in the next section, there are times when parallelism is not possible and not desirable. Even in these cases, though, the grammars which can be parallel are; so, three of the languages might have one analysis, while three have another.</Paragraph>
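    <Paragraph position="1"> The claim that (4) and (5) share the f-structure in (6) despite different word orders can be pictured with a minimal sketch. It assumes f-structures are modeled as nested Python dicts, which is not XLE's own representation. The function names skeleton and is_parallel are hypothetical, and the PRED lemmas geben and dee are illustrative placeholders rather than the values used in the project grammars.

```python
# Illustrative sketch only: f-structures as nested dicts (not XLE's format).

def skeleton(fs):
    """Return an f-structure with every PRED value masked.

    PRED values are language-specific lexical items, so parallelism is
    checked on everything else: attribute names and non-PRED atomic values.
    """
    if not isinstance(fs, dict):
        return fs
    return {attr: ("*" if attr == "PRED" else skeleton(val))
            for attr, val in fs.items()}


def is_parallel(fs_a, fs_b):
    """True if two f-structures share attributes and non-PRED values."""
    return skeleton(fs_a) == skeleton(fs_b)


def interrogative(pred, subj, obl):
    """Build the f-structure in (6); OBJ shares its value with FOCUS-INT."""
    wh = {"PRED": "pro", "PRON-TYPE": "int"}
    return {
        "PRED": pred,
        "FOCUS-INT": wh,
        "SUBJ": {"PRED": subj},
        "OBJ": wh,
        "OBL": {"PRED": obl},
        "STMT-TYPE": "int",
    }


german = interrogative("geben", "John", "Maria")  # (4); lemma is assumed
urdu = interrogative("dee", "jon", "marii")       # (5); lemma is assumed

print(is_parallel(german, urdu))
```

    Masking PRED is a design choice of this sketch: it treats two analyses as parallel when they agree on grammatical features, not on lexical content.</Paragraph>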
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Justified Differences
</SectionTitle>
      <Paragraph position="0"> Parallelism is not maintained at the cost of misrepresenting the language. This is reflected by the fact that the c-structures are not parallel because word order varies widely from language to language, although there are naming conventions for the nodes. Instead, the bulk of the parallelism is in the f-structure. However, even in the f-structure, situations arise in which what seems to be the same construction in different languages does not have the same analysis. An example of this is predicate adjectives, as in (7).</Paragraph>
      <Paragraph position="1"> (7) a. It is red.</Paragraph>
      <Paragraph position="2"> b. Sore wa akai. (Japanese) it TOP red 'It is red.' In English, the copular verb is considered the syntactic head of the clause, with the pronoun being the subject and the predicate adjective being an XCOMP. However, in Japanese, the adjective is the main predicate, with the pronoun being the subject. As such, these receive the non-parallel analyses seen in (8a) for Japanese and (8b) for English.</Paragraph>
      <Paragraph position="3">  (8) a. PRED red SUBJ SUBJ PRED pro b. PRED be XCOMP SUBJ</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUBJ PRED pro
XCOMP PRED red SUBJ SUBJ [ ]
</SectionTitle>
    <Paragraph position="0"> Another situation that arises is when a feature or construction is syntactically encoded in one language, but not another. In such cases, the information is only encoded in the languages that need it. The equivalence captured by parallel analyses is not, for example, translational equivalence. Rather, parallelism involves equivalence with respect to grammatical properties, e.g. construction types. One consequence of this is that a typologically consistent use of grammatical terms, embodied in the feature names, is enforced. For example, even though there is a tradition for referring to the distinction between the pronouns he and she as a gender distinction in English, this is a different distinction from the one called gender in languages like German, French, Urdu, and Norwegian, where gender refers to nominal agreement classes. Parallelism leads to the situation where the feature GEND occurs in German, French, Urdu, and Norwegian, but not in English and Japanese. That is, parallelism does not mean finding the same features in all languages, but rather using the same features in the same way in all languages, to the extent that they are justified there. A French example of grammatical gender is shown in (9); note that determiner, adjective, and participle agreement is dependent on the gender of the noun.</Paragraph>
    <Paragraph position="1"> The f-structures for the nouns crayon and plume are as in (10) with an overt GEND feature.</Paragraph>
    <Paragraph position="2"> (9) a. Le petit crayon est cassé. (French) the-M little-M pencil-M is broken-M.</Paragraph>
    <Paragraph position="3"> 'The little pencil is broken.' b. La petite plume est cassée. (French) the-F little-F pen-F is broken-F.</Paragraph>
    <Paragraph position="4"> 'The little pen is broken.'  F-structures for the equivalent words in English and Japanese will not have a GEND feature.</Paragraph>
    <Paragraph position="5"> A similar example comes from Japanese discourse particles. It is well-known that Japanese has syntactic encodings for information such as honorification. The verb in the Japanese sentence (11a) encodes information that the subject is respected, while the verb in (11b) shows politeness from the writer (speaker) to the reader (hearer) of the sentence. The f-structures for the verbs in (11) are as in  (12) with RESPECT and POLITE features within the ADDRESS feature.</Paragraph>
    <Paragraph position="6"> (11) a. sensei ga hon wo oyomininaru.</Paragraph>
    <Paragraph position="7"> teacher Nom book Acc read-Respect 'The teacher read the book.' (Japanese) b. seito ga hon wo yomimasu.</Paragraph>
    <Paragraph position="8"> student Nom book Acc read-Polite 'The student reads the book.' (Japanese) (12) a. PRED yomu SUBJ,OBJ ADDRESS RESPECT + b. PRED yomu SUBJ,OBJ</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ADDRESS POLITE +
</SectionTitle>
    <Paragraph position="0"> A final example comes from English progressives, as in (13). In order to distinguish these two forms, the English grammar uses a PROG feature within the tense/aspect system. (13b) shows the f-structure for (13a.ii).</Paragraph>
    <Paragraph position="1">  (13) a. John hit Bill. i. He cried.</Paragraph>
    <Paragraph position="2"> ii. He was crying.</Paragraph>
    <Paragraph position="3"> b. PRED cry SUBJ</Paragraph>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUBJ PRED pro
TNS-ASP TENSE past PROG +
</SectionTitle>
    <Paragraph position="0"> However, this distinction is not found in the other languages. For example, (14a) is used to express both (13a.i) and (13a.ii) in German.</Paragraph>
    <Paragraph position="1">  (14) a. Er weinte. (German) he cried 'He cried.' b. PRED weinen SUBJ</Paragraph>
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SUBJ PRED pro
TNS-ASP TENSE past
</SectionTitle>
    <Paragraph position="0"> As seen in (14b), the German f-structure is left underspecified for PROG because there is no syntactic reflex of it. If such a feature were posited, rampant ambiguity would be introduced for all past tense forms in German. Instead, the semantics will determine whether such forms are progressive.</Paragraph>
    <Paragraph position="1"> Thus, there are a number of situations where having parallel analyses would result in an incorrect analysis for one of the languages.</Paragraph>
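    <Paragraph position="2"> The underspecification in (14b) can be illustrated with a small subsumption check. This is a sketch under stated assumptions, not part of the grammars: f-structures are modeled as nested Python dicts, the function name subsumes is hypothetical, and the value "-" for PROG is assumed purely for illustration, since only PROG + appears in (13b).

```python
# Illustrative sketch only: TNS-ASP values as plain dicts.

def subsumes(general, specific):
    """True if every attribute-value pair in `general` also holds in `specific`."""
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            attr in specific and subsumes(val, specific[attr])
            for attr, val in general.items()
        )
    return general == specific


german_tns_asp = {"TENSE": "past"}                     # from (14b)
english_progressive = {"TENSE": "past", "PROG": "+"}   # from (13b)
english_simple = {"TENSE": "past", "PROG": "-"}        # assumed value

# The single German analysis is compatible with both English readings.
for reading in (english_progressive, english_simple):
    print(subsumes(german_tns_asp, reading))

# The reverse does not hold: the English analysis is more specific.
print(subsumes(english_progressive, german_tns_asp))
```

    Leaving PROG out of the German f-structure yields one analysis per past tense form; positing it would yield one analysis per possible PROG value, which is the ambiguity described above.</Paragraph>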
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 One Language Shows the Way
</SectionTitle>
      <Paragraph position="0"> Another type of situation arises when one language provides evidence for a certain feature space or type of analysis that is neither explicitly mirrored nor explicitly contradicted by another language. In theoretical linguistics, it is commonly acknowledged that what one language codes overtly may be harder to detect for another language. This situation has arisen in the ParGram project. Case features fall under this topic. German, Japanese, and Urdu mark NPs with overt case morphology. In comparison, English, French, and Norwegian make relatively little use of case except as part of the pronominal system. Nevertheless, the f-structure analyses for all the languages contain a case feature in the specification of noun phrases.</Paragraph>
      <Paragraph position="1"> This "overspecification" of information expresses deeper linguistic generalizations and keeps the f-structural analyses as parallel as possible. In addition, the features can be put to use for the isolated phenomena in which they do play a role. For example, English does not mark animacy grammatically in most situations. However, providing an ANIM + feature to known animates, such as people's names and pronouns, allows the grammar to encode information that is relevant for interpretation. Consider the relative pronoun who in (15).</Paragraph>
      <Paragraph position="2"> (15) a. the girl[ANIM +] who[ANIM +] left b. the box[ANIM +] who[ANIM +] left The relative pronoun has an ANIM + feature that is assigned to the noun it modifies by the relative clause rules. As such, a noun modified by a relative clause headed by who is interpreted as animate. In the case of canonical inanimates, as in (15b), this will result in a pragmatically odd interpretation, which is encoded in the f-structure.</Paragraph>
      <Paragraph position="3"> Teasing apart these different phenomena crosslinguistically poses a challenge that the ParGram members are continually engaged in. As such, we have developed several methods to help maintain parallelism.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Mechanics of Maintaining Parallelism
</SectionTitle>
      <Paragraph position="0"> The parallelism among the grammars is maintained in a number of ways. Most of the work is done during two week-long project meetings held each year.</Paragraph>
      <Paragraph position="1"> Three main activities occur during these meetings: comparison of sample f-structures, comparison of features and their values, and discussions of new or problematic constructions.</Paragraph>
      <Paragraph position="2"> A month before each meeting, the host site chooses around fifteen sentences whose analysis is to be compared at the meeting. These can be a random selection or be thematic, e.g., all dealing with predicatives or with interrogatives. The sentences are then parsed by each grammar and the output is compared. For the more recent grammars, this may mean adding the relevant rules to the grammars, resulting in growth of the grammar; for the older grammars, this may mean updating a construction that has not been examined in many years. Another approach that was taken at the beginning of the project was to have a common corpus of about 1,000 sentences that all of the grammars were to parse. For the English, French, and German grammars, this was an aligned tractor manual. The corpus sentences were used for the initial f-structure comparisons. Having a common corpus ensured that the grammars would have roughly the same coverage. For example, they all parsed declarative and imperative sentences. However, the nature of the corpus can leave major gaps in coverage; in this case, the manual contained no interrogatives. The XLE platform requires that a grammar declare all the features it uses and their possible values. Part of the Urdu feature table is shown in (16) (the notation has been simplified for expository purposes). As seen in (16) for QUANT, attributes which take other attributes as their values must also be declared. An example of such a feature was seen in (3b) for SPEC which takes QUANT and POSS features, among others, as its values.</Paragraph>
      <Paragraph position="3"> (16) PRON-TYPE: pers poss null .</Paragraph>
      <Paragraph position="4"> PROPER: date location name title .</Paragraph>
      <Paragraph position="5"> PSEM: locational directional .</Paragraph>
      <Paragraph position="6"> PTYPE: sem nosem .</Paragraph>
    </Section>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
QUANT: PRED QUANT-TYPE
</SectionTitle>
    <Paragraph position="0"> QUANT-FORM .</Paragraph>
    <Paragraph position="1"> The feature declarations of all of the languages are compared feature by feature to ensure parallelism. The most obvious use of this is to ensure that the grammars encode the same features in the same way. For example, at a basic level, one feature declaration might have specified GEN for gender while the others had chosen the name GEND; this divergence in naming is regularized. More interesting cases arise when one language uses a feature and another does not for analyzing the same phenomena. When this is noticed via the feature-table comparison, it is determined why one grammar needs the feature and the other does not, and thus it may be possible to eliminate the feature in one grammar or to add it to another. On a deeper level, the feature comparison is useful for conducting a survey of what constructions each grammar has and how they are implemented.</Paragraph>
    <Paragraph position="2"> For example, if a grammar does not have an ADEGREE (adjective degree) feature, the question will arise as to whether the grammar analyzes comparative and superlative adjectives. If it does not, then they should be added and should use the ADEGREE feature; if it does, then the question arises as to why it does not have this feature as part of its analysis. Finally, there is the discussion of problematic constructions. These may be constructions that already have analyses which had been agreed upon in the past but which are not working properly now that more data has been considered. More frequently, they are new constructions that one of the grammars is considering adding. Possible analyses for the construction are discussed and then one of the grammars will incorporate the analysis to see whether it works. If the analysis works, then the other grammars will incorporate the analysis. Constructions that have been discussed in past ParGram meetings include predicative adjectives, quantifiers, partitives, and clefts. Even if not all of the languages have the construction in question, as was the case with clefts, the grammar writers for that language may have interesting ideas on how to analyze it.</Paragraph>
    <Paragraph position="3"> These group discussions have proven particularly useful in extending grammar coverage in a parallel fashion.</Paragraph>
    <Paragraph position="4"> Once a consensus is reached, it is the responsibility of each grammar to make sure that its analyses match the new standard. As such, after each meeting, the grammar writers will rename features, change analyses, and implement new constructions into their grammars. Most of the basic work has now been accomplished. However, as the grammars expand coverage, more constructions need to be integrated into the grammars, and these constructions tend to be ones for which there is no standard analysis in the linguistic literature; so, differences can easily arise in these areas.</Paragraph>
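    <Paragraph position="5"> The feature-by-feature comparison of declarations described in this section can be sketched as follows. This is a minimal illustration that assumes each grammar's declarations have been read into a Python dict from feature name to a set of values; it does not parse XLE's declaration syntax. The function name compare_declarations, the grammar label other, and the GEND and GEN values are hypothetical; only the PRON-TYPE and PTYPE entries are taken from (16).

```python
# Illustrative sketch only: feature tables as dicts of sets.

def compare_declarations(tables):
    """Report features that are missing in some grammar or differ in values.

    `tables` maps a grammar name to a dict from feature name to value set.
    """
    all_features = set().union(*(t.keys() for t in tables.values()))
    report = {}
    for feature in sorted(all_features):
        declared = {g: t[feature] for g, t in tables.items() if feature in t}
        missing = sorted(g for g in tables if g not in declared)
        values_agree = len({frozenset(v) for v in declared.values()}) == 1
        if missing or not values_agree:
            report[feature] = {"missing_in": missing, "values": declared}
    return report


tables = {
    "urdu": {
        "PRON-TYPE": {"pers", "poss", "null"},   # from (16)
        "PTYPE": {"sem", "nosem"},               # from (16)
        "GEND": {"fem", "masc"},                 # values assumed
    },
    "other": {                                   # hypothetical grammar
        "PRON-TYPE": {"pers", "poss", "null"},
        "PTYPE": {"sem", "nosem"},
        "GEN": {"fem", "masc"},                  # divergent name, assumed
    },
}

for feature, info in compare_declarations(tables).items():
    print(feature, "missing in:", info["missing_in"])
```

    A report of this kind only flags candidates such as the GEN and GEND divergence; deciding whether a difference is a naming accident or a justified difference remains a matter for the project meetings.</Paragraph>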
  </Section>
</Paper>