<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1081"> <Title>PARSING TURKISH USING THE LEXICAL FUNCTIONAL GRAMMAR FORMALISM 1</Title> <Section position="4" start_page="0" end_page="494" type="metho"> <SectionTitle> 3 TURKISH SYNTAX </SectionTitle> <Paragraph position="0"> In this section, we highlight two key issues in Turkish grammar, namely highly inflected agglutinative morphology and free word order, and describe the structural classification of the Turkish sentences that we deal with.</Paragraph> <Section position="1" start_page="0" end_page="494" type="sub_section"> <SectionTitle> 3.1 Morphology </SectionTitle> <Paragraph position="0"> Turkish is an agglutinative language with word structures formed by productive affixation of derivational and inflectional suffixes to root words [Oflazer 1993]. This extensive use of suffixes makes morphological parsing of words rather complicated, and results in ambiguous lexical interpretations in many cases. For example: (1) çocukları a. child+PLU+3SG-POSS (his children) b. child+3PL-POSS (their child) c. child+PLU+3PL-POSS (their children) d. child+PLU+ACC (children, acc.) Such ambiguity can sometimes be resolved at phrase and sentence levels with the help of agreement requirements, though this is not always possible:</Paragraph> <Paragraph position="2"/> <Paragraph position="4"> For example, in (2a) only the interpretation (1c) (i.e., their children) is possible because: * the agreement requirement between the modifier and the modified parts in a possessive compound noun eliminates (1a). 2 * the facts that gel (come) does not subcategorize for an accusative marked direct object, and that in Turkish the subject of a sentence must be nominative 3 eliminate (1d).</Paragraph> <Paragraph position="5"> * the agreement requirement between the subject and the verb of a sentence eliminates (1b). 
4 In (2b), both (1a) and (1c) are possible (his children and their children, respectively) because the modifier of the possessive compound noun is a covert one: it may be either onun (his) or onların (their). The other two interpretations are eliminated for the same reasons as in (2a).</Paragraph> </Section> <Section position="2" start_page="494" end_page="494" type="sub_section"> <SectionTitle> 3.2 Word Order </SectionTitle> <Paragraph position="0"> In terms of the typical order of constituents, Turkish can be characterized as a subject-object-verb (SOV) language, though the data in Table 1 from Erguvanlı [Erguvanlı 1979] show that other orders of constituents are also common (especially in discourse).</Paragraph> <Paragraph position="1"> In Turkish it is not the position, but the case of a noun phrase that determines its grammatical function in the sentence. Consequently, the typical order of the constituents may change rather freely without affecting the grammaticality of a sentence. Due to various syntactic and pragmatic constraints, sentences with the non-typical orders are not stylistic variants of the typical versions which can be used interchangeably in any context [Erguvanlı 1979]. [... should be the same. This is also true for the number features, with one exception: third person plural subjects may sometimes take third person singular verbs.]</Paragraph> <Paragraph position="2"> For example, a constituent that is to be emphasized is generally placed immediately before the verb. This affects the places of all the constituents in a sentence except that of the verb:</Paragraph> <Paragraph position="4"> (It was the child to whom I gave the book.) (3a) is an example of the typical word order whereas in (3b) the subject, ben, is emphasized. 
Similarly, in (3c) the indirect object, çocuğa, is emphasized.</Paragraph> <Paragraph position="5"> In addition to these possible changes, the verb itself may move away from its typical place, i.e., the end of the sentence. Such sentences are called inverted sentences and are typically used in informal prose and discourse.</Paragraph> <Paragraph position="6"> However, this looseness of ordering constraints at sentence level does not extend into all syntactic levels. There are even constraints at sentence level: * A nominative direct object should be placed immediately before the verb. 5 Hence, (5b) is ungrammatical: 6 (5a) Ben çocuğa kitap verdim.</Paragraph> <Paragraph position="7"> I child+DAT book give+PAST+1SG (I gave a book to the child.) (5b) *Çocuğa kitap ben verdim.</Paragraph> <Paragraph position="8"> child+DAT book I give+PAST+1SG * Some adverbial complements of quality (those that are actually qualitative adjectives) always precede the verb or, if it exists, the indefinite direct object: (6a) Yemeği iyi pişirdin.</Paragraph> <Paragraph position="9"> meal+ACC good cook+PAST+2SG (You cooked the meal well.) (6b) İyi yemeği pişirdin.</Paragraph> <Paragraph position="10"> good meal+ACC cook+PAST+2SG (You cooked the good meal.) (6c) İyi yemek pişirdin.</Paragraph> <Paragraph position="11"> good meal cook+PAST+2SG (You cooked a good meal./You cooked a meal well.) Note that although (6b) is grammatical, iyi is no longer an adverbial complement, but is an adjective that modifies yemeği. 
Note also that (6c) is ambiguous: iyi can be interpreted either as an adjective modifying yemek or as an adverb modifying pişirdin. 7</Paragraph> </Section> <Section position="3" start_page="494" end_page="494" type="sub_section"> <SectionTitle> 3.3 Structural Classification of Sentences </SectionTitle> <Paragraph position="0"> The following summarizes the major classes of sentences in Turkish.</Paragraph> <Paragraph position="1"> * Simple Sentences: A simple sentence contains only one independent judgement. The sentences in (2), (3), (4a), (5a), and (6) are all examples of simple sentences. * Complex Sentences: In Turkish, a sentence can be transformed into a construction with a verbal noun, a participle or a gerund by affixing certain suffixes to the verb of the sentence. Complex sentences are those that include such dependent (subordinate) clauses as their constituents, or as modifiers of their constituents. Dependent clauses may themselves contain other dependent clauses. So, we may have embedded structures such as: (It wouldn't have been right for me to think that I wouldn't be able to find drinkable water here.) The subject of (7) (burada içilebilecek su bulamayacağımı zannetmek - to think that I wouldn't be able to find drinkable water here) is a nominal dependent clause whose definite object (burada içilebilecek su bulamayacağımı - that I wouldn't be able to find drinkable water here) is an adjectival dependent clause which acts as a nominal one. The indefinite object of this definite object (içilebilecek su - drinkable water) is a compound noun whose modifier part is another adjectival dependent clause (içilebilecek - drinkable), and whose modified part is a noun (su - water).</Paragraph> <Paragraph position="2"> It should be noted that there are other types of sentences in the classification according to structure. However, we will not be concerned with them here because of space limitations. 
(See Şimşek [Şimşek 1987] and Güngördü [Güngördü 1993] for details.)</Paragraph> </Section> </Section> <Section position="5" start_page="494" end_page="496" type="metho"> <SectionTitle> 4 SYSTEM ARCHITECTURE AND IMPLEMENTATION </SectionTitle> <Paragraph position="0"> We have implemented our parser in the grammar development environment of the Generalized LR Parser/Compiler [...] morphological rules as the parser lets us incorporate our own morphological analyzer, for which we use a full scale two-level specification of Turkish morphology based on a lexicon of about 24,000 root words [Oflazer 1993]. This lexicon is mainly used for morphological analysis and has limited additional syntactic and semantic information, and is augmented with an argument structure database. 8 Figure 1 shows the architecture of our system. When a sentence is given as input to the program, the program first calls the morphological analyzer for each word in the sentence, and keeps the results of these calls in a list to be used later by the parser. 9 If the morphological analyzer fails to return a structure for a word for any reason (e.g., the lexicon may lack the word or the word may be misspelled), the program returns with an error message. After the morphological analysis is completed, the parser is invoked to check whether the sentence is grammatical. The parser performs bottom-up parsing. During this analysis, whenever it consumes a new word from the sentence, it picks up the morphological structure of this word from the list. If the word is a finite verb or an infinitival, the parser is also provided with the subcategorization frame of the word. At the end of the analysis, if the sentence is grammatical, its f-structure is output by the parser.</Paragraph> <Paragraph position="1"> 8 The morphological analyzer returns a list of feature-value pairs. 
For instance, for the word evdekilerin (of those (things) in the house/your things in the house) it will return</Paragraph> <Paragraph position="3"> 9 Recall that there may be a number of morphologically ambiguous interpretations of a word. In such a case, the morphological analyzer returns all of the possible morphological structures in a list, and the parser takes care of the ambiguity regarding the grammar rules.</Paragraph> </Section> <Section position="6" start_page="496" end_page="496" type="metho"> <SectionTitle> 5 THE GRAMMAR </SectionTitle> <Paragraph position="0"> In this section, we present an overview of the LFG specification that we have developed for Turkish syntax. Our grammar includes rules for sentences, dependent clauses, noun phrases, adjectival phrases, postpositional phrases, adverbial constructs, verb phrases, and a number of lexical look up rules. 10 Table 2 presents the number of rules for each category in the grammar (153 in total). There are also some intermediary rules, not shown here.</Paragraph> <Paragraph position="1"> Recall that the typical order of constituents in a sentence may change due to a number of reasons. Since the order of phrases is fixed in the phrase structure component of an LFG rule, this rather free nature of word order at sentence level constitutes a major problem. 
In order to avoid a number of redundant rules, we adopt the following strategy in our rules: We use the same place holder, &lt;XP&gt;, for all the syntactic categories in the phrase structure component of a sentence or a dependent clause rule, and check the categories of these phrases in the equations part of the rule.</Paragraph> <Paragraph position="2"> In Figure 2, we give a grammar rule for the sentence with two constituents, with an informal description of the equation part. 11 Recall also that an indefinite object should be placed immediately before the verb, and some adverbial complements of quality (those that are actually qualitative adjectives) always precede the verb or, if it exists, the indefinite direct object. In our grammar, we treat such objects and adverbial complements as parts of the verb phrase. So, we do not check these constraints at the sentence or dependent clause level.</Paragraph> </Section> </Paper>