File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/e06-1012_intro.xml

Size: 5,839 bytes

Last Modified: 2025-10-06 14:03:18

<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1012">
  <Title>Statistical Dependency Parsing of Turkish</Title>
  <Section position="3" start_page="0" end_page="89" type="intro">
    <SectionTitle>
2 Turkish
</SectionTitle>
    <Paragraph position="0"> Turkish is an agglutinative language in which a sequence of inflectional and derivational morphemes is affixed to a root (Oflazer, 1994). At the syntax level, the unmarked constituent order is SOV, but constituent order may vary freely as demanded by the discourse context. Essentially all constituent orders are possible, especially at the main sentence level, with very minimal formal constraints.</Paragraph>
    <Paragraph position="1"> In written text, however, the unmarked order is dominant at both the main sentence and embedded clause levels.</Paragraph>
    <Paragraph position="2"> Turkish morphotactics is quite complicated: a given word form may involve multiple derivations, and the number of word forms one can generate from a nominal or verbal root is theoretically infinite. Derivations in Turkish are very productive, and the syntactic relations that a word is involved in as a dependent or head element are determined by the inflectional properties of the one or more (possibly intermediate) derived forms. In this work, we assume that a Turkish word is represented as a sequence of inflectional groups (IGs hereafter), separated by ^DBs denoting derivation boundaries, in the following general form: root+IG_1 + ^DB+IG_2 + ^DB+ ... + ^DB+IG_n.</Paragraph>
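The general form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_into_igs, the Word type, and the exact string layout (IGs delimited by the literal marker ^DB, features delimited by +) are hypothetical, and the sample analysis string is a reconstruction assumed from the feature names listed in the paper's footnote, not a quotation from the source.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    root: str
    igs: List[List[str]]  # one feature list per inflectional group (IG)


def split_into_igs(analysis: str) -> Word:
    """Split 'root+F...^DB+F...' into a root and its list of IGs."""
    chunks = analysis.split("^DB")
    # The first chunk holds the root followed by the features of the first IG.
    root, *first_features = chunks[0].strip("+").split("+")
    igs = [first_features]
    for chunk in chunks[1:]:
        # Every later chunk is one derived form: its features only.
        igs.append(chunk.strip("+").split("+"))
    return Word(root, igs)


# Assumed analysis string (hypothetical reconstruction, see lead-in).
ANALYSIS = (
    "sağlam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos"
    "^DB+Noun+PastPart+A3sg+P3sg+Loc^DB+Adj+Rel"
)

word = split_into_igs(ANALYSIS)
print(word.root)
for number, ig in enumerate(word.igs, start=1):
    print(number, "+".join(ig))
```

Keeping each IG as its own feature list makes it straightforward to treat the last IG of a word separately from the intermediate ones.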
    <Paragraph position="3"> Here each IG_i denotes the relevant inflectional features, including the part-of-speech for the root and for any of the derived forms. For instance, the derived modifier sağlamlaştırdığımızdaki¹ would be represented as:²</Paragraph>
    <Paragraph position="5"> The five IGs in this representation are the feature sequences separated by the ^DB marker. The first IG shows the part-of-speech for the root, which is its only inflectional feature. The second IG indicates a derivation into a verb whose semantics is "to become" the preceding adjective. The third IG indicates that a causative verb with positive polarity is derived from the previous verb. The fourth IG indicates the derivation of a nominal form, a past participle, with +Noun as the part-of-speech and +PastPart as the minor part-of-speech, with some additional inflectional features. Finally, the fifth IG indicates a derivation into a relativizer adjective. A sentence would then be represented as a sequence of the IGs making up the words. When a word is considered as a sequence of IGs, linguistically, the last IG of a word determines its role as a dependent. Syntactic relation links therefore emanate only from the last IG of a (dependent) word and land on one of the IGs of a (head) word on the right (with minor exceptions), as exemplified in Figure 2. Again with minor exceptions, the dependency links between the IGs, when drawn above the IG sequence, do not cross.³ Figure 3, from Oflazer (2003), shows a dependency tree for a Turkish sentence laid on top of the words segmented along IG boundaries.</Paragraph>
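The three structural properties stated above (links leave only word-final IGs, heads lie to the right, links do not cross) can be checked mechanically. The following is a minimal sketch, not the authors' code: the flat 0-based IG indexing, the Link tuple, and the function names are assumptions introduced for illustration. Since the text allows minor exceptions, a reported violation marks a link for inspection and is not necessarily an error.

```python
from itertools import combinations
from typing import List, Tuple

Link = Tuple[int, int]  # (dependent IG index, head IG index), flat and 0-based


def word_final_igs(ig_counts: List[int]) -> List[int]:
    """Return the flat index of the last IG of every word."""
    finals, total = [], 0
    for count in ig_counts:
        total += count
        finals.append(total - 1)
    return finals


def violations(ig_counts: List[int], links: List[Link]) -> List[str]:
    """List every link that breaks one of the three properties."""
    finals = set(word_final_igs(ig_counts))
    problems = []
    for dep, head in links:
        if dep not in finals:
            problems.append(f"{dep}->{head}: dependent is not a word-final IG")
        if dep >= head:
            problems.append(f"{dep}->{head}: head is not to the right")
    for (d1, h1), (d2, h2) in combinations(links, 2):
        lo1, hi1 = sorted((d1, h1))
        lo2, hi2 = sorted((d2, h2))
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the span of the other.
        if hi2 > hi1 > lo2 > lo1 or hi1 > hi2 > lo1 > lo2:
            problems.append(f"{d1}->{h1} crosses {d2}->{h2}")
    return problems


# Three words with 1, 2 and 1 IGs: flat IG indices 0 | 1 2 | 3.
print(violations([1, 2, 1], [(0, 1), (2, 3)]))
print(violations([1, 2, 1], [(1, 3)]))
print(violations([1, 1, 1, 1], [(0, 2), (1, 3)]))
```

Because a dependent is always the last IG of its word, requiring the head index to be larger than the dependent index is enough to place the head in a later word.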
    <Paragraph position="6"> With this view in mind, the dependency relations that are to be extracted by a parser should be relations between certain inflectional groups and not orthographic words. Since only the word-final inflectional groups have outgoing dependency links to a head, there will be IGs which do not have any outgoing links (e.g., the first IG of the word büyümesi in Figure 3). We assume that such IGs are implicitly linked to the next IG, but neither represent nor extract such relationships with the parser, as it is the task of the morphological analyzer to extract those. Thus the parsing models that we will present in subsequent sections all aim to extract these surface relations between the relevant IGs, and in line with this, we will employ performance measures based on IGs and their relationships, and not on orthographic words.</Paragraph>
    <Paragraph position="7"> ¹ Literally, "(the thing existing) at the time we caused (something) to become strong".</Paragraph>
    <Paragraph position="8"> ² The morphological features other than the obvious part-of-speech features are: +Become: become verb, +Caus: causative verb, +PastPart: derived past participle, +P3sg: 3sg possessive agreement, +A3sg: 3sg number-person agreement, +Loc: locative case, +Pos: positive polarity, +Rel: relativizing modifier.</Paragraph>
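To make the difference between IG-based and word-based measures concrete, here is a small sketch. It is an assumption-laden illustration and not the evaluation procedure defined in the paper: the function names, the dictionary encoding of links (dependent IG index mapped to head IG index), and the toy data are all hypothetical.

```python
from typing import Dict, List


def ig_attachment_accuracy(gold: Dict[int, int], predicted: Dict[int, int]) -> float:
    """Fraction of dependent IGs whose predicted head IG equals the gold head IG."""
    if not gold:
        return 0.0
    correct = sum(1 for dep, head in gold.items() if predicted.get(dep) == head)
    return correct / len(gold)


def word_attachment_accuracy(
    gold: Dict[int, int], predicted: Dict[int, int], ig_to_word: List[int]
) -> float:
    """Same count, but a link is correct if it lands anywhere in the gold head word."""
    if not gold:
        return 0.0
    correct = sum(
        1
        for dep, head in gold.items()
        if dep in predicted and ig_to_word[predicted[dep]] == ig_to_word[head]
    )
    return correct / len(gold)


# Toy sentence: three words with 1, 2 and 1 IGs (flat IG indices 0 | 1 2 | 3).
ig_to_word = [0, 1, 1, 2]
gold = {0: 1, 2: 3}        # word-final IG 0 attaches to IG 1, IG 2 to IG 3
predicted = {0: 2, 2: 3}   # right head word for IG 0, but the wrong IG in it

print(ig_attachment_accuracy(gold, predicted))
print(word_attachment_accuracy(gold, predicted, ig_to_word))
```

In this toy case the word-based score credits the link from IG 0 because it reaches the correct word, while the IG-based score does not, which is the distinction the paragraph above draws.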
    <Paragraph position="9"> We use a model of sentence structure as depicted in Figure 4. In this figure, the top part represents the words in a sentence. After morphological analysis and morphological disambiguation, each word is represented with (the sequence of) its inflectional groups, shown in the middle of the figure. The inflectional groups are then reindexed so that they are the "units" for the purposes of parsing. The inflectional groups marked with [?] are those from which a dependency link will emanate to a head word to the right. Please note that the number of such marked inflectional groups is the same as the number of words in the sentence, and all such IGs (except the one corresponding to the distinguished head of the sentence, which will not have any links) will have outgoing dependency links.</Paragraph>
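The reindexing step described above can be sketched as follows. The Unit type, the reindex function, and the placeholder IG labels are hypothetical names introduced for illustration; the sketch only assumes that each word arrives as an ordered list of its IGs after morphological disambiguation.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    index: int        # position of the IG within the whole sentence
    word_index: int   # position of the word the IG came from
    features: str     # the IG itself, kept as an opaque string here
    word_final: bool  # True if a dependency link may emanate from this IG


def reindex(words: List[List[str]]) -> List[Unit]:
    """Flatten per-word IG lists into one sequence of parsing units."""
    units: List[Unit] = []
    for word_index, igs in enumerate(words):
        for position, ig in enumerate(igs):
            is_last = position == len(igs) - 1
            units.append(Unit(len(units), word_index, ig, is_last))
    return units


# Placeholder sentence: three words with 1, 2 and 1 IGs respectively.
sentence = [["w1.ig1"], ["w2.ig1", "w2.ig2"], ["w3.ig1"]]
units = reindex(sentence)

for unit in units:
    print(unit)

# One marked (word-final) IG per word, as noted in the text.
marked = [unit.index for unit in units if unit.word_final]
print(marked, len(marked) == len(sentence))
```

Storing the word index alongside each unit keeps the mapping back to orthographic words available even though parsing operates on IGs.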
    <Paragraph position="10"> In the rest of this paper, we first give a very brief overview of a general model of statistical dependency parsing and then introduce three models for dependency parsing of Turkish. We then present our results for these models and for some additional experiments for the best performing model.</Paragraph>
    <Paragraph position="11"> We then close with a discussion of the results, an analysis of the errors the parser makes, and conclusions.</Paragraph>
  </Section>
</Paper>