<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1012"> <Title>Statistical Dependency Parsing of Turkish</Title> <Section position="4" start_page="89" end_page="92" type="metho"> <SectionTitle> 3 Parser </SectionTitle> <Paragraph position="0"> Statistical dependency parsers first compute the probabilities of the unit-to-unit dependencies, and then find the most probable dependency tree T[?] among the set of possible dependency trees. This This old house-at+that-is rose's such grow +ing everyone very impressed Such growing of the rose in this old house impressed everyone very much. +'s indicate morpheme boundaries. The rounded rectangles show the words while the inflectional groups within the words that have more than 1 IG are emphasized with the dashed rounded rectangles. The inflectional features of each inflectional group as produced by the morphological analyzer are listed below.</Paragraph> <Paragraph position="2"> where in our case S is a sequence of units (words, IGs) and T, ranges over possible dependency trees consisting of left-to-right dependency links dep(wi,wH(i)) with wH(i) denoting the head unit to which the dependent unit, wi, is linked to.</Paragraph> <Paragraph position="3"> The distance between the dependent units plays an important role in the computation of the dependency probabilities. Collins (1996) employs this distance [?]i,H(i) in the computation of word-to-word dependency probabilities</Paragraph> <Paragraph position="5"> suggesting that distance is a crucial variable when deciding whether two words are related, along with other features such as intervening punctuation. Chung and Rim (2004) propose a different method and introduce a new probability factor that takes into account the distance between the dependent and the head. The model in equation 3 takes into account the contexts that the dependent and head reside in and the distance between the head and the dependent.</Paragraph> <Paragraph position="7"> Here Phi represents the context around the dependent wi and PhH(i), represents the context around the head word. P(dep(wi,wH(i))|S) is the probability of the directed dependency relation between wi and wH(i) in the current sentence, while seeing asimilardependency (withwi asthedependent, wH(i) as the head in a similar context) in the training treebank.</Paragraph> <Paragraph position="8"> For the parsing models that will be described below, the relevant statistical parameters needed have been estimated from the Turkish treebank (Oflazer et al., 2003). Since this treebank is relatively smaller than the available treebanks for other languages (e.g., Penn Treebank), we have opted to model the bigram linkage probabilities in an unlexicalized manner (that is, by just taking certain morphosyntactic properties into account), to avoid, to the extent possible, the data sparseness problem which is especially acute for Turkish. We have also been encouraged by the success of the unlexicalized parsers reported recently (Klein and Manning, 2003; Chung and Rim, 2004).</Paragraph> <Paragraph position="9"> For parsing, we use a version of the Backward Beam Search Algorithm (Sekine et al., 2000) developed for Japanese dependency analysis adapted to our representations of the morphological structureofthewords. Thisalgorithm parses asentence by starting from the end and analyzing it towards thebeginning. 
<Paragraph position="10"> 4 Details of the Parsing Models
In this section we detail three models that we have experimented with for Turkish. All three models are unlexicalized and differ either in the units used for parsing or in the way the contexts are modeled. In all three models, we use the probability model in Equation 3.</Paragraph>
<Section position="1" start_page="91" end_page="91" type="sub_section"> <SectionTitle> 4.1 Simplifying IG Tags </SectionTitle>
<Paragraph position="0"> Our morphological analyzer produces a rather rich representation with a multitude of morphosyntactic and morphosemantic features encoded in the words. However, not all of these features are necessarily relevant in all the tasks that these analyses can be used in. Further, different subsets of these features may be relevant depending on the function of a word. In the models discussed below, we use a reduced representation of the IGs to "unlexicalize" the words:
1. For nominal IGs,4 we use two different tags depending on whether the IG is used as a dependent or as a head during (different stages of) parsing:
* If the IG is used as a dependent (and only word-final IGs can be dependents), we represent that IG by a reduced tag consisting of only the case marker, as that essentially determines the syntactic function of that IG as a dependent, and only nominals have cases.</Paragraph>
<Paragraph position="1"> * If the IG is used as a head, then we use only the part-of-speech and the possessive agreement marker in the reduced tag.</Paragraph>
<Paragraph position="2"> 4 These are nouns, pronouns, and other derived forms that inflect with the same paradigm as nouns, including infinitives, past and future participles.</Paragraph>
<Paragraph position="3"> 2. For adjective IGs with a present/past/future participle minor part-of-speech, we use the part-of-speech when they are used as dependents, and the part-of-speech plus the possessive agreement marker when they are used as heads.</Paragraph>
<Paragraph position="4"> 3. For other IGs, we reduce the IG to just the part-of-speech.</Paragraph>
<Paragraph position="5"> Such a reduced representation also helps alleviate the sparse data problem, as statistics from many word forms with only the relevant features are conflated.</Paragraph>
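A minimal sketch of this reduction is given below, assuming a simplified IG representation (a dict with hypothetical keys 'pos', 'minor_pos', 'case' and 'poss'); the nominal and participle categories are approximated with small illustrative sets rather than the analyzer's full inventory.

```python
NOMINALS = {"Noun", "Pron"}                 # noun-paradigm forms of footnote 4 omitted for brevity
ADJ_PARTICIPLES = {"PresPart", "PastPart", "FutPart"}

def reduce_ig(ig, as_dependent):
    """Return the reduced tag of one inflectional group; the tag differs
    depending on whether the IG acts as a dependent or as a head."""
    pos, minor = ig["pos"], ig.get("minor_pos")
    if pos in NOMINALS:
        if as_dependent:                                  # only the case marker matters
            return ig.get("case", "Nom")
        return pos + "+" + ig.get("poss", "Pnon")         # POS + possessive agreement
    if pos == "Adj" and minor in ADJ_PARTICIPLES:
        return minor if as_dependent else minor + "+" + ig.get("poss", "Pnon")
    return pos                                            # everything else: bare POS

# Example: a locative noun IG used as a dependent reduces to its case marker.
print(reduce_ig({"pos": "Noun", "case": "Loc", "poss": "Pnon"}, as_dependent=True))  # Loc
```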
<Paragraph position="6"> We modeled the second probability term on the right-hand side of Equation 3 (involving the distance between the dependent and the head unit) in the following manner. First, we collected statistics over the treebank sentences and noted that, if we count words as units, then 90% of dependency links link to a word that is less than 3 words away. Similarly, if we count distance in terms of IGs, then 90% of dependency links link to an IG that is less than 4 IGs away to the right. Thus we selected a parameter k = 4 for Models 1 and 3 below, where distance is measured in terms of words, and k = 5 for Model 2, where distance is measured in terms of IGs, as a threshold value at and beyond which a dependency is considered "distant". During actual runs, P(w_i links to some head H(i)−i away | Φ_i) was computed by interpolating P1(w_i links to some head H(i)−i away | Φ_i), estimated from the training corpus, with P2(w_i links to some head H(i)−i away), the estimated probability for the length of a link when no contexts are considered, again estimated from the training corpus. When probabilities are estimated from the training set, all distances larger than k are assigned the same probability. If the probability is 0 even after interpolation, then a very small value is used. This is a modified version of the backed-off smoothing used by Collins (1996) to alleviate sparse data problems. A similar interpolation is used for the first component on the right-hand side of Equation 3, by removing the head and the dependent contextual information all at once.</Paragraph> </Section>
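As a concrete illustration of this distance term, here is a small Python sketch of a context-conditioned distance distribution with the pooling at k, the interpolation, and the floor value described above. The class name, the interpolation weight `lam`, and the floor constant are assumptions made for illustration, not the paper's actual parameters.

```python
from collections import Counter

class DistanceModel:
    def __init__(self, k, lam=0.5, floor=1e-6):
        self.k = k                     # distances >= k are pooled as "distant"
        self.lam = lam                 # interpolation weight between P1 and P2
        self.floor = floor             # very small value used when p is still 0
        self.ctx_counts = Counter()    # counts of (context, distance)
        self.ctx_totals = Counter()    # counts of context
        self.dist_counts = Counter()   # counts of distance, ignoring context
        self.total = 0

    def _bucket(self, distance):
        return min(distance, self.k)

    def observe(self, context, distance):
        d = self._bucket(distance)
        self.ctx_counts[(context, d)] += 1
        self.ctx_totals[context] += 1
        self.dist_counts[d] += 1
        self.total += 1

    def prob(self, context, distance):
        """Interpolated P(w_i links to a head `distance` units away | context)."""
        d = self._bucket(distance)
        p1 = (self.ctx_counts[(context, d)] / self.ctx_totals[context]
              if self.ctx_totals[context] else 0.0)
        p2 = self.dist_counts[d] / self.total if self.total else 0.0
        p = self.lam * p1 + (1.0 - self.lam) * p2
        return p if p > 0.0 else self.floor
```

For instance, `DistanceModel(k=4)` would correspond to the word-based distances of Models 1 and 3, and `DistanceModel(k=5)` to the IG-based distances of Model 2.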
<Section position="2" start_page="91" end_page="92" type="sub_section"> <SectionTitle> 4.2 Model 1 - "Unlexicalized" Word-based Model </SectionTitle>
<Paragraph position="0"> In this model, we represent each word by a reduced representation of its last IG when it is used as a dependent, and by the concatenation of the reduced representations of its IGs when it is used as a head. Since a word can be both a dependent and a head word, the reduced representation to be used is dynamically determined during parsing.</Paragraph>
<Paragraph position="1"> Parsing then proceeds with words as units represented in this manner. Once the parser links these units, we remap these links back to IGs to recover the actual IG-to-IG dependencies. We already know that any outgoing link from a dependent will emanate from the last IG of that word. For the head word, we assume that the link lands on the first IG of that word.6 For the contexts, we use the following scheme.</Paragraph>
6 This choice is based on the observation that in the treebank, 85.6% of the dependency links land on the first (and possibly the only) IG of the head word, while 14.4% of the dependency links land on an IG other than the first one.
<Paragraph position="2"> A contextual element on the left is treated as a dependent and is modeled with its last IG, while a contextual element on the right is represented as if it were a head, using all its IGs. We ignore any overlaps between contexts in this and the subsequent models.</Paragraph>
<Paragraph position="3"> In Figure 5 we show in a table the sample sentence in Figure 3, the morphological analysis for each word, and the reduced tags representing the units for the three models. For each model, we list the tags when the unit is used as a head and when it is used as a dependent. For Model 1, we use the tags in rows 3 and 4.</Paragraph> </Section>
<Section position="3" start_page="92" end_page="92" type="sub_section"> <SectionTitle> 4.3 Model 2 - IG-based Model </SectionTitle>
<Paragraph position="0"> In this model, we represent each IG with reduced representations in the manner above, but do not concatenate them into a representation for the word. So our "units" for parsing are IGs. The parser directly establishes IG-to-IG links from word-final IGs to some IG to the right. The contexts that are used in this model are the IGs to the left (starting with the last IG of the preceding word) and to the right of the dependent and the head IG.</Paragraph>
<Paragraph position="1"> The units and the tags we use in this model are in rows 5 and 6 of the table in Figure 5. Note that the empty cells in row 4 correspond to IGs which cannot be syntactic dependents, as they are not word-final.</Paragraph> </Section>
<Section position="4" start_page="92" end_page="92" type="sub_section"> <SectionTitle> 4.4 Model 3 - IG-based Model with Word-final IG Contexts </SectionTitle>
<Paragraph position="0"> This model is almost exactly like Model 2 above.</Paragraph>
<Paragraph position="1"> The two differences are that (i) for contexts we use only the word-final IGs to the left and the right, ignoring any non-word-final IGs in between (except for the case that the context and the head overlap, where we use the tag of the head IG instead of the final IG); and (ii) the distance function is computed in terms of words. The reason this model is used is that it is the word-final IGs that determine the syntactic roles of the dependents.</Paragraph> </Section> </Section>
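As an illustration of this word-final-IG restriction, the following minimal Python sketch collects a Model 3 style context; the sentence representation (a list of words, each a list of reduced IG tags) and the function name are assumptions for illustration, and the overlap exception above is not handled.

```python
def word_final_context(words, dep_word, width=1):
    """Model 3 style context: up to `width` word-final IG tags to the left
    and to the right of the word at index `dep_word`, skipping all
    non-word-final IGs."""
    finals = [word[-1] for word in words]            # word-final IG of each word
    left = finals[max(0, dep_word - width):dep_word]
    right = finals[dep_word + 1:dep_word + 1 + width]
    return left, right

# Example with hypothetical reduced tags; only the last IG of each word
# is eligible as context, so the non-final "Adj" IG is skipped.
words = [["Loc"], ["Gen"], ["Adj", "Noun+P3sg"], ["Acc"], ["Verb"]]
print(word_final_context(words, 2))   # -> (['Gen'], ['Acc'])
```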
<Section position="5" start_page="92" end_page="93" type="metho"> <SectionTitle> 5 Results </SectionTitle>
<Paragraph position="0"> Since in this study we are limited to parsing sentences that contain only left-to-right dependency links which do not cross each other, we eliminated the sentences violating these constraints (even if they contain a single such dependency) and used the remaining subset of 3398 sentences from the Turkish Treebank. The gold-standard part-of-speech tags are used in the experiments. The sentences in the corpus ranged from 2 to 40 words, with an average of about 8 words; 90% of the sentences had 15 or fewer words. In terms of IGs, the sentences comprised 2 to 55 IGs, with an average of 10 IGs per sentence; 90% of the sentences had 15 or fewer IGs. We partitioned this set into training and test sets in 10 different ways to obtain results with 10-fold cross-validation.</Paragraph>
<Paragraph position="1"> We implemented three baseline parsers; among these is a rule-based parser along the lines of Nivre (2003), which uses 23 unlexicalized linking rules and a heuristic that links any non-punctuation word not linked by the parser to the last IG of the last word as a dependent. Table 1 shows the results from our experiments with these baseline parsers and with parsers based on the three models above. The three models have been experimented with, using different contexts around both the dependent unit and the head. In each row, columns 3 and 4 show the percentage of IG-to-IG dependency relations correctly recovered for all tokens, and for just words (excluding punctuation from the statistics), while columns 5 and 6 show the percentage of test sentences for which all extracted dependency relations agree with the relations in the treebank. Each entry presents the average and the standard error of the results on the test set over the 10 iterations of the 10-fold cross-validation. Our main goal is to improve the percentage of correctly determined IG-to-IG dependency relations, shown in the fourth column of the table. The best results in these experiments are obtained with Model 3 using 1 unit of context on both sides of the dependent. Although it is only slightly better than Model 2 with the same context size, the difference between the means (0.4±0.2) over the 10 iterations is statistically significant.</Paragraph>
<Paragraph position="3"> Since we have been using unlexicalized models, we wanted to test whether a smaller training corpus would have a major impact on our current models. Table 2 shows results for Model 3 with no context and with 1 unit of context on each side of the dependent, obtained by using only a 1500-sentence subset of the original treebank, again with 10-fold cross-validation. Remarkably, the reduction in training set size has a very small impact on the results.</Paragraph>
<Paragraph position="4"> Although we have suggested all along that determining word-to-word dependency relationships is not the right approach for evaluating parser performance for Turkish, we have nevertheless performed a word-to-word correctness evaluation so that comparisons with other word-based approaches can be made. In this evaluation, we assume that a dependency link is correct if we correctly determine the head word (but not necessarily the correct IG). Table 3 shows the word-based results for the best cases of the models in Table 1.</Paragraph>
<Paragraph position="5"> We have also tested our parser with a pure word model where both the dependent and the head are represented by the concatenation of their IGs, that is, by their full morphological analysis except the root. The result for this case is given in the last row of Table 3. This result is even lower than the rule-based baseline.10 For this model, if we connect the dependent to the first IG of the head as we did in Model 1, the IG-to-IG accuracy excluding punctuation becomes 69.9±3.1, which is also lower than baseline 3 (70.5%).</Paragraph>
10 Also lower than Model 1 with no context (79.1±1.1).
</Section> </Paper>