<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2927"> <Title>Multi-lingual Dependency Parsing at NAIST</Title> <Section position="5" start_page="191" end_page="193" type="metho"> <SectionTitle> FORM LEMMA CPOSTAG POSTAG FEATS </SectionTitle> <Paragraph position="0"> Key: the features (FORM, LEMMA, CPOSTAG, POSTAG and FEATS) used as machine-learning input for each token, and the label. The architecture of the parser consists of four major procedures, as shown in Fig. 1: (i) Decide the neighboring dependency attachments between all adjacent words in the input sentence with an SVM-based tagger (as preprocessing). (ii) Extract the surrounding features for the focused pair of nodes.</Paragraph> <Paragraph position="1"> (iii) Estimate the dependency attachment operation for the focused pair of nodes with SVMs.</Paragraph> <Paragraph position="2"> (iv) If there is a left or right attachment, estimate the label of the dependency relation with MaxEnt.</Paragraph> <Paragraph position="3"> We explain the main procedures (steps (ii)-(iv)) in Sections 2.1 and 2.2, and the preprocessing in Section 2.3.</Paragraph> <Section position="1" start_page="191" end_page="191" type="sub_section"> <SectionTitle> 2.1 Word dependency analysis </SectionTitle> <Paragraph position="0"> In the algorithm, the state of the parser is represented by a triple (S, I, A). S and I are stacks: S keeps the words under consideration, and I keeps the words still to be processed. A is the list of dependency attachments decided so far.</Paragraph> <Paragraph position="1"> Given an input word sequence W, the parser is initialized with the triple (nil, W, ∅), i.e., an empty stack S, the full input W in I, and an empty attachment list A. The parser estimates the dependency attachment between two words (the top elements of stacks S and I).
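As a minimal Python sketch of this parser state (the triple of stack S, input list I, and attachment list A), with illustrative names that are not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Configuration:
    """Parser state as a triple: stack S, input list I, attachment list A."""
    S: list = field(default_factory=list)  # words under consideration
    I: list = field(default_factory=list)  # words still to be processed
    A: list = field(default_factory=list)  # decided attachments (head, dependent)

def initialize(words):
    """Initial configuration: empty stack, the full word sequence W, no arcs."""
    return Configuration(S=[], I=list(words), A=[])

config = initialize(["Economic", "news", "had", "little", "effect"])
```

The algorithm then repeatedly inspects the top of S and the front of I until I is empty.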
The algorithm iterates until the list I becomes empty.</Paragraph> <Paragraph position="2"> There are four possible operations (Right, Left, Shift and Reduce) for the configuration at hand.</Paragraph> <Paragraph position="3"> Right or Left: If there is a dependency relation in which word t attaches to word n (or word n attaches to word t), add the new dependency relation (n → t) (or (t → n)) to A, and remove t from S (or n from I).</Paragraph> <Paragraph position="4"> If there is no dependency relation between n and t, check the following conditions.</Paragraph> <Paragraph position="5"> Reduce: If there is no word n′ (n′ ∈ I) which may depend on t, and t has a parent on its left side, the parser removes t from the stack S.</Paragraph> <Paragraph position="6"> Shift: If there is no dependency between n and t, and the triple does not satisfy the conditions for Reduce, then push n onto the stack S.</Paragraph> <Paragraph position="7"> In this work, we adopt SVMs for estimating the word dependency attachments. SVMs are binary classifiers based on the maximal margin strategy.</Paragraph> <Paragraph position="8"> We use the polynomial kernel K(x, z) = (x · z + 1)^d with d = 2. The performance of SVMs is better than that of the maximum entropy method in our preceding work on Chinese dependency analysis (Cheng, 2005b). This is because SVMs can combine features automatically (through the polynomial kernel), whereas the maximum entropy method cannot. To extend the binary classifiers to multi-class classification, we use the pair-wise method, in which we build K(K − 1)/2 binary classifiers between all pairs of the K classes (Kreßel, 1998). We use Libsvm (Lin et al., 2001) in our experiments. In our method, the parser considers the dependency attachment of two nodes (n, t). The features of a node are the word itself, its POS-tag, and the information of its child node(s).
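The four operations can be sketched as a single transition step in Python. This is a simplified illustration of the description above, not the paper's implementation; the operation is assumed to come from the SVM classifier, and the names are ours:

```python
def step(op, S, I, A):
    """Apply one parsing operation to the configuration (S, I, A).

    t is the top of stack S and n is the front of input I; 'op' would be
    chosen by the SVM classifier in the actual parser.
    """
    t, n = (S[-1] if S else None), I[0]
    if op == "left":      # t attaches to n: add (n -> t), remove t from S
        A.append((n, t))
        S.pop()
    elif op == "right":   # n attaches to t: add (t -> n), remove n from I
        A.append((t, n))
        I.pop(0)
    elif op == "reduce":  # t already has a head; no remaining word depends on it
        S.pop()
    else:                 # "shift": no relation between n and t; push n onto S
        S.append(I.pop(0))
    return S, I, A
```

Iterating this step until I is empty yields the attachment list A for the sentence.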
The context features are the two nodes preceding node t (and t itself), the two nodes succeeding node n (and n itself), and their child nodes. The distance between nodes n and t is also used as a feature. The features are shown in Fig. 2.</Paragraph> </Section> <Section position="2" start_page="191" end_page="192" type="sub_section"> <SectionTitle> 2.2 Label tagging </SectionTitle> <Paragraph position="0"> We adopt MaxEnt to estimate the labels of dependency relations. We first tried linear-chain conditional random fields (CRFs) for estimating the labels after the dependency relation analysis.</Paragraph> <Paragraph position="1"> In that setting, the parser first analyzes the word dependencies (head-modifier relations) of the input sentence, and the CRF model then finds the most suitable label sequence given the basic information of the input sentence (FORM, LEMMA, POSTAG, etc.) and the head information (FORM and POSTAG) of each word. However, as the number of possible labels in some languages is large, training a CRF model on these corpora (we use CRF++ (Kudo, 2005)) costs a huge amount of memory and time.</Paragraph> <Paragraph position="2"> Instead, we integrate the maximum entropy method into the word dependency analysis to tag the labels of dependency relations. As shown in Fig. 1, the parser first gets the contextual features to estimate the word dependency. If the parsing operation is &quot;Left&quot; or &quot;Right&quot;, the parser then uses MaxEnt with the same features to tag the label of the relation. This strategy tags the label according to the current state of the focused word pair.
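The attach-then-label decision can be sketched as follows; `svm_predict` and `maxent_predict` are hypothetical stand-ins for the trained SVM and MaxEnt models, not APIs from the paper:

```python
def attach_and_label(svm_predict, maxent_predict, feats):
    """Decide the parsing operation with the SVM; if it is Left or Right,
    tag the relation label with MaxEnt using the same features."""
    op = svm_predict(feats)
    label = None
    if op in ("left", "right"):
        label = maxent_predict(feats)
    return op, label

# Usage with trivial stand-in models:
op, label = attach_and_label(lambda f: "left", lambda f: "SBJ", {})
```

Because both models see the same feature vector, the label decision reflects the current state of the focused word pair rather than a separate post-hoc pass.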
We divide the training instances according to the CPOSTAG of the focused word n, so that a classifier is constructed for each distinct POS-tag of the word n.</Paragraph> </Section> <Section position="3" start_page="192" end_page="193" type="sub_section"> <SectionTitle> 2.3 Preprocessing </SectionTitle> <Paragraph position="0"> In our preceding work (Cheng, 2005a), we discussed three problems of our basic method (Nivre's algorithm with SVMs) and proposed three preprocessing methods to resolve them.</Paragraph> <Paragraph position="1"> The methods are: (1) using global features and a two-step process to resolve the ambiguity between the parsing operations &quot;Shift&quot; and &quot;Reduce&quot;; (2) using a root node finder and dividing the sentence at the root node to make use of top-down information; (3) extracting prepositional phrases (PPs) to resolve the problem of identifying PP boundaries.</Paragraph> <Paragraph position="2"> We combined Nivre's method with these preprocessing methods for Chinese dependency analysis on the Penn Chinese Treebank and the Sinica Treebank (Chen et al., 2003). This was effective because of two properties of Chinese: first, there are no multi-root sentences in the Chinese Treebanks; second, the boundaries of prepositional phrases are ambiguous.</Paragraph> <Paragraph position="3"> However, we found that these methods do not always improve the accuracy for all the languages in the shared task.</Paragraph> <Paragraph position="4"> We tried method (1) on some languages to see whether it improves the parser. We attempted to use global features and two-step analysis to resolve the ambiguity of the operations. In Chinese (Chen et al., 2003) and Danish (Kromann, 2003), this method improves the parser's performance. However, in other languages, such as Arabic (Hajic et al., 2004), it decreases the performance. The reason is that sentences in some languages are too long for global features to be useful.
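The per-CPOSTAG instance split described at the end of Section 2.2 can be sketched as a simple grouping step; the instance layout and field name here are illustrative assumptions, not the paper's format:

```python
from collections import defaultdict

def split_by_cpostag(instances):
    """Group (features, label) training instances by the CPOSTAG of the
    focused word n, so a separate classifier can be trained per tag."""
    groups = defaultdict(list)
    for feats, label in instances:
        groups[feats["n_cpostag"]].append((feats, label))
    return groups
```

Each group is then used to train its own classifier, so frequent tags do not dominate the decision boundaries learned for rare ones.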
In our preceding work, the global features include information about all the unanalyzed words. However, when analyzing long sentences, the global features usually include useless information that confuses the two-step process. Therefore, we do not use this method in this shared task.</Paragraph> <Paragraph position="5"> In method (2), we constructed an SVM-based root node finder to identify the root node and divided the sentence at the root node in the Chinese Treebank. This method is based on the dependency-structure properties &quot;one and only one element is independent&quot; and &quot;an element cannot have modifiers lying on the other side of its own head&quot;. However, some languages include multi-root sentences, such as Arabic, Czech, and Spanish (Civit and Marti, 2002), and it is difficult to divide a sentence at its roots: in multi-root sentences, deciding the heads of the words between the roots is difficult. Therefore, we do not use method (2) in the shared task.</Paragraph> <Paragraph position="6"> Method (3), namely the PP chunker, can identify PP boundaries in Chinese and resolve their ambiguity, but we cannot guarantee that identifying PP boundaries improves the parser in other languages. Moreover, we do not know the structure of prepositional phrases in all of the languages.</Paragraph> <Paragraph position="7"> Therefore, for robustness in analyzing different languages, we do not use this method.</Paragraph> </Section> <Section position="4" start_page="193" end_page="193" type="sub_section"> <SectionTitle> 2.4 Neighboring dependency attachment tagger </SectionTitle> <Paragraph position="8"> In the bottom-up dependency parsing approach, the features and strategies for parsing at the early stage (dependencies between adjacent words) are different from those for parsing at the upper stage (dependencies between phrases). Parsing at the upper stage needs information about phrases, not about words alone. The features and strategies for the early and upper stages should therefore be kept distinct.
Therefore, we separate neighboring dependency attachment (for the early stage) from normal dependency attachment (for the upper stage), and set the neighboring dependency attachment tagger as a preprocessor.</Paragraph> <Paragraph position="9"> When the parser analyzes an input sentence, it extracts the neighboring dependency attachments first, then analyzes the sentence as described before. The results show that tagging the neighboring dependency word-pairs improves the accuracy in 9 of the 12 scoring languages, although in some languages it degrades the performance slightly. There are potentially many ways to decompose the parsing process, and the current method is just the simplest such decomposition. The best method of decomposition, or dynamically changing parsing models, should be investigated in future research.</Paragraph> <Paragraph position="10"> We extract all words that depend on an adjacent word (right or left).</Paragraph> </Section> </Section> </Paper>