<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1706"> <Title>The Effect of Rhythm on Structural Disambiguation in Chinese</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Analysis of Rhythmic Constraints </SectionTitle> <Paragraph position="0"> We divide our analysis of the use of rhythm in Chinese phrases into two categories, based on two types of phrases in Chinese: (1) simple phrases, which contain only words, i.e. all the child nodes in the derivation tree are POS tags; and (2) complex phrases, in which at least one constituent is itself a phrase, i.e. the node has at least one child labeled with a phrase-type symbol (such as NP or VP) in its derivation tree.</Paragraph> <Paragraph position="1"> Below we give a statistical analysis of the distribution of the rhythm feature in different constructions, for both simple and complex phrases. The corpus from which the statistics are drawn contains 200K words of newspaper text from the People's Daily. The texts are word-segmented, POS-tagged and labeled with content chunks. A content chunk is a phrase containing only content words, akin to a generalization of a BaseNP. These content chunks are parsed into binary shallow trees. More details about content chunks can be found in Section 3.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Rhythm feature in simple phrases </SectionTitle> <Paragraph position="0"> Simple phrases contain two lexical words (since, as discussed above, our parse trees are binary). The rhythm feature of each word is defined to be the number of syllables in it. 
Thus the rhythm feature for a word can take on one of the following three values: (1) mono-syllabic; (2) bi-syllabic; and (3) multi-syllabic, meaning three syllables or more.</Paragraph> <Paragraph position="1"> Since each binary phrase contains two words, the set of rhythm features for a simple phrase is: $\{[i, j] \mid i, j \in \{0, 1, 2\}\}$,</Paragraph> <Paragraph position="3"> where 0, 1, 2 represent mono-syllabic, bi-syllabic and multi-syllabic respectively.</Paragraph> <Paragraph position="4"> In the following sections, we present three case studies on the distribution of the rhythm feature in different constructions: (1) verbs as modifier or head in NP; (2) the contrast between NPs and VPs formed by &quot;verb + noun&quot; sequences; (3) &quot;noun + verb&quot; sequences.</Paragraph> <Paragraph position="5"> 2.1.1 Case 1: Verb as modifier/head in NP In Chinese, verbs can function as modifier or head in a noun phrase without any change of form. For example, in &quot;Guo Shu /fruit tree Zai Pei /growing&quot;, &quot;Zai Pei&quot; is the head, while in &quot;Zai Pei /growing Ji Zhu /technique&quot;, &quot;Zai Pei&quot; is a modifier. However, in such constructions, there are strong constraints on the length of both verbs and nouns. Table 1 gives the distribution of the rhythm feature in the rule &quot;NP -> N V&quot; ('N' and 'V' represent noun and verb respectively), in which the verb is the head, and in the rule &quot;NP -> V N&quot;, in which the verb is a modifier.</Paragraph> <Paragraph position="6"> Table 1: Distribution of the rhythm feature in NPs with a verb as modifier or head. [0,0] [0,1] [0,2] [1,0] [1,1] [1,2] [2,0] [2,1] [2,2] Total</Paragraph> <Paragraph position="8"> Table 1 indicates that in both rules, the rhythm pattern [1,1], i.e. &quot;bi-syllabic + bi-syllabic&quot;, prevails. In the rule &quot;NP -> V N&quot;, this pattern accounts for 93% of the occurrences across the nine possible patterns, while in the rule &quot;NP -> N V&quot;, it accounts for 81%. 
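As a minimal sketch of the coding scheme defined in this section, the rhythm feature of a two-word simple phrase can be computed from syllable counts. The names rhythm_value, phrase_rhythm and ALL_PATTERNS, and the convention of passing each word as a list of romanized syllables, are illustrative assumptions and not part of the original system.

```python
from itertools import product

def rhythm_value(num_syllables):
    """Map a syllable count to 0 (mono-syllabic), 1 (bi-syllabic) or 2 (three or more)."""
    if not num_syllables >= 1:
        raise ValueError('a word needs at least one syllable')
    return min(num_syllables, 3) - 1

def phrase_rhythm(left_syllables, right_syllables):
    """Rhythm feature [x, y] of a two-word phrase; each word is a list of syllables."""
    return [rhythm_value(len(left_syllables)), rhythm_value(len(right_syllables))]

# The nine possible patterns, in the order [0,0], [0,1], ..., [2,2].
ALL_PATTERNS = [list(pair) for pair in product(range(3), repeat=2)]

# 'Zai Pei /growing Ji Zhu /technique': two bi-syllabic words give [1, 1].
print(phrase_rhythm(['Zai', 'Pei'], ['Ji', 'Zhu']))
```

Counting how often each pattern occurs under a given rule would then yield a distribution of the kind reported in Table 1.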
We also find that in both cases the patterns [0,2] and [2,0] are prohibited; that is to say, neither verbs nor nouns can be longer than two syllables.</Paragraph> <Paragraph position="9"> 2.1.2 Case 2: Contrast between NP and VP formed by &quot;V N&quot; sequence The sequence &quot;V N&quot; (&quot;verb + noun&quot;) can constitute an NP or a VP. The rhythm patterns in the two types of phrases are significantly different, however, as shown in Table 2. We see that in the NP case, verbs are mainly bi-syllabic. The total number of examples with bi-syllabic verbs in NP is 2820, accounting for 98% of all the cases. Mono-syllabic verbs, by contrast, are much less likely to appear in this position. The total number of examples with mono-syllabic verbs in NP is 23, accounting for only 0.8% of all the cases. That is to say, bi-syllabic verbs are about 122 times (2820/23) as likely as mono-syllabic verbs to appear in this syntactic position. On the other hand, there is no big difference between bi-syllabic verbs and mono-syllabic verbs in the VP formed by &quot;V N&quot;. The ratios of bi-syllabic and mono-syllabic verbs in VP are 48% and 55% respectively. These statistics tell us that if the verb in a &quot;verb + noun&quot; sequence is not bi-syllabic, then the sequence is very unlikely to be an NP. Figure 1 depicts more clearly the difference between NPs and VPs formed by &quot;V N&quot; sequences in the distribution of the rhythm feature.</Paragraph> <Paragraph position="11"> An &quot;N V&quot; (&quot;noun + verb&quot;) sequence can be divided into three main types by the dominating phrasal category: (1) NP (noun phrase), e.g. &quot;Guo Shu /fruit tree Zai Pei /growth&quot;; (2) S (subject-verb construction), e.g. &quot;Cai Qi /colored flag Piao Yang /flutter&quot;; (3) NC (non-constituent), e.g. &quot;Jing Ji /economy Fa Zhan /develop&quot; in &quot;Zhong Guo /China De /DE Jing Ji /economy Fa Zhan /develop De /DE Hen /very Kuai /fast&quot;. 
('China's economy develops very fast.') Table 3 gives the distribution of the rhythm feature in the three types of cases.</Paragraph> <Paragraph position="12"> We see in Table 3 that in the rule &quot;NP -> N V&quot; the verb cannot be mono-syllabic, since the first row is 0 for all the patterns in which the verb is mono-syllabic ([0,0], [1,0], [2,0]). The &quot;bi-syllabic + bi-syllabic&quot; ([1,1]) pattern accounts for 93% (1275/1371) of the total. Now consider the cases with mono-syllabic verbs across all three types. The total number of such examples in the corpus is 1652 (the sum of the numbers in columns [0,0], [1,0] and [2,0] over the three rows). Among these 1652 cases, there is not a single example in which the &quot;N V&quot; sequence is an NP. The sequence has a probability of 3% (47/1652) of being an S and 97% (1605/1652) of being an NC (non-constituent).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Rhythm feature in complex phrases </SectionTitle> <Paragraph position="0"> Just as we saw with two-word simple phrases, the rhythm feature also has an effect on complex phrases, in which at least one component is a phrase, i.e. spans two or more words. For example, for the following fragment of a sentence: Kua /stride Jin /into San Xia /the Three Gorges Gong Cheng /project Da Men /gate 'enter into the gate of the Three Gorges Project', according to PCFG, the parse indicated in Figure 2(a) applies the rule &quot;NP -> VP N&quot; (i.e. &quot;Jin San Xia Gong Cheng&quot; modifying &quot;Da Men&quot;). 
This rule has 216 occurrences in the corpus: in 168 of them (78%) it contains a VP of 2 words, in 30 a VP of 3 words, and in 18 a VP of more than 3 words.</Paragraph> <Paragraph position="1"> These statistics indicate that this rule prefers a short VP acting as the modifier of a noun, as in &quot;NP(VP(Chong /grow Liang /grain) Da Hu /large family)&quot; and &quot;NP(VP(Xue /learn Lei Feng /Lei Feng) Biao Bing /model)&quot;. In the example in Figure 2(a), however, the VP contains 3 words, so it is less likely to be a modifier in an NP.</Paragraph> <Paragraph position="2"> When a phrase serves as a constituent in a larger phrase, its rhythm feature is defined as the number of words in it. Thus a phrase may take on one of three values for the rhythm feature: (1) two words; (2) three words; and (3) more than three words. As with simple phrases, we use 0, 1, 2 to represent the three values respectively. Therefore, for every construction containing two constituents, its rhythm feature can be described uniformly by a 3x3 matrix. For example, for the rule &quot;NP -> VP N&quot; above, the feature value for &quot;NP(VP(Chong /grow Liang /grain) Da Hu /large family)&quot; is [0, 1], in which 0 indicates that the VP contains 2 words and 1 indicates that the noun is bi-syllabic. The rule determines how the feature value is interpreted, i.e. whether a value refers to a word or a phrase. For example, for the rule &quot;VP -> V N&quot;, the feature value [0, 1] means that the verb is mono-syllabic and the noun is bi-syllabic, while for the rule &quot;NP -> VP N&quot;, the feature value [0, 1] means that the VP contains two words and the noun is bi-syllabic.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Content Chunk Parsing </SectionTitle> <Paragraph position="0"> We have chosen the task of content chunk parsing to test the usefulness of our rhythm feature for Chinese text. 
In this section we address two questions: (1) What is a content chunk? (2) Why are we interested in content chunk parsing? A content chunk is a phrase formed by a sequence of content words, including nouns, verbs, adjectives and content adverbs. There are three cases for the mapping between content word sequences and content chunks: (1) A content word sequence is a content chunk. A special case is that a whole sentence is a content chunk when all the words in it are content words, e.g. [[Qian Jing /Prospect Gong Si /company]NP [Tui Chu /release [Gao Ji /advanced [Dian Nao /computer [Pai Ban /typesetting Xi Tong /system]NP]NP]NP]VP ('Prospect Company released an advanced computer typesetting system.').</Paragraph> <Paragraph position="1"> (2) A content word sequence is not a content chunk. For example, in &quot;Zhong Guo /China De /AUX Jing Ji /economy Fa Zhan /develop De /AUX Hen /very Kuai /fast&quot; ('China's economy develops very fast.'), &quot;Jing Ji /economy Fa Zhan /develop&quot; is a content word sequence, but it is not a phrase in the sentence.</Paragraph> <Paragraph position="2"> (3) A part of a content word sequence is a content chunk. For example, in &quot;Si Ying /private Jing Ji /economy Fa Zhan /develop De /AUX Shi Tou /trend Hen /very Hao /good&quot; ('The developmental trend of private economy is very good.'), &quot;Si Ying /private Jing Ji /economy Fa Zhan /develop&quot; is a content word sequence, but it is not a phrase; only &quot;Si Ying /private Jing Ji /economy&quot; in it is a phrase.</Paragraph> <Paragraph position="3"> The purpose of content chunk parsing is to recognize phrases in a sequence of content words.</Paragraph> <Paragraph position="4"> Specifically, content chunking comprises two subtasks: (1) to recognize the maximal phrase in a sequence of content words; (2) to analyze the hierarchical structure within the phrase down to words. 
Like baseNP chunking (Church, 1988; Ramshaw &amp; Marcus, 1995), content chunk parsing is a kind of shallow parsing. Content chunk parsing is deeper than baseNP chunking in two respects: (1) a content chunk may contain verb phrases and other phrases, even a full sentence, as long as all the components are content words; (2) it may contain recursive NPs. Thus a content chunk can supply more structural information than a baseNP.</Paragraph> <Paragraph position="5"> The motives for content chunk parsing are twofold: (1) Like other shallow parsing tasks, it can simplify the parsing task, in two respects. First, it avoids the ambiguities introduced by function words. In Chinese, the most salient syntactic ambiguities involve prepositional phrases and the &quot;DE&quot; construction. For prepositional phrases, the difficulty lies in determining the right boundary, because almost any constituent can be the object of a preposition. For &quot;DE&quot; constructions, the problem is determining the left boundary, since almost any constituent can be followed by &quot;DE&quot; to form a &quot;DE&quot; construction. Second, content chunk parsing can simplify the structure of a sentence. Once a content chunk is acquired, it can be replaced by its head word, thus reducing the length of the original sentence. If we then obtain a parse of the reduced sentence with a full parser, we can get a parse for the original sentence by replacing the head-word nodes with the content chunks from which the head words were extracted. 
(2) Content chunk parsing may be useful for applications such as information extraction and question answering.</Paragraph> <Paragraph position="6"> When using template matching, a content chunk may be just the right level of shallow structure for matching with an element in a template.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 PCFG + PF Model </SectionTitle> <Paragraph position="0"> In the experiment we propose a statistical model that integrates a probabilistic context-free grammar (PCFG) model with a simple probabilistic features (PF) model. In this section we first define the statistical model and then give the method for parameter estimation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Definition </SectionTitle> <Paragraph position="0"> According to PCFG, each rule r used to expand a node n in a parse is assigned a probability, i.e.: $P(r(n)) = P(b \mid A)$,</Paragraph> <Paragraph position="2"> where A -> b is a CFG rule. The probability of a parse T is the product of the probabilities of the rules used to expand each node n in T: $P(T) = \prod_{n \in T} P(r(n))$. We extend PCFG so that when a left-hand-side category A is expanded into a string b, a feature set FS related to b is also generated. Thus, a probability is assigned to the expansion of each node n when a rule r is applied: $P(r(n)) = P(b, FS \mid A) = P(FS \mid b, A) \, P(b \mid A)$,</Paragraph> <Paragraph position="4"> where P(FS | b, A) is the probabilistic feature (PF) model and P(b | A) is the PCFG model. The PF model describes the probability of each feature in the feature set FS taking on specific values when a CFG rule A -> b is given. To make the model more practical for parameter estimation, we assume the features in the feature set FS are independent of each other: $P(FS \mid b, A) = \prod_{F \in FS} P(F \mid b, A)$.</Paragraph> <Paragraph position="6"> Our model is thus a simplification of more sophisticated models which integrate PCFGs with features, such as those in Magerman (1995), Collins (1997) and Goodman (1997). 
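The factorization defined in this section can be illustrated with a short sketch. The rule encoding as (A, b) pairs, the feature name 'rhythm', and the names rule_prob, feature_prob, node_probability and parse_probability are hypothetical; the probabilities are made-up numbers for illustration, not estimates from the corpus, and smoothing for unseen feature values is omitted.

```python
import math

# Hypothetical PCFG table: P(b | A), keyed by (A, b).
rule_prob = {
    ('NP', ('V', 'N')): 0.05,
    ('VP', ('V', 'N')): 0.30,
}

# Hypothetical PF table: P(F = value | b, A), keyed by (A, b, feature name, value).
feature_prob = {
    ('NP', ('V', 'N'), 'rhythm', (1, 1)): 0.90,
    ('VP', ('V', 'N'), 'rhythm', (1, 1)): 0.40,
}

def node_probability(lhs, rhs, features):
    """P(b | A) times the product over features of P(F | b, A), assuming independence."""
    pf = math.prod(feature_prob[(lhs, rhs, name, value)]
                   for name, value in features.items())
    return rule_prob[(lhs, rhs)] * pf

def parse_probability(nodes):
    """Product of node probabilities over all expanded nodes (lhs, rhs, features) of a parse."""
    return math.prod(node_probability(lhs, rhs, feats) for lhs, rhs, feats in nodes)

# Compare the two analyses of a bi-syllabic verb + bi-syllabic noun sequence.
np_score = node_probability('NP', ('V', 'N'), {'rhythm': (1, 1)})
vp_score = node_probability('VP', ('V', 'N'), {'rhythm': (1, 1)})
print(np_score, vp_score)
```

With an empty feature set the PF factor is 1, so the sketch reduces to a plain PCFG score.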
Compared with these models, our model is more practical when only a small amount of training data is available, since we assume independence between features. For example, in Goodman's probabilistic feature grammar (PFG), each symbol in a PCFG is replaced by a set of features, so it can describe specific constraints on the rule. In the PFG model the generation of each feature depends on all the previously generated features, which is likely to lead to a severe sparse-data problem in parameter estimation. Our simplified model assumes independence between the features, so the data sparseness problem can be significantly alleviated.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Parameter Estimation </SectionTitle> <Paragraph position="0"> Let F be a feature associated with a string b, where the possible values for F are f viewed as a random event.</Paragraph> </Section> </Section> </Paper>