<?xml version="1.0" standalone="yes"?> <Paper uid="P04-2008"> <Title>Improving the Accuracy of Subcategorizations Acquired from Corpora</Title>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle>2 Related Work</SectionTitle>
<Paragraph position="0"> [Figure 1: an acquired SCF entry for the verb "yield" -- #S(EPATTERN :TARGET |yield| :SUBCAT (VSUBCAT NP) ...)] </Paragraph>
<Paragraph position="1"> In their study, they first acquire fine-grained SCFs using the unsupervised method proposed by Briscoe and Carroll (1997) and Korhonen (2002). Figure 1 shows an example of an acquired SCF entry for the verb "yield." Each SCF entry has several fields about the observed SCF; I explain here only the portion relevant to this study. The TARGET field is a word stem, the first number in the CLASSES field indicates an SCF type, and the FREQCNT field shows how often words derivable from the word stem appeared with that SCF type in the training corpus. The obtained SCFs comprise in total 163 SCF types, which are originally based on the SCFs in the ANLT (Boguraev and Briscoe, 1987) and COMLEX (Grishman et al., 1994) dictionaries. In this example, the SCF type 24 corresponds to the SCF of a transitive verb.</Paragraph>
<Paragraph position="2"> They then obtain SCFs for the target lexicalized grammar (the LinGO ERG (Copestake, 2002) in their study) using a handcrafted translation map from these 163 types to the SCF types in the target grammar. They reported that they achieved a coverage improvement of 4.5%, but that the average parse time doubled. This is because they did not apply any filtering to the acquired SCFs to suppress the increase in lexical ambiguity. We therefore need some method to control the quality of the acquired SCFs.</Paragraph>
<Paragraph position="3"> Their method is applicable to any lexicalized grammar, provided that we have a translation map from these 163 types to the SCF types in the grammar.</Paragraph>
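To make the field layout above concrete, here is a minimal parsing sketch in Python. It assumes a single-line EPATTERN entry; only the TARGET and SUBCAT fields are attested in the recovered fragment of Figure 1, so the CLASSES and FREQCNT values below are hypothetical.

```python
import re

# Hypothetical example entry: only TARGET and SUBCAT survive in the Figure 1
# residue above; the CLASSES and FREQCNT values here are made up.
ENTRY = '#S(EPATTERN :TARGET |yield| :SUBCAT (VSUBCAT NP) :CLASSES (24) :FREQCNT 183)'

def parse_scf_entry(entry):
    target = re.search(r':TARGET \|([^|]+)\|', entry).group(1)       # word stem
    scf_type = int(re.search(r':CLASSES \((\d+)', entry).group(1))   # first number = SCF type
    freqcnt = int(re.search(r':FREQCNT (\d+)', entry).group(1))      # frequency in training corpus
    return target, scf_type, freqcnt

print(parse_scf_entry(ENTRY))  # -> ('yield', 24, 183)
```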
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle>2.2 Clustering of Verb SCF Distributions</SectionTitle>
<Paragraph position="0"> There is some related work on clustering verbs according to their SCF probability distributions (Schulte im Walde and Brew, 2002; Korhonen et al., 2003). Schulte im Walde and Brew (2002) used the k-Means algorithm (Forgy, 1965) to cluster SCF distributions for monosemous verbs, while Korhonen et al. (2003) applied other clustering methods to polysemic SCF data. These studies aim at obtaining verb semantic classes, which are closely related to the syntactic behavior of argument selection (Levin, 1993).</Paragraph>
<Paragraph position="1"> Korhonen (2002) made use of SCF distributions for representative verbs in Levin's verb classes to obtain accurate back-off estimates for all the verbs in those classes. In this study, I assume that there are classes of words whose elements have identical SCF types. I then obtain these classes by clustering acquired SCFs, using information available in the target lexicon, and directly use the obtained classes to eliminate implausible SCFs.</Paragraph> </Section> </Section>
<Section position="5" start_page="0" end_page="4" type="metho"> <SectionTitle>3 Method</SectionTitle>
<Section position="1" start_page="0" end_page="2" type="sub_section"> <SectionTitle>3.1 Estimation of Confidence Values for SCFs</SectionTitle>
<Paragraph position="0"> SCFs whose probabilities exceed a certain threshold are recognized in the lexicon; I hereafter call this threshold the recognition threshold. Figure 2 depicts a probability distribution of SCFs for the verb "apply." In this context, I can regard the confidence value of each SCF as the probability that the probability of that SCF exceeds the recognition threshold.</Paragraph>
<Paragraph position="1"> One intuitive way to estimate a confidence value is to assume that an observed probability, i.e., the relative frequency $f_{ij} / \sum_j f_{ij}$ (where $f_{ij}$ is the frequency with which a word $w_i$ appears with SCF $s_j$ in corpora), is equal to the probability $\theta_{ij}$. When the relative frequency of $s_j$ for a word $w_i$ exceeds the recognition threshold, its confidence value $conf_{ij}$ is set to 1; otherwise $conf_{ij}$ is set to 0. However, an observed probability is unreliable for infrequent words. Moreover, when we want to encode confidence values of reliable SCFs in the target grammar, we cannot distinguish the confidence values of those SCFs from the confidence values of acquired SCFs.</Paragraph>
<Paragraph position="2"> The other promising way to estimate a confidence value, which I adopt in this study, is to regard the probability $\theta_{ij}$ as a stochastic variable in the context of Bayesian statistics (Gelman et al., 1995). In this context, the a posteriori distribution of the probability $\theta_{ij}$ is given by Bayes' rule: $P(\theta_{ij} \mid D) = P(D \mid \theta_{ij}) \, P(\theta_{ij}) / P(D)$, where $P(\theta_{ij})$ is the a priori distribution and $D$ is the data we have observed. Since every occurrence of an SCF in the data $D$ is independent of the others, the data $D$ can be regarded as Bernoulli trials. When we observe data $D$ in which a word $w_i$ appears $n$ times in total and $x \,(\leq n)$ times with SCF $s_j$, the conditional distribution is the binomial distribution: $P(D \mid \theta_{ij}) = \binom{n}{x} \theta_{ij}^{x} (1-\theta_{ij})^{n-x}$. The confidence value is then the posterior mass above the recognition threshold $t$: $conf_{ij} = \int_t^1 P(\theta_{ij} \mid D) \, d\theta_{ij}$.</Paragraph>
<Paragraph position="3"> To calculate this a posteriori distribution, I need to define the a priori distribution $P(\theta_{ij})$. The question is which probability distribution of $\theta_{ij}$ can appropriately reflect prior knowledge; in other words, it should encode the knowledge we use to estimate SCFs for unknown words. I simply determine it from the distributions of observed probability values of $s_j$ for words seen in corpora, using a method described in (Tsuruoka and Chikayama, 2001). (In the following experiments, I estimated the a priori distribution separately for each type of SCF, from words that appeared more than 50 times in the training corpus.)</Paragraph>
<Paragraph position="4"> In their study, the a priori distribution is assumed to be the beta distribution, defined as $P(\theta_{ij}) = \theta_{ij}^{\alpha-1} (1-\theta_{ij})^{\beta-1} / B(\alpha, \beta)$, where $B(\alpha, \beta) = \int_0^1 \theta^{\alpha-1} (1-\theta)^{\beta-1} \, d\theta$. The expectation and variance of the beta distribution are made equal to those of the observed probability values.</Paragraph>
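Because the beta prior is conjugate to the binomial likelihood, the posterior is again a beta distribution, $Beta(\alpha + x, \beta + n - x)$, and $conf_{ij}$ has a closed form. The following is a minimal sketch of this estimation in Python with SciPy, not the author's implementation; the function names and the numbers in the usage example are illustrative.

```python
from statistics import mean, variance
from scipy.stats import beta

def fit_beta_prior(observed_probs):
    """Fit Beta(a, b) by moment matching: make its expectation and variance
    equal to those of the observed relative frequencies (cf. Tsuruoka and
    Chikayama, 2001). Needs at least two observations."""
    m, v = mean(observed_probs), variance(observed_probs)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

def confidence(x, n, a, b, t):
    """conf_ij = P(theta_ij > t | D). By beta-binomial conjugacy the
    posterior is Beta(a + x, b + n - x), so the confidence value is its
    survival function at the recognition threshold t."""
    return beta.sf(t, a + x, b + n - x)

# Hypothetical numbers: relative frequencies of one SCF type over frequent
# words, then a word seen 8 times in total, 3 of them with this SCF.
a, b = fit_beta_prior([0.1, 0.3, 0.25, 0.15, 0.2])
print(confidence(x=3, n=8, a=a, b=b, t=0.05))
```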
<Paragraph position="5"> In order to combine SCF confidence-value vectors for words acquired from corpora with those for words in the lexicon of the target grammar, I also represent each word $w_i$ in the lexicon by an SCF confidence-value vector $v_i = \langle conf_{i1}, \ldots, conf_{in} \rangle$, where $conf_{ij} = 1 - e$ if the lexicon assigns $s_j$ to $w_i$ and $conf_{ij} = e$ otherwise; here $e$ expresses the unreliability of the lexicon (Equation 6). In this study, I trust the lexicon as much as possible by setting $e$ to the machine epsilon.</Paragraph> </Section>
<Section position="2" start_page="2" end_page="4" type="sub_section"> <SectionTitle>3.2 Clustering of SCF Confidence-Value Vectors</SectionTitle>
<Paragraph position="0"> I next present an algorithm that clusters words according to their SCF confidence-value vectors. Given $k$ initial representative vectors called centroids, my algorithm iteratively updates the clusters by assigning each data object to its closest centroid and recomputing the centroids until the cluster members become stable, as depicted in Figure 3.</Paragraph>
<Paragraph position="1"> [Figure 3: the clustering algorithm. Input: a set of SCF confidence-value vectors and $k$ initial centroids.] </Paragraph>
<Paragraph position="2"> Although this algorithm is roughly based on the k-Means algorithm, it differs from k-Means in important respects. I restrict the elements of the cluster centroids to the discrete values 0 or 1, because I want to obtain clusters whose member words have exactly the same set of SCFs. I then derive a distance function $d$ that calculates the probability that a data object $v_i$ has the same SCF set as a centroid $c_m$ ((2) in Figure 3), by comparing, for each SCF $s_j$, the probability that the words in the cluster have $s_j$ with the probability that they do not: $d(c_m, v_i) = \prod_j \left( c_{mj} \, conf_{ij} + (1 - c_{mj})(1 - conf_{ij}) \right)$.</Paragraph>
<Paragraph position="3"> I next address the way to determine the number of clusters and the initial centroids. In this study, I assume that most of the possible sets of SCFs for words are included in the lexicon of the target grammar, and make use of the existing sets of SCFs for the words in the lexicon to determine the number of clusters and the initial centroids. (When the lexicon is less accurate, the number of clusters can be determined using other algorithms (Hamerly, 2003).) I first extract SCF confidence-value vectors from the lexicon of the grammar. By eliminating duplicates among them and setting $e = 0$ in Equation 6, I obtain the initial centroids $c_m$; I then initialize the number of clusters $k$ to the number of these centroids.</Paragraph>
<Paragraph position="4"> I finally update the acquired SCFs using the obtained clusters and the confidence values of the SCFs, in this order: each word is assigned to its closest cluster, SCFs absent from that cluster's centroid are eliminated, and implausible SCFs are then eliminated from the resulting SCFs according to their confidence values $conf_{ij}$. I call this procedure centroid cut-off $t$ when the confidence values are estimated under the recognition threshold $t$.</Paragraph>
<Paragraph position="5"> In the following, I compare centroid cut-off with frequency cut-off and confidence cut-off $t$, which eliminate SCFs using relative frequencies and confidence values calculated under the recognition threshold $t$, respectively. Note that these two cut-offs use only corpus-based statistics to eliminate SCFs.</Paragraph> </Section> </Section> </Paper>
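The clustering loop of Section 3.2 can be summarized in code. Below is a short Python sketch under stated assumptions, not the paper's implementation: the distance is the product formula above, and the centroid update rule used here (set element j to 1 iff the members' average confidence for SCF s_j exceeds 0.5) is an assumption, since the paper's exact update did not survive extraction.

```python
def match_probability(centroid, v):
    # d(c_m, v_i) = prod_j (c_mj * conf_ij + (1 - c_mj) * (1 - conf_ij)):
    # the probability that word i has exactly the centroid's SCF set.
    p = 1.0
    for c, conf in zip(centroid, v):
        p *= conf if c == 1 else 1.0 - conf
    return p

def cluster(vectors, centroids, max_iter=100):
    """Assign each confidence-value vector to the binary centroid it matches
    with the highest probability, then recompute centroids, until stable."""
    assignment = None
    for _ in range(max_iter):
        assignment = [
            max(range(len(centroids)), key=lambda m: match_probability(centroids[m], v))
            for v in vectors
        ]
        new_centroids = list(centroids)
        for m in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == m]
            if members:  # an empty cluster keeps its old centroid
                new_centroids[m] = tuple(
                    1 if sum(v[j] for v in members) / len(members) > 0.5 else 0
                    for j in range(len(members[0]))
                )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids

# Hypothetical toy data: three SCF types; the initial centroids play the role
# of the lexicon's distinct SCF sets (e = 0 in Equation 6).
vectors = [(0.9, 0.8, 0.1), (0.95, 0.7, 0.2), (0.1, 0.2, 0.9)]
print(cluster(vectors, centroids=[(1, 1, 0), (0, 0, 1)]))
```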