<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1012"> <Title>Combination of Symbolic and Statistical Approaches for Grammatical Knowledge Acquisition</Title> <Section position="7" start_page="75" end_page="75" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> The statistical analysis discussed in this paper is based on the assumption that the types of linguistic knowledge to be acquired are: [1] Knowledge of syntactic constructions that are used frequently in the given sublanguage.</Paragraph> <Paragraph position="1"> [2] Lexical knowledge, such as subcategorization frames and number properties, which is often idiosyncratic to the given sublanguage.</Paragraph> <Paragraph position="2"> [3] Knowledge that belongs neither to [1] nor to [2] but is indispensable to the given corpus.</Paragraph> <Paragraph position="3"> [1] implies that knowledge of less frequent constructions can be ignored at the initial stage of linguistic knowledge customization. Such knowledge will be discovered after major defects of the current grammar are rectified, because the GP of a generic hypothesis is defined to be sensitive to the frequency of the hypothesis.</Paragraph> <Paragraph position="4"> [2] means that we assume the initially provided set of grammar rules has comprehensive coverage of basic English expressions. This assumption is reflected in how LP values are initially estimated. Note also that the HG can produce a reasonable set of hypotheses only when this assumption is satisfied. On the other hand, because of this assumption, our framework can learn structurally complex and linguistically meaningful lexical descriptions, such as subcategorization frames.</Paragraph> <Paragraph position="5"> [3] is reflected in how GP values are computed. 
A generic hypothesis will have a GP value of 1 if one of its instances is the only hypothesis that can recover a parsing failure, even when its frequency is very low.</Paragraph> <Paragraph position="6"> The computation mechanism of GP and LP resembles the EM algorithm (Dempster et al., 1977; Brown et al., 1993), which iteratively computes maximum likelihood estimates from incomplete data. Because the purpose of our statistical analysis is to choose &quot;correct&quot; hypotheses from a hypothesis set that also contains unnatural hypotheses, our motivation differs from that of the EM algorithm. However, if we regard hypothesis deletion as maximizing the plausibility of &quot;correct&quot; hypotheses, the computation procedures of the two algorithms are strongly similar.</Paragraph> <Paragraph position="7"> The grammatical knowledge acquisition method proposed in this paper will be incorporated into the tool kit for linguistic knowledge customization that we are now developing. In practical use, a grammar maintainer will be shown a list of hypotheses with high GP values and will update the current version of the grammatical knowledge. The updated knowledge will be used in the next cycle of hypothesis generation and selection, gradually enlarging the linguistic knowledge.</Paragraph> </Section> </Paper>