<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2134">
<Title>Hypothesis Selection in Grammar Acquisition</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Reusability of existing linguistic knowledge is the most important requirement for the rapid development of practical natural language processing systems. In order to realize automatic customization of existing linguistic knowledge to each application domain, we proposed a new approach to linguistic knowledge acquisition, which is a combination of symbolic and statistical approaches [Kiyono and Tsujii, 1993].</Paragraph>
<Paragraph position="1"> The framework of our approach is shown in Figure 1.</Paragraph>
<Paragraph position="2"> The acquisition flow starts with parsing each sentence in a corpus. If parsing fails, the 'Hypothesis Generator' produces hypotheses of additional grammatical knowledge, each of which could recover the incompleteness of the existing grammar. After iterating this hypothesis generation process for all the sentences in the corpus, the hypotheses are passed to the statistical analysis process, and finally plausible hypotheses are chosen as new knowledge by observing statistical properties of the hypotheses.</Paragraph>
<Paragraph position="3"> Unlike robust parsing [Mellish, 1989; Goeser, 1992; Douglas and Dale, 1992] or non-statistical approaches to grammar acquisition, our approach does not require a mechanism to detect the cause of the parsing failure in the sentential analysis phase, and therefore the 'Hypothesis Generator' may output all the possible hypotheses. However, the greater part of the hypotheses generated by a simple deductive mechanism are unnatural revisions of the existing grammar.
For example, even a rule which derives a top node category S directly from the input string of words might be hypothesized.</Paragraph>
<Paragraph position="4"> Linguistically unnatural hypotheses have harmful effects on the following corpus-based process, not only making the process inefficient but also interfering with the statistical data as noise. In this paper, some techniques to remove such inadequate hypotheses are proposed, and the results of experiments which show the effectiveness of the proposed techniques are also discussed. (*also a staff member of Matsushita Electric Industrial)</Paragraph>
</Section>
</Paper>