<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2105">
  <Title>Robust German Noun Chunking With a Probabilistic Context-Free Grammar</Title>
  <Section position="5" start_page="728" end_page="730" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We performed two main chunking experiments, hfitially, the parser trained the chunk grammar based on the restricted grmnmar described in section 2 according to tbur different training strategies. A preferred training strategy was then applied to investigate the potential of grammar refim;ment and extended training data.</Paragraph>
    <Section position="1" start_page="728" end_page="729" type="sub_section">
      <SectionTitle>
4.1 Training
</SectionTitle>
      <Paragraph position="0"> Ill the frst exlmriment, the chunker version of the grmmnar was trained oil a corpus comprising a 1 million word subcortms of relative clauses, a 1 million word subeorpus of verb final clauses and 2 million words of consecutive text. All data had been extracted from the Huge German Corpus. The test data used for the later evahmtion was not included in the training corpus.</Paragraph>
      <Paragraph position="1"> For training strategy 1, the elmnker gralnmar was first; trained on the whole cortms in mflexiealised mode, i.e. like a PCFG. The tmrmneters were reestimated once in the middle and once at the end of the eorlms. In the next stel) , the grammar was lexicalised, i.e. the parser computed |;tie parse probabilities with the unlexicalised model, lint extracted Dequencies for the lexicalised model. These fl'equencies were summed over the. whole eorl)us. Three more iterations on the whole corpus tbllowed in which the parmneters of the lexicalised model were reestimate(t. null The parameters of the unlexicalised chunker grammar were initialised in the following way: a fl'equeney of 7500 was assigned to all original granunar rules and 0 to the majority of robustness rules. The parmneters were then estimated on the basis of these Dequencies. Because of the smoothing, the t)robabilities of the robustness rules were small lint not zero.</Paragraph>
      <Paragraph position="2"> For training strategy 2, the chunker rules were initialised with frequencies fl'om a grammar without robustness rule extensions, which had been trained mflexiealised on a 4 million subeortms of verb final clauses and a 4 million word subcorpus of relative clauses.</Paragraph>
      <Paragraph position="3"> Training strategy 3 again set the fi'equency of the original rules to 7500 and of tile robustness rules to 0. The parser trained with three unlexicalised iterations over the whole training corpus, reestimating the parameters only at the end of the corpus, ill of der to find out; whether the lexicalised probabilistic parser had been better than tile fully trained mflexicalised parser on the task of chunk parsing. Training strategy 4 repeated this procedure, but with initial- null ising the chunker frequencies on basis of a trained gramnlar.</Paragraph>
      <Paragraph position="4"> For each training strategy, further iterations were added until the precision and recall values ceased to improve.</Paragraph>
      <Paragraph position="5"> For the second part of the experiments, the base grammar was extended with a few simple verb-first and verb-second clause rules. Strategy 4 was applied for training the ehunker (A) on the same training corpus as betbre, i.e. 2 million words of relative and verb final clauses, and 2 million words of unrestricted corpus data from the HGC, (B) on a training corpus consisting of 10 million words of unrestricted corpus data from the HGC.</Paragraph>
    </Section>
    <Section position="2" start_page="729" end_page="729" type="sub_section">
      <SectionTitle>
4.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> The evaluation of tile ctmnker was carried out on noun chunks from 378 unrestricted sentences from the German newspaper Frankfu~'ter Allgcmci~c Zeitun9 (FAZ). Two persons independently annotated all noun chunks in the corpus -a total of 2,140 noun chunks-, according to the noun chunk deftnition in section 2.2, without considering grammar coverage, i.e. noun chunks not actually covered by the grammar (e.g. noun chunk ellipsis such as die klcinc~ \[ \]N) were annotated as such. As labels, we used the identifier NC plus case information: NC. Nom, IqC. Ace, NC. Dat, NC.Gen. In addition, we included identifiers for prepositional phrases where the preposition is nlorphologically merged with the definite article, (el. example (6)), also including case information: PNC.Acc, PNC.Dat.</Paragraph>
      <Paragraph position="1"> For each training strategy described in section 4.1 we evaluated the chunker before the training process and after each training iteration: the model in its current training state parsed the test sentences and extracted the most probable clnmk sequence as defined in section 3. We then compared the extracted noun elmnks with tile haud-ammtated data, according to * the range of the chunks, i.e. (lid the chunker find a chunk at all? . the range and the identifier of the chunks, i.e.</Paragraph>
      <Paragraph position="2"> did the ehunker find a chunk and identify the correct syntactic category and case? Figures 1 and 2 display the results of the evaluation in tile first experiment, deg according to noun chunk range only and according to noun chunk range, syntactic category and case, respectively.</Paragraph>
      <Paragraph position="3"> Bold font highlights the best versions.</Paragraph>
      <Paragraph position="4"> Training strategy 2 with two iterations of lexicalised training produced tile best f-scores tbr noun  was done. The respective precision and recall values were 93.06% and 92.19%. For recognising noun chunks with range, category and case, the best; chunker version was created by training strategy 4, after five iterations of unlexicalised training; precision and recall values were 79.28% and 76.75%, respectively.</Paragraph>
      <Paragraph position="5"> From the experimental results, we can conclude that:  1. initialisation of the chunker grammar frequencies on the basis of a trained grammar improves the untrained version of the elumker, but the difference vanishes in the training process 2. unlexicalised parsing is sufficient for noun chunk  extraction; for extraction of chunks with case ilfformation, unlexicalised training turned out to be even more successflfl than a combination with lexicalised training Figures 3 and 4 display the results of the evaluation concerning the second experilnent, compared to the initial w, lues from the first experiinent. Extending the base grammar and the training corpus slightly increased precision and recall values for recognising noun chunks according to range only.</Paragraph>
      <Paragraph position="6"> The main inlprovement was ill noun chunk recognition according to range, category and case: precision and recall values increased to 83.88% and 83.21%, respectively.</Paragraph>
    </Section>
    <Section position="3" start_page="729" end_page="730" type="sub_section">
      <SectionTitle>
4.3 Failure Analysis
</SectionTitle>
      <Paragraph position="0"> A comparison of the parsed noun chunks with the mmotated data showed that failure in detecting a noun chunk was mainly caused by proper names, for exalnple Neta~j(E~,~t, abbreviations like OSZE, or composita like So~tth Ch, ina Mor,ti,tg Post. The diversity of proper names makes it difficult for tile chunker to learn them properly. On the one hand, the lexieal infornl~tion for proper names is unreliable because Inany proper nalnes were not reeognised as such. On the other hand, most prot)er names are too rare to learn reliable statistics tbr them.</Paragraph>
      <Paragraph position="1"> Minor mistakes were cruised by (a) articles which are morphologically identical to noun chunks consisting of a pronoun, for example den Rc,t,t~,e,'~ (tiLe pensionersd(,t,) was analysed as two noun clumks, dcTt (demonstrative pronoun) and Rent~t,e~'7t, (b) capital letter eonfnsion: since Gerinan nouns typically start with capital letters, sentence beginnings are wrongly interpreted as nouns, for example Wiirden as the conditional of the auxiliary wcrdc~ (to become) is interpreted as the dative case of Wib'dc (dignity), (e) noun chunk internal lumctuation as in seine ' Pa~'t,tcr' ' (his ' partners ').</Paragraph>
      <Paragraph position="2"> Failure in assigning the correct; syntactic category and case to a noun chunk was mainly caused by (a) assigning accusative case instead of nominative case, and (b) assigning dative case or nomina- null tive case instead of a(:cusal;ive (:an(;. 2}he confl,sion between l~ominal;ive an(t accusative case is due (;() 1;he facl; that both cases at(', (~xI)rc'ss(.'(t t)y i(lenl;ical m()ri)tlology in l;he f(;minine and neul;ra\] genders in G(n'man. The morl)hologi(&amp;quot; similarity l)ei:ween a(:(:u,;ative and (lative is less substantial, but esl)ecially prol)er names and bar(; nouns are sl;ill sul)jecl; 1;o (:(mfllsion. As |;lie evaluation resull;s show, the (lisLinction between 1;he cases could be learned in general, bul; morl)hological similaril;y and in addil;ion l;he relatively Dee word order in German impose high demure(Is on the n(;(:essary i)rot)a|)ilil;y model.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>