<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2935">
  <Title>Language Independent Probabilistic Context-Free Parsing Bolstered by Machine Learning</Title>
  <Section position="7" start_page="234" end_page="234" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented a general approach to parsing arbitrary languages based on dependency treebanks that uses a minimum of language-specific information and nevertheless yields competitive results for some languages (Da, Du). Even better results can be reached if the categories incorporate POS tag classifications optimized for specific languages (Ge). Markovization usually brings an improvement of up to 2%; higher gains are reached in Slovene (where many new rules occur in the test set) and Chinese (which has the highest number of dependency relations). Comparable results in the literature are Schiehlen's (2004) 81.03% dependency f-score on the German NEGRA treebank and Collins et al.'s (1999) 80.0% labelled accuracy on the Czech PDT treebank. Collins (1999) used a lexicalized approach; Schiehlen (2004) used the manually annotated phrasal categories of the treebank.</Paragraph>
    <Paragraph position="1"> Our second result is that context-free parsing can also boost the performance of a simple tagger-like machine learning system. While a maximum-entropy learner (http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html) on its own achieves competitive results for only three languages (Ar, Po, Sl), it produces competitive results for basically all languages when given access to the output of the probabilistic parser.</Paragraph>
    <Paragraph position="2"> Thanks go to Helmut Schmid for providing support with his parser and the Markovization script.</Paragraph>
  </Section>
</Paper>