<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1087"> <Title>Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features</Title> <Section position="4" start_page="0" end_page="689" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> Text chunking (and NP chunking in particular), first proposed by Abney (1991), is a well-studied problem for English. The CoNLL-2000 shared task (Tjong Kim Sang et al., 2000) addressed general chunking. The best result on the shared task data was achieved by Zhang et al. (2002), who reported NP chunking results of 94.39% precision, 94.37% recall and 94.38 F-measure using a generalized Winnow algorithm and a feature set enhanced with the output of a dependency parser. Kudo and Matsumoto (2000) used an SVM-based algorithm and achieved NP chunking results of 93.72% precision, 94.02% recall and 93.87 F-measure on the same shared task data, using only the words and their PoS tags.</Paragraph> <Paragraph position="1"> Similar results were obtained using Conditional Random Fields on similar features (Sha and Pereira, 2003).</Paragraph> <Paragraph position="2"> The NP chunks in the shared task data are base-NP chunks, that is, non-recursive NPs, a definition first proposed by Ramshaw and Marcus (1995). This definition yields good NP chunks for English, but results in very short and uninformative chunks for Hebrew (and probably other Semitic languages).</Paragraph> <Paragraph position="3"> Recently, Diab et al. (2004) used an SVM-based approach for Arabic text chunking. Their chunk data was derived from the LDC Arabic TreeBank using the same program that extracted the chunks for the shared task. They used the same features as Kudo and Matsumoto (2000), and achieved overall chunking performance of 92.06% precision, 92.09% recall and 92.08 F-measure (results for NP chunks alone were not reported). 
Since Arabic syntax is quite similar to that of Hebrew, we expect that the issues reported below apply to the Arabic results as well.</Paragraph> </Section> </Paper>