File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/p04-2009_concl.xml
Size: 2,361 bytes
Last Modified: 2025-10-06 13:54:09
<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-2009">
<Title>Robust VPE detection using Automatically Parsed Text</Title>
<Section position="8" start_page="0" end_page="0" type="concl">
<SectionTitle> 6 Conclusion and Future work </SectionTitle>
<Paragraph position="0"> This paper has presented a robust system for VPE detection. The data is automatically tagged and parsed, syntactic features are extracted, and machine learning is used to classify instances. Three machine learning algorithms are used: Memory Based Learning, and GIS-based and L-BFGS-based maximum entropy modeling. They give similar results, with L-BFGS-MaxEnt generally giving the highest performance. Two parsers were used, Charniak's and RASP, achieving similar results.
To summarise the findings:
+ Using the BNC, which is tagged with a complex tagging scheme but has no parse data, it is possible to reach 76% F1 using lexical forms and POS data alone.
+ Using the Treebank, the coarser tagging scheme reduces performance to 67%.</Paragraph>
<Paragraph position="1"> Adding extra features, including sentence-level ones, raises this to 74%. Adding empty category information gives 88%, compared to previous results of 48% (Hardt, 1997).
+ Re-parsing the Treebank data, top performance is 63%, raised to 68% using extra features.
+ Parsing the BNC, top performance is 71%, raised to 72% using extra features.
+ Combining the parsed data, top performance is 67%, raised to 71% using extra features.
The results demonstrate that the method can be applied to practical tasks using free text. Next, we will experiment with an algorithm (Johnson, 2002) that can insert empty-category information into data from Charniak's parser, allowing replication of the features that require it.
Cross-validation experiments will be performed to counteract any effects of the small test set.</Paragraph> <Paragraph position="2"> As machine learning is used to combine various features, this method can be extended to other forms of ellipsis and to other languages. However, a number of the features used are specific to English VPE and would have to be adapted for such cases. It is difficult to extrapolate from current work how successful such approaches would be, but they can be expected to be feasible, albeit with lower performance.</Paragraph> </Section> </Paper>