<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1048">
  <Title>Sylvain_Delisle @uqtr.uquebec.ca</Title>
  <Section position="5" start_page="311" end_page="312" type="relat">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> There have been successful attempts at applying machine learning to linguistic tasks, e.g. discriminating between discourse and sentential senses of cues ([Litman 1996]) or resolving coreferences in texts ([McCarthy &amp; Lehnert 1995]). As in our work, these problems are cast as classification problems, and machine learning techniques (mainly C4.5) are then used to induce a classifier for each class. What makes these applications different from ours is that they work on surface linguistic, or mixed surface linguistic and intonational, representations, and that their classes are relatively balanced, whereas in our case the class of compound sentences is much smaller than the class of non-composite sentences. Such unbalanced classes create problems for the majority of inductive learning systems. A distinctive feature of our work is that we used machine learning techniques to improve an existing rule-based natural language processor from the inside. This contrasts with approaches in which there are essentially no explicit rules, such as neural networks (e.g. [Buo 1996]), and with approaches in which machine learning algorithms attempt to infer a grammar from raw or preprocessed data, whether via deduction (e.g. [Samuelsson 1994]), induction (e.g. [Theeramunkong et al. 1997]; [Zelle &amp; Mooney 1994]), user cooperation (e.g. [Simmons &amp; Yu 1992]; [Hermjakob &amp; Mooney 1997]), transformation-based error-driven learning (e.g. [Brill 1993]), or even decision trees (e.g. [Magerman 1995]).</Paragraph>
    <Paragraph position="1"> In our work, we do not wish to acquire a grammar: we already have one, and we want to devise a mechanism that makes some of its parts adaptable to the corpus at hand, or that improves some aspect of its performance. Other researchers, such as [Lawrence et al. 1996], have compared neural networks and machine learning methods on the task of sentence classification, in which the system must classify a string as either grammatical or ungrammatical. We do not content ourselves with results based on such a grammatical/ungrammatical dichotomy.</Paragraph>
    <Paragraph position="2"> We are looking for heuristics, based on relevant features, that will do better than the current ones and improve the overall performance of a natural language processor; this is a very difficult problem (see, e.g., [Huyck &amp; Lytinen 1993]). One could also view this problem as the optimisation of a rule-based system.</Paragraph>
    <Paragraph position="3"> Work somewhat related to ours was conducted by [Samuelsson 1994], who used explanation-based generalisation to extract a subset of a grammar that parses a given corpus faster than the original, larger grammar. [Neumann 1997] also used EBL, but for a generation task. In our case, we are not looking for a subset of the existing rules; rather, we are looking for brand-new rules that would replace and outperform the existing ones. We should also mention the work of [Soderland 1997], who compared automatically learned and hand-crafted rules for text analysis.</Paragraph>
  </Section>
</Paper>