<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2113">
  <Title>An empirical method for identifying and translating technical terminology</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> There has been a continuous interest in corpus-based approaches which retrieve words and expressions in connection with a specific domain (we call them technical terms hereafter). They may correspond to syntactic phrases or components of syntactic relationships and have been found useful in various application areas, including information extraction, text summarization, and machine translation. Among others, a knowledge of technical terminology is indispensable for machine translation because usage and meaning of technical terms are often quite different from their literal interpretation.</Paragraph>
    <Paragraph position="1"> One approach for identifying technical terminology is a rule-based approach which learns local syntactic patterns from a training corpus. A variety of methods have been developed within this framework (Ramshaw, 1995) (Argamon et al., 1999) (Cardie and Pierce, 1999) and have achieved good results on the considered task.</Paragraph>
    <Paragraph position="2"> Surprisingly, though, little work has been devoted to learning local syntactic patterns besides noun phrases. Another drawback of this approach is that it requires substantial training corpora, in many cases with part-of-speech tags.</Paragraph>
    <Paragraph position="3"> An alternative approach is a statistical one which retrieves recurrent word sequences as collocations (Smadja, 1993) (Haruno et al., 1996) (Shimohata et al., 1997). This approach is robust and practical because it uses plain text corpora, without any language-dependent information. Unlike the former approach, this approach extracts various types of local patterns at the same time. Therefore, post-processing, such as part-of-speech tagging and syntactic category identification, is necessary when we apply the extracted patterns to NLP applications.</Paragraph>
    <Paragraph position="4"> This paper presents a method for identifying technical terms from a corpus and applying them to a machine translation system. The proposed method retrieves local patterns by utilizing the n-gram statistics and identifies their syntactic categories with simple part-of-speech templates. We make a machine translation dictionary from the retrieved patterns and translate documents in the same domain as the original corpus.</Paragraph>
    <Paragraph position="5"> In the next section, we briefly describe pattern-based machine translation. The following section explains in detail how the proposed method works. We then present experimental results and conclude with a discussion.</Paragraph>
  </Section>
</Paper>