<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0314">
  <Title>Inducing Terminology for Lexical Acquisition</Title>
  <Section position="3" start_page="125" end_page="126" type="intro">
    <SectionTitle>
2 Terminology and Lexical Acquisition
</SectionTitle>
    <Paragraph position="1"> In this framework, a term is more than a token or word to be searched for: it stands in a subtler relation to a piece of information in a specific knowledge domain. A term is a concept, in that it imposes a larger number of constraints on the information to be sought in texts. Furthermore, a term conveys a well-assessed (usually complex) meaning insofar as a user community agrees on its content. Since we are interested in automatic terminology derivation, we can regard terms as surface canonical forms of (possibly structured) expressions denoting those contents.</Paragraph>
    <Paragraph position="2"> A term is thus characterized by a general commitment about it, and this has effects on its usage. The distributional properties of complex terms (nominals) differ significantly from those of their basic elements. Deviation from the usual distributional behavior of the single components can be used both as a marker of non-compositionality and as a specific hint of domain relevance. The detection of complex terms plays a crucial role in improving robust parsing and POS tagging for lexical acquisition, thus supporting a more precise induction of lexical properties (e.g., PP disambiguation rules). This specific view extends and generalizes the classical notion of terminology as used in Information Science.</Paragraph>
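The paper does not give a concrete statistic for detecting this distributional deviance; one common, minimal sketch uses pointwise mutual information (PMI) over unigram and bigram counts, where an unusually high PMI for an adjacent word pair hints that the pair behaves as a unit (a candidate term) rather than a free combination. The toy corpus below is an illustrative assumption, not data from the paper.

```python
from collections import Counter
from math import log2

def pmi(pair, unigram_counts, bigram_counts, n_tokens):
    """Pointwise mutual information of an adjacent word pair.

    High PMI means the pair co-occurs far more often than chance
    predicts, a hint of non-compositional, term-like behavior."""
    w1, w2 = pair
    p_pair = bigram_counts[pair] / n_tokens
    p_w1 = unigram_counts[w1] / n_tokens
    p_w2 = unigram_counts[w2] / n_tokens
    return log2(p_pair / (p_w1 * p_w2))

# Toy corpus (an illustrative assumption).
tokens = ("part of speech tagging helps part of speech analysis "
          "of speech and of text").split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n = len(tokens)

# "speech tagging" scores higher than the freer combination "of text".
print(pmi(("speech", "tagging"), unigrams, bigrams, n))
print(pmi(("of", "text"), unigrams, bigrams, n))
```

In practice a log-likelihood ratio is often preferred to raw PMI, since PMI over-rewards rare pairs; the comparison logic stays the same.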
    <Paragraph position="3"> Most of the domain-specific terms we are interested in are nouns or noun phrases that generally denote concepts in a knowledge domain. To approach the problem of terminological induction we thus need: 1. to extract surface forms that are possible candidate concept markers; 2. to decide which of those candidates are actually concepts within a given knowledge domain, as identified by the set of analyzed texts.</Paragraph>
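Step 1 can be sketched as matching part-of-speech patterns over tagged text. The pattern below (optional adjectives followed by one or more nouns) and the tagged sentence are illustrative assumptions, not the paper's actual grammar of legal terminological expressions.

```python
def candidate_terms(tagged):
    """Return maximal ADJ* NOUN+ spans as candidate multi-word terms.

    `tagged` is a list of (word, pos_tag) pairs."""
    candidates, i = [], 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1  # consume an optional run of adjectives
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":
            k += 1  # consume the noun head sequence
        if k > j:   # at least one noun: record the whole span
            candidates.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:
            i += 1  # no noun followed: not a candidate, move on
    return candidates

# Hypothetical tagged sentence for illustration.
TAGGED = [("robust", "ADJ"), ("parsing", "NOUN"), ("improves", "VERB"),
          ("lexical", "ADJ"), ("acquisition", "NOUN"), ("of", "ADP"),
          ("noun", "NOUN"), ("phrases", "NOUN")]
print(candidate_terms(TAGGED))
# → ['robust parsing', 'lexical acquisition', 'noun phrases']
```

As the next paragraph notes, a real pattern set would not simply equal the grammar of noun phrases; this sketch only shows the mechanics of pattern-based extraction.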
    <Paragraph position="4"> Linguistic principles characterize classes of surface forms as potential terms (step 1). Note that the notion of a legal terminological expression here is not equivalent to that of a legal noun phrase. Concepts are lexicalized in surface forms via a set of operations that imply semantic specifications. The way syntax realizes such specifications may be very complex and independent of the notion of grammatical well-formedness.</Paragraph>
    <Paragraph position="5"> The decision in step (2) is again sensitive to the principled way a language expresses concept specifications, but it also needs to be specific to the given knowledge domain, i.e., to the underlying sublanguage. Given the body of texts, the selective extraction should be sensitive to the different observed information. In this phase, statistics is crucial for assessing the relevance of the linguistically plausible forms among all the guessed terms.</Paragraph>
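One simple statistical control for step 2, sketched below under assumed counts and an assumed threshold, compares a candidate's relative frequency in the domain corpus against a general reference corpus: domain terms should be markedly over-represented in the domain texts. The counts, corpus sizes, and the 2.0 ratio threshold are illustrative assumptions.

```python
def domain_relevant(term, domain_counts, ref_counts,
                    domain_size, ref_size, ratio_threshold=2.0):
    """Keep a candidate only if its relative frequency in the domain
    corpus clearly exceeds that in the reference corpus."""
    d = domain_counts.get(term, 0) / domain_size
    # Add-one smoothing so terms unseen in the reference corpus
    # do not divide by zero.
    r = (ref_counts.get(term, 0) + 1) / ref_size
    return d / r >= ratio_threshold

# Hypothetical counts for illustration.
domain = {"noun phrase": 40, "last year": 3}      # 10,000-token corpus
reference = {"noun phrase": 2, "last year": 300}  # 1,000,000-token corpus

print(domain_relevant("noun phrase", domain, reference, 10_000, 1_000_000))
# → True: strongly over-represented in the domain texts
print(domain_relevant("last year", domain, reference, 10_000, 1_000_000))
# → False: common everywhere, so no evidence of termhood
```

Frequency ratios are only one option; the same filtering slot can be filled by log-likelihood or other association scores computed over the analyzed texts.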
  </Section>
</Paper>