<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1805">
  <Title>Categorical Ambiguity and Information Content: A Corpus-based Study of Chinese</Title>
  <Section position="3" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4. Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we propose an information-based measure of ambiguity in Chinese. The measure complements the more familiar distributional data and allows us to investigate directly the categorical information content of each lexical word. We showed that the degree of ambiguity of a word indeed correlates with the number of possible categories of that word. However, the degree of ambiguity of a word does not correlate with its frequency, although its tendency to be categorically ambiguous is dependent on frequency.</Paragraph>
    <Paragraph position="1"> The above findings have very important implications for theories and applications in language processing. In terms of the representation of linguistic knowledge, they underline the arbitrariness of the encoding of lexical information, following Saussure. In terms of processing models and empirical prediction, they suggest a model not unlike the theory of unpredictability in physics. Each word is like an electron. While the behavior of a group of words can be accurately predicted by a stochastic model, the behavior of any single word is not predictable. In terms of linguistic theory, this is because there are too many rules that may apply to each lexical item at different times and on different levels; hence we cannot predict the results exactly without knowing exactly which rules applied and in what order.</Paragraph>
    <Paragraph position="2"> This view is compatible with the Lexical Diffusion (Wang 1969) view on application of linguistic rules.</Paragraph>
    <Paragraph position="3"> In NLP, this clearly predicts the performance ceiling of stochastic approaches.</Paragraph>
    <Paragraph position="4"> It also predicts that the ceiling can be surpassed by hybridizing stochastic approaches with specific lexical heuristic rules covering the 'hard' cases for those approaches, as suggested in Huang et al. (2002).</Paragraph>
  </Section>
</Paper>