<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-3004">
  <Title>Alon Itai, Technion; Uzzi Ornan, Technion</Title>
  <Section position="4" start_page="385" end_page="387" type="intro">
    <SectionTitle>
3. Former Approaches
</SectionTitle>
    <Paragraph position="0"> Eliminating or reducing the ambiguity at this early stage of automatic processing of Hebrew is crucial for the efficiency and the success rate of parsers and other natural language applications. It should be noted that morphological ambiguity in Hebrew complicates even applications that are considered &quot;simple&quot; in other languages.</Paragraph>
    <Paragraph position="1"><!-- Running header: Moshe Levinger et al., Learning Morpho-Lexical Probabilities --> One good example of this is full-text retrieval systems (Choueka 1980). Such systems must handle the morphological ambiguity problem. To see this, consider a search for all the texts containing the word HQPH ('encirclement'). Without morphological disambiguation, the search also returns many texts that actually contain the word H+QPH ('the coffee'), or even HQP+H ('her perimeter') (Ornan 1987). Another application that is more difficult in Hebrew than in other languages is text-to-speech, which cannot be implemented in Hebrew without first resolving the morphological ambiguity, since in many cases different analyses of a word imply different pronunciations. A much simpler version of this problem occurs in English, where for some words the correct syntactic tag is necessary for pronunciation (Church 1988).</Paragraph>
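    <Paragraph> The retrieval example can be made concrete with a small sketch. It assumes a toy lexicon containing only the three forms mentioned above; the names PREFIXES, SUFFIXES, STEMS, and analyses are hypothetical and serve illustration only, not as a description of any actual Hebrew morphological analyzer.

```python
# Toy lexicon (an assumption for illustration): only the forms cited in the text.
PREFIXES = {"H": "the"}   # definite article written as a prefix
SUFFIXES = {"H": "her"}   # possessive written as a suffix
STEMS = {"HQPH": "encirclement", "QPH": "coffee", "HQP": "perimeter"}


def analyses(word):
    """Return every (segmentation, gloss) pair the toy lexicon licenses."""
    results = []
    # Reading 1: the whole string is a lexical entry.
    if word in STEMS:
        results.append((word, STEMS[word]))
    # Reading 2: prefix + stem.
    for prefix, gloss in PREFIXES.items():
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in STEMS:
            results.append((prefix + "+" + rest, gloss + " " + STEMS[rest]))
    # Reading 3: stem + suffix.
    for suffix, gloss in SUFFIXES.items():
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            results.append((stem + "+" + suffix, gloss + " " + STEMS[stem]))
    return results


for segmentation, gloss in analyses("HQPH"):
    print(segmentation, "=", gloss)
```

Since a plain string match on HQPH cannot tell these three readings apart, a retrieval system needs a separate disambiguation step to select the intended one.</Paragraph>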
    <Paragraph position="2"> The notion that this ambiguity problem in Hebrew is very complicated and that it can be dealt with only by using vast syntactic and semantic knowledge has led researchers to look for solutions involving a considerable amount of human interaction. Ornan (1986), for instance, developed a new writing system for Hebrew, called 'The Phonemic Script.' This script enables the user to write Hebrew texts that are morphologically unambiguous, so that they can later serve as input for various kinds of natural language applications. However, since regular Hebrew texts are not written in this script, they must first be transcribed into phonemic texts. Choueka and Lusignan (1985) presented a system for the morphological tagging of large texts that is based on the short context of the word but also depends heavily on human interaction. Methods that use the short context of a word to resolve ambiguity (usually categorical ambiguity) are very common in English and other languages (DeRose 1988; Church 1988; Karlsson 1990). A system using this approach was developed by Levinger and Ornan to serve as a component in their project of morphological disambiguation in Hebrew (Levinger 1992). The main resource used by this system for disambiguation is a set of syntactic constraints that were defined manually by the authors, following two theoretical works that defined short context rules for Hebrew (Pines 1975; Albeck 1992). The syntactic constraints approach, which is an extension of the short context approach, was found to be useful and reliable, but its applicability (measured by the proportion of ambiguous words that were fully disambiguated) was very poor. Hence, the overall performance of this system is much less promising in Hebrew than in other languages.
These results can be explained by the following properties of the ambiguity problem in Hebrew: In many cases two or more alternative analyses share the same category, and hence these alternatives satisfy the same syntactic constraints.</Paragraph>
    <Paragraph position="3"> Moreover, there are cases where two or even more analyses share exactly the same morphological attributes and differ only in their lexical entry.</Paragraph>
    <Paragraph position="4"> For instance, the word XLW (חלו) has two such morphological analyses: The verb XLH (חלה), fem./masc., plural, third person, past tense ('they became ill').</Paragraph>
    <Paragraph position="5"> The verb XL (חל), fem./masc., plural, third person, past tense ('they occurred').</Paragraph>
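    <Paragraph> The limitation of constraint-based filtering in such cases can be illustrated with a minimal sketch. It assumes a hypothetical representation in which a syntactic constraint is a set of required attribute values; the Analysis record, the functions satisfies and filter_by_constraint, and the example constraint are illustrative names and are not taken from the system described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str      # lexical entry
    category: str
    gender: str
    number: str
    person: str
    tense: str


# The two analyses of XLW given in the text: identical attributes, different lemma.
XLW = [
    Analysis("XLH", "verb", "fem./masc.", "plural", "third", "past"),  # 'they became ill'
    Analysis("XL", "verb", "fem./masc.", "plural", "third", "past"),   # 'they occurred'
]


def satisfies(analysis, constraint):
    """A constraint (hypothetical format) maps attribute names to required values."""
    return all(getattr(analysis, name) == value for name, value in constraint.items())


def filter_by_constraint(analyses, constraint):
    """Keep only the analyses compatible with the constraint."""
    return [a for a in analyses if satisfies(a, constraint)]


# Hypothetical short-context constraint: the position requires a plural verb.
survivors = filter_by_constraint(XLW, {"category": "verb", "number": "plural"})
lemmas = [a.lemma for a in survivors]
print(lemmas)
```

Both analyses survive the filter, because a constraint stated over morphological attributes cannot separate candidates that differ only in their lexical entry.</Paragraph>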
    <Paragraph position="6"> The short context constraints use unambiguous anchors that are often function words such as determiners and prepositions. In English most such function words are unambiguous. In Hebrew, these words are almost always morphologically ambiguous. Moreover, many of them appear as prefixes of the word to be analyzed, and their identification is part of the morphological analysis. We thus have a circularity problem: in order to perform the morphological analysis, we need the short context; to identify the short context, we have to find anchors; but in order to find such anchors, we first need to perform the morphological analysis.<!-- Running header: Computational Linguistics Volume 21, Number 3 --></Paragraph>
  </Section>
</Paper>