<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3009">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. Integrated Morphological and Syntactic Disambiguation for Modern Hebrew</Title>
  <Section position="4" start_page="0" end_page="50" type="intro">
    <SectionTitle>
2 Linguistic Data
</SectionTitle>
    <Paragraph position="0"> Phrases and sentences in MH, as well as in Arabic and other Semitic languages, have a relatively free word order. In figure 1, for example, two distinct syntactic structures express the same grammatical relations. It is typically morphological information rather than word order that provides cues for structural dependencies (e.g., agreement on gender and number in figure 1 reveals the subject-predicate dependency).</Paragraph>
    <Paragraph position="1"> [Figure: marking word boundaries with ' '] Furthermore, the boundaries of constituents in the syntactic structure of MH sentences need not coincide with word boundaries, as illustrated in figure 2. An MH word may coincide with a single constituent, as in 'ica' [3] (go out); it may overlap with an entire phrase, as in 'h ild' (the boy); or it may span across phrases, as in 'w kf m h bit' (and when from the house). Therefore, we conclude that in order to perform syntactic analysis (parsing) of MH sentences, we must first identify the morphological constituents that form MH words.</Paragraph>
    <Paragraph position="2"> There are (at least) three distinct morphological processes in Semitic languages that play a role in word formation. Derivational morphology is a non-concatenative process in which verbs, nouns, and adjectives are derived from (tri-)consonantal roots plugged into templates of consonant/vowel skeletons. The word-forms in table 1, for example, are all derived from the same root, [i][l][d] (child, birth), plugged into different templates. In addition, MH has a rich array of agreement features, such as gender, number and person, expressed in the word's inflectional morphology. Verbs, adjectives, determiners and numerals must agree on the inflectional features with the noun they complement or modify.

[3] We adopt the transliteration of Sima'an et al. (2001).

Table 1 ([..] mark templates' slots for consonantal roots; (..) mark obligatory doubling of roots' consonants):
a. 'ild'           b. 'iild'             c. 'mwld'
   [i]e[l]e[d]        [i]i[l](l)e[d]        mw[][l](l)a[d]
   child              deliver a child       innate

Table 2:
a. ild gdwl           b. ildh gdwlh
   child.MS big.MS       child.FS big.FS
   a big boy             a big girl</Paragraph>
    <Paragraph position="4"> It can be seen in table 2 that the suffix h marks both the noun 'ild' (child) and its modifier 'gdwl' (big) as feminine. Finally, particles that are prefixed to the word may serve different syntactic functions, yet several of them may be concatenated together with the stem to form a single word. The word 'wkfmhbit' in figure 2, for instance, is formed from a conjunction w (and), a relativizer kf (when), a preposition m (from), a definite article h (the) and a noun bit (house). Identifying such particles is crucial for analyzing syntactic structures, as they reveal structural dependencies such as subordinate clauses, adjuncts, and prepositional phrase attachments.</Paragraph>
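As a rough illustration of the prefix-particle segmentation described above, the following Python sketch enumerates the ways a transliterated word can be split into prefixed particles followed by a stem. The particle inventory and the tiny stem lexicon are assumptions made for this example (they contain only forms mentioned in the text), and the function name is hypothetical; this is not the analyzer used in the paper.

```python
# Hypothetical, illustrative inventories; only forms mentioned in the text.
PARTICLES = {"w": "and", "kf": "when", "m": "from", "h": "the", "f": "that"}
STEMS = {"bit": "house", "ild": "child", "mnh": "counted", "fmnh": "fat / got fat / put oil / her oil"}


def segmentations(word):
    """Return every split of `word` into zero or more particles plus a known stem."""
    results = []
    # The whole remaining string may itself be a stem.
    if word in STEMS:
        results.append([word])
    # Otherwise (or in addition), try peeling off one particle and recursing.
    for particle in PARTICLES:
        rest = word[len(particle):]
        if word.startswith(particle) and rest:
            for tail in segmentations(rest):
                results.append([particle] + tail)
    return results


print(segmentations("wkfmhbit"))
print(segmentations("fmnh"))
```

Under these assumed inventories, 'wkfmhbit' receives a single segmentation, whereas a form that is both a stem and a particle-plus-stem sequence receives more than one, which is the kind of word-level ambiguity a disambiguation step then has to resolve.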
    <Paragraph position="5"> At the same time, MH exhibits large-scale ambiguity already at the word level: there are multiple ways in which a word can be broken down into its constituent morphemes. This is further complicated by the fact that most vocalization marks (diacritics) are omitted in MH texts. To illustrate, table 3 lists two segmentation possibilities, four readings, and five meanings of different morphological analyses for the word-form 'fmnh' [4]. Yet the morphological analysis of a word-form, and in particular its morphological segmentation, cannot be disambiguated without reference to context, and various morphological features of syntactically related forms provide useful hints for morphological disambiguation. Figure 3 shows the correct analyses of the form 'fmnh' in different syntactic contexts. Note that the correct analyses maintain agreement on gender and number between the noun and its modifier.

[4] A statistical study on an MH corpus has shown that the average number of possible analyses per word-form was 2.1, while 55% of the word-forms were morphologically ambiguous (Sima'an et al., 2001).</Paragraph>
    <Paragraph position="6"> Table 3:
'fmnh'       'fmnh'        'fmnh'        'fmnh'        'f + mnh'
shmena       shamna        shimna        shimna        she + mana
fat.FS       got-fat.FS    put-oil.FS    oil-of.FS     that + counted
fat (adj)    got fat (v)   put-oil (v)   her oil (n)   that (rel) counted (v)</Paragraph>
    <Section position="1" start_page="50" end_page="50" type="sub_section">
      <SectionTitle>
tactic Contexts
</SectionTitle>
      <Paragraph position="0"> In particular, the analysis 'that counted' (b) is easily disambiguated, as it is the only one maintaining agreement with the modified noun.</Paragraph>
      <Paragraph position="1"> In light of the above, we would want to conclude that syntactic processing must precede morphological analysis; however, this would contradict our previous conclusion that the morphological constituents of MH words must be identified before parsing. For this reason, independent morphological and syntactic analyzers for MH will not suffice. We suggest performing morphological and syntactic processing of MH utterances in a single, integrated framework, thereby allowing shared information to support disambiguation in multiple tasks.</Paragraph>
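The role that agreement plays in morphological disambiguation can be sketched in a few lines of Python. The candidate analyses follow the glosses of table 3 and the nouns come from table 2; pairing 'fmnh' with these nouns is an example constructed for this sketch rather than the contexts of figure 3. The "MS" feature assigned to 'f + mnh' is an assumption (table 3 lists no agreement features for that analysis), and all names are hypothetical.

```python
# Candidate analyses of the word-form 'fmnh', following the glosses of table 3.
# The "MS" feature on 'f + mnh' is an assumption of this sketch.
ANALYSES = [
    {"segments": ["fmnh"], "gloss": "fat (adj)", "agr": "FS"},
    {"segments": ["fmnh"], "gloss": "got fat (v)", "agr": "FS"},
    {"segments": ["fmnh"], "gloss": "put-oil (v)", "agr": "FS"},
    {"segments": ["fmnh"], "gloss": "her oil (n)", "agr": "FS"},
    {"segments": ["f", "mnh"], "gloss": "that (rel) counted (v)", "agr": "MS"},
]

# Nouns from table 2 with their gender/number features.
NOUNS = {"ild": "MS", "ildh": "FS"}


def agreeing_analyses(noun, analyses):
    """Keep only the analyses whose agreement features match the noun's."""
    return [a for a in analyses if a["agr"] == NOUNS[noun]]


for noun in ("ild", "ildh"):
    glosses = [a["gloss"] for a in agreeing_analyses(noun, ANALYSES)]
    print(noun, glosses)
```

With a masculine singular noun only the segmented analysis survives the filter, so the syntactic context settles the segmentation; with a feminine singular noun four analyses remain, and further syntactic information would be needed to choose among them.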
    </Section>
  </Section>
</Paper>