<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0502"> <Title>Generating Hebrew verb morphology by default inheritance hierarchies</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The pi'el verb a0a2a1a4a3 </SectionTitle> <Paragraph position="0"> The purpose of the KATR theory described here is to generate perfect and imperfect forms of strong verbs belonging to various binyanim in Hebrew. In particular, given a verbal lexeme L and a sequence a5 of morphosyntactic properties appropriate for verbs, the theory evaluates the pairing of L with a5 as an inflected verb form. For instance, it evaluates the pairing of the lexeme a6a8a7a10a9 &quot;speak&quot; with the property sequence <perfect 3 sg masc> as the verb form a6a11a7a12a13 a9a15a14 &quot;he spoke&quot;.</Paragraph> <Paragraph position="1"> A theory in KATR is a network of nodes; the network of nodes constituting our verb morphology theory is represented in Figure 1. The overarching organizational principle in this network is hierarchical: The tree structure's terminal nodes represent individual verbal lexemes, and each of the nonterminal nodes in the tree defines default properties shared by the lexemes that it dominates. The status of the boxed nodes is taken up below.</Paragraph> <Paragraph position="2"> Each of the nodes in a theory houses a set of rules.</Paragraph> <Paragraph position="3"> We represent the verb a6a11a7a16a9 by a node: Speak:</Paragraph> <Paragraph position="5"> The node is named Speak, and it has two rules, terminated by a single dot. Our convention is to name the node for a verb by a capitalized English word representing its meaning. We use KATR-style comments (starting with % and continuing to the end of the line) to number the rules so we can refer to them easily.</Paragraph> <Paragraph position="6"> Rule 1 says that a query asking for the root of this verb should produce a three-atom result containing a9 , a7 , and a6 . 
Our rules assemble Hebrew words in logical order, which appears in this document as left-to-right. We accomplish reversal by rules in a REVERSE node, not shown in this paper.</Paragraph> <Paragraph position="7"> Rule 2 says that all other queries are to be referred to the PIEL node, which we introduce below.</Paragraph> <Paragraph position="8"> A query is a list of atoms, such as <root> or <vowel2 perfect 3 sg masc>; in our theory, the atoms generally represent form categories (such as root, binyanprefix, vowel1, cons2), morphosyntactic properties (such as perfect, sg, fem) or specific Hebrew characters.</Paragraph> <Paragraph position="9"> Queries are directed to a particular node. The query directed to a given node is matched against all the rules housed at that node. A rule matches if all the atoms on its left-hand side match the atoms in the query. A rule can match even if its atoms do not exhaust the entire query. In the case of Speak, a query <root perfect> would match both rules, but would not match a rule beginning with <spelling>. When several rules match, KATR picks the best match, that is, the one whose left-hand side &quot;uses up&quot; most of the query. This algorithm means that Rule 2 of Speak is used only when Rule 1 does not apply, because Rule 1 is always a better match if it applies at all. Rule 2 is called a default rule, because it applies by default if no other rule applies. Default rules define a hierarchical relation among some of the nodes in a KATR theory; thus, in the tree structure depicted in Figure 1, node X dominates node Y iff Y houses a default rule that refers queries to X. KATR generates output based on queries directed to nodes representing individual words. 
Since these nodes, such as Speak, are not referred to by other nodes, they are called leaves, as opposed to nodes like PIEL, which are called internal nodes.</Paragraph> <Paragraph position="10"> Here is the output that KATR generates for the Speak node and various queries.</Paragraph> <Paragraph position="11"> Our theory represents Hebrew characters and vowels in Unicode characters (Daniels, 1993). We use ' to indicate the accented syllable if it is not the ultima, and we mark shewa na by a47 .</Paragraph> <Paragraph position="12"> The rule for Speak illustrates one of the strategies upon which we build KATR theories: A node representing a category (here, a particular verb) may provide information (here, the letters of the verb's root) needed by more general nodes (here, PIEL and the nodes to which it, in turn, refers). We refer to this strategy as priming. As we see below, rules in the more general nodes refer to primed information by means of quoted queries.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The PIEL node </SectionTitle> <Paragraph position="0"> We now turn to the PIEL node, to which the Speak node refers.</Paragraph> <Paragraph position="1"> PIEL:</Paragraph> <Paragraph position="3"> As with the Speak node, PIEL defers most queries to its parent, in this case the node called VERB, as Rule 1 indicates.</Paragraph> <Paragraph position="4"> Rule 2 modifies a default that VERB will use, namely, the nature of the second consonant of the root. Pi'el verbs double their second consonant by applying a dagesh. This rule exemplifies a second strategy of KATR theories: A node representing a specific category (here, pi'el verbs) may override information (here, the nature of the second consonant) that is assumed by more general nodes (here, VERB and the nodes to which it, in turn, refers). We refer to this strategy as overriding. 
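
The default inheritance and overriding just described can be sketched in Python. This is only an illustration under stated assumptions, not KATR itself: the Node class, the lookup function, and all values are hypothetical; rule left-hand sides are modelled as tuples of atoms; a node's default rule is modelled as a parent link; VERB's consonants are hard-coded for brevity; and d, b, r are romanized stand-ins for the Hebrew root letters.

```python
# Hypothetical sketch of KATR-style default inheritance; not the KATR engine.
class Node:
    def __init__(self, name, rules, parent=None):
        self.name = name
        self.rules = rules      # maps a left-hand side (tuple of atoms) to a value
        self.parent = parent    # the node named by the default rule, if any


def lookup(node, query):
    """Answer a query (tuple of atoms) with the longest matching left-hand side."""
    query = tuple(query)
    matches = [lhs for lhs in node.rules if query[:len(lhs)] == lhs]
    if matches:
        best = max(matches, key=len)    # the rule that uses up most of the query
        return node.rules[best]
    if node.parent is None:
        raise KeyError(f"no rule for {query} at {node.name}")
    return lookup(node.parent, query)   # default rule: defer to the parent


# Placeholder theory: VERB supplies defaults, PIEL overrides cons2, Speak primes root.
VERB = Node("VERB", {("cons2",): "b", ("cons3",): "r"})
PIEL = Node("PIEL", {("cons2",): "b+dagesh"}, parent=VERB)
SPEAK = Node("Speak", {("root",): ("d", "b", "r")}, parent=PIEL)

print(lookup(SPEAK, ("root", "perfect")))   # ('d', 'b', 'r')
print(lookup(SPEAK, ("cons2",)))            # b+dagesh
print(lookup(SPEAK, ("cons3",)))            # r
```

In this sketch the cons2 query is answered at PIEL before it can reach VERB, which is the overriding behaviour, while the cons3 query falls through to VERB by default.
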
Rule 2 is an overriding rule because the value it assigns to the sequence <cons2> is distinct from the value assigned at the VERB node, to which PIEL refers queries by default.</Paragraph> <Paragraph position="5"> We defer discussion of the strange right-hand side of this rule for the moment.</Paragraph> <Paragraph position="6"> The other rules in PIEL are all priming rules.</Paragraph> <Paragraph position="7"> Instead of using angle brackets (&quot;<&quot; and &quot;>&quot;) to match queries, they use braces (&quot;{&quot; and &quot;}&quot;). This syntax causes the left-hand side of a rule to be treated as a set instead of an ordered list. The rule whose left-hand side is {binyanprefix perfect} matches any query containing both the atom binyanprefix and the atom perfect, in any order. As before, more than one rule can match a given query, and the rule with the most comprehensive match is chosen. If two or more rules tie for the best match, the KATR theory is considered malformed.</Paragraph> <Paragraph position="8"> In formulating Rules 3–5, we assume a distinction between binyan prefixes (specific to particular binyanim) and the personal prefixes (which cross-cut the various binyanim); thus, the form a6a8a7a12a13 a9a31 a43a27 &quot;we will speak&quot; contains the binyan prefix a27 and the personal prefix a43 .</Paragraph> <Paragraph position="9"> An empty right-hand side in a rule means that the result of a matching query is the empty string. In particular, Rule 3,</Paragraph> <Paragraph position="11"> indicates that there is no binyan prefix for pi'el verbs in the perfect form, in contrast to, for instance, hif'il verbs. The next two rules indicate the binyan prefix for a pi'el verb's imperfect forms. 
By Rule 4, this prefix is generally shewa (a27); but because the personal prefix a44 cannot co-occur with the binyan prefix shewa, Rule 5 specifies a different binyan prefix for a pi'el verb's first-person singular imperfect form.</Paragraph> <Paragraph position="12"> (We can adjust the combination a44a27 to a44a27a31 as a postprocessing step instead, as we show later when we treat guttural letters.) Every form of a verb separates the three letters of the root by two vowels, which we call vowel1 and vowel2. The pi'el is characterized by the fact that in the imperfect, these vowels are the pataḥ (by Rule 7) and the tseyre (by Rule 10), as in a6a11a7a12a13 a9a31 a43a27 &quot;we will speak&quot;; in the perfect, they are instead generally the ḥiriq (by Rule 6) and the pataḥ (by Rule 9), as in a37a12 a43a39a6a27 a7a12a31a53a9a15a14 &quot;we spoke&quot;. There is an exception in the perfect third singular masculine (a6a11a7a12a13 a9a21a14), as specified in Rule 8.</Paragraph> <Paragraph position="13"> Rules 5 and 8 are examples of a third strategy for building KATR theories: A rule may show an exception to a more general pattern introduced by another rule housed at the same node. For instance, Rule 8 establishes a special value for vowel2 for one combination of person, number, and gender, supplanting the more typical value for vowel2 established for perfect forms by Rule 9. We refer to this strategy as specializing.</Paragraph> <Paragraph position="14"> We now revisit the strange right-hand side of Rule 2. The term on its right-hand side consists of a node name (ROOT2), a colon, and a new query to present to that node. The new query involves a quoted path, &quot;<root>&quot;. KATR treats quoted paths in this context as queries on the node from which we started, that is, Speak. 
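
The way a quoted path feeds a lookup node can be sketched in Python. This is a hedged illustration, not KATR code: the dictionary SPEAK_RULES and the functions query_start_node, root2, and piel_cons2 are hypothetical names, the letters d, b, r are romanized stand-ins for the Hebrew root, ROOT2 is assumed to pick out the second consonant, and the dagesh that pi'el adds to that consonant is left out.

```python
# Hypothetical sketch of a quoted path feeding a lookup node; names are illustrative.
SPEAK_RULES = {"root": ("d", "b", "r")}   # romanized stand-in for the Hebrew root


def query_start_node(start_rules, atom):
    """Evaluate a quoted path: ask the node where the original query began."""
    return start_rules[atom]


def root2(consonants):
    """Stand-in for the ROOT2 lookup node: select the second of three consonants."""
    first, second, third = consonants
    return second


def piel_cons2(start_rules):
    """Stand-in for Rule 2 of PIEL: hand the quoted root query to ROOT2."""
    return root2(query_start_node(start_rules, "root"))


print(piel_cons2(SPEAK_RULES))   # b
```

The point of the sketch is that the rule is housed at PIEL, yet the root it consults belongs to whichever leaf node received the original query.
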
In our case, the right-hand side of this rule is equivalent to ROOT2:<a9a54a7a55a6 >, because of the first rule in the Speak node.</Paragraph> <Paragraph position="15"> ROOT2 is one of a family of three nodes, each of which isolates a particular consonant in a verb's triliteral root.</Paragraph> <Paragraph position="17"> The #vars declaration introduces a class of atoms: Hebrew consonant characters. Each of the three ROOT nodes has a single rule that matches a three-consonant sequence, assigning each member of the sequence a local number. The rule selects one of those consonants as the result.</Paragraph> <Paragraph position="18"> These three nodes follow a fourth strategy for writing KATR theories: A node may be invoked solely to provide information (here, a particular consonant in a verb's root) needed by other rules. We refer to this strategy as lookup. Lookup nodes (such as the boxed nodes in Figure 1) do not participate in the hierarchical relationships defined by the network's default rules.</Paragraph> <Paragraph position="19"> To demonstrate that the PIEL node characterizes its binyan, we present the somewhat simpler HOPHAL node as a point of comparison.</Paragraph> <Paragraph position="20"> HOPHAL:</Paragraph> <Paragraph position="22"/> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The VERB node </SectionTitle> <Paragraph position="0"> Queries on Speak are generally reflected to its parent, PIEL, which then reflects them further to VERB.</Paragraph> <Paragraph position="2"> Rules 1–3 of VERB determine the three consonants of the root if they have not already been determined by earlier processing. In the case of pi'el verbs, <cons2> has been determined (by Rule 2 at the PIEL node), but the other consonants have not.</Paragraph> <Paragraph position="3"> That is, if we pose the query Speak:<cons2>, the Speak node reflects it to the PIEL node, which resolves it. 
But the query Speak:<cons3> is not resolved by PIEL; it is reflected to VERB, which now resolves it by means of lookup.</Paragraph> <Paragraph position="4"> Rule 4 introduces a priming that is needed by the lookup node STEM: Usually, the shortened version of <vowel2> is the shewa. In one binyan, namely hif'il, the shortened version of <vowel2> is special and overrides this priming.</Paragraph> <Paragraph position="5"> Rule 5 is the most complicated. It exemplifies two more strategies of programming KATR theories: (1) Combining: It combines various pieces of morphology, namely those represented by the nodes VERBPREFIX, STEM, and VERBSUFFIX, each of which is referred to by VERB, and (2) Postprocessing: It presents the entire result of that combination to a postprocessing step represented by the node ACCENT.</Paragraph> <Paragraph position="6"> Combining works by invoking each of the nodes VERBPREFIX, STEM, and VERBSUFFIX with the query presented originally to Speak; such a query might be, for example, Speak:<imperfect sg 3 masc>. (The fact that no query list is explicitly presented to those nodes implies that KATR should use the original</Paragraph> <Paragraph position="8"> We choose not to include the vowel following the prefix as part of this node, but rather as part of STEM.</Paragraph> <Paragraph position="9"> Such decisions are common in cases of combining; it often makes little difference whether such &quot;boundary&quot; markers are placed at the end of one combining formative or the start of the next one.</Paragraph> <Paragraph position="10"> Rule 1 indicates that for all queries containing the atom perfect, there is no verb prefix. This single rule concisely covers many cases, which are implicitly included because the atoms pertaining to number, person, and gender are omitted. The other rules all apply to the imperfect tense. 
In the first and second person, the prefix is independent of gender, so the rules there are shorter, again concisely covering multiple cases with only a few rules.</Paragraph> <Paragraph position="11"> Suffixes have a similar node; here we choose to include the vowel that separates the suffix from the stem.</Paragraph> <Paragraph position="12"> VERBSUFFIX:</Paragraph> <Paragraph position="14"> Rules 1, 2, 6, and 15 include the @ character, which we use to indicate that the given syllable should not be accented. Hebrew words are generally accented on the ultima; we place @ on the ultima to force the accent to the penultima. Accent placement is one of the jobs delegated to the postprocessing step.</Paragraph> <Paragraph position="15"> The left-hand side of Rule 13 includes the symbol ++. This symbol tells KATR that even if another, seemingly better rule matches a query, this rule should take precedence if it matches. The situation arises for the query <imperfect pl 1 masc>, for instance. Both Rules 13 and 14 match, but the former is preferred. Alternatively, we could have represented this situation by restricting Rule 14 to second or third person, either by explicitly indicating these morphosyntactic properties or by adding the atom !1, which means &quot;not first person&quot;. We choose to use the disambiguator ++ in Rule 13 instead; in the terminology of Stump (2001), the ++ symbol identifies rules that apply in &quot;expanded mode&quot;. The most complex node defines the stem part of a verb.</Paragraph> <Paragraph position="16"> STEM:</Paragraph> <Paragraph position="18"> Rule 1 uses combining to assemble the parts of the stem, starting with the binyan prefix, then alternating all the consonants and vowels. Most of these parts are enclosed in quotation marks, meaning that these elements are queries to be reflected back to the starting node, in our case, Speak. 
These queries percolate through Speak, PIEL, and VERB until a priming rule satisfies them.</Paragraph> <Paragraph position="19"> The only exception is that instead of <vowel2>, this rule queries <anyvowel2> without quote marks. The absence of quote marks directs this query to the current node, that is, STEM; the remaining rules determine what vowel is appropriate.</Paragraph> <Paragraph position="20"> Rule 2 indicates that unless another rule is better, anyvowel2 is just vowel2. However, in four cases, vowel2 must be replaced by shortvowel2, typically shewa (primed by the VERB node), but occasionally something else (overridden by hif'il verbs).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Postprocessing </SectionTitle> <Paragraph position="0"> Many languages have rules of euphony. These rules are often called sandhi operations, based on a term used in Sanskrit morphology. We use the node ACCENT to introduce sandhi operations. Its name comes from the fact that the first operation we needed was to place the accent on the penultima, but we use it for other purposes as well.</Paragraph> <Paragraph position="1"> We begin by defining character classes similar to the $consonant class introduced earlier.</Paragraph> <Paragraph position="3"> Each class contains a subset of the Hebrew characters. We treat some combinations as single characters for this purpose, in particular, the vowels a25a39a22 and a14a33 . The first three classes are defined by enumeration.</Paragraph> <Paragraph position="4"> The fourth class, $accentableVowel, is defined in terms of previously defined classes, specifically, all vowels except those that are unaccentable. Similarly, the $letter class includes all vowels, consonants, and accents, and the $noAccent class contains all letters except for accentable vowels. 
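
The derived classes amount to simple set algebra, which can be sketched in Python. This is an illustration under stated assumptions: the names vowel, unaccentable_vowel, and accent for the three enumerated classes are guesses, and every member shown is a romanized placeholder rather than one of the paper's actual Hebrew characters.

```python
# Hypothetical sketch: KATR-style character classes modelled as Python sets.
# All members are romanized placeholders, not the paper's Hebrew characters.
consonant = {"d", "b", "r"}
vowel = {"a", "e", "i", "shewa"}
unaccentable_vowel = {"shewa"}
accent = {"'"}

# Derived classes, following the prose definitions.
accentable_vowel = vowel - unaccentable_vowel   # all vowels except unaccentable ones
letter = vowel | consonant | accent             # vowels, consonants, and accents
no_accent = letter - accentable_vowel           # all letters except accentable vowels

print(sorted(accentable_vowel))   # ['a', 'e', 'i']
print("shewa" in no_accent)       # True
print("a" in no_accent)           # False
```
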
These classes are used in the ACCENT node.</Paragraph> <Paragraph position="5"> ACCENT:</Paragraph> <Paragraph position="7"> A query to ACCENT is a fully formed Hebrew word ready for postprocessing, with the endofword tag placed at the end. The first rule is a default that is often overridden by later rules; it says that whatever letter the query starts with, that letter can be removed from the query and emitted as the result. Furthermore, the unmatched portion of the query, indicated by <> on the right-hand side, is to be directed to the ACCENT node for further processing. Rule 2 says that if a resulting query has only endofword, that tag should be removed, and no further processing is needed.</Paragraph> <Paragraph position="8"> Rule 3 places accents in words that contain the @ sign, which we use to indicate &quot;do not accent this syllable.&quot; The left-hand side matches queries that contain an accentable vowel, followed by any number (zero or more, indicated by the Kleene star *) of letters that cannot be accented, followed by a second accentable vowel, followed by the @ mark. Such words must have the @ removed and an accent mark placed after the first accentable vowel matched, as indicated in the right-hand side. The empty <> at the end of the right-hand side directs unused portions of the query to ACCENT for further processing.</Paragraph> <Paragraph position="9"> Rules 4, 5, and 6 deal with shewa near the end of a word. Generally, shewa is deleted at the very end (Rule 4), but not if it follows a67 (Rule 5) or if the previous vowel is also a shewa (Rule 6).</Paragraph> <Paragraph position="10"> 7 Accommodating guttural letters Our current efforts involve accommodating verb roots containing guttural letters. We have found that new rules in the postprocessing step, that is, the ACCENT node, cover many of the cases.</Paragraph> <Paragraph position="11"> We first introduce postprocessing rules that convert shewa naḥ 
(which we continue to represent as a27) to shewa na (which we represent as a47).</Paragraph> <Paragraph position="12"> #vars $longVowel: a25a89a37a12 a14a33 Rule 8 converts shewa naḥ to shewa na on the first consonant of the word. We introduce the atom startofword in order to detect this situation, and we modify the reference to the ACCENT node in the VERB node to include this new atom. This rule uses =+= instead of = to separate the two sides. This notation indicates a non-subtractive rule; the right-hand side path encompasses the entire query, including that part matched by the left-hand side, except that the shewa naḥ has been replaced by shewa na.</Paragraph> <Paragraph position="13"> After this replacement, KATR continues to process the new query at the same node. The left-hand side uses the ? operator, which means &quot;zero or one instances.&quot; This notation allows a single rule to match situations both with and without a dagesh.</Paragraph> <Paragraph position="14"> The other rules use similar notation. Rule 9 converts the first of two shewas in a row to a shewa na, except at the end of the word. Rule 10 converts a shewa naḥ following a long vowel. Rule 11 converts a shewa naḥ on a consonant with a dagesh. Rule 12 converts the shewa naḥ on the first of two identical consonants.</Paragraph> <Paragraph position="15"> Given the distinction between the two shewas, we now add postprocessing rules that convert a guttural with a shewa na to an appropriate alternative.</Paragraph> <Paragraph position="16"> a74a27a41 a44a41 . Rules 16 and 17 correct the initial a44 in a44 &quot;a76 verbs in the qal.</Paragraph> <Paragraph position="17"> We add other rules, such as the following Rule 18, to correct situations where a guttural letter would otherwise acquire a dagesh.</Paragraph> <Paragraph position="18"> < a14 $guttural a12> = <a13 $guttural> % 18 We have not begun work on weak verbs containing a22 , a37 , and a33 , which might require different approaches. 
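
The non-subtractive rewriting used by Rule 8 can be sketched in Python. This is a rough illustration, not the KATR rule itself: the function rule8, the atom names shewa_nah, shewa_na, and dagesh, and the romanized consonants are all hypothetical placeholders, and the sketch assumes the query is a list of atoms that begins with startofword.

```python
# Hypothetical sketch of a non-subtractive rule in the spirit of Rule 8.
# Atom names and romanized consonants are placeholders, not KATR syntax.
CONSONANTS = {"d", "b", "r", "n"}


def rule8(query):
    """Rewrite shewa nah as shewa na on the first consonant, keeping the whole query."""
    atoms = list(query)
    # Left-hand side: startofword, one consonant, an optional dagesh, then shewa nah.
    if len(atoms) >= 3 and atoms[0] == "startofword" and atoms[1] in CONSONANTS:
        pos = 3 if atoms[2] == "dagesh" else 2   # the ? operator: zero or one dagesh
        if atoms[pos:pos + 1] == ["shewa_nah"]:
            atoms[pos] = "shewa_na"              # only this atom changes
    return atoms                                 # nothing is consumed from the query


word = ["startofword", "n", "shewa_nah", "d", "a", "b", "dagesh", "e", "r", "endofword"]
print(rule8(word)[:3])   # ['startofword', 'n', 'shewa_na']
```

Because the rule returns the full rewritten query instead of consuming the matched atoms, the caller can pass the result back to the same node for further processing, as the text describes.
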
</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 8 Further work </SectionTitle> <Paragraph position="0"> We continue to develop our Hebrew KATR theory.</Paragraph> <Paragraph position="1"> Our goal is to cover all forms, including the waw consecutive, infinitive, makor, and predicate suffixes, for both strong and weak verbs. We will then turn to nouns, including personal suffixes. Our success so far indicates that KATR is capable of representing Hebrew morphology in a concise yet readable form.</Paragraph> <Paragraph position="2"> Our larger goal is to host a library of KATR theories for various languages as a resource for linguists. Such a library will provide interested researchers with morphological descriptions that can be directly converted into actual word forms and will serve as a substitute, to some extent, for voluminous natural-language and table-based descriptions. In the case of endangered languages, it will act as a repository for linguistic data that may be essential for preservation.</Paragraph> </Section> </Paper>