<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1079">
  <Title>PRINCIPAR--An Efficient, Broad-coverage, Principle-based Parser</Title>
  <Section position="3" start_page="482" end_page="484" type="metho">
    <SectionTitle>
3. Implementation of Principles
</SectionTitle>
    <Paragraph position="0"> GB principles are implemented as local and percolation constraints on the items. Local constraints are attached to nodes in the network. All items at a node must satisfy the node's local constraint, l?ercolation constraints are attached to the links in the network. A message can be sent across a link only if the item satisfies the percolation constraint of the link.</Paragraph>
    <Paragraph position="1"> We will only use two examples to give the reader a general idea about how GB principles are interpreted as loc, al and percolation constraints. Interested reader is referred to Lin (1993) for more details.</Paragraph>
    <Paragraph position="2"> 3.1. Bounding rpheory The Bounding Theory (Subjaneency) states that a movement can cross at most one barrier without leaving an intermedia~te trace. An attribute named ~hbarr+-0r is used to implement this l)rinciple. A message containing the attribute value -whbarrier iS used to represent an X-bar structure contMnlng a position out ol7 which a wh-constituent has moved, but without yet crossing a barrier. The wdue +whbarrier means that the movement has Mready crossed one barrier. Certain dominance links in the network are designated as barrier links. Bounding condition is implemented by tile percolation constraints attached to the barrier links, which block any message with +whbarrier and change -whbarrior to +whbarrier before the message is allowed to pass through.</Paragraph>
    <Section position="1" start_page="482" end_page="484" type="sub_section">
      <SectionTitle>
3.2. Case Theory
</SectionTitle>
      <Paragraph position="0"> Case. Theory reqlfires tha.t every lexicM NP be assigned an al)stl'act case. '\]'he implementation of case theory in PI{,INCII~AII, is based on the following attribute vaJues: ca, govern, cm.</Paragraph>
      <Paragraph position="1"> +ca the head is ,~ c~se assigner -ca the head is not a case assigner +govern the head is a governor -govern the head is not a governor -cr~ an NP m-commanded by the head needs case marking The case filter is implemented as follows: 1. LocM constraints attached to the nodes assign +ca to items that represent X-bar structures whose heads are case assigners (P, actiw.' V, and tensed I).</Paragraph>
      <Paragraph position="3"> \[ assign +ca to items with -passzve assign +ca to items with tense attril)nte \]';very item at NI' node is assigned an a.ttribute value -cm, which means that l;he NI' represented by l, he item needs 1,o be case-marked. The -cm al;tril)ute then propagates with tile item as it is sent to el;her nodes. '\]'his item is said t&lt;) be the origin of the -cm attribute.</Paragraph>
      <Paragraph position="4"> Barrier links do not Mlow any item with -cm l;o pass through, \])ceause, once the item goes beyond the 1)arri&lt;:r, the origin Of-era will not be governed, let alone casemarked. null Since each node in X-1)ar strncture has at most one governor, if the governor is not a case assigner, the node will not l)e case-marked. Therei'ore, a case-filter violation is detected if +govern -cm -ca co-occur in an item. On the other han&lt;l, if +govern +ca -cm co-ocetlr itl all item, +,;lien the head daughter of th&lt;; it&lt;,m governs and case:marks the origin of-cm. 'l'he case-filter condition on the origin of -cm is met. '\]'he -cm attril)ute is cleared. The local constraints attached to all the nodes check for the ('.o-occurrences el ca, cm, and govern to ensure &lt;:ase-filter is not violated by any item.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="484" end_page="485" type="metho">
    <SectionTitle>
4. Lexicon
</SectionTitle>
    <Paragraph position="0"> The lexicon in PRINCIPAl{ consists of two hash tables: a primary one in memory and a secondary one on disk. Tile secondary hash ta.= ble contains over 90,000 entries, most of which are constructed automatically by applying a set of extraction and conw:rsion rules to etP tries in Oxford Adwmced \],eaner's l)ictionary and Collins English I)ictionary.</Paragraph>
    <Paragraph position="1"> When a word is looked up, t;he F, rimary hashtable is searched first. If a,n entry for the word is found, the lexical search is done. Otherwise, the secondary hash table is searched. The entry retrieved from the secondary LaI)Ie is inserted into the primary one, so, tha,t when the word is encouutered again only in-memory search will be necessary.</Paragraph>
    <Paragraph position="2"> The primary hash table is lc, aded from a file a.L l;he system start-up. The file also serves as a buffer for changes to the secondary hash tM)le.</Paragraph>
    <Paragraph position="3"> When a lexical entry is ad(led or \]nc, dified, it is saved in the file for the prhnary hash table.</Paragraph>
    <Paragraph position="4"> The entry in the se&lt;:(mdary hash tal)le remains unchanged. Since the i)rimary hash tM)le is a lw~ws consulted first, its entrios override the (;orresponditlg entries in the seco\[ldary La})\]C. The reason why the buffer in needed is that the secondary hash table is designed ill such a way that update speed is sacrificed for the sake of ef\[icie.t retriewd. Therefore, updates to the secondary hash tal)le should I&gt;e done in batch and relatively infrequently.</Paragraph>
    <Paragraph position="5"> The tw(&gt;tier organization of the lexicon is transparent to the l)arser. That is, as far as the. parser is concerned, the lexic&lt;m is an o1&gt; jec{, that, given a word or a phrase, returns its lexical entry or nil if the entry (lees not exist in the lexicon. I,cxical rctrievM is very el\[icient, with over 90,000 entries, the average l;ime to retrieve an entry is 0.002 secon&lt;l.</Paragraph>
    <Section position="1" start_page="484" end_page="485" type="sub_section">
      <SectionTitle>
4.1. Lexical Entries
</SectionTitle>
      <Paragraph position="0"> All, hot@l the lexicon currently ttsed in I)I{IN -C'II&gt;AI{, contains only syl~.tactic information, it; may also be used to hoM other types of ilffof mation. Each lexical entry consists of ai1 eIltry word or phrase and a, list of functions with a,r~tllllClltS: null</Paragraph>
      <Paragraph position="2"> (subcat ((cat v)) (((cat i) -bare inf))) (subcat ((cat v)) (((cat n) (case acc)))) (subcat ((cat v)) (((cat c)))) q'\]le f'/ltlctioII subcat t'eturt/s a stll)c&amp;|,egoriz&amp;-Lion frame of the word. The first argtltneIl(; of t}te function is the attrHmte va,lues of the word  itself. The second argument of the function is a list of attribute value vector for the complements of the word. For example, the above entry means that acknowl edge is a verb that takes an IP, NP or CP as the complement. The lexicon is extensible in that users can define new functions to suit their own needs. Current implementation of the lexicon also includes functions ref and phrase, which are explained in the next two subsections.</Paragraph>
    </Section>
    <Section position="2" start_page="485" end_page="485" type="sub_section">
      <SectionTitle>
4.2. Reference Entries
</SectionTitle>
      <Paragraph position="0"> The lexicon does not contain separate entries for regular variations of words. When a word is not found in the lexicon, the lexleal retriever strips the endings of the word to recow~'r possible base forms of the word and look them up in the lexicon. For example, when the lc'xieal retriever fails to find an entry for &amp;quot;studies,&amp;quot; it searches the lexicon for &amp;quot;studie,&amp;quot; &amp;quot;studi&amp;quot; and &amp;quot;study.&amp;quot; Only the last one of these has an entry in the lexicon and its entry is returned.</Paragraph>
      <Paragraph position="1"> Irregular variations of words are explicitly listed in the lexicon. For example, there is an entry for the word &amp;quot;began.&amp;quot; IIowever, the snbcatgorization frames of &amp;quot;begin&amp;quot; are not listed again under &amp;quot;began.&amp;quot; Instead, the entry contains a ref fimction which returns a reference to the entry for &amp;quot;begin.&amp;quot;</Paragraph>
      <Paragraph position="3"> The first argument of ref is the attribute values of &amp;quot;began.&amp;quot; The second argument contains the base form of the word and a set of attribute names. The lexical items for the word &amp;quot;began&amp;quot; is obtained by unifying its attribute values with the attribute wdues in the lexiea\] entry for &amp;quot;begin.&amp;quot; The advantage of making references to the base form is that when the base form is modified, one does not have to make changes to the entries for its variations.</Paragraph>
      <Paragraph position="4"> 4.a. Phrasal Entries \]'he lexicon also allows for phrases that consist of multiple words. One of the words in a phrase is designated as the head word. The head word should be a word in the phrase that can undergo morphological changes and is the most in frequent. For example, in the phrase, &amp;quot;down payment,&amp;quot; the head word is &amp;quot;payment.&amp;quot; In d~e lexicon, a phrase &amp;quot;wl ... wj .... w,,/' is stored as a string &amp;quot;'Wh ... 'tOn, 101 ... 'U,~h_l.&amp;quot; That is, the first word in the string is always head word and the words Mter &amp;quot;,&amp;quot; should appear before the head word in texts. The runedon phrases converts il, s arguments into a list of phrases where tile entry word is the head.</Paragraph>
      <Paragraph position="5"> l,'or example, the lexical entry for &amp;quot;paymenC' is as follows: (payment (subcat ((cat n) (nform norm))) (phrases (payment, down) (payment, stop) (payment, token) (payment, transfer))) After retrieving the entry for a word, each phrase in the phrase list is compared with the surrounding words in the sentence. If the phrase is found in the sentence, the entry for the phrase is retrieved froin the lexicon.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="485" end_page="486" type="metho">
    <SectionTitle>
5. Reducing Ambiguities
</SectionTitle>
    <Paragraph position="0"> One of the problems with many parsers is that they typically generate far more parses than humans normally do. I&amp;quot;or example, the average number of parses pet' word is 1.35 in (l\]lack et al., 1992). That means that their parser produces, on average, 8 parses for a 7-word sentence, 3d parses for a, l%word sentence, and ld4 l)a.rses for a 17-word seiRe.nce, rphe la.rge number of parse trees make tim l~roe(,ssing at later stages more dillicult and error l)ruTte.</Paragraph>
    <Paragraph position="1"> PI{INCII)AI{ defines a weight for every parse tree. A weight is associated with every word sense and every link in the parse tree.</Paragraph>
    <Paragraph position="2"> \[Pile weight of the parse tree is the total weight of the links and the word senses ~tt the leaf nodes of the tree.</Paragraph>
    <Paragraph position="3"> The packed shared parse forest in PtUN-CIPAI{. is organized in such a way that the parse tree with minimum weight is retrieved first. I~IUNCIPAII, then uses the minimum weight and a predetermined number called BIGWEIGHT, which is currently arbitraryly defined to be 20, to prune the parse forest. Only  the parse trees whose weights are less than (minimum weiglit -F BIGWEIGHT/2) are spared and output.</Paragraph>
    <Paragraph position="4"> The weights of the links and word senses are determined as follows: e 'I'he links fi'om Xbar to an ad,imlct YP have weight=nlGWEIglIW and all the~ other links have weight=l.0.</Paragraph>
    <Paragraph position="5"> * The words in the lexicon ma,y have an attribute rar% which takes wdues from {very, very-very}. If a word sense has the attribute value (rare very), its weight is BIGWEIGIIT. Ifa word sense has the attribute value (rare very-very), its weight is 2xBIGWEIGIIT. Otherwise, the weight is 0, Note that the att;ribute rare is used to indicate the relative frequency among different stmses of the same word.</Paragraph>
    <Paragraph position="6">  the sentence &amp;quot;John read the story a,bout Kim&amp;quot; in Figure 3: in (a), lee about Kim\] is the co,nplement of &amp;quot;story&amp;quot;; in (b), it is the a.djunct of &amp;quot;read&amp;quot;. Since the adjunct dominance link from Vbar to PP has much higher weight than the complement dominance link from Nba.r to PP, the total weight of (a) is much smaller them the weight of (b). Therefore, only (a) is output as the parse tree of the sentence.</Paragraph>
    <Paragraph position="7"> Example 5.2. The lexical entry for tlm word &amp;quot;do&amp;quot; is as follows:  (subcat ((cat i) -passive -per~ (auxform do) -prog (cgorm fin) (tense present))) (subcat ((cat v) (rare very)) (((cat n) (case acc) (nform norm)))) (subcat ((cat v) (rare very-very)) (((cat n) (case ace) (nform norm)) ((cat n) (case acc) (nform norm)))) '\]'ha.t is &amp;quot;do&amp;quot; (:a.n bc an auxiliary verb, a transitive verb or a (li-trmlsitive verb. \[,'igure el shows two parse trees for the sentence &amp;quot;Who did Kim love?&amp;quot; The parse l;ree (a) corrcsI)onds to the correct; understanding of the sentence. hi (b), &amp;quot;did&amp;quot; is analyzed as a bi-tra,nsitive w,'b as in &amp;quot;Who did Kim a fawn'?&amp;quot; lloweww, since the latter sense of the word has an attribute value (rare very-very), tree (17) has much higher weight tha,n tt'ee (a) and only (a,) is otd.lmt, by the i)ai's(~l ..</Paragraph>
  </Section>
class="xml-element"></Paper>