<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2152">
  <Title>Hypothesis Scoring over Theta Grids Information in Parsing Chinese Sentences with Serial Verb Constructions</Title>
  <Section position="3" start_page="0" end_page="942" type="metho">
    <SectionTitle>
2 A Theta-grid Chart Parser
</SectionTitle>
    <Paragraph position="0"> Since the mechanism we propose operates within the framework of a theta-grid chart parser, we briefly introduce the parser in this section. Thematic information is one of the information sources that can bridge the gap between the syntactic and semantic processing phases. In theta-grid theory [Tang 1992], rich thematic information is incorporated for the analysis of human languages. The idea of theta-grid theory is as follows: we use a predicate, say, a verb, as the center of a &quot;grid&quot; and, by finding the theta roles registered in the lexical entry of this predicate, we can construct a grid formed by this predicate and then construe the sentence (or clause) spanned by this predicate. We consider the theta-grid representation suitable for processing Chinese. This viewpoint is shared by other work on Chinese parsers that use thematic information, such as the ICG parser [Chen and Huang 1990]. To computationalize theta-grid theory, some control strategies for parsing must be implemented.</Paragraph>
    <Paragraph position="1"> The well-known chart parser [Kay 1980], which uses a data structure called a &quot;chart&quot; to record partial parsing results, is suitable for our work. Since it keeps all possible combinations of constituents, it can accept sentences with missing theta roles. Thus, we designed a modified chart parser, called the TG-Chart parser [Lin and Soo 1993], by combining theta-grid theory with the chart parser. Note that currently in our work, only the theta  grids for verbs are considered. For each verb, two kinds of theta roles are registered: the obligatory roles, which must be found for this verb to construct a legal &quot;grid&quot;, and the optional roles, whose appearance is optional. Take &quot;~ ~)~&quot; for example: its theta roles are registered as +[Th (Pd) Ag]; thus, two NPs must be found in the chart for the construction of a legal grid (from syntactic clues, both &quot;Ag&quot; and &quot;Th&quot; are always played by NPs [Liu and Soo 1993]), while the appearance of a clause to serve as a &quot;Pd&quot; role is optional. A brief description of our parsing algorithm is as follows: [Step 1] Search the sentence for the positions of all &quot;verb candidates&quot;. (Verb candidates are those words that have the verb category as one of their syntactic categories in the dictionary.) [Step 2] By considering all possible combinations, the chart parser groups the words into syntactic constituents. Syntactic knowledge is used in this step.</Paragraph>
    <Paragraph position="2"> [Step 3] If only one verb candidate is found in [Step 1], search the chart for constituents which can play the theta roles of this verb.</Paragraph>
    <Paragraph position="3"> [Step 4] If more than one verb candidate is found, call the S-model to determine the most preferred structure. The S-model is described in Section 3.</Paragraph>
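The distinction between obligatory and optional roles in a grid such as +[Th (Pd) Ag] can be sketched in Python. This is a minimal illustration under stated assumptions, not the TG-Chart parser itself: the names ThetaGrid, parse_grid, and is_legal are hypothetical, and the only input format handled is the bracket notation quoted in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThetaGrid:
    obligatory: tuple  # roles that must be found, e.g. ("Th", "Ag")
    optional: tuple    # roles written in parentheses, e.g. ("Pd",)


def parse_grid(notation: str) -> ThetaGrid:
    """Parse a grid written like '+[Th (Pd) Ag]' (hypothetical helper)."""
    body = notation.strip().lstrip("+").strip("[]")
    obligatory, optional = [], []
    for token in body.split():
        if token.startswith("(") and token.endswith(")"):
            optional.append(token[1:-1])   # parenthesized role is optional
        else:
            obligatory.append(token)
    return ThetaGrid(tuple(obligatory), tuple(optional))


def is_legal(grid: ThetaGrid, roles_found: set) -> bool:
    """A grid is legal when every obligatory role has been found."""
    return set(grid.obligatory).issubset(roles_found)


grid = parse_grid("+[Th (Pd) Ag]")
print(grid.obligatory, grid.optional)
print(is_legal(grid, {"Th", "Ag"}), is_legal(grid, {"Th", "Pd"}))
```

With this representation, a sentence that lacks the optional Pd clause still yields a legal grid, while one that lacks Ag does not.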
  </Section>
  <Section position="4" start_page="942" end_page="945" type="metho">
    <SectionTitle>
3 The S-model
</SectionTitle>
    <Paragraph position="0"> We design a model which uses scoring functions and theta-grid theory to handle the SVCs problem. This model, called the S-model (an abbreviation of &quot;SVCs handling model&quot;), consists of four modules: a combination generator, a combination filter, a score evaluator, and a structure selector, as shown in [Figure 1].  As we know, all verb candidates compete to act as verbs. The questions are: &quot;Which candidates can actually act as verbs?&quot; and &quot;What is their correlation?&quot; If we can enumerate all possible combinations and evaluate their respective scores, we can determine the most preferred construction. Take the two-verb-candidate case as an example. Let the two verb candidates be v1 and v2; there are five combinations: (1) only v1 is a verb while v2 is not; (2) only v2 is a verb; (3) both v1 and v2 are verbs, with no subordination relation between them; (4) both are verbs, and v1 is subordinate to v2; (5) both are verbs, and v2 is subordinate to v1.</Paragraph>
    <Section position="1" start_page="942" end_page="943" type="sub_section">
      <SectionTitle>
3.1 Combination Generator
</SectionTitle>
      <Paragraph position="0"> Combination Generator consists of two submodules: Verb-string Generator and Subordination-relation Tagger. We illustrate a case with three verb candidates. Verb-string Generator sequentially enumerates the binary strings 001, 010, 011, 100, 101, 110, 111. The verb string &quot;101&quot; represents the situation where v1 and v3 act as verbs, while v2 doesn't. Then, Subordination-relation Tagger tags these verb strings with possible subordination relations. It divides these strings into three classes according to the number of 1's in the string, that is, the number of verb candidates acting as verbs in the sentence.</Paragraph>
      <Paragraph position="1"> These three classes are: (1) For the one-1 class (i.e., 001, 010, 100), there is obviously no subordination relation.</Paragraph>
      <Paragraph position="2"> That is, there is only one possible case to consider: this candidate acts as the only verb in this sentence. (2) For the two-1 class (i.e., 011, 101, 110), there are three possibilities to consider: v1=v2, v1&lt;v2, and v1&gt;v2. We follow the notation used by Pun [Pun 1991], where &quot;v1&gt;v2&quot; means that v2 is subordinate to v1, and &quot;v1=v2&quot; means that no subordination relation exists between the two verbs. (3)  For the three-1 class (i.e., 111), there are seventeen cases. We use abbreviated notations to represent them, where &quot;&gt;&lt;&quot; is the abbreviation of &quot;v1 &gt; [v2&lt;v3]&quot;, with square brackets being represented by underlines, meaning that locally v2 is subordinate to v3, and they together form a clause, which then plays a prepositional role for v1; for another example, &quot;=&lt;&quot; is the abbreviation of &quot;[v1=v2] &lt; v3&quot;. These seventeen cases are: ==, =&lt;, =&lt;, =&gt;, =&gt;, &lt;=, &lt;=, &lt;&lt;, &lt;&lt;, &lt;&gt;, &lt;&gt;, &gt;=, &gt;=, &gt;&lt;, &gt;&lt;, &gt;&gt;, and &gt;&gt;.</Paragraph>
      <Paragraph position="3"> These cases are generated simply by enumerating possible combinations of these three symbols: =, &lt;, and &gt;.</Paragraph>
      <Paragraph position="4"> For each pair of symbols $S_1$, $S_2$, two combinations are possible: $\underline{S_1}S_2$ and $S_1\underline{S_2}$. Note that $\underline{=}=$ and $=\underline{=}$ represent the same case; thus, only a single &quot;==&quot; is generated. Therefore, $3 \times 3 \times 2 - 1 = 17$ cases are possible. Summing over classes (1), (2), and (3), Combination Generator generates $C^3_1 \times 1 + C^3_2 \times 3 + C^3_3 \times 17 = 29$ cases. It is easy to design a routine which systematically enumerates these possibilities.</Paragraph>
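One possible enumeration routine is sketched below in Python to check the counts given above. It is an illustration only: the function names are hypothetical, the three relation symbols are spelled out as the names "eq", "lt", and "gt", and the underline of the abbreviated notation is encoded as a "left" or "right" grouping, which is an assumed encoding rather than the authors' data structure.

```python
from itertools import product

# Names standing for the three relation symbols of the text:
# "eq" = no subordination, "lt" = left verb subordinate to right verb,
# "gt" = right verb subordinate to left verb.
RELATIONS = ("eq", "lt", "gt")


def verb_strings(n: int) -> list:
    """All binary strings of length n except the all-zero string."""
    return [format(i, "0{}b".format(n)) for i in range(1, 2 ** n)]


def relation_tags(active: int) -> list:
    """Subordination tags for a verb string with `active` ones (1 to 3)."""
    if active == 1:
        return [()]                        # a single verb: nothing to tag
    if active == 2:
        return [(r,) for r in RELATIONS]   # three possibilities
    if active == 3:
        tags = set()
        for s1, s2 in product(RELATIONS, repeat=2):
            for grouped in ("left", "right"):
                # both groupings of "eq eq" denote the same structure,
                # so that pair is stored once, without a grouping mark
                tags.add((s1, s2) if s1 == s2 == "eq" else (s1, s2, grouped))
        return sorted(tags)
    raise ValueError("only one to three active verbs are covered here")


def all_cases(n: int) -> list:
    """Pair every verb string with every tag of its class."""
    return [(vs, tag)
            for vs in verb_strings(n)
            for tag in relation_tags(vs.count("1"))]


print(len(relation_tags(3)), len(all_cases(2)), len(all_cases(3)))  # 17 5 29
```

The printed counts agree with the seventeen three-verb cases, the five combinations of the two-candidate example, and the total of 29 cases for three candidates.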
    </Section>
    <Section position="2" start_page="943" end_page="943" type="sub_section">
      <SectionTitle>
3.2 Combination Filter
</SectionTitle>
      <Paragraph position="0"> The Combination Generator above does not take linguistic knowledge into consideration. In fact, syntactic constraints rule out some cases, which will never occur in a real sentence. Thus, it is not necessary to pass them to the score evaluator.</Paragraph>
      <Paragraph position="1"> Combination Filter is responsible for filtering out impossible cases. We illustrate three circumstances.</Paragraph>
      <Paragraph position="2"> Firstly, for &quot;v1 &gt; v2&quot;, the Combination Filter checks the theta grid of v1: the case is possible only if a Pd or Pe role is registered for v1, since v2 can be subordinate to v1 only if v1 expects a prepositional role; otherwise, the case is filtered out. The second circumstance is that, when v1 has only a single syntactic category, verb, it must act as a verb in the sentence. Thus, the case in which v2 acts as a verb while v1 doesn't is removed. The third circumstance concerns the three-candidate situations.</Paragraph>
      <Paragraph position="3"> Combination Generator generates seventeen cases; however, under some circumstances, four of them are impossible: &lt;&lt;, &lt;&gt;, &lt;&gt;, and &gt;&gt;. These circumstances arise when the main verb of the prepositional part (i.e., the part marked by an underline) expects an animate agent. In such circumstances, a VP cannot be subordinate to an &quot;event&quot;. Thus, these four cases will be filtered out by Combination Filter. For example, the following sentence, with the relation &quot;&lt;&gt;&quot; (i.e., ~-f &lt; ~&gt;~h~\]), is impossible: &quot;;t~ N ~. ~-~h~, ~~'~ I~ &quot; (11~_ thunder hope attend the labor insurance) (Thundering hoped to attend the labor insurance.). This is because &quot;~ ~ &quot; expects an animate NP to act as its Ag; the VP &quot;-~T '~&quot; thus cannot play its Ag role.</Paragraph>
      <Paragraph position="4"> There is still much linguistic knowledge, and there are many constraints, that Combination Filter could use.</Paragraph>
      <Paragraph position="5"> However, some of them, such as the third circumstance mentioned above, are too specific and thus must be used carefully to avoid over-constraining. Therefore, how to collect and select constraints and knowledge that are general enough remains a future concern.</Paragraph>
      <Paragraph position="6"> The main function of Combination Filter is to improve the performance of the S-model. Note that in this paper, for the sake of brevity, Combination Generator and Combination Filter are designed as two separate modules. However, Combination Filter could be embedded in Combination Generator so that impossible generating branches are cut off as early as possible. This is also a future concern.</Paragraph>
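The first two circumstances can be sketched as simple predicates in Python. This is a hedged illustration, not the authors' filter: the function names and the representation of lexical entries as sets are hypothetical, and the clause-taking role names are assumed to be Pd and Pe, following the example grid +[(Th) Pe Ag] used later in the paper.

```python
# Assumed names of the roles that let a verb take a subordinate clause.
CLAUSE_ROLES = {"Pd", "Pe"}


def allows_subordinate(head_roles: set) -> bool:
    """First circumstance: a verb can dominate another verb only if
    its theta grid registers one of the clause-taking roles."""
    return bool(CLAUSE_ROLES.intersection(head_roles))


def respects_categories(verb_string: str, categories: list) -> bool:
    """Second circumstance: a candidate whose only syntactic category
    is verb cannot be marked 0 (non-verb) in the verb string."""
    for bit, cats in zip(verb_string, categories):
        if bit == "0" and cats == {"verb"}:
            return False
    return True


# v1 is verb-only, v2 is ambiguous between verb and noun.
lexicon = [{"verb"}, {"verb", "noun"}]
print(respects_categories("01", lexicon))       # False: v1 must be a verb
print(respects_categories("10", lexicon))       # True
print(allows_subordinate({"Th", "Pd", "Ag"}))   # True
print(allows_subordinate({"Th", "Ag"}))         # False
```

Calling such predicates inside the generator loop, rather than afterwards, would correspond to the embedded design mentioned above.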
    </Section>
    <Section position="3" start_page="943" end_page="944" type="sub_section">
      <SectionTitle>
3.3 Score Evaluator
</SectionTitle>
      <Paragraph position="0"> Whenever Combination Filter passes a feasible case into Score Evaluator, the Score Evaluator uses a scoring function to compute the score of the input case and then passes the evaluated score to the structure selector. We now describe it.  In our legal-domain corpora, there are many occurrences of SVCs. Since our parser is based on theta grids, in the case of SVCs, different verbs compete in finding their own theta roles. Thus, some mechanism for arbitrating among verbs for the ownership of each constituent in the chart must be designed. As Yorick Wilks said, language does not always allow the formation of &quot;100%-correct&quot; theories [Hirst 1981]; therefore, we attempt to find a more flexible method for recognizing SVCs. We propose a scoring function to select a &quot;preferable&quot; construction for a sentence with SVCs. (For the &quot;preference&quot; notion, see [Wilks 1975] and [Fass and Wilks 1983].) The scoring function is called the S-function, an abbreviation of &quot;SVCs scoring function&quot;. The S-function is defined as in [Figure 2], where RWR is the abbreviation of &quot;Ratio of Words included in some phrase with Roles assigned&quot;; RRF, &quot;Ratio of Roles Found&quot;; OBR, &quot;OBligatory Role&quot;; and OPR, &quot;OPtional Role&quot; (note that OBR and OPR indicate the roles registered in theta grids):</Paragraph>
      <Paragraph position="2"> $RWR = \frac{\text{number of words included in some phrase with roles assigned}}{\text{number of words in the clause}}$ (4)</Paragraph>
      <Paragraph position="4"/>
      <Paragraph position="6"> The score is calculated as the average of the scores obtained by each verb in the sentence (as in equation 1).</Paragraph>
      <Paragraph position="7"> For each verb, the score is estimated from two factors: first, the ratio of theta roles found, i.e., RRF, and, second, the ratio of words with roles assigned, i.e., RWR. For the detailed formula, see equation (2). The relative significance of obligatory roles and optional roles is heuristically weighted 2:1, as in (3) and (5); thus, the value of k is set to 2. In some cases, the verb finds many theta roles in the clause it constructs, but not all the words in this clause are assigned roles. We consider that such an assignment does not construe the real construction of the sentence. Thus, to reflect such cases, we calculate RWR by dividing the number of words which are included in some phrase with a role assigned by the total number of words in the clause (see equation 4).</Paragraph>
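The scoring scheme described above can be sketched in Python as follows. Only part of it is stated in the text: the 2:1 weighting with k = 2, the definition of RWR, and the averaging over verbs. The exact form of RRF and the way RRF and RWR are combined into a per-verb score are assumptions made for this sketch (a weighted ratio and a product, respectively), and all function names are hypothetical.

```python
K = 2  # weight of an obligatory role relative to an optional one (2:1)


def base(n_obligatory: int, n_optional: int) -> int:
    """Weighted number of roles registered in a theta grid."""
    return K * n_obligatory + n_optional


def rrf(found_obl: int, found_opt: int, n_obl: int, n_opt: int) -> float:
    """Ratio of Roles Found; the weighted-ratio form is an assumption."""
    return (K * found_obl + found_opt) / base(n_obl, n_opt)


def rwr(words_with_roles: int, words_in_clause: int) -> float:
    """Ratio of words included in some phrase with roles assigned."""
    return words_with_roles / words_in_clause


def verb_score(found_obl, found_opt, n_obl, n_opt,
               words_with_roles, words_in_clause) -> float:
    # Assumption: the two factors are multiplied.
    return (rrf(found_obl, found_opt, n_obl, n_opt)
            * rwr(words_with_roles, words_in_clause))


def sentence_score(verb_scores: list) -> float:
    """Average of the scores obtained by each verb in the sentence."""
    return sum(verb_scores) / len(verb_scores)


# A verb with two obligatory roles, both found, in a four-word clause
# whose words are all covered by role-bearing phrases.
print(base(2, 0), verb_score(2, 0, 2, 0, 4, 4))   # 4 1.0
```

Under these assumptions, a verb that finds both of its two obligatory roles in a fully covered clause scores 1.0, and two verbs scoring 0.134 and 0.67 average to 0.402.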
    </Section>
    <Section position="4" start_page="944" end_page="945" type="sub_section">
      <SectionTitle>
3.3.2 Illustration of S-function
</SectionTitle>
      <Paragraph position="0"> Now, let's illustrate the calculation of the S-function with the following examples:</Paragraph>
      <Paragraph position="2"> In this example, we demonstrate how to determine whether a verb candidate can actually act as a verb. In [Step 1], &quot;~ ~ &quot; (file) and &quot;~,)i &quot; (tell) are both found as &quot;verb candidates&quot;. Here &quot; {~-?,# &quot; has two syntactic categories registered in its lexical entry, the verb and the noun, while &quot; ~ ~\[~ &quot; has only one category, the verb.</Paragraph>
      <Paragraph position="3"> The theta grid for &quot;~ ~ &quot; is +[Th Ag], and that for &quot; @ ~,)~ &quot; is +[Th (Pd) Ag]. So, to decide whether &quot; ~ ~)# &quot; is treated as a verb or a noun, there are four cases to be considered:</Paragraph>
      <Paragraph position="5"> In the above, &quot;~ ~ &quot; enclosed in a box indicates that it acts as a verb. When it searches for theta roles, &quot;),~?, ~&quot; and &quot;~ g)) &quot; are found as its Ag and Th respectively, the two obligatory theta roles registered in its lexical entry. The score is calculated as follows: For &quot;~ ~ &quot;, there are two obligatory roles, so Base = 2 x 2 = 4. Moreover, in this sentence, &quot;N ~ &quot;, &quot;~ ~ &quot;, &quot; t~_ tt~ &quot;, and &quot;~t~_ ~1~ &quot; are all assigned some roles; thus, RWR = 4/4 = 1. And then,</Paragraph>
      <Paragraph position="7"> From the above discussion, case (1) gets the highest score (1.00). So, the parsed structure in case (1) is preferable to those in the other cases. That is, in this sentence, &quot; ~ ~ &quot; acts as the only verb, while &quot; ~ ~ &quot; acts as a noun. Therefore, the right syntactic category for &quot;~ff/&quot; in this sentence is determined.</Paragraph>
      <Paragraph position="8"> In this example, we demonstrate how to determine the relationship between verbs. In [Step 1], &quot; ~-~i ~ &quot; (request) and &quot;~ Jt~&quot; (divorce) are both found as &quot;verb candidates&quot;. Here &quot; ~-i,~- ~ &quot; and &quot; ~ J/~ &quot; both have two syntactic categories registered in their lexical entries: the verb and the noun. The theta grid for &quot;~ ~ &quot; is +[(Th) Pe Ag], and that for &quot;~t}&quot; is +[Ag (Ag)]. So, there are five cases to  be considered: (1) &quot;~;~ ~&quot; is treated as the only verb, while &quot;~\[~&quot; is treated as a noun. Score = 0.15/1 = 0.15.</Paragraph>
      <Paragraph position="9"> (2) &quot;~t ~?&quot; is treated as a verb, while &quot;~h~&quot; is treated as a noun.</Paragraph>
      <Paragraph position="11"> For &quot;~\[ #~ &quot;, Base = 3. Note that although &quot;~ ~ &quot; is an NP, it cannot act as Ag for &quot; ~(\[ ~ &quot;. This is because it does not satisfy the constraint for acting as Ag: an Ag must have the feature &quot;+animate&quot;, according to Gruber's theory that an agent must be an entity with intentionality [Gruber J. S. 1976]. The situation in which a verb cannot find a theta role is represented by the symbol &quot;r--'l &quot;. So,</Paragraph>
      <Paragraph position="13"> (3) &quot;~ ~ &quot; and &quot;~ ~t~&quot; are both treated as verbs. Score = (0.134+0.67)/2 = 0.402.</Paragraph>
      <Paragraph position="14"> (4) &quot;~ ~&quot; and &quot;N ~&quot; are both treated as verbs, with &quot; ~l}&quot; being subordinate to &quot;~ ~&quot;.</Paragraph>
      <Paragraph position="16"> From the above discussion, case (4) gets the highest score (0.535). So, the parsed structure in case (4) is preferable to those in the other cases. That is, in this sentence, &quot;N ~&quot; and &quot;~ ~t\[}&quot; are both treated as verbs, while &quot; ~ ~ &quot; is subordinate to &quot; ~'~ J-\]~ &quot;. The clause constructed by &quot;N ~&quot; is assigned the Pe role for &quot;~ ~ &quot;. Thus, this is an SVC sentence; moreover, this kind of SVC is commonly called &quot;sentential objects&quot;.</Paragraph>
    </Section>
    <Section position="5" start_page="945" end_page="945" type="sub_section">
      <SectionTitle>
3.4 Structure Selector
</SectionTitle>
      <Paragraph position="0"> Structure Selector acts as the final arbitrator. It collects all feasible cases and their scores. After the scores of all cases have been evaluated, Structure Selector arbitrates the competition among them and selects the case with the highest score as the most preferred one. The final result is returned to the parser.</Paragraph>
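The selection step amounts to taking the maximum over the scored cases, as the following minimal Python sketch shows. The function name and the dictionary representation are hypothetical; the scores are those stated for the second worked example, and cases whose scores are not stated in the text are simply left out.

```python
def select_structure(scored_cases: dict) -> tuple:
    """Return the (case, score) pair with the highest score."""
    best = max(scored_cases, key=scored_cases.get)
    return best, scored_cases[best]


# Scores quoted in the second worked example (request / divorce).
scores = {"case 1": 0.15, "case 3": 0.402, "case 4": 0.535}
print(select_structure(scores))   # ('case 4', 0.535)
```

How ties between equally scored cases should be broken is not discussed in the text; this sketch returns the first maximal case in insertion order.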
    </Section>
  </Section>
</Paper>