File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/c00-1021_abstr.xml

Size: 20,848 bytes

Last Modified: 2025-10-06 13:41:33

<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1021">
  <Title>Tagging of very large corpora: Topic-Focus Articulation</Title>
  <Section position="2" start_page="0" end_page="143" type="abstr">
    <SectionTitle>
2 Representing Topic-Focus Articulation (TFA) in TGTSs
</SectionTitle>
    <Section position="1" start_page="0" end_page="139" type="sub_section">
      <SectionTitle>
2.1 A brief characterization of TFA
</SectionTitle>
      <Paragraph position="0"> The tectogrammatical tree structures (TGTSs) should capture not only the syntactic (dependency) relations, but also the TFA of the utterances in the corpus, since TFA is expressed by grammatical means and is relevant for the meaning of the sentence (even for its truth conditions), i.e. it constitutes one of the basic aspects of underlying structures. The semantic relevance of TFA can be illustrated by examples such as (1), which is a translation of the Czech ex. (1') (the capitals denote the placement of the intonation centre, i.e. the focus proper; cf. footnote 2): (1) (a) English is spoken in the SHETLANDS. (b) In the Shetlands, ENGLISH is spoken. (1') (a) Anglicky se mluví na Shetlandských OSTROVECH.</Paragraph>
      <Paragraph position="1"> ... ordinated groups. This makes it possible to represent the tectogrammatical structures of all sentences as trees (rather than using more-dimensional networks); on this point, PDT differs from the theoretical assumptions of the Praguian Functional Generative Description (now discussed in (Hajičová et al., 1998)).</Paragraph>
      <Paragraph position="2"> Footnote 2: In the prototypical case the intonation centre is characterized by falling (or rising-falling) stress, but there are also cases in which (similarly as in questions, to a certain degree) the centre has a rising stress. This concerns utterances displaying a feature of hesitation or incompleteness, cf. (M.,); often also with greetings (such as Czech Dobré jitro [Good morning]) a difference of this kind marks the 'starting' token, connected with the expectation of an answering token, which exhibits a rising stress. Although in a sentence containing occurrences of both a rising and a falling stress the former expresses a contrastive (part of) topic, we prefer to analyze it as the focus in a sentence without an occurrence of the latter; in such a position, the rising stress is regularly carried by an item referring to 'new' information. In written texts, some occurrences of the rising stress are marked by a semicolon or by '...'.</Paragraph>
      <Paragraph position="3"> (b) Na Shetlandských ostrovech se mluví ANGLICKY.</Paragraph>
      <Paragraph position="4"> The communicative function of the sentence can basically be rendered by understanding its topic (T) as 'what the sentence is about', and its focus (F) as the information that is asserted about the topic, i.e., schematically, the interpretation of the sentence S can be understood as F(T).</Paragraph>
      <Paragraph position="6"> Thus, (1)(a) asserts, on its preferred reading (with just the locative modification constituting its focus), about where English is spoken that it is in the Shetlands, which can hardly be accepted as true w.r.t. what we know of the actual world, if no specific context is present. (1)(b) is understood as true, stating about E. that it is spoken in the S.</Paragraph>
      <Paragraph position="7"> In the TGTSs the order of nodes is such that all parts of T precede all parts of F. Moreover, the order of nodes corresponds to the scale of communicative dynamism (CD, see Section 3 below); a less dynamic node prototypically has the broader scope than a more dynamic one (if the nodes correspond to operators). F proper is then the most dynamic (the rightmost) node.</Paragraph>
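A minimal sketch of the ordering constraint, assuming (for illustration only) that a TGTS is flattened into a list of (node, 'T' or 'F') pairs in underlying word order; the helper names and the sample analysis of (1)(b) are hypothetical, not part of PDT:

```python
def t_precedes_f(labels: list[str]) -> bool:
    """True if all parts of T precede all parts of F."""
    seen_focus = False
    for label in labels:
        if label == "F":
            seen_focus = True
        elif label == "T" and seen_focus:
            return False  # a T node after an F node breaks the order
    return True


def focus_proper(nodes: list[tuple[str, str]]) -> str:
    """F proper is the most dynamic, i.e. the rightmost, node."""
    if not nodes or nodes[-1][1] != "F":
        raise ValueError("the rightmost node is expected to belong to F")
    return nodes[-1][0]


# Hypothetical underlying order for (1)(b): "In the Shetlands, ENGLISH is spoken."
example = [("Shetlands", "T"), ("spoken", "T"), ("English", "F")]
print(t_precedes_f([label for _, label in example]))  # True
print(focus_proper(example))  # English
```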
      <Paragraph position="8"> TFA is also relevant for the semantics of negation: (2) John didn't come because he was ILL.</Paragraph>
      <Paragraph position="9"> (a) The reason for John's not-coming was his illness.</Paragraph>
      <Paragraph position="10"> (b) The reason for John's coming (e.g. to the doctor) was not his illness but something else (e.g. he wanted to invite the doctor to a party).</Paragraph>
      <Paragraph position="11"> With the paraphrase (a), the negated verb 'come' is included in T, i.e. the fact that John's being ill is the cause of an event is asserted about the event that he did not come. With (b), the main verb 'come' also belongs to T, but what is negated is the relation between T and F: John came, but what is asserted about his coming is that the cause of this event was not his illness (he might have been ill, though).</Paragraph>
      <Paragraph position="12"> Every node in a TGTS is either contextually bound (CB) or non-bound (NB); this opposition is a linguistic counterpart of the cognitive dichotomy of 'given' vs. 'new', where an item corresponding to a 'given' referent also has the feature NB if it is presented as occupying a newly characterized specific position (often in relation to one or more 'given' items), cf.: (3) Give this to YOUR mother. (My parents don't like such gifts.) (4) (... knows ... Jane.)</Paragraph>
      <Paragraph position="13"> However, this time she only invited HER.</Paragraph>
      <Paragraph position="14"> The indexical pronoun 'your' in (3) and the anaphoric pronoun 'her' in (4) can only refer to items that in a sense are 'known' in the given situation. However, in these examples, both of them occur as NB; their stress indicates their function as F proper of the respective sentence. Prototypically, an NB node belongs to F and a CB node is in T; however, a node not dependent immediately on a finite verb (esp. an adjunct) need not meet this condition. Thus, in (5), 'my' as a shifter, directly determined by the conditions of the discourse, is CB, although belonging to F, since it depends on a part of F (see (Hajičová et al., 1998) for a definition of T and F on the basis of contextual boundness and of syntactic dependency, as well as for other details of the given descriptive framework).</Paragraph>
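The interplay of contextual boundness and dependency can be sketched as below. This is a simplified reading of the paragraph above, not the full definition in (Hajičová et al., 1998): a direct dependent of the main verb goes to F if NB and to T if CB, while any deeper node inherits the part of its governor, so a CB node such as 'my' ends up in F when its governor is in F. The verb's own membership is passed in; the child-to-parent encoding and the sample sentence are hypothetical.

```python
from typing import Optional


def assign_t_f(parent: dict[str, Optional[str]], cb: dict[str, bool],
               verb_in_focus: bool) -> dict[str, str]:
    """Assign 'T' or 'F' to every node (simplified sketch)."""
    result: dict[str, str] = {}

    def part_of(node: str) -> str:
        if node in result:
            return result[node]
        governor = parent[node]
        if governor is None:            # the main verb: membership given as input
            value = "F" if verb_in_focus else "T"
        elif parent[governor] is None:  # direct dependent of the verb
            value = "T" if cb[node] else "F"
        else:                           # deeper node: inherits its governor's part
            value = part_of(governor)
        result[node] = value
        return value

    for node in parent:
        part_of(node)
    return result


# Hypothetical sentence "Give this to my MOTHER": 'my' is CB but depends on an F node.
parent = {"give": None, "this": "give", "mother": "give", "my": "mother"}
cb = {"give": False, "this": True, "mother": False, "my": True}
print(assign_t_f(parent, cb, verb_in_focus=True))
# {'give': 'F', 'this': 'T', 'mother': 'F', 'my': 'F'}
```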
    </Section>
    <Section position="2" start_page="139" end_page="141" type="sub_section">
      <SectionTitle>
2.2 The attribute TFA in PDT
</SectionTitle>
      <Paragraph position="0"> Three values of the attribute TFA are distinguished with every node in a TGTS: 1. T: a non-contrastive CB node, which always has a lower degree of CD than its governor, if any; 2. F: an NB node (if different from the main verb, then following after its head word in the TGTS); 3. C: a contrastive CB node. Examples: (5) (Volby v Izraeli.) Po volbách(T) si</Paragraph>
      <Paragraph position="2"> mid,'a(F).</Paragraph>
      <Paragraph position="3"> (Headline in the newspapers: Elections in Israel.) After the elections(T), the Israelis(T) get used(F) to a new(F) Prime Minister(F).</Paragraph>
      <Paragraph position="5"> but as a politician(C) he does not excel(F).</Paragraph>
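The three values and the positional conditions attached to T and F above can be represented and checked roughly as follows. The node encoding is an assumption for illustration, and "lower degree of CD" is approximated by "precedes in the underlying order", which simplifies the partial ordering of CD introduced in Section 3.2. The data is a hypothetical encoding of the English gloss of (5), with 'Prime Minister' treated as one node and the attribute 'new' placed after its head, as the condition on F nodes requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TFA(Enum):
    T = "non-contrastive contextually bound"
    F = "non-bound"
    C = "contrastive contextually bound"


@dataclass
class Node:
    form: str
    position: int            # position in the underlying word order
    governor: Optional[str]  # form of the governing node; None for the root
    tfa: TFA


def violations(nodes: list[Node]) -> list[str]:
    """Report T nodes not preceding their governor and F nodes not following it."""
    by_form = {n.form: n for n in nodes}
    problems = []
    for n in nodes:
        if n.governor is None:
            continue  # the main verb has no governor to compare with
        head = by_form[n.governor]
        if n.tfa is TFA.T and n.position > head.position:
            problems.append(f"T node {n.form!r} follows its governor")
        if n.tfa is TFA.F and head.position > n.position:
            problems.append(f"F node {n.form!r} precedes its head")
    return problems


# Hypothetical encoding of the English gloss of (5).
example_5 = [
    Node("elections", 0, "get used", TFA.T),
    Node("Israelis", 1, "get used", TFA.T),
    Node("get used", 2, None, TFA.F),
    Node("Prime Minister", 3, "get used", TFA.F),
    Node("new", 4, "Prime Minister", TFA.F),
]
print(violations(example_5))  # []
```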
      <Paragraph position="6"> The instructions for the assignment of the values of TFA can be briefly specified as follows, if the surface word order and the position of the intonation centre (IC, see footnote 2 above) are taken into account, as well as the 'systemic' (canonical) ordering of the kinds of dependents (which, in fact, can differ with different head words; SO is specified either in the valency frames of the individual lexical entries, or, if possible, for whole lexical classes and subclasses): 1. the bearer of IC (typically the rightmost dependent of the verb) → F</Paragraph>
      <Paragraph position="7"> 2. if the IC is placed on a node other than the rightmost one, the complementations placed after IC → T</Paragraph>
      <Paragraph position="8"> 3. a left-side dependent of the verb → T or C, except for cases in which it clearly carries IC</Paragraph>
      <Paragraph position="9"> 4. the verb and those of its dependents that stand between the verb and the F-node (see 1) and that are ordered (without an intervening sister node) according to SO → F; among sister nodes, all those carrying F follow after all those with C, and all those carrying F follow after all those with T; there are two sets of exceptions: (a) a focus sensitive particle can carry F even when preceding its governing node that carries C, cf. Section 3.2 below; (b) a node M carrying T or C can follow after its mother node if a node with F is present among the nodes subordinate to M, but is absent both among the sisters of M and among its superordinate nodes (here the relation of 'superordinate' and 'subordinate' is the transitive closure of 'governing' and 'dependent'); cf. the notion of 'proxy focus', characterized in (Hajičová et al., 1998), and examples such as (Kterého učitele jsi tam viděl?) Viděl jsem tam učitele chemie [lit. (Which teacher-Accus have-you there seen?) I saw there (the) teacher of-chemistry], in which the Patient 'učitele' follows after the verb in the underlying tree, although it carries T.</Paragraph>
      <Paragraph position="10"> Note: For Czech, the SO of the main types of dependency has been found (on the basis of empirical analysis of texts and of experiments with groups of speakers, see (Sgall et al., 1995)) to have (with most verbs and other heads) the following form, as for the main kinds of dependents: Actor &lt; Temporal &lt; Location &lt; Instrument &lt; Addressee &lt; Patient &lt; Effect. 5. embedded attributes → F (unless they are only repeated or restored); 6. indexical expressions (já [I], ty [you], teď [now], tady [here]), weak forms of pronouns, pronominal expressions with a general meaning (někdo [somebody], jednou [once upon a time] ...) → T (except in cases of contrast or as bearers of IC); 7. strong forms of pronouns → F (after prepositions and in coordinated constructions, the assignment of T or F in Czech is guided by the general rules 1 through 4); 8. restored nodes, deleted in the surface forms of sentences → T; we devote Section 2.3 below to the placement of the restored nodes. Note: There are special cases of coordination, both in Czech and in English, which do not meet this condition: e.g. in "They drank white and red wine" the first occurrence of 'wine', which may be NB, is deleted in the surface (and restored in the TGTS).</Paragraph>
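A rough sketch of rules 1 to 4 for a single clause, restricted to the verb and its direct dependents. Assumptions, all hypothetical: the clause is given in surface order as (form, kind) pairs with the verb marked "PRED"; only the SO fragment quoted in the Note is used; the choice between T and C in rule 3 is not modelled (T is returned); and a node between the verb and the F node that breaks the SO order is left unassigned (None), since the rules do not settle it. Exceptions (a) and (b), and rules 5 to 10, are not covered.

```python
from typing import Optional

# Fragment of the Czech systemic ordering (SO) from the Note, least to most dynamic.
SO = ["Actor", "Temporal", "Location", "Instrument", "Addressee", "Patient", "Effect"]


def assign_clause(clause: list[tuple[str, str]], ic: int) -> list[Optional[str]]:
    """TFA values for a verb and its direct dependents, given in surface order.

    `clause` holds (form, kind) pairs, the verb having kind "PRED";
    `ic` is the index of the bearer of the intonation centre.
    """
    verb = next(i for i, (_, kind) in enumerate(clause) if kind == "PRED")
    values: list[Optional[str]] = [None] * len(clause)
    values[ic] = "F"                      # rule 1: bearer of IC
    for i in range(ic + 1, len(clause)):
        if i != verb:
            values[i] = "T"               # rule 2: complementations after IC
    for i in range(verb):
        if i != ic:
            values[i] = "T"               # rule 3: left-side dependents (C not modelled)
    if ic > verb:                         # rule 4: verb and SO-ordered nodes up to the F node
        values[verb] = "F"
        last_rank = -1
        for i in range(verb + 1, ic):
            rank = SO.index(clause[i][1])
            if rank >= last_rank:
                values[i] = "F"
                last_rank = rank
            else:
                break                     # SO order broken: left unassigned (assumption)
    return values


# Example (5), simplified: 'si' and the attribute 'nového' are omitted.
example_5 = [("Po volbách", "Temporal"), ("Izraelci", "Actor"),
             ("zvykají", "PRED"), ("premiéra", "Patient")]
print(assign_clause(example_5, ic=3))  # ['T', 'T', 'F', 'F']
```

The result matches the annotation given for (5) above.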
      <Paragraph position="11"> 9. a node N dependent to the left in a way not meeting the condition of projectivity → C (this node is then placed more to the right, to meet that condition; these and other movements are discussed in Section 2.4 below); 10. the nodes subordinate to such an N move together with it and get T or F (according to the rules above). Footnote 3: Let us note that Directional 3 ('where to') follows after Patient in Czech as well as in English and also in German, according to the empirical research discussed in (M.,); thus it is not exact to characterize the canonical order of German as a "mirror image" of that of English. Note: The resulting TGTSs are projective, i.e. for every pair of nodes x, y in a TGTS it holds that if x depends on y and x follows (precedes) y, then every node z following (preceding) y and preceding (following) x is subordinate to y. Thus, 'not to meet the condition of projectivity' concerns the 'analytic' trees; in other words, this condition would not be met if the positions of x and y in the left-to-right order of the nodes in the TGTS (in the 'underlying word order') always corresponded to their positions in the surface (morphemic and 'analytic') word order.</Paragraph>
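The projectivity condition in the Note can be checked directly. In this sketch the encoding is an assumption: a tree is a list of governor indices over the nodes in left-to-right order, with None for the root, and 'subordinate to y' is read as the transitive closure of dependency. The sample is a simplified version of example (9) from Section 2.4, with the auxiliary omitted.

```python
from typing import Optional


def is_subordinate(z: int, y: int, heads: list[Optional[int]]) -> bool:
    """True if node z is (transitively) subordinate to node y; assumes a tree, no cycles."""
    node = heads[z]
    while node is not None:
        if node == y:
            return True
        node = heads[node]
    return False


def is_projective(heads: list[Optional[int]]) -> bool:
    """heads[i] is the index of the governor of node i (None for the root)."""
    for x, y in enumerate(heads):
        if y is None:
            continue
        for z in range(min(x, y) + 1, max(x, y)):
            if not is_subordinate(z, y, heads):
                return False  # z lies between x and y but is not below y
    return True


# Surface order: Jirku plánovali poslat do-Francie ('Jirku' depends on 'poslat')
print(is_projective([2, None, 1, 2]))   # False: 'plánovali' lies between them
# After moving 'Jirku' immediately left of 'poslat': plánovali Jirku poslat do-Francie
print(is_projective([None, 2, 0, 2]))   # True
```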
      <Paragraph position="12"> Example (with a very simplified linearized notation of the TGTS, in which every dependent is enclosed in its own pair of parentheses):</Paragraph>
      <Paragraph position="14"/>
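The linearized notation described above can be sketched as a small recursive printer: each head appears between its left and right dependents, and each dependent subtree is enclosed in its own pair of parentheses. The head-to-(left, right) encoding and the sample tree (the English clause 'Paul knows that Thomas saw only Mary' discussed in Section 3) are hypothetical, and the functor and TFA labels an actual TGTS would carry are omitted.

```python
def linearize(head: str, tree: dict[str, tuple[list[str], list[str]]]) -> str:
    """Print a head between its left and right dependents, each dependent in parentheses."""
    left, right = tree.get(head, ([], []))
    parts = [f"({linearize(d, tree)})" for d in left]
    parts.append(head)
    parts += [f"({linearize(d, tree)})" for d in right]
    return " ".join(parts)


# Hypothetical tree; leaves need no entry of their own.
tree = {"knows": (["Paul"], ["saw"]), "saw": (["Thomas"], ["only", "Mary"])}
print(linearize("knows", tree))  # (Paul) knows ((Thomas) saw (only) (Mary))
```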
    </Section>
    <Section position="3" start_page="141" end_page="142" type="sub_section">
      <SectionTitle>
2.3 The position of a restored node
</SectionTitle>
      <Paragraph position="0"> The degree of CD of a node that is being restored (i.e. supposed to have been deleted in the surface form of the sentence), and thus also its position in the underlying word order, is determined on the basis of its relationship to its governing node. Since such a node is almost always contextually bound (with the exception of the specific case of coordinated structures, see the Note after point 8 in Section 2.2 above), it is placed to the left of its governing word; more specifically: (a) if the restored node RN depends on a verb, then: (aa) if RN is not the single item depending on the given verb token, then RN is to be added in the 'Wackernagel position'; (ab) if RN has no sister nodes, then it is placed at the beginning of the clause; (b) if RN is restored as depending on a noun (or adjective), RN is placed as the least dynamic dependent of this governing word; (c) if more than one node is inserted as depending on one and the same item, then their order should conform to the systemic ('canonical') ordering of the valency slots (see the remark on SO in Section 2.2 above, point 4).</Paragraph>
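Rules (a) to (c) can be sketched as two small helpers. Assumptions, labelled as such: the 'Wackernagel position' is taken to be the second position of the clause (directly after its first element), which matches the sentence Včera (on) přišel pozdě cited below; for rule (c) the caller supplies a numeric SO rank per restored node; all names are hypothetical.

```python
def restored_index(governor_is_verb: bool, has_sisters: bool) -> int:
    """Index at which one restored node RN is inserted.

    For a verb the index refers to the clause; for a noun or adjective,
    to the list of that word's dependents in underlying order.
    """
    if governor_is_verb:
        return 1 if has_sisters else 0   # (aa) Wackernagel (second) position; (ab) clause start
    return 0                             # (b) least dynamic dependent of the noun/adjective


def order_restored(nodes: list[str], so_rank: dict[str, int]) -> list[str]:
    """(c) Several restored nodes with one governor follow the systemic ordering."""
    return sorted(nodes, key=lambda n: so_rank[n])


clause = ["Včera", "přišel", "pozdě"]
clause.insert(restored_index(governor_is_verb=True, has_sisters=True), "on")
print(clause)  # ['Včera', 'on', 'přišel', 'pozdě']
# Hypothetical SO ranks for two restored nodes of one governor.
print(order_restored(["Patient", "Actor"], {"Actor": 0, "Patient": 5}))  # ['Actor', 'Patient']
```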
      <Paragraph position="1"> Point (a) appears to be substantiated by the fact that e.g. the subject pronoun appears in the zero form in Czech under similar conditions as the weak, clitic pronouns, for which the position immediately to the left of the verb is typical, cf. sentences such as Včera (on) přišel pozdě [Yesterday (he) came here late], Janu (oni) neviděli [lit.: Jane-Accus they have-not-seen], or (On) spal [He was-sleeping]. This also concerns such deletable items as e.g. the Directional with přijet [arrive], cf. Jan dnes (sem/tam) nepřijel [lit. John to-day (here/there) has-not-arrived]. The appropriateness of these preliminary rules is being checked during the tagging procedure, the results of which will be of importance for a more exact (and more complete) formulation of the relevant parts of the description of the sentence structure of Czech. This aspect of the usefulness of the corpus tagging also concerns many other points of grammar.</Paragraph>
    </Section>
    <Section position="4" start_page="142" end_page="142" type="sub_section">
      <SectionTitle>
2.4 Underlying and surface word order
</SectionTitle>
      <Paragraph position="0"> Within the tagging procedure, the differences between the two levels of the left-to-right order can be described by movement rules, a preliminary form of which can be briefly characterized as follows: 1. if a node M1 carries C and a node M2 depending on M1 is placed to the right of a node M3 superordinate to M1 in the surface word order, then M1 is placed immediately to the left of M2 in the resulting tree; cf. e.g. Sportovec (M1) on je (M3) dobrý (M2) [lit. (As a) sportsman he is good], see ex. (6) in Section 2.2;</Paragraph>
      <Paragraph position="1"> 2. if the positions of the nodes M1, M2 and M3 differ from point 1 only in that M1 depends on M2, then again M1 is placed immediately to the left of M2 in the resulting tree; cf. example (7) in Section 2.2 above, in which jásot occupies the position of M1, důvod that of M2, and není that of M3, or:</Paragraph>
      <Paragraph position="2"> (9) Jirku (M1) jsme plánovali (M3) poslat (M2) do Francie. [lit. George-Accus (M1) we-planned (M3) to-send (M2) to France]</Paragraph>
      <Paragraph position="3"> 3. a comparative of an adjective that precedes its governing noun in the surface is moved to the right of this noun in examples such as větší město než Boston [a larger town than Boston]; this surface order probably should be limited (by a rule of grammar) to cases in which the two nouns belong to a single semantic subclass.</Paragraph>
      <Paragraph position="4"> 4. in sentences exhibiting a secondary placement of IC, the bearer of IC occupies the rightmost position in the resulting tree; cf. example (1)(b) in Section 2.1 above, in which 'English' is the focus proper; the assumption underlying the placement of IC in a written text is that a written form of a sentence may correspond to different (spoken) sentences, according to the differences in the placement of IC in the appropriate way of pronouncing the sentence.</Paragraph>
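Rules 1, 2 and 4 each move a single node within the linear order, so two small helpers suffice for a sketch. The list representation and function names are hypothetical; identifying M1, M2 and the bearer of IC is assumed to have been done beforehand, and rule 3 is not covered.

```python
def move_before(order: list[str], m1: str, m2: str) -> list[str]:
    """Rules 1 and 2: place M1 immediately to the left of M2 (forms assumed unique)."""
    rest = [w for w in order if w != m1]
    i = rest.index(m2)
    return rest[:i] + [m1] + rest[i:]


def move_ic_last(order: list[str], ic_bearer: str) -> list[str]:
    """Rule 4: the bearer of a secondarily placed IC becomes the rightmost node."""
    return [w for w in order if w != ic_bearer] + [ic_bearer]


# Example (9), auxiliary 'jsme' omitted (hypothetical simplification): M1 = Jirku, M2 = poslat
print(move_before(["Jirku", "plánovali", "poslat", "do Francie"], "Jirku", "poslat"))
# ['plánovali', 'Jirku', 'poslat', 'do Francie']
print(move_ic_last(["In the Shetlands", "ENGLISH", "is spoken"], "ENGLISH"))
# ['In the Shetlands', 'is spoken', 'ENGLISH']
```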
      <Paragraph position="5"> 3 The special case of focus sensitive particles
Since the focus sensitive particles are identified (by the functor value RHEM for 'rhematizer' or 'focalizer'), it is possible to use PDT also for a specification of their occurrences in different positions, both in the dependency structure of the sentence and in its TFA. The starting hypotheses, which might be checked on the basis of PDT, are as follows (cf. (Hajičová et al., 1998)):</Paragraph>
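A sketch of how such a specification might be collected from annotated trees: count the RHEM nodes by their TFA value and by whether they stand before or after their governor. Only the functor value RHEM comes from the text; the record format (dicts with 'functor', 'tfa', 'position' and 'governor_position' keys) is a hypothetical stand-in, not the actual PDT file format.

```python
from collections import Counter


def focalizer_profile(nodes: list[dict]) -> Counter:
    """Count RHEM nodes by TFA value and by side relative to the governor."""
    profile: Counter = Counter()
    for node in nodes:
        if node["functor"] != "RHEM":
            continue
        if node["governor_position"] > node["position"]:
            side = "before governor"
        else:
            side = "after governor"
        profile[(node["tfa"], side)] += 1
    return profile


# Hypothetical records; positions refer to the underlying word order.
nodes = [
    {"form": "only", "functor": "RHEM", "tfa": "F", "position": 4, "governor_position": 3},
    {"form": "Mary", "functor": "PAT", "tfa": "F", "position": 5, "governor_position": 3},
    {"form": "also", "functor": "RHEM", "tfa": "F", "position": 1, "governor_position": 2},
]
print(focalizer_profile(nodes))
```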
    </Section>
    <Section position="5" start_page="142" end_page="143" type="sub_section">
      <SectionTitle>
3.1 Focus sensitive particles in prototypical positions
</SectionTitle>
      <Paragraph position="0"> The prototypical syntactic position of a focalizer can be understood as that of a dependent of a verb node; thus, in examples like (10) or (11), it is possible to specify the scope of the focalizer as the whole subtree subordinated to the verb (where 'subordinated' is understood as the transitive closure of 'dependent' in the reflexive sense, so that the verb itself is included); the scope is divided into the background and the focus of the focalizer (F'), as will be specified in 3.2. Thus, in the interpretation of (10) on the reading represented (with many simplifications) by (10'), it is included that (according to what P. knows) among those whom T. saw there was no one else than M. (i.e. while 'T. saw' constitutes the background of 'only', its F' is 'Mary'). Similarly, if in (11) the negation (although expressed by a prefix in Czech) is handled as a dependent of the verb, its background is the subject and its F' includes both the verb and the object.</Paragraph>
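For the prototypical case, the scope and its division can be sketched as follows. Assumptions (hypothetical encoding): the tree is a child-to-parent dict, the underlying order is a list, and the background is taken to be the part of the scope preceding the focalizer in the underlying order and its focus the part following it. This reproduces the reading of (10') described above but only simplifies the account in 3.2.

```python
def scope(verb: str, parent: dict[str, str]) -> set[str]:
    """Reflexive-transitive closure of 'dependent' below the verb."""
    result = {verb}
    changed = True
    while changed:
        changed = False
        for child, head in parent.items():
            if head in result and child not in result:
                result.add(child)
                changed = True
    return result


def split_scope(focalizer: str, verb: str, parent: dict[str, str],
                order: list[str]) -> tuple[list[str], list[str]]:
    """(background, focus of the focalizer): scope nodes before and after the focalizer."""
    in_scope = scope(verb, parent) - {focalizer}
    cut = order.index(focalizer)
    background = [w for w in order[:cut] if w in in_scope]
    focus = [w for w in order[cut + 1:] if w in in_scope]
    return background, focus


# Hypothetical, simplified (10'): Paul knows that Thomas saw only Mary
parent = {"Paul": "knows", "saw": "knows", "Thomas": "saw", "only": "saw", "Mary": "saw"}
order = ["Paul", "knows", "Thomas", "saw", "only", "Mary"]
print(split_scope("only", "saw", parent, order))  # (['Thomas', 'saw'], ['Mary'])
```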
    </Section>
    <Section position="6" start_page="143" end_page="143" type="sub_section">
      <SectionTitle>
3.2 Focus sensitive particles in the hierarchy of communicative dynamism
</SectionTitle>
      <Paragraph position="0"> The primary position of a focalizer in a TR is at the boundary between the topic and the focus of the verb clause, and the focus of the clause is then identical to the focus of the focalizer. If a focalizer is included in the topic, then its focus contains those items which in the TR are placed between this focalizer and the next item marked as C to the right and are more dynamic than the focalizer.</Paragraph>
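The case of a focalizer inside the topic can be sketched over a linear TR in which each node carries its TFA value. Assumptions (hypothetical): the TR is a list of (form, value) pairs in underlying order; every item to the right of the focalizer counts as more dynamic than it; and the C item closing the span is itself included, as in the embedded variant of (10) discussed below, where the C-marked 'Mary' constitutes the whole focus of 'only'. Without a C item to the right, the sketch simply returns everything to the right.

```python
def focalizer_focus(tr: list[tuple[str, str]], focalizer: str) -> list[str]:
    """Items right of a topic focalizer, up to and including the next C-marked item."""
    forms = [form for form, _ in tr]
    focus = []
    for form, value in tr[forms.index(focalizer) + 1:]:
        focus.append(form)
        if value == "C":
            break  # the next C item closes the focus of the focalizer
    return focus


# Hypothetical TFA values for "Since Paul knows that Thomas saw only Mary, he is not afraid".
tr = [("Paul", "T"), ("knows", "T"), ("Thomas", "T"), ("saw", "T"), ("only", "T"),
      ("Mary", "C"), ("he", "T"), ("afraid", "F")]
print(focalizer_focus(tr, "only"))  # ['Mary']
```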
      <Paragraph position="1"> It should be noted that CD is understood here as a partial ordering defined so that: (i) in every set of a head and its daughter nodes, every daughter node placed to the right of its head is more dynamic than every daughter node placed to the left of its head; (ii) the relation 'more dynamic' is determined by the irreflexive transitive closure of (i). Thus, e.g. in the TR (10'), 'knows' is more dynamic than 'Paul' and less dynamic than 'saw' according to point (i), and both 'only' and 'Mary', being more dynamic than 'saw', are more dynamic than 'knows' according to point (ii); however, 'Thomas' is neither more nor less dynamic than 'knows'. If (10) is embedded into a more complex sentence as (a part of) its topic, then 'Mary' is more dynamic than 'only' and has the feature C; thus, e.g. with 'Since Paul knows that Thomas saw only Mary, he is not afraid', 'Mary' constitutes the whole F' of 'only', similarly as in (10').</Paragraph>
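Points (i) and (ii) translate into a small closure computation. In this sketch each head is mapped to its left and right daughters (an assumed encoding), and, following the example with 'knows' above, the head itself is ranked between its left and its right daughters; daughters on the same side of a head are not ordered by (i). The sample tree is a hypothetical simplification of (10').

```python
def more_dynamic_pairs(tree: dict[str, tuple[list[str], list[str]]]) -> set[tuple[str, str]]:
    """Pairs (a, b) read as 'b is more dynamic than a'."""
    pairs = set()
    for head, (left, right) in tree.items():       # point (i)
        for a in left:
            pairs.add((a, head))
            for b in right:
                pairs.add((a, b))
        for b in right:
            pairs.add((head, b))
    changed = True                                  # point (ii): transitive closure
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


# Hypothetical, simplified TR (10'): Paul knows that Thomas saw only Mary
tree = {"knows": (["Paul"], ["saw"]), "saw": (["Thomas"], ["only", "Mary"])}
cd = more_dynamic_pairs(tree)
print(("Paul", "knows") in cd, ("knows", "saw") in cd, ("knows", "Mary") in cd)  # True True True
print(("Thomas", "knows") in cd, ("knows", "Thomas") in cd)  # False False
```

The output reproduces the relations stated in the paragraph, including the incomparability of 'Thomas' and 'knows'.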
      <Paragraph position="2"> The underlying word order W (a linear ordering) is then defined on the basis of CD, with (iii) and (iv) holding for every two nodes x and y in a tree: (iii) if node x is more dynamic than node y, then x follows y under W; (iv) if node x follows node y under W, node u is subordinated to x and node z is subordinated to y, then u follows both y and z, and x follows z under W.</Paragraph>
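A linear order compatible with (iii) can be sketched as a projective traversal: each head is output after the subtrees of its left daughters and before the subtrees of its right daughters, which keeps every subtree contiguous in the spirit of (iv). The head-to-(left, right) encoding and the sample tree are assumptions; the check tests (iii) on the pairs generated by point (i), which suffices because a linear order that respects those pairs also respects their transitive closure.

```python
def underlying_order(head: str, tree: dict[str, tuple[list[str], list[str]]]) -> list[str]:
    """Left daughters' subtrees, then the head, then right daughters' subtrees."""
    left, right = tree.get(head, ([], []))
    order: list[str] = []
    for daughter in left:
        order += underlying_order(daughter, tree)
    order.append(head)
    for daughter in right:
        order += underlying_order(daughter, tree)
    return order


def satisfies_iii(order: list[str], tree: dict[str, tuple[list[str], list[str]]]) -> bool:
    """Check (iii) on head/daughter pairs: left daughters precede, right daughters follow."""
    pos = {w: i for i, w in enumerate(order)}
    for head, (left, right) in tree.items():
        if any(pos[d] > pos[head] for d in left) or any(pos[head] > pos[d] for d in right):
            return False
    return True


# Hypothetical, simplified tree for (10').
tree = {"knows": (["Paul"], ["saw"]), "saw": (["Thomas"], ["only", "Mary"])}
w = underlying_order("knows", tree)
print(w)                       # ['Paul', 'knows', 'Thomas', 'saw', 'only', 'Mary']
print(satisfies_iii(w, tree))  # True
```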
      <Paragraph position="3"> Among the non-prototypical, secondary positions of focalizers, there are also the cases of their clustering (e.g. 'not only'), as well as the sentences in which a focalizer itself constitutes the whole focus of the sentence ('He DID realize this').</Paragraph>
    </Section>
  </Section>
</Paper>