<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2017"> <Title>Minimalist Parsing of Subjects Displaced from Embedded Clauses in Free Word Order Languages</Title> <Section position="4" start_page="97" end_page="101" type="metho"> <SectionTitle> 3 The Parser in Action </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="97" end_page="98" type="sub_section"> <SectionTitle> 3.1 A Run-through </SectionTitle> <Paragraph position="0"> Our parser (Sayeed and Szpakowicz, 2004) is incremental, meaning that it does not have access to the end of the sentence at the beginning of a derivation. It is also &quot;semantically greedy&quot;, meaning that it attempts to satisfy the semantic requirements (through checking) as soon as possible. So each step in the derivation consists of attempting to see whether or not checking can be accomplished using the current items in the &quot;processing buffer&quot; and those in the &quot;input queue,&quot; and if not, shifting a word from the input queue onto the processing buffer. The distinction is marked, in our notation, by a |: the words and trees before | are in the processing buffer, and those after | are in the input queue.</Paragraph> <Paragraph position="1"> The algorithm also prefers move before merge.</Paragraph> <Paragraph position="2"> This also ensures that trees do not have multiple pending resolvable semantic dependencies, which can represent a state of ambiguity in determining which dependency to resolve and how.</Paragraph> <Paragraph position="3"> We now present an example parse of the above sentence. We first give the general outline of the parse, without the full details of the formal representation; after that, we demonstrate the formalism.
We sketch the steps of the parse first so that we can deduce what features we would need to make it work with the system.</Paragraph> <Paragraph position="4"> We start with everything in the input queue, after the |: (4) | tametsi tu scio quam sis curiosus Now we need to shift (hear) two words before any parsing operations can be performed. So we shift tametsi and tu. tametsi (&quot;although&quot;) consists of tamen, et, and si: &quot;nevertheless&quot;, &quot;and&quot;, and &quot;if.&quot; These suggest that tametsi is part of a CP, and, most likely, Force. Since tu has been displaced from the embedded clause, probably for prosodic reasons, it likely has features that can be gleaned from the intonation and the context, such as Focus. Since these are part of our CP system, we merge them.</Paragraph> <Paragraph position="5"> (5) [tametsi tametsi tu] scio quam sis curiosus Now we have to shift scio. But the verb scio does not have a complement and cannot merge with tametsi until it is a complete VP. The same is true for quam (&quot;how&quot;) and sis, since sis (&quot;you are&quot;) needs a complement: curiosus. So the system waits to shift everything and then merges sis and curiosus.</Paragraph> <Paragraph position="6"> (6) [tametsi tametsi tu] scio quam [sis sis curiosus] Now we can merge sis and quam, since sis now has a complement. Latin is a pro-drop language, so we can perform the merge without having an explicit subject, which is currently part of another tree. (7) [tametsi tametsi tu] scio [quam quam [sis sis curiosus]] quam has been given its complement. Now, as a complete CP, it is ready to be a complement of scio. (8) [tametsi tametsi tu] [scio scio [quam quam [sis sis curiosus]]] We have a CP (the tametsi tree) and a VP (scio), and we need to merge them to form one CP.</Paragraph> <Paragraph position="7"> (9) [tametsi [tametsi tametsi tu] [scio scio [quam quam [sis sis curiosus]]]] This leaves us with tu and sis in one tree.
However, we cannot bring them together. In Sayeed and Szpakowicz (2004), we required (in order to limit tree searches) that movement during parsing be to positions that command the trace of movement. Clearly, tu does not command sis. We only permitted raising, so what should we raise? If we raised the entire CP, we would get a tree in which neither tu nor sis commands the other. We would have to make another move to get sis to command tu. So we take a simpler route and just move sis. Note that sis still projects after the merge, since sis holds the requirement for a subject; tu is now in what would be known as a specifier position. It does not matter that tu does not presently command its trace; this is something in our account of parsing that differs from GB and minimalist accounts of movement in generation. Instead, the position with which it must be merged after movement can be the one that commands the original position. This allows the target position to be the one that projects, as sis has.</Paragraph> </Section> <Section position="2" start_page="98" end_page="101" type="sub_section"> <SectionTitle> 3.2 Now with Features </SectionTitle> <Paragraph position="0"> Now all dependencies are satisfied, and we have a complete tree. What we need next is an account of the features required for this parse under the system in Sayeed and Szpakowicz (2004).</Paragraph> <Paragraph position="1"> We add one extra characteristic to Sayeed and Szpakowicz (2004), which we will explain in greater detail in forthcoming work: optionally checked features. These are required primarily to avoid having to posit empty categories when parsing such phenomena as dropped subjects, which exist in Latin.</Paragraph> <Paragraph position="2"> First of all, let us account for the lexical entries of the initial two words, tametsi and tu. We need features that represent the discursive effect represented by the displacement of tu. We shall assume that this is Focus.
However, we also need a feature that will prepare tametsi to merge with scio. So we represent these two as (12) tametsi: {UNCH?(Disc:Focus), UNCH(Type:V)} tu: {unch(Disc:Focus) - unch(Case:Nom, Pers:2, Num:Sg)} Features are grouped together into feature bundles, which allow simultaneous checking of features. Note that the ? on one of the feature bundles of tametsi means that the bundle is optional: it does not have to be checked with a focus feature on an adjacent constituent if no such feature exists, but it must be checked if there is one.</Paragraph> <Paragraph position="3"> For tu we are using feature paths as we defined in Sayeed and Szpakowicz (2004); what is to the right on a feature path cannot be checked before what is to the left. In this case, we must check the focus feature before we can check tu as a constituent of its proper VP (headed by sis).</Paragraph> <Paragraph position="4"> We express the trees using the same horizontal indented representation as in Sayeed and Szpakowicz (2004). We use this notation because the nodes of this tree are too large for the &quot;normal&quot; tree representation used above. So we start with (13) | tametsi tu scio quam sis curiosus We need to shift two words before we can do anything. We thus create nodes with the above features.</Paragraph> <Paragraph position="6"> The Focus features can be checked. In our system, unch and UNCH feature bundles are compatible for checking, and the node with the UNCH feature projects. This form of merge among the items already shifted can only be performed with the roots of adjacent trees. We specified this to prevent long-distance searches of the processing buffer.</Paragraph> <Paragraph position="8"> When UNCH and unch feature bundles are checked, their features are unified (and replaced with the result of unification). UNCH and unch become CH and ch. Meanwhile, tametsi has acquired the features of tu in the CH bundle.
The purpose of this mechanism is to transfer information up the tree in order to support incremental parsing of discontinuous NP constituents, but we find an additional use for this below.</Paragraph> <Paragraph position="9"> We make one change here to the unification of feature bundles as described by Sayeed and Szpakowicz (2004): when we replace feature bundles with the result of unification, we replace them with the features of the entire path with which we are checking. This ensures that in the process of checking, we do not &quot;hide&quot; features that are further on in the path. So tametsi also gains the number, person, and case features. This is actually quite a logical extension of the idea we expressed in Sayeed and Szpakowicz (2004) that a feature being checked with a feature further down a path should be compatible with all the previous features on the path. In both cases, the system should reflect the idea that features further down a path are dependent on the checking status of previous features. As with unification in general, compatibility means lack of a conflict in τ : φ pairs (i.e., no case conflicts, and so on). Now, as in (6), we need to shift all the remaining words into the buffer before we get a compatible set.</Paragraph> <Paragraph position="10"> So we need to determine lexical entries for all of the remaining words. First, scio:</Paragraph> <Paragraph position="12"> We once again use a feature path. In this case, it means that scio (&quot;know&quot;) must have a wh-phrase complement before it is ready to be checked by something that takes a VP complement (such as a complementizer). So this leads us to an entry for quam: (17) quam: {UNCH?(Disc:Focus), UNCH(Type:V) - unch(Wh:0)} For quam, we also have an optional Focus feature, because it is the head of a CP as tametsi is above. (We might have other optional discourse features there, but they would be superfluous for this discussion.)
And, like tametsi, it has a feature that allows it to take a VP complement. Checking this feature releases the wh-feature that allows it to become the complement of scio.</Paragraph> <Paragraph position="13"> Now we only need entries for sis and curiosus:</Paragraph> <Paragraph position="15"> We use an optional feature for the requirement of a nominative subject on sis, subjects being optional in Latin. However, we do require it to take an accusative object. We are able to shift everything as we did prior to (6).</Paragraph> <Paragraph position="17"> Now sis and curiosus can merge. The resulting merger between compatible unch and UNCH features, by Sayeed and Szpakowicz (2004), also causes the contents of those feature bundles to be unified.</Paragraph> <Paragraph position="19"> Now that the left feature on the feature path on sis is checked, the verb type feature is free. It can check with the corresponding feature on quam.</Paragraph> <Paragraph position="21"> Feature paths allow quam to merge with scio as in (8).</Paragraph> <Paragraph position="23"> And, lastly, scio merges with the CP headed by tametsi.</Paragraph> <Paragraph position="25"> We now have a single tree, but we are in the predicament of (9). We need to be able to move sis to a position where it commands tu. And that means moving it to join with tametsi.</Paragraph> <Paragraph position="26"> In Sayeed and Szpakowicz (2004), we proposed a mechanism by which adjuncts displaced from discontinuous NPs could reunite with their NPs even if the NP had already been merged as a constituent of a verb. This was done by allowing adjuncts to merge with the verb if the verb had a compatible CH feature (without actually checking the adjunct feature bundle).
A CH feature advertises that the verb had previously merged with a compatible noun, since unification would have given the noun's features to the CH feature bundle.</Paragraph> <Paragraph position="27"> In this case, tametsi does have a CH feature bundle that appears compatible with sis, but UNCH features are not features that cause adjunctions in our system. We propose a minimal stipulation that will solve this problem: (24) UNCH features (i.e., features that indicate a requirement for a constituent) can be moved or merged to meet compatible CH features.</Paragraph> <Paragraph position="28"> The main problem with (24) is the possibility that unnecessary movements caused by UNCH features may occur in such a way that the UNCH feature would be moved out of the way of compatible unch features.</Paragraph> <Paragraph position="29"> But this is likely not a problem. Our system prefers to exhaust all possible movements before mergers in parsing. So, if an UNCH feature had been in the tree, and an unch feature is introduced later at the root (as specified in Sayeed and Szpakowicz (2004)), the constituent containing the UNCH feature would immediately have moved to claim it.</Paragraph> <Paragraph position="30"> Then if a compatible CH feature arrived, it would not matter, since the UNCH feature would itself have been checked. But if a compatible CH feature had been in the tree before the compatible unch feature had joined, what then? The constituent containing the UNCH feature would move to join it. Then the unch feature would join the tree. It would still command the UNCH feature, which would move to claim it.</Paragraph> <Paragraph position="31"> There is only one unsafe case: if the CH feature arrives before the unch feature, and it is part of a head whose constituents contain a compatible unch feature on the wrong constituent, then the UNCH feature would be checked with the wrong constituent according to the mechanism above.
After all, the UNCH feature would command the incorrect unch feature. This possibility, however, can only exist if there is another displaced item in the tree containing the original CH that is compatible with the UNCH feature but displaced from some other phrase. This requires further investigation into Latin grammar, as it seems unlikely that such constructions exist, given the rarity of displacement in the first place.</Paragraph> <Paragraph position="32"> So let us implement our solution:</Paragraph> <Paragraph position="34"> Note that the maximal projections move, not the heads of constituent trees. A maximal projection is the highest node containing the features, and we always take the highest node according to Sayeed and Szpakowicz (2004). Now sis commands tu. We can move tu.</Paragraph> <Paragraph position="35"> All optional unchecked features have been eliminated, and the derivation is complete.</Paragraph> </Section> </Section> </Paper>