<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0402"> <Title>Control Strategies for Parsing with Freer Word-Order Languages</Title> <Section position="4" start_page="9" end_page="10" type="intro"> <SectionTitle> 2 FWO Parsing as Search within a </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="9" end_page="9" type="sub_section"> <SectionTitle> Powerset Lattice </SectionTitle> <Paragraph position="0"> A standard chart-parser views constituents as extending over spans, contiguous intervals of a linear string. In FWO parsing, constituents partition the input into not necessarily contiguous subsequences, which can be thought of as bit vectors whose AND is 0 and whose OR is 2^n - 1, given an initial n-length input string. For readability, and to avoid making an arbitrary choice as to whether the leftmost word should correspond to the most significant or least significant bit, we will refer to these constituents as subsets of {1...n} rather than as n-length bit vectors. For simplicity and because of our heightened awareness of the importance of goal-directedness to FWO parsing (see the discussion in the previous section), we will only outline the strictly top-down variant of our strategy, although natural analogues do exist for the other orientations.</Paragraph> </Section> <Section position="2" start_page="9" end_page="10" type="sub_section"> <SectionTitle> 2.1 State </SectionTitle> <Paragraph position="0"> State is: ⟨N, CanBV, ReqBV⟩.</Paragraph> <Paragraph position="1"> The returned result is: UsedBV or failure.</Paragraph> <Paragraph position="2"> convention. To our knowledge, the first to apply it to the order of RHS categories, which only makes sense once one drops the implicit linear ordering implied by the RHSs of context-free grammar rules, was Daniels and Meurers (2002).
Following Penn and Haji-Abdolhosseini (2003), we can characterize a search state under these assumptions using one non-terminal, N, and two subsets/bit vectors, the CanBV and ReqBV.</Paragraph> <Paragraph position="3"> CanBV is the set of all words that can be used to build an N, and ReqBV is the set of all words that must be used while building the N. CanBV always contains ReqBV, and what it additionally contains are optional words that may or may not be used. If search from this state is successful, i.e., N is found using ReqBV and nothing that is not in CanBV, then it returns a UsedBV, the subset of words that were actually used. We will assume here that our FWO grammars are not so free that one word can be used in the derivation of two or more sibling constituents, although there is clearly a generalization to this case.</Paragraph> </Section> <Section position="3" start_page="10" end_page="10" type="sub_section"> <SectionTitle> 2.2 Process </SectionTitle> <Paragraph position="0"> Search(⟨N, C, R⟩) can then be defined in the constraint solver as follows: A top-down parse of an n-length string begins with the state consisting of the distinguished category, S, of the grammar, and CanBV =</Paragraph> </Section> </Section> </Paper>