<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1049"> <Title>Layout and Language: Integrating Spatial and Linguistic Knowledge for Layout Understanding Tasks</Title> <Section position="5" start_page="336" end_page="340" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> This paper has outlined a set of problems particular to the encoding of complex document elements in flat or partially marked-up files. The application of a simple language model, in conjunction with algorithms sensitive to the layout characteristics of the document elements expressed in terms of spatial features, is proposed as a general solution to these problems.</Paragraph> <Paragraph position="1"> The method relies on the persistence of the language in which the document is written, in terms of the model used to recognize it.</Paragraph> <Paragraph position="2"> In the future, we intend to apply this approach to the implementation of a general layout analysis preprocessor. An interesting feature of the interaction between the language model and the layout of the document is that the performance of a system is only sensitive to the quality of the language model at the points at which it interacts with the layout of the document. Consequently, a general-purpose model built from a corpus of marked-up documents may be used to determine a subset of the cohesive text blocks in a document. Those blocks may then be used to derive more language data, possibly specific to the document, and the process then repeated until no more interactions are left ambiguous.</Paragraph> <Paragraph position="3"> were then used for the creation of a simple bigram model.</Paragraph> <Paragraph position="4">
For example, a paragraph of text is grammatical wherever the line breaks occur. Applying this observation to the segmentation of a double column of text will indicate where the line breaks occur.</Paragraph> <Paragraph position="5"> Sometimes sentences may conspire to form false positives: rivers of white space which appear to separate blocks.</Paragraph> <Paragraph position="6"> If a bigram model is used, the probability that word w is followed by word w' may be expressed as p(w' | w) and assigned a value between 0 and 1. If the probabilities are those shown to the right, then the continuation point for A would be X and the continuation point for B would be Y.</Paragraph> </Section></Paper>