File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/00/c00-1025_concl.xml
Size: 3,968 bytes
Last Modified: 2025-10-06 13:52:43
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1025"> <Title>Mining Tables from Large Scale HTML Texts</Title> <Section position="6" start_page="170" end_page="171" type="concl"> <SectionTitle> 5 Presentation of Table Extraction </SectionTitle> <Paragraph position="0"> The results of table interpretation are a sequence of attribute-value pairs. Consider the tour example. Table 8 shows the extracted pairs. We can find the following two phenomena: (1) A cell may be a value of more than one attribute.</Paragraph> <Paragraph position="1"> (2) A cell may act as an attribute in one case, and a value in another case.</Paragraph> <Paragraph position="2"> We can concatenate two attributes together by using phenomenon (1). For example, &quot;35,450&quot; is a value of &quot;Single Room&quot; and &quot;Economic Class&quot;, thus &quot;Single Room-Economic Class&quot; is formed. Besides that, we can find an attribute hierarchy by using phenomenon (2). For example, &quot;Single Room&quot; is a value of &quot;Price&quot;, and &quot;Price&quot; is a value of &quot;Adult&quot;, so that we can create a hierarchy &quot;Adult-Price-Single Room&quot;. Merging the results from these two phenomena, we can create the interpretations that we listed in Section 1. For example, from the two facts: &quot;35,450&quot; is a value of &quot;Single Room-Economic Class&quot;, and &quot;Adult-Price-Single Room&quot; is a hierarchical attribute, we can infer that 35,450 is a value of &quot;Adult-Price-Single Room-Economic Class&quot;. In this way, we can transform unstructured data into a more structured representation for further applications. Consider an application in question answering. Given a query like &quot;how much is the price of a double room for an adult&quot;, the keywords are &quot;price&quot;, &quot;double room&quot;, and &quot;adult&quot;. After consulting the database learned from HTML
texts, two values, 32,500 and 1,430 with attributes economic class and extension, are reported. With this table mining technology, the knowledge that can be employed goes beyond the text level.</Paragraph> <Paragraph position="3"> Conclusion. In this paper, we propose a systematic way to mine tables from HTML texts. Table filtering, table recognition, table interpretation and application of table extraction are discussed. The cues from HTML tags and information in table cells are employed to recognize and interpret tables. The F-measure for table recognition is 86.50%.</Paragraph> <Paragraph position="4"> There is still room to improve performance. The cues from the context of tables and the traversal paths of HTML pages may also be useful. In the text surrounding tables, writers usually explain the meaning of the tables, for example, which row (or column) denotes what kind of meaning. From the description, we can know which cell may be an attribute, and along the same row (column) we can find its value cells. Besides that, the text can also show the semantics of the cells. For example, the table cell may be a monetary expression that denotes the price of a tour package. In this way, even if a money marker is not present in the table cell, we can still know it is a monetary expression.</Paragraph> <Paragraph position="5"> Note that HTML texts can be chained through hyperlinks like &quot;previous&quot; and &quot;next&quot;. The context can be expanded further. Their effects on table mining will be studied in the future. Besides the possible extensions, another research line that can be considered is to set up a corpus for evaluation of attribute-value relationships. Because the role of a cell (attribute or value) is relative to other cells, developing answer keys is indispensable for table interpretation.</Paragraph> </Section> </Paper>