<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2018"> <Title>Centrality Measures in Text Mining: Prediction of Noun Phrases that Appear in Abstracts</Title> <Section position="6" start_page="106" end_page="107" type="concl"> <SectionTitle> 5 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> We have studied four kinds of centrality measures for identifying prominent noun phrases (NPs) in text documents. Overall, the centrality heuristic by itself does not demonstrate superiority. Among the four centrality measures, degree centrality performs best in the heuristic when the NP network is constructed at the sentence level, which indicates that the other centrality measures, obtained from the subgraphs, cannot represent the prominence of the NPs in the global NP network very well. When the NP network is constructed at the document level, the differences between the centrality measures become negligible. However, networks formed at the document level overlook the connections between sentences, as there is only one kind of link; on the other hand, NP networks formed at the sentence level ignore connections between sentences. We plan to extend our study by constructing NP networks with weighted links. The key problem will be how to determine the weights for links between two NPs in the same sentence, in the same paragraph but different sentences, and in different paragraphs.</Paragraph> <Paragraph position="1"> We are considering introducing the concept of entropy from information theory to solve this problem.</Paragraph> <Paragraph position="2"> In our experiments with YaDT, the way the NP network is formed does not appear to be critical. We learn that, at least in this circumstance, the decision tree algorithm is more robust than the centrality heuristic. 
When all features are used in YaDT, recall reaches 0.95, which means the decision trees find 95% of the CNPs in the abstracts from the text documents, without increasing mistakes, since precision improves at the same time. That using all features in YaDT achieves better results than using the centrality feature or frequency individually with the other features implies that centrality features may capture somewhat different information from the text. To make this research more robust, we will include reference resolution in our study. We will also include centrality measures as sentence features in producing extractive summaries.</Paragraph> </Section> </Paper>