

Extractive multi-document summarization using population-based multicriteria optimization
Affiliation: 1. Department of Computer Science, University of Kerala, Kariavattom, Kerala, India; 2. Department of Computer Science, T.K.M College of Engineering, Kollam, Kerala, India; 1. Department of Industrial Engineering, Istanbul Commerce University, Küçükyalı E5 Kavşağı İnönü Cad. No: 4, Küçükyalı 34840, Istanbul, Turkey; 2. Istanbul Medeniyet University Faculty of Engineering and Natural Sciences, Department of Industrial Engineering 34700 Üsküdar, Istanbul, Turkey; 1. International Business School, Shaanxi Normal University, Xi’an 710119, China; 2. School of Economics & Management, China University of Petroleum, Qingdao 266580, China; 3. School of Management, Wuhan University of Technology, Wuhan 430070, China; 4. Department of Electrical & Computer Engineering, University of Alberta, Alberta T6G 2R3, Canada; 5. LeBow College of Business, Drexel University, Philadelphia, PA 19104, USA; 6. Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia; 7. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland; 1. Facultad de Informática, Universidad Complutense de Madrid, Spain; 2. Computer Science Dpt., Universidad Católica del Maule, Chile
Abstract: Multi-document summarization is the process of extracting salient information from a set of source texts and presenting that information to the user in a condensed form. In this paper, we propose a multi-document summarization system that generates an extractive generic summary with maximum relevance and minimum redundancy by representing each sentence of the input document as a vector of words drawn from the Proper Noun, Noun, Verb and Adjective sets. Five features associated with the sentences (TF_ISF, Aggregate Cross Sentence Similarity, Title Similarity, Proper Noun and Sentence Length) are extracted, and scores are assigned to sentences based on these features. The weights assigned to the features may vary with the nature of the document, and it is hard to discover the most appropriate weight for each feature, which makes generating a good summary very difficult without human intelligence. The multi-document summarization problem has a large number of decision parameters and a large number of possible solutions from which the optimal summary must be generated. A generated summary may not guarantee the essential quality and may be far from the ideal human-generated summary. To address this issue, we propose a population-based multicriteria optimization method with multiple objective functions. Three objective functions are used to select an optimal summary, with maximum relevance, diversity, and novelty, from a global population of summaries by considering both the statistical and semantic aspects of the documents. Semantic aspects are captured using Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF). Experiments were performed on the DUC 2002, DUC 2004 and DUC 2006 datasets using the ROUGE toolkit. Experimental results show that our system outperforms state-of-the-art work in terms of Recall and Precision.
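The feature-based scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the TF-ISF definition used here (term frequency times the log of inverse sentence frequency, averaged over the sentence) is one common formulation and is an assumption, as are the function names `tf_isf_scores` and `combine_features` and the idea of combining features by a plain weighted sum.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Return one TF-ISF score per tokenized sentence.

    `sentences` is a list of word lists. The formula is an assumed,
    commonly used variant: mean over the sentence's tokens of
    tf(term) * log(N / sf(term)), where sf is the number of
    sentences containing the term.
    """
    n = len(sentences)
    # Sentence frequency: in how many sentences each term occurs.
    sf = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        total = sum(count * math.log(n / sf[term]) for term, count in tf.items())
        scores.append(total / len(sent))
    return scores


def combine_features(feature_rows, weights):
    """Weighted sum of feature values for each sentence (hypothetical helper).

    `feature_rows` holds one row of feature values per sentence;
    `weights` holds one weight per feature.
    """
    return [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
```

In such a setup the weight vector passed to `combine_features` is exactly the quantity the abstract describes as hard to choose by hand, which is what motivates searching over candidate summaries with an optimization method instead of fixing the weights in advance.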
This article is indexed in ScienceDirect and other databases.