首页 | 本学科首页   官方微博 | 高级检索  
     


Using clustering and dynamic mutual information for topic feature selection
Authors:Jian‐min Xu  Shu fang Wu  Jie Zhu
Affiliation:1. College of Management, Hebei University, , Baoding, 071002 China;2. Department of Information Engineering, Hebei Software Institute, , Baoding, 071000 China;3. Department of Information Management, The Central Institute for Correctional Police, , Baoding, 071000 China
Abstract:A good feature selection method should take into account both category information and high‐frequency information to select useful features that can effectively display the information of a target. Because basic mutual information (BMI) prefers low‐frequency features and ignores high‐frequency features, clustering mutual information is proposed, which is based on clustering and makes effective high‐frequency features become unique, better integrating category information and useful high‐frequency information. Time is an important factor in topic detection and tracking (TDT). In order to improve the performance of TDT, time difference is integrated into clustering mutual information to dynamically adjust the mutual information, and then another algorithm called the dynamic clustering mutual information (DCMI) is given. In order to obtain the optimal subsets to display topics information, an objective function is proposed, which is based on the idea that a good feature subset should have the smallest distance within‐class and the largest distance across‐class. Experiments on TDT4 corpora using this objective function are performed; then, comparing the performances of BMI, DCMI, and the only existed topic feature selection algorithm Incremental Term Frequency‐Inverted Document Frequency (ITF‐IDF), these performance information will be displayed by four figures. Computation time of DCMI is previously lower than BMI and ITF‐IDF. The optimal normalized‐detection performance (Cdet)norm of DCMI is decreased by 0.3044 and 0.0970 compared with those of BMI and ITF‐IDF, respectively.
Keywords:feature selection  topic  mutual information  time difference  dynamic
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号