Text Clustering Using Frequent Weighted Utility Itemsets
Authors: Tram Tran, Tho Thi Ngoc Le, Ngoc Thanh Nguyen
Affiliation: 1. University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam; 2. Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam; 3. Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
Abstract:

Text clustering is an important topic in text mining. Approaches based on frequent itemsets (FIs) are among the most effective methods for text clustering, and many related algorithms aim to improve clustering accuracy. However, these algorithms do not account for the weights of terms in documents, even though the frequency of each term in each document has a great impact on the results. In this work, we propose a new method for text clustering based on frequent weighted utility itemsets (FWUIs). First, we calculate the Term Frequency (TF) of each term in each document to create a weight matrix for all documents. The weights of terms in documents are based on the Inverse Document Frequency (IDF). Next, we use the Modification Weighted Itemset Tidset (MWIT)-FWUI algorithm to mine FWUIs from a numeric matrix and the weights of terms in documents. Finally, based on frequent utility itemsets, we cluster documents using the Maximum Capturing (MC) algorithm. The proposed method has been evaluated on three data sets consisting of 1,600 documents covering 16 topics. The experimental results show that our method, using FWUIs, improves the accuracy of text clustering compared to methods using FIs.
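
The first steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tf_matrix`, `idf_weights`, `weighted_utility_support`) are hypothetical, the IDF formula `log(N / df)` is one common variant, and the support measure is an assumed simplification (utility of the documents containing the itemset divided by the utility of all documents) that may differ from the definition used by MWIT-FWUI. The mining and MC clustering steps are not shown.

```python
import math
from collections import Counter


def tf_matrix(docs):
    """Build a term-frequency matrix: matrix[d][j] is the raw count of vocab[j] in docs[d]."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def idf_weights(vocab, matrix):
    """Assign each term the weight log(N / df), where df is its document frequency (assumed IDF variant)."""
    n_docs = len(matrix)
    weights = {}
    for j, term in enumerate(vocab):
        df = sum(1 for row in matrix if row[j] > 0)  # df >= 1 because vocab comes from docs
        weights[term] = math.log(n_docs / df)
    return weights


def weighted_utility_support(itemset, vocab, matrix, weights):
    """Assumed support measure: weighted utility of documents containing every
    term of the itemset, divided by the weighted utility of all documents."""
    index = {term: j for j, term in enumerate(vocab)}

    def utility(row):
        # Utility of a document: sum of (term weight * term frequency).
        return sum(weights[term] * row[j] for term, j in index.items())

    total = sum(utility(row) for row in matrix)
    covered = sum(
        utility(row)
        for row in matrix
        if all(row[index[term]] > 0 for term in itemset)
    )
    return covered / total if total else 0.0


# Toy corpus of tokenized documents (illustrative data only).
docs = [
    ["data", "mining", "data"],
    ["text", "mining"],
    ["text", "clustering", "text"],
]
vocab, matrix = tf_matrix(docs)
weights = idf_weights(vocab, matrix)
support = weighted_utility_support(["mining"], vocab, matrix, weights)
```

In this sketch an itemset would count as frequent when its support reaches a user-chosen threshold; the threshold and the toy corpus are illustrative and are not taken from the paper.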
Keywords: Frequent itemsets; frequent weighted utility itemsets; quantitative databases; text clustering; weight of terms