Text Clustering Using Frequent Weighted Utility Itemsets

Authors:	Tram Tran, Tho Thi Ngoc Le, Ngoc Thanh Nguyen

Affiliation:	1. University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam; 2. Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam; 3. Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland

Abstract:	Text clustering is an important topic in text mining. One of the most effective approaches to text clustering is based on frequent itemsets (FIs), and many related algorithms aim to improve clustering accuracy. However, these algorithms do not consider the weights of terms in documents, even though the frequency of each term in each document has a great impact on the results. In this work, we propose a new method for text clustering based on frequent weighted utility itemsets (FWUIs). First, we calculate the Term Frequency (TF) of each term in each document to create a weight matrix for all documents. The weights of terms in documents are based on the Inverse Document Frequency (IDF). Next, we use the Modification Weighted Itemset Tidset (MWIT)-FWUI algorithm to mine FWUIs from a number matrix and the weights of terms in documents. Finally, based on the mined FWUIs, we cluster documents using the Maximum Capturing (MC) algorithm. The proposed method has been evaluated on three data sets consisting of 1,600 documents covering 16 topics. The experimental results show that our method, using FWUIs, improves text clustering accuracy compared with methods using FIs.
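
The first two steps described in the abstract (a per-document term-frequency matrix and IDF-based term weights) might be sketched as follows. This is only an illustration under stated assumptions: the whitespace tokenizer, the unsmoothed log(N/df) form of IDF, the sample documents, and the function names `build_tf_matrix` and `idf_weights` are all hypothetical and are not taken from the paper. The MWIT-FWUI mining and MC clustering steps are not shown.

```python
import math
from collections import Counter


def build_tf_matrix(docs):
    """Return (vocab, matrix); matrix[d][j] is the raw count of vocab[j] in docs[d].

    Assumption: lowercase whitespace tokenization; the abstract does not
    specify any preprocessing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def idf_weights(vocab, matrix):
    """Return one weight per term using IDF = log(N / df).

    Assumption: the plain, unsmoothed IDF formula; the paper may use a
    different variant.
    """
    n_docs = len(matrix)
    weights = {}
    for j, term in enumerate(vocab):
        # Document frequency: number of documents containing the term.
        df = sum(1 for row in matrix if row[j] > 0)
        weights[term] = math.log(n_docs / df)
    return weights


# Hypothetical toy corpus, not data from the paper.
docs = ["data mining text", "text clustering text", "data clustering"]
vocab, tf = build_tf_matrix(docs)
weights = idf_weights(vocab, tf)
```

In this sketch a term that appears in every document receives weight 0, so it contributes nothing to any weighted measure computed downstream, which is the usual motivation for IDF-style weights.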

Keywords:	Frequent itemsets; frequent weighted utility itemsets; quantitative databases; text clustering; weight of terms
