Document clustering method using dimension reduction and support vector clustering to overcome sparseness

Affiliation:	1. Department of Statistics, Cheongju University, 298, Daeseong-ro Sangdang-gu, Cheongju, Chungbuk 360-764, Republic of Korea; 2. Graduate School of Management of Technology, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea; 3. Division of Industrial Management Engineering, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea

Abstract:	Many studies on developing technologies have been published as articles, papers, or patents. We use and analyze these documents to find scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. In general, we have trouble analyzing documents directly because document data are not suitable for statistical and machine learning methods of analysis. Therefore, we have to transform document data into structured data for analytical purposes. For this process, we use text mining techniques. The structured data are very sparse, and hence, it is difficult to analyze them. This study proposes a new method to overcome the sparsity problem of document clustering. We build a combined clustering method using dimension reduction and K-means clustering based on support vector clustering and Silhouette measure. In particular, we attempt to overcome the sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
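
To make the pipeline described in the abstract concrete, the following is a minimal sketch of the general idea only: build a sparse document-term matrix, reduce its dimension, run K-means, and compare candidate cluster counts with the silhouette measure. It is not the authors' implementation. In particular, the support vector clustering step is omitted and the number of clusters is picked by silhouette alone; PCA via SVD is assumed as the dimension reduction; and the toy corpus, the function names, and the candidate values of k are all hypothetical.

```python
import numpy as np

# Hypothetical toy corpus; NOT the news or patent data used in the paper.
docs = [
    "battery cell electrode lithium",
    "lithium battery charge electrode",
    "battery electrode anode lithium",
    "antenna signal wireless receiver",
    "wireless antenna signal network",
    "signal receiver antenna network",
]

def doc_term_matrix(texts):
    """Term-frequency matrix: one row per document, one column per word."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            X[i, index[w]] += 1.0
    return X, vocab

def reduce_dims(X, n_components):
    """PCA scores via SVD of the column-centered matrix (assumed reduction)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def kmeans(Z, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [Z[0]]
    while len(centers) < k:
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(Z, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = []
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)   # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))

X, vocab = doc_term_matrix(docs)
sparsity = float((X == 0).mean())           # fraction of zero entries
Z = reduce_dims(X, n_components=2)          # dense, low-dimensional representation
scores = {k: silhouette(Z, kmeans(Z, k)) for k in (2, 3)}
best_k = max(scores, key=scores.get)        # silhouette-only choice of k (assumption)
labels = kmeans(Z, best_k)

print(f"documents x terms: {X.shape}, sparsity: {sparsity:.2f}")
print("silhouette by k:", scores)
print("chosen k:", best_k, "labels:", labels.tolist())
```

Even in this six-document example most entries of the term matrix are zero, which is the sparseness the abstract refers to; clustering on the reduced matrix `Z` instead of `X` is the part of the idea the sketch illustrates. A faithful reproduction would additionally need the support vector clustering step, which is not specified in enough detail here to sketch.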

Keywords:	Document clustering; Sparseness problem; Patent clustering; Dimension reduction; K-means clustering based on support vector clustering; Silhouette measure
This article is indexed in ScienceDirect and other databases.
