Document clustering method using dimension reduction and support vector clustering to overcome sparseness
Affiliation: 1. Department of Statistics, Cheongju University, 298, Daeseong-ro Sangdang-gu, Cheongju, Chungbuk 360-764, Republic of Korea; 2. Graduate School of Management of Technology, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea; 3. Division of Industrial Management Engineering, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea
Abstract: Many studies on developing technologies are published as articles, papers, or patents, and analyzing these documents reveals scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. Documents are difficult to analyze directly because raw text is not suited to statistical and machine learning methods, so we first transform the documents into structured data using text mining techniques. The resulting structured data are very sparse, which makes them hard to analyze. This study proposes a new method to overcome the sparsity problem in document clustering: a combined clustering method using dimension reduction and K-means clustering based on support vector clustering and the Silhouette measure. In particular, we attempt to overcome sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
Keywords: Document clustering; Sparseness problem; Patent clustering; Dimension reduction; K-means clustering based on support vector clustering; Silhouette measure
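Two ideas from the abstract can be illustrated on a toy scale: how sparse a document-term matrix becomes once text is turned into structured data, and how the Silhouette measure scores a candidate clustering. The sketch below is only an illustration under stated assumptions, not the authors' method: the five-sentence corpus, the function names, and the raw term-count weighting are all hypothetical, and it implements neither the dimension reduction step nor support vector clustering.

```python
from collections import Counter
from math import sqrt

# Toy corpus (hypothetical; not the paper's news or patent data).
docs = [
    "support vector clustering of patent documents",
    "patent clustering for technology forecasting",
    "k-means clustering with silhouette measure",
    "stock market prices fell sharply today",
    "market investors sold stock shares today",
]

def document_term_matrix(texts):
    """Return (vocabulary, matrix) holding raw term counts per document."""
    counts = [Counter(t.split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]

def sparsity(matrix):
    """Fraction of zero entries in the matrix."""
    cells = [v for row in matrix for v in row]
    return cells.count(0) / len(cells)

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(matrix, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, row in enumerate(matrix):
        # Group distances from document i by the cluster of the other document.
        dists = {}
        for j, other in enumerate(matrix):
            if j != i:
                dists.setdefault(labels[j], []).append(euclidean(row, other))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

vocab, dtm = document_term_matrix(docs)
topical = [0, 0, 0, 1, 1]   # clustering/patent sentences vs. stock-market sentences
mixed = [0, 1, 0, 1, 0]     # arbitrary assignment that mixes the two topics

print("vocabulary size:", len(vocab))
print("sparsity:", round(sparsity(dtm), 3))
print("silhouette (topical):", round(silhouette(dtm, topical), 3))
print("silhouette (mixed):", round(silhouette(dtm, mixed), 3))
```

Even with five short sentences, most cells of the matrix are zero, and the topical grouping scores a higher mean silhouette than the mixed one. That comparison is the kind of signal a Silhouette-based criterion can use to compare candidate clusterings.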
This article is indexed in ScienceDirect and other databases.