

A weighted multivariate Fuzzy C-Means method in interval-valued scientific production data
Affiliation:1. Department of Electronics Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region;2. Department of Computer Science and Technology, Soochow University, Suzhou 215006, China;1. Chongqing Key Lab of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing, China;2. Graduate Telecommunications and Networking Program, University of Pittsburgh, PA, USA;3. China Internet Research Lab, China Science and Technology Network, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China;4. Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China;1. Department of Business Administration, Lunghwa University of Science and Technology, Taiwan;2. Department of Finance, MingDao University, Taiwan;3. Business School, the University of Nottingham, United Kingdom;1. Fraunhofer INT, Appelsgarten 2, D-53879 Euskirchen, Germany;2. Ghent University, Faculty of Economics and Business Administration, Tweekerkenstraat 2, B-9000 Gent, Belgium;1. Department of Economics, National Chung Cheng University, Taiwan;2. Department of International Trade, Kun Shan University, Taiwan
Abstract: Clustering is the process of organizing objects into groups whose members are similar in some way. Most clustering methods handle only numeric data. However, this representation may not be adequate for modeling complex information such as histograms, distributions, or intervals. Symbolic Data Analysis (SDA) was developed to deal with these types of data. In multivariate data analysis, it is common for some variables to be more relevant than others, and the less relevant variables can mask the cluster structure. This work proposes a clustering method based on a fuzzy approach that produces weighted multivariate memberships for interval-valued data. These memberships can change at each iteration of the algorithm, and they differ from one variable to another and from one cluster to another. Furthermore, each variable has its own relevance weight, which may also differ from one cluster to another. The advantage of this method is that it is robust to ambiguous cluster membership assignment, since the weights represent how important the different variables are to each cluster. Experiments are performed on synthetic data sets to compare the performance of the proposed method against other methods established in the clustering literature. An application to interval-valued scientific production data is also presented. Clustering quality results show that the proposed method offers higher accuracy when variables have different variabilities.
Keywords: Clustering; Symbolic Data Analysis; Weighted multivariate membership; Scientific production data
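
To make the setting in the abstract concrete, the following is a minimal sketch of an ordinary fuzzy c-means loop applied to interval-valued data. It is not the weighted multivariate method proposed in the paper: it uses a single membership per object and cluster, no variable relevance weights, and it assumes a squared Euclidean distance computed on the lower and upper bounds of each interval. The function name `interval_fcm` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def interval_fcm(lower, upper, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Baseline fuzzy c-means for interval data (illustrative sketch only).

    lower, upper: arrays of shape (n_objects, n_variables) holding the
    interval bounds. Returns memberships of shape (n_objects, n_clusters)
    and the lower/upper bounds of the cluster prototypes.
    """
    rng = np.random.default_rng(seed)
    n = lower.shape[0]

    # Random initial memberships; each row sums to 1.
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        w = u ** m  # fuzzified memberships, shape (n, c)

        # Prototypes: membership-weighted means of the interval bounds.
        proto_lo = (w.T @ lower) / w.sum(axis=0)[:, None]
        proto_up = (w.T @ upper) / w.sum(axis=0)[:, None]

        # Assumed distance: squared differences of the lower bounds plus
        # squared differences of the upper bounds, summed over variables.
        d2 = (((lower[:, None, :] - proto_lo[None]) ** 2).sum(axis=-1)
              + ((upper[:, None, :] - proto_up[None]) ** 2).sum(axis=-1))
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero

        # Standard fuzzy c-means membership update.
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break

    return u, proto_lo, proto_up
```

In this baseline every variable contributes equally to the distance, so a noisy variable with high variability can dominate it. That is the situation the abstract describes, in which per-variable memberships and per-cluster relevance weights are intended to help.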
This article is indexed in ScienceDirect and other databases.