Similar literature
1.
2.
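To mine campus wireless network trajectory data in depth, a density-based clustering method is used to cluster the trajectory behavior of campus users by feature. Because density-based clustering algorithms usually take distance as the similarity measure, the user similarity matrix is first converted into a distance matrix through a conversion function so that it can feed such algorithms. An outlier detection algorithm is then introduced and combined with the clustering algorithm, which reduces the number of input parameters and increases cluster cohesion. The improved clustering algorithm can effectively detect anomalies in trajectory data, helping universities process students' Internet access records to find users whose browsing differs from that of most students, narrow the target range, and take targeted action. Qualitative analysis and comparative experiments confirm that two variants of shared-nearest-neighbor fast search density peak clustering based on outlier detection are suitable for processing similarity matrices of campus wireless network behavior trajectories, and that they outperform comparable algorithms on internal clustering indices such as the Dunn index and in overall performance.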
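The conversion step in the abstract above can be illustrated with a minimal sketch. The abstract does not name its conversion function, so the mapping `d = 1 - s` used here is an assumption, as are the function name `similarity_to_distance` and the toy matrix; similarities are assumed to lie in [0, 1].

```python
def similarity_to_distance(sim):
    """Convert a square similarity matrix (entries in [0, 1]) to distances.

    Assumption: d = 1 - s. The abstract does not state which conversion
    function the authors actually use.
    """
    n = len(sim)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # High similarity becomes small distance.
                dist[i][j] = 1.0 - sim[i][j]
    return dist


# Hypothetical similarity matrix for three users.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
dist = similarity_to_distance(sim)
```

Any monotonically decreasing mapping would serve the same purpose; `1 - s` is simply the most common choice when similarities are already normalized.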

3.
In this paper, a novel clustering method in the kernel space is proposed. It effectively integrates several existing algorithms into an iterative clustering scheme that can handle clusters with arbitrary shapes. In the proposed approach, a reasonable initial core for each cluster is estimated. This allows a cluster growing technique to be adopted, and the growing cores offer partial hints on the cluster association. Consequently, methods used for classification, such as support vector machines (SVMs), can be useful in this approach. To obtain initial clusters effectively, the incomplete Cholesky decomposition is adopted so that fuzzy c-means (FCM) can be used to partition the data in a kernel defined-like space. Then one-class and multiclass soft margin SVMs are adopted to detect the data within the main distributions (the cores) of the clusters and to repartition the data into new clusters iteratively. The structure of the data set is explored by pruning the data in the low-density regions of the clusters. Data are then gradually added back to the main distributions to ensure exact cluster boundaries. Unlike the ordinary SVM algorithm, whose performance relies heavily on kernel parameters given by the user, this approach estimates the parameters from the data set itself. Experimental evaluations on two synthetic data sets and four University of California Irvine real data benchmarks indicate that the proposed algorithms outperform several popular clustering algorithms, such as FCM, support vector clustering (SVC), hierarchical clustering (HC), self-organizing maps (SOM), and non-Euclidean norm fuzzy c-means (NEFCM). © 2009 Wiley Periodicals, Inc.

4.
Effective fuzzy c-means clustering algorithms for data clustering problems   Total citations: 3, self-citations: 0, citations by others: 3
Clustering is a well-known technique for identifying intrinsic structures and extracting useful information from large amounts of data. One of the most extensively used clustering techniques is the fuzzy c-means (FCM) algorithm. However, the standard FCM objective function becomes computationally demanding with large amounts of data and measurement uncertainty in the data objects. Further, it is difficult to set optimal parameters for fuzzy c-means. Hence the goal of this paper is to produce an alternative generalization of FCM clustering techniques, called quadratic entropy based fuzzy c-means, in order to deal with more complicated data. The paper develops effective quadratic entropy fuzzy c-means using a combination of a regularization function, quadratic terms, mean distance functions, and kernel distance functions. It gives a complete quadratic entropy framework for constructing effective quadratic entropy based fuzzy clustering algorithms, and establishes an effective way of estimating memberships and updating centers by minimizing the proposed objective functions. To reduce the number of iterations of the proposed techniques, the article proposes a new algorithm to initialize the cluster centers. To assess cluster validity and choose the number of clusters, the silhouette method is used. For the first time, the paper segments the synthetic control chart time series directly using the proposed methods to examine their performance, and shows that the proposed clustering techniques have advantages over the standard FCM and the very recent ClusterM-k-NN in segmenting synthetic control chart time series.
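For orientation, the membership estimate that standard FCM uses can be sketched as follows. This is the textbook update, not the quadratic entropy variant the paper proposes, and the function name `fcm_memberships` and the sample values are illustrative assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center gets full membership there.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


# Hypothetical example: one point between two centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships always sum to one, and the closer center receives the larger share.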

5.
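Kernel clustering algorithm   Total citations: 112, self-citations: 0, citations by others: 112
This paper proposes a kernel clustering method for cluster analysis. Using a Mercer kernel, the authors map samples from the input space into a high-dimensional feature space and perform clustering there. The kernel mapping brings out features that were not apparent in the input space, which allows better clustering. The kernel clustering method improves considerably on classical clustering algorithms, with faster convergence and more accurate clustering. Simulation results confirm the feasibility and effectiveness of the method.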
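Clustering in the feature space does not require the mapping itself, only kernel evaluations. The sketch below shows the usual kernel-trick expansion of the squared distance between a mapped point and a mapped cluster mean. The RBF kernel, the `gamma` parameter, and the function names are assumptions for illustration; the abstract does not specify which Mercer kernel is used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, one common Mercer kernel (assumed here)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def feature_space_sq_dist(x, cluster, kernel=rbf_kernel):
    """||phi(x) - mean(phi(c) for c in cluster)||^2 via kernel values only.

    Expands to K(x, x) - (2/m) * sum_j K(x, c_j) + (1/m^2) * sum_jl K(c_j, c_l).
    """
    m = len(cluster)
    k_xx = kernel(x, x)
    k_xc = sum(kernel(x, c) for c in cluster) / m
    k_cc = sum(kernel(a, b) for a in cluster for b in cluster) / (m * m)
    return k_xx - 2.0 * k_xc + k_cc


# Hypothetical cluster of two points and one query point.
d2 = feature_space_sq_dist((1.0, 1.0), [(0.0, 0.0), (2.0, 0.0)])
```

With a linear kernel the same expression reduces to the ordinary squared Euclidean distance to the cluster mean, which is a convenient sanity check.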

6.
The process of clustering groups together data points so that intra-cluster similarity is maximized while inter-cluster similarity is minimized. Support vector clustering (SVC) is a clustering approach that can identify arbitrarily shaped cluster boundaries. The execution time of SVC depends heavily on several factors: choice of the width of a kernel function that determines a nonlinear transformation of the input data, solution of a quadratic program, and the way that the output of the quadratic program is used to produce clusters. This paper builds on our prior SVC research in two ways. First, we propose a method for identifying a kernel width value in a region where our experiments suggest that clustering structure is changing significantly. This can form the starting point for efficient exploration of the space of kernel width values. Second, we offer a technique, called cone cluster labeling, that uses the output of the quadratic program to build clusters in a novel way that avoids an important deficiency present in previous methods. Our experimental results use both two-dimensional and high-dimensional data sets.

7.
An improved cluster labeling method for support vector clustering   Total citations: 5, self-citations: 0, citations by others: 5
The support vector clustering (SVC) algorithm is a recently emerged unsupervised learning method inspired by support vector machines. One key step involved in the SVC algorithm is the cluster assignment of each data point. A new cluster labeling method for SVC is developed based on some invariant topological properties of a trained kernel radius function. Benchmark results show that the proposed method outperforms previously reported labeling techniques.

8.
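Research on an intrusion detection method based on outlier detection   Total citations: 3, self-citations: 0, citations by others: 3
This paper proposes a kernel clustering intrusion detection method based on outlier detection. The basic idea is to first map samples from the input space into a high-dimensional feature space, generate clusters by redefining the distance between a data point and a cluster in the feature space, determine the anomalous data classes according to the proportion N of the normal class, and then apply the result to detection on real data. The method converges faster and clusters more accurately, and it does not require the training set to be labeled manually or by other means. Experiments on the KDD99 test data show that the method detects intrusions fairly effectively.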

9.
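Adaptive algorithm for fuzzy kernel clustering   Total citations: 2, self-citations: 2, citations by others: 2
Li Kan, Liu Yushu. Control and Decision (控制与决策), 2004, 19(5): 595-597
Fuzzy clustering algorithms cluster poorly when sample features are not distinct, and existing fuzzy clustering algorithms require the number of clusters to be fixed in advance, are highly random, and easily fall into local optima. To address these weaknesses, a kernel function and a validity function are introduced into fuzzy clustering, yielding an adaptive fuzzy kernel clustering algorithm. The method improves considerably on classical clustering algorithms and achieves better clustering results; experiments confirm its effectiveness and feasibility.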

10.
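To address the low efficiency of existing distance-based outlier detection algorithms on large-scale data, a distributed outlier detection algorithm based on clustering and indexing (DODCI) is proposed. First, a clustering method partitions the large data set into clusters; then an index for each cluster is built in parallel at the nodes of the distributed environment; finally, outliers are detected at each node in a loop using two optimization strategies and two pruning rules. Experiments on synthetic data sets and a preprocessed KDD CUP data set show that, for large data volumes, the algorithm is nearly an order of magnitude faster than the Orca and iDOoR algorithms. Theoretical and experimental analysis shows that the algorithm effectively improves the efficiency of outlier detection in large-scale data.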
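As background for the distance-based setting this abstract targets, a brute-force baseline can be sketched in a few lines. It scores each point by the distance to its k-th nearest neighbor and has none of the clustering, indexing, or pruning that DODCI adds; the function name `knn_outlier_scores`, the value of `k`, and the sample points are assumptions.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Brute force, O(n^2) distance computations. Larger scores indicate
    more isolated points. Requires len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(d[k - 1])
    return scores


# Hypothetical data: four points in a tight square and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
top = scores.index(max(scores))
```

The quadratic cost of this baseline is exactly what motivates partitioning the data and pruning candidate points on large data sets.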

11.
Outlier Detection Algorithms in Data Mining Systems   Total citations: 6, self-citations: 0, citations by others: 6
The paper discusses outlier detection algorithms used in data mining systems. Basic approaches currently used for solving this problem are considered, and their advantages and disadvantages are discussed. A new outlier detection algorithm is suggested. It is based on methods of fuzzy set theory and the use of kernel functions and possesses a number of advantages compared to the existing methods. The performance of the suggested algorithm is studied on the applied problem of anomaly detection arising in computer protection systems, the so-called intrusion detection systems.

12.
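Xu Kunpeng, Chen Lifei, Sun Haojun, Wang Beizhan. Journal of Software (软件学报), 2020, 31(11): 3492-3505
Most existing subspace clustering methods for categorical data assume that features are mutually independent and ignore linear or nonlinear correlations between attributes. This paper proposes a kernel subspace clustering method for categorical data. First, kernel functions originally designed for continuous data are introduced to project categorical data into a kernel space, and a feature-weighted similarity measure for categorical data is defined in that space. Second, an objective function for kernel subspace clustering of categorical data is derived from this measure, together with an efficient optimization method for it. Finally, a kernel subspace clustering algorithm for categorical data is defined. The algorithm considers relationships between attributes in a nonlinear space and, during clustering, assigns each attribute a feature weight measuring its relevance to each cluster, achieving embedded feature selection for categorical attributes. A cluster validity index is also defined to evaluate the quality of categorical clustering results. Experiments on synthetic and real data sets show that, compared with existing subspace clustering algorithms, the kernel subspace clustering algorithm can uncover nonlinear relationships between categorical attributes and effectively improves clustering quality.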

13.
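Using compactness and separation measured by a kernel function, a new cluster validity index, KKW, is given. The optimal number of clusters obtained from the KKW index is used in a modified kernel fuzzy clustering algorithm (MKFCM). The modified kernel mapping brings out features that were not apparent before. When MKFCM is used to cluster the Wine and glass data sets, the clustering accuracy for every class exceeds 90%; on the Wisconsin Breast Cancer data, which contains missing values, the misclassification rate is 4.72%. The method improves on classical clustering algorithms, with faster convergence and higher accuracy. Simulation results confirm the feasibility and effectiveness of the modified kernel clustering method.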

14.
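Wang Zhihe, Wang Shuyan, Du Hui. Computer Engineering (计算机工程), 2021, 47(5): 88-96, 103
The fuzzy C-means (FCM) clustering algorithm cannot identify non-convex data, and its Euclidean-distance similarity measure considers only the local consistency of data points while ignoring global consistency. An FCM algorithm that builds its similarity matrix with a density-sensitive distance measure is proposed. The affinity propagation algorithm is used to obtain a coarse number of clusters as the upper bound of the search range for the optimal number of clusters, which addresses two problems of FCM: the number of clusters must be set manually in advance, and randomly chosen initial cluster centers make the results unstable. On this basis, the max-min distance algorithm is improved to obtain representative sample points as initial cluster centers, and the silhouette coefficient is used to determine the optimal number of clusters automatically. Experiments on UCI and synthetic data sets show that, compared with classical FCM, K-means, and CFSFDP, the algorithm can identify complex non-convex data and accelerates convergence while maintaining clustering performance and stability.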
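The classic max-min distance initialization that this abstract builds on can be sketched as follows. This is the plain farthest-first version, not the authors' improved variant, and it uses Euclidean distance rather than their density-sensitive measure; the function name `max_min_centers`, the choice of the first point as seed, and the sample data are assumptions.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers by the classic max-min distance rule.

    Start from the first point (an arbitrary, assumed seed), then repeatedly
    add the point whose distance to its nearest chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


# Hypothetical data with three well-separated groups.
points = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 8)]
centers = max_min_centers(points, 3)
```

Because each new center is as far as possible from those already chosen, the result is deterministic for a fixed seed, which removes the instability of random initialization.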

15.
A new data clustering algorithm, the density-oriented kernelized version of fuzzy c-means with a new distance metric (DKFCM-new), is proposed. It creates noiseless clusters by identifying noise points and assigning them to a separate cluster. In an earlier work, the Density Based Fuzzy C-Means (DOFCM) algorithm with a Euclidean distance metric was proposed, which considered only the distance between the cluster centroid and the data points. This paper improves the performance of DOFCM by incorporating a new distance measure that also considers the distance variation within a cluster to regularize the distance between a data point and the cluster centroid. The paper presents the kernel version of the method. Experiments are done using two-dimensional synthetic data sets, standard data sets from previous papers such as the DUNN and Bensaid data sets, and real-life high-dimensional data sets such as the Wisconsin Breast Cancer and Iris data. The proposed method is compared with other kernel methods, with noise-resistant methods such as PCM, PFCM, CFCM, and NC, and with credal partition based clustering methods such as ECM, RECM, and CECM. Results show that the proposed algorithm significantly outperforms its earlier version and other competitive algorithms.

16.
As an exploratory approach, the clustering of fMRI time series has proved effective in analyzing functional MRI, especially in the detection of activated regions. Because fMRI time series are arbitrarily distributed in the temporal domain, imposing simple assumptions on the data structure can be misleading and can limit the detector's performance. Therefore, a truly data-driven clustering algorithm that adapts to the data structure is preferred, and only high-level control over the clustering procedure is desired. Support vector clustering (SVC) is suitable to some extent because of its advantages: no restriction on cluster shape, no need to specify the number of clusters explicitly, and a mechanism for outlier elimination. In this work, we propose an extension of SVC that moves further toward a data-sensitive detector. The approach is named ellipsoidal support vector clustering (ESVC). To be robust to noise, the clustering is performed on features extracted from the fMRI time series via the Fourier transform. Experimental results on simulated and real data sets demonstrate the effectiveness of incorporating data structure in clustering fMRI time series.

17.
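Zhang Yue, Liu Jie, Li Hang. Computer Engineering (计算机工程), 2013, 39(3): 46-50, 55
Most existing outlier detection methods require the number of outliers to be set in advance, and an inaccurate setting reduces detection accuracy. To address this, a probability-based outlier detection method is proposed. It combines the density-based DBSCAN algorithm with a variance computed around the median to cluster the data set under test, extracts the suspicious outliers that belong to no cluster, and analyzes them to determine the final outliers. The data handled by the method are linearly independent of time; neither the number of outliers nor the number of clusters needs to be set in advance, and the method is robust to noisy data. Experiments on the IRIS test data set show that the method identifies outliers effectively.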
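The first stage described above, extracting points that fall in no cluster, corresponds to DBSCAN's noise set. The sketch below computes only that set, using DBSCAN's standard definitions of core and border points; it does not include the median-based variance analysis from the abstract, and the function name `dbscan_noise` and the parameter values are assumptions.

```python
import math


def dbscan_noise(points, eps, min_pts):
    """Return indices of DBSCAN noise points (candidate outliers).

    A core point has at least min_pts points (itself included) within eps.
    A point is noise if it is not core and has no core point within eps.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbors[i])
    ]


# Hypothetical data: a dense square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
noise = dbscan_noise(points, eps=1.5, min_pts=3)
```

Note that `eps` and `min_pts` still have to be chosen, but neither the number of outliers nor the number of clusters is supplied, which matches the property the abstract emphasizes.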

18.
The support vector clustering (SVC) algorithm consists of two main phases: SVC training and cluster assignment. The former requires calculating Lagrange multipliers and the latter requires calculating an adjacency matrix, both of which may impose a high computational burden on cluster analysis. To overcome these difficulties, this paper presents an improved SVC algorithm. In the SVC training phase, an entropy-based algorithm for calculating the Lagrange multipliers is proposed by means of Lagrangian duality and Jaynes' maximum entropy principle, which reduces the time needed to calculate the Lagrange multipliers. In the cluster assignment phase, the kernel matrix is used to preliminarily classify the data points before the adjacency matrix is calculated, which effectively reduces the size of the adjacency matrix computation. As a result, the improved algorithm achieves substantial computational savings by exploiting the special structure of the SVC problem. The validity and performance of the proposed algorithm are demonstrated by numerical experiments.

19.
Outlier detection is an important data mining task with many contemporary applications. Clustering based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Additionally, carrying out the clustering process on data described using categorical attributes is challenging, due to the difficulty in defining requisite methods and measures dealing with such data. Addressing these issues, a novel algorithm for clustering categorical data aimed at outlier detection is proposed here by modifying the standard \(k\)-modes algorithm. The uncertainty regarding the clustering process is addressed by considering a soft computing approach based on rough sets. Accordingly, the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough \(k\)-modes clustering algorithm for outlier detection is demonstrated using various benchmark categorical data sets.

20.
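To address the poor applicability and high detection error rate of traditional toxic gas detection systems in mixed-gas detection, a PID toxic gas detection system based on an outlier fuzzy kernel clustering algorithm is proposed. In the hardware design, the more capable STM32F2X MCU is chosen, and a dedicated functional module for gas category analysis is designed. In the software algorithm and main control program, the outlier fuzzy kernel clustering algorithm is used to improve cluster analysis of gas data and thereby detection accuracy. Experimental results show that the proposed PID system can identify a variety of natural and chemical toxic gases and keeps the concentration detection error within 2%.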
