Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
This paper proposes a hybrid technique for color image segmentation. First, an input image is converted to the CIE L*a*b* color space. The color features “a” and “b” of CIE L*a*b* are then fed into fuzzy C-means (FCM) clustering, which is an unsupervised method. The labels obtained from FCM are used as the targets of a supervised feed-forward neural network. The network is trained by the Levenberg-Marquardt back-propagation algorithm, and its performance is evaluated using mean square error and regression analysis. The main issues of clustering methods are determining the number of clusters and choosing cluster validity measures. This paper presents a co-occurrence-matrix-based algorithm for finding the number of clusters and uses silhouette index values for cluster validation. The proposed method is tested on various color images obtained from the Berkeley database. The segmentation results are validated, and the classification accuracy is evaluated by the parameters sensitivity, specificity, and accuracy.
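The clustering step described in this abstract can be sketched as follows. This is a minimal, standard fuzzy c-means loop in plain Python, not the authors' implementation: the function names, the fuzzifier `m = 2.0`, the fixed iteration count, and the caller-supplied initial centers are all assumptions made for illustration. Each point stands for an (a*, b*) feature pair.

```python
import math


def _memberships(points, centers, m):
    """Membership of every point in every cluster; each row sums to 1."""
    rows = []
    for p in points:
        # Guard against a zero distance when a point coincides with a center.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Standard fuzzy c-means on feature tuples, e.g. (a*, b*) pairs."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by u_ik^m.
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[j] for w, p in zip(weights, points)) / wsum
                for j in range(dim)
            ))
        centers = new_centers
    u = _memberships(points, centers, m)
    # Hard labels (usable as training targets) come from the largest membership.
    labels = [max(range(len(centers)), key=lambda k: row[k]) for row in u]
    return centers, u, labels
```

The hard labels returned at the end are the kind of output that could serve as targets for a supervised network, as the abstract describes.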

2.
State-of-the-art near-duplicate video clip (NDVC) detection for novelty re-ranking uses non-semantic low-level features (color/texture) to detect and eliminate “content-based NDVC” and increases content-level novelty in the top results. However, humans may perceive a video as a near duplicate from a semantic perspective as well. In this paper, we propose a concept-based near-duplicate video clip (CBNDVC) detection technique for novelty re-ranking. We identify “semantic NDVC”, making use of semantic features (events/concepts), and re-rank the top results to increase content novelty as well as semantic novelty. Videos are represented as multivariate time series of confidence values of relevant concepts, and CBNDVC clusters are then discovered by conceptual clustering. The results show higher precision and recall from the user’s perspective.

3.
How to address the challenges of the “curse of dimensionality” and “scalability” in clustering simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel effective and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of complex processes is achieved by simple operations based on interactions between objects. Relying on the weighted interaction model and iterative dynamic clustering, our approach ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters. Our approach is (b) robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters. Instead, all interesting subspaces are detected in a fully automatic way. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments have demonstrated the effectiveness and efficiency of our approach.

4.
Clustering data streams has drawn a lot of attention in the last few years due to their ever-growing presence. Data streams pose additional challenges for clustering, such as limited time and memory and one-pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolve over time, and we do not have any knowledge about the number of clusters. In a data stream environment, noise appears occasionally due to various factors. Density-based methods are a remarkable class of data stream clustering methods: they can discover clusters of arbitrary shape and detect noise, and they do not need the number of clusters in advance. Due to data stream characteristics, traditional density-based clustering is not applicable. Recently, many density-based clustering algorithms have been extended for data streams. The main idea of these algorithms is to use density-based methods in the clustering process while overcoming the constraints imposed by the nature of data streams. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams and discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring algorithm performance. It is hoped that this survey will serve as a stepping stone for researchers studying data stream clustering, particularly density-based algorithms.
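To make the density-based idea concrete, here is a minimal sketch of classic (static, non-streaming) DBSCAN in plain Python. It is not one of the stream algorithms the survey covers; it only illustrates the two properties the abstract highlights: the number of clusters is not given in advance, and isolated points are reported as noise. The parameter names `eps` and `min_pts` follow common usage, and the quadratic neighbour search is chosen for brevity rather than speed.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels
```

A streaming variant would have to replace the repeated full scans with summaries maintained in a single pass, which is the constraint the survey discusses.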

5.
The consensus clustering technique combines multiple clustering results without accessing the original data. Consensus clustering can be used to improve the robustness of clustering results or to obtain the clustering results from multiple data sources. In this paper, we propose a novel definition of the similarity between points and clusters. With an iterative process, such a definition of similarity can represent how a point should join or leave a cluster clearly, determine the number of clusters automatically, and combine partially overlapping clustering results. We also incorporate the concept of “clustering fragment” into our method for increased speed. The experimental results show that our algorithm achieves good performances on both artificial data and real data.
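The general idea of combining clusterings using only their labels can be illustrated with a co-association sketch. This is a well-known baseline, not the point-to-cluster similarity proposed in the paper (which the abstract does not define); the function names and the 0.5 threshold are assumptions for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of input clusterings that put points i and j together."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Merge points whose co-association exceeds the threshold."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)
    # Renumber the groups 0, 1, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

Note that the number of consensus clusters falls out of the merging step and is not supplied by the caller, and that cluster ids in the input labelings need not agree across clusterings.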

6.
Spectral clustering in multi-agent systems   Cited by: 2 (self-citations: 2, citations by others: 0)
We examine the application of spectral clustering for breaking up the behavior of a multi-agent system in space and time into smaller, independent elements. We propose clustering observations of individual entities in order to identify significant changes in the parameter space (like spatial position) and detect temporal alterations of behavior within the same framework. Available knowledge of important interactions (events) between entities is also considered. We describe a novel algorithm utilizing iterative subdivisions where clusters are pre-processed at each step to counter spatial scaling, rotation, replay speed, and varying sampling frequency. A method is presented to balance spatial and temporal segmentation based on the expected group size, and a validity measure is introduced to determine the optimal number of clusters. We demonstrate our results by analyzing the outcomes of computer games and compare our algorithm to K-means and traditional spectral clustering.

7.
8.
In the current paper we present a method for assessing cluster stability. This method, combined with a clustering algorithm, yields an estimate of the data partition, namely, the number of clusters. We adopt the cluster stability standpoint where clusters are imagined as islands of “high” density in a sea of “low” density. Explicitly, a cluster is associated with its high-density core. Our approach evaluates the goodness of a cluster by the similarity between the entire cluster and its core. We propose to measure this resemblance by two-sample tests or by probability distances between appropriate probability distributions. The distances are calculated on clustered samples drawn from the source population according to two different distributions. The first law is the underlying set distribution. The second law is constructed so that it represents the clusters’ cores. Here, a variant of k-nearest neighbor density estimation is applied, so that items belonging to cores have a much higher chance of being selected. As the sample distribution is unknown, a distribution-free two-sample test is required to examine this correspondence. To construct such a test, we use distance functions built on negative definite kernels. In practice, outliers in the samples and limitations of the clustering algorithm contribute heavily to the noise level. As a result, the distance values have to be determined for many pairs of samples, and an empirical distribution of distances is obtained. This distribution depends on the examined number of clusters. To prevent this dependence from biasing the results, we normalize the distances. It is conjectured that the true number of clusters yields the most concentrated normalized distribution. To measure the concentration we use the sample mean and the sample 25th percentile. The paper exhibits the good performance of the proposed method on synthetic and real-world data.

9.
Customer segmentation based on temporal variation of subscriber preferences is useful for communication service providers (CSPs) in applications such as targeted campaign design, churn prediction, and fraud detection. Traditional clustering algorithms are inadequate in this context, as a multidimensional feature vector represents a subscriber profile at an instant of time, and grouping of subscribers needs to consider variation of subscriber preferences across time. Clustering in this case usually requires complex multivariate time series analysis‐based models. Because conventional time series clustering models have limitations around scalability and ability to accurately represent temporal behaviour sequences (TBS) of users, that may be short, noisy, and non‐stationary, we propose a latent Dirichlet allocation (LDA) based model to represent temporal behaviour of mobile subscribers as compact and interpretable profiles. Our model makes use of the structural regularity within the observable data corresponding to a large number of user profiles and relaxes the strict temporal ordering of user preferences in TBS clustering. We use mean‐shift clustering to segment subscribers based on their discovered profiles. Further, we mine segment‐specific association rules from the discovered TBS clusters, to aid marketers in designing intelligent campaigns that match segment preferences. Our experiments on real world data collected from a popular Asian communication service provider gave encouraging results.

10.
This article describes a multiobjective spatial fuzzy clustering algorithm for image segmentation. To obtain satisfactory segmentation performance for noisy images, the proposed method introduces the non-local spatial information derived from the image into fitness functions which respectively consider the global fuzzy compactness and fuzzy separation among the clusters. After producing the set of non-dominated solutions, the final clustering solution is chosen by a cluster validity index utilizing the non-local spatial information. Moreover, to automatically evolve the number of clusters in the proposed method, a real-coded variable string length technique is used to encode the cluster centers in the chromosomes. The proposed method is applied to synthetic and real images contaminated by noise and compared with k-means, fuzzy c-means, two fuzzy c-means clustering algorithms with spatial information and a multiobjective variable string length genetic fuzzy clustering algorithm. The experimental results show that the proposed method behaves well in evolving the number of clusters and obtaining satisfactory performance on noisy image segmentation.

11.
On clustering massive text and categorical data streams   Cited by: 4 (self-citations: 4, citations by others: 0)
In this paper, we will study the data stream clustering problem in the context of text and categorical data domains. While the clustering problem has been studied recently for numeric data streams, the problems of text and categorical data present different challenges because of the large and un-ordered nature of the corresponding attributes. Therefore, we will propose algorithms for text and categorical data stream clustering. We will propose a condensation based approach for stream clustering which summarizes the stream into a number of fine grained cluster droplets. These summarized droplets can be used in conjunction with a variety of user queries to construct the clusters for different input parameters. Thus, this provides an online analytical processing approach to stream clustering. We also study the problem of detecting noisy and outlier records in real time. We will test the approach for a number of real and synthetic data sets, and show the effectiveness of the method over the baseline OSKM algorithm for stream clustering.

12.
The basic goal of scene understanding is to organize the video into sets of events and to find the associated temporal dependencies. Such systems aim to automatically interpret activities in the scene, as well as detect unusual events that could be of particular interest, such as traffic violations and unauthorized entry. The objective of this work, therefore, is to learn behaviors of multi-agent actions and interactions in a semi-supervised manner. Using tracked object trajectories, we organize similar motion trajectories into clusters using the spectral clustering technique. This set of clusters depicts the different paths/routes, i.e., the distinct events taking place at various locations in the scene. A temporal mining algorithm is used to mine interval-based frequent temporal patterns occurring in the scene. A temporal pattern indicates a set of events that are linked based on their relationship with other events in the set, and we use Allen's interval-based temporal logic to describe these relations. The resulting frequent patterns are used to generate temporal association rules, which convey the semantic information contained in the scene. Our overall aim is to generate rules that govern the dynamics of the scene and perform anomaly detection. We apply the proposed approach on two publicly available complex traffic datasets and demonstrate considerable improvements over the existing techniques.

13.
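A topic-based text clustering method   Cited by: 3 (self-citations: 0, citations by others: 3)
Existing text clustering methods have difficulty correctly identifying and describing the topics of texts, and therefore have difficulty clustering texts by topic. This paper proposes a new topic-based text clustering method, LFIC. The method can accurately identify the topics of texts and cluster the texts according to those topics. It defines and extracts “topic elements” and uses them to index base clusters. It also integrates linguistic features. Experiments show that LFIC achieves a clustering accuracy of 94.66%, which is better than several traditional clustering methods.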

14.
15.
Motion state estimation and trajectory prediction of a spinning ball are two important but challenging issues for both the promotion of the next generation of robotic table tennis systems and the research on motion analysis of spinning-flying objects. The motion state of a ping-pong ball consists of the flying state, which denotes the real-time translational velocity, and the spin state, which denotes the real-time rotational velocity. Due to the Magnus force acting on the ball, the flying state and the spin state are coupled, which makes their accurate estimation a huge challenge. In this paper, we first derive the Extended Continuous Motion Model (ECMM) by clustering the trajectories into multiple categories with a K-means algorithm and fitting each category using Fourier series. The ECMM can easily adapt to all kinds of trajectories. Based on the ECMM, we propose a novel motion state estimation method using the Expectation-Maximization (EM) algorithm, which in turn contributes to an accurate trajectory prediction. In this method, the category in ECMM is treated as a latent variable, and the likelihood of the motion state is formulated as a Gaussian Mixture Model (GMM) of the differences between the trajectory predictions and observations. The effectiveness and accuracy of the proposed method are verified by offline evaluation using a collected dataset, as well as by online evaluation in which the humanoid robotic table tennis system “Wu & Kong” successfully hits the high-speed spinning ball.

16.
We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated.
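One way to turn a session into a vector that distance-based clustering can consume is to estimate a first-order Markov transition matrix from it and flatten the matrix. The sketch below shows that idea only; the abstract does not specify the paper's representation schemes, so the function names and the choice of row-normalised transition frequencies are assumptions.

```python
import math


def transition_vector(session, pages):
    """Flattened first-order transition probabilities estimated from one session."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    counts = [[0] * n for _ in range(n)]
    # Count each consecutive (source page, destination page) pair.
    for src, dst in zip(session, session[1:]):
        counts[index[src]][index[dst]] += 1
    vector = []
    for row in counts:
        total = sum(row)
        # Rows for pages never left in this session stay all-zero.
        vector.extend(c / total if total else 0.0 for c in row)
    return vector


def euclidean(u, v):
    """Euclidean distance between two session vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because every session maps to a vector of the same length (the square of the number of pages), sessions of different lengths become directly comparable.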

17.
Clustering, while systematically applied in anomaly detection, has a direct impact on the accuracy of the detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical shape clustering. In this paper, we focus on arbitrary shape clustering methods to increase the accuracy of the anomaly detection. However, since the main drawback of arbitrary shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After the GMMs are constructed, incoming new samples are presented to the GMMs, and their membership values are calculated, based on which the new samples are labeled as “normal” or “anomaly.” Additionally, to address the issue of noise in the data, instead of labeling samples individually, they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the incoming new samples and the existing SGMMs is calculated, and the new cluster is then given the same label as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback–Leibler measure. We run several experiments to evaluate the performance of the proposed SGMM and CPAD methods and compare them against some well-known algorithms, including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with ABACUS using the Dunn and DB metrics, and the results indicate that SGMM performs better in terms of summarizing clusters. Moreover, the proposed CPAD method is compared with LOF and one-class SVM on the performance criteria of (a) false alarm rate, (b) detection rate, and (c) memory efficiency. The experimental results show that the CPAD method is noise resilient and memory efficient, and that its accuracy is higher than that of the other methods.
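The "label a new cluster like its closest existing cluster" step can be sketched under a strong simplifying assumption: each cluster is summarised by a single univariate Gaussian rather than a full GMM. The Kullback–Leibler divergence between two Gaussians has a closed form, whereas the divergence between two GMMs does not, which is presumably why the paper needs a modified measure; that modification is not described in the abstract and is not reproduced here. The function names and the symmetrised sum are illustrative assumptions.

```python
import math


def kl_gaussian(mu0, sigma0, mu1, sigma1):
    """Closed-form KL(N(mu0, sigma0^2) || N(mu1, sigma1^2))."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)


def symmetric_kl(a, b):
    """Symmetrised divergence between two (mu, sigma) summaries."""
    return kl_gaussian(*a, *b) + kl_gaussian(*b, *a)


def label_like_nearest(new_cluster, labelled_clusters):
    """Give a new cluster the label of the closest labelled cluster.

    new_cluster is a (mu, sigma) pair; labelled_clusters is a list of
    (label, mu, sigma) tuples.
    """
    label, _, _ = min(
        labelled_clusters,
        key=lambda c: symmetric_kl(new_cluster, (c[1], c[2])),
    )
    return label
```

Plain KL is asymmetric, so the sketch sums both directions to obtain a quantity that behaves more like a distance between clusters.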

18.
Fuzzy c-means clustering with spatial constraints is considered a suitable algorithm for data clustering and data analysis. However, FCM still lacks sufficient robustness to noisy data, because its objective function uses the Euclidean distance to measure the relationship between objects. It is effective only in clustering ‘spherical’ clusters, and it may not give reasonable clustering results for “non-compactly filled” spherical data such as “annular-shaped” data. This paper addresses these drawbacks of the general fuzzy c-means algorithm by introducing an extended Gaussian version of fuzzy c-means, which replaces the Euclidean distance in the original objective function of FCM. First, the paper proposes an initial kernel version of fuzzy c-means that aims to simplify the computation, and then extends it to an extended Gaussian kernel version of fuzzy c-means. From the extended Gaussian version it derives an effective method to construct the membership matrix for objects and a robust method for updating centers. Furthermore, the paper proposes a new prototype learning method that obtains initial cluster centers using a new mathematical initialization of centers for the new objective function of fuzzy c-means, in order to reduce the number of iterations and obtain more accurate results. An initial experiment on artificially generated data shows how effectively the proposed Gaussian version of fuzzy c-means obtains clusters, and the proposed methods are then applied to cluster the Wisconsin breast cancer database into two clusters for the classes benign and malignant. To show the performance of the proposed fuzzy c-means with the new initialization of cluster centers, this work compares its results with those of a recent fuzzy c-means algorithm; in addition, it uses the silhouette method to validate the clusters obtained from the breast cancer datasets.
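Why a Gaussian kernel helps with noisy data can be seen from the kernel-induced distance used in standard kernel fuzzy c-means. The sketch below shows that standard form only; the paper's "extended" Gaussian version and its center initialization are not specified in the abstract, and the bandwidth parameter `sigma` is an assumption.

```python
import math


def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    return math.exp(-math.dist(x, v) ** 2 / sigma ** 2)


def kernel_distance_sq(x, v, sigma):
    """Squared distance between x and v in the kernel feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v) = 2 (1 - K(x, v)),
    because K(x, x) = 1 for the Gaussian kernel.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))
```

Unlike the squared Euclidean distance, this quantity is bounded above by 2, so a far-away outlier cannot dominate the objective function or drag a cluster center arbitrarily far.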

19.
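A grid-based subspace clustering algorithm for high-dimensional data streams   Cited by: 4 (self-citations: 0, citations by others: 4)
Based on an analysis of grid clustering methods, a subspace clustering algorithm that can process high-dimensional data streams online is designed by combining the bottom-up grid method with the top-down grid method. By exploiting the data compression capability of the bottom-up grid method and the ability of the top-down grid method to handle high-dimensional data, the algorithm can quickly identify clusters located in different subspaces of the data in a single scan of the data stream. Theoretical analysis and experiments on multiple data sets show that the algorithm has high accuracy and computational efficiency.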

20.

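To address the large information loss and inaccuracy of traditional data stream clustering algorithms, a data stream clustering algorithm based on the maximum entropy of dimensions is proposed. A dynamic data histogram is used to divide the data dimensions into different dimension groups, the maximum entropy of each dimension is computed to partition the dimension space into clusters, and data belonging to the same dimension cluster are aggregated into micro-clusters. Anomaly detection on the data stream is then achieved by comparing the information entropy of the micro-clusters and their distribution characteristics. The method improves clustering speed and overcomes the information loss of traditional data stream clustering algorithms. Experimental results show that the proposed algorithm improves the accuracy and effectiveness of anomaly detection on data streams.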
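The quantity being compared in this abstract, the information entropy of a histogram, can be sketched in a few lines. This shows only the Shannon entropy of one dimension's equal-width histogram; how the paper builds its dynamic histograms and compares micro-cluster entropies is not specified in the abstract, so the function name, the fixed bin edges, and the assumption that all values lie in `[lo, hi]` are illustrative.

```python
import math
from collections import Counter


def histogram_entropy(values, lo, hi, bins):
    """Shannon entropy (in bits) of an equal-width histogram over [lo, hi].

    Assumes every value lies in [lo, hi]; a value equal to hi is placed
    in the last bin.
    """
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    # Empty bins contribute nothing, so only occupied bins are summed.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Values spread evenly over the bins give the maximum entropy, log2 of the number of bins, while values concentrated in a single bin give zero.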
