Similar Articles
20 similar articles found.
1.
Sparse kernel spectral clustering models for large-scale data analysis   Cited: 1 total, 0 self, 1 by others
Kernel spectral clustering has been formulated within a primal-dual optimization setting that allows natural extensions to out-of-sample data, together with model selection in a learning framework. This is important for prediction and for good generalization. The clustering model is formulated in the primal in terms of mappings to high-dimensional feature spaces, as is typical of support vector machines and other kernel-based methods. The dual problem corresponds to an eigenvalue decomposition of a centered Laplacian matrix derived from pairwise similarities within the data. The out-of-sample extension can also be used to introduce sparsity and to reduce the computational complexity of the resulting eigenvalue problem. In this paper, we propose several methods to obtain sparse and highly sparse kernel spectral clustering models. The proposed approaches are based on structural properties of the solutions when the clusters are well formed. Experimental results on difficult toy examples and images show the applicability of the proposed sparse models with predictive capabilities.

2.
This paper analyzes some regularization aspects of continuous-time model identification. The study focuses particularly on linear filter methods and shows that filtering the data before estimating their derivatives corresponds to a regularized estimation of the signal derivatives, obtained by minimizing a compound criterion whose expression is given explicitly. A new structure based on a zero-phase filter, which corresponds to a true regularization filter, is proposed; comparing its performance with that of Poisson filter-based methods makes it possible to discuss the effects of filter phase on parameter estimation. Based on this analysis, continuous-time model identification is formulated as a joint estimation of the system input-output signals and the model parameters. In this framework, two linear filter methods are interpreted, and a compound criterion is proposed in which regularization is ensured by a model-fitting measure, resulting in a new regularization filter structure for signal estimation.

3.
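Effectively reducing the number of frequent itemsets is currently a hot topic in data mining research, and clustering frequent itemsets is one way to address it. Because generators are a lossless condensed representation of all frequent itemsets, clustering the generators has the same effect as clustering all frequent itemsets. A generator-based frequent itemset clustering algorithm is proposed. First, the minimum description length principle is used to justify choosing generators for clustering; second, a pruning strategy and a mining algorithm for generators are given; finally, a clustering algorithm for generators is given, based on a new similarity measure for itemsets. Experimental results show that the method effectively reduces the number of itemsets and achieves high mining efficiency.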
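The abstract does not spell out the pruning strategy or the new similarity measure, so what follows is only a brute-force sketch of the underlying notion: a frequent itemset is a generator when every proper subset has strictly higher support. The function names and the toy transactions are illustrative assumptions, not the authors' mining algorithm.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_generators(transactions, min_sup):
    """Brute-force enumeration of frequent generators (illustrative only).

    An itemset is a generator if every proper subset has strictly higher
    support. Because support is anti-monotone, it is enough to compare
    against the subsets obtained by dropping a single item.
    """
    items = sorted(set().union(*transactions))
    generators = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup < min_sup:
                continue  # not frequent
            if all(support(candidate - {i}, transactions) > sup for i in candidate):
                generators[candidate] = sup
    return generators


transactions = [{"a", "b"}, {"a", "b", "c"}, {"c"}]
print(frequent_generators(transactions, min_sup=1))
```

In this toy data {a, b} is not a generator because it has the same support as {a}, which is exactly the kind of redundancy that makes generators a compact input for clustering.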

4.
Rooted in the exponential possibility model recently developed by Tanaka and his colleagues, a new clustering criterion is introduced and a possibility-theoretic clustering algorithm is proposed. The new algorithm is characterized by a novel formulation and is distinctive in determining an appropriate number of clusters for a given dataset while obtaining a quality clustering result. The algorithm can be easily implemented with an alternating minimization iterative procedure, and its parameters can be effectively initialized by the Parzen window technique and Yager's probability–possibility transformation. Our experimental results demonstrate its success on artificial datasets and in large image segmentation. To reduce the complexity of large image segmentation, we propose integrating the new clustering algorithm with a biased sampling procedure based on Epanechnikov kernel functions. As the preliminary experimental results demonstrate, possibility-theoretic clustering is effective in image segmentation, and its integration with a biased sampling procedure offers an attractive framework for large image processing.
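The abstract mentions Parzen-window initialization and Epanechnikov kernels but gives no formulas, so this is only a minimal 1-D sketch of a Parzen density estimate with an Epanechnikov kernel. How the densities feed the possibility parameters or drive the biased sampling is not specified in the abstract; the sample values and bandwidth `h` below are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def parzen_density(x, samples, h):
    """Parzen-window density estimate at x with bandwidth h (1-D)."""
    return sum(epanechnikov((x - s) / h) for s in samples) / (len(samples) * h)


samples = [0.1, 0.3, 0.35, 2.0, 2.2]
for x in [i * 0.5 for i in range(-2, 7)]:
    print(f"{x:4.1f}  {parzen_density(x, samples, h=0.5):.3f}")
```

The Epanechnikov kernel has compact support, so each evaluation only touches samples within one bandwidth of `x`, which is one reason it is attractive when many pixels must be processed.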

5.
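Estimating the number of components of a mixture model is key to cluster analysis and density estimation for medical images. To address the overfitting problem of estimation methods based on information criteria, a new estimation method based on the characteristic function of the Gaussian mixture model is proposed. First, the characteristic function of the Gaussian mixture model for medical images is defined; then a criterion for estimating the number of mixture components based on the characteristic function is constructed; finally, an algorithm implementing the criterion is designed. The new method regulates the logarithmic characteristic function by choosing an appropriate parameter, so that the penalty function plays a balancing role. Experiments on simulated and real data show that the number of components K determined by this method is more reasonable than that determined by other classical information-criterion methods, and that overfitting of medical images is avoided.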
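The characteristic function of a Gaussian component N(μ, σ²) is exp(iμt − σ²t²/2), so a mixture's characteristic function is the weighted sum of these terms. The sketch below computes it for 1-D data and compares it with the empirical characteristic function. The discrepancy measure is a hypothetical illustration; the paper's actual criterion and its penalty parameter are not given in the abstract.

```python
import cmath


def gmm_char_function(t, weights, means, variances):
    """Characteristic function of a 1-D Gaussian mixture:
    phi(t) = sum_k w_k * exp(i*t*mu_k - 0.5*sigma_k^2*t^2)."""
    return sum(w * cmath.exp(1j * t * m - 0.5 * v * t * t)
               for w, m, v in zip(weights, means, variances))


def empirical_char_function(t, xs):
    """Empirical characteristic function (1/n) * sum_j exp(i*t*x_j)."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)


def char_function_discrepancy(xs, weights, means, variances, ts):
    """Mean squared gap between empirical and model characteristic functions
    over a grid of t values (hypothetical fit measure, not the paper's criterion)."""
    return sum(abs(empirical_char_function(t, xs)
                   - gmm_char_function(t, weights, means, variances)) ** 2
               for t in ts) / len(ts)


xs = [-0.2, 0.1, 0.0, 4.9, 5.2, 5.1]
ts = [0.1 * k for k in range(1, 31)]
print(char_function_discrepancy(xs, [0.5, 0.5], [0.0, 5.0], [0.03, 0.03], ts))  # K = 2
print(char_function_discrepancy(xs, [1.0], [2.5], [6.3], ts))                   # K = 1
```

A model-order criterion would add a penalty that grows with K to such a fit term, so that extra components are accepted only when they reduce the discrepancy enough.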

6.
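Traditional partitional clustering algorithms cannot discover clusters of arbitrary shape. To address this, this paper introduces a distance measure that effectively reflects the similarity between samples, namely a path-based distance measure, and designs a new objective criterion function, further improving the effectiveness of the algorithm. Experiments show that the algorithm can determine the number of clusters automatically, can discover clusters of arbitrary shape, is insensitive to outliers, and produces high-quality clustering results.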
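The abstract does not define its path-based measure; a common choice, assumed here, is the minimax (bottleneck) distance: the cost of a path is its longest edge, and the distance between two points is the cheapest such path. This sketch computes it with a Floyd-Warshall pass that uses (min, max) instead of (min, +).

```python
def minimax_distances(d):
    """Path-based (minimax) distances from a symmetric pairwise distance matrix.

    The cost of a path is its largest edge; the path-based distance between i
    and j is the smallest such cost over all paths.
    """
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(m[i][k], m[k][j])  # bottleneck of a path through k
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


# Points on a line: a dense chain 0, 1, 2 and an isolated point at 10.
points = [0.0, 1.0, 2.0, 10.0]
d = [[abs(p - q) for q in points] for p in points]
m = minimax_distances(d)
print(m[0][2], m[0][3])  # -> 1.0 8.0
```

Points linked by a chain of short hops end up close even if their Euclidean distance is large, which is what lets a partitional objective built on this distance follow elongated or curved clusters.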

7.
Automatic topic detection based on incremental clustering   Cited: 1 total, 0 self, 1 by others
Zhang Xiaoming, Li Zhoujun, Chao Wenhan (张小明, 李舟军, 巢文涵). Journal of Software (《软件学报》), 2012, 23(6): 1578-1587
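With the rapid growth of online information, collecting and organizing relevant information has become increasingly difficult. Topic detection and tracking (TDT) is a research direction proposed to address this problem. Topic detection is one of the important tasks in TDT; its main goal is to cluster stories that discuss the same topic. Although topic detection has been studied for many years, the constantly changing nature of online information makes it more challenging. This paper proposes an automatic topic detection method based on incremental clustering, which aims to improve the efficiency of topic detection and can automatically detect the number of topics in a document collection. An improved term-weighting algorithm is used to compute feature weights, and document clustering accuracy is improved by adaptively extracting text features with strong topic-discriminating power. During clustering, the Bayesian information criterion (BIC) is used to determine the number of topic categories, and the continuity of topics is exploited to pre-cluster documents, which speeds up topic detection. Experimental results on the TDT-4 corpus show that the method substantially improves both the efficiency and the accuracy of topic detection.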
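As a rough picture of incremental clustering in TDT, the sketch below uses classic single-pass clustering: each incoming story joins the most similar existing topic if the cosine similarity reaches a threshold, otherwise it starts a new topic. This is an assumed, simplified stand-in; the paper's improved term weighting, BIC-based choice of the number of topics, and pre-clustering are not reproduced, and the fixed `threshold` replaces the BIC decision.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(docs, threshold):
    """Single-pass clustering; topic centroids are running means of raw
    term-frequency vectors. Returns one topic label per document."""
    centroids, sizes, labels = [], [], []
    for doc in docs:
        vec = Counter(doc.lower().split())
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            n, centroid = sizes[best], centroids[best]
            for w in set(centroid) | set(vec):
                centroid[w] = (centroid.get(w, 0.0) * n + vec.get(w, 0)) / (n + 1)
            sizes[best] += 1
            labels.append(best)
        else:  # no topic is similar enough: open a new one
            centroids.append({w: float(c) for w, c in vec.items()})
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


stories = [
    "earthquake hits city",
    "city earthquake damage",
    "election results announced",
    "election vote results",
]
print(single_pass_cluster(stories, threshold=0.3))  # -> [0, 0, 1, 1]
```

Each story is compared only with the current topic centroids, so the cost per story grows with the number of topics rather than the number of stories, which is the efficiency argument for incremental clustering.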

8.
Mutual-distance-based affinity propagation clustering for interval data   Cited: 1 total, 0 self, 1 by others
Xie Xinxi, Wang Shitong (谢信喜, 王士同). Journal of Computer Applications (《计算机应用》), 2008, 28(6): 1441-1443
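Symbolic clustering is an important extension of traditional clustering, and interval data are a common type of symbolic data. The symmetric measures used in traditional clustering are not necessarily suitable for interval data, and algorithm initialization has long been a serious source of interference in clustering. Therefore, a measure suitable for interval data, the mutual distance, is proposed, and on its basis a new clustering method, affinity propagation clustering, is adopted, which removes the initialization problem. The result is a mutual-distance-based affinity propagation clustering algorithm for interval data. Theoretical analysis and experimental comparison show that this algorithm outperforms the K-means algorithm based on Euclidean distance.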
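Affinity propagation (Frey and Dueck) needs no initial centers: it passes responsibility and availability messages over a similarity matrix until exemplars emerge, which is why it sidesteps the initialization problem. The abstract does not define the mutual distance, so this sketch substitutes the Hausdorff distance between intervals as an explicit assumption; the damping, iteration count, and median preference are common but arbitrary choices.

```python
import statistics


def interval_distance(a, b):
    """Hausdorff distance between closed intervals [a0, a1] and [b0, b1].
    Stand-in only: the paper's mutual distance is not given in the abstract."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def affinity_propagation(S, damping=0.7, iters=300):
    """Basic affinity propagation on a similarity matrix S.
    S[k][k] holds the preference of point k to become an exemplar.
    Raises ValueError if no exemplar emerges."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


intervals = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (10.0, 11.0), (10.5, 11.2), (9.8, 10.9)]
n = len(intervals)
S = [[-interval_distance(p, q) ** 2 for q in intervals] for p in intervals]
pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = pref  # median preference, a common default
print(affinity_propagation(S))
```

Any interval dissimilarity can be plugged into `S`; the message-passing part does not depend on how the similarities were computed.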

9.
Because of shrinking budgets, transportation agencies face severe challenges in preserving deteriorating pavements, and there is an urgent need for a methodology that minimizes maintenance and rehabilitation (M&R) cost. To minimize the total network M&R cost of clustering pavement segments, we propose an integer programming model, similar to an uncapacitated facility location problem (UFLP), that clusters pavement segments contiguously. Based on the properties of contiguously clustered pavement segments, we transform the clustering problem into an equivalent network flow problem in which each possible clustering corresponds to a path in the proposed acyclic network model. Our shortest-path algorithm gives an optimal clustering of segments in time polynomial in the number of segments. Computational experiments indicate that the proposed network model and algorithm can efficiently handle real-world spatial clustering problems.
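The network construction can be sketched as follows: nodes 0..n mark boundaries between consecutive segments, an arc (i, j) means segments i through j−1 form one cluster, and the arc weight is that cluster's cost. Because arcs only go forward, the network is acyclic and the shortest path is a simple dynamic program with O(n²) arcs. The cost function here (fixed setup cost plus the worst treatment level in the cluster applied to every segment) is purely hypothetical.

```python
def optimal_contiguous_clustering(n, cluster_cost):
    """Shortest path in an acyclic network whose nodes 0..n are boundaries between
    consecutive segments; arc (i, j) means segments i..j-1 form one cluster and
    costs cluster_cost(i, j). Runs in O(n^2) calls to cluster_cost."""
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest clustering of segments 0..j-1
    prev = [-1] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + cluster_cost(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    clusters, j = [], n
    while j > 0:  # walk back along the shortest path
        clusters.append((prev[j], j))
        j = prev[j]
    return best[n], clusters[::-1]


# Hypothetical cost: a fixed setup cost per cluster plus, for every segment in the
# cluster, the most severe treatment level needed anywhere in that cluster.
needs = [3, 3, 3, 9, 9]
SETUP = 10


def cost(i, j):
    return SETUP + (j - i) * max(needs[i:j])


print(optimal_contiguous_clustering(len(needs), cost))  # -> (47.0, [(0, 3), (3, 5)])
```

Grouping the three mildly deteriorated segments and the two severely deteriorated ones separately beats both a single network-wide treatment (cost 55) and treating every segment on its own (cost 77).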

10.
11.
This paper proposes a clustering asset allocation scheme that provides better risk-adjusted portfolio performance than traditional asset allocation approaches such as the equal-weight strategy and the Markowitz minimum-variance allocation. The clustering criterion used, maximization of the in-sample Sharpe ratio (SR), differs from the traditional clustering criteria reported in the literature. Two evolutionary methods, Differential Evolution and a Genetic Algorithm, are employed to search for such an optimal clustering structure for a given number of clusters. To explore the impact of clustering on the SR, the in-sample and out-of-sample SR distributions of the portfolios are studied using bootstrapped data as well as simulated paths from the single-index market model. The SR distributions of the portfolios under the clustering asset allocation structure were found to have higher means and skewness but approximately the same standard deviation and kurtosis as those in the non-clustered case. The Genetic Algorithm is suggested as a more efficient approach than Differential Evolution for solving the clustering problem.
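To make the clustering criterion concrete, the sketch below evaluates the in-sample Sharpe ratio of a portfolio induced by a cluster assignment. The weighting rule (equal weight per cluster, split equally within each cluster) is a hypothetical choice, since the abstract does not state how weights follow from the clusters; a Genetic Algorithm or Differential Evolution would search over `labels` to maximize this value.

```python
import statistics


def cluster_weights(labels):
    """Hypothetical scheme: equal weight per cluster, split equally inside each cluster."""
    n_clusters = len(set(labels))
    sizes = {c: labels.count(c) for c in set(labels)}
    return [1.0 / (n_clusters * sizes[c]) for c in labels]


def portfolio_returns(asset_returns, weights):
    """asset_returns[t][j] is the return of asset j in period t."""
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


asset_returns = [
    [0.010, 0.012, -0.004],
    [0.002, 0.001, 0.006],
    [-0.003, -0.002, 0.009],
    [0.007, 0.008, -0.001],
]
labels = [0, 0, 1]  # assets 0 and 1 grouped together
w = cluster_weights(labels)
print(w, sharpe_ratio(portfolio_returns(asset_returns, w)))
```

Grouping the two highly correlated assets keeps them from dominating the portfolio, which is the intuition behind clustering before allocating.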

12.
Financial time series forecasting is challenging because of long memory, fat tails, and volatility persistence. Multifractal processes have recently been proposed as a new formalism for this problem. An iterative Markov-switching multifractal (MSM) model was introduced in the literature; it captures many important stylized features of financial time series, including long memory in volatility, volatility clustering, and return outliers. In long-term forecasts the model performs better than GARCH-type models both in and out of sample. To improve MSM's short-term prediction accuracy, this paper proposes a support vector machine (SVM) based MSM approach that uses the MSM model to forecast volatility and an SVM to model the innovations. To verify the effectiveness of the proposed approach, two stock indexes in the Chinese A-share market are chosen as forecasting targets. Compared with some existing state-of-the-art models, the proposed approach gives superior results, indicating that it is a promising alternative for short-term financial volatility prediction.
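For intuition about the MSM component, here is a minimal simulation of the standard binomial MSM of Calvet and Fisher, which we assume is the model the abstract refers to. Volatility is the product of several multipliers that switch at geometrically spaced frequencies, which produces volatility clustering and long memory. All parameter values are hypothetical, and the paper's SVM model of the innovations is not shown.

```python
import math
import random


def simulate_msm(T, k_bar=4, m0=1.4, sigma=0.01, gamma1=0.05, b=3.0, seed=0):
    """Simulate returns from a binomial Markov-switching multifractal model.

    Component k (1-based) is renewed with probability
    gamma_k = 1 - (1 - gamma1) ** (b ** (k - 1)); a renewed component is drawn
    from {m0, 2 - m0} with equal probability, so E[M_k] = 1.
    Volatility is sigma * sqrt(prod_k M_k) and the return is vol * N(0, 1).
    """
    rng = random.Random(seed)
    gammas = [1 - (1 - gamma1) ** (b ** k) for k in range(k_bar)]  # 0-based k

    def draw():
        return m0 if rng.random() < 0.5 else 2 - m0

    components = [draw() for _ in range(k_bar)]
    vols, rets = [], []
    for _ in range(T):
        for k in range(k_bar):
            if rng.random() < gammas[k]:
                components[k] = draw()
        vol = sigma * math.sqrt(math.prod(components))
        vols.append(vol)
        rets.append(vol * rng.gauss(0.0, 1.0))
    return vols, rets


vols, rets = simulate_msm(250)
print(min(vols), max(vols))
```

Low-frequency components change rarely and create persistent volatility regimes, while high-frequency ones add short bursts; a forecaster built on this model filters the hidden components from observed returns.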

13.
In this paper, the global asymptotic stabilization problem via smooth output feedback is investigated for a class of planar switched nonlinear systems with an output constraint. To prevent violation of the output constraint, a common tangent-type barrier Lyapunov function (tan-BLF) is developed. The adding a power integrator approach (APIA) is revamped to systematically design state-feedback stabilizing control laws that incorporate the common tan-BLF. Then, based on the designed state-feedback controllers and a constructed common nonlinear observer, smooth output-feedback controllers that keep the system output within the predefined constraint during operation are proposed to solve the global asymptotic stabilization problem of planar switched nonlinear systems under arbitrary switching. A numerical example is used to verify the proposed method.
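The abstract does not give the tan-BLF formula; the sketch below assumes a commonly used tangent-type form, V(y) = (k_b²/π)·tan(πy²/(2k_b²)) for |y| < k_b. It behaves like the quadratic y²/2 near the origin and blows up as |y| approaches the bound k_b, which is how a bounded Lyapunov function keeps the output inside the constraint.

```python
import math


def tan_blf(y, k_b):
    """Tangent-type barrier Lyapunov function (assumed form):
    V(y) = (k_b^2 / pi) * tan(pi * y^2 / (2 * k_b^2)), defined for |y| < k_b."""
    if abs(y) >= k_b:
        raise ValueError("output violates the constraint |y| < k_b")
    return (k_b ** 2 / math.pi) * math.tan(math.pi * y ** 2 / (2 * k_b ** 2))


# Compare with the unconstrained quadratic y^2 / 2 as y approaches the bound k_b = 1.
for y in (0.01, 0.5, 0.9, 0.99):
    print(y, round(tan_blf(y, k_b=1.0), 4), round(y ** 2 / 2, 4))
```

Since tan(x) ≈ x for small x, V also tends to y²/2 as k_b grows, so this barrier function reduces to an ordinary quadratic Lyapunov function when the constraint is removed.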

14.
Operations-flow-based similarity is an important criterion for grouping variants. A similarity coefficient for product variants with networked operation sequences has not been considered in the literature. Previously proposed similarity coefficients based on operation/assembly sequences focused on variants with serial operation sequences, in which the processing order is fixed; in practice, however, many part/product variants have flexible operation sequence options. A novel similarity coefficient for part/product variants is proposed, based on networked operation-sequence similarity and inspired by analyses used in biology (e.g., comparison of enzyme structures). An extension of the proposed coefficient is also presented, with an illustrative example. A more comprehensive similarity coefficient is developed by including operations similarity and production volume criteria: the popular operations similarity coefficient, Jaccard's similarity, is applied and extended, and a new coefficient based on a volume similarity criterion is developed. Part/product variants are then clustered and grouped based on the combined similarity coefficient using the average linkage clustering (ALC) algorithm. The main applications of the proposed similarity coefficient are addressed. As a secondary application, the grouped variants are sequenced, and the sequence obtained from the proposed approach is compared with that obtained from a developed mathematical model. The results show the accuracy of the proposed sequencing approach, which can serve as a good preliminary sequence. A case study is also provided for demonstration.
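Only two named ingredients of the approach are sketched here: Jaccard similarity between the operation sets of two variants, and average linkage clustering (ALC) driven by that similarity. The networked-sequence and volume coefficients are not defined in the abstract and are omitted; the operation names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two operation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(items, sim, n_clusters):
    """Agglomerative clustering with average linkage on a similarity function:
    repeatedly merge the pair of clusters with the highest mean pairwise similarity."""
    clusters = [[i] for i in range(len(items))]

    def link(c1, c2):
        return sum(sim(items[i], items[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        p, q = max(((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


variants = [
    {"cut", "drill", "paint"},
    {"cut", "drill", "polish"},
    {"weld", "paint", "assemble"},
    {"weld", "assemble", "inspect"},
]
print(average_linkage(variants, jaccard, n_clusters=2))  # -> [[0, 1], [2, 3]]
```

A combined coefficient would simply replace `jaccard` with a weighted mix of sequence, operation, and volume similarities; the ALC loop stays the same.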

15.
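To address the out-of-sample extension problem of Laplacian eigenmaps, an extension method based on neighborhood information is proposed. Assuming that a new sample point maintains a linear relationship with its neighbors, sparse coding is used to solve for the linear coefficients, and these coefficients are then used to reconstruct the new point's representation in the low-dimensional space. The low-dimensional representations of the new points are classified with a 1-NN classifier. Experimental results show that, compared with sparse-coding reconstruction based on global information, the neighborhood-based sparse-coding reconstruction achieves higher classification accuracy in less time, demonstrating the effectiveness of the method. In addition, the method can be extended to the out-of-sample problem of other nonlinear dimensionality reduction methods.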
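The reconstruction idea can be sketched as follows: express the new point as a weighted combination of its neighbors, then apply the same weights to the neighbors' low-dimensional coordinates. The paper solves for the coefficients with sparse coding; as a simpler, explicitly swapped-in technique, this sketch uses LLE-style least-squares weights constrained to sum to one, with a small hypothetical regularizer `reg`.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def reconstruction_weights(x, neighbors, reg=1e-3):
    """LLE-style weights: minimize ||x - sum_j w_j n_j||^2 subject to sum_j w_j = 1."""
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    k = len(neighbors)
    G = [[sum(a * b for a, b in zip(diffs[i], diffs[j])) for j in range(k)]
         for i in range(k)]
    trace = sum(G[i][i] for i in range(k))
    for i in range(k):
        G[i][i] += reg * trace if trace > 0 else reg  # regularize a singular Gram matrix
    w = solve_linear(G, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


def embed_new_point(x, neighbors, neighbor_embeddings):
    """Map a new sample into the low-dimensional space by reusing its reconstruction weights."""
    w = reconstruction_weights(x, neighbors)
    dim = len(neighbor_embeddings[0])
    return [sum(wi * e[d] for wi, e in zip(w, neighbor_embeddings)) for d in range(dim)]


neighbors = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]
embeddings = [[0.0], [1.0], [0.5]]
print(embed_new_point([1.0, 0.5], neighbors, embeddings))
```

Only the new point's neighbors enter the computation, so no eigenproblem has to be re-solved when samples arrive.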

16.
Tao Zhiyong, Liu Xiaofang, Wang Hezhang (陶志勇, 刘晓芳, 王和章). Journal of Computer Applications (《计算机应用》), 2018, 38(12): 3433-3437
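The Gaussian mixture model (GMM) clustering algorithm is sensitive to initial values and easily falls into local minima. Exploiting the strong global search capability of the density peaks (DP) algorithm, the initial cluster centers of the GMM algorithm are optimized, yielding a GMM clustering algorithm fused with DP (DP-GMMC). First, the DP algorithm finds the cluster centers, which give the initial parameters of the mixture model; second, the expectation-maximization (EM) algorithm iteratively estimates the mixture parameters; finally, data points are clustered according to the Bayesian posterior probability criterion. On the Iris dataset, DP-GMMC reaches a clustering accuracy of 96.67%, 33.6 percentage points higher than the traditional GMM algorithm, removing the dependence on the initial cluster centers. Experimental results show that DP-GMMC clusters low-dimensional datasets well.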
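The first step, choosing centers with the density peaks rule of Rodriguez and Laio, can be sketched as below: ρ counts neighbors within a cutoff d_c, δ is the distance to the nearest denser point, and the points with the largest ρ·δ become centers. The cutoff, the tie-breaking by index, and the toy points are assumptions; the EM refinement and posterior assignment that follow in DP-GMMC are not shown.

```python
import math


def density_peak_centers(points, d_c, n_centers):
    """Pick cluster centers with the density peaks rule.

    rho_i   = number of other points closer than the cutoff d_c
    delta_i = distance to the nearest point of higher density
              (ties broken by index; the densest point gets its largest distance)
    Centers are the points with the largest rho_i * delta_i.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(higher) if higher else max(d[i]))
    score = [rho[i] * delta[i] for i in range(n)]
    return sorted(range(n), key=lambda i: score[i], reverse=True)[:n_centers]


pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
       (5, 5), (5.1, 5), (5, 5.1)]
centers = density_peak_centers(pts, d_c=0.5, n_centers=2)
print(centers, [pts[i] for i in centers])  # these coordinates would seed the GMM means
```

Because the centers come from a deterministic density scan rather than random draws, EM starts from the same, well-separated means on every run.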

17.
This paper proposes a new and reliable segmentation approach based on a fusion framework that combines multiple region-based segmentation maps (with any number of regions) into a final improved (i.e., accurate and consistent) segmentation result. The core of this combination model is a consensus (cost) function derived from the information-theoretic variation of information criterion recently proposed by Meila, which quantifies the amount of information lost or gained in changing from one clustering to another. The resulting consensus energy-based segmentation fusion model can be optimized efficiently with an iterative steepest local energy descent strategy combined with a connectivity constraint. This segmentation combination framework, which fuses inaccurate spatial clustering results that are computed quickly and roughly, emerges as an appealing alternative to the complex segmentation models in use today. Experiments on the Berkeley Segmentation Dataset show that the proposed fusion framework compares favorably with previous techniques in terms of reliability scores.
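The building block of the consensus energy is the variation of information between two label maps, VI(A, B) = H(A) + H(B) − 2I(A; B). This sketch computes it in nats for two flattened segmentations; the energy minimization and connectivity constraint of the fusion model are not reproduced, and the toy label maps are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two labelings of the same pixels."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B); zero iff the partitions coincide."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)


seg1 = [0, 0, 1, 1, 2, 2]
seg2 = [5, 5, 7, 7, 7, 7]  # merges regions 1 and 2 of seg1 and relabels everything
print(variation_of_information(seg1, seg1), variation_of_information(seg1, seg2))
```

VI ignores label names and measures only how the partitions differ, so a fused segmentation can be scored against each input map by summing its VI to all of them.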

18.
This paper introduces and studies a real in-port ship routing and scheduling problem faced by chemical shipping companies. We show that this problem can be modeled as a Traveling Salesman Problem with Pickups and Deliveries, Time Windows and Draft Limits (TSPPD-TWDL). We propose a mathematical formulation for the TSPPD-TWDL and a solution method based on forward dynamic programming (DP). A set of label extension rules is also proposed to accelerate the algorithm and improve its performance. Computational studies show that the label extension rules are essential to the DP algorithm and that the proposed method solves real-sized in-port routing and scheduling problems in chemical shipping efficiently.
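To illustrate forward label extension, the sketch below grows partial routes (labels) one berth at a time and discards any extension that breaks a time window, a pickup-before-delivery precedence, or a draft limit, modeled here as a maximum load on arrival at each berth. This is an assumed, simplified model with no dominance rules, so it enumerates all feasible routes and suits only tiny instances; the paper's label extension rules are what make the real DP efficient.

```python
def forward_labeling(start, nodes, travel, precedence, initial_load):
    """Enumerate routes by forward label extension; return (finish_time, route) or None.

    nodes: name -> (tw_open, tw_close, service_time, load_change, max_load_on_arrival)
    The last field models a draft limit as the largest load the ship may carry
    when it arrives at that berth. precedence: list of (pickup, delivery) pairs.
    """
    labels = [(0.0, start, frozenset(), initial_load, (start,))]
    best = None
    while labels:
        time, here, visited, load, route = labels.pop()
        if len(visited) == len(nodes):
            if best is None or time < best[0]:
                best = (time, route)
            continue
        for nxt, (open_, close, service, dq, max_load) in nodes.items():
            if nxt in visited:
                continue
            if any(d == nxt and p not in visited for p, d in precedence):
                continue  # delivery before its pickup
            arrival = max(time + travel[here][nxt], open_)
            if arrival > close or load > max_load:
                continue  # time window or draft limit violated
            labels.append((arrival + service, nxt, visited | {nxt},
                           load + dq, route + (nxt,)))
    return best


nodes = {
    "A": (0, 100, 1, -4, 10),  # deep berth, discharges 4 units
    "B": (0, 100, 1, -2, 5),   # shallow berth: at most 5 units on board on arrival
}
travel = {"E": {"A": 5, "B": 1}, "A": {"B": 5}, "B": {"A": 5}}
print(forward_labeling("E", nodes, travel, precedence=[], initial_load=8))
# -> (12.0, ('E', 'A', 'B'))
```

Although berth B is closer to the entrance E, the ship is too deep to enter it fully loaded, so it must first discharge at A.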

19.
Multi-fault diagnosis method based on an improved ENN2 clustering algorithm   Cited: 1 total, 0 self, 1 by others
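Because the extension neural network cannot handle multi-fault diagnosis, a problem model is established that converts multi-fault diagnosis into a clustering problem over multi-feature samples. ENN2 is improved in both its model structure and its learning algorithm, a multi-fault diagnosis method based on the improved ENN2 clustering algorithm is proposed, and its parameters and time complexity are analyzed. The method is validated on an engineering case. The results show that it can solve offline multi-fault diagnosis problems and that the resulting diagnostic model can be used for online condition monitoring, giving it good application prospects.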

20.
Penalized probabilistic clustering   Cited: 1 total, 0 self, 1 by others
Lu Z, Leen TK. Neural Computation, 2007, 19(6): 1528-1567
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence the cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
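A small sketch of how such a prior can steer assignments: the unnormalized log prior adds log mixing weights plus a term W[i][j] whenever points i and j share a cluster (positive W rewards a must-link, negative W penalizes a cannot-link). This exact form is our assumption, not quoted from the paper. Lu and Leen fit the model with EM; the brute-force MAP search below is only a toy to show the prior shifting an ambiguous point, and is feasible only for a handful of points.

```python
import math
from itertools import product


def ppc_log_prior(z, mixing, W):
    """Assumed unnormalized log prior of assignment z:
    sum_i log(mixing[z_i]) + sum_{i<j} W[i][j] * [z_i == z_j]."""
    n = len(z)
    value = sum(math.log(mixing[c]) for c in z)
    for i in range(n):
        for j in range(i + 1, n):
            if z[i] == z[j]:
                value += W[i][j]
    return value


def gaussian_logpdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def map_assignment(xs, mixing, means, variances, W):
    """Brute-force MAP assignment combining the GMM likelihood with the pairwise prior."""
    best, best_score = None, -math.inf
    for z in product(range(len(mixing)), repeat=len(xs)):
        score = ppc_log_prior(z, mixing, W) + sum(
            gaussian_logpdf(x, means[c], variances[c]) for x, c in zip(xs, z))
        if score > best_score:
            best, best_score = z, score
    return best


xs = [0.0, 2.4, 5.0]                  # the middle point is slightly closer to cluster 0
no_pref = [[0.0] * 3 for _ in range(3)]
must_link = [[0.0] * 3 for _ in range(3)]
must_link[1][2] = 2.0                 # prefer points 1 and 2 in the same cluster
args = ([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
print(map_assignment(xs, *args, no_pref), map_assignment(xs, *args, must_link))
# -> (0, 0, 1) (0, 1, 1)
```

With no preferences the ambiguous point follows the nearer mean; a moderate must-link weight is enough to pull it into its partner's cluster, while a weak one would not override strong evidence from the data.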


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration 京ICP备09084417号