Similar Literature
20 similar documents found.
1.
Late fusion multi-view clustering (LFMVC) algorithms aim to integrate the base partitions of the individual views into a consensus partition. Base partitions can be obtained by performing kernel k-means clustering on each view. This type of method is not only computationally efficient but also more accurate than multiple kernel k-means, and is thus widely used in the multi-view clustering context. LFMVC improves computational efficiency to the extent that the per-iteration computational complexity is reduced from O(n³) to O(n), where n is the number of samples. However, LFMVC also limits the search space of the optimal solution, so the clustering results obtained are not ideal. Accordingly, to extract more information from each base partition and thus improve clustering performance, we propose a new late fusion multi-view clustering algorithm with a computational complexity of O(n²). Experiments on several commonly used datasets demonstrate that the proposed algorithm converges quickly. Moreover, compared with other late fusion algorithms with O(n) computational complexity, the actual time consumption of the proposed algorithm does not increase significantly. At the same time, comparisons with several other state-of-the-art algorithms show that the proposed algorithm also obtains the best clustering performance.
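The late-fusion idea can be illustrated with a generic co-association sketch. This is not the paper's algorithm: plain k-means stands in for kernel k-means, the data and names are illustrative, and the n × n co-association step is what puts the fusion in the O(n²) complexity class discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n, k = 120, 3
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
labels_true = np.repeat(np.arange(k), n // k)
# Three synthetic "views" of the same 3-cluster structure.
views = [centers[labels_true] + rng.normal(scale=1.0, size=(n, 2)) for _ in range(3)]

# Step 1: one base partition per view (kernel k-means in the paper;
# plain k-means here as a simplification).
base = [KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(v) for v in views]

# Step 2: late fusion via an n x n co-association matrix -- the O(n^2) step.
co = np.mean([(p[:, None] == p[None, :]).astype(float) for p in base], axis=0)

# Step 3: consensus partition by clustering the rows of the co-association matrix.
consensus = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(co)
ari = adjusted_rand_score(labels_true, consensus)
```

On well-separated synthetic views like these, the consensus partition recovers the true grouping regardless of how each view permutes its cluster labels.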

2.
Multiple kernel clustering based on local kernel alignment has achieved outstanding clustering performance by applying local kernel alignment to each sample. However, we observe that most existing works assume that every local kernel alignment contributes equally to clustering performance, whereas local kernel alignments on different samples actually contribute differently; this assumption can therefore harm clustering performance. To address this issue, we design a multiple kernel clustering algorithm based on self-weighted local kernel alignment, which learns an appropriate weight for each local kernel alignment. Specifically, we introduce a new optimization variable, the weight, to denote the contribution of each local kernel alignment to clustering performance; the weights, kernel combination coefficients, and cluster memberships are then alternately optimized within the kernel alignment framework. In addition, we develop a three-step alternating iterative optimization algorithm to solve the resulting optimization problem. Extensive experiments on five benchmark datasets evaluate the clustering performance of the proposed algorithm. The results clearly demonstrate that it outperforms typical multiple kernel clustering algorithms, illustrating its effectiveness.
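The alignment criterion these methods build on can be sketched as follows; the kernels, weights, and data here are illustrative stand-ins, not the paper's self-weighted formulation.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Frobenius-inner-product (cosine-style) alignment between two kernel
    matrices; 1.0 means identical up to scale."""
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

# A weighted combination of base kernels, as in multiple kernel clustering.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
K_lin = A @ A.T                     # linear kernel
K_poly = (A @ A.T + 1.0) ** 2       # degree-2 polynomial kernel
weights = np.array([0.6, 0.4])      # illustrative combination coefficients
K_comb = weights[0] * K_lin + weights[1] * K_poly
```

Alternating optimization would update such weights (here fixed) to maximize alignment with an ideal cluster-membership kernel.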

3.
Raw data are grouped by clustering techniques, in a principled manner, into disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high-volume datasets. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm combined with a threshold-based clustering technique. The algorithm addresses the shortcomings of k-means clustering while overcoming the limitations of threshold-based clustering. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository), alongside the k-means and threshold-based clustering algorithms. The proposed method yields better-separated datasets with more compact clusters, and thus achieves higher validity indices. It also eliminates the limitations of threshold-based clustering and validates the measures and respective indices against the k-means and threshold-based clustering algorithms.
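A minimal sketch of the validity-index workflow on Iris, using scikit-learn's silhouette and Davies-Bouldin indices as stand-ins for the measures in the paper; the neutrosophic method itself is not reproduced here.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)      # higher is better, in (-1, 1)
dbi = davies_bouldin_score(X, labels)  # lower is better, >= 0
```

More compact, better-separated clusters push the silhouette up and the Davies-Bouldin index down, which is how such indices rank competing clusterings.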

4.
田源  王洪涛 《计量学报》2016,37(6):582-586
To improve the quality of image edge feature extraction, a quantum kernel clustering algorithm is adopted. First, pixels are mapped to quantum codes, and pixel blocks are randomly sampled within the code-element domain. Then the clustering distance between each data point and every cluster core is computed, each data vector is assigned to the core vector at minimum distance, and the kernel function determines the effective range of influence. Finally, the dissimilarity of the pixel clusters is analyzed and the algorithm workflow is given. Simulation experiments show that this algorithm extracts image edge features with clear, well-connected contours, achieves good MS evaluation scores and clustering accuracy, and converges quickly.

5.
Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority. Accuracy may therefore be high even though the model cannot recognize instances of the minority class, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to generate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and k-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances both between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and accuracy.
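The core idea can be sketched as follows: cluster the minority class with k-means, fit a multivariate Gaussian to each cluster, and draw synthetic points from those Gaussians. This is a generic reconstruction under stated assumptions, not the authors' GK-Means implementation; all names and data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def gaussian_oversample(X_min, n_new, n_clusters=2, seed=0):
    """Oversample a minority class: k-means the minority samples, fit a
    multivariate Gaussian per cluster, and sample new points from each."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_min)
    # Randomly split the synthetic quota across clusters.
    quota = np.bincount(rng.integers(n_clusters, size=n_new), minlength=n_clusters)
    synthetic = []
    for c in range(n_clusters):
        pts = X_min[km.labels_ == c]
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(X_min.shape[1])  # jitter for stability
        synthetic.append(rng.multivariate_normal(pts.mean(axis=0), cov, size=quota[c]))
    return np.vstack(synthetic)

# Illustrative minority sample containing two sub-concepts.
rng = np.random.default_rng(1)
X_min = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(6.0, 0.5, (30, 2))])
X_new = gaussian_oversample(X_min, 25, n_clusters=2)
```

Sampling per cluster rather than from one global Gaussian is what keeps the synthetic points inside each minority sub-concept instead of bridging the gap between them.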

6.
Medical image segmentation is a preliminary stage in many identification tools. Correct segmentation of brain Magnetic Resonance Imaging (MRI) images is crucial for accurate disease diagnosis. Due to intensity inhomogeneity, low contrast, and noise, segmentation of brain MRI images is regarded as a highly challenging task. In this article, we propose a hybrid segmentation method that combines clustering methods with the Hidden Markov Random Field (HMRF) technique. This aims to decrease the computational load and improve the runtime of the segmentation method, as the MRF methodology is used for post-processing the images. Evaluation was performed on real imaging data, with brain-tissue classification assessed using the Dice similarity metric. The results indicate improved performance of the proposed method at various noise levels compared with existing algorithms. In the implementation, the choice of clustering method yields better results in segmenting MRI brain images.
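The Dice similarity metric used for this evaluation has a standard definition independent of the proposed hybrid method, and can be computed directly from two binary masks:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|); 1.0 for identical masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0  # two empty masks count as identical
```

In tissue classification, one Dice score is typically reported per tissue class by comparing each predicted mask against the corresponding ground-truth mask.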

7.
In order to improve the performance and robustness of clustering, it has been proposed to generate and aggregate a number of primary clusters via clustering ensemble techniques. Fuzzy clustering ensemble approaches attempt to improve the performance of fuzzy clustering tasks. In these approaches, however, cluster (or clustering) reliability has not received much attention; ignoring it makes them weak at dealing with low-quality base clusterings. In this paper, we use cluster unreliability estimation and a local weighting strategy to propose a new fuzzy clustering ensemble method that introduces three new fuzzy clustering consensus functions: Reliability Based weighted co-association matrix Fuzzy C-Means (RBFCM), Reliability Based Graph Partitioning (RBGP), and Reliability Based Hyper Clustering (RBHC). Our fuzzy clustering ensemble approach works by estimating fuzzy cluster unreliability according to an entropic criterion over the cluster labels in the entire ensemble. To this end, a new metric is defined to estimate fuzzy cluster unreliability; the reliability of each cluster is then determined using a Reliability Driven Cluster Indicator (RDCI). The time complexities of RBHC and RBGP are linear in the number of data objects. The performance and robustness of the proposed method are evaluated experimentally on several benchmark datasets, and the results demonstrate its efficiency and suitability.
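The entropic unreliability idea can be sketched as follows; the function name and interface are illustrative, not the paper's RDCI. A cluster whose members are scattered across many labels in another base clustering has high entropy and is treated as unreliable.

```python
import numpy as np

def cluster_label_entropy(member_idx, base_labels):
    """Shannon entropy (bits) of one cluster's members' labels within a single
    base clustering; averaging over all base clusterings in the ensemble gives
    an entropic unreliability score for the cluster."""
    labels = np.asarray(base_labels)[np.asarray(member_idx)]
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A reliability weight can then be derived by inverting or exponentiating the averaged entropy, so that consistent clusters dominate the co-association matrix.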

8.
Searching on the Web has never been an easy task. Even if semantic information is successfully inferred from a user query, how can we benefit from it? The most popular remedy today is to categorize the Web in advance. By gathering similar Web resources into groups, search performance should increase even though search engines still understand little of the semantics. To categorize a set of Web resources according to the meta-information associated with them, one first has to analyze the relationships between that meta-information and the Web resources. However, the result is severely affected by the ambiguous nature of the Web. The goal of this research is therefore to propose a new labeling method that enhances both the efficiency and accuracy of Web resource categorization.

9.
In many real-world optimization problems, the underlying objective and constraint function(s) are evaluated using computationally expensive iterative simulations such as the solvers for computational electro-magnetics, computational fluid dynamics, the finite element method, etc. The default practice is to run such simulations until convergence using termination criteria, such as maximum number of iterations, residual error thresholds or limits on computational time, to estimate the performance of a given design. This information is used to build computationally cheap approximations/surrogates which are subsequently used during the course of optimization in lieu of the actual simulations. However, it is possible to exploit information on pre-converged solutions if one has the control to abort simulations at various stages of convergence. This would mean access to various performance estimates in lower fidelities. Surrogate assisted optimization methods have rarely been used to deal with such classes of problem, where estimates at various levels of fidelity are available. In this article, a multiple surrogate assisted optimization approach is presented, where solutions are evaluated at various levels of fidelity during the course of the search. For any solution under consideration, the choice to evaluate it at an appropriate fidelity level is derived from neighbourhood information, i.e. rank correlations between performance at different fidelity levels and the highest fidelity level of the neighbouring solutions. Moreover, multiple types of surrogates are used to gain a competitive edge. The performance of the approach is illustrated using a simple 1D unconstrained analytical test function. Thereafter, the performance is further assessed using three 10D and three 20D test problems, and finally a practical design problem involving drag minimization of an unmanned underwater vehicle. 
The numerical experiments clearly demonstrate the benefits of the proposed approach for such classes of problem.
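The neighbourhood rank-correlation check described above can be sketched with Spearman's ρ; the data and the decision threshold are illustrative assumptions, not the article's settings.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical low- and high-fidelity performance estimates for a set of
# neighbouring designs (e.g. pre-converged vs. fully converged simulations).
low = np.array([3.1, 2.0, 5.4, 1.2, 4.3])
high = np.array([3.0, 2.2, 5.1, 1.0, 4.5])

rho, _ = spearmanr(low, high)
# If the cheap estimate ranks the neighbours the same way the expensive one
# does, the next candidate can be evaluated at low fidelity only.
use_low_fidelity = rho >= 0.9
```

Here the two fidelity levels rank all five designs identically, so the cheap pre-converged estimate would be trusted for the next evaluation.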

10.
Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, dealing with class imbalance is still considered a challenging research problem. Most machine learning techniques are designed to operate on balanced datasets; accordingly, various state-of-the-art under-sampling, over-sampling, and hybrid strategies have been proposed to deal with imbalanced datasets, but highly skewed datasets still pose problems of generalization and of noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for the classification of imbalanced datasets, known as MCBC-SMOTE (Majority Clustering for Balanced Classification-SMOTE). The model provides a way to convert the binary classification problem into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method, and the minority class is over-sampled as an average of the clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces noise generation, and successfully removes the imbalances present between and within classes. Evaluations on diverse real datasets show better classification results than existing state-of-the-art methodologies on several performance metrics.
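The elbow-method step for choosing the number of majority-class clusters can be sketched as below. The second-difference heuristic for locating the bend is one common automatic choice, not necessarily the authors'; the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Within-cluster sum of squares (inertia) for a range of k values.
ks = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

# A crude elbow pick: the k where the inertia curve bends most sharply,
# i.e. the largest second difference.
second_diff = np.diff(inertias, n=2)
elbow_k = int(np.argmax(second_diff)) + 2  # +2 offsets the double differencing
```

In practice the inertia curve is often plotted and inspected; automatic bend detection is a convenience that can misfire on noisy curves.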

11.
Given the value of lexical knowledge for improving text clustering performance, a text similarity measure is proposed that combines, by linear interpolation, quantified relations between dictionary words and cosine similarity. Before clustering, on the assumption that a dictionary entry and its gloss are semantically equivalent, quantified relations between the entry word and the words of its gloss are constructed, and these quantified relation values serve as the knowledge used by text clustering. Within the framework of the k-means clustering algorithm, this new linearly interpolated similarity yields a clear improvement in the performance of the text clustering system. The experimental results suggest that word relations quantified from a dictionary may make a potential contribution to future text clustering research.
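The linearly interpolated similarity can be sketched as below; the interpolation weight `lam` and the `lexical_sim` input are illustrative placeholders, since the abstract does not specify how the dictionary-derived score is scaled.

```python
import numpy as np

def interpolated_similarity(v1, v2, lexical_sim, lam=0.7):
    """Linear interpolation of cosine similarity between two document vectors
    with a dictionary-derived lexical similarity score in [0, 1]."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return lam * cos + (1.0 - lam) * lexical_sim
```

Such a measure drops into k-means text clustering wherever plain cosine similarity would otherwise be used to compare documents with centroids.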

12.
Spatial dependence in environmental data is an influential criterion in clustering processes, as the clustering outputs depend heavily on such spatial structure. Because classical methods do not take spatial dependence into consideration, including this structure produces unexpected but more realistic results, with clusters of curves that need not be similar in shape or behavior. In this paper, clustering is performed with the KMSCFD algorithm for spatially correlated functional data. The methodology weights the distance matrix between the curves with the trace-variogram computed from the coefficients of the basis functions obtained by smoothing the data. To validate the method, a number of simulated scenarios were tested, together with an application to Normalized Difference Vegetation Index data from a high-elevation ecosystem in the Ecuadorian Andes. Quality indices are implemented to obtain the appropriate number of clusters. The analysis revealed five distinct, latitudinally distributed regions.

13.
Module partitioning is the foundation of modular design for complex products, and the quality of the partition directly affects the efficiency of design, manufacture, and assembly of customer-customized products. Existing research lacks a reasonable evaluation of module quality once a partitioning scheme has been formed. By introducing Dempster-Shafer (D-S) evidence theory, this paper performs a comprehensive evaluation, under uncertainty and incompleteness, of the satisfaction with a module creation scheme against six indicators: assembly complexity, manufacturability, modularity, stability, styling outcome, and volume compactness, and proposes a two-layer evaluation method for module creation schemes. The method matches human judgment processes and offers a degree of flexibility, effectiveness, and rationality.
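Dempster's rule of combination, the core operation of D-S evidence theory, can be sketched as follows. This is a generic implementation of the combination rule, not the paper's two-layer evaluation; the mass functions below are illustrative.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets of hypotheses; conflicting mass is renormalized."""
    combined = {}
    conflict = 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass assigned to disjoint hypotheses
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}
```

In an evaluation setting, each indicator (assembly complexity, manufacturability, ...) would contribute one mass function over satisfaction grades, and repeated combination fuses them into an overall judgment.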

14.
Multi-objective scheduling problems: Determination of pruned Pareto sets
There are often multiple competing objectives in industrial scheduling and production planning problems. Two practical methods are presented to efficiently identify promising solutions from among a Pareto optimal set for multi-objective scheduling problems. Generally, multi-objective optimization problems can be solved by combining the objectives into a single objective using equivalent cost conversions, utility theory, etc., or by determining a Pareto optimal set. Pareto optimal sets or representative subsets can be found using a multi-objective genetic algorithm or by other means. In practice, the decision maker ultimately has to select one solution from this set for system implementation. However, the Pareto optimal set is often large and cumbersome, making the post-Pareto analysis phase potentially difficult, especially as the number of objectives increases. Our research involves the post-Pareto analysis phase, and two methods are presented to filter the Pareto optimal set to determine a subset of promising or desirable solutions. The first method prunes using non-numerical objective-function ranking preferences. The second prunes using data clustering: the k-means algorithm is used to find clusters of similar solutions in the Pareto optimal set, so the clustered data leave the decision maker with just k general solutions from which to choose. These methods are general, and they are demonstrated using two multi-objective problems involving the scheduling of the bottleneck operation of a printed wiring board manufacturing line and a more general scheduling problem.
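The k-means pruning approach can be sketched on a synthetic front. Picking the member nearest each centroid as the cluster's representative is one reasonable convention, not necessarily the authors'; the front itself is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# A synthetic two-objective Pareto front (both objectives minimized).
f1 = np.sort(rng.uniform(0.0, 1.0, 60))
front = np.column_stack([f1, 1.0 - f1])

k = 5  # the decision maker wants k representative trade-offs
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(front)

# In each cluster, keep the member closest to the centroid as its representative.
reps = []
for c in range(k):
    idx = np.flatnonzero(km.labels_ == c)
    d = np.linalg.norm(front[idx] - km.cluster_centers_[c], axis=1)
    reps.append(idx[np.argmin(d)])
pruned = front[np.array(reps)]
```

The decision maker then compares only the k pruned solutions, each standing in for a region of similar trade-offs on the original front.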

15.
An improved initial-center selection algorithm for k-means document clustering
An improved min-max-principle-based algorithm for selecting initial values in k-means document clustering is proposed. The method first constructs a similarity matrix, then analyzes it with the min-max principle to select the initial cluster seeds and automatically determine the number of clusters k. Experimental results show that the value of k found by this method is close to the true value.
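Min-max (farthest-first) seeding, the principle behind the proposed initialization, can be sketched as follows; the automatic selection of k is not reproduced here, and the distance-based variant below stands in for the paper's similarity-matrix formulation.

```python
import numpy as np

def min_max_init(X, k, first=0):
    """Farthest-first (min-max) seeding: each new seed maximizes its minimum
    distance to the seeds already chosen, spreading seeds across the data."""
    seeds = [first]
    d = np.linalg.norm(X - X[first], axis=1)  # distance to nearest seed so far
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        seeds.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return seeds
```

Seeding this way avoids the degenerate initializations that make plain k-means sensitive to its random starting centers.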

16.
Epigenetics is the study of phenotypic variations that do not alter DNA sequences. Cancer epigenetics has grown rapidly over the past few years, as epigenetic alterations exist in all human cancers. One of these alterations is DNA methylation, an epigenetic process that regulates gene expression and often occurs at tumor suppressor gene loci in cancer. Studying this methylation process may therefore shed light on gene functions that cannot otherwise be interpreted from changes in DNA sequences. Currently, microarray technologies such as Illumina Infinium BeadChip assays are used to study DNA methylation at an extremely large number of loci. At each DNA methylation site, a beta value (β) reflects the methylation intensity. Clustering such data from various types of cancers may therefore reveal large partitions that help classify cancer types objectively and identify the relevant loci without user bias. This study proposes a Nested Big Data Clustering Genetic Algorithm (NBDC-GA), a novel evolutionary metaheuristic that performs cluster-based feature selection on the DNA methylation sites. The efficacy of the NBDC-GA was tested using real-world datasets retrieved from The Cancer Genome Atlas (TCGA), a cancer genomics program created by the National Cancer Institute (NCI) and the National Human Genome Research Institute. Its performance was then compared with that of a recently developed metaheuristic, the Immuno-Genetic Algorithm (IGA), tested on the same datasets. The NBDC-GA outperformed the IGA in terms of convergence performance. Furthermore, the NBDC-GA produced a more robust clustering configuration while reducing feature dimensionality by a maximum of 67% and 94.5% for individual cancer types and collective cancers, respectively. The proposed NBDC-GA was also able to identify two chromosomes with highly contrasting DNA methylation activities that were previously linked to cancer.

17.
The task of speaker segmentation and clustering is to group together the speech segments produced by the same speaker within a recording. This paper proposes a speaker clustering algorithm that combines the Cross Log-likelihood Ratio (CLR) with the Bayesian Information Criterion (BIC). The CLR is used to measure the similarity between speech segments, while the BIC provides a suitable criterion for stopping the clustering. The algorithm combines the advantages of both methods and has been applied effectively to unsupervised speaker clustering. Experimental results show that speaker clustering based on CLR and BIC is more accurate than clustering using CLR alone.
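A BIC-style stopping criterion for merging two segments can be sketched with full-covariance Gaussians; this follows the standard ΔBIC formulation for segment merging, with illustrative data rather than real speech features.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """ΔBIC for modelling segments x and y (n_samples x dim) with one
    full-covariance Gaussian versus one Gaussian each. Positive values favour
    keeping the segments separate; merging stops when all pairs are positive."""
    z = np.vstack([x, y])
    n, nx, ny, d = len(z), len(x), len(y), x.shape[1]

    def logdet(s):
        return np.linalg.slogdet(np.cov(s, rowvar=False) + 1e-9 * np.eye(d))[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)  # free parameters of one Gaussian
    return 0.5 * (n * logdet(z) - nx * logdet(x) - ny * logdet(y)) - lam * penalty
```

In an agglomerative loop, the CLR would pick the most similar pair of segments and ΔBIC would decide whether merging them is still justified.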

18.
Local structural features play an important role in data analysis. To obtain a simple and effective way of detecting the local structural features of a dataset, this paper combines resampling error analysis with traditional nearest-neighbor selection to propose a direction-consistency measure for detecting local structural features: the rough dissimilarity measure. This measure is an optimized nearest-neighbor selection method that considers not only the traditional Euclidean-distance ranking but also local directional structure. Thanks to its low computational and storage complexity and its strong structure-detection performance, it can be applied to unsupervised learning to form a hierarchical subgraph clustering algorithm, RDClust. Compared with classical clustering algorithms, its advantages are threefold: first, its computational complexity is low (approximately linear); second, it makes no assumptions about cluster shape or distribution, automatically reflecting the local structure of the dataset; third, it has a single nearest-neighbor parameter, to which the results are robust. Experiments on synthetic and real datasets demonstrate the superior performance of the new measure within the new algorithm.

19.
In medical imaging, brain tumor segmentation is a vital task that enables early diagnosis and treatment. Manual segmentation of brain tumors in magnetic resonance (MR) images is time-consuming and challenging; hence, there is a need for a computer-aided brain tumor segmentation approach. Using deep learning algorithms, a robust brain tumor segmentation approach is implemented by integrating a convolutional neural network (CNN) with multiple kernel k-means clustering (MKKMC). In the proposed CNN-MKKMC approach, the CNN classifies MR images as normal or abnormal; the MKKMC algorithm is then employed to segment the brain tumor from the abnormal brain image. The proposed CNN-MKKMC algorithm is evaluated both visually and objectively in terms of accuracy, sensitivity, and specificity against existing segmentation methods. The experimental results demonstrate that the proposed CNN-MKKMC approach yields better accuracy in segmenting brain tumors at a lower time cost.
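Multiple kernel k-means can be sketched via its common spectral relaxation: combine base kernels, embed the data with the top eigenvectors of the combined kernel, and run ordinary k-means in that embedding. The kernel choices and uniform weights are illustrative assumptions, and this is not the paper's MKKMC.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Uniformly weighted combination of RBF base kernels (illustrative gammas).
K = sum(rbf_kernel(X, gamma=g) for g in (0.1, 0.5, 1.0)) / 3.0

# Spectral relaxation of kernel k-means: cluster the top eigenvectors of K.
_, vecs = np.linalg.eigh(K)   # eigenvalues in ascending order
emb = vecs[:, -3:]            # top-3 eigenvectors for 3 clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
ari = adjusted_rand_score(y, labels)
```

In an image-segmentation setting, X would hold per-pixel or per-region features from the abnormal image rather than synthetic blobs.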

20.
Service accessibility is defined as the access of a community to the nearby site locations in a service network consisting of multiple geographically distributed service sites. Leveraging new statistical methods, this article estimates and classifies service accessibility patterns varying over a large geographic area (Georgia) and over a period of 16 years. The focus of this study is on financial services, but the approach applies generally to any other service operation. To this end, we introduce a model-based method for clustering random time-varying functions that are spatially interdependent. The underlying clustering model is nonparametric with spatially correlated errors, and the clustering membership is assumed to be a realization from a Markov random field. Under these model assumptions, we borrow information across functions corresponding to nearby spatial locations, resulting in enhanced estimation accuracy of the cluster effects and of the cluster membership, as shown in a simulation study. Supplementary materials, including the estimation algorithm, additional maps of the data, and the C++ computer programs for analyzing the data in our case study, are available online.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号