19 similar documents found; search took 46 ms
1.
2.
3.
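Random forest is an ensemble learning algorithm developed in recent years that achieves good classification accuracy. To address its high computational complexity, a random forest algorithm based on spectral clustering partitioning is proposed. First, spectral clustering, which clusters well, is applied to each class of the original sample set. Then, one sample is randomly selected from each cluster as its representative, and these representatives form a new training set. Finally, a random forest classifier is trained on the new training set. By using spectral clustering to pre-partition the original samples and representing several nearby samples by a single sample from their cluster, the algorithm substantially reduces the number of training samples. Experiments on the Corel Image image recognition dataset show that the algorithm achieves high classification accuracy with less classification time.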
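The sample-reduction step described in this abstract (one random representative per cluster, per class) might look roughly like the sketch below. It assumes the per-class cluster assignments have already been computed; the spectral clustering and the random forest training are not reproduced, and the function name `pick_representatives` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def pick_representatives(samples, class_labels, cluster_ids, seed=0):
    """Keep one randomly chosen sample per (class, cluster) pair.

    cluster_ids[i] is the cluster of sample i within its own class. In the
    abstract these ids come from spectral clustering; here they are simply
    taken as given (an assumption of this sketch).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, (cls, cid) in enumerate(zip(class_labels, cluster_ids)):
        groups[(cls, cid)].append(idx)
    # One representative index per group, returned in original sample order.
    chosen = sorted(rng.choice(members) for members in groups.values())
    return [samples[i] for i in chosen], [class_labels[i] for i in chosen]
```

The reduced lists can then be passed to any classifier in place of the full training set; how much accuracy is retained depends on the clustering quality, which this sketch does not address.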
4.
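This paper proposes a novel spectral clustering ensemble framework for SAR image segmentation based on non-negative matrix factorization. First, individual segmentations are generated by a Nystrom-approximation-based spectral clustering method run with different scale parameters, which yields diverse individual results. Second, these individual segmentations are combined by non-negative matrix factorization, whose advantages are that it agrees with the intuitive experience of human perception and has a clear physical meaning. Finally, the SAR image segmentation is obtained from the pixel membership relations produced by the combination. To validate the method, segmentation experiments were carried out on 3 texture images and 4 SAR images, with comparisons against K-means, Nystrom-approximation-based spectral clustering, and Meta-clustering; the proposed method performs better in both qualitative and quantitative analysis and has some practical value.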
5.
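Clustering ensembles are a hot topic in data mining research. They combine multiple clustering partitions of the same dataset to improve the performance of cluster analysis. Most current work does not consider the quality of the clustering members being combined, so poor members can degrade the ensemble result. This paper proposes a clustering ensemble algorithm based on a weighted co-occurrence matrix (WCSCE). The method first computes the attribute-value-based co-occurrence matrix of each clustering member, then performs a simple evaluation of each member's quality and assigns it a weight, generates the weighted co-occurrence matrix, and from it produces the ensemble result. Experiments verify the effectiveness of the algorithm and show improved clustering quality.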
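As a rough illustration of the weighting idea, the sketch below builds a weighted co-association matrix from several partitions. It is not the WCSCE algorithm itself: the abstract's co-occurrence matrix is based on attribute values and its quality measure is not specified, so the label-based matrix and the caller-supplied `weights` here are assumptions.

```python
def weighted_coassociation(partitions, weights):
    """Weighted fraction of partitions that place items i and j together.

    partitions: list of label lists, one per clustering member.
    weights:    one non-negative quality score per member (hypothetical;
                the abstract does not say how quality is evaluated).
    """
    n = len(partitions[0])
    total = float(sum(weights))
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix
```

A final partition could then be obtained by running any similarity-based clustering method on the returned matrix; members with low weight contribute less to each entry.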
6.
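A text image binarization method based on gray-level histograms and spectral clustering  Total citations: 7, self-citations: 0, citations by others: 7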
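In automatic text extraction, character regions obtained by localization must be binarized before they can be recognized effectively; because of complex backgrounds, common thresholding methods cannot effectively segment character images from natural environments. This paper proposes an image binarization method based on spectral clustering. The method uses the normalized cut (Ncut) as the spectral clustering criterion, computes the similarity matrix from the gray-level histogram, and determines the best number of histogram levels experimentally. Compared with the usual pixel-level similarity matrix, both the space complexity and the computational complexity of the algorithm are greatly reduced. Experimental results show that, for character images in natural scenes, the binarization results of the proposed method are better than those of common threshold segmentation.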
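To make the complexity argument concrete, the sketch below evaluates the Ncut criterion on histogram bins rather than on pixels, so the similarity matrix has one row per gray level instead of one per pixel. It is a simplification under stated assumptions: the similarity formula (bin counts multiplied by a Gaussian of the level difference) and the exhaustive search over thresholds are choices made for this example, not details taken from the abstract.

```python
import math


def ncut_threshold(histogram, sigma=10.0):
    """Return the bin index t minimising Ncut for the split {0..t} | {t+1..}.

    histogram[i] is the pixel count of gray level (bin) i. The bin-to-bin
    similarity used here is an assumed form, chosen for illustration.
    """
    levels = len(histogram)
    w = [[histogram[i] * histogram[j] * math.exp(-((i - j) ** 2) / sigma ** 2)
          for j in range(levels)] for i in range(levels)]
    degree = [sum(row) for row in w]
    best_t, best_score = None, float("inf")
    for t in range(levels - 1):
        # Total similarity crossing the candidate threshold.
        cut = sum(w[i][j] for i in range(t + 1) for j in range(t + 1, levels))
        assoc_a = sum(degree[: t + 1])
        assoc_b = sum(degree[t + 1:])
        if assoc_a == 0 or assoc_b == 0:
            continue  # one side is empty, so Ncut is undefined
        score = cut / assoc_a + cut / assoc_b
        if score < best_score:
            best_t, best_score = t, score
    return best_t
```

Pixels whose bin index is at most the returned value would go to one class and the rest to the other. With L histogram levels the matrix has L x L entries regardless of image size, which is where the memory saving comes from.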
7.
8.
9.
A clustering algorithm based on random walks  Total citations: 2, self-citations: 0, citations by others: 2
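This paper proposes an improved random walk model and, on the basis of this model, develops a data clustering algorithm. In the algorithm, the sample points of the dataset generate a weighted undirected graph G(V,E,d) according to the improved random walk model, where each sample point corresponds to a vertex of G and each vertex is assumed to be an Agent that can move in space. The probability of each vertex moving toward each vertex in its neighbor set is then computed, and after a vertex in the neighbor set is randomly chosen as the direction, the vertex moves one unit of distance. As all sample points keep walking randomly, points of the same class gradually gather together while points of different classes move apart, so that clusters form automatically. Experimental results show that the random-walk-based clustering algorithm clusters sample points reasonably and effectively, and comparison with other algorithms also demonstrates its effectiveness.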
10.
11.
12.
We propose a novel multi-view document clustering method based on graph-regularized concept factorization (MVCF). MVCF makes full use of multi-view features for a more comprehensive understanding of the data and learns the weight of each view adaptively. It also preserves the local geometrical structure of the manifolds for multi-view clustering. We have derived an efficient optimization algorithm to solve the objective function of MVCF and proven its convergence using the auxiliary function method. Experiments carried out on three benchmark datasets demonstrate the effectiveness of MVCF in comparison with several state-of-the-art approaches in terms of accuracy, normalized mutual information and purity.
13.
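To meet the need for data mining on XML document collections, this paper proposes computing the structural similarity of XML documents from the semantic and structural information of their document trees, building a structural similarity matrix from these similarities, and then applying the DBSCAN algorithm to cluster the XML document collection. Compared with other clustering algorithms, the clustering speed is greatly improved.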
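The clustering stage of this pipeline can be sketched as DBSCAN running directly on a precomputed similarity matrix. The sketch assumes the structural similarity matrix is already available (how it is derived from the XML trees is not reproduced), and the parameter names `min_sim` and `min_pts` are hypothetical stand-ins for DBSCAN's usual radius and density parameters.

```python
def dbscan_from_similarity(similarity, min_sim=0.7, min_pts=2):
    """DBSCAN over a symmetric similarity matrix; label -1 marks noise.

    Two items are neighbours when their similarity is at least min_sim.
    An item counts as its own neighbour (similarity[i][i] is assumed 1.0).
    """
    n = len(similarity)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(n) if similarity[i][j] >= min_sim]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border item
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core item
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core item, keep expanding
    return labels
```

Because the neighbour test only reads matrix entries, no coordinates or tree comparisons are needed at clustering time; all document-specific work is confined to building the matrix.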
14.
15.
Ruizhang HUANG, Ruina BAI, Yanping CHEN, Yongbin QIN, Xinyu CHENG, Youliang TIAN 《通信学报》2005,41(8):155-164
To address the problems that traditional multi-view document clustering methods separate the multi-view document representation from the clustering process and ignore the complementary characteristics of multi-view document clustering, an iterative algorithm for complementary multi-view document clustering (CMDC) was proposed, in which the multi-view document clustering process and the multi-view feature adjustment are conducted in a mutually unified manner. In the CMDC algorithm, complementary text documents are selected from the clustering results to help adjust the contribution of view features by learning a local metric for each document view. CMDC selects the complementary text documents from the clustering results of the different views and uses them to promote feature tuning of the clusters. The partition consistency of multi-view document clustering is achieved through the consistency of the metrics across views. Experimental results show that CMDC effectively improves multi-view clustering performance.
16.
Self-organization of wireless sensor networks, which involves network decomposition into connected clusters, is a challenging task because of the limited bandwidth and energy resources available in these networks. In this paper, we make contributions towards improving the efficiency of self-organization in wireless sensor networks. We first present a novel approach for message-efficient clustering, in which nodes allocate local “growth budgets” to neighbors. We introduce two algorithms that make use of this approach. We analyze the message complexity of these algorithms and provide performance results from simulations. The algorithms produce clusters of bounded size and low diameter, using significantly fewer messages than the earlier, commonly used, Expanding Ring approach. Next, we present a new randomized methodology for designing the timers of cluster initiators. This methodology provides a probabilistic guarantee that initiators will not interfere with each other. We derive an upper bound on the expected time for network decomposition that is logarithmic in the number of nodes in the network. We also present a variant that optimistically allows more concurrency among initiators and significantly reduces the network decomposition time. However, it produces slightly more clusters than the first method. Extensive simulations over different topologies confirm the analytical results and demonstrate that our proposed methodology scales to large networks.
17.
Using partitioning in sensor networks to create clusters for routing, data management, and for controlling communication has been proven as a way to ensure long range deployment and to deal with sensor network shortcomings such as limited energy and short communication ranges. Choosing a cluster head within each cluster is important because cluster heads use additional energy for their responsibilities and that burden needs to be carefully passed around among nodes in a cluster. Many existing protocols either choose cluster heads randomly or use nodes with the highest remaining energy. We present an Energy Constrained minimum Dominating Set based efficient clustering called ECDS to model the problem of optimally choosing cluster heads with energy constraints. Our proposed randomized distributed algorithm for the constrained dominating set runs in O(log n log Δ) rounds with high probability where Δ is the maximum degree of a node in the graph. We provide an approximation ratio for the ECDS algorithm of expected size 8HΔ|OPT| and with high probability a size of O(|OPT| log n) where n is the number of nodes, H is the harmonic function and OPT means the optimal size. We propose multiple extensions to the distributed algorithm for the energy constrained dominating set. We experimentally show that these extensions perform well in terms of energy usage, node lifetime, and clustering time in comparison and, thus, are very suitable for wireless sensor networks.
18.
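Among data mining algorithms, cluster analysis is particularly important. Partition-based clustering algorithms study the classification problem using statistical analysis. This paper introduces the definition of clustering and the types of clustering algorithms, describes in detail the basic principles of the K-means and K-medoids clustering algorithms and analyzes their performance, reviews recent research on partition-based clustering algorithms, and briefly introduces concrete application examples.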
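The difference between the two partition-based methods named in this abstract can be illustrated on scalar data. The sketch below is a minimal version of Lloyd's K-means iteration plus a helper that finds a cluster's medoid; restricting the data to one dimension and requiring caller-supplied initial centers are simplifications made for this example.

```python
def kmeans_1d(points, centers, iterations=20):
    """Lloyd's algorithm on scalar data with caller-supplied initial centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: each point joins its nearest center.
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, clusters


def medoid(cluster):
    """The cluster member with the smallest total distance to all members."""
    return min(cluster, key=lambda m: sum(abs(m - q) for q in cluster))
```

K-means represents a cluster by its mean, which need not be a data point and shifts strongly when an outlier is present; K-medoids uses an actual member such as the one returned by `medoid`, which is less affected by outliers at the price of a more expensive update step.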
19.
Low-rank representation (LRR) and its variations have achieved great success in subspace segmentation tasks. However, the segmentation processes of the existing LRR-related methods are all divided into two separate steps: constructing the affinity graph and obtaining the segmentation result. In the second step, the normalized cut (Ncut) algorithm is used to get the final results based on the constructed graphs. This implies that the affinity graphs obtained by LRR-related algorithms may not be the most suitable for Ncut, and the best results are not guaranteed. In this paper, we propose a spectral clustering steered LRR representation algorithm (SCSLRR) which combines the objective functions of Ncut, K-means and LRR. By solving a joint optimization problem, SCSLRR is able to find low-rank affinity matrices that are most beneficial for Ncut to obtain the best segmentation results. Extensive subspace segmentation experiments on several benchmark datasets show that SCSLRR outperforms the related methods.