Found 19 similar documents.
1.
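Research on Personalized Data Clustering    Total citations: 3 (self-citations: 0, citations by others: 3)
Many current clustering algorithms share a shortcoming: they produce clustering results that are irrelevant to the user and fail to meet the user's actual needs. To address this, the paper proposes the concept of personalized clustering, in which clustering is carried out according to the user's actual needs. The method introduces user factors: a frequency vector and a concern vector are designed to construct a personalized distance function, which is then combined with well-known existing clustering algorithms. An example of a personalized clustering algorithm is described, which the authors report addresses the problem well.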
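The abstract does not give the formula for the personalized distance. A minimal sketch, assuming the frequency vector and the concern vector are multiplied element-wise and normalized into per-attribute weights for a weighted Euclidean distance (the combination rule and all function names are hypothetical), shows how such a distance could plug into a k-means-style assignment step:

```python
import math

def personalized_weights(freq, concern):
    """Hypothetical rule: element-wise product of the user's frequency
    vector and concern vector, normalized so the weights sum to 1."""
    raw = [f * c for f, c in zip(freq, concern)]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one attribute needs a non-zero weight")
    return [r / total for r in raw]

def personalized_distance(x, y, weights):
    """Weighted Euclidean distance using the personalized weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def assign(points, centers, weights):
    """One k-means-style assignment step under the personalized distance:
    returns the index of the nearest center for each point."""
    return [min(range(len(centers)),
                key=lambda k: personalized_distance(p, centers[k], weights))
            for p in points]
```

With weights `[1.0, 0.0]` the point (0, 10) joins center (0, 0), whereas plain Euclidean distance would put it with (5, 5); a user's zero concern for the second attribute changes the grouping.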
3.
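Research on an Attribute-Statistical Hybrid Clustering Algorithm    Total citations: 2 (self-citations: 0, citations by others: 2)
This paper studies an attribute-statistical hybrid clustering algorithm. Building on the attribute mean clustering algorithm and the Woodbury algorithm, the objective functional is improved, yielding the attribute-statistical hybrid clustering algorithm. The paper proves that the attribute mean clustering algorithm and the fuzzy C-means (FCM) clustering algorithm are each special cases of the attribute-statistical hybrid clustering algorithm.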
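Since the abstract states that fuzzy C-means (FCM) is a special case of the hybrid algorithm, a plain FCM sketch may serve as a reference point. This is standard FCM with fuzzifier `m`, not the attribute-statistical hybrid itself, whose objective functional the abstract does not give; the random initialization and stopping rule are illustrative choices:

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, u), where u[i][k] is the
    membership of point i in cluster k and each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(max_iter):
        # Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], v) for v in centers]
            zeros = [k for k, dk in enumerate(dists) if dk == 0.0]
            if zeros:
                # Point sits exactly on a center: split membership among those centers.
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                row = [1.0 / sum((dists[k] / dists[j]) ** (2 / (m - 1)) for j in range(c))
                       for k in range(c)]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u
```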
4.
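Research on Personalized Search Based on Clustering Algorithms    Total citations: 1 (self-citations: 0, citations by others: 1)
Search engines have made it possible for users to find the information they need on an Internet whose content is growing explosively, and research on personalized search engines aims to make search results match the information needs of different users as closely as possible. This paper proposes a personalized search method based on an improved DBSCAN algorithm, built on the full-text search library Lucene and the open-source search engine Nutch. Experiments show that the method improves the clustering results and raises users' search accuracy.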
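The abstract does not describe the paper's improvement to DBSCAN. For reference, here is a sketch of standard DBSCAN, the baseline being modified, with `eps` (neighborhood radius) and `min_pts` (density threshold) as its two parameters; the brute-force neighbor search is a simplification:

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise.
    A point's neighborhood includes the point itself."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: expand
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```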
5.
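Clustering is an important method in data mining. To address the shortcomings of existing clustering algorithms for categorical and mixed-attribute data sets, such as k-modes, k-prototypes, and fuzzy k-prototypes, a new method, categorical attribute decomposition, is proposed. The method is reported to offer higher stability and reliability and to reduce randomness effectively.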
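The categorical attribute decomposition method itself is not specified in the abstract. As background, here is a sketch of basic k-modes, one of the baselines named above, using simple matching dissimilarity and the most frequent value of each attribute as the cluster mode. Initializing from the first k records is a simplifying assumption; random initialization is common in practice:

```python
from collections import Counter

def matching_dissim(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(data, k, iters=20):
    """Basic k-modes. Initial modes are the first k records (an assumption).
    Returns (labels, modes)."""
    modes = [list(r) for r in data[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: matching_dissim(r, modes[c]))
                      for r in data]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if the cluster is empty
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes
```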
8.
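Categorical data are common in many application domains such as bioinformatics, and their discrete values make categorical data clustering a difficult task in statistical machine learning. Current mainstream methods rely on the modes of categorical attributes for cluster optimization and for computing the weights of relevant attributes. This paper proposes a non-mode statistical clustering method for categorical data. First, based on a newly defined dissimilarity measure, an attribute-weighted objective function for categorical data clustering is derived. The function is based on the average distance between objects and clusters, thereby avoiding the problems that arise in existing mode-centered methods. Second, a soft subspace clustering algorithm for categorical data is defined. During clustering, the algorithm assigns each attribute a weight that measures its relevance to each cluster, based on the overall distribution of the attribute's values rather than only its mode, thereby achieving automatic feature selection. Experiments on synthetic and real-world data sets show that, compared with existing mode-based clustering algorithms and other non-mode algorithms based on Monte Carlo optimization, the algorithm effectively improves the quality of the clustering results.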
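The new dissimilarity measure is not given in the abstract. The sketch below illustrates the underlying non-mode idea, assuming simple matching distance: on one attribute, the average distance from an object to all members of a cluster equals one minus the relative frequency of the object's value in that cluster, so it depends on the full value distribution rather than only on the mode. The `weights` argument is a hypothetical parameter, not the paper's soft-subspace weighting rule:

```python
from collections import Counter

def avg_distance_to_cluster(x, cluster, weights=None):
    """Attribute-weighted average simple-matching distance between object x
    and all members of `cluster`. On attribute d this equals
    1 - count(x[d] among cluster values on d) / |cluster|."""
    size, dims = len(cluster), len(x)
    if weights is None:
        weights = [1.0 / dims] * dims  # hypothetical uniform default
    total = 0.0
    for d in range(dims):
        freq = Counter(obj[d] for obj in cluster)
        total += weights[d] * (1.0 - freq[x[d]] / size)
    return total
```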
9.
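Because spatial objects have multiple attributes, both the geographic location attributes and the non-spatial attributes of objects are incorporated into the similarity measure, making the clustering results more objective.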
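The abstract does not say how the two kinds of attributes are combined. One common pattern, assumed here, is a convex combination of a geographic distance and a non-spatial attribute distance; `alpha` and the two scale factors are hypothetical parameters:

```python
import math

def combined_distance(p, q, alpha=0.5, spatial_scale=1.0, attr_scale=1.0):
    """Hypothetical mixed dissimilarity for spatial objects.
    p and q are (location, attributes) pairs: location is (x, y) and
    attributes is a tuple of numeric non-spatial values.
    alpha balances geographic vs. non-spatial distance; the scale factors
    bring both parts to comparable ranges."""
    (loc_p, attr_p), (loc_q, attr_q) = p, q
    d_geo = math.dist(loc_p, loc_q) / spatial_scale
    d_attr = math.dist(attr_p, attr_q) / attr_scale
    return alpha * d_geo + (1 - alpha) * d_attr
```

Setting `alpha=1.0` recovers purely geographic clustering; lowering it lets objects that are far apart but similar in their non-spatial attributes move closer.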
10.
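Defining a distance function between objects is difficult in categorical data clustering. To address this, a categorical data clustering algorithm based on Bayesian probability estimation is proposed. First, an attribute-weighted probabilistic model is proposed, in which each categorical attribute is assigned a weight reflecting its importance. Second, through a transformation using Bayes' formula, a clustering objective function based on maximum likelihood estimation is defined, and a partition-based clustering algorithm is proposed. The algorithm no longer relies on distances between objects; instead, it clusters according to the weighted likelihood between objects and partitions of the data set. Third, an expression for computing the attribute weights is derived, leading to the conclusion that the weight of a categorical attribute is inversely proportional to the information entropy of its symbol distribution. Experiments on real and synthetic data sets show that, compared with existing distance-based clustering algorithms, the proposed algorithm improves clustering accuracy, with gains of 5% to 48% on bioinformatics data in particular, and produces attribute weights with practical meaning.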
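The conclusion that an attribute's weight is inversely proportional to the entropy of its symbol distribution can be sketched as follows. The normalization to sum 1, the `eps` guard for zero entropy, the Laplace smoothing, and the form of the weighted log-likelihood score are all assumptions, not the paper's derived expressions:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (natural log) of a categorical value distribution."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def entropy_weights(cluster, eps=1e-9):
    """Per-attribute weights inversely proportional to the attribute's
    entropy within the cluster, normalized to sum to 1."""
    inv = [1.0 / (entropy(col) + eps) for col in zip(*cluster)]
    s = sum(inv)
    return [v / s for v in inv]

def weighted_log_likelihood(x, cluster, weights, alpha=1.0):
    """Hypothetical score sum_d w_d * log P(x_d | cluster), with Laplace
    smoothing over the observed values plus one bucket for unseen values."""
    n = len(cluster)
    score = 0.0
    for d, w in enumerate(weights):
        counts = Counter(obj[d] for obj in cluster)
        k = len(counts) + 1  # +1 reserves probability mass for unseen values
        score += w * math.log((counts[x[d]] + alpha) / (n + alpha * k))
    return score
```

An object would then be assigned to the cluster giving the highest weighted score; in this sketch, low-entropy (consistent) attributes dominate that decision.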
11.
Document clustering is an intentional act that reflects individual preferences with regard to the semantic coherency and relevant categorization of documents. Hence, effective document clustering must consider individual preferences and needs in order to support personalization in document categorization. Most existing document-clustering techniques, generally anchored in purely content-based analysis, generate a single set of clusters for all individuals without tailoring it to individual preferences, and thus cannot support personalization. The partial-clustering-based personalized document-clustering approach, which incorporates a target individual's partial clustering into the document-clustering process, has been proposed to facilitate personalized document clustering. However, given a collection of documents to be clustered, the individual might have categorized only a small subset of the collection into his or her personal folders. In this case, the small partial clustering would degrade the effectiveness of the existing personalized document-clustering approach for that individual. In response, we extend this approach and propose the collaborative-filtering-based personalized document-clustering (CFC) technique, which expands the size of an individual's partial clustering by considering the partial clusterings of other users with similar categorization preferences. Our empirical evaluation results suggest that, given a small partial clustering established by an individual, the proposed CFC technique generally achieves better clustering effectiveness for that individual than the partial-clustering-based personalized document-clustering technique does.
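The abstract does not detail how CFC measures categorization similarity or merges other users' folders. As a loose, hypothetical illustration of the collaborative-filtering idea, the sketch below scores user similarity by pairwise co-categorization agreement (a Rand-index-style measure) and borrows folder assignments from sufficiently similar users; `min_sim` and both functions are assumptions:

```python
from collections import Counter
from itertools import combinations

def pair_agreement(a, b):
    """Rand-index-style agreement between two partial clusterings
    (dicts mapping document id -> folder). Over document pairs both users
    categorized, the fraction on which they agree about 'same folder'."""
    common = sorted(set(a) & set(b))
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)

def expand_partial_clustering(target, others, min_sim=0.8):
    """Hypothetical expansion: for each sufficiently similar user (most
    similar first), map each of their folders to the target folder sharing
    the most documents, then fill in documents the target has not filed."""
    expanded = dict(target)
    for other in sorted(others, key=lambda o: pair_agreement(target, o), reverse=True):
        if pair_agreement(target, other) < min_sim:
            break
        for folder in set(other.values()):
            docs = {d for d, f in other.items() if f == folder}
            overlap = Counter(target[d] for d in docs if d in target)
            if not overlap:
                continue  # no anchor in the target's own folders
            mapped = overlap.most_common(1)[0][0]
            for d in docs:
                expanded.setdefault(d, mapped)
    return expanded
```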
12.
Clustering algorithms are routinely used in biomedical disciplines and are a basic tool in bioinformatics. Depending on the task at hand, the two most popular options are central partitional techniques and agglomerative hierarchical clustering techniques, together with their derivatives. These methods are well studied and well established. However, both categories have drawbacks, related to data dimensionality for partitional algorithms and to the bottom-up structure for agglomerative hierarchical algorithms. To overcome these limitations, and motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on a completely different principle: the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on different data sets, evaluating the clusterings against extrinsic knowledge given by class labels.
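The abstract does not define the SFN criterion precisely. By analogy with shared-nearest-neighbor methods, one hypothetical reading is that two points are similar when they share many of their k farthest neighbors. The sketch computes only that similarity matrix, not the paper's rank-and-index framework or its hierarchy:

```python
import math

def farthest_neighbors(points, k):
    """For each point, the set of indices of its k farthest points (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]), reverse=True)
        result.append(set(others[:k]))
    return result

def sfn_similarity(points, k):
    """Hypothetical shared-farthest-neighbor similarity: the number of
    farthest neighbors two points have in common."""
    fn = farthest_neighbors(points, k)
    n = len(points)
    return [[len(fn[i] & fn[j]) for j in range(n)] for i in range(n)]
```

For the points (0, 0), (0, 1), (10, 0), (10, 1) with k = 2, points 0 and 1 share both farthest neighbors (similarity 2), while points 0 and 2 share none.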
13.
Based on molecular kinetic theory, this paper proposes a data clustering approach similar to molecular dynamics. Data points fuse in the iteration space through a dynamical mechanism similar to the way molecules interact through molecular forces, and clusters are then extracted. The approach aims to find natural clusters without pre-specifying the number of clusters. Compared with three other clustering methods (trimmed k-means, the JP algorithm, and another method based on a gravitational model), this approach found better clusters in the experiments.
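The paper's force law is not given in the abstract. A hypothetical sketch of the general mechanism: points within a cutoff attract one another with a strength that decays with distance, all positions are updated iteratively, and points that fuse are reported as one cluster, so the number of clusters is not fixed in advance. The force form, `step_size`, `cutoff`, and `merge_eps` are all assumptions:

```python
import math

def md_like_clustering(points, steps=100, step_size=0.1, cutoff=2.0, merge_eps=1e-2):
    """Hypothetical attraction-and-fusion clustering. Returns one label per point."""
    pos = [list(map(float, p)) for p in points]
    n, dim = len(pos), len(pos[0])
    for _ in range(steps):
        new_pos = []
        for i in range(n):
            force = [0.0] * dim
            for j in range(n):
                r = math.dist(pos[i], pos[j])
                if i != j and 0.0 < r <= cutoff:
                    w = 1.0 / (1.0 + r * r)  # attraction weakens with distance
                    for d in range(dim):
                        force[d] += w * (pos[j][d] - pos[i][d])
            new_pos.append([pos[i][d] + step_size * force[d] for d in range(dim)])
        pos = new_pos  # synchronous update of all points
    # Points that have fused (within merge_eps, transitively) form one cluster.
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and math.dist(pos[a], pos[b]) <= merge_eps:
                    labels[b] = cluster
                    stack.append(b)
        cluster += 1
    return labels
```

Because pairwise forces are symmetric, each group contracts toward its own centroid; groups farther apart than `cutoff` never interact.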
14.
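When multidimensional data have very high dimensionality and large volume, parallel-coordinate plots suffer from densely overlapping lines, and patterns in the data are hard to discern. To address this, a parallel-coordinate visualization method based on principal component analysis and K-means clustering (PCAKP, principal component analysis and k-means clustering parallel coordinate) is proposed. The method first reduces the dimensionality of the multidimensional data with principal component analysis, then clusters the reduced data with K-means, and finally displays the clustered data using parallel-coordinate visualization. Using data published on the statistics bureau's website as test data, the PCAKP method was tested and compared with conventional parallel-coordinate plots, verifying its practicality and effectiveness.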
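The first two PCAKP stages (PCA, then K-means) can be sketched with the standard library. This sketch assumes projection onto the first principal component only, found by power iteration (which fails if the all-ones start vector happens to be orthogonal to that component), and Lloyd's k-means with evenly spaced initial centers; the parallel-coordinate plotting stage is omitted:

```python
def pca_first_component(data, iters=200):
    """Top principal component via power iteration on the sample covariance.
    Returns (component, scores), where scores are the 1-D projections."""
    n, dim = len(data), len(data[0])
    means = [sum(row[d] for row in data) / n for d in range(dim)]
    centered = [[row[d] - means[d] for d in range(dim)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(r[d] * v[d] for d in range(dim)) for r in centered]
    return v, scores

def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, starting from evenly spaced centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [lo]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels, centers
```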
15.
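Data clustering is an important research topic in data mining. Real-world data often contain both continuous and discrete attributes, but most existing algorithms can handle only one of these types and simply discard the other, which loses clustering information and lowers clustering quality. Some algorithms that can handle mixed attributes must process too many attributes, which greatly increases the computational cost. This paper proposes a clustering algorithm for mixed-attribute data based on the BIRCH algorithm; experiments on UCI data sets show that the proposed algorithm performs well.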
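BIRCH summarizes each subcluster by a clustering feature CF = (N, LS, SS): the point count, the linear sum, and the sum of squared norms, all of which add when subclusters merge. The abstract does not describe how the proposed algorithm handles discrete attributes. One hypothetical extension, sketched here, keeps additive per-attribute value counts alongside the numeric triple:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MixedCF:
    """BIRCH-style clustering feature extended to mixed data (hypothetical).
    Numeric part: the standard BIRCH triple (N, LS, SS).
    Categorical part: one Counter of values per discrete attribute."""
    n: int = 0
    ls: list = field(default_factory=list)          # linear sum of numeric vectors
    ss: float = 0.0                                  # sum of squared norms
    cat_counts: list = field(default_factory=list)  # value counts per discrete attribute

    def add(self, num, cat):
        if self.n == 0:
            self.ls = [0.0] * len(num)
            self.cat_counts = [Counter() for _ in cat]
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, num)]
        self.ss += sum(x * x for x in num)
        for counter, value in zip(self.cat_counts, cat):
            counter[value] += 1

    def merge(self, other):
        """CFs are additive, so two subclusters combine by summing."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        return MixedCF(self.n + other.n,
                       [a + b for a, b in zip(self.ls, other.ls)],
                       self.ss + other.ss,
                       [a + b for a, b in zip(self.cat_counts, other.cat_counts)])

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        """RMS distance of members to the centroid: sqrt(SS/N - |LS/N|^2)."""
        c = self.centroid()
        return max(self.ss / self.n - sum(x * x for x in c), 0.0) ** 0.5

    def modes(self):
        """Most frequent value of each discrete attribute."""
        return [cnt.most_common(1)[0][0] for cnt in self.cat_counts]
```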
16.
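Building on the concept of the attribute measure, this paper introduces the attribute recognition criterion for pattern recognition problems. The attribute clustering network method is used to solve pattern recognition problems: the attribute clustering algorithm is adopted and implemented as a program, and it has been applied with relative success to predicting trends in stock prices.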
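The abstract does not state which recognition criterion is used. Assuming the confidence-degree criterion commonly used in attribute recognition theory, with classes ordered from strongest to weakest and attribute measures summing to 1, the recognized class is the first one at which the cumulative attribute measure reaches a threshold λ (`lam`, user-chosen; values between 0.6 and 0.7 are common):

```python
def confidence_criterion(measures, lam=0.6):
    """Return the 0-based index of the recognized class: the smallest k such
    that measures[0] + ... + measures[k] >= lam. Assumes classes are ordered
    from strongest to weakest and the measures sum to 1."""
    total = 0.0
    for k, mu in enumerate(measures):
        total += mu
        if total >= lam:
            return k
    return len(measures) - 1  # guard against rounding when the sum is just below 1
```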
17.
Microarray technologies have become a widespread technique that lets biomedical researchers assess tens of thousands of gene expression values simultaneously in a single experiment. Analyzing microarray data for biological discovery requires computational tools. This research presents a novel two-dimensional hierarchical clustering technique. A review of prior work shows that clustering methods applied to gene expression data have assigned each gene to only one cluster, which leads to biological complexity. This stems mainly from the nature of proteins and their interactions: since proteins normally interact with different groups of proteins to serve different biological roles, the genes that produce these proteins are expected to co-express with more than one group of genes. Consequently, in microarray gene expression data a gene may appear in more than one cluster. The proposed two-dimensional hierarchical clustering technique performs multi-level microarray clustering in two dimensions, which can represent a gene's presence in one or more clusters, consistent with the nature of the gene and its attributes, and thereby avoid these biological complexities.
19.
Clustering is concerned with grouping a collection of input objects. Conventional clustering algorithms cluster unlabelled objects. We argue that there are useful applications that involve clustering labelled objects, and we propose an approach for clustering labelled objects that makes use of domain knowledge represented in the form of a directed acyclic graph. We also propose a set of proper axioms in logic as a basis for the proposed algorithm. We study some properties of the approach, such as order independence, and describe in detail an application of the proposed algorithm in the context of document retrieval.