首页 | 本学科首页   官方微博 | 高级检索  
文章检索
  按 检索   检索词:      
出版年份:   被引次数:   他引次数: 提示:输入*表示无穷大
  收费全文   10篇
  完全免费   2篇
  自动化技术   12篇
  2014年   2篇
  2010年   1篇
  2009年   3篇
  2008年   1篇
  2007年   1篇
  2006年   2篇
  2005年   2篇
排序方式: 共有12条查询结果,搜索用时 140 毫秒
1.
结合限制的分隔模型及K-Means算法   总被引:7,自引:0,他引:7       下载免费PDF全文
何振峰  熊范纶 《软件学报》2005,16(5):799-809
将数据对象间的关联限制与K-means算法结合可以取得较好的效果,但由于划分是由K个中心决定的,每一类仅由一个中心决定,分隔的表示方法限制了算法效果的进一步提高.基于数据对象间的两类限制,定义了数据对象和集合间的两类关联,以及集合间的3类关联,在此基础上给出了结合限制的分隔模型.在模型中,基于集合间的正关联,多个子集中心可以用来表示同一类,使划分的表示可以更为灵活、精细.基于此模型,给出了相应的算法CKS(constrained K-meanswith subsets)来生成结合限制的分隔.对3个UCI数据集的实验结果显示:在准确率及健壮性上,CKS显著优于另一个结合关联限制的K-means类算法COP-K-means,与另一个代表性的算法CCL相比,也有相当优势;在时间代价上,CKS也有一定优势.  相似文献
2.
Semi-supervised model-based document clustering: A comparative study   总被引:3,自引:0,他引:3  
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations when the set of labels available in labeled data are not complete, i.e., unlabeled data contain new classes that are not present in labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is the best when available labels are complete whereas the feedback-based approach excels when available labels are incomplete. Editor: Andrew Moore  相似文献
3.
基于互信息约束聚类的图像语义标注   总被引:2,自引:0,他引:2  
提出一种基于互信息约束聚类的图像标注算法。采用语义约束对信息瓶颈算法进行改进,并用改进的信息瓶颈算法对分割后的图像区域进行聚类,建立图像语义概念和聚类区域之间的相互关系;对未标注的图像,提出一种计算语义概念的条件概率的方法,同时考虑训练图像的先验知识和区域的低层特征,最后使用条件概率最大的语义关键字对图像区域语义自动标注。对一个包含500幅图像的图像库进行实验,结果表明,该方法比其他方法更有效。  相似文献
4.
基于贝叶斯理论的图像标注和检索   总被引:2,自引:1,他引:1       下载免费PDF全文
图像自动语义标注是基于内容图像检索中很重要且很有挑战性的工作.提出用语义约束的聚类方法对分割后的图像区域进行聚类,在图像标注阶段,使用贪心选择连接(GSJ)算法找出聚类区域的独立子集,然后使用贝叶斯理论进行语义标注.对图像进行标注以后,使用标注的关键字进行检索.在一个包含500幅图像的图像库进行实验,结果表明,提出的方法具有较好的检索性能.  相似文献
5.
6.
In order to import the domain knowledge or application-dependent parameters into the data mining systems, constraint-based mining has attracted a lot of research attention recently. In this paper, the attributes employed to model the constraints are called constraint attributes and those attributes involved in the objective function to be optimized are called optimization attributes. The constrained clustering considered in this paper is conducted in such a way that the objective function of optimization attributes is optimized subject to the condition that the imposed constraint is satisfied. Explicitly, we address the problem of constrained clustering with numerical constraints, in which the constraint attribute values of any two data items in the same cluster are required to be within the corresponding constraint range. This numerical constrained clustering problem, however, cannot be dealt with by any conventional clustering algorithms. Consequently, we devise several effective and efficient algorithms to solve such a clustering problem. It is noted that due to the intrinsic nature of the numerical constrained clustering, there is an order dependency on the process of attaining the clustering, which in many cases degrades the clustering results. In view of this, we devise a progressive constraint relaxation technique to remedy this drawback and improve the overall performance of clustering results. Explicitly, by using a smaller (tighter) constraint range in earlier iterations of merge, we will have more room to relax the constraint and seek for better solutions in subsequent iterations. It is empirically shown that the progressive constraint relaxation technique is able to improve not only the execution efficiency but also the clustering quality.  相似文献
7.
Clustering methods for data-mining problems must be extremely scalable. In addition, several data mining applications demand that the clusters obtained be balanced, i.e., of approximately the same size or importance. In this paper, we propose a general framework for scalable, balanced clustering. The data clustering process is broken down into three steps: sampling of a small representative subset of the points, clustering of the sampled data, and populating the initial clusters with the remaining data followed by refinements. First, we show that a simple uniform sampling from the original data is sufficient to get a representative subset with high probability. While the proposed framework allows a large class of algorithms to be used for clustering the sampled set, we focus on some popular parametric algorithms for ease of exposition. We then present algorithms to populate and refine the clusters. The algorithm for populating the clusters is based on a generalization of the stable marriage problem, whereas the refinement algorithm is a constrained iterative relocation scheme. The complexity of the overall method is O(kN log N) for obtaining k balanced clusters from N data points, which compares favorably with other existing techniques for balanced clustering. In addition to providing balancing guarantees, the clustering performance obtained using the proposed framework is comparable to and often better than the corresponding unconstrained solution. Experimental results on several datasets, including high-dimensional (>20,000) ones, are provided to demonstrate the efficacy of the proposed framework.
Joydeep GhoshEmail:
  相似文献
8.
Internet traffic classification is a critical and essential functionality for network management and security systems. Due to the limitations of traditional port-based and payload-based classification approaches, the past several years have seen extensive research on utilizing machine learning techniques to classify Internet traffic based on packet and flow level characteristics. For the purpose of learning from unlabeled traffic data, some classic clustering methods have been applied in previous studies but the reported accuracy results are unsatisfactory. In this paper, we propose a semi-supervised approach for accurate Internet traffic clustering, which is motivated by the observation of widely existing partial equivalence relationships among Internet traffic flows. In particular, we formulate the problem using a Gaussian Mixture Model (GMM) with set-based equivalence constraint and propose a constrained Expectation Maximization (EM) algorithm for clustering. Experiments with real-world packet traces show that the proposed approach can significantly improve the quality of resultant traffic clusters.  相似文献
9.
10.
该文提出了一种结合属性分布特征的Web模式匹配算法,属性分布特征包括属性对互斥特征和属性对共现特征。属性对互斥特征由属性对的互斥性和出现次数计算得出,这个特征隐含了属性对的语义相似程度。为了充分利用传统的属性名、属性值相似性特征,该文通过机器学习方法结合属性对互斥特征与相似性特征进行属性匹配。并以潜在的匹配属性对为基础,引入有约束的属性聚类方法进行Web模式匹配,聚类方法的约束条件来自属性对共现特征。实验结果表明,相对于仅使用相似性特征的方法,在不同的实验设置下,结合属性分布特征的Web模式匹配算法将F值提高了0.13到0.55。  相似文献
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号