Found 20 similar documents (search time: 0 ms)
1.
Unsupervised Rough Set Classification Using GAs  Total citations: 9 (self: 1, others: 9)
Pawan Lingras, Journal of Intelligent Information Systems, 2001, 16(3): 215-228
2.
3.
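A Web Search Result Clustering Method Based on Tolerance Rough Sets  Total citations: 1 (self: 0, others: 1)
Some Web clustering methods treat clusters as strictly mutually exclusive, which yields unsatisfactory clustering results. A k-means clustering method based on tolerance rough sets addresses this problem. Web documents are first represented with the vector space model, and a set of text feature terms is obtained by conventional methods. A tolerance relation over the feature terms is then constructed from the co-occurrence of certain terms, which extends the descriptive power of the terms. Finally, tolerance classes of feature terms are used to describe the similarity between documents, which yields the clustering of Web search results. A simple and intuitive T model for measuring clustering accuracy is also proposed. Experimental results show that the cluster labels produced with the tolerance relation are descriptive and easy to understand, and that the method clearly outperforms the ordinary k-means algorithm.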
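The tolerance relation described in this abstract can be sketched as follows. This is a minimal illustration of one common formulation of the tolerance rough set model, not the authors' implementation: the co-occurrence threshold `theta`, the function names, and the toy documents are all assumptions, and the abstract does not say how the relation is actually parameterized.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class.

    Assumption: two terms are tolerant of each other when they co-occur in
    at least `theta` documents; every term is tolerant of itself.
    """
    cooccurrence = {}
    terms = set()
    for doc in docs:
        terms |= doc
        for a, b in combinations(sorted(doc), 2):
            cooccurrence[(a, b)] = cooccurrence.get((a, b), 0) + 1

    classes = {term: {term} for term in terms}
    for (a, b), count in cooccurrence.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Terms whose tolerance class shares at least one term with the document."""
    return {term for term, cls in classes.items() if cls & doc}


# Hypothetical toy corpus: each document is a set of feature terms.
docs = [{"rough", "set", "cluster"}, {"rough", "set"}, {"cluster", "web"}]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation({"rough"}, classes)
```

Here a document that contains only "rough" is enriched with "set", because the two terms co-occur in two documents. Comparing enriched term sets rather than raw ones is what lets documents with no literal term overlap still count as similar.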
4.
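A Web Page Classification Method Based on Rough Sets  Total citations: 16 (self: 2, others: 16)
The rapid growth of the Internet has raised a new problem: how to find the required information effectively and quickly among the vast number of Web pages. Advances in machine learning offer a new direction for solving it. This paper applies rough set theory to Web page classification, proposes an incremental learning algorithm based on rough set decision table reduction, and uses the algorithm to implement a Web page classifier. Experimental results show that the classifier performs well.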
5.
Feature Weighting in k-Means Clustering  Total citations: 3 (self: 0, others: 3)
Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
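Ideas (i) to (iii) of this abstract, and the assignment step implied by (iv), can be illustrated with a short sketch. It is not the paper's code: the two distortion measures (squared Euclidean and a cosine-based one), the function names, and the sample data are assumptions chosen only to show how a convex combination of per-space distortions works.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distortion(a, b):
    """One minus cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distortion is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def combined_distortion(obj, centroid, distortions, weights):
    """Convex combination of per-feature-space distortions.

    `obj` and `centroid` are tuples holding one feature vector per space;
    `weights` must be non-negative and sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(
        w * d(x, c) for w, d, x, c in zip(weights, distortions, obj, centroid)
    )


def assign_to_cluster(obj, centroids, distortions, weights):
    """Index of the centroid with the smallest combined distortion."""
    return min(
        range(len(centroids)),
        key=lambda j: combined_distortion(obj, centroids[j], distortions, weights),
    )


# Hypothetical object with two feature spaces (e.g. numeric and directional).
measures = (squared_euclidean, cosine_distortion)
obj = ([0.0, 0.0], [1.0, 0.0])
centroids = [([3.0, 4.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
nearest = assign_to_cluster(obj, centroids, measures, (0.5, 0.5))
```

Step (v), choosing the weighting, would then amount to running the clustering for several candidate weight tuples and comparing the resulting within-cluster and between-cluster dispersions. That search is not shown here.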
6.
7.
8.
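Extracting Association Rules from Web Logs Based on Rough Set Theory  Total citations: 2 (self: 0, others: 2)
With the rapid development of the Internet, Web log mining, that is, discovering and analyzing useful information about users from Web logs, has become a research hotspot. Association rule based methods are an important approach to Web mining. This paper applies rough set theory to extract association rules from Web logs and uses the resulting rule set to predict user behavior. Experiments show that the prediction accuracy of the method is better than that of existing methods.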
9.
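Web Image Clustering Based on Multiple-Instance Learning  Total citations: 2 (self: 0, others: 2)
Multiple-instance learning (MIL) is an active research topic in image classification and automatic annotation systems. Most current MIL algorithms are supervised. For the unsupervised setting, six multiple-instance clustering algorithms are proposed within the frameworks of the EM algorithm and heuristic iterative optimization, and they are used to cluster images from a real Web environment in order to analyze users' search interests. Because an image contains several regions, each region can be regarded as an instance, and the regions belonging to the same image form a bag. The problem of understanding the semantic content of an image is thus transformed into a multiple-instance learning problem. Comparative experiments on the classic MIL benchmark MUSK and on a Web image collection show that the proposed multiple-instance clustering algorithms achieve excellent clustering performance.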
10.
The basic contribution of this paper is the presentation of two methods that can be used to design a practical software change classification system based on data mining methods from rough set theory. These methods incorporate recent advances in rough set theory related to coping with the uncertainty in making change decisions either during software development or during post-deployment of a software system. Two well-known software engineering data sets have been used as means of benchmarking the proposed classification methods, and also to facilitate comparison with other published studies on the same data sets. Two technologies in computational intelligence (CI) are used in the design of the software change classification systems described in this paper, namely, rough sets (a granular computing technology) and genetic algorithms. Using a 10-fold cross-validated paired t-test, this paper also compares the rough set classification learning method with the Waikato Environment for Knowledge Analysis (WEKA) classification learning method. The contribution of this paper is the presentation of two models for software change classification based on two CI technologies.
11.
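Intelligent search is one direction in which search engines for commercial websites are developing. Its distinguishing feature is that it caters to each user's interests and presents the user with relevant Web pages that are as precise as possible. Rough set theory is a relatively new mathematical tool for handling vagueness and imprecision, and it provides a sound theoretical basis for data mining and knowledge discovery in particular. This paper first describes the basic concepts of upper and lower approximations and approximation accuracy in rough set theory. It then applies these concepts to a simplified WWW model: the retrieved Web pages are organized into a user interest tree, and the page nodes of the tree are reduced so that the pages presented to the user are as accurate as possible.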
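The lower approximation, upper approximation, and approximation accuracy mentioned in this abstract have standard definitions, sketched below. The page identifiers, the `topic` attribute, and the set of pages the user is interested in are hypothetical illustration data; the paper's user interest tree and its reduction step are not reproduced.

```python
def equivalence_classes(universe, key):
    """Partition `universe` into classes of objects that share the same key."""
    classes = {}
    for obj in universe:
        classes.setdefault(key(obj), set()).add(obj)
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) approximations of `target`.

    Lower: union of the classes fully contained in the target.
    Upper: union of the classes that intersect the target.
    """
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


def accuracy(lower, upper):
    """Approximation accuracy |lower| / |upper|; taken as 1.0 for an empty
    upper approximation (a convention assumed here)."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical pages described by a single attribute (their topic).
topic = {"p1": "sports", "p2": "sports", "p3": "news",
         "p4": "news", "p5": "tech", "p6": "tech"}
interesting = {"p1", "p2", "p3"}  # pages the user is assumed to care about

classes = equivalence_classes(topic, key=topic.get)
lower, upper = approximations(classes, interesting)
quality = accuracy(lower, upper)
```

In this toy case the sports pages are certainly of interest (lower approximation), the news pages are only possibly of interest (they appear in the upper approximation but not the lower one), and the accuracy is 2/4.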
12.
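Cao Zhimei (曹志梅), Computer Engineering and Applications (《计算机工程与应用》), 2005, 41(21): 215-218
Using rough set theory and fuzzy clustering, this paper analyzes library user evaluation data in order to find association rules among the user evaluation indicators and to identify the key indicators.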
13.
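A Rough Set Based Mining Algorithm with Support Information  Total citations: 1 (self: 0, others: 1)
Mining rules directly with rough sets can hardly avoid rules that arise by chance, and both finding all reducts and finding a minimal reduct are NP-hard. In view of this, the paper proposes DR, a heuristic algorithm for finding concise rules. Taking the characteristics of practical data mining into account, the algorithm makes full use of attribute support information to mine a rule set with high support and short description length directly from the data table. DR is computationally simple; its efficiency depends mainly on the number of attributes, and it is efficient when the number of distinct values per attribute is not large.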
14.
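Analysis and Comparison of Web Text Clustering Algorithms  Total citations: 2 (self: 0, others: 2)
With the development of computer networks, text resources of all kinds are growing at an astonishing rate, which makes information hard to find and leaves it underused. Fast, high-quality Web text clustering can help users obtain the information resources they need from the Internet conveniently and quickly. The article studies key techniques of Web text clustering such as Web page collection, noise removal, word segmentation, and feature representation, and analyzes and compares commonly used Web text clustering algorithms. The results of the comparison are of practical value for applying text clustering algorithms.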
15.
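Research on Fuzzy Fault Criteria Based on Information Entropy Fuzzy Clustering and Rough Set Theory  Total citations: 1 (self: 0, others: 1)
Gradual faults in complex electronic systems are one of the difficulties of fault prediction. To address this problem, a fault decision criterion method based on information entropy fuzzy clustering and rough set theory is proposed. The method has two main steps. First, with information entropy as the clustering criterion, a hierarchical (dendrogram) method determines the number of clusters, and FCM fuzzy clustering is then used to construct the fault decision table. Second, rough set theory is used to simplify and minimize the fault decision table, yielding reduced fault decision criteria with evaluations. The method remedies shortcomings of general fuzzy clustering algorithms and overcomes the difficulty of obtaining and updating the fault decision table when prior information and knowledge are inaccurate, incomplete, or inconsistent. A practical example shows that information entropy fuzzy clustering gives higher-quality and more objective clustering than general fuzzy clustering.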
16.
Speed-density relationships are used by mesoscopic traffic simulators to represent traffic dynamics. While classical speed-density relationships provide useful insights into the traffic dynamics problem, they may be restrictive for such applications. This paper addresses the problem of calibrating speed-density relationship parameters using data mining techniques, and proposes a novel hierarchical clustering algorithm based on K-means clustering. By combining K-means with agglomerative hierarchical clustering, the proposed algorithm is able to reduce the early-stage errors inherent in agglomerative hierarchical clustering, resulting in improved clustering performance. Moreover, in order to improve the precision of parametric calibration, densities and flows are utilized as variables. The proposed approach is tested against sensor data captured from the 3rd Ring Road of Beijing. The testing results show that the performance of our algorithm is better than existing solutions.
17.
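Research on Data Mining Algorithms Based on Rough Set Theory and Their Applications  Total citations: 4 (self: 0, others: 4)
The article discusses rough set theory and its applications. Building on an analysis and synthesis of data mining algorithms based on rough set theory, it proposes a new genetic algorithm based mining method and analyzes the application model, application domains, and methodological issues. An application example shows that the methods and techniques presented are feasible and of considerable reference value.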
18.
In this paper, the solutions produced by the fuzzy c-means algorithm for a general class of problems are examined and a method to test for the local optimality of such solutions is established. An equivalent mathematical program is defined for the c-means problem utilizing a generalized norm, then the properties of the resulting optimization problem are investigated. It is shown that the gradient of the resulting objective function at the solution produced by the c-means algorithm in this case takes a special structure which can be used in terminating the algorithm. Moreover, the local optimality of the solution obtained is checked utilizing the Hessian of the criterion function. The solution is a local minimum point if the Hessian matrix at this point is positive semidefinite. Simple rules are proposed to help in checking the definiteness of the matrix.
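For orientation, the alternating updates that the fuzzy c-means algorithm iterates can be sketched as below. This is the textbook form with the Euclidean norm, which is an assumption: the paper works with a generalized norm, and its gradient-based termination rule and Hessian test are not implemented here. The function names, the fuzzifier default `m=2.0`, and the sample points are illustrative.

```python
def update_memberships(points, centers, m=2.0):
    """Membership matrix U[k][i] of point k in cluster i (fuzzifier m > 1)."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 for c in centers]
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            # The point coincides with one or more centers: share the full
            # membership among those centers to avoid dividing by zero.
            row = [1.0 / len(zero) if i in zero else 0.0
                   for i in range(len(centers))]
        else:
            row = [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, m=2.0):
    """Cluster centers as membership-weighted means of the points.

    Assumes every cluster has at least one non-zero membership.
    """
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


# Hypothetical one-dimensional data and initial centers.
points = [[0.0], [1.0], [3.0]]
U = update_memberships(points, centers=[[0.0], [2.0]])
new_centers = update_centers(points, U)
```

A plain implementation repeats the two updates until the memberships stop changing by more than a tolerance. The abstract's point is that the structure of the gradient at such a fixed point gives a principled stopping criterion, and that the Hessian can then be used to check local optimality.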
19.
Harmony K-means algorithm for document clustering  Total citations: 2 (self: 0, others: 2)
Fast and high quality document clustering is a crucial task in organizing information, search engine results, enhancing web crawling, and information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm can generate a local optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) that deals with document clustering based on the Harmony Search (HS) optimization method. It is proved by means of finite Markov chain theory that the HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we have applied HKA to some standard datasets. We also compare the HKA with other meta-heuristic and model-based document clustering approaches. Experimental results reveal that the HKA algorithm converges to the best known optimum faster than other methods and the quality of clusters is comparable.
20.
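A Particle Swarm Clustering Algorithm with a Mutation Operation  Total citations: 1 (self: 1, others: 0)
To address the premature convergence and slow convergence of the basic particle swarm algorithm, a particle swarm clustering algorithm with a mutation operation is proposed. The algorithm applies mutation to a swarm that shows premature convergence so that it can escape from local optima. Tests on the Iris plant sample data show that the algorithm has good global convergence and a relatively fast convergence speed.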