1.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory and on the idea that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. Unlike many clustering algorithms, the outcome of this algorithm does not depend on the initial prototype locations. As an application, the algorithm is used to enhance the Hough transform technique.
2.
3.
B. Boutsinas D. K. Tasoulis M. N. Vrahatis 《Pattern Recognition and Image Analysis》2006,16(2):143-154
Clustering is the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups (clusters). A fundamental and unresolved issue in cluster analysis is to determine how many clusters are present in a given set of patterns. In this paper, we present the z-windows clustering algorithm, which aims to address this problem using a windowing technique. Extensive empirical tests that illustrate the efficiency and the accuracy of the proposed method are presented.
The text was submitted by the authors in English.
Basilis Boutsinas received his diploma in Computer Engineering and Informatics in 1991 from the University of Patras, Greece. He also conducted
studies in Electronics Engineering at the Technical Education Institute of Piraeus, Greece, and Pedagogics at the Pedagogical
Academy of Lamia, Greece. He received his PhD on Knowledge Representation from the University of Patras in 1997. He has been
an assistant professor in the Department of Business Administration at the University of Patras since 2001. His primary research
interests include data mining, business intelligence, knowledge representation techniques, nonmonotonic reasoning, and parallel
AI.
Dimitris K. Tasoulis received his diploma in Mathematics from the University of Patras, Greece, in 2000. He attained his MSc degree in 2004 from
the postgraduate course “Mathematics of Computers and Decision Making” from which he was awarded a postgraduate fellowship.
Currently, he is a PhD candidate in the same course. His research activities focus on data mining, clustering, neural networks,
parallel algorithms, and evolutionary computation. He is coauthor of more than ten publications.
Michael N. Vrahatis is with the Department of Mathematics at the University of Patras, Greece. He received the diploma and PhD degree in Mathematics
from the University of Patras in 1978 and 1982, respectively. He was a visiting research fellow at the Department of Mathematics,
Cornell University (1987–1988) and a visiting professor to the INFN (Istituto Nazionale di Fisica Nucleare), Bologna, Italy
(1992, 1994, and 1998); the Department of Computer Science, Katholieke Universiteit Leuven, Belgium (1999); the Department
of Ocean Engineering, Design Laboratory, MIT, Cambridge, MA, USA (2000); and the Collaborative Research Center “Computational
Intelligence” (SFB 531) at the Department of Computer Science, University of Dortmund, Germany (2001). He was a visiting researcher
at CERN (European Organization of Nuclear Research), Geneva, Switzerland (1992) and at INRIA (Institut National de Recherche
en Informatique et en Automatique), France (1998, 2003, and 2004). He is the author of more than 250 publications (more than
110 of which are published in international journals) in his research areas, including computational mathematics, optimization,
neural networks, evolutionary algorithms, and artificial intelligence. His research publications have received more than 600
citations. He has been a principal investigator of several research grants from the European Union, the Hellenic Ministry
of Education and Religious Affairs, and the Hellenic Ministry of Industry, Energy, and Technology. He is among the founders
of the “University of Patras Artificial Intelligence Research Center” (UPAIRC), established in 1997, where currently he serves
as director. He is the founder of the Computational Intelligence Laboratory (CI Lab), established in 2004 at the Department
of Mathematics of University of Patras, where currently he serves as director.
4.
Enhancing density-based data reduction using entropy (Total citations: 1; self-citations: 0; citations by others: 1)
Data reduction algorithms determine a small data subset from a given large data set. In this article, new types of data reduction criteria, based on the concept of entropy, are first presented. These criteria can evaluate the data reduction performance in a sophisticated and comprehensive way. As a result, new data reduction procedures are developed. Using the newly introduced criteria, the proposed data reduction scheme is shown to be efficient and effective. In addition, an outlier-filtering strategy with negligible computational cost is developed. In some instances, this strategy can substantially improve the performance of supervised data analysis. The proposed procedures are compared with related techniques in two types of application: density estimation and classification. Extensive comparative results are included to corroborate the contributions of the proposed algorithms.
5.
Feng Wang Zhiyi Lin Cheng Yang Yuanxiang Li 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(5):907-915
This paper proposes a new approach named SGMIEC in the field of estimation of distribution algorithms (EDAs). Current EDAs require much time in the statistical learning process because the relationships among the variables are too complicated. The proposed approach therefore deploys the selfish gene theory (SG) and sets up a mutual information and entropy based cluster (MIEC) model with an incremental learning and resample scheme to optimize the probability distribution of the virtual population. Experimental results on several benchmark problems demonstrate that, compared with BMDA, COMIT and MIMIC, SGMIEC often performs better in convergence reliability, convergence speed, and convergence process.
6.
Stereoscopic motion is an approach for comparing image change due to motion in a stereo pair of image sequences. Qualitatively, the relative image change shows that an object point is approaching, receding, or remaining at constant depth. Quantitatively, the relative change predicts where the object point will pass with respect to the camera system.
7.
Seong S. Chae Janice L. DuBien William D. Warde 《Computational statistics & data analysis》2006,50(12):3531-3546
Distributional and asymptotic results on the moment of Rand's Ck statistic were derived by DuBien and Warde [1981. Some distributional results concerning a comparative statistic used in cluster analysis. ASA Proceedings of the Social Statistics Section, 309–313.]. Based on those results, a method to predict the number of clusters is suggested by applying various agglomerative clustering algorithms. In the procedure, methods using different indexes are examined and compared based on the concept of agreement (or disagreement) between clusterings generated by different clustering algorithms on the same set of data. Our method, which has practical generality, works better than the other methods and assigns statistical meaning to Ck values in determining the number of clusters from the comparison.
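The notion of agreement between two clusterings used in this abstract can be illustrated with a short sketch. It assumes that the statistic in question is the pair-counting agreement measure commonly known as the Rand index; the function name `rand_index` is hypothetical, and the paper's distributional results and prediction procedure are not reproduced here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees when both clusterings put the two points together,
    or both clusterings keep them apart. Assumed form of the statistic.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)
```

The value depends only on the partitions, not on the label names, so two clusterings that differ only by a relabeling score 1.0.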
8.
In this paper, we present a new method, called the Spectral Global Silhouette method (GS), to calculate the optimal number of clusters in a dataset using a Spectral Clustering algorithm. It combines a Silhouette Validity Index with the concept of Local Scaling. The GS algorithm is first tested on synthetic data and then applied to real data for an image segmentation task. In addition, three new methods for image segmentation and two new ways to calculate the optimal number of groups in an image are proposed. Our experiments show promising performance of the proposed algorithms.
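As a rough illustration of how a silhouette index scores a candidate clustering, the sketch below computes the mean silhouette value over all points. It assumes plain Euclidean distance and a simple average over points; the spectral clustering step and Local Scaling described in the abstract are not reproduced, and `mean_silhouette` is a hypothetical name.

```python
import math
from collections import defaultdict


def mean_silhouette(points, labels):
    """Mean silhouette value of a labeling (higher is better).

    points: sequence of equal-length coordinate tuples.
    labels: cluster label for each point.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

One way to use such a score is to cluster the data for several candidate numbers of clusters and keep the number whose labeling scores highest.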
9.
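An image retrieval method based on distance-distribution information entropy is proposed. The method first partitions the target region of an image into subregions, then extracts the information entropy of each subregion as a feature describing the image shape, and finally uses the Euclidean distance to measure the similarity between entropy vectors. Experimental results show that distance-distribution information entropy effectively characterizes the shape features of binary images and has good translation, rotation, and scale invariance, and that the retrieval results agree with human visual perception.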
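The last two steps of this abstract (an entropy value per subregion, then Euclidean distance between entropy vectors) can be sketched as follows. This is an illustration under assumptions: each subregion is taken to be summarized by a histogram of counts, Shannon entropy in bits is used, and the function names are hypothetical. How the paper partitions the region and builds its distance distribution is not stated in the abstract and is not reproduced.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def entropy_vector(region_histograms):
    """One entropy value per subregion; the vector is the shape feature."""
    return [shannon_entropy(h) for h in region_histograms]


def euclidean(u, v):
    """Euclidean distance between two entropy vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For retrieval, a query image's entropy vector would be compared against the stored vectors, and the images with the smallest distances returned first.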
10.
O. N. Granichin D. S. Shalymov R. Avros Z. Volkovich 《Automation and Remote Control》2011,72(4):754-765
Clustering is actively studied in such fields as statistics, pattern recognition, machine learning, and others. A new randomized algorithm for finding the number of clusters in a data set is proposed and substantiated; its efficiency is demonstrated by simulation examples on synthetic data with thousands of clusters.
11.
We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, for multivariate distributions and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of data sets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.
12.
A pulsed laser system was flown over a forested area in Pennsylvania which exhibited a wide range of canopy closure conditions. The lasing system acts as the ultraviolet light equivalent of radar, sensing not only the distance to the top of the forest canopy, but also the range to the forest floor. The data were analyzed to determine which components of the laser data could explain the variability in crown closure along the flight transect. Results indicated that canopy closure was most strongly related to the penetration capability of the laser pulse. Pulses were attenuated more quickly in a dense canopy. Hence the inability to find a strong ground return in the laser data after initially sensing the top of the canopy connoted dense canopy cover. Photogrammetrically acquired tree heights were compared to laser estimates; average heights differed by less than 1 m. The results indicated that the laser system may be used to remotely sense the vertical forest canopy profile. Elements of this profile are linearly related to crown closure and may be used to assess tree height.
13.
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, in k-modes-type algorithms, clustering performance depends on the initial cluster centers, and the number of clusters needs to be known or given in advance. This paper proposes a novel initialization method for categorical data that is applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion to find candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and can be applied to large data sets because of its linear time complexity with respect to the number of data points.
14.
An algorithm for finding the optimal number of clusters using FCM (Total citations: 2; self-citations: 0; citations by others: 2)
张姣玲 《计算机工程与应用》2008,44(22):65-67
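In algorithms that use FCM to find the optimal number of clusters, the cluster centers must be reinitialized each time FCM is applied, and FCM is sensitive to the initial cluster centers, which makes such algorithms very unstable. The algorithm is improved by introducing a merge function, so that the cluster centers for (c-1) clusters depend on the cluster centers for c clusters. Simulation experiments show that the new algorithm is stable and runs markedly faster than the old one.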
15.
Katsuhiko Takahashi 《Computers & Industrial Engineering》1994,27(1-4):213-216
This paper addresses the problem of determining the number of Kanbans for unbalanced serial production systems under stochastic conditions. Simulation experiments on the constructed model of the Kanban system yield fundamental information about the problem, and an algorithm to determine the optimal number of Kanbans is proposed.
16.
An entropy-weight fusion method for multi-sensor data (Total citations: 1; self-citations: 0; citations by others: 1)
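To address the data fusion problem in experiments where multiple sensors measure multiple characteristic indicators, a new fusion algorithm for multi-sensor data is proposed. The method uses the max-min method to determine the fuzzy similarity matrix among the sensors' measurement data and defines entropy weights to determine each sensor's fusion weight. This overcomes the subjective influence of the relation matrix in earlier methods. Analysis of experimental data shows that the algorithm is simple, that the meaning of the data fusion is clear, and that the loss of valid data can be avoided.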
17.
This paper addresses the problem of computing the value of information in settings in which the people using an autonomous-agent system have access to information not directly available to the system itself. To know whether to interrupt a user for this information, the agent needs to determine its value. The fact that the agent typically does not know the exact information the user has and so must evaluate several alternative possibilities significantly increases the complexity of the value-of-information calculation. The paper addresses this problem as it arises in multi-agent task planning and scheduling with architectures in which information about the task schedule resides in a separate “scheduler” module. For such systems, calculating the value to overall agent performance of potential new information requires that the system component that interacts with the user query the scheduler. The cost of this querying and inter-module communication itself substantially affects system performance and must be taken into account. The paper provides a decision-theoretic algorithm for determining the value of information the system might acquire, query-reduction methods that decrease the number of queries the algorithm makes to the scheduler, and methods for ordering the queries to enable faster decision-making. These methods were evaluated in the context of a collaborative interface for an automated scheduling agent. Experimental results demonstrate the significant decrease achieved by using the query-reduction methods in the number of queries needed for reasoning about the value of information. They also show the ordering methods substantially increase the rate of value accumulation, enabling faster determination of whether to interrupt the user.
18.
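To effectively address topic drift in Web information extraction, a calculation method that reflects the information entropy of Web pages more accurately, called hybrid entropy, is proposed. The method considers the information blocks whose entropy is to be computed in the setting of a multi-page website. It accounts for the influence of intra-page information on the entropy calculation as well as the influence of identical information distributions across pages generated from templates, thereby ensuring the accuracy of the entropy calculation. The method is applied to computing the information entropy of information blocks in information extraction, and the simulation results are compared with other algorithms. The results show that the accuracy of the computed entropy and the discrimination between topic-relevant and topic-irrelevant information blocks are better than with other methods.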
19.
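Decision trees are a commonly used classification method in data mining. When constructing a decision tree, the measure used to select the splitting attribute at each node directly affects classification performance. A decision tree learning algorithm is proposed that measures attribute importance with a rough-set-based attribute frequency function and uses this measure both to select branch-splitting attributes and to pre-prune the tree. In addition, to handle numerical attributes, an improved information-entropy discretization algorithm for numerical attributes is proposed that uses statistical properties of the data set as heuristic knowledge. Experimental results show that the new discretization method is markedly more efficient computationally, and that, compared with information-entropy-based decision tree algorithms, the new decision tree algorithm yields simpler trees and effectively improves classification performance.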
20.
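Support vector data description classification based on information entropy (Total citations: 1; self-citations: 0; citations by others: 1)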
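Existing support vector data description (SVDD) methods are often blind and biased when applied to classification problems. Building on a study of information entropy and SVDD classification theory, the E-SVDD algorithm is proposed to improve two-class classification. First, the entropy of each of the two classes of sample data is computed; next, the entropy values determine which class is placed inside the sphere; finally, the C value in the SVDD algorithm is redefined by combining the sample sizes of the two classes with the distribution information provided by their respective entropies. Experiments on artificial sample sets and UCI data sets verify the feasibility and effectiveness of the algorithm.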