1.

Determining number of clusters and prototype locations via multi-scale clustering

Eiji Nakamura Nasser Kehtarnavaz 《Pattern recognition letters》1998,19(14):1265-1283

In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, builds on scale-space theory, on the premise that any prominent data structure ought to survive over many scales. The number of clusters and the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations, which affect the outcome of many clustering algorithms. As an application, the algorithm is used to enhance the Hough transform technique.
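The premise that prominent structure survives over many scales can be illustrated with a small sketch. This is not the authors' algorithm: it assumes one-dimensional data, a Gaussian kernel density estimate, and a simplified lifetime criterion (the number of log-spaced scales at which a given mode count persists), and it ignores the drift speed criterion entirely. The names `count_modes` and `estimate_num_clusters`, and all parameter defaults, are hypothetical.

```python
import math
from collections import Counter


def count_modes(data, sigma, grid_size=400):
    """Count local maxima of a Gaussian kernel density estimate at scale sigma."""
    lo = min(data) - 3 * sigma
    hi = max(data) + 3 * sigma
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    # Unnormalised density: only the location of the maxima matters here.
    density = [
        sum(math.exp(-0.5 * ((x - d) / sigma) ** 2) for d in data) for x in xs
    ]
    modes = 0
    for i in range(1, grid_size - 1):
        if density[i] > density[i - 1] and density[i] >= density[i + 1]:
            modes += 1
    return modes


def estimate_num_clusters(data, sigma_min=0.05, sigma_max=10.0, num_scales=30):
    """Return the mode count that persists over the most log-spaced scales."""
    ratio = (sigma_max / sigma_min) ** (1.0 / (num_scales - 1))
    lifetimes = Counter()
    for k in range(num_scales):
        sigma = sigma_min * ratio ** k
        lifetimes[count_modes(data, sigma)] += 1
    best_count, _ = lifetimes.most_common(1)[0]
    return best_count


# Three tight groups of points (illustrative data, not from the paper).
points = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1, 9.9, 10.0, 10.1]
estimate = estimate_num_clusters(points)
```

The scale range is an assumption that has to bracket the structure of interest; a range that is too narrow would cut short the lifetime of the true cluster count.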

2.

An information-entropy-based method for cleaning uncertain data (total citations: 1; self-citations: 0; citations by others: 1)

覃远翔 段亮 岳昆 《计算机应用》 (Journal of Computer Applications) 2013,33(9):2490-2492

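Uncertain data often contain anomalous records that cause the corresponding query results to be wrong. To address this problem, an information-entropy-based cleaning method for uncertain data is proposed to reduce anomalous data and improve the quality of uncertain data. The method first uses information entropy to measure the uncertainty of the data, then applies statistical methods to compute a confidence interval for the uncertain data, and finally removes the data that fall outside the confidence interval. Experimental results confirm the efficiency and effectiveness of the method.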
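The three steps in this abstract (measure uncertainty with entropy, compute a confidence interval, drop what falls outside) can be sketched as follows. The abstract does not give the data model or the interval formula, so this sketch assumes that each uncertain record is a discrete probability distribution and that the interval is the mean entropy plus or minus `z` sample standard deviations. The function names and the sample records are hypothetical.

```python
import math
import statistics


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def clean_by_entropy(records, z=1.96):
    """Keep records whose entropy lies inside mean +/- z * standard deviation.

    Assumption: the interval is a normal-style interval on the entropies;
    the paper may define its confidence interval differently.
    """
    entropies = [shannon_entropy(r) for r in records]
    mean = statistics.mean(entropies)
    spread = statistics.stdev(entropies)
    low, high = mean - z * spread, mean + z * spread
    return [r for r, h in zip(records, entropies) if low <= h <= high]


# Nine fairly certain records and one maximally uncertain record (made up).
records = [[0.9, 0.1]] * 9 + [[0.125] * 8]
cleaned = clean_by_entropy(records)
```

In this made-up data set the uniform eight-way record has an entropy of 3 bits, far above the others, so it is the only record removed.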

3.

Estimating the number of clusters using a windowing technique

B. Boutsinas D. K. Tasoulis M. N. Vrahatis 《Pattern Recognition and Image Analysis》2006,16(2):143-154

Clustering is the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups (clusters). A fundamental and unresolved issue in cluster analysis is to determine how many clusters are present in a given set of patterns. In this paper, we present the z-windows clustering algorithm, which aims to address this problem using a windowing technique. Extensive empirical tests that illustrate the efficiency and the accuracy of the proposed method are presented. The text was submitted by the authors in English.

Basilis Boutsinas received his diploma in Computer Engineering and Informatics in 1991 from the University of Patras, Greece. He also conducted studies in Electronics Engineering at the Technical Education Institute of Piraeus, Greece, and Pedagogics at the Pedagogical Academy of Lamia, Greece. He received his PhD on Knowledge Representation from the University of Patras in 1997. He has been an assistant professor in the Department of Business Administration at the University of Patras since 2001. His primary research interests include data mining, business intelligence, knowledge representation techniques, nonmonotonic reasoning, and parallel AI.

Dimitris K. Tasoulis received his diploma in Mathematics from the University of Patras, Greece, in 2000. He attained his MSc degree in 2004 from the postgraduate course “Mathematics of Computers and Decision Making” from which he was awarded a postgraduate fellowship. Currently, he is a PhD candidate in the same course. His research activities focus on data mining, clustering, neural networks, parallel algorithms, and evolutionary computation. He is coauthor of more than ten publications.

Michael N. Vrahatis is with the Department of Mathematics at the University of Patras, Greece. He received the diploma and PhD degree in Mathematics from the University of Patras in 1978 and 1982, respectively. He was a visiting research fellow at the Department of Mathematics, Cornell University (1987–1988) and a visiting professor to the INFN (Istituto Nazionale di Fisica Nucleare), Bologna, Italy (1992, 1994, and 1998); the Department of Computer Science, Katholieke Universiteit Leuven, Belgium (1999); the Department of Ocean Engineering, Design Laboratory, MIT, Cambridge, MA, USA (2000); and the Collaborative Research Center “Computational Intelligence” (SFB 531) at the Department of Computer Science, University of Dortmund, Germany (2001). He was a visiting researcher at CERN (European Organization of Nuclear Research), Geneva, Switzerland (1992) and at INRIA (Institut National de Recherche en Informatique et en Automatique), France (1998, 2003, and 2004). He is the author of more than 250 publications (more than 110 of which are published in international journals) in his research areas, including computational mathematics, optimization, neural networks, evolutionary algorithms, and artificial intelligence. His research publications have received more than 600 citations. He has been a principal investigator of several research grants from the European Union, the Hellenic Ministry of Education and Religious Affairs, and the Hellenic Ministry of Industry, Energy, and Technology. He is among the founders of the “University of Patras Artificial Intelligence Research Center” (UPAIRC), established in 1997, where currently he serves as director. He is the founder of the Computational Intelligence Laboratory (CI Lab), established in 2004 at the Department of Mathematics of University of Patras, where currently he serves as director.

4.

Enhancing density-based data reduction using entropy (total citations: 1; self-citations: 0; citations by others: 1)

Huang D Chow TW 《Neural computation》2006,18(2):470-495

Data reduction algorithms determine a small data subset from a given large data set. In this article, new types of data reduction criteria, based on the concept of entropy, are first presented. These criteria can evaluate the data reduction performance in a sophisticated and comprehensive way. As a result, new data reduction procedures are developed. Using the newly introduced criteria, the proposed data reduction scheme is shown to be efficient and effective. In addition, an outlier-filtering strategy, which is computationally insignificant, is developed. In some instances, this strategy can substantially improve the performance of supervised data analysis. The proposed procedures are compared with related techniques in two types of application: density estimation and classification. Extensive comparative results are included to corroborate the contributions of the proposed algorithms.

5.

Using selfish gene theory to construct mutual information and entropy based clusters for bivariate optimizations

Feng Wang Zhiyi Lin Cheng Yang Yuanxiang Li 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(5):907-915

This paper proposes a new approach named SGMIEC in the field of estimation of distribution algorithms (EDAs). Current EDAs require much time in the statistical learning process because the relationships among the variables are too complicated. In this approach, the selfish gene theory (SG) is deployed, and a mutual information and entropy based cluster (MIEC) model with an incremental learning and resample scheme is set to optimize the probability distribution of the virtual population. Experimental results on several benchmark problems demonstrate that, compared with BMDA, COMIT and MIMIC, SGMIEC often performs better in convergent reliability, convergent velocity and convergent process.

6.

Determining object translation information using stereoscopic motion (total citations: 1; self-citations: 0; citations by others: 1)

Mutch KM 《IEEE transactions on pattern analysis and machine intelligence》1986,(6):750-755

Stereoscopic motion is an approach for comparing image change due to motion in a stereo pair of image sequences. Qualitatively, the relative image change shows that an object point is approaching, receding, or remaining at constant depth. Quantitatively, the relative change predicts where the object point will pass with respect to the camera system.

7.

A method of predicting the number of clusters using Rand's statistic (total citations: 1; self-citations: 0; citations by others: 1)

Seong S. Chae Janice L. DuBien William D. Warde 《Computational statistics & data analysis》2006,50(12):3531-3546

Distributional and asymptotic results on the moment of Rand's C_k statistic were derived by DuBien and Warde [1981. Some distributional results concerning a comparative statistic used in cluster analysis. ASA Proceedings of the Social Statistics Section, 309–313.]. Based on those results, a method to predict the number of clusters is suggested by applying various agglomerative clustering algorithms. In the procedure, the methods using different indexes are examined and compared based on the concept of agreement (or disagreement) between clusterings generated by different clustering algorithms on the set of data. Our method, which has practical generality, works better than the other methods and assigns statistical meaning to C_k values in determining the number of clusters from the comparison.
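The notion of agreement between two clusterings that underlies Rand's statistic can be made concrete with a short sketch. It computes the plain Rand index, the fraction of point pairs that both clusterings treat the same way (together in both, or apart in both). The prediction procedure of the paper, which compares clusterings from different agglomerative algorithms at each candidate number of clusters, is not reproduced here, and the function name `rand_index` is an assumption.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if len(labels_a) < 2:
        raise ValueError("at least two points are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Same partition under different label names: full agreement.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Crossed partitions: only 2 of the 6 pairs are treated alike.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on the partitions, not on the label names, which is why the first comparison gives full agreement.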

8.

Determination of the optimal number of clusters using a spectral clustering optimization

《Expert systems with applications》2016

In this paper, we present a new method, called the Spectral Global Silhouette method (GS), to calculate the optimal number of clusters in a dataset using a Spectral Clustering algorithm. It combines a Silhouette Validity Index with the concept of Local Scaling. The GS algorithm is first tested on synthetic data. It is then applied to real data for an image segmentation task. In addition, three new methods for image segmentation and two new ways to calculate the optimal number of groups in an image are proposed. Our experiments have shown a promising performance of the proposed algorithms.
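How a silhouette index can rank candidate numbers of clusters is sketched below. The sketch covers only the selection step: it assumes one-dimensional points and candidate labelings supplied by hand, whereas the paper obtains its partitions from spectral clustering with local scaling, which is not implemented here. The function name and the data are hypothetical.

```python
def mean_silhouette(points, labels):
    """Mean silhouette width of 1-D points under a labelling.

    Assumes at least two clusters and no cluster made of identical points.
    """
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    scores = []
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(x - y) for y in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [0, 1, 10, 11, 20, 21]
candidates = {
    2: [0, 0, 0, 0, 1, 1],  # hand-made 2-cluster labelling
    3: [0, 0, 1, 1, 2, 2],  # hand-made 3-cluster labelling
}
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
```

Here the three-cluster labelling wins because every point is much closer to its own group than to the nearest other group.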

9.

Trademark image retrieval based on distance distribution information entropy

孙强强 陈才扣 刘永俊 黄建平 《计算机工程与应用》 (Computer Engineering and Applications) 2007,43(36):71-73

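An image retrieval method based on distance distribution information entropy is proposed. The method first partitions the target region of the image into sub-regions, then extracts the information entropy of each sub-region as a feature describing the image shape, and finally measures the similarity between entropy vectors with the Euclidean distance. Experimental results show that distance distribution information entropy effectively captures the shape features of binary images, has good invariance to translation, rotation, and scale, and yields retrieval results consistent with human visual perception.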

10.

A randomized algorithm for estimating the number of clusters

O. N. Granichin D. S. Shalymov R. Avros Z. Volkovich 《Automation and Remote Control》2011,72(4):754-765

Clustering is actively studied in fields such as statistics, pattern recognition, and machine learning. A new randomized algorithm is suggested and established for finding the number of clusters in a set of data. Its efficiency is demonstrated by examples of simulation modeling on synthetic data with thousands of clusters.

11.

The evaluation of data sources using multivariate entropy tools

《Expert systems with applications》2017

We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, for multivariate distributions and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of data sets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.

12.

Determining forest canopy characteristics using airborne laser data (total citations: 3; self-citations: 0; citations by others: 3)

Ross Nelson William Krabill Gordon MacLean 《Remote sensing of environment》1984,15(3):201-212

A pulsed laser system was flown over a forested area in Pennsylvania which exhibited a wide range of canopy closure conditions. The lasing system acts as the ultraviolet light equivalent of radar, sensing not only the distance to the top of the forest canopy, but also the range to the forest floor. The data were analyzed to determine which components of the laser data could explain the variability in crown closure along the flight transect. Results indicated that canopy closure was most strongly related to the penetration capability of the laser pulse. Pulses were attenuated more quickly in a dense canopy. Hence the inability to find a strong ground return in the laser data after initially sensing the top of the canopy connoted dense canopy cover. Photogrammetrically acquired tree heights were compared to laser estimates; average heights differed by less than 1 m. The results indicated that the laser system may be used to remotely sense the vertical forest canopy profile. Elements of this profile are linearly related to crown closure and may be used to assess tree height.

13.

An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Liang Bai Jiye Liang Chuangyin Dang 《Knowledge-Based Systems》2011,24(6):785-795

The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, in k-modes-type algorithms, clustering performance depends on the initial cluster centers, and the number of clusters needs to be known or given in advance. This paper proposes a novel initialization method for categorical data, which is applied to the k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion to find candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and can be applied to large data sets because of its linear time complexity with respect to the number of data points.

14.

An algorithm for finding the optimal number of clusters using FCM (total citations: 2; self-citations: 0; citations by others: 2)

张姣玲 《计算机工程与应用》 (Computer Engineering and Applications) 2008,44(22):65-67

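In the algorithm that uses FCM to find the optimal number of clusters, the cluster centers must be reinitialized every time FCM is applied, and FCM is sensitive to the initial cluster centers, which makes the algorithm very unstable. The algorithm is improved by introducing a merging function that makes the cluster centers for (c-1) clusters depend on the cluster centers for c clusters. Simulation experiments show that the new algorithm is stable and runs noticeably faster than the old one.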

15.

Determining the number of Kanbans for unbalanced serial production systems

Katsuhiko Takahashi 《Computers & Industrial Engineering》1994,27(1-4):213-216

This paper deals with the problem of determining the number of Kanbans for unbalanced serial production systems under stochastic conditions. Simulation experiments on the constructed model of the Kanban system yield the fundamental information about the problem, and an algorithm to determine the optimal number of Kanbans is proposed.

16.

An entropy-weight fusion method for multi-sensor data (total citations: 1; self-citations: 0; citations by others: 1)

万树平 《传感器与微系统》 (Transducer and Microsystem Technologies) 2007,26(12):25-26

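For the data fusion problem in which multiple sensors measure multiple characteristic indices in an experiment, a new multi-sensor data fusion algorithm is proposed. The method uses the max-min method to determine the fuzzy similarity matrix between the measurement data of the sensors and defines entropy weights to determine the fusion weight of each sensor. This overcomes the subjective influence of the relation matrix in earlier methods. Analysis of experimental data shows that the algorithm is simple, gives the fused data a clear meaning, and avoids the loss of valid data.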

17.

Determining the value of information for collaborative multi-agent planning

David Sarne Barbara J. Grosz 《Autonomous Agents and Multi-Agent Systems》2013,26(3):456-496

This paper addresses the problem of computing the value of information in settings in which the people using an autonomous-agent system have access to information not directly available to the system itself. To know whether to interrupt a user for this information, the agent needs to determine its value. The fact that the agent typically does not know the exact information the user has and so must evaluate several alternative possibilities significantly increases the complexity of the value-of-information calculation. The paper addresses this problem as it arises in multi-agent task planning and scheduling with architectures in which information about the task schedule resides in a separate “scheduler” module. For such systems, calculating the value to overall agent performance of potential new information requires that the system component that interacts with the user query the scheduler. The cost of this querying and inter-module communication itself substantially affects system performance and must be taken into account. The paper provides a decision-theoretic algorithm for determining the value of information the system might acquire, query-reduction methods that decrease the number of queries the algorithm makes to the scheduler, and methods for ordering the queries to enable faster decision-making. These methods were evaluated in the context of a collaborative interface for an automated scheduling agent. Experimental results demonstrate the significant decrease achieved by using the query-reduction methods in the number of queries needed for reasoning about the value of information. They also show the ordering methods substantially increase the rate of value accumulation, enabling faster determination of whether to interrupt the user.
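The decision of whether to interrupt a user can be illustrated with the textbook expected value of perfect information. This is a minimal sketch, not the paper's algorithm: it assumes a small discrete set of states and actions with known utilities, and it does not model the scheduler queries, query reduction, or query ordering that the paper is concerned with. The state names, action names, and utilities are made up.

```python
def value_of_information(prior, utility):
    """Expected value of perfect information about the state.

    prior:   dict mapping state -> probability
    utility: dict mapping action -> dict mapping state -> utility
    """
    # Best expected utility when acting on the prior alone.
    best_without = max(
        sum(prior[s] * u[s] for s in prior) for u in utility.values()
    )
    # Expected utility when the state is learned before acting.
    best_with = sum(
        prior[s] * max(u[s] for u in utility.values()) for s in prior
    )
    return best_with - best_without


def should_interrupt(prior, utility, interruption_cost):
    """Interrupt the user only if the information is worth more than it costs."""
    return value_of_information(prior, utility) > interruption_cost


prior = {"meeting_moved": 0.3, "meeting_unchanged": 0.7}
utility = {
    "reschedule": {"meeting_moved": 5, "meeting_unchanged": 3},
    "keep_plan": {"meeting_moved": 0, "meeting_unchanged": 10},
}
voi = value_of_information(prior, utility)
```

With these numbers the agent expects 7.0 without asking and 8.5 after asking, so the information is worth 1.5 and an interruption is justified only if it costs less than that.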

18.

A method for computing the information entropy of Web pages

朱红灿 陈能华 周永红 《计算机工程与设计》 (Computer Engineering and Design) 2010,31(1)

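To effectively solve the topic drift problem in Web information extraction, a computation method called hybrid entropy is proposed that reflects the information entropy of Web pages more accurately. The method considers the information blocks whose entropy is to be computed in the setting of a multi-page website. It accounts for the influence of intra-page information on the entropy calculation and also for the influence of the identical information distribution across pages generated from a template, which ensures the accuracy of the entropy calculation. The method is applied to compute the information entropy of information blocks in information extraction, and the simulation results are compared with those of other algorithms. The results show that the accuracy of the computed entropy and the separation between topic-relevant and topic-irrelevant information blocks are better than those of other methods.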

19.

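A decision tree algorithm based on attribute frequency partitioning and information entropy discretization (total citations: 2; self-citations: 0; citations by others: 2)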

李春贵 王萌 孙自广 王晓荣 张增芳 《计算机工程与应用》 (Computer Engineering and Applications) 2009,45(12):153-156

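Decision trees are a common classification method in data mining tasks. When a decision tree is constructed, the measure used to select the splitting attribute at each node directly affects classification performance. A decision tree learning algorithm is proposed that measures attribute importance with a rough-set-based attribute frequency function and uses this measure both to select the splitting attributes for branches and to pre-prune the tree. In addition, to handle numeric attributes, an improved information entropy discretization algorithm for numeric attributes is proposed that uses the statistical properties of the data set as heuristic knowledge. Experimental results show that the new discretization method is clearly more efficient to compute and that, compared with an information-entropy-based decision tree algorithm, the new decision tree algorithm has a simpler structure and effectively improves classification performance.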
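For context, the baseline that such work improves on, plain information entropy discretization of a numeric attribute, can be sketched as follows. The sketch tries every midpoint between adjacent distinct values and keeps the one that minimises the weighted class entropy of the two sides. The statistical heuristic proposed in the paper is not described in the abstract and is not reproduced; the function names and the sample data are assumptions.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy in bits of the class distribution in labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def best_cut_point(values, labels):
    """Return the threshold that minimises the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (
            len(left) * class_entropy(left) + len(right) * class_entropy(right)
        ) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Made-up attribute values with two classes that separate cleanly.
cut = best_cut_point([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

This exhaustive scan evaluates every candidate threshold, which is the cost that heuristic variants try to reduce.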

20.

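Support vector data description classification based on information entropy (total citations: 1; self-citations: 0; citations by others: 1)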

何伟成 方景龙 《计算机应用》 (Journal of Computer Applications) 2011,31(4):1114-1116

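Existing support vector data description (SVDD) methods are often blind and biased when they are used to solve classification problems. Building on a study of information entropy and SVDD classification theory, the E-SVDD algorithm is proposed to improve two-class classification. First, the entropy of each of the two classes of sample data is computed. Next, the entropy values determine which class is placed inside the sphere. Finally, the C value in the SVDD algorithm is redefined using the sample sizes of the two classes together with the distribution information provided by their entropy values. Experiments on artificial sample sets and UCI data sets confirm the feasibility and effectiveness of the algorithm.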