An Algorithm and Network Implementation of Clustering Analysis with Randomized Statistical Testing

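Cite this article:	ZHANG Wen-jun (张文军), ZHANG Run-jie (张润杰), GU De-xiang (古德祥). An Algorithm and Network Implementation of Clustering Analysis with Randomized Statistical Testing[J]. Computer Engineering & Science (计算机工程与科学), 2006, 28(12): 74-76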

Authors:	ZHANG Wen-jun (张文军), ZHANG Run-jie (张润杰), GU De-xiang (古德祥)

Affiliation:	Institute of Entomology and State Key Laboratory of Biocontrol, Sun Yat-sen University, Guangzhou, Guangdong 510275

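Funding:	National Natural Science Foundation of China; Scientific Research Foundation for Returned Overseas Scholars, Ministry of Education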

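Abstract:	Cluster analysis is among the most widely used mathematical methods, yet it is also regarded as mathematically non-rigorous, mainly because neither the clustering process nor its results are held to a statistical criterion. This paper develops a clustering algorithm with randomized statistical testing, which clusters a set of samples and marks the significance of each merge. The algorithm has three parts: calculation of distance measures, randomization testing, and hierarchical clustering. It offers 14 distance measures, three hierarchical clustering methods, and optional weighting of the indices. The distance between two samples is defined as 1 minus the p-value of the randomization test. If the distance between two classes meets the p-value criterion, merging them into one class is statistically significant and acceptable; otherwise the merge is not significant and not acceptable. The algorithm has two main features. First, differences are tested for significance by randomization, so that many kinds of distance measures can be tested rigorously; randomization tests need no statistical preconditions or assumptions and suit a wide range of statistical problems. Second, the randomization method used for significance testing requires the randomized values to be positive integers, which is too narrow a scope; shifting and translating the values synchronously extends the method to the real numbers. The algorithm is implemented in Java for network use and comprises six classes and one HTML file, so it can be shared over the network through Java-compatible browsers. Using survey data on invertebrate diversity in rice fields, the paper gives a comparative analysis of the algorithm and discusses principles for choosing a distance measure and directions for further research.
Keywords:	cluster analysis; randomized statistical testing; distance measure; algorithm; network implementation
Article ID:	1007-130X(2006)012-0074-03
Revised:	2005-02-28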
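
The abstract defines the between-sample distance as 1 minus the p-value of a randomization test, but this record does not give the randomization scheme. The following is a minimal Python sketch under stated assumptions, not the authors' Java implementation: it assumes each sample is a vector of non-negative integer counts, that the null distribution is built by pooling the counts of each index and reallocating the individuals to the two samples at random, and that Euclidean distance stands in for one of the 14 measures. The helper `to_positive_integers` is one possible reading of the "shift and translate" step for real-valued data. All function names and parameters are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance; a stand-in for one of the paper's 14 measures."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def to_positive_integers(samples, decimals=0):
    """Assumed reading of 'shift and translate': scale every value by the
    same power of ten, round, then add one common offset so the smallest
    value becomes 1."""
    scale = 10 ** decimals
    scaled = [[round(v * scale) for v in row] for row in samples]
    offset = 1 - min(v for row in scaled for v in row)
    return [[v + offset for v in row] for row in scaled]


def randomization_p_value(a, b, measure=euclidean, n_rand=1000, seed=0):
    """Share of randomized distances that are at least as large as the
    observed one. Assumption: for each index, the pooled individuals are
    reassigned to either sample with probability 1/2."""
    rng = random.Random(seed)
    observed = measure(a, b)
    hits = 0
    for _ in range(n_rand):
        ra, rb = [], []
        for x, y in zip(a, b):
            total = x + y
            k = sum(1 for _ in range(total) if rng.random() < 0.5)
            ra.append(k)
            rb.append(total - k)
        if measure(ra, rb) >= observed:
            hits += 1
    return hits / n_rand


def sample_distance(a, b, **kwargs):
    """Between-sample distance as defined in the abstract: 1 - p."""
    return 1.0 - randomization_p_value(a, b, **kwargs)
```

Under these assumptions two identical samples get a distance of 0 (every randomized distance is at least the observed value of zero), while samples whose counts fall in disjoint indices get a distance close to 1.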
An Algorithm and Network Implementation of Clustering Analysis with Randomized Statistical Testing

ZHANG Wen-jun,ZHANG Run-jie,GU De-xiang. An Algorithm and Network Implementation of Clustering Analysis with Randomized Statistical Testing[J]. Computer Engineering & Science, 2006, 28(12): 74-76

Authors:	ZHANG Wen-jun, ZHANG Run-jie, GU De-xiang

Abstract:	A problem with clustering algorithms is that their results are usually not statistically tested. This paper develops a clustering algorithm with randomized statistical testing. It consists of three parts: calculation of distance measures, randomization testing, and hierarchical clustering. In this algorithm the between-sample distance is defined as 1 - p_test, where p_test is the p-value obtained from the randomization procedure for the two samples. If the between-class distance meets the p_test criterion, combining the two classes into one class is statistically reasonable. Fourteen distance measures and three methods of hierarchical clustering are provided. The algorithm is implemented in Java as a network program comprising six Java classes and an HTML file, and it runs in Java-enabled Web browsers. The algorithm is tested on survey data of invertebrate diversity in rice fields. The criteria for choosing distance measures and the prospects for improving the algorithm are discussed.

Keywords:	cluster analysis; randomized statistical testing; distance measure; algorithm; network implementation
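
The merging rule in the abstract (combine two classes when their distance meets the p_test criterion) can be illustrated with a short Python sketch. It is a hedged illustration, not the published Java program: it assumes the three hierarchical methods are single, complete, and average linkage, that the criterion is p = 1 - d > alpha, and that the input is a precomputed symmetric matrix of 1 - p_test distances. The function name `cluster_with_significance` and the parameter `alpha` are hypothetical.

```python
def cluster_with_significance(dist, alpha=0.05, linkage="average"):
    """Agglomerative clustering over a matrix of 1 - p distances.

    Returns one tuple per merge: (members, distance, acceptable), where
    'acceptable' is True when the implied p-value 1 - distance exceeds
    alpha (an assumed form of the paper's p_test criterion).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def between(c1, c2):
        # Between-class distance under the chosen (assumed) linkage.
        ds = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        if linkage == "average":
            return sum(ds) / len(ds)
        raise ValueError("unknown linkage: " + linkage)

    while len(clusters) > 1:
        # Find the closest pair of current classes.
        d, i, j = min(
            (between(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[i] + clusters[j])
        acceptable = (1.0 - d) > alpha
        merges.append((merged, d, acceptable))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Illustrative distances only (not data from the paper).
example = [
    [0.00, 0.20, 0.99],
    [0.20, 0.00, 0.98],
    [0.99, 0.98, 0.00],
]
history = cluster_with_significance(example, alpha=0.05)
```

In this made-up example, samples 0 and 1 merge acceptably (p = 0.8), whereas joining sample 2 to them is flagged as not acceptable because the implied p-value falls below alpha. Recording every merge with a flag, rather than stopping at the first rejected merge, keeps the full tree available while still marking which joins pass the test.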