Similar Documents (20 results found)
1.
Supersaturated designs (SSDs) are widely studied because they can greatly reduce the number of experiments. However, analyzing data from SSDs is difficult because their run size is too small to estimate all the main effects. This paper introduces the contrast-orthogonality cluster and the anticontrast-orthogonality cluster to reflect the inner structure of SSDs, which helps experimenters assign factors to the columns of an SSD. A new strategy for screening active factors, named the contrast-orthogonality cluster analysis (COCA) method, is proposed. Simulation studies demonstrate that the method performs well compared to most existing methods. Furthermore, the COCA method has lower type II error rates and is easy to understand and implement.

2.
Image coding using principal component analysis (PCA), a type of image compression technique, projects image blocks onto a subspace that preserves most of the original information. However, blocks in an image exhibit inhomogeneous properties, such as smooth regions, texture, and edges, which complicate PCA image coding. This paper proposes a repartition clustering method that partitions the data into groups such that members of the same group are homogeneous while members of different groups are not. PCA is then applied separately to each group. In the clustering method, a genetic algorithm acts as a framework of three phases, including the proposed repartition clustering. Based on this mechanism, the proposed method effectively increases image quality and provides an enhanced visual effect.
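The core projection step can be sketched as follows: a minimal block-PCA coder in plain NumPy, without the paper's repartition clustering or genetic-algorithm framework (block size and component count below are illustrative):

```python
# Minimal block-PCA image coding sketch: flatten image blocks, project
# onto the top-k principal components, reconstruct from the codes.
import numpy as np

def pca_code_blocks(blocks, k):
    """blocks: (n, d) array of flattened image blocks; keep k components."""
    mean = blocks.mean(axis=0)
    centered = blocks - mean
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                      # (k, d) projection basis
    codes = centered @ basis.T          # compressed representation
    recon = codes @ basis + mean        # reconstruction
    return codes, recon

rng = np.random.default_rng(0)
blocks = rng.normal(size=(100, 16))     # e.g. one hundred 4x4 blocks
codes, recon = pca_code_blocks(blocks, k=4)
```

With k equal to the block dimension the reconstruction is exact; smaller k trades error for compression.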

3.
The elastic net algorithm is a heuristic originally proposed to solve the Traveling Salesman Problem (TSP); it is now widely applied to clustering, where it offers clear advantages especially for high-dimensional data. This paper proposes a new Adaptive Elastic Net (AEN) algorithm for clustering. The algorithm takes the K center points produced by the elastic net as the initial cluster centers and updates them in each iteration with a local search selection procedure. Using the sum of distances from each cluster's center to its members as the quality criterion, clustering was performed on randomly generated datasets of 50, 100, 300, 500, and 1,000 points of various dimensionalities and on several standard UCI datasets, and the results were compared with those of traditional clustering algorithms. Experiments show that the proposed algorithm effectively improves clustering quality.

4.
In this paper a new method of mode separation is proposed. The method is based on mapping data points from the N-dimensional space onto a sequence so that the majority of points from each mode become successive elements of the sequence. The intervals of points in the sequence belonging to the respective modes of the p.d.f. are then determined from a function generated on this sequence. The nuclei of the modes formed by the elements of these intervals are then used to obtain separating surfaces between the modes, and thus to partition a data set with a multimodal probability density function into unimodal subsets.

5.
This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space to observe the clusters through the distributions of distances between the observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by recursively partitioning the dataset. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that combine multiple GMM trees to handle high-dimensional data with many clusters. The GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperform the popular existing methods Silhouette, Elbow, and Gap Statistic, as well as the recent I-nice method, in estimating the true number of clusters in high-dimensional complex data.
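The first ingredient of the method, observing a distance distribution from an observation point, can be sketched as follows. The paper fits a full Gamma mixture and recurses into a GMM tree; here a single method-of-moments Gamma fit on synthetic data illustrates the idea:

```python
# Sketch: place an observation point, collect its distances to every
# object, and fit a Gamma model to that distance distribution.
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated clusters in 2-D.
data = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(8, 1, (200, 2))])
obs = np.zeros(2)                              # an observation point
dists = np.linalg.norm(data - obs, axis=1)     # distance distribution

# Method-of-moments Gamma fit: shape k = mean^2/var, scale theta = var/mean.
mean, var = dists.mean(), dists.var()
k, theta = mean**2 / var, var / mean
```

A mixture of such Gamma components over the distance distribution is what the paper uses to split the dataset at each tree node.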

6.
Clustering is one of the widely used knowledge discovery techniques for revealing structures in a dataset that can be extremely useful to the analyst. In iterative clustering algorithms, the procedure for choosing initial cluster centers is extremely important, as it has a direct impact on the final clusters. Since clusters are separated groups in a feature space, it is desirable to select initial centers that are well separated. In this paper, we propose an algorithm to compute initial cluster centers for the k-means algorithm. The algorithm is applied to several datasets of different dimensionality for illustrative purposes, and it is observed that the proposed algorithm performs well in obtaining initial cluster centers for k-means.
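The goal of well-separated initial centers can be illustrated with the classic farthest-point heuristic, which is a stand-in here, not the paper's own algorithm:

```python
# Farthest-point heuristic: greedily pick k points, each as far as
# possible from the centers already chosen.
import numpy as np

def farthest_point_centers(X, k):
    centers = [X[0]]
    for _ in range(k - 1):
        # Distance from each point to its nearest chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    return np.array(centers)

X = np.array([[0., 0.], [0.1, 0.], [10., 10.], [10.1, 10.], [0., 10.]])
centers = farthest_point_centers(X, 3)
```

On this toy set the three selected centers land in the three separated groups rather than in the two near-duplicate points.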

7.
Cluster analysis is a common class of data mining methods. When mining numeric attributes, so-called outliers often appear after clustering. However, some of these outliers are not truly isolated and may still belong to an established cluster. This paper proposes and investigates a clustering method based on similarity relations between attributes.

8.
A new subspace identification approach based on principal component analysis
Principal component analysis (PCA) has been widely used for monitoring complex industrial processes with multiple variables and for diagnosing process and sensor faults. The objective of this paper is to develop a new subspace identification algorithm that gives consistent model estimates under the errors-in-variables (EIV) situation. We propose a new subspace identification approach using principal component analysis. PCA naturally falls into the EIV formulation, which resembles total least squares and allows for errors in both process input and output. We propose to use PCA to determine the system observability subspace, the A, B, C, and D matrices, and the system order under an EIV formulation. Standard PCA is modified with instrumental variables in order to achieve consistent estimates of the system matrices. The proposed subspace identification method is demonstrated on a simulated process and a real industrial process for model identification and order determination. For comparison, the MOESP and N4SID algorithms are used as benchmarks to demonstrate the advantages of the proposed PCA-based subspace model identification (SMI) algorithm.

9.
This paper describes a new methodology to detect small anomalies in high-resolution hyperspectral imagery, which involves successively: (1) a multivariate statistical analysis (principal component analysis, PCA) of all spectral bands; (2) a geostatistical filtering of noise and regional background in the first principal components using factorial kriging; and finally (3) the computation of a local indicator of spatial autocorrelation to detect local clusters of high or low reflectance values and anomalies. The approach is illustrated using 1 m resolution data collected in and near northeastern Yellowstone National Park. Ground validation data for tarps and for disturbed soils on mine tailings demonstrate the ability of the filtering procedure to reduce the proportion of false alarms (i.e., pixels wrongly classified as targets) and its robustness under low signal-to-noise ratios. In almost all scenarios, the proposed approach outperforms traditional anomaly detectors (i.e., the RX detector, which computes the Mahalanobis distance between the vector of spectral values and the vector of global means), and fewer false alarms are obtained when using a novel statistic S2 (average absolute deviation of p-values from 0.5 across all spectral bands) to summarize information across bands. Image degradation through addition of noise or reduction of spectral resolution tends to blur the detection of anomalies, increasing false alarms, in particular for the least pure pixels. Results from a mine tailings site demonstrate that the approach performs reasonably well for a highly complex landscape with multiple targets of various sizes and shapes. By leveraging both spectral and spatial information, the technique requires little or no input from the user, and hence can be readily automated.
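The RX baseline mentioned above is straightforward to sketch: the Mahalanobis distance of each pixel's spectral vector from the global mean (the band count and anomaly value below are illustrative):

```python
# RX anomaly detector sketch: score each pixel by its Mahalanobis
# distance from the global spectral mean.
import numpy as np

def rx_scores(pixels):
    """pixels: (n, bands) array; returns one RX score per pixel."""
    mean = pixels.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pixels, rowvar=False))
    diff = pixels - mean
    # Quadratic form diff @ cov_inv @ diff, per row.
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

rng = np.random.default_rng(2)
background = rng.normal(0, 1, (500, 5))
anomaly = np.full((1, 5), 6.0)          # one spectrally bright outlier pixel
scores = rx_scores(np.vstack([background, anomaly]))
```

The outlier pixel receives by far the largest score, which is the property the paper's geostatistical filtering improves upon.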

10.
A novel initialization method for fuzzy C-means clustering
刘笛  朱学峰  苏彩红 《计算机仿真》2004,21(11):148-151
Fuzzy C-means (FCM) clustering is a widely used dynamic clustering method whose results are often affected by the initial cluster centers. Inspired by the mechanism through which the adaptive immune system forms immune memory of invading antigens, this paper proposes a new method for generating initial cluster centers. In the algorithm, the data to be analyzed are treated as invading antigens, and the memory cells they induce serve as the initial centers for cluster analysis. Clonal selection is used to generate the population of memory cells for the antigens, while immune network theory suppresses the rapid growth of that population. Experimental results show that the immune-memory mechanism is feasible for selecting FCM initial centers: it not only speeds up the convergence of the FCM algorithm but can also determine the number of clusters automatically by varying a threshold.
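For context, here is a plain FCM iteration (fuzzifier m = 2) of the kind whose initialization the paper's immune-memory mechanism would supply; in this sketch the initial centers are simply two far-apart data points:

```python
# Standard fuzzy C-means update loop: alternate membership and center
# updates until (here) a fixed number of iterations.
import numpy as np

def fcm(X, centers, iters=50, m=2.0):
    for _ in range(iters):
        # Distances from every point to every center, (n, c).
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=2) + 1e-12
        u = 1.0 / d ** (2 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)      # fuzzy memberships, rows sum to 1
        um = u.T ** m                          # (c, n) weights
        centers = um @ X / um.sum(axis=1, keepdims=True)
    return centers, u

X = np.vstack([np.random.default_rng(5).normal(c, 0.2, (30, 2))
               for c in ([0, 0], [4, 4])])
centers, u = fcm(X, centers=np.stack([X[0], X[-1]]))
```

With well-separated initial centers the iteration converges to the two blob means; a poor initialization is exactly what the abstract's method aims to avoid.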

11.
As one of the most important techniques in data mining, cluster analysis has attracted increasing attention in the big data era. Most clustering algorithms face challenges including difficulty in determining cluster centers, low clustering accuracy, uneven efficiency across datasets, and sensitive parameter dependence. Targeting the first and last of these, this paper proposes a novel algorithm that determines cluster centers quickly. It assumes that cluster centers are data points with higher density and a larger distance from other data points of higher density. Normal distribution curves are fitted to the distribution of the density-distance product, and the singular points outside a chosen confidence interval are shown, by theoretical analysis and simulation, to be cluster centers. According to these centers, a density-ordered scan then assigns the remaining points to complete the clustering. Since the density radius is a sensitive parameter in computing each point's density, a hill-climbing algorithm is used to make the density radius self-adaptive. The proposed algorithm is evaluated on a range of typical benchmark datasets against other clustering algorithms in terms of both clustering quality and time complexity.
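The center-selection rule described here, high density combined with a large distance to any denser point, can be sketched with a fixed density radius (the paper makes the radius self-adaptive):

```python
# Density-distance product sketch: rho = local density within a radius,
# delta = distance to the nearest denser point, gamma = rho * delta.
# Points with large gamma are cluster-center candidates.
import numpy as np

def density_peaks(X, radius):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    rho = (d < radius).sum(axis=1)            # local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        # Densest point(s) get the maximum distance as their delta.
        delta[i] = d[i, denser].min() if len(denser) else d[i].max()
    return rho, delta, rho * delta

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ([0, 0], [5, 5])])
rho, delta, gamma = density_peaks(X, radius=0.5)
```

The highest-gamma points sit near the two true cluster centers, while ordinary points have small delta and hence small gamma.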

12.
A new method for cluster center initialization
The k-means clustering algorithm is easily trapped in local optima under the influence of the initial cluster centers, and no existing initialization method has gained wide acceptance. Assuming that each cluster contains at least one dense region of data, and that dense regions in different clusters are farther apart than dense regions within the same cluster, this paper constructs a minimum spanning tree over the dataset, searches it for dense regions using rooted-tree principles, and estimates their densities. Regions that are both dense and sufficiently separated are selected, and points within them are taken as the initial cluster centers, yielding a new initialization method. Simulation experiments comparing this method with existing ones show that it performs better.

13.
This paper presents a unified theory of a class of learning neural nets for principal component analysis (PCA) and minor component analysis (MCA). First, some fundamental properties are addressed which all neural nets in the class have in common. Second, a subclass called the generalized asymmetric learning algorithm is investigated, and the kind of asymmetric structure which is required in general to obtain the individual eigenvectors of the correlation matrix of a data sequence is clarified. Third, focusing on a single-neuron model, a systematic way of deriving both PCA and MCA learning algorithms is shown, through which a relation between the normalization in PCA algorithms and that in MCA algorithms is revealed. This work was presented, in part, at the Third International Symposium on Artificial Life and Robotics, Oita, Japan, January 19–21, 1998  相似文献   
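For the single-neuron case, the classic PCA learning rule in this family is Oja's rule; below is a small sketch on a correlated 2-D stream (the learning rate and data are illustrative, and MCA variants alter the sign structure of the update):

```python
# Oja's rule: w += eta * y * (x - y * w), where y = w.x. The Hebbian
# term eta*y*x grows w along the top principal direction; the decay
# term -eta*y^2*w keeps the weight vector bounded.
import numpy as np

rng = np.random.default_rng(4)
# Zero-mean stream stretched along the direction (1, 1)/sqrt(2).
data = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.8], [1.8, 2.0]])

w = np.array([1.0, 0.0])
eta = 0.002
for x in data:
    y = w @ x
    w += eta * y * (x - y * w)
w /= np.linalg.norm(w)       # report a unit vector
```

After the pass, w aligns (up to sign) with the leading eigenvector of the stream's correlation matrix.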

14.
This paper proposes a new policy for consolidating a company's profits by segmenting the clients of a contents service and selectively allocating the media server's resources by cluster, using the cluster analysis method of CRM, which is mainly applied in marketing. Here, CRM refers to the strategy of consolidating a company's profits by managing clients efficiently, providing them with a more effective, personalized service, and managing resources more effectively. To realize the new service policy, this paper analyzes each client's contribution to profits in terms of service-usage patterns (total number of visits to the homepage, service type, service usage period, total payment, average service period, and service charge per homepage visit) through cluster analysis of client data using the TwoStep Cluster Method. Clients were grouped into four clusters according to their contribution to profits. The CRFS (Client Request Filtering System) is then proposed to allocate media server resources per cluster: CRFS grants admission within the resource limit of the cluster to which the client belongs. In addition, to evaluate the efficiency of CRFS in a client/server environment, the number of admitted streams was compared against other algorithms, and a higher client renewal rate was observed when applying CRFS.

15.
To address multi-step prediction for high-dimensional chaotic systems, a multivariate local multi-step prediction model based on cluster analysis of neighboring phase points is proposed. First, for the selection of multivariate neighboring phase points, a new comprehensive similarity criterion for multivariate evolution trajectories is proposed, combining the evolution patterns of neighboring phase points traced back over multiple steps with the influence of inter-variable correlation on the trajectories. Then, to overcome the long runtime of selecting globally optimal neighboring phase points, a new scheme based on clustering of neighboring phase points is proposed, reducing multi-step prediction time and improving prediction efficiency. Simulation experiments on Lorenz chaotic data show that the model has excellent prediction performance.

16.
This paper presents a novel adaptive cuckoo search (ACS) algorithm for optimization. The step size is made adaptive using the fitness value and the current position in the search space. The other important feature of ACS is its speed: it is faster than the standard cuckoo search (CS) algorithm. Here, an attempt is made to make the CS algorithm parameter-free, without a Lévy step. The proposed algorithm is validated on twenty-three standard benchmark test functions. The second part of the paper proposes an efficient face recognition algorithm using ACS, principal component analysis (PCA), and intrinsic discriminant analysis (IDA). The proposed algorithms are named PCA + IDA and ACS + IDA. Interestingly, PCA + IDA offers a perturbation-free algorithm for dimension reduction, while ACS + IDA is used to find the optimal feature vectors for classification of face images based on IDA. For the performance analysis, we use three standard face databases: YALE, ORL, and FERET. A comparison of the proposed method with state-of-the-art methods shows the effectiveness of our algorithm.

17.
18.
As a generalization of fuzzy relations, Vague relations can in some cases express fuzzy information more powerfully than intuitionistic fuzzy relations. By analogy with the transitive closure of fuzzy relations, the transitive-closure algorithm for fuzzy matrices is fully extended to Vague relation matrices, so that a similarity Vague relation matrix can be converted into an equivalence Vague relation matrix. By setting dual thresholds αt and αf on the positive and negative membership dimensions, this equivalence Vague relation matrix is then converted into an equivalent Boolean matrix, achieving the goal of cluster analysis. Finally, an example illustrates the application of this clustering method in pattern recognition.
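The fuzzy-relation half of the procedure, transitive closure by repeated max-min self-composition followed by thresholding into a Boolean matrix, can be sketched as follows (a single threshold here; the Vague extension adds the second, negative-membership threshold):

```python
# Transitive closure of a fuzzy relation matrix: compose R with itself
# under max-min until a fixed point, then threshold to cluster.
import numpy as np

def maxmin(a, b):
    # (a o b)[i, j] = max_k min(a[i, k], b[k, j])
    return np.max(np.minimum(a[:, :, None], b[None, :, :]), axis=1)

def transitive_closure(r):
    while True:
        r2 = maxmin(r, r)
        if np.array_equal(r2, r):
            return r
        r = r2

R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
T = transitive_closure(R)
clusters = (T >= 0.5)        # threshold alpha = 0.5 gives a Boolean partition
```

At threshold 0.5, elements 0 and 1 fall into one cluster and element 2 into another; raising the threshold refines the partition.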

19.
The EM algorithm is used to track moving objects as clusters of pixels significantly different from the corresponding pixels in a reference image. The underlying cluster model is Gaussian in image space, but not in grey-level difference distribution. The generative model is used to derive criteria for the elimination and merging of clusters, while simple heuristics are used for the initialisation and splitting of clusters. The system is competitive with other tracking algorithms based on image differencing.

20.
With the development of distributed computing, Hadoop has become a leading platform for large-scale data processing. Its security mechanisms are relatively weak, however, and without monitoring of user activity it is vulnerable to hidden threats such as data leakage. Exploiting the structure of principal component analysis, the computation is parallelized with MapReduce, overcoming the drawbacks of conventional PCA computation and improving model-training efficiency. An anomalous-behavior detection method based on parallelized PCA is proposed: whether a user's current behavior pattern matches their historical pattern serves as the criterion for judging the behavior anomalous or not. Experiments show that the method detects anomalous user behavior effectively.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号