Similar literature
 Found 20 similar documents; search took 15 ms
1.
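Weighted clustering ensemble based on attribute significance   Total citations: 6, self-citations: 2, citations by others: 4
Clustering ensembles are an active topic in data mining research. Most existing work ignores the quality of the clustering members being combined, so poor members and noise can degrade the ensemble result. This paper proposes an ensemble method that weights the clustering members. The method introduces the attribute significance measure from rough set theory, assigns each member a weight according to its importance to the ensemble, builds a weighted co-association matrix, and derives the ensemble result from it. Experimental results show that the proposed method handles quality differences among clustering members well and effectively reduces the influence of noise on the ensemble, yielding better clustering ensemble results.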
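A weighted co-association matrix of the kind this abstract mentions can be sketched as follows. This is a minimal illustration, not the paper's method: the function name `weighted_coassociation` and the example weights are hypothetical, and the weights are plain placeholders rather than values derived from rough set attribute significance.

```python
def weighted_coassociation(labelings, weights):
    """Return an n x n matrix whose entry (i, j) is the weight-normalized
    share of ensemble members that place objects i and j in the same cluster.

    labelings: list of label lists, one per ensemble member (hypothetical input)
    weights:   one non-negative weight per member (placeholder values here)
    """
    total = float(sum(weights))
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        share = w / total  # normalize so every entry stays in [0, 1]
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += share
    return co


# Three members clustering four objects; the first member is trusted twice as much.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
weights = [2.0, 1.0, 1.0]
co = weighted_coassociation(labelings, weights)
```

A final partition would then typically be obtained by running a standard clustering method on this matrix treated as a similarity matrix; which method the paper uses is not stated in the abstract.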

2.
Assessment of clustering tendency is an important first step in cluster analysis. One tool for assessing cluster tendency is the Visual Assessment of Tendency (VAT) algorithm. VAT produces an image matrix that can be used for visual assessment of cluster tendency in either relational or object data. However, VAT becomes intractable for large data sets. The revised VAT (reVAT) algorithm reduces the number of computations done by VAT and replaces the image matrix with a set of profile graphs that are used for the visual assessment step. Thus, reVAT overcomes the large data set problem that encumbers VAT but presents a new problem: the set of reVAT profile graphs becomes very difficult to interpret when the number of clusters is large or when there is significant overlap between groups of objects in the data. In this paper, we propose a new algorithm called bigVAT, which (i) solves the large data problem suffered by VAT and (ii) solves the interpretation problem suffered by reVAT. bigVAT combines the quasi-ordering technique used by reVAT with an image display of the set of profile graphs, so that the clustering tendency information is shown in a VAT-like image. Several numerical examples are given to illustrate and support the new technique.
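The reordering step behind the original VAT image can be sketched as below. This covers plain VAT only, as commonly described (a Prim-like ordering of a dissimilarity matrix), and does not attempt reVAT or bigVAT; the function names `vat_order` and `reorder` are hypothetical.

```python
def vat_order(R):
    """Return a VAT-style ordering of the objects in dissimilarity matrix R."""
    n = len(R)
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Prim-like step: pick the unvisited object closest to any visited one.
        nxt = min(remaining, key=lambda q: min(R[p][q] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(R, order):
    """Permute rows and columns of R; dark diagonal blocks suggest clusters."""
    return [[R[i][j] for j in order] for i in order]


# Four points on a line forming two tight groups: {0, 1} and {10, 11}.
points = [0, 10, 1, 11]
R = [[abs(a - b) for b in points] for a in points]
order = vat_order(R)
image = reorder(R, order)
```

Displayed as a grayscale image, the reordered matrix shows two small-valued blocks along the diagonal, one per group. Each step scans the whole matrix, which is the cost that makes plain VAT impractical for large data sets.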

3.
Partitional clustering of categorical data is normally performed with the K-modes clustering algorithm, which works well for large datasets. Although the design and implementation of the K-modes algorithm are simple and efficient, it has the pitfall of choosing the initial cluster centers randomly on every new execution, which may lead to non-repeatable clustering results. This paper addresses the random center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clusterings of the data based on the attribute values in different attributes and yields deterministic modes that are used as initial cluster centers. We propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with an existing method that finds Significant attributes for unsupervised learning, and perform multiple clusterings of the data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. The worst-case time complexity of the proposed algorithm is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets, compare it against random initialization and two other initialization methods, and show that the proposed method performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the data we tested, which leads to faster convergence of the K-modes clustering algorithm together with better clustering results.

4.
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents one cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate the regions of high density, because of its low computational cost and good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Multiple prototypes with small separations are grouped into a given number of clusters by an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure and is robust to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.

5.
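This paper presents CIB, an intelligent cluster builder based on agent technology. Through a system construction mechanism, CIB automatically configures, deploys, and boots a cluster system, thereby building a cluster customized by the user. It also provides a GUI that follows the user's mental model, reducing the user's cognitive burden. The paper outlines the background that motivated CIB, analyzes the shortcomings of comparable cluster management software, introduces the agent-based approach to solving these problems, describes the design and implementation of CIB, and evaluates the system in terms of usability and efficiency.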

6.
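Research on replica management for a disaster recovery system based on cluster servers   Total citations: 4, self-citations: 0, citations by others: 4
This paper proposes a replica management scheme for a disaster recovery system based on cluster servers, together with algorithms for maintaining consistency among multiple replicas and for replica selection, and a mathematical model of the number and distribution of replicas. Performance tests of the disaster recovery system show that it achieves fast, automatic data recovery, manages replicas effectively, and maintains a balance between replica reliability and cluster server performance.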

7.
Cluster ensembles have been shown to be better than any standard clustering algorithm at improving accuracy and robustness across different data collections. This meta-learning formalism also helps users overcome the dilemma of selecting an appropriate technique and the corresponding parameters for a given set of data. Almost two decades after the first publication of its kind, the method has proven effective in many problem domains, especially microarray data analysis and its downstream applications. Recently, it has been greatly extended in terms of both theoretical modelling and deployment to problem solving. This survey attempts to match this emerging attention by providing the fundamental basis and theoretical details of state-of-the-art methods found in the present literature. It covers the range of ensemble generation strategies, the summarization and representation of ensemble members, and the topic of consensus clustering. The review also includes different applications and extensions of cluster ensembles, and highlights several research issues and challenges.

8.
We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. Unlike methods based on inter-cluster and intra-cluster distances, this index emphasizes cluster shape by using a high-order characterization of its probability distribution. The normality of a cluster is characterized by its negentropy, a standard measure of the distance to normality, which evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. The definition of the negentropy involves the distribution's differential entropy. However, we show that its explicit computation can be avoided by considering only negentropy increments with respect to the initial data distribution, where all the points are assumed to belong to the same cluster. The resulting negentropy increment validity index requires only the computation of covariance matrices. We have applied the new index to an extensive set of artificial and real problems, where it generally provides better results than other indices, with respect to both the prediction of the correct number of clusters and the similarity between the real clusters and those inferred.
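The negentropy described in this abstract can be written out as follows. This is the standard textbook definition, not the paper's increment formula, and the symbols $J$, $H$, $d$, $\Sigma$, and $X_{\mathcal{N}}$ are notation assumed here for illustration.

```latex
J(X) = H(X_{\mathcal{N}}) - H(X),
\qquad
H(X_{\mathcal{N}}) = \tfrac{1}{2}\,\ln\!\left[(2\pi e)^{d}\,\det\Sigma\right]
```

Here $H$ is the differential entropy, $X$ is the $d$-dimensional distribution of a cluster with covariance matrix $\Sigma$, and $X_{\mathcal{N}}$ is a normal distribution with the same covariance matrix. Because the normal distribution has the largest entropy among all distributions with a given covariance, $J(X) \ge 0$, with equality exactly when $X$ is normal. The term $H(X_{\mathcal{N}})$ depends only on $\Sigma$, whereas $H(X)$ is the term whose explicit computation the abstract says can be avoided.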

9.
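A Linux-based cluster deployment scheme   Total citations: 2, self-citations: 0, citations by others: 2
Clusters provide powerful batch processing and parallel computing capabilities and represent a new direction in high-performance computing, but they are also hard to manage, prone to failures, and tedious to maintain. By studying the Linux boot process and combining it with remote boot technology, this paper proposes a Linux-based cluster deployment scheme. The scheme effectively addresses cluster installation, upgrade, and backup, makes the cluster easier to use and manage, provides solid system support for high performance, high reliability, and high availability, and greatly simplifies system administration. Targeting the characteristics of high-performance computing clusters, a Cluster Deployment System (CDS) was built for the Ziqiang 3000 (自强3000) cluster at Shanghai University.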

10.
Enterprises in an industrial cluster can dynamically form alliances as cluster supply chains (CSCs) to share inner-cluster resources and services and to respond to ever-fluctuating customer demands in a cost-effective way. However, effective and feasible methods for such dynamic cluster supply chain configuration (CSCC) lag behind practice because of conflicts of interest. Researchers design All-in-One theoretical models that optimize CSCC from the assumed decision details of all enterprises, while clustered enterprises in fact seek effective decentralized decision mechanisms that protect their decision autonomy in the frequently reconfigured CSC. A recently emerged multi-disciplinary optimization method, Augmented Lagrangian Coordination (ALC), which supports open-structure collaboration with strict optimization convergence, is thoroughly investigated in this paper and applied to resolve the conflict. Through a complete analysis of the CSC's configuration policies in typical stages, a generic CSCC model is proposed and then partitioned into an ALC-based decentralized decision model according to the typical distribution of decision autonomy in clusters. Clustered enterprises collaborate vertically and laterally along the ALC model through multi-dimensional couplings to achieve overall consistency and optimality. Results demonstrate the effectiveness of ALC for the CSCC problem. A set of sensitivity analyses is also conducted to determine the conditions under which an order has to be fulfilled in a CSC and the most appropriate configuration.

11.
A robust linear parameter varying (LPV) identification/invalidation method is presented. Starting from a given initial model, the proposed method modifies it and produces an LPV model consistent with the assumed uncertainty/noise bounds and the experimental information. This procedure may complement existing nominal LPV identification algorithms by adding the uncertainty and noise bounds, which produces a set of models consistent with the experimental evidence. Unlike standard invalidation results, the proposed method allows computation of the changes to the initial model that are necessary to place it within the consistency set. As in previous LPV identification procedures, the initial parameter dependency is fixed in advance, but here a methodology to modify this dependency is presented. In addition, all calculations are made on state-space matrices, which simplifies further controller design computations. The application of the proposed method to the identification of nonlinear systems is also discussed. Copyright © 2009 John Wiley & Sons, Ltd.

12.
Provisioning of quality of service (QoS) is the ultimate goal for any wireless sensor network (WSN). Several factors can influence this requirement, such as the cluster formation algorithm adopted. Almost all WSNs are structured by grouping the sensor nodes into clusters. Not all contemporary cluster formation and routing algorithms (e.g., LEACH) were designed to provide or sustain a given QoS requirement such as a delay constraint. Another fundamental design issue is that these algorithms were built and tested under the assumption of uniformly distributed sensor nodes. However, this assumption does not always hold. In some industrial applications, because of the scope of the ongoing monitoring process, sensors are installed densely in certain areas and widely separated in others. Also, unlike random deployment distributions, many applications need a deterministic deployment of sensors, such as a grid distribution. In this work, we investigate and characterize the impact of sensor node deployment distributions on the performance of different flavors of the LEACH routing algorithm. In particular, we study via extensive simulation experiments how the LEACH cluster formation approach affects the delay (inter-cluster and intra-cluster) and the energy efficiency, expressed in packets per joule, for different base station locations and data loads. We consider four deployment distributions: grid, normal, exponential, and uniform. The results show a significant impact of node distribution on the network's energy efficiency, throughput, and delay. These findings should help the architects of real-time wireless sensor network applications, such as secure border sensor networks, to design networks that meet their specifications effectively and fulfill their critical mission.

13.
In this paper, the problem of automatically clustering a data set is posed as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. The proposed multiobjective clustering technique uses a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. A variable number of cluster centers is encoded in each string, so the number of clusters present in different strings varies over a range. Points are assigned to clusters based on a newly developed point symmetry based distance rather than the Euclidean distance. Two cluster validity indices, the Euclidean distance based XB-index and the recently developed point symmetry distance based Sym-index, are optimized simultaneously to determine the appropriate number of clusters in a data set. The proposed clustering technique is thus able to detect both the proper number of clusters and the appropriate partitioning for data sets having either hyperspherical or point symmetric clusters. A new semi-supervised method is also proposed to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown on seven artificial data sets and six real-life data sets of varying complexity. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, and by two single-objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.
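One of the two objectives named in this abstract, the Euclidean XB-index, can be sketched as below. The sketch assumes the crisp (hard-assignment) form of the Xie-Beni index, which may differ from the exact variant used in the paper; the function names `sq_dist` and `xie_beni` are hypothetical, and the Sym-index is not covered.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, labels, centers):
    """Crisp Xie-Beni index: total within-cluster squared distance divided by
    (number of points * smallest squared distance between two centers).
    Lower values indicate compact, well-separated clusters."""
    compactness = sum(sq_dist(p, centers[k]) for p, k in zip(points, labels))
    separation = min(
        sq_dist(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Two tight, well-separated clusters give a small index value.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centers = [(0, 1), (10, 1)]
score = xie_beni(points, labels, centers)  # 4 / (4 * 100) = 0.01
```

In a multiobjective setting, a value like this would be computed for each candidate solution alongside the second index, and candidates would be compared by Pareto dominance rather than by a single score.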

14.
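In wireless sensor networks based on clustering algorithms, cluster heads consume far more energy than cluster members. Electing an assistant cluster head from among the member nodes of a cluster to share the cluster head's load helps greatly to reduce the cluster head's energy consumption. This paper proposes an assistant cluster head algorithm (ASCH). Based on the cluster head's own condition, the algorithm dynamically decides whether a cluster needs an assistant cluster head and selects a suitable member node for the role. Experimental results show that, compared with LEACH, the proposed algorithm balances energy consumption more evenly, effectively reduces network energy consumption, and extends network lifetime.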

15.
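Application of cluster analysis in search engines   Total citations: 8, self-citations: 0, citations by others: 8
To find the information people need on the Internet quickly and accurately, cluster analysis of web page information is very important. This paper analyzes several clustering methods suitable for search engines and discusses the application of cluster analysis in search engine design.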

16.
17.
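Computer cluster technology and its application in the Web domain   Total citations: 1, self-citations: 0, citations by others: 1
Many applications today require multiple computers to be organized to work together, emulating a single more powerful computer to solve a problem, so cluster technology has been widely adopted. This paper introduces the basic concepts and characteristics of cluster technology, depicts cluster architecture in diagrams, focuses on practical applications of cluster technology in the Web domain, and closes with an outlook on its future development.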

18.
This paper provides a case study of specifying an abstract memory consistency model, providing possible implementations of the model, and proving the correctness of those implementations. Specifically, we introduce a class of memory consistency models called partition consistency. Existing abstract consistency models such as sequential consistency, pipelined-RAM, Goodman's processor consistency, and coherence are all members of the partition consistency class. A concrete message-passing network model is also specified. Implementations of partition consistency on this network model are then presented and proved correct. A middle level of abstraction is used to facilitate the proofs. All three levels of abstraction are specified using the same framework. The paper aims to illustrate a general methodology and techniques for specifying memory consistency models and proving the correctness of their implementations.

19.
A key aspect of resource management is the efficient and effective deployment of available resources whenever needed. The issue typically covers two areas: monitoring the resources used by software systems and managing the consumption of resources. A key property of any monitoring system is its reconfigurability: the ability to limit the resources monitored at a given time to those that are really necessary at that moment. The authors of this article propose a fully dynamic and reconfigurable monitoring system based on the concept of Adaptable Aspect-Oriented Programming (AAOP), in which a set of AOP aspects is used to run an application in a manner specified by the adaptability strategy. The model can be used to implement systems that monitor an application and its execution environment and perform actions, such as changing the current set of resource management constraints applied to the application, if the application or environment conditions change. Any aspect that implements a predefined interface may be used by the AAOP-based monitoring system as a source of information. The system uses dynamic AOP, meaning that the aspects (which are sources of information) may be dynamically enabled or disabled.

20.
Given the source and destination locations of n group members and a set of required point of interest (POI) types such as restaurants and shopping centers, a Group Trip Scheduling (GTS) query schedules n individual trips such that each POI type is included in exactly one trip and an aggregate trip overhead distance for visiting the required POI types is minimized. Each trip starts at a member's source location, goes through some POIs, and ends at the member's destination location. The trip distance of a group member is the distance from her source to her destination via the POIs that she visits, and her trip overhead distance is her trip distance minus the direct distance between her source and destination locations (without visiting any POI). The aggregate trip overhead distance is either the sum or the maximum of the group members' trip overhead distances. A GTS query enables a group to schedule independent trips for its members in order to perform a set of tasks at the minimum travel cost. For example, family members often have many outdoor tasks to perform within a short time to manage their household: they may need to go to a bank to withdraw or deposit money, a pharmacy to buy medicine, or a supermarket to buy groceries. Similarly, the organizers of an event may need to visit different POI types to perform many tasks. These scenarios motivate us to introduce the GTS query, a novel query type in spatial databases. We develop an efficient approach to process GTS queries and their variants in Euclidean space and road networks. By exploiting geometric properties, we refine the POI search space and prune POIs, which in turn reduces the query processing overhead significantly. In addition, we propose a dynamic programming technique to eliminate trip combinations that cannot be part of the optimal query answer. We show that processing a GTS query is NP-hard and propose an approximation algorithm to further reduce the query processing overhead. We perform extensive experiments using real and synthetic datasets and show that our approach outperforms a straightforward approach by a large margin.
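The objective that a GTS query minimizes can be made concrete with a small sketch. It only evaluates the aggregate trip overhead of one given schedule in Euclidean space; it does not search for the optimal schedule and does not cover road networks. The function names and the example coordinates are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def trip_distance(source, destination, pois):
    """Length of the path source -> POIs in visiting order -> destination."""
    stops = [source, *pois, destination]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))


def trip_overhead(source, destination, pois):
    """Extra distance travelled compared with going straight to the destination."""
    return trip_distance(source, destination, pois) - dist(source, destination)


def aggregate_overhead(trips, aggregate=sum):
    """Combine per-member overheads; pass aggregate=max for the maximum variant."""
    return aggregate(trip_overhead(s, d, pois) for s, d, pois in trips)


# Two members, each assigned one POI: (source, destination, [POIs]).
trips = [
    ((0, 0), (4, 0), [(2, 1.5)]),  # 2.5 + 2.5 - 4 = 1 unit of overhead
    ((0, 0), (0, 6), [(4, 3)]),    # 5 + 5 - 6 = 4 units of overhead
]
total = aggregate_overhead(trips)                 # sum variant
worst = aggregate_overhead(trips, aggregate=max)  # maximum variant
```

A query processor would compare such values across candidate assignments of POI types to members; the pruning and dynamic programming techniques mentioned in the abstract aim to avoid enumerating all of those candidates.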
