Similar literature
 Found 20 similar documents; search took 31 ms
1.
A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to input parameters and show poor scalability. This paper proposes a scalable unsupervised biclustering framework, SUBic, to find high-quality constant-row biclusters in an expression matrix effectively. A one-dimensional clustering algorithm is proposed to partition the attributes, that is, the columns of an expression matrix, into disjoint groups based on the similarity of expression values. These groups form a set of short transactions and are used to discover a set of frequent itemsets, each of which corresponds to a bicluster. However, a bicluster may include attributes whose expression values are not similar enough to the others, so a bicluster refinement step is used to enhance the quality of a bicluster by removing those attributes based on its distribution of expression values. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.

2.
A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Different strategies to escape local minima are introduced and compared. Experimental results on several microarray data sets show that the method is able to find significant biclusters, also from a biological point of view.
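
The mean squared residue named in the abstract above is a standard coherence score for a bicluster. The block below is a minimal sketch of that single term only, using the common definition (residue = value - row mean - column mean + overall mean); the function name `mean_squared_residue` and the toy matrices are illustrative assumptions, and the paper's full gain function (which also involves row variance and bicluster size) is not reproduced here.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster selected by `rows` and `cols`.

    `matrix` is a list of lists; `rows` and `cols` are index lists.
    A value of 0 means the sub-matrix is perfectly additive (coherent).
    """
    # Extract the sub-matrix defined by the chosen rows and columns.
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    # All rows have equal length, so the mean of row means is the overall mean.
    overall = mean(row_means)

    total = 0.0
    for r, row in enumerate(sub):
        for c, value in enumerate(row):
            residue = value - row_means[r] - col_means[c] + overall
            total += residue * residue
    return total / (len(rows) * len(cols))


# Hypothetical toy data: each row is the first row plus a constant shift.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# Hypothetical toy data with no additive structure.
checker = [[1, 2], [2, 1]]
```

A greedy search of the kind described would typically prefer moves that lower this score while keeping the bicluster large; how the terms are weighted is specific to the paper and is not assumed here.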

3.
Bagging for path-based clustering   Total citations: 3, self-citations: 0, citations by others: 3
A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure allows us to estimate the number of clusters, based on the stability of an optimized cluster solution under resampling. The quality of path-based clustering with resampling is evaluated on a large image data set of human segmentations.

4.
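Existing biclustering algorithms lack the ability to discover biclusters with overlapping structure, cannot effectively uncover the corresponding bicluster structures hidden in gene expression data, and do not consider the influence of condition importance on the biclustering result when adding or deleting conditions. To address these problems, this paper proposes an improved biclustering algorithm based on the weighted mean squared residue. First, a fuzzy partition controlled by overlap rate and membership degree divides the gene set into initial biclusters. Then, the weights of the conditions in each bicluster are iteratively updated while the objective function is minimized. Finally, the weighted mean squared residue is used to add genes that satisfy the criterion and to delete genes with poor coherent fluctuation from the optimized biclusters, yielding the final set of biclusters. Experiments show that the algorithm not only generates biclusters with different co-expression levels but also keeps the overlap rate within a reasonable range.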

5.
6.
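This paper studies algorithms for mining differential biclusters from gene microarrays. Genes in a differential bicluster have different expression levels in data of different classes; such biclusters can effectively identify the key experimental factors that affect gene expression levels as well as the genes that are sensitive to experimental conditions. Traditional biclustering methods find clusters separately in the two classes of gene data and then compare them to obtain the final differential biclusters, a strategy with low time efficiency. To find differential biclusters quickly, a new differential biclustering method based on a weighted graph is proposed. Its main innovation is to mine biclusters directly on the weighted graph constructed from the two classes of data, avoiding the separate mine-then-compare steps. Experimental results confirm that the algorithm runs efficiently.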

7.
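For a given organism, genes that are co-expressed over a continuous period of time suggest that they are jointly carrying out some biological process or that some regulatory relationship exists among them. However, most current biclustering algorithms for gene expression data were proposed for non-consecutive sample points, and the case of consecutive sample points (where the samples are ordered) is rarely addressed. Therefore, taking consecutive sample points into account, a biclustering algorithm, TCBicluster, is proposed for mining maximal coherent-trend co-expressed gene sets from time-series gene expression data. Row-constant co-expressed gene sets are generated at each time point; a weighted graph is then constructed with time points as vertices and, as edges, the co-expressed gene sets between adjacent time points that satisfy the coherence requirement. The weighted graph is mined for biclusters by extending consecutive time points, and effective pruning strategies are used to improve efficiency. Experiments show that TCBicluster mines maximal coherent-trend co-expressed biclusters more effectively than the RAP and CC-TSB algorithms, with high efficiency and good scalability.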

8.
The regression problem of modeling several response variables using the same set of input variables is considered. The model is linearly parameterized and the parameters are estimated by minimizing the error sum of squares subject to a sparsity constraint. The constraint has the effect of eliminating useless inputs and constraining the parameters of the remaining inputs in the model. Two algorithms for solving the resulting convex cone programming problem are proposed. The first algorithm gives a pointwise solution, while the second one computes the entire path of solutions as a function of the constraint parameter. Based on experiments with real data sets, the proposed method has a similar performance to existing methods. In simulation experiments, the proposed method is competitive both in terms of prediction accuracy and correctness of input selection. The advantages become more apparent when many correlated inputs are available for model construction.

9.
Arto Klami 《Machine Learning》2013,92(2-3):225-250
Matching of objects refers to the problem of inferring unknown co-occurrence or alignment between observations or samples in two data sets. Given two sets of equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Given a distance measure, the problem would correspond to a linear assignment problem, the problem of finding a permutation that re-orders samples in one set to minimize the total distance. When no such measure is available, we need to consider more complex solutions. Typical approaches maximize statistical dependency between the two sets, whereas in this work we present a Bayesian solution that builds a joint model for the two sources. We learn a Bayesian canonical correlation analysis model that includes a permutation parameter for re-ordering the samples in one of the sets. We provide both variational and sampling-based inference for approximate Bayesian analysis, and demonstrate on three data sets that the resulting methods outperform the earlier solutions.
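
The abstract above contrasts its Bayesian approach with the simpler case where a distance measure between the two sets is available, which reduces to a linear assignment problem. The block below is a small sketch of that baseline case only, not of the Bayesian method: it searches all permutations for the one with minimal total distance. The function name `best_assignment` and the distance matrix are illustrative assumptions. Exhaustive search is factorial in the number of samples, so in practice a polynomial-time solver such as the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`) would normally be used instead.

```python
from itertools import permutations


def best_assignment(dist):
    """Return (perm, cost) minimizing sum(dist[i][perm[i]]) over permutations.

    `dist[i][j]` is the distance between sample i of the first set and
    sample j of the second set; both sets must have the same size.
    """
    n = len(dist)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        # Total distance when sample i is matched with sample perm[i].
        cost = sum(dist[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical 3-by-3 distance matrix between two small sample sets.
dist = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
perm, cost = best_assignment(dist)
```

Here `perm[i]` gives the index in the second set matched to sample `i` of the first set.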

10.
11.
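In research on improving the attitude-solution accuracy of a teaching robotic arm, a combined MEMS sensor attitude-solution method is designed to address the low accuracy and poor stability of the traditional approach, which performs data fusion and attitude solution with a single sensor group. Six sensor groups are mounted on the three axes of the body coordinate frame, and two sets of sensor data are measured separately. The vector (cross) product of the sensor measurements and the quaternion-estimated data replaces the attitude-angle error as the input to the complementary filter, and a fuzzy controller and a PI controller are each used to adjust the gyroscope output according to the complementary filtering principle. An extended Kalman filter then performs attitude estimation to obtain a more accurate quaternion, which is converted to attitude angles. Simulation results show that, under both static and dynamic conditions, the attitude angles obtained with the combined multi-sensor adjustment are further improved in accuracy and system stability compared with single-sensor PI adjustment.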
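
The complementary filtering principle mentioned in the abstract above can be illustrated in a much simpler setting. The block below is a hedged single-axis sketch, not the paper's method: it blends an integrated gyroscope rate with an accelerometer-derived angle using a fixed weight, whereas the paper works with quaternions, a cross-product error term, fuzzy and PI controllers, and an extended Kalman filter. The function name, the weight `alpha`, and the sample data are all assumptions made for illustration.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, initial_angle=0.0):
    """Fuse gyro rates (deg/s) and accelerometer angles (deg) on one axis.

    Each step trusts the integrated gyro reading with weight `alpha`
    (good short-term accuracy) and the accelerometer angle with weight
    `1 - alpha` (no long-term drift).
    """
    angle = initial_angle
    estimates = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        gyro_angle = angle + rate * dt  # integrate the angular rate
        angle = alpha * gyro_angle + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


# Hypothetical static case: the gyro reports no rotation, the accelerometer
# reports a constant 10 degree tilt, and the estimate starts at 0 degrees.
steps = 500
estimates = complementary_filter([0.0] * steps, [10.0] * steps, dt=0.01)
```

In this static case the estimate moves gradually from the initial value toward the accelerometer angle; a larger `alpha` makes that convergence slower but suppresses more accelerometer noise.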

12.
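The same association mining algorithm performs differently on data with different properties. To address this problem, a method for mining interesting association patterns is proposed. Interestingness measures for patterns are described, an interestingness preprocessing step is introduced, and the data are divided into two types, each of which is mined with a different algorithm. A worked example shows that the method effectively improves the quality of the output patterns.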

13.
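Biclustering is an important research direction in the analysis of gene expression data; its goal is to discover which genes have similar expression levels or are closely related under which experimental conditions. Many biclustering algorithms have been proposed to mine different types of biclusters, but most of them are inefficient. In view of this, a novel mining algorithm, MRCluster, is proposed, mainly for mining maximal row-constant bicluster patterns from raw gene expression data. For efficiency, it adopts a depth-first, gene-extension mining strategy based on the Apriori principle and introduces several novel pruning techniques during mining. MRCluster is compared with RAP (range support pattern), a method for mining row-constant bicluster patterns; the experimental results show that MRCluster is more efficient than RAP at mining maximal row-constant bicluster patterns from raw gene expression data. MRCluster can therefore effectively mine maximal row-constant biclusters from raw gene expression data.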

14.
We present two innovations that produce a novel approach to the problem of fuzzy soft set based decision making in the presence of multiobserver input parameter data sets. The first novelty consists of a new process of information fusion that furnishes a more reliable resultant fuzzy soft set from such input data sets. The second one concerns the mechanism that decides among the alternatives in this resultant fuzzy soft set. It relies on scores computed from a relative comparison matrix. The advantages of our novel procedure are a higher power of discrimination and a well-determined final solution.

15.
This paper presents an algorithmic method for solving the two-plant simultaneous bounded domain stabilization problem for SISO LTI systems. This problem has no closed-form solution. The solution provides robust performance in the presence of sensor or actuator failure, or other major parameter changes. Vidyasagar (1987) studied a similar problem involving partially bounded stability domains. However, stability with respect to partially bounded domains only partially bounds performance characteristics, such as control energy and transient response. The current investigation gives necessary conditions for simultaneous bounded domain stability and demonstrates a geometry-based solution algorithm which can be automated. The possible solutions to the problem and the admissible solutions are represented as sets of points in Euclidean space. The solution to the problem is found by using computational geometric techniques to detect points in the intersection of these two sets, if there is one, and deducing the simultaneous stabilizing compensator design from the points found in the intersection.

16.
The paper relates the stability of a vector (multiobjective) integer optimization problem to the stability of optimal and nonoptimal solutions of this problem. It is shown that the analysis of several types of stability of the problem of searching for Pareto optimal solutions can be reduced to the analysis of two sets consisting of points that stably belong and do not stably belong to the Pareto set. Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 142–148, May–June 2008.

17.
Stability-based validation of clustering solutions   Total citations: 1, self-citations: 0, citations by others: 1
Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract "natural" group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question. We introduce a measure of cluster stability to assess the validity of a cluster model. This stability measure quantifies the reproducibility of clustering solutions on a second sample, and it can be interpreted as a classification risk with regard to class labels produced by a clustering algorithm. The preferred number of clusters is determined by minimizing this classification risk as a function of the number of clusters. Convincing results are achieved on simulated as well as gene expression data sets. Comparisons to other methods demonstrate the competitive performance of our method and its suitability as a general validation tool for clustering solutions in real-world problems.

18.
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal natural grouping of the input data/patterns even given sparse and noisy data.

19.
This study focuses on the output feedback stabilisation of constrained linear systems affected by uncertainties and noisy output measurements. The system states are restricted inside a given polytope and a classical Luenberger observer is used to reconstruct the unmeasurable states from output observations. Based on the observed states, a state feedback is proposed as the control input. The stability analysis and the control design are done using an extended version of the attractive ellipsoid method (AEM) approach. To avoid the violation of state constraints, this work proposes a barrier Lyapunov function (BLF) based analysis. The control parameters are obtained through the solution of some optimisation problems such that the BLF ensures an approximation of the constraints by a maximal ellipsoidal set and the AEM provides the characterisation of a minimal ultimately bounded set for the closed-loop system solutions. Numerical simulations show the advantages of using the BLF-AEM methodology against classical sub-optimal controllers in academic second-order and third-order examples. Then, the proposed control strategy is applied to a Buck DC-DC converter. In all the cases, the method proposed here prevails over the other controllers.

20.
This work addresses the rolling element bearing (REB) fault classification problem by tackling the issue of identifying the appropriate parameters for the extreme learning machine (ELM) and enhancing its effectiveness. This study introduces a memetic algorithm (MA) to identify the optimal ELM parameter set for compact ELM architecture alongside better ELM performance. The goal of using MA is to investigate the promising solution space and systematically exploit the facts in the viable solution space. In the proposed method, the local search method is proposed along with link-based and node-based genetic operators to provide a tight ELM structure. A vibration data set simulated from the bearing of rotating machinery has been used to assess the performance of the optimized ELM with the REB fault categorization problem. The complexity involved in choosing a promising feature set is eliminated because the vibration data has been transformed into kurtograms to reflect the input of the model. The experimental results demonstrate that MA efficiently optimizes the ELM to improve the fault classification accuracy by around 99.0% and reduces the requirement of hidden nodes by 17.0% for both data sets. As a result, the proposed scheme is demonstrated to be a practically acceptable and well-organized solution that offers a compact ELM architecture in comparison to the state-of-the-art methods for the fault classification problem.
