Similar articles
 20 similar articles found; search took 281 ms
1.
Clustering analysis of temporal gene expression data is widely used to study dynamic biological systems, for example to identify sets of genes that are regulated by the same mechanism. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which pose challenges for traditional clustering methods in extracting meaningful information. In this paper, we introduce an improved clustering approach based on regularized spline regression and an energy-based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion, for which the spline coefficients are estimated from the observed data by a regularized least-squares scheme. To compensate for the inadequate information in noisy and short gene expression data, we use each gene's correlated genes as the test set to choose the optimal number of basis functions and the regularization parameter. We show that this treatment helps to avoid over-fitting. After fitting the continuous representations of the gene expression profiles, we use an energy-based similarity measure for clustering. The energy-based measure can include the temporal information and relative changes of the time series by using its first and second derivatives. We demonstrate that our method is robust to noise and can produce meaningful clustering results.
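The abstract above does not give the exact form of its energy-based measure, so the following is only an illustrative sketch of the general idea of comparing two profiles through their first and second derivatives. It works on raw samples with finite differences instead of fitted B-splines, and the function names and the weights `w1` and `w2` are hypothetical.

```python
def finite_differences(times, values):
    """Slopes between consecutive samples; time points may be non-uniform."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def derivative_dissimilarity(times, a, b, w1=1.0, w2=1.0):
    """Hypothetical stand-in for an energy-style measure (not the paper's formula).

    Sums the squared first and second finite differences of the
    difference profile a - b, weighted by w1 and w2.
    """
    diff = [x - y for x, y in zip(a, b)]
    # Midpoints are the time stamps associated with the first differences.
    midpoints = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]
    first = finite_differences(times, diff)
    second = finite_differences(midpoints, first)
    return w1 * sum(v * v for v in first) + w2 * sum(v * v for v in second)
```

Under this sketch, two profiles that differ only by a constant vertical offset score 0, because only changes over time are compared, not absolute expression levels.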

2.
Gene expression data generated by DNA microarray experiments provide a vast resource for medical diagnosis and disease understanding. Unfortunately, the large amount of data makes it hard, sometimes even impossible, to understand the correct behavior of genes. In this work, we develop a possibilistic approach for mining gene microarray data. Our model consists of two steps. In the first step, we use possibilistic clustering to partition the data into groups (or clusters). The optimal number of clusters is evaluated automatically from the data using the Information Entropy as a validity measure. In the second step, we select from each computed cluster the most representative genes and model them as a graph called a proximity graph. This set of graphs (or hyper-graph) will be used to predict the function of new and previously unknown genes. Experimental results using real-world data sets reveal good performance and high prediction accuracy for our model.
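How the entropy is computed is not stated in the abstract above. One common way to use entropy as a cluster validity measure is the partition entropy of a membership matrix, sketched below as an assumption. The function names are hypothetical, and the memberships are assumed to lie in [0, 1]; possibilistic memberships need not sum to 1 per row, so the paper's measure may well differ.

```python
import math


def partition_entropy(memberships):
    """Average entropy of the membership rows; lower means a crisper partition.

    memberships[i][k] is the degree to which sample i belongs to cluster k.
    """
    total = 0.0
    for row in memberships:
        for u in row:
            if u > 0.0:  # treat 0 * log(0) as 0
                total -= u * math.log(u)
    return total / len(memberships)


def pick_cluster_count(candidates):
    """Return the cluster count whose membership matrix has the lowest entropy.

    candidates maps a cluster count to the membership matrix obtained
    by running the clustering with that count (assumed precomputed).
    """
    return min(candidates, key=lambda c: partition_entropy(candidates[c]))
```

Plain partition entropy tends to favor small cluster counts, so in practice it is usually normalized or combined with other indices before being trusted.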

3.
Microarray technologies are employed to simultaneously measure expression levels of thousands of genes. Data obtained from such experiments allow inference of individual gene functions, help to identify genes from specific tissues, to analyze the behavior of gene expression levels under various environmental conditions and under different cell cycle stages, and to identify inappropriately transcribed genes and several genetic diseases, among many other applications. As thousands of genes may be involved in a microarray experiment, computational tools for organizing and providing possible visualizations of the genes and their relationships are crucial to the understanding and analysis of the data. This work proposes an algorithm based on artificial immune systems for organizing gene expression data in order to simultaneously reveal multiple features in large amounts of data. A distinctive property of the proposed algorithm is the ability to provide a diversified set of high-quality rearrangements of the genes, opening up the possibility of identifying various co-regulated genes from representative graphical configurations of the expression levels. This is a very useful approach for biologists, because several co-regulated genes may exist under different conditions.

4.
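For time-series gene expression data, traditional clustering algorithms all use distance as the similarity measure and ignore similar trends in how genes change over time. Starting from the trends of gene changes, a new fuzzy similarity relation matrix is constructed, an improved clustering algorithm based on fuzzy similarity relations is proposed, and this algorithm is used to compute the initial cluster centers for FCM. The method is applied to yeast gene expression data. Experimental results show that the algorithm not only overcomes FCM's tendency to fall into local minima and its sensitivity to initial values, but can also discover co-regulated genes whose expression patterns follow similar trends.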

5.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure and also a clustering algorithm (LEGClust) that builds layers of subgraphs based on this matrix and uses them and a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure, we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence on the superior performance of this new algorithm when compared with competing ones.

6.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure and also a clustering algorithm (LEGClust) that builds layers of subgraphs based on this matrix, and uses them and a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence on the superior performance of this new algorithm when compared with competing ones.

7.
In recent years, the problem of clustering in microarray data has been gaining significant attention. However, most clustering methods attempt to find groups of genes where the number of clusters is known a priori. This fact motivated us to develop a new real-coded improved differential evolution based automatic fuzzy clustering algorithm which automatically evolves the number of clusters as well as the proper partitioning of a gene expression data set. To improve the result further, the clustering method is integrated with a support vector machine (SVM), a well-known technique for supervised learning. A fraction of the gene expression data points, selected from different clusters based on their proximity to the respective centers, is used for training the SVM. The clustering assignments of the remaining gene expression data points are thereafter determined using the trained classifier. The performance of the proposed clustering technique has been demonstrated on five gene expression data sets by comparing it with differential evolution based automatic fuzzy clustering, variable length genetic algorithm based fuzzy clustering, and the well-known Fuzzy C-Means algorithm. A statistical significance test has been carried out to establish the statistical superiority of the proposed clustering approach. A biological significance test has also been carried out using a web-based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of genes. The processed data sets and the MATLAB version of the software are available at http://bio.icm.edu.pl/~darman/IDEAFC-SVM/.

8.
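Yuan Xia, Zhao Chunxia. Robot (《机器人》), 2011, 33(1): 90-96
A clustering algorithm suited to robot navigation and environment understanding is proposed for processing anisotropically distributed point cloud data. The basic idea is to cluster according to variations in the density distribution of the point cloud and differences in its spatial distribution. The idea of information clustering is integrated into the traditional DBSCAN algorithm, which retains DBSCAN's strong noise resistance while using the spatial probability distribution of the point cloud to improve the clustering result. The algorithm uses an adaptive real-time parameter estimation method to overcome global param...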

9.
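The key to building a classification model for diseased tissue is to find a set of feature genes that accurately distinguish the sample classes. The attribute dependency analysis method of rough set theory can analyze target data effectively. The dependency between attributes and the influence of an attribute on the decision are related as follows: the greater the dependency degree of an attribute, the more important the attribute and the greater its influence on the decision partition. Based on this relationship, a maximum dependency of attributes based on rough sets (MDA-RS) algorithm is proposed and applied to feature gene selection. First, a heuristic K-means clustering algorithm is used to cluster the genes into k gene subsets; then MDA-RS is used to select from each cluster the ...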

10.
Microarray technology provides a simple way of collecting huge amounts of data on the expression levels of thousands of genes. Detecting similarities among genes is a fundamental task, both to discover previously unknown gene functions and to focus the analysis on a limited set of genes rather than on thousands of genes. Similarity between genes is usually evaluated by analyzing their expression values. However, when additional information is available (e.g., clinical information), it may be beneficial to exploit it. In this paper, we present a new similarity measure for genes, based on their classification power, i.e., on their capability to separate samples belonging to different classes. Our method exploits a new gene representation that measures the classification power of each gene and defines the classification distance as the distance between gene classification powers. The classification distance measure has been integrated into a hierarchical clustering algorithm, but it may also be adopted by other clustering algorithms. The results of experiments run on different microarray datasets support the intuition behind the proposed approach.

11.
Adaptive control design for a boost inverter   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper, a novel control strategy for a nonlinear boost inverter is proposed. The idea is based on generating an autonomous oscillator that does not need an external reference signal. This aim is achieved by using an energy-shaping methodology with a suitable Hamiltonian function which defines the desired system behavior. A phase controller is added to the control law in order to achieve 180° synchronization between both parts of the circuit as well as to synchronize the voltage output with a pre-specified signal, e.g. synchronization with the electrical grid. An adaptive control is designed to deal with the common problem of an unknown load. In order to analyze the stability of the full system, a singular perturbation approach is used. The resulting control is tested by means of simulations.

12.
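Clustering gene expression data by expression similarity alone cannot fully reveal the functional similarity between genes. To address this problem, a clustering method based on the transitive co-expression relationships between genes is proposed. First, a gene correlation graph is built from the expression correlations between genes. Next, shortest-path analysis is used to obtain the transitive co-expression relationships between genes, which serve as the gene similarity measure. Finally, the k-means algorithm is used for clustering. Clustering experiments are carried out on Yeast gene expression data and compared with clustering results based on expression similarity. The experimental results show that the clustering method based on transitive co-expression achieves better clustering performance and higher clustering accuracy, confirming that gene clustering based on transitive co-expression better reveals the nature of gene similarity.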
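The first two steps of the entry above (a correlation graph, then shortest paths used as a similarity measure) can be sketched as follows. This is only an illustration under stated assumptions: the edge weight `1 - |r|`, the `threshold` value, and all function names are hypothetical choices, not details taken from the paper, and every profile is assumed to be non-constant so that the Pearson correlation is defined.

```python
import heapq
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_graph(profiles, threshold=0.8):
    """Link genes whose |correlation| reaches the (assumed) threshold."""
    genes = list(profiles)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = abs(pearson(profiles[g], profiles[h]))
            if r >= threshold:
                # Clamp at 0 so rounding error never yields a negative weight.
                graph[g][h] = graph[h][g] = max(0.0, 1.0 - r)
    return graph


def shortest_path_costs(graph, source):
    """Dijkstra's algorithm: cheapest path cost from source to each reachable gene."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

With this construction, two genes that are not directly correlated can still end up close if a chain of strongly correlated intermediate genes connects them. The resulting path costs could then be passed to a clustering step in place of a plain expression distance.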

13.
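Considering the significant effect of driving conditions on the fuel economy of PHEVs, which have multiple power sources, an energy management strategy based on K-means++ driving-condition recognition is proposed. A combined driving cycle is built from 30 standard driving cycles in ADVISOR. After the cycle is divided into segments and the recognition period is selected, the K-means++ clustering algorithm yields four clusters, which correspond to four typical driving conditions: congested, urban, suburban, and highway. An engine fuel consumption...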
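For readers unfamiliar with K-means++, its distinguishing step is the seeding: each new initial center is drawn with probability proportional to its squared distance from the nearest center already chosen. The sketch below shows only that seeding step. The function names are hypothetical, the input is assumed to contain at least `k` distinct points, and what features describe a driving-cycle segment (for example mean speed or idle fraction) is not stated in the truncated abstract and is left open here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng):
    """Pick k initial centers with K-means++ seeding.

    Assumes points holds at least k distinct feature tuples;
    rng is a random.Random instance so runs can be reproduced.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so far-away points are more likely to become the next center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

A call such as `kmeanspp_seeds(segments, 4, random.Random(0))` would produce four starting centers, after which the ordinary k-means iterations take over.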

14.
Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks using data sets of very different types. Considering the quality of the obtained clustering, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the novel dissimilarity outperforms even the best among the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is a very challenging task, and that the existing methods offer a marginal gain compared to the proposed similarity method, and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to the existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is indeed versatile enough to capture relevant information, regardless of whether that comes from the attributes of vertices, their proximity, or connectedness of vertices, even without parameter tuning.

15.
16.
In this paper, we introduce ordinal proximity measures in the setting of unbalanced qualitative scales by comparing the proximities between linguistic terms without numbers, in a purely ordinal approach. With this new tool, we propose how to measure the consensus in a set of agents when they assess a set of alternatives through an unbalanced qualitative scale. We also introduce an agglomerative hierarchical clustering procedure based on these consensus measures.

17.
Gene selection is one of the important issues in cancer classification based on gene expression profiles. Filter and wrapper approaches are widely used for gene selection; the former struggle to measure the relationships between genes, while the latter require heavy computation. We present a novel method, called gene boosting, that selects relevant gene subsets by integrating filter and wrapper approaches. It repeatedly selects a set of top-ranked informative genes by a filtering algorithm with respect to a temporal training dataset constructed according to the classification result for the original training dataset. Empirical results on three microarray benchmark datasets show that the proposed method is effective and efficient in finding a relevant and concise gene subset. It achieved competitive performance with fewer genes in a reasonable time, and it identified some genes that were frequently selected.

18.
In this paper, the support vector clustering is extended to an adaptive cell growing model which maps data points to a high dimensional feature space through a desired kernel function. This generalized model is called multiple spheres support vector clustering, which essentially identifies dense regions in the original space by finding their corresponding spheres with minimal radius in the feature space. A multisphere clustering algorithm based on adaptive cluster cell growing method is developed, whereby it is possible to obtain the grade of memberships, as well as cluster prototypes in partition. The effectiveness of the proposed algorithm is demonstrated for the problem of arbitrary cluster shapes and for prototype identification in an actual application to a handwritten digit data set.

19.
Data mining consists of a set of powerful methods that have been successfully applied to many different application domains, including business, engineering, and bioinformatics. In this paper, we propose an innovative approach that uses genetic algorithms to mine a set of temporal behavior data output by a biological system in order to determine the kinetic parameters of the system. Analyzing the behavior of a biological network is a complicated task. In our approach, the machine learning method is integrated with the framework of system dynamics so that its findings are expressed in the form of a system dynamics model. An application of the method to the cell division cycle model has shown that the method can discover approximate parametric values of the system and reproduce the input behavior.

20.
In some applications of industrial robots, the robot manipulator must traverse a pre-specified Cartesian path with its hand tip while the links of the robot move safely among obstacles cluttered in the robot's scene (environment). In order to reduce the costs of collision detection, one approach is to reduce the number of collision checks by enclosing a few real obstacles with a larger (artificial) bounding volume (a cluster), e.g., by their convex hull [4, 14], without cutting the specified path. In this paper, we propose a recursive algorithm composed of four procedures to tackle the problem of clustering convex polygons cluttered around a specified path in a dynamic environment. A key observation is that the number k of clusters is actually determined by the specified path, not by any criterion used in clustering. Based on this fact, an initial set of k clusters can be rapidly generated. Then, the initial set of clusters and its number are further refined to satisfy the minimum Euclidean distance criterion imposed in clustering. Compared to the heuristic algorithm in [14], the complexity of the proposed algorithm is reduced by one order with respect to the number n of obstacles. Simulations are performed in both static and dynamic environments, which show that the recursive algorithm is very efficient and yields a smaller number k of clusters.
