
1.

SUBic: A Scalable Unsupervised Framework for Discovering High Quality Biclusters

Jooil Lee Yanhua Jin Won Suk Lee 《Journal of Computer Science and Technology》2013,28(4):636-646

A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to input parameters and show poor scalability. This paper proposes a scalable unsupervised biclustering framework, SUBic, to find high quality constant-row biclusters in an expression matrix effectively. A one-dimensional clustering algorithm is proposed to partition the attributes, that is, columns of an expression matrix into disjoint groups based on the similarity of expression values. These groups form a set of short transactions and are used to discover a set of frequent itemsets each of which corresponds to a bicluster. However, a bicluster may include any attribute whose expression value is not similar enough to others, so a bicluster refinement is used to enhance the quality of a bicluster by removing those attributes based on its distribution of expression values. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.

2.

Random walk biclustering for microarray data

Fabrizio Angiulli Clara Pizzuti 《Information Sciences》2008,178(6):1479-1497

A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Different strategies to escape local minima are introduced and compared. Experimental results on several microarray data sets show that the method is able to find significant biclusters, also from a biological point of view.
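
The two quality terms named in this abstract can be illustrated with a small sketch. It assumes the standard Cheng and Church definitions of mean squared residue and row variance; the abstract does not say how the paper weights these terms in its gain function, so no gain is computed here. The function names and the list-of-lists matrix layout are illustrative choices, not taken from the paper.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column index lists.

    Assumes the Cheng and Church definition:
    residue(i, j) = a_ij - rowmean_i - colmean_j + overall_mean.
    """
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean[i] for i in rows) / n_r
    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (n_r * n_c)


def row_variance(matrix, rows, cols):
    """Average squared deviation of each entry from its own row mean."""
    n_r, n_c = len(rows), len(cols)
    total = 0.0
    for i in rows:
        mean_i = sum(matrix[i][j] for j in cols) / n_c
        total += sum((matrix[i][j] - mean_i) ** 2 for j in cols)
    return total / (n_r * n_c)


if __name__ == "__main__":
    # Rows differ only by an additive shift, so the residue is (numerically) zero.
    data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(mean_squared_residue(data, [0, 1, 2], [0, 1, 2]))
    print(row_variance(data, [0, 1, 2], [0, 1, 2]))
```

A low residue together with a high row variance indicates a coherent but non-trivial bicluster, which is presumably why both terms appear in the gain function.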

3.

Bagging for path-based clustering (cited 3 times: 0 self-citations, 3 citations by others)

Fischer B. Buhmann J.M. 《IEEE transactions on pattern analysis and machine intelligence》2003,25(11):1411-1415

A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure allows us to estimate the number of clusters, based on the stability of an optimized cluster solution under resampling. The quality of path-based clustering with resampling is evaluated on a large image data set of human segmentations.

4.

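An improved biclustering algorithm based on weighted mean squared residue

Liu Wenhua, Liang Yongquan, Feng Zheng 《Pattern Recognition and Artificial Intelligence》2016,29(6):519-526

Existing biclustering algorithms lack the ability to discover biclusters with overlapping structure, cannot effectively uncover the corresponding bicluster structures hidden in gene expression data, and do not consider how the importance of each condition affects the result when conditions are added or deleted. To address these problems, an improved biclustering algorithm based on the weighted mean squared residue is proposed. First, the gene set is divided into initial biclusters by a fuzzy partition controlled by overlap rate and membership degree. Then, the weights of the conditions in each bicluster are iteratively updated while the objective function is minimized. Finally, the weighted mean squared residue is used to add qualifying genes and to delete genes with poor coherent fluctuation from the optimized biclusters, yielding the final set of biclusters. Experiments show that the algorithm not only generates biclusters with different co-expression levels but also keeps the overlap rate within a reasonable range.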

5.

An improved combinatorial biclustering algorithm

Nosova Ekaterina Napolitano Francesco Amato Roberto Cocozza Sergio Miele Gennaro Raiconi Giancarlo Tagliaferri Roberto 《Neural computing & applications》2012,22(1):293-302

6.

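A weighted-graph-based algorithm for mining differential biclusters from gene microarray data

Diao Jingni, Shang Xuequn, Wang Miao, Miao Miao 《Application Research of Computers》2011,28(1):48-50

An algorithm for mining differential biclusters from gene microarray data is studied. Genes in a differential bicluster have different expression levels in data of different classes; such biclusters can effectively identify the key experimental factors that affect gene expression levels and the genes that are sensitive to experimental conditions. Traditional biclustering methods find clusters in the two classes of gene data separately and then compare them to obtain the final differential biclusters, a strategy that is not time-efficient. To find differential biclusters quickly, a new weighted-graph-based differential biclustering method is proposed. Its main innovation is to mine biclusters directly on the weighted graph built from the two classes of data, which avoids the separate mine-then-compare step. Experimental results confirm that the algorithm runs efficiently.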

7.

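A biclustering algorithm for time-series gene expression data

Yang Mijing, Shang Xuequn, Xu Tao, Wang Miao 《Application Research of Computers》2013,30(8):2308-2314

For a given organism, genes that are co-expressed over a continuous period of time are likely to be jointly carrying out a biological process or to share some regulatory relationship. However, most existing biclustering algorithms for gene expression data were proposed for non-consecutive sample points and rarely address consecutive sample points (where the samples are ordered). Taking consecutive sample points into account, this paper proposes TCBicluster, a biclustering algorithm that mines maximal coherent-trend co-expressed gene sets from time-series gene expression data. Constant-row co-expressed gene sets are generated at each time point. A weighted graph is then constructed whose vertices are the time points and whose edges are the co-expressed gene sets between adjacent time points that satisfy the coherence requirement. Biclusters are mined from the weighted graph by extending consecutive time points, and effective pruning strategies are used to improve efficiency. Experiments show that TCBicluster mines maximal coherent-trend co-expressed biclusters more effectively than the RAP and CC-TSB algorithms, with high efficiency and good scalability.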

8.

Input selection and shrinkage in multiresponse linear regression

Timo Similä Jarkko Tikka 《Computational statistics & data analysis》2007,52(1):406-422

The regression problem of modeling several response variables using the same set of input variables is considered. The model is linearly parameterized and the parameters are estimated by minimizing the error sum of squares subject to a sparsity constraint. The constraint has the effect of eliminating useless inputs and constraining the parameters of the remaining inputs in the model. Two algorithms for solving the resulting convex cone programming problem are proposed. The first algorithm gives a pointwise solution, while the second one computes the entire path of solutions as a function of the constraint parameter. Based on experiments with real data sets, the proposed method has a similar performance to existing methods. In simulation experiments, the proposed method is competitive both in terms of prediction accuracy and correctness of input selection. The advantages become more apparent when many correlated inputs are available for model construction.

9.

Bayesian object matching

Arto Klami 《Machine Learning》2013,92(2-3):225-250

Object matching refers to the problem of inferring unknown co-occurrence or alignment between observations or samples in two data sets. Given two sets of equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Given a distance measure, the problem would correspond to a linear assignment problem, the problem of finding a permutation that re-orders samples in one set to minimize the total distance. When no such measure is available, we need to consider more complex solutions. Typical approaches maximize statistical dependency between the two sets, whereas in this work we present a Bayesian solution that builds a joint model for the two sources. We learn a Bayesian canonical correlation analysis model that includes a permutation parameter for re-ordering the samples in one of the sets. We provide both variational and sampling-based inference for approximate Bayesian analysis, and demonstrate on three data sets that the resulting methods outperform the earlier solutions.
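
The linear assignment problem mentioned in this abstract can be made concrete with a tiny sketch. It covers only the easy case in which a distance matrix is given, not the Bayesian model the paper proposes. It uses exhaustive search over permutations, which is feasible only for very small sets; the function name `best_permutation` and the example matrix are hypothetical.

```python
from itertools import permutations


def best_permutation(dist):
    """Return (perm, cost) minimizing sum(dist[i][perm[i]]) over all permutations.

    dist[i][j] is the distance between sample i of the first set and
    sample j of the second set. Brute force: O(n!) time, illustration only.
    """
    n = len(dist)
    best_perm, best_cost = None, None
    for perm in permutations(range(n)):
        cost = sum(dist[i][perm[i]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


if __name__ == "__main__":
    dist = [
        [4, 1, 3],
        [2, 0, 5],
        [3, 2, 2],
    ]
    perm, cost = best_permutation(dist)
    print(perm, cost)  # (1, 0, 2) 5
```

For realistic sizes, a polynomial-time method such as the Hungarian algorithm would replace the exhaustive loop; the objective stays the same.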

10.

Proximity multi-sphere support vector clustering

Trung Le Dat Tran Phuoc Nguyen Wanli Ma Dharmendra Sharma 《Neural computing & applications》2013,22(7-8):1309-1319

11.

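Simulation study of an improved attitude estimation method for a teaching manipulator

Huang Yang, Jiang Wengang 《Computer Engineering and Applications》2018,54(15):126-130

In research on improving the attitude estimation accuracy of teaching manipulators, traditional methods that fuse data from a single sensor group suffer from low accuracy and poor stability. To address this, an attitude estimation method that combines multiple MEMS sensors is designed. Six sensor groups are mounted on the three axes of the body coordinate frame, and two sets of sensor data are measured separately. The vector product of the sensor measurements and the quaternion-estimated data replaces the attitude-angle error as the input of the complementary filter, and a fuzzy controller and a PI controller are each used to adjust the gyroscope output according to the complementary filtering principle. An extended Kalman filter then performs attitude estimation to obtain a more accurate quaternion, which is converted into attitude angles. Simulation results show that, under both static and dynamic conditions, the attitude angles obtained by combining multiple sensor groups are more accurate and the system is more stable than with PI adjustment of a single sensor group.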
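
The last step described in this abstract, converting a quaternion into attitude angles, can be sketched as follows. The abstract does not state which angle convention the paper uses; this example assumes a unit quaternion (w, x, y, z) and the common ZYX (yaw, pitch, roll) convention, and the function name is hypothetical.

```python
import math


def quaternion_to_euler(w, x, y, z):
    """Convert a quaternion to (roll, pitch, yaw) in radians, ZYX convention.

    The quaternion is normalized first, and the asin argument is clamped
    so that rounding error cannot push it outside [-1, 1].
    """
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


if __name__ == "__main__":
    # A 90 degree rotation about the z axis should give yaw close to pi/2.
    half = math.pi / 4.0
    print(quaternion_to_euler(math.cos(half), 0.0, 0.0, math.sin(half)))
```

Near a pitch of plus or minus 90 degrees this representation becomes singular (gimbal lock), which is one reason filters of this kind usually keep the state as a quaternion and convert to angles only for output.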

12.

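A method for mining interesting association patterns

Liu Xiaosu, Guo Fuliang 《Computer Engineering》2010,36(11):36-38

The same association mining algorithm performs differently on data with different properties. To address this problem, a method for mining interesting association patterns is proposed. Interestingness measures for patterns are introduced, an interestingness-based preprocessing step is added, the data are divided into two types, and a different algorithm is used to mine each type of data set. Examples show that the method effectively improves the quality of the output patterns.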

13.

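Mining maximal constant-row biclusters from gene expression data

Miao Miao, Shang Xuequn, Liu Jiacai, Wang Miao 《Application Research of Computers》2011,28(12):4447-4450

Biclustering is an important research direction in the analysis of gene expression data; its goal is to discover which genes have similar expression levels or are closely related under which experimental conditions. Many biclustering algorithms have been proposed to mine different types of biclusters, but most of them are inefficient. In view of this, a novel mining algorithm, MRCluster, is proposed, mainly for mining maximal constant-row bicluster patterns from raw gene expression data. For efficiency, it adopts a depth-first, gene-extension mining strategy based on the Apriori principle and introduces several new pruning techniques during mining. MRCluster is compared with RAP (range support pattern), a constant-row bicluster pattern mining method; the experimental results show that MRCluster mines maximal constant-row bicluster patterns from raw gene expression data more efficiently than RAP. MRCluster can therefore mine maximal constant-row biclusters from raw gene expression data effectively.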

14.

A novel algorithm for fuzzy soft set based decision making from multiobserver input parameter data set

《Information Fusion》2016

We present two innovations that produce a novel approach to the problem of fuzzy soft set based decision making in the presence of multiobserver input parameter data sets. The first novelty consists of a new process of information fusion that furnishes a more reliable resultant fuzzy soft set from such input data sets. The second one concerns the mechanism that decides among the alternatives in this resultant fuzzy soft set. It relies on scores computed from a relative comparison matrix. The advantages of our novel procedure are a higher power of discrimination and a well-determined final solution.

15.

A geometric solution to the simultaneous bounded domain stabilization problem

James Douglas Gibson Guy O. Beale 《International journal of control》2013,86(17):1536-1547

This paper presents an algorithmic method for solving the two-plant simultaneous bounded domain stabilization problem for SISO LTI systems. This problem has no closed form solution. The solution provides robust performance in the presence of sensor or actuator failure, or other major parameter changes. Vidyasagar (1987) studied a similar problem involving partially bounded stability domains. However, stability with respect to partially bounded domains only partially bounds performance characteristics, such as control energy and transient response. The current investigation gives necessary conditions for simultaneous bounded domain stability and demonstrates a geometry-based solution algorithm which can be automated. The possible solutions to the problem and the admissible solutions are represented as sets of points in Euclidean space. The solution to the problem is found by using computational geometric techniques to detect points in the intersection of these two sets, if there is one, and deducing the simultaneous stabilizing compensator design from the points found in the intersection.

16.

Different types of stability of vector integer optimization problem: General approach

T. T. Lebedeva T. I. Sergienko 《Cybernetics and Systems Analysis》2008,44(3):429-433

The paper relates the stability of a vector (multiobjective) integer optimization problem to the stability of optimal and nonoptimal solutions of this problem. It is shown that the analysis of several types of stability of the problem of searching for Pareto optimal solutions can be reduced to the analysis of two sets consisting of points that stably belong and do not stably belong to the Pareto set. Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 142–148, May–June 2008.

17.

Stability-based validation of clustering solutions (cited 1 time: 0 self-citations, 1 citation by others)

Lange T Roth V Braun ML Buhmann JM 《Neural computation》2004,16(6):1299-1323

Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract "natural" group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question. We introduce a measure of cluster stability to assess the validity of a cluster model. This stability measure quantifies the reproducibility of clustering solutions on a second sample, and it can be interpreted as a classification risk with regard to class labels produced by a clustering algorithm. The preferred number of clusters is determined by minimizing this classification risk as a function of the number of clusters. Convincing results are achieved on simulated as well as gene expression data sets. Comparisons to other methods demonstrate the competitive performance of our method and its suitability as a general validation tool for clustering solutions in real-world problems.

18.

Spectral clustering with eigenvector selection

Tao Xiang Shaogang Gong 《Pattern recognition》2008,41(3):1012-1029

The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal natural grouping of the input data/patterns even given sparse and noisy data.

19.

Robust observer-based controller design for state constrained uncertain systems: attractive ellipsoid method

Manuel Mera Ivan Salgado Isaac Chairez 《International journal of control》2020,93(6):1397-1407

This study focuses on the output feedback stabilisation of constrained linear systems affected by uncertainties and noisy output measurements. The system states are restricted inside a given polytope and a classical Luenberger observer is used to reconstruct the unmeasurable states from output observations. Based on the observed states, a state feedback is proposed as the control input. The stability analysis and the control design are done using an extended version of the attractive ellipsoid method (AEM) approach. To avoid the violation of state constraints, this work proposes a barrier Lyapunov function (BLF) based analysis. The control parameters are obtained through the solution of some optimisation problems such that the BLF ensures an approximation of the constraints by a maximal ellipsoidal set and the AEM provides the characterisation of a minimal ultimately bounded set for the closed-loop system solutions. Numerical simulations show the advantages of using the BLF-AEM methodology against classical sub-optimal controllers in academic second order and third order examples. Then, the proposed control strategy is applied to a Buck DC-DC converter. In all the cases, the method proposed here prevails over the other controllers.

20.

An optimized extreme learning machine-based novel model for bearing fault classification

Sandeep S. Udmale Aneesh G. Nath Durgesh Singh Aman Singh Xiaochun Cheng Divya Anand Sanjay Kumar Singh 《Expert Systems》2024,41(2):e13432

This work addresses the rolling element bearing (REB) fault classification problem by tackling the issue of identifying the appropriate parameters for the extreme learning machine (ELM) and enhancing its effectiveness. This study introduces a memetic algorithm (MA) to identify the optimal ELM parameter set for compact ELM architecture alongside better ELM performance. The goal of using MA is to investigate the promising solution space and systematically exploit the facts in the viable solution space. In the proposed method, the local search method is proposed along with link-based and node-based genetic operators to provide a tight ELM structure. A vibration data set simulated from the bearing of rotating machinery has been used to assess the performance of the optimized ELM with the REB fault categorization problem. The complexity involved in choosing a promising feature set is eliminated because the vibration data has been transformed into kurtograms to reflect the input of the model. The experimental results demonstrate that MA efficiently optimizes the ELM to improve the fault classification accuracy by around 99.0% and reduces the requirement of hidden nodes by 17.0% for both data sets. As a result, the proposed scheme is demonstrated to be a practically acceptable and well-organized solution that offers a compact ELM architecture in comparison to the state-of-the-art methods for the fault classification problem.