Similar Articles
20 similar articles found
1.
As biologists have found, there is a real and growing need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed only for finding positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes in a single pass, 2) it removes the restriction of a magnitude transformation relationship among co-regulated genes, and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measure, gCode, and a user-specified regulation threshold δ, respectively. No previous work achieves all of these goals. Moreover, the minimum description length (MDL) technique is introduced to avoid generating insignificant g-Clusters. A tree structure, GS-tree, is also designed, together with two algorithms that use efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithm finds many co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient and outperform existing approaches.
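The abstract does not define gCode, so the following is only a hypothetical illustration of the underlying idea: if each gene is encoded by the direction of its significant changes between consecutive conditions, then identical codes suggest positive co-regulation and sign-mirrored codes suggest negative co-regulation, regardless of the magnitude of expression. The names `change_code` and `co_regulated_pairs` are made up for this sketch, and `delta` only loosely mirrors the paper's regulation threshold δ.

```python
from itertools import combinations

def change_code(profile, delta):
    """Encode a gene profile as +1/-1/0 per consecutive pair of conditions.
    A step counts as a change only if its magnitude reaches delta (hypothetical rule)."""
    code = []
    for a, b in zip(profile, profile[1:]):
        diff = b - a
        if diff >= delta:
            code.append(1)
        elif diff <= -delta:
            code.append(-1)
        else:
            code.append(0)
    return tuple(code)

def co_regulated_pairs(genes, delta):
    """Return (positive, negative) lists of gene-name pairs:
    positive = identical change codes, negative = exactly sign-mirrored codes."""
    codes = {name: change_code(p, delta) for name, p in genes.items()}
    positive, negative = [], []
    for g1, g2 in combinations(sorted(codes), 2):
        c1, c2 = codes[g1], codes[g2]
        if not any(c1):  # a flat gene shows no significant regulation
            continue
        if c1 == c2:
            positive.append((g1, g2))
        elif c1 == tuple(-x for x in c2):
            negative.append((g1, g2))
    return positive, negative
```

With profiles `g1 = [1, 3, 2, 6]` and `g2 = [10, 30, 20, 60]`, the two genes are paired as positively co-regulated even though `g2` is ten times larger, which is the magnitude-independence the abstract emphasizes.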

2.
Statistical methods, and machine learning in particular, are increasingly used in the drug development workflow. Among existing machine learning methods, we are specifically concerned with genetic programming. We present a genetic programming-based framework for predicting anticancer therapeutic response. We use the NCI-60 microarray dataset and look for a relationship between gene expression and response to the oncology drugs Fluorouracil, Fludarabine, Floxuridine and Cytarabine. We aim to estimate, from genomic measurements of biopsies, the likelihood of developing drug resistance. Experimental results, compared with those obtained by Linear Regression and Least Squares Regression, suggest that genetic programming is a promising technique for this kind of application. Moreover, genetic programming output may highlight relations between genes that could support the identification of biologically meaningful pathways. The structures that appear most frequently in the “best” solutions found by genetic programming are presented.

3.
4.
5.
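Information on where proteins are located within the nucleus of eukaryotic cells is very important for annotating protein function. Because computationally predicting protein localization at the subnuclear level is especially challenging, a method for predicting protein subnuclear localization based on the auto-cross covariance (ACC) transformation and recursive feature elimination (RFE) is proposed. The method builds feature vectors for protein sequences by applying the ACC transformation to the position-specific scoring matrix (PSSM), uses RFE for feature selection, uses a support vector machine (SVM) as the prediction tool, and is evaluated with jackknife tests on two classic datasets, SC714 and LD504. Experimental results show that the method achieves higher prediction accuracy than most previously reported prediction methods.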
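The ACC step turns a variable-length PSSM (L rows, one column per amino acid) into a fixed-length vector, which is what makes SVM input possible. Below is a minimal sketch assuming the common formulation in which each feature is the lagged covariance between two PSSM columns, averaged over the L − g valid positions; the function name `acc_features` and the `max_lag` parameter are illustrative, and the paper's exact normalization of the PSSM is not stated in the abstract. The RFE and SVM stages are not shown.

```python
def acc_features(pssm, max_lag):
    """Auto-cross covariance (ACC) features from an L x C PSSM given as a list of rows.
    For each lag g in 1..max_lag and each column pair (i, j):
        (1 / (L - g)) * sum_k (P[k][i] - mean_i) * (P[k + g][j] - mean_j)
    i == j gives auto-covariance terms, i != j gives cross-covariance terms."""
    L = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= L:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[c] for row in pssm) / L for c in range(n_cols)]
    features = []
    for g in range(1, max_lag + 1):
        for i in range(n_cols):
            for j in range(n_cols):
                s = sum((pssm[k][i] - means[i]) * (pssm[k + g][j] - means[j])
                        for k in range(L - g))
                features.append(s / (L - g))
    return features
```

For a 20-column PSSM this yields 400 features per lag, independent of sequence length; that fixed dimensionality is also why a feature-selection step such as RFE is useful afterwards.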

6.
Many methods have recently been proposed for constructing gene regulatory networks (GRNs). However, most existing methods ignore time-delayed regulatory relations when predicting GRNs. In this paper, we propose a hybrid method, termed GA/PSO with DTW, to construct GRNs from microarray datasets. The proposed method uses a correlation coefficient test and the dynamic time warping (DTW) algorithm to determine whether a time-delayed relation exists between two genes. In addition, it uses particle swarm optimization (PSO) to find thresholds for discretizing the microarray dataset. Based on the discretized dataset and the predicted types of regulatory relations among genes, the method uses a genetic algorithm (GA) to generate a set of candidate GRNs, from which the predicted GRN is constructed. Three real-life sub-networks of yeast are used to evaluate the method. The experimental results show that GA/PSO with DTW outperforms other existing methods in prediction sensitivity and specificity.
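DTW is what lets the method compare two expression time series that have the same shape but are shifted in time. The sketch below is the textbook dynamic-programming DTW distance with absolute difference as the local cost; how the paper combines it with the correlation test and chooses thresholds is not described in the abstract, so only the distance itself is shown.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences,
    with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    # dp[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # advance in a only
                                  dp[i][j - 1],      # advance in b only
                                  dp[i - 1][j - 1])  # advance in both
    return dp[n][m]
```

For example, `dtw_distance([0, 1, 2, 1, 0, 0], [0, 0, 1, 2, 1, 0])` is `0.0`: the second profile is the first delayed by one time point, which a pointwise comparison would penalize but DTW absorbs.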

7.
Zhang Hongyi, Zhang Junying. Computer Engineering, 2007, 33(15): 26-28, 39
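Sound gene clustering is a prerequisite for constructing gene regulatory networks, but using clustering as the main means of building a network can only find co-regulated genes and cannot accurately reflect the interactions between genes. Bayesian network models use a graph-based representation to obtain probabilistic causal relationships from conditional independence among multiple variables, but their computational complexity limits their practical use. This paper combines several of these considerations: after clustering the genes, it predicts regulatory relationships to obtain the set of regulator genes for each target gene, and then applies the LCD (local causal relation discovery) method, which restricts the search conditions, to discover independence relationships between genes and thereby obtain the gene regulatory network. Experimental results demonstrate the feasibility and effectiveness of the method.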
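The LCD step rests on a simple pattern over three variables W, X, Y, where W is assumed not to be caused by X or Y: if all three pairs are dependent but W and Y become independent once X is known, the rule infers X → Y. The sketch below uses absolute Pearson and first-order partial correlation with fixed cut-offs as a stand-in for proper independence tests; the thresholds `dep` and `indep` and the function names are hypothetical, and the paper's actual test statistics are not given in the abstract.

```python
import math

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def partial_corr(w, y, x):
    """First-order partial correlation of w and y, controlling for x."""
    r_wy, r_wx, r_xy = pearson(w, y), pearson(w, x), pearson(x, y)
    return (r_wy - r_wx * r_xy) / math.sqrt((1 - r_wx ** 2) * (1 - r_xy ** 2))

def lcd_suggests_edge(w, x, y, dep=0.5, indep=0.1):
    """LCD pattern (W assumed not caused by X or Y): W-X, X-Y and W-Y dependent,
    but W independent of Y given X  =>  infer X -> Y.
    dep/indep are hypothetical cut-offs standing in for statistical tests."""
    dependent = all(abs(pearson(a, b)) >= dep for a, b in ((w, x), (x, y), (w, y)))
    return dependent and abs(partial_corr(w, y, x)) < indep
```

For data generated as a chain W → X → Y, the rule fires for (W, X, Y) but not for the swapped triple (W, Y, X), because conditioning on Y does not screen W off from X.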

8.
This paper presents the design and a comparative study of two proposed online kernel methods for identification in a reproducing kernel Hilbert space (RKHS) and two kernel methods from the literature. The two proposed methods are SVD-KPCA and online RKPCA; the two existing techniques are Sliding Window Kernel Recursive Least Squares and Kernel Recursive Least Squares. The performance criteria considered are the normalized mean square error, the computation time, and the numerical complexity. All methods are evaluated on a chemical process known as the Continuous Stirred Tank Reactor and on the Wiener-Hammerstein benchmark.
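Of the three criteria, the normalized mean square error is the one that can be stated compactly. Normalizations differ between papers; the sketch below assumes the common choice of dividing the mean square error by the variance of the measured output, so that a model that always predicts the output mean scores 1 and a perfect model scores 0. The function name `nmse` is illustrative.

```python
def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of y_true.
    (Other normalizations exist, e.g. dividing by the mean of y_true ** 2.)"""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return mse / var
```

Identification results are often reported in decibels as `10 * log10(nmse(...))`; whichever form is used, it should be the same for all four methods being compared.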

9.
Satisfactory Feature Selection and Its Application   Total citations: 2; self-citations: 0; by others: 2
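In practical applications, feature selection is a satisficing optimization problem. Existing feature selection methods seldom consider the cost of acquiring features or the automatic determination of the feature set dimensionality. To address this, a satisfactory feature selection method (SFSM) is proposed that jointly considers multiple factors, including sample classification performance, feature set dimensionality, and feature extraction complexity. Definitions of feature satisfaction degree and feature set satisfaction degree are given, a satisfaction function is designed, an evaluation criterion for satisfactory feature sets is derived, and the feature selection algorithm is described in detail. Experimental results on feature selection and recognition of radar emitter signals show that SFSM clearly outperforms sequential forward selection, a new feature selection method, and a multi-objective genetic algorithm in computational efficiency and in the quality of the selected features, confirming the effectiveness and practicality of SFSM.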
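The abstract does not give SFSM's satisfaction functions, so the following is a hypothetical sketch of the general idea only: a single score that rewards classification accuracy while penalizing subset size and extraction cost, driven by a greedy forward search that stops when no addition raises the score, so the subset size is chosen automatically. The linear form, the `weights`, and the function names are all assumptions, and `accuracy` is any caller-supplied evaluation (for example, cross-validated accuracy of a classifier on the subset).

```python
def satisfaction(subset, accuracy, cost, n_features, weights=(0.6, 0.2, 0.2)):
    """Hypothetical satisfaction score: rewards accuracy, penalizes subset size
    and total extraction cost. Not the paper's definition."""
    w_acc, w_dim, w_cost = weights
    total_cost = sum(cost.values())
    return (w_acc * accuracy(subset)
            + w_dim * (1 - len(subset) / n_features)
            + w_cost * (1 - sum(cost[f] for f in subset) / total_cost))

def satisfactory_forward_selection(features, accuracy, cost):
    """Greedy forward search on the satisfaction score; stops as soon as
    no single added feature improves it."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        scored = [(satisfaction(selected + [f], accuracy, cost, len(features)), f)
                  for f in remaining]
        score, f = max(scored)
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best
```

The design choice to fold dimensionality and cost into the objective, rather than fixing the number of features in advance, is what allows the search to stop on its own; the paper compares its own search against sequential forward selection, so this greedy loop should be read as a baseline-style illustration.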

10.
11.
DNA microarray technology has emerged as a promising tool for the diagnosis and classification of cancer. It provides better insight into many cancer-associated genetic mutations occurring within a cell. However, the thousands of gene expression levels measured for each biological sample pose a great challenge. Many statistical and machine learning methods have been applied to select the most relevant genes before cancer classification. A two-phase hybrid model for cancer classification is proposed, integrating Correlation-based Feature Selection (CFS) with improved Binary Particle Swarm Optimization (iBPSO). The model selects a low-dimensional set of prognostic genes to classify biological samples of binary and multi-class cancers using a Naive Bayes classifier with stratified 10-fold cross-validation. The proposed iBPSO also mitigates the premature convergence to a local optimum that affects traditional BPSO. The model has been evaluated on 11 benchmark microarray datasets of different cancer types. Experimental results are compared with seven other well-known methods, and in most cases the model achieved better classification accuracy with fewer selected genes. In particular, it achieved 100% classification accuracy on seven of the eleven datasets, with a very small prognostic gene subset (under 1.5% of the genes) for all eleven datasets.
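The iBPSO modifications are not described in the abstract, so the sketch below shows only standard binary PSO, the baseline the paper improves on: each particle is a bit vector (1 = gene selected), velocities are updated from personal and global bests, and a sigmoid turns each velocity into the probability that the bit is 1. The inertia weight `w`, coefficients `c1`/`c2`, velocity clamp `v_max`, and the function name are illustrative defaults. In the paper's pipeline the `fitness` would come from CFS and Naive Bayes cross-validation; here it is any caller-supplied function to maximize.

```python
import math
import random

def binary_pso(fitness, n_bits, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO maximizing fitness(bits); returns (best_bits, best_fitness)."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # sigmoid transfer: velocity -> probability that this bit is 1
                xs[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vs[i][d])) else 0
            f = fitness(xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit
```

Because the sigmoid keeps re-randomizing bits, swarms can stall once velocities saturate near the clamp; that premature-convergence behavior is the weakness the paper's iBPSO is said to address.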

12.
13.
14.
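Research on identifying driver pathways relies on traditional biological experiments, which are time-consuming, labor-intensive, and costly. To address this, a new binary cancer driver pathway identification method, PEA-BLMWS, is proposed. First, using existing gene expression data, potential gene mutation data are mined by comparing the expression levels of normal and mutated genes. Second, protein-protein interaction network data are introduced to construct an improved binary linear maximum weight submatrix model. Finally, a two-parent co-evolutionary algorithm is proposed to solve this matrix model. Experimental results on the GBM (glioblastoma) and OVCA (ovarian cancer) datasets show that, compared with the state-of-the-art methods Dendrix, CCA-NMWS, and CGP-NCM, more genes in the gene sets identified by PEA-BLMWS are enriched in known signaling pathways, and the genes not enriched in signaling pathways are also closely related to cancer development. The method can therefore serve as an effective tool for identifying driver pathways.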
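The "maximum weight submatrix" formulation builds on the weight used by Dendrix, one of the baselines named above: a good driver gene set should cover many samples (high coverage) while its mutations rarely co-occur in the same sample (mutual exclusivity). The sketch below computes that baseline weight, W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, on a binary sample-by-gene mutation matrix and finds the best k-gene set by exhaustive search. The paper's improved model additionally uses protein-protein interaction data and solves the problem with a co-evolutionary algorithm; neither is shown, and the function names are illustrative.

```python
from itertools import combinations

def mws_weight(matrix, genes):
    """Dendrix-style weight W(M) = 2*|Γ(M)| - Σ_g |Γ(g)|, where Γ(M) is the set of
    samples (rows) mutated in at least one gene (column) of M.
    Coverage raises W; co-occurring mutations lower it."""
    covered = sum(1 for row in matrix if any(row[g] for g in genes))
    total = sum(row[g] for row in matrix for g in genes)
    return 2 * covered - total

def best_gene_set(matrix, k):
    """Exhaustive search over all k-gene subsets (only feasible for small inputs)."""
    n_genes = len(matrix[0])
    return max(combinations(range(n_genes), k), key=lambda m: mws_weight(matrix, m))
```

In a four-sample matrix where genes 0 and 1 are mutated in disjoint halves of the samples and gene 2 overlaps both, `best_gene_set(matrix, 2)` returns `(0, 1)`: that pair covers every sample with no overlap, giving weight 4. Exhaustive search grows combinatorially with k, which is why heuristic solvers such as the paper's co-evolutionary algorithm are needed on real datasets.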

15.
Owing to recent interest in the analysis of DNA microarray data, new methods have been considered and developed in the area of statistical classification. In particular, the goal is to classify a sample into a relevant diagnostic category based on the gene expression profiles of existing data. However, when classifying outcomes into cancer types, many genes are often unimportant, while some genes are more important than others. A novel algorithm is presented for selecting such relevant genes, referred to as marker genes, for cancer classification. The algorithm is based on the Support Vector Machine (SVM) and Supervised Weighted Kernel Clustering (SWKC). To investigate its performance, the method was applied to a simulated dataset and several real datasets. For comparison, other well-known methods, namely Prediction Analysis of Microarrays (PAM), Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and a Structured Polychotomous Machine (SPM), were considered. The experimental results indicate that the proposed SWKC/SVM algorithm is conceptually much simpler and performs more efficiently than the other methods in identifying marker genes for cancer classification. Furthermore, the SWKC/SVM algorithm requires much less computing time than the other existing methods.

16.
The coefficient of consolidation of soil is a significant engineering property and an important parameter for designing and auditing geotechnical structures. In this study, the authors propose an efficient methodology for predicting the coefficient of consolidation using the machine learning models Multiple Linear Regression (MLR), Artificial Neural Network (ANN), Support Vector Regression (SVR), and Adaptive Network based Fuzzy Inference System (ANFIS). In addition, several feature selection techniques are applied: the Least Absolute Shrinkage and Selection Operator (LASSO), Random Forests - Recursive Feature Elimination (RF-RFE), and Mutual Information. The feature selection methods improved the prediction models by eliminating irrelevant features so that only the important features were used to build the models. Experiments are performed on a dataset of 534 soil samples collected from the Ha Noi–Hai Phong highway project, Vietnam. Experimental results show the adequacy of the proposed models; the hybrid ANFIS approach, which fuses an ANN with a fuzzy inference system, combines complementary strengths in handling uncertainty and adaptability. ANFIS with LASSO feature selection achieves a coefficient of determination (R²) of 0.831 and thus provides the best prediction of the soil's coefficient of consolidation among the approaches compared.
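LASSO performs feature selection because its L1 penalty drives the coefficients of weak features exactly to zero. The sketch below is a plain cyclic coordinate-descent LASSO with the soft-thresholding update, minimizing (1/2n)‖y − Xb‖² + λ‖b‖₁; it assumes centered features and target (so no intercept) and is meant only to show the mechanism. The study most likely used a library implementation, and `lam`, `n_iter`, and the function names here are illustrative.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent on (1/2n)*||y - Xb||^2 + lam*||b||_1.
    Assumes the columns of X and y are centered, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # r = y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue
            # correlation of feature j with the partial residual (b_j added back)
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                b[j] = new_bj
    return b
```

With `X = [[1, 1], [-1, 1], [2, -1], [-2, -1]]` and `y = [3, -3, 6, -6]` (the target depends only on the first column), `lasso_cd(X, y, 0.1)` gives roughly `[2.96, 0.0]`: the irrelevant second feature is dropped, and the first coefficient is slightly shrunk from its least-squares value of 3.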

17.
An Ensemble Feature Gene Selection Method Based on Recursive Classification Trees   Total citations: 14; self-citations: 1; by others: 14
Li Xia, Zhang Tianwen, Guo Zheng. Chinese Journal of Computers, 2004, 27(5): 675-682
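Using DNA chip gene expression profiles to identify disease-related genes has great practical significance for the subtyping, diagnosis, and pathological study of cancer and other diseases. This paper proposes EFST (Ensemble Feature Selection based on Recursive Partition-Tree), an ensemble method for selecting feature genes based on recursive classification trees. EFST selects multiple groups of feature genes based on different sample distribution structures. Combining the multi-classifier ensemble decision technique from supervised machine learning with proposed measures of feature gene stability and significance, it integrates these feature gene groups to select the final feature genes. An analysis of experimental expression profile data for 2,000 colon cancer genes shows that EFST can find disease-related genes and strongly compress data dimensionality, and four pattern classification methods, including the support vector machine (SVM), confirm that EFST clearly improves the accuracy of disease classification.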
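The core ensemble idea, running a tree-based selector on many resampled versions of the data and keeping the genes chosen consistently, can be sketched briefly. This is a deliberate simplification and not EFST itself: it uses one-level decision stumps (best Gini split) instead of full recursive partition trees, bootstrap resampling as the source of "different sample distribution structures", and raw selection frequency as a stand-in for the paper's stability and significance measures. All function names are illustrative.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_gene(X, y):
    """Gene whose best single-threshold split has the lowest weighted Gini impurity
    (ties go to the lowest gene index); None if no gene can split the samples."""
    n, best = len(y), (float("inf"), None)
    for g in range(len(X[0])):
        for t in sorted({row[g] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, g)
    return best[1]

def ensemble_gene_ranking(X, y, n_rounds=30, seed=0):
    """Count how often each gene is the root split across bootstrap samples;
    the frequency acts as a simple stability measure."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(y)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a one-class bootstrap carries no split information
        g = best_split_gene(Xb, yb)
        if g is not None:
            counts[g] += 1
    return counts.most_common()
```

A gene that separates the classes in almost every resample rises to the top of the ranking, while genes that only look useful on particular resamples receive low counts; that stability filter is the main reason for ensembling.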

18.
19.
Microarray technologies are employed to measure the expression levels of thousands of genes simultaneously. Data obtained from such experiments allow inference of individual gene functions and help to identify genes from specific tissues, to analyze the behavior of gene expression levels under various environmental conditions and cell cycle stages, and to identify inappropriately transcribed genes and several genetic diseases, among many other applications. Because thousands of genes may be involved in a microarray experiment, computational tools that organize the genes and visualize their relationships are crucial to understanding and analyzing the data. This work proposes an algorithm based on artificial immune systems for organizing gene expression data so as to reveal multiple features in large amounts of data simultaneously. A distinctive property of the proposed algorithm is its ability to provide a diverse set of high-quality rearrangements of the genes, making it possible to identify various co-regulated genes from representative graphical configurations of the expression levels. This is very useful for biologists, because several groups of co-regulated genes may exist under different conditions.
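To make the idea of "rearranging genes" concrete, the sketch below treats an ordering of genes as an antibody whose quality is the total Euclidean distance between consecutive gene profiles, so that similar genes end up adjacent in a heat-map style display. It is a minimal clonal-selection loop loosely inspired by CLONALG (clone every ordering, mutate worse-ranked clones more, keep the best), not the paper's algorithm; the cost function, the rank-based mutation schedule, and all parameter names are assumptions. It returns the final population sorted by cost, but unlike the paper's method nothing here explicitly enforces diversity among those orderings.

```python
import math
import random

def ordering_cost(profiles, order):
    """Sum of Euclidean distances between consecutive genes in the ordering."""
    return sum(math.dist(profiles[a], profiles[b]) for a, b in zip(order, order[1:]))

def clonal_order(profiles, pop_size=10, n_clones=5, n_gen=100, seed=0):
    """Clonal-selection-style search over gene orderings (CLONALG-inspired sketch).
    Worse-ranked orderings receive more swap mutations; the best ones always survive."""
    rng = random.Random(seed)
    n = len(profiles)
    identity = list(range(n))
    pop = [identity[:]] + [rng.sample(identity, n) for _ in range(pop_size - 1)]
    for _ in range(n_gen):
        pop.sort(key=lambda o: ordering_cost(profiles, o))
        offspring = []
        for rank, parent in enumerate(pop):
            for _ in range(n_clones):
                clone = parent[:]
                # mutation intensity grows with rank (worse parents mutate more)
                for _ in range(1 + rank):
                    i, j = rng.randrange(n), rng.randrange(n)
                    clone[i], clone[j] = clone[j], clone[i]
                offspring.append(clone)
        pool = pop + offspring
        pool.sort(key=lambda o: ordering_cost(profiles, o))
        pop = pool[:pop_size]
    return pop
```

Because parents are kept in the selection pool, the best ordering found never gets worse from one generation to the next; returning the whole final population rather than a single winner mirrors the abstract's emphasis on offering several alternative arrangements.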

20.
Protein function prediction is an important problem in functional genomics. Protein sequences are typically represented by feature vectors. A major problem with protein datasets, which increases the complexity of classification models, is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search. The hybrid algorithm exploits the advantages of both the ACO and GA methods. The proposed algorithm is easy to implement and, because it uses a simple classifier, has very low computational complexity. Its performance is compared with that of two prominent population-based algorithms, ACO and genetic algorithms. Experiments are carried out on two challenging biological datasets, involving the hierarchical functional classification of GPCRs and enzymes. The comparison criteria are predictive accuracy (to be maximized) and the size of the selected feature subset (to be minimized). The experimental results indicate the superiority of the proposed algorithm.
