Similar Articles
1.
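Gene Selection Method Based on Tolerance Relation   Total citations: 1; self-citations: 0; citations by others: 1
Jiao Na, Miao Duoqian. Computer Science, 2010, 37(10): 217-220
Effective gene selection is an important part of gene expression data analysis. Rough set theory, a soft computing method, can reduce attributes while preserving the classification ability of a data set. Because gene expression data are continuous, applying classical rough sets requires a discretization step that loses information; to avoid this, tolerance rough sets are applied to gene feature selection, giving a gene selection method based on the tolerance relation. First, genes are ranked with a t-test and the top-scoring genes are kept; these genes are then further reduced with the tolerance rough set. Experiments on two standard gene expression data sets show that the method is feasible and effective.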
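
The two-stage pipeline described above (t-test filtering, then tolerance-rough-set reduction) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Welch form of the t statistic, defines the tolerance relation as |x_g - y_g| <= eps on every selected gene (eps is a hypothetical parameter), and uses greedy forward selection on the positive-region dependency degree as the reduction step.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic for one gene (a, b: its values in class 0 and class 1)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes_by_t(X, y, k):
    """Indices of the k genes with the largest |t| (X: samples x genes, y: 0/1 labels)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def tolerance_class(X, i, genes, eps):
    """Samples tolerant with sample i: |X[i][g] - X[j][g]| <= eps on every gene (eps is assumed)."""
    return [j for j in range(len(X)) if all(abs(X[i][g] - X[j][g]) <= eps for g in genes)]


def dependency(X, y, genes, eps):
    """Share of samples whose tolerance class carries a single class label (positive region / n)."""
    pos = sum(1 for i in range(len(X))
              if all(y[j] == y[i] for j in tolerance_class(X, i, genes, eps)))
    return pos / len(X)


def greedy_reduct(X, y, candidates, eps):
    """Forward selection until the chosen genes reach the dependency of all candidates."""
    target = dependency(X, y, candidates, eps)
    chosen = []
    while dependency(X, y, chosen, eps) < target:
        rest = [g for g in candidates if g not in chosen]
        chosen.append(max(rest, key=lambda g: dependency(X, y, chosen + [g], eps)))
    return chosen
```

In use, `top = rank_genes_by_t(X, y, k)` followed by `greedy_reduct(X, y, top, eps)` follows the filter-then-reduce order in the abstract; the filter keeps the quadratic tolerance-class computation affordable on thousands of genes.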

2.
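A Decomposition-Based Gene Selection Algorithm   Total citations: 1; self-citations: 0; citations by others: 1
Gene expression data consist of thousands of genes and only a few dozen samples, so effective gene selection algorithms are central to gene expression research. Rough sets are an effective tool for removing redundant features. However, on data with thousands of features and a few dozen samples, existing rough-set-based feature selection algorithms become very inefficient. To address this, decomposition is applied to feature selection, giving a decomposition-based feature selection algorithm. The algorithm decomposes a complex table into a simpler, more tractable master table and sub-tables, then joins their results to solve the problem on the original table. Experimental results show that the algorithm markedly improves computational efficiency while maintaining classification accuracy.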

3.
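Feature Gene Selection for Tumor Classification Based on Fuzzy Rough Sets   Total citations: 2; self-citations: 0; citations by others: 2
The key to building an effective tumor classification model from gene expression profiles is to accurately identify the set of feature genes that determines the sample classes. Rough set theory, a relatively new soft computing method, can reduce attributes substantially while preserving the classification ability of the original data set, finding the genes that are useful for classification among a large number of genes. Because gene expression profile data are continuous, and to avoid the information loss caused by the discretization step that classical rough sets require, fuzzy rough sets are applied to feature gene selection: a mutual-information-based fuzzy rough set attribute reduction algorithm is proposed and used to select genes from gene expression profile data sets. The classification performance of the selected feature genes is then evaluated with KNN and C5.0 classifiers. Experiments on feature gene selection for acute leukemia subtypes (leukemia microarray) and colon cancer (colon microarray) demonstrate the feasibility and effectiveness of the method.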
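
The abstract does not spell out its mutual-information criterion, so the sketch below shows only the fuzzy-rough machinery such methods build on, under stated assumptions: a per-gene fuzzy similarity 1 - |x - y| / range, the minimum t-norm across genes, a Kleene-Dienes implicator for the fuzzy lower approximation, and a QuickReduct-style greedy search on the resulting dependency degree. The dependency-based search is a swapped-in criterion, not the paper's algorithm.

```python
def fuzzy_similarity(X, a):
    """Per-gene fuzzy similarity R_a(x, y) = max(0, 1 - |x_a - y_a| / range_a) (assumed form)."""
    col = [row[a] for row in X]
    rng = (max(col) - min(col)) or 1.0
    return [[max(0.0, 1.0 - abs(u - v) / rng) for v in col] for u in col]


def relation(X, attrs):
    """Fuzzy relation for a gene subset: element-wise minimum (min t-norm) of per-gene similarities."""
    n = len(X)
    R = [[1.0] * n for _ in range(n)]
    for a in attrs:
        S = fuzzy_similarity(X, a)
        R = [[min(R[i][j], S[i][j]) for j in range(n)] for i in range(n)]
    return R


def fuzzy_dependency(X, y, attrs):
    """Mean fuzzy positive-region membership; lower approximation via I(a, b) = max(1 - a, b)."""
    R = relation(X, attrs)
    total = 0.0
    for i in range(len(X)):
        # Same-class samples give I = 1, so only other-class samples can lower the infimum.
        others = [1.0 - R[i][j] for j in range(len(X)) if y[j] != y[i]]
        total += min(others) if others else 1.0
    return total / len(X)


def quick_reduct(X, y, tol=1e-12):
    """Greedily add the gene that most raises the fuzzy dependency; stop when nothing improves it."""
    remaining, chosen, best = list(range(len(X[0]))), [], 0.0
    while remaining:
        a = max(remaining, key=lambda g: fuzzy_dependency(X, y, chosen + [g]))
        gamma = fuzzy_dependency(X, y, chosen + [a])
        if gamma <= best + tol:
            break
        chosen.append(a)
        remaining.remove(a)
        best = gamma
    return chosen
```

Because fuzzy similarities are computed directly on continuous values, no discretization step is needed, which is the motivation the abstract gives for moving from classical to fuzzy rough sets.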

4.
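Cao Juan, Zhang Yingchun, Zhao Ling. Computer Science, 2013, 40(7): 226-228, 265
The key to building an effective tumor classification model from gene expression profiles is to accurately identify the set of feature genes that determines the sample classes. Rough set theory has been applied successfully to feature gene selection for tumor classification. However, the discretization step that rough sets require for continuous gene expression data loses some information and affects the classification accuracy of the selected feature genes. A mutual-information-based fuzzy rough set algorithm for selecting feature genes from gene expression profile data sets was therefore proposed earlier, but it is computationally expensive and becomes impractical when many genes are to be selected. This paper improves that algorithm by approximating the mutual information computation from two sides, maximum relevance and maximum significance (minimum redundancy), which greatly reduces the algorithm's complexity and improves its efficiency. Experiments on feature gene selection for acute leukemia subtypes (leukemia), colon cancer (colon), and breast cancer (Breast) were carried out, and the classification accuracy of the selected genes was evaluated with 1NN and SVM classifiers; the results confirm the feasibility and effectiveness of the new method.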
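
The max-relevance/min-redundancy trade-off mentioned above can be illustrated with the standard greedy mRMR rule in its difference form. This sketch assumes expression values have already been binned into discrete levels and uses exact discrete mutual information; the paper's own approximation of the mutual information is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def mutual_info(u, v):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(u)
    pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(c / n * math.log(c * n / (pu[a] * pv[b])) for (a, b), c in puv.items())


def mrmr(columns, y, k):
    """Greedy mRMR (difference form): next gene maximizes I(g; y) - mean over selected s of I(g; s)."""
    relevance = [mutual_info(col, y) for col in columns]
    selected = [max(range(len(columns)), key=lambda g: relevance[g])]
    while len(selected) < min(k, len(columns)):
        def score(g):
            redundancy = sum(mutual_info(columns[g], columns[s]) for s in selected) / len(selected)
            return relevance[g] - redundancy
        selected.append(max((g for g in range(len(columns)) if g not in selected), key=score))
    return selected
```

A duplicated gene scores poorly once its twin is selected, because its redundancy term equals its own entropy; this is the behavior the minimum-redundancy side of the criterion is meant to enforce.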

5.
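Feature gene selection plays a very important role in microarray data analysis, and a good feature selection method is one of the keys to improving both the accuracy and the speed of gene expression classification. Combining the strengths of ant colony optimization and rough set theory in microarray data processing, this paper improves the ant colony optimization model with rough set theory, applying the rough set attribute dependency and attribute significance to the ants' path selection and evaluation, and proposes a new gene selection method. The method is simple to implement, finds optimal solutions relatively quickly, and ultimately selects a small feature gene subset with strong classification performance. Simulation experiments on gene data sets show that the algorithm is effective and feasible.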
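
How rough-set dependency and significance might feed an ant's choice of the next gene can be sketched as below. The sketch assumes discretized data, the classical positive-region dependency gamma_B(D) = |POS_B(D)| / |U|, the significance sig(a, B) = gamma_{B with a}(D) - gamma_B(D), and the usual ACO transition rule p(a) proportional to tau(a)^alpha * eta(a)^beta with eta set to the significance. The parameter names and the small floor on eta are assumptions, not details from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: rows with equal values on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, y, attrs):
    """gamma_B(D) = |POS_B(D)| / |U|: share of rows in decision-consistent classes."""
    pos = sum(len(b) for b in partition(rows, attrs) if len({y[i] for i in b}) == 1)
    return pos / len(rows)


def significance(rows, y, attrs, a):
    """sig(a, B) = gamma_{B with a}(D) - gamma_B(D)."""
    return dependency(rows, y, attrs + [a]) - dependency(rows, y, attrs)


def transition_probs(rows, y, attrs, candidates, tau, alpha=1.0, beta=1.0, floor=1e-6):
    """ACO rule: p(a) proportional to tau[a]**alpha * eta(a)**beta, eta = floored significance."""
    weights = {a: tau[a] ** alpha * max(significance(rows, y, attrs, a), floor) ** beta
               for a in candidates}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

An ant could then draw its next gene with `random.choices(list(p), weights=list(p.values()))`; the floor keeps genes with zero current significance reachable, so the pheromone term can still steer exploration toward them.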

6.
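To avoid the information loss caused by discretizing continuous data, reduce the complexity of computing the neighborhoods of sample attributes, and improve the efficiency of feature gene extraction, this paper proposes a feature gene extraction method based on monotone neighborhood rough sets in a monotone metric space. Experiments on two standard gene expression data sets show that the method is effective and feasible.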

7.
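Meng Jun, Li Rui, Hao Han. Computer Science, 2015, 42(6): 37-40, 66
In research on feature selection and classification of gene microarray data, rough set theory is an effective tool for eliminating redundant genes. However, the classical rough set model cannot handle continuous numerical data well, and discretization may lose information. To address this, an attribute reduction algorithm based on an intersection-neighborhood rough set model is proposed: the distance-based neighborhood of traditional rough set models is extended to an intersection neighborhood, and approximations are defined in a set-based way to construct the rough set model. Experiments on cancer data sets show that the rough set model based on set approximations and intersection neighborhoods achieves good classification results, and Gene Ontology (GO) term analysis of the selected genes further confirms the model's effectiveness.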
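
For reference, the classical distance-based neighborhood rough set that the paper above extends can be sketched as follows; it is not the intersection-neighborhood model itself, whose definition the abstract does not give. Assumptions: Euclidean distance on the selected genes, a hypothetical radius `delta`, and set-valued lower and upper approximations.

```python
import math


def neighborhood(X, i, attrs, delta):
    """delta-neighborhood of sample i: samples within Euclidean distance delta on attrs."""
    xi = [X[i][a] for a in attrs]
    return {j for j in range(len(X)) if math.dist(xi, [X[j][a] for a in attrs]) <= delta}


def approximations(X, target, attrs, delta):
    """Lower/upper approximation of the sample set `target` under delta-neighborhoods."""
    lower, upper = set(), set()
    for i in range(len(X)):
        nb = neighborhood(X, i, attrs, delta)
        if nb <= target:   # neighborhood entirely inside the target set
            lower.add(i)
        if nb & target:    # neighborhood overlaps the target set
            upper.add(i)
    return lower, upper


def neighborhood_dependency(X, y, attrs, delta):
    """|POS| / n, where POS is the union of the lower approximations of all decision classes."""
    pos = set()
    for c in set(y):
        lower, _ = approximations(X, {i for i, v in enumerate(y) if v == c}, attrs, delta)
        pos |= lower
    return len(pos) / len(X)
```

Growing `delta` enlarges neighborhoods, so fewer samples fall in the lower approximation and the dependency drops; this sensitivity to a single distance threshold is one reason alternative neighborhood definitions are proposed.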

8.
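A Data-Binning-Based CARS Method for Feature Selection from Gene Expression Profiles   Total citations: 1; self-citations: 0; citations by others: 1
Feature gene selection is a key problem in mining gene expression profile data. Building on the CARS (competitive adaptive reweighted sampling) method, this paper proposes a data-binning-based CARS method for feature gene selection. The basic idea is to split the data into bins, apply CARS to the variables in each bin, merge the resulting feature gene subsets, and apply CARS again to select the best feature genes; the selected genes are then evaluated by leave-one-out cross-validation with a support vector machine. Applied to a prostate cancer data set, the method finally selected 7 feature genes, whose leave-one-out SVM recognition accuracy was 99.02%. The results show that the genes selected by this method give high classification accuracy and good stability, indicating that the method is an effective way to select tumor feature genes.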
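
The bin, select, merge, reselect workflow above can be sketched independently of CARS. In this minimal sketch a simple |Pearson correlation| ranking stands in for CARS (which is PLS-based and is not implemented here), `bin_size`, `k_per_bin` and `k_final` are hypothetical parameters, and the final SVM leave-one-out check is omitted.

```python
import math


def abs_corr(x, y):
    """|Pearson correlation| between one gene's values and numeric labels (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_top(X, y, genes, k):
    """Stand-in for CARS: keep the k genes in `genes` most correlated with y (stable on ties)."""
    score = {g: abs_corr([row[g] for row in X], y) for g in genes}
    return sorted(genes, key=lambda g: score[g], reverse=True)[:k]


def binned_selection(X, y, bin_size, k_per_bin, k_final):
    """Split genes into consecutive bins, select within each bin, merge, then select again."""
    genes = list(range(len(X[0])))
    bins = [genes[i:i + bin_size] for i in range(0, len(genes), bin_size)]
    merged = [g for b in bins for g in select_top(X, y, b, k_per_bin)]
    return select_top(X, y, merged, k_final)
```

The selector is passed only a bin's worth of genes at a time, so any per-call cost that grows with the number of variables (as CARS's repeated PLS fits do) is paid on small blocks rather than on the full gene set.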

9.
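Rough set theory is an effective tool for analyzing imprecise, inconsistent, and incomplete data. Using tolerance rough set theory to pre-retrieve graphics and images helps improve the efficiency of graphic and image retrieval. This paper studies tolerance-rough-set-based pre-retrieval of graphic and image information.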

10.
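Jiao Na. Computer Science, 2016, 43(1): 49-52
Rough set theory is a tool that can effectively remove redundant features. Because real-world data are often continuous, structurally complex, and high-dimensional, existing rough set knowledge reduction methods are inefficient on such data. To address this, the tolerance relation is first applied to rough set knowledge reduction; the complex information table is then partitioned vertically into a simple reduced table and small information tables, which are joined back together for knowledge reduction. Examples show that the proposed method effectively improves the efficiency of rough sets on complex data.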

11.
This paper presents pairheatmap, new visualization software that generates and compares two heatmaps in order to compare expression patterns of gene groups. It adds a conditioning variable such as time to the heatmap and provides separate clustering for row groups in the first heatmap, so that pattern changes between the two heatmaps can be visualized. pairheatmap is developed in the R statistical environment. It provides (1) a flexible framework for comparing two heatmaps and (2) high-quality figures based on the R package grid. Its general architecture can be incorporated efficiently into bioinformatics pipelines. The package and user documentation are free to download at http://cran.r-project.org/web/packages/pairheatmap/index.html.

12.
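Drawing on genetic mechanisms in biology, this work explores methods for detecting and removing Trojan horse programs, starting from the "gene fragments" of the Trojan programs.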

13.
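Building on computational gene mining methods, this work introduces the Gene Ontology (GO) and proposes an integrated gene mining method that incorporates GO, namely the integrated SDA-GO-SVM gene mining method. Experiments on two gene chip expression profile data sets, with comparisons against other methods, show that this method mines better feature genes.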

14.
Microarray technology is a widely used measurement technique that records the expression levels of a large number of genes over different time points. Cluster analysis, and biclustering in particular, extracts meaningful information from the correlation of a subset of genes with a subset of conditions. This helps discover biologically meaningful clusters despite the missing values, imprecision, and noise present in microarray data sets. Although fuzzy sets can handle the overlapping nature of biclusters, shadowed sets further help identify and analyze the genes lying in the uncertain (confusion) region of the clusters. In this article, we propose a shadowed-set biclustering model with a gradual representation of cardinality, named Gradual Shadowed Set for Gene Expression (GSS-GE) clustering. It identifies biclusters in the core and in the shadowed region and evaluates their biological significance. The performance of GSS-GE is demonstrated on three real data sets: yeast, serum, and mouse. It is compared with Cheng and Church's algorithm (CC), Bimax, order-preserving submatrix (OPSM), Large Average Submatrices (LAS), the statistical plaid model, and a modified fuzzy co-clustering (MFCC) algorithm. No cluster-level analysis of the mouse microarray data set had been done before. We also provide statistical and biological significance analyses to support the superiority of GSS-GE.
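
The shadowed-set construction behind the core and shadowed regions can be sketched with Pedrycz's balance criterion: memberships at or below alpha are reduced to 0, those at or above 1 - alpha are elevated to 1, and the rest form the shadow, with alpha chosen to minimize |reduced + elevated - card(shadow)|. This is a minimal sketch assuming a simple grid search over alpha; GSS-GE's gradual cardinality representation is not reproduced.

```python
def shadowed_imbalance(memberships, alpha):
    """Pedrycz's criterion V(alpha) = |reduced + elevated - card(shadow)|."""
    reduced = sum(m for m in memberships if m <= alpha)              # membership lost by setting these to 0
    elevated = sum(1.0 - m for m in memberships if m >= 1.0 - alpha)  # membership gained by setting these to 1
    shadow = sum(1 for m in memberships if alpha < m < 1.0 - alpha)   # elements left undetermined
    return abs(reduced + elevated - shadow)


def optimal_alpha(memberships, steps=500):
    """Grid search for the alpha in (0, 0.5) that minimizes V(alpha)."""
    grid = [0.5 * i / steps for i in range(1, steps)]
    return min(grid, key=lambda a: shadowed_imbalance(memberships, a))


def shadowed_partition(memberships, alpha):
    """Label each element as core, shadow or exclusion under threshold alpha."""
    return ["core" if m >= 1.0 - alpha else "exclusion" if m <= alpha else "shadow"
            for m in memberships]
```

Applied to the fuzzy memberships of one bicluster, the "shadow" elements are the genes in the confusion area that the abstract singles out for separate analysis.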

15.
The emerging field of bioinformatics has recently created much interest in the computer science and engineering communities. With the wealth of sequence data in many public online databases and the huge amount of data generated from the Human Genome Project, computer analysis has become indispensable. This calls for novel algorithms and opens up new areas of applications for many pattern recognition techniques. In this article, we review two major avenues of research in bioinformatics, namely DNA sequence analysis and DNA microarray data analysis. In DNA sequence analysis, we focus on the topics of sequence comparison and gene recognition. For DNA microarray data analysis, we discuss key issues such as image analysis for gene expression data extraction, data pre-processing, clustering analysis for pattern discovery and gene expression time series data analysis. We describe current methods and show how computational techniques could be useful in these areas. It is our hope that this review article could demonstrate how the pattern recognition community could have an impact on the fascinating and challenging area of genomic research.

16.
In computational biology, gene networks are typically inferred from gene expression data alone. Incorporating multiple types of biological evidence can improve gene network estimation. In this paper, we describe an approach for building enzyme gene networks by integrating gene expression data, motif sequences, and metabolic information. To evaluate the approach, we apply it to a pool of E. coli genes related to the aspartate pathway. The results show that the integrative approach has the potential to yield more accurate gene networks.

17.
Biclustering algorithms have become popular tools for gene expression data analysis. They can identify local patterns defined by subsets of genes and subsets of samples, which traditional clustering algorithms cannot detect. Despite its usefulness, biclustering is an NP-hard problem, so most biclustering algorithms search for biclusters that optimize a pre-established coherence measure. Many heuristics and validation measures for biclustering have been proposed over the last 20 years, but there is no extensive comparison of bicluster coherence measures in practical scenarios. To address this gap, this paper experimentally analyzes 17 bicluster coherence measures together with external measures computed from Gene Ontology information. In this analysis, biclusters were produced by 10 algorithms from the literature on 19 gene expression datasets. The experiments identified a few pairs of strongly correlated coherence measures, which suggests redundancy. Moreover, which pairs are strongly correlated may change between normalized and non-normalized data and between biclusters enriched by different ontologies. Finally, no clear relation was found between the coherence measures and the Gene Ontology-based assessment.
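
As a concrete example of a bicluster coherence measure, the mean squared residue introduced by Cheng and Church (the basis of the CC biclustering algorithm) can be computed as below. The abstract does not list the 17 measures it compares; this is simply one common measure, shown as a minimal sketch. It is 0 for a perfectly additive bicluster, where each entry equals an overall level plus a row effect plus a column effect.

```python
def mean_squared_residue(A, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix A[rows][cols]."""
    sub = [[A[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]                                     # a_iJ
    col_mean = [sum(sub[r][c] for r in range(n_r)) / n_r for c in range(n_c)]  # a_Ij
    overall = sum(row_mean) / n_r                                              # a_IJ
    # residue r_ij = a_ij - a_iJ - a_Ij + a_IJ, squared and averaged over the bicluster
    residue_sq = sum((sub[r][c] - row_mean[r] - col_mean[c] + overall) ** 2
                     for r in range(n_r) for c in range(n_c))
    return residue_sq / (n_r * n_c)
```

Lower values mean more coherent (additive) biclusters, which is why algorithms such as CC search for submatrices whose residue stays below a threshold.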

18.
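Using the idea of dynamically adjusting the number of clusters, a discriminant function based on the multidimensional pseudo-F statistic (PFS) is introduced into the fuzzy C-means clustering algorithm, yielding a dynamic C-means clustering algorithm for gene expression based on the multidimensional pseudo-F statistic. Taking the extraction of numerical features from H5N1 virus gene sequences as an example, the numerical feature matrix is used directly as the input for cluster analysis. The results show that the algorithm can adjust the number of clusters dynamically, determine the optimal number of clusters, and thereby achieve good clustering quality.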
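
A minimal sketch of choosing the number of clusters with a pseudo-F statistic on top of fuzzy C-means follows. It assumes the Calinski-Harabasz form F = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and within-cluster sums of squares of the hardened partition, hardens memberships by argmax, and replaces the paper's dynamic adjustment with a scan over candidate values of c. The paper's exact discriminant function is not given in the abstract.

```python
import math
import random


def fcm(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy C-means: returns the membership matrix U (n x c) and the cluster centres."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    U = []
    for _ in range(n):  # random initial memberships, each row normalised to sum to 1
        w = [rng.random() + 1e-3 for _ in range(c)]
        U.append([v / sum(w) for v in w])
    for _ in range(iters):
        centres = []
        for k in range(c):  # centre = mean weighted by membership**m
            wts = [U[i][k] ** m for i in range(n)]
            centres.append([sum(wts[i] * X[i][t] for i in range(n)) / sum(wts) for t in range(d)])
        for i in range(n):  # u_ik = 1 / sum_j (d_ik / d_ij)**(2 / (m - 1))
            dist = [max(math.dist(X[i], ck), 1e-12) for ck in centres]
            U[i] = [1.0 / sum((dist[k] / dist[j]) ** (2 / (m - 1)) for j in range(c)) for k in range(c)]
    return U, centres


def pseudo_f(X, labels):
    """Calinski-Harabasz pseudo-F = [B / (k - 1)] / [W / (n - k)] for a hard partition."""
    n, d = len(X), len(X[0])
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    k = len(groups)
    overall = [sum(x[t] for x in X) / n for t in range(d)]
    between = within = 0.0
    for members in groups.values():
        mu = [sum(x[t] for x in members) / len(members) for t in range(d)]
        between += len(members) * math.dist(mu, overall) ** 2
        within += sum(math.dist(x, mu) ** 2 for x in members)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def choose_c(X, c_values):
    """Run FCM for each candidate c, harden memberships, keep the c with the largest pseudo-F."""
    best_c, best_f = None, -math.inf
    for c in c_values:
        U, _ = fcm(X, c)
        labels = [max(range(c), key=lambda k: row[k]) for row in U]
        if not 2 <= len(set(labels)) < len(X):
            continue  # pseudo-F needs at least 2 clusters and n > k
        f = pseudo_f(X, labels)
        if f > best_f:
            best_c, best_f = c, f
    return best_c
```

A larger pseudo-F means tighter clusters relative to their separation, so scanning c and keeping the maximum gives one concrete way to "give the optimal number of clusters" as the abstract describes.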

19.
Gene expression data are expected to help significantly in developing efficient platforms for cancer diagnosis and classification. A key problem with these data is selecting a small subset of genes from thousands of genes measured on only a few, inherently noisy samples. This research aims to select a small subset of informative genes that maximizes classification accuracy. A model for gene selection and classification was developed using a filter approach and an improved hybrid of a genetic algorithm and a support vector machine classifier. We show that the proposed model achieves useful classification accuracy for cancer classification on one widely used gene expression benchmark data set.
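
The wrapper scheme above, a genetic algorithm searching gene subsets that are scored by a classifier, can be sketched as follows. To stay dependency-free, this sketch swaps the SVM for a nearest-centroid classifier evaluated by leave-one-out accuracy; the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and the per-gene size penalty are generic choices assumed for illustration, not the paper's improved hybrid.

```python
import random


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes` (stand-in for the SVM)."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            if rows:
                centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        xi = [X[i][g] for g in genes]
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(xi, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus an assumed penalty per selected gene, favouring small subsets."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return loocv_accuracy(X, y, genes) - penalty * len(genes)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve binary gene masks and return the best mask found."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    cache = {}

    def fit(mask):  # memoised fitness, keyed by the mask's bits
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = [m[:] for m in ranked[:2]]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fit)  # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fit)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0  # one-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)
```

Elitism guarantees the best fitness never decreases between generations; the size penalty expresses the abstract's goal of a small informative subset, and its weight is a tunable assumption.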

20.
This study presents a new microfluidic chip that generates micro-scale emulsion droplets for gene delivery applications. Compared with conventional droplet-formation methods, the proposed chip creates uniform droplets (size variation <7.1%) and hence improves the efficiency of subsequent gene delivery. The new design integrates a pneumatic membrane chamber into a T-junction microchannel. Traditionally, droplet size is controlled by the flow-rate ratio of the continuous and dispersed phases, which is set with syringe pumps. In this study, a pneumatic chamber near the intersection of the T-junction channel locally changes the flow velocity and the shear force: when the upper air chamber is filled with compressed air, the membrane deflects and the droplet size can be fine-tuned accordingly. Experimental data showed that with the new design, the higher the air pressure applied to the active tunable membrane, the smaller the droplets. Finally, the droplets were used as carriers to transfect DNA into Cos-7 cells, and the size of the emulsion droplets was found experimentally to play an important role in gene delivery efficiency. Preliminary results of this work were presented at the 2007 IEEE International Conference on Nano/Molecular Medicine and Engineering (IEEE NANOMED 2007), Macau, China, 6–9 August 2007.

