Similar literature
1.
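Multi-objective evolutionary clustering of three-dimensional microarray data   Cited by: 1 (self: 0, others: 1)
Clustering techniques are widely used in microarray data analysis. Mining three-dimensional (3D) clusters in gene-sample-time (GST) microarray data matrices has become an active research topic. 3D clustering often requires optimizing several mutually conflicting objectives, and evolutionary algorithms, with their strong exploratory capability, are an effective way to search high-dimensional spaces. Based on multi-objective evolutionary computation, this paper proposes a new 3D clustering algorithm, MOE-TC, for mining 3D clusters in GST data. Experimental results on real microarray data demonstrate the effectiveness of the algorithm.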

2.
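Gene expression data are large-scale matrices produced by DNA microarray experiments, from which biological information can be extracted effectively. Because of limits on experimental conditions, gene expression data often contain missing values that must be imputed. Traditional imputation methods rely on a single feature of the gene expression data and do not fully account for correlations within the data matrix. Exploiting the property that a smaller bicluster mean squared residue indicates higher correlation in the expression data, the paper proposes bi-SA, a missing-data imputation method based on biclustering optimized by simulated annealing: simulated annealing determines the optimal bicluster, which is then used to impute the missing values as effectively as possible. Experiments on four real gene expression datasets show that bi-SA achieves high imputation accuracy.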
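The mean squared residue mentioned in this abstract can be illustrated with a short sketch. It assumes the abstract refers to the standard Cheng and Church definition, which the listing does not spell out; the function name and the list-of-lists matrix layout are illustrative choices, not taken from the paper.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices.

    Assumption: the standard Cheng-Church residue
    a_ij - rowmean_i - colmean_j + overallmean, squared and averaged.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    total_mean = sum(row_means) / n_r
    # A perfectly additive (coherent) bicluster has residue 0 in every cell.
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_r)
        for j in range(n_c)
    ) / (n_r * n_c)
```

A lower value means the selected genes vary more coherently across the selected conditions, which is the property the abstract says the imputation method exploits.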

3.
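Existing projected clustering algorithms for gene expression data assume that genes are mutually independent and choose the relevant projection subspace from each gene's individual discriminative power. To address this shortcoming, the paper proposes MOLION, an algorithm that performs projected clustering based on the relationships among genes. Gene expression data are converted to sequence data and, based on a user-specified preference function, a bound-checking method performs a fast depth-first traversal of the sample enumeration tree, with efficient pruning and optimization strategies applied. Experiments on several real microarray datasets confirm that the algorithm has high efficiency and predictive accuracy, offering a new perspective for investigating the causes of disease phenotypes.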

4.
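The DNA microarray is a major breakthrough in experimental molecular biology. It lets researchers monitor changes in the expression levels of many genes under many experimental conditions simultaneously, thereby providing technical support for discovering gene co-expression networks, developing drugs, and preventing disease. Researchers have proposed a large number of clustering algorithms for analyzing gene expression data, but standard (one-way) clustering algorithms can discover only a limited amount of knowledge, because genes are unlikely to be co-expressed under all experimental conditions or to show the same expression level, yet they may take part in multiple genetic pathways. Biclustering methods arose in response. They shift the analysis of gene expression data from global patterns to local patterns, so that data are no longer clustered only on the basis of all objects or all attributes. This survey covers the definition of local patterns, their types and criteria, and their mining and querying. It presents the current state and progress of research on local pattern mining in gene expression data, summarizes in detail the quantitative and qualitative criteria for local pattern mining and the related mining systems, analyzes open problems, and discusses future research directions.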

5.
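Gene expression data are large-scale data matrices produced by DNA microarray experiments. Biclustering algorithms mine highly correlated submatrices of the data matrix and can extract biological information effectively. Because current multi-objective biclustering optimization algorithms are prone to premature convergence and local optima, the paper proposes a discrete artificial bee colony biclustering algorithm based on logical operations (LOABCB). On the one hand, it introduces the artificial bee colony algorithm to strengthen the global search capability of biclustering; on the other hand, through...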

6.
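A study of classification methods based on DNA microarray gene expression data   Cited by: 1 (self: 1, others: 0)
This paper introduces several current classification methods based on DNA microarray gene expression data. It presents the algorithmic ideas of recursive partitioning, forest construction, and information fusion, describes each method in depth, and analyzes and compares them. It concludes with an outlook on classification techniques for gene expression microarray data.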

7.
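Gene chips are a typical representative of microarray technology, offering high throughput and the ability to measure the expression levels of all genes in a genome simultaneously. A main purpose of microarray chips is to discover gene expression patterns, that is, to find at the genome level clusters of genes with similar functions and related biological processes, or to classify samples and discover their subtypes, for example by classifying cancer samples by gene expression level to discover molecular subtypes of a disease. Non-negative matrix factorization (NMF) is an unsupervised, non-orthogonal matrix factorization method based on local (parts-based) representation. In recent years it has been applied increasingly to classification analysis and cluster discovery in microarray data. This paper systematically reviews the principles, algorithms, and applications of NMF, the biological interpretation of factorization results, quality assessment of classification results, and classification software based on NMF algorithms. It summarizes and evaluates the performance of NMF in microarray data classification and cluster discovery.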
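As a rough illustration of the factorization this review discusses, the sketch below factors a non-negative matrix V into W and H with the widely used Lee-Seung multiplicative updates for squared error. The abstract does not say which NMF algorithm the review covers, so the update rule, the helper names, and the defaults (`iterations`, `seed`, `eps`) are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

In a microarray setting, V would typically hold genes in rows and samples in columns, and each sample could then be assigned to the factor with the largest entry in its column of H; that assignment step is a common convention rather than something stated in the abstract.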

8.
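A microarray experiment is a complex multi-step process. Uncertainty is present at every step, so the final results contain noise. To extract more meaningful biological information from such noisy data, many algorithms have been proposed for computing gene expression values. The currently popular mmgMOS model improves the accuracy of chip data analysis, but its main drawback is that its parameter φ is held at a single fixed value over the whole dataset, and a single value cannot represent the true signals of different probes. This paper improves the parameter φ in the mmgMOS model, thereby further improving the accuracy of subsequent detection of differentially expressed genes.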

9.
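An important application of microarray data is classifying disease samples. Microarray data have few samples and many features. This paper proposes a new method. Using acute leukemia gene expression data, it reduces dimensionality with the t-statistic, classifies leukemia types with a covering algorithm, and compares the results with existing algorithms. The experimental results show that the algorithm is effective.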
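The t-statistic dimensionality reduction step can be sketched as a simple gene filter. The abstract does not say which form of the statistic is used or how many genes are kept, so the Welch (unequal-variance) form, the function names, and the parameter `k` below are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t statistic for one gene's values in two classes.

    Assumes each class has at least two samples and nonzero total variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_genes(expr, labels, k):
    """Return indices of the k genes with the largest |t|.

    expr: one row per gene, one column per sample; labels: 0 or 1 per sample.
    """
    scores = []
    for g, row in enumerate(expr):
        a = [x for x, y in zip(row, labels) if y == 0]
        b = [x for x, y in zip(row, labels) if y == 1]
        scores.append((abs(t_statistic(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]
```

The retained genes would then be passed to the classifier; the covering algorithm itself is not described in enough detail here to sketch.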

10.
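Feature gene selection plays a very important role in microarray data analysis, and a good feature selection method is one of the keys to improving classification accuracy and speed for gene expression data. Drawing on the strengths of ant colony algorithms and rough set theory in processing microarray data, the paper improves the ant colony optimization model using rough set theory, applies the rough set notions of attribute dependency and attribute significance to path selection and evaluation in the ant colony algorithm, and proposes a new gene selection method. The method is simple to implement, reaches the optimal solution fairly quickly, and ultimately selects a small subset of feature genes with strong classification performance. Simulation experiments on gene datasets show that the algorithm is effective and feasible.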

11.
This investigation deals with a new distance measure for genes using their microarray expressions and a new algorithm for fast gene ordering without clustering. This distance measure is called "Maxrange distance," where the distance between two genes corresponding to a particular type of experiment is computed using a normalization factor, which is dependent on the dynamic range of the gene expression values of that experiment. The new gene-ordering method called "Minimal Neighbor" is based on the concept of nearest neighbor heuristic involving O(n^2) time complexity. The superiority of this distance measure and the comparability of the ordering algorithm have been extensively established on widely studied microarray data sets by performing statistical tests. An interesting application of this ordering algorithm is also demonstrated for finding useful groups of genes within clusters obtained from a nonhierarchical clustering method like the self-organizing map.
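A greedy nearest-neighbor ordering of the kind this abstract alludes to might look like the sketch below. It is not the paper's Minimal Neighbor algorithm or its Maxrange distance, neither of which is defined in the abstract: `range_normalized_distance` is a hypothetical stand-in that scales each experiment's difference by that experiment's dynamic range, and starting from gene 0 is an arbitrary choice.

```python
def range_normalized_distance(x, y, ranges):
    """Hypothetical stand-in distance: per-experiment absolute difference
    divided by that experiment's dynamic range, summed over experiments."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges))


def nearest_neighbor_order(genes):
    """Greedy ordering: repeatedly append the unvisited gene closest to the
    last one placed. Takes O(n^2) distance evaluations for n genes."""
    n_exp = len(genes[0])
    # Dynamic range of each experiment (column); fall back to 1.0 if constant.
    ranges = [
        max(g[j] for g in genes) - min(g[j] for g in genes) or 1.0
        for j in range(n_exp)
    ]
    order = [0]
    remaining = set(range(1, len(genes)))
    while remaining:
        last = genes[order[-1]]
        # Ties are broken by the smaller gene index for determinism.
        nxt = min(
            remaining,
            key=lambda i: (range_normalized_distance(last, genes[i], ranges), i),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Each step scans all remaining genes once, which is where the quadratic cost comes from.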

12.
There are many sources of systematic variations in cDNA microarray experiments which affect the measured gene expression levels. Print-tip lowess normalization is widely used in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly in situations where error variability for each gene is heterogeneous over intensity ranges. We first develop support vector machine quantile regression (SVMQR) by extending support vector machine regression (SVMR) for the estimation of linear and nonlinear quantile regressions, and then propose some new print-tip normalization methods based on SVMR and SVMQR. We apply our proposed normalization methods to previous cDNA microarray data of apolipoprotein AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. From our comparative analyses, we find that our proposed methods perform better than the existing print-tip lowess normalization method.

13.
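Feature selection and classification of microarray data based on GA/SVM   Cited by: 2 (self: 0, others: 2)
The small sample size and high dimensionality of microarray data make analysis difficult, and selecting the key genes is very important. This paper uses a genetic algorithm to select key genes, with k-nearest-neighbor distance as the pattern recognition method, builds a diagnostic system with a support vector machine, and tests predictive classification performance with different kernel functions. On the classic leukemia dataset, classification accuracy on the 34-sample test set is 100%.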

14.
Bio-chip data are high-dimensional, with more attributes than specimens. Thus, it is difficult to obtain a covariance matrix from tens of thousands of genes with only a small number of samples. Feature selection and extraction are critical for removing noisy features and reducing dimensionality in microarray analysis. This study aims to fill the gap by developing a data mining framework with a proposed algorithm for cluster analysis of gene expression data, in which the correlation coefficient is employed to arrange genes. Indeed, cluster analysis of microarray data can find coherent patterns of gene expression. The output is displayed as a table for convenient inspection. We adopt a breast cancer microarray dataset to demonstrate the practical viability of this approach.

15.
We develop an approach to analyze time-course microarray data which are obtained from a single sample at multiple time points and to identify which genes are cell-cycle regulated. Since some genes have similar gene expression patterns, to reduce the amount of hypothesis testing, we first perform a clustering analysis to group genes into classes with similar cell-cycle patterns, including a class with no cell-cycle phenomena at all. Then we build a statistical model and an inference function assuming that genes within a cluster share the same mean model. A varying coefficient nonparametric approach is employed to fit the time-course data more flexibly. In order to incorporate the correlation of longitudinal measurements, the quadratic inference function method is applied to obtain more efficient estimators and more powerful tests. Furthermore, this method allows us to perform chi-squared tests to determine whether certain genes are cell-cycle regulated. An example on cell-cycle microarray data and simulations are presented.

16.
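Traditional gene selection methods select many redundant genes, which lowers sample prediction accuracy. To address this, the paper proposes a gene selection algorithm based on clustering and particle swarm optimization. First, a clustering algorithm partitions the genes into a fixed number of clusters. Then an extreme learning machine is used as the classifier to evaluate the classification performance of the feature genes in each cluster, yielding a candidate gene pool. Finally, a wrapper method based on particle swarm optimization and the extreme learning machine selects from the candidate pool the gene subset with the highest classification rate and the fewest genes. The selected genes have good classification performance. Experimental results on two public microarray datasets show that, compared with several classical methods, the new method achieves higher classification performance with fewer genes.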

17.
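A model-free gene selection method that accounts for sample imbalance   Cited by: 9 (self: 0, others: 9)
Li Jianzhong, Yang Kun, Gao Hong, Luo Jizhou, Guo Zheng. Journal of Software (《软件学报》), 2006, 17(7): 1485-1493
In gene expression data analysis, discriminative genes are very important informative genes for subsequent research. Much work has addressed the challenging task of selecting informative genes from gene expression data, and a number of gene selection methods have been proposed. However, these methods (especially nonparametric ones) do not consider the imbalance in sample sizes across sample classes. Taking into account sample imbalance and the stability of gene selection, the paper presents a new gene selection method that is independent of the data distribution model. Under a strategy of small within-class variation and large between-class difference, it chooses a sensitive measure function to improve discriminative power, and it uses the consistency of within-class variation and between-class difference to increase stability and applicability. The method applies both to two classes and to multiple classes. Finally, the method is validated on two real gene expression datasets. The experimental results show that it is more effective and robust than other methods.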

18.
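Duan Xu. Computer Engineering and Design (《计算机工程与设计》), 2011, 32(11): 3836-3839
A microarray dataset contains thousands or tens of thousands of genes and relatively few samples, and only a small fraction of those genes contribute to tumor classification. One of the most important problems in tumor classification is therefore identifying and selecting the genes that contribute most. To select microarray genes effectively, the paper proposes describing microarray data with a marginal distribution model (MDM). The model can distinguish not only whether a gene is differentially expressed between the two sample classes but also in which class it is expressed, so the selected genes are more biologically meaningful. Experimental results on simulated data and real microarray datasets show that the method selects microarray genes effectively.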

19.
Data derived from gene expression microarrays often are used for purposes of classification and discovery. Many methods have been proposed for accomplishing these and related aims; however, the statistical properties of such methods generally are not well established. To this end, it is desirable to develop realistic mathematical and statistical models that can be used in a simulation context so that the impacts of data analysis methods and testing approaches can be established. A method is developed in which variation among arrays can be characterized simultaneously for a large number of genes resulting in a multivariate model of gene expression. The method is based on selecting mathematical transformations of the underlying expression measures such that the transformed variables follow approximately a Gaussian distribution, and then estimating associated parameters, including correlations. The result is a multivariate normal distribution that serves to model transformed gene expression values within a subject population, while accounting for covariances among genes and/or probes. This model then is used to simulate microarray expression and probe intensity data by employing a modified Cholesky matrix factorization technique which addresses the singularity problem for the “small n, big p” situation. An example is given using prostate cancer data and, as an illustration, it is shown how data normalization can be investigated using this approach.

20.
Estimating an overall density function from repeated observations on each of a sample of independent subjects or experimental units is of interest. An example is provided by biodemographic studies, where one observes age-at-death for several cohorts of flies. Cohorts are kept in separate cages, which form the experimental units. Time variation then is likely to exist between the cohort densities and hazard rates due to cage effects on aging. Given the densities of age-at-death for the individual cohorts, one aims to obtain an estimate for the underlying overall density and hazard rate. In microarray gene expression experiments, similar problems arise when addressing the need for normalization of probe-level data from different arrays. Conventional methods, such as the cross-sectional average density, ignore time variation and hence are often not representative for such data. We view densities as functional data and model individual densities as warped versions of an underlying overall density, where the observed densities are assumed to be realizations of an underlying stochastic process. Quantile-synchronized distribution functions are obtained from an inverse warping mapping, based on quantile synchronization, leading to quantile-synchronized density and hazard functions. Kernel type smoothing methods with plug-in bandwidth selection can be used for estimating the components of the model. Asymptotic properties of the synchronized density estimates are derived. Simulation results show that functional density synchronization is often advantageous when compared to conventional density averaging or simple time-shift warping. Our approach complements previous quantile normalization methods used for microarray expression data and is illustrated with both longevity data obtained for 54 cohorts of mexflies (Mexican fruit flies) and gene expression data of the Ts1Cje mouse study for Down syndrome.
