19 similar documents found (search time: 187 ms)
1.
段旭 《计算机工程与设计》2011,32(11):3836-3839
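A microarray data set contains thousands of genes but relatively few samples, and only a small fraction of those genes contribute to tumor classification. The central problem in tumor classification is therefore to identify and select the genes that contribute most. To select microarray genes effectively, a marginal distribution model (MDM) is proposed to describe microarray data. The model not only determines whether a gene is differentially expressed between the two classes of samples but also identifies the class in which the gene is expressed, so the selected genes are more biologically meaningful. Experiments on simulated data and real microarray data sets show that the method selects microarray genes effectively.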
2.
高振斌 《计算机应用与软件》2019,36(8)
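Microarray data in gene expression analysis are high-dimensional and highly redundant, which makes classifying gene expression data very difficult. The least squares support vector machine (LS-SVM), a machine learning algorithm, is computationally efficient and thus offers an effective route for data mining. For two typical cancer microarray data sets (colon cancer and leukemia), the data are normalized and their correlation coefficient matrix is computed; principal component analysis is then used for dimensionality reduction, yielding an informative gene set for feature selection and classification (10 genes for each data set); finally, an LS-SVM classifier classifies the informative gene sets. Experimental results show leave-one-out cross-validation accuracies of 97.5% and 100% on the two cancer data sets, higher test accuracy than the other classifiers compared, providing a reliable diagnostic basis for further clinical application.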
3.
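Research progress on cancer classification based on DNA microarray data (total citations: 1; self-citations: 1; citations by others: 0)
Using DNA microarray data to diagnose and subtype cancer has gradually become one of the research hotspots in bioinformatics. The paper first outlines the current state and trends of research on microarray-based cancer classification. It then briefly introduces the basic steps of a microarray experiment, the structure and characteristics of microarray data, and the basic workflow for cancer classification. Next, it reviews and compares in detail the research results of the past 10 years in terms of data preprocessing, feature gene selection, classifier design, and classification performance evaluation. Finally, it summarizes the problems that remain open in the field and offers an outlook on possible future research directions.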
4.
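A support vector machine-based method for analyzing microarray gene expression data (total citations: 5; self-citations: 0; citations by others: 5)
DNA microarray technology makes it possible to observe the expression levels of thousands of genes simultaneously, and the analysis of the resulting data has become a focus of bioinformatics research. To address the high dimensionality, small sample size, and nonlinearity of microarray gene expression data, a support vector machine-based method for classifying gene expression data is designed. The method uses the signal-to-noise ratio for gene feature extraction and tests performance with different SVM kernel functions; experiments on several typical data sets show good recognition performance.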
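The abstract above does not state which signal-to-noise formula it uses. As a minimal sketch, assuming the commonly used definition (difference of the class means divided by the sum of the class standard deviations), genes could be scored and ranked as follows. The function names and toy values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """S2N score for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        return 0.0  # assumption: a gene with no spread gets a neutral score
    return (mean(values_a) - mean(values_b)) / spread


def rank_genes(expr_a, expr_b):
    """Return gene names sorted by |S2N|, largest first."""
    scores = {g: signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Toy data (made up): expression values per gene in each class.
class_a = {"g1": [5.0, 5.2, 4.8], "g2": [1.0, 3.0, 2.0]}
class_b = {"g1": [1.0, 1.1, 0.9], "g2": [2.0, 1.0, 3.0]}
top = rank_genes(class_a, class_b)
```

In this toy data `g1` separates the two classes clearly while `g2` has identical class means, so `g1` ranks first. The top-ranked genes would then be passed to the classifier.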
5.
6.
7.
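As technology advances, large amounts of data can be collected through sensors such as microarray chips. In cancer detection, machine learning methods can be used to analyze cancer microarray data, but they perform poorly in ultra-high-dimensional settings. The article proposes the Ball-Abess method, which combines the Ball correlation coefficient with the Abess method, to address the ultra-high-dimensionality problem in cancer microarray data. Compared with other classification methods, this method obtains better results.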
8.
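In recent years, gene expression techniques for the in-depth study of cancer cells have been multiplying. Machine learning algorithms are widely used in many fields today but are seldom applied in bioinformatics. The paper systematically studies the principles and algorithms of decision tree generation and pruning, along with other decision-tree-related issues. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) data set provided by CAMDA2000 (Critical Assessment of Microarray Data Analysis), it designs and implements a decision tree classifier based on the ID3 algorithm and simplifies the tree with post-pruning. Experiments verify the algorithm's effectiveness: the classifier achieves high classification accuracy in discriminant analysis of leukemia microarray data, indicating that decision tree algorithms have broad application prospects in medical data mining.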
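ID3 chooses the attribute to split on by information gain. The sketch below shows that calculation on a made-up four-sample example. It assumes the expression values have already been discretized into categories such as "high" and "low", a step the abstract does not describe, and all names here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the samples on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (made up): two discretized genes measured on four samples.
labels = ["ALL", "ALL", "AML", "AML"]
gene_x = ["high", "high", "low", "low"]  # separates the classes perfectly
gene_y = ["high", "low", "high", "low"]  # carries no class information
gain_x = information_gain(gene_x, labels)
gain_y = information_gain(gene_y, labels)
```

Here `gene_x` yields the full 1 bit of gain and `gene_y` yields none, so ID3 would split on `gene_x` first.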
9.
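DNA microarray technology can measure the activity of thousands of genes in a cell simultaneously and is widely used in the clinical diagnosis of major genetic diseases. However, microarray data are typically high-dimensional with few samples and contain many noisy and redundant genes. To further improve microarray classification performance, a hybrid feature gene selection algorithm is proposed. First, the ReliefF algorithm removes a large number of irrelevant genes to obtain a candidate subset of feature genes; next, a neighborhood rough set model optimized by differential evolution selects the feature genes; finally, a support vector machine performs classification to verify the algorithm's effectiveness. Simulation results show that the algorithm achieves higher classification accuracy with as few feature genes as possible, improving both generalization performance and time efficiency, and that it offers an important reference for the clinical diagnosis of disease-causing genes.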
10.
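Classification is an important topic in pattern recognition and data mining. Bayesian classification is a classic classification algorithm, and LVQ neural networks have also been widely studied and applied to classification problems; choosing an efficient and stable classification algorithm matters greatly in practice. Using three data sets from the UCI Machine Learning Repository, the article compares Bayesian classification and LVQ neural networks experimentally in terms of classification accuracy, stability, and classification efficiency. The results show that Bayesian classification outperforms the LVQ neural network on all three criteria, which provides a reference for choosing between the two algorithms.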
11.
Mohd Saberi Mohamad Sigeru Omatu Safaai Deris Siti Zaiton Mohd Hashim 《Artificial Life and Robotics》2007,11(2):219-222
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification
platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few
samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression
data which will maximize the classification accuracy. A model for gene selection and classification has been developed by
using a filter approach, and an improved hybrid of the genetic algorithm and a support vector machine classifier. We show
that the classification accuracy of the proposed model is useful for the cancer classification of one widely used gene expression
benchmark data set.
12.
The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways; we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to maximally use the abundant microarray data that is being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that can be applicable to integrated microarrays as well as to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. Firstly, we performed a direct integration of individual microarrays by transforming an expression value into a rank value within a sample and identified informative genes by calculating the number of swaps to reach a perfectly split sequence. Secondly, we built a classifier which is a parameter-free ensemble method using only the pre-selected informative genes. By using our classifier that was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.
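The first stage described above can be illustrated with a rough sketch. It is one possible reading of the abstract, not the authors' implementation: expression values are replaced by their rank within a sample, and a gene is scored by the minimum number of adjacent swaps needed to turn the class-label sequence (samples ordered by that gene's rank) into a perfectly split one. The function names, the tie handling (ties keep their input order), and the use of adjacent swaps are all assumptions.

```python
def rank_within_sample(sample):
    """Map each gene's expression value to its rank (1 = lowest) in one sample."""
    ordered = sorted(sample, key=sample.get)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def swaps_to_split(labels):
    """Minimum adjacent swaps needed to fully separate a 0/1 label sequence."""

    def inversions(seq, first):
        # Count pairs in which another label precedes a `first` label.
        seen_other, count = 0, 0
        for label in seq:
            if label == first:
                count += seen_other
            else:
                seen_other += 1
        return count

    # Either class may end up on the left, so take the cheaper orientation.
    return min(inversions(labels, 0), inversions(labels, 1))


# Toy data (made up): one sample, and label sequences for three genes.
ranks = rank_within_sample({"g1": 2.5, "g2": 0.3, "g3": 7.1})
already_split = swaps_to_split([0, 0, 1, 1])
one_swap = swaps_to_split([0, 1, 0, 1])
reverse_split = swaps_to_split([1, 1, 0, 0])
```

Under this reading, a lower swap count means the gene's ranks already separate the two phenotypes well, so genes with the fewest swaps would be kept as informative.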
13.
Mohd Saberi Mohamad Sigeru Omatu Safaai Deris Michifumi Yoshioka 《Artificial Life and Robotics》2009,14(1):16-19
The application of microarray data for cancer classification has recently gained in popularity. The main problem that needs
to be addressed is the selection of a small subset of genes from the thousands of genes in the data that contribute to a disease.
This selection process is difficult due to the availability of a small number of samples compared with the huge number of
genes, many irrelevant genes, and noisy genes. Therefore, this article proposes an improved binary particle swarm optimization
to select a near-optimal (small) subset of informative genes that is relevant for the cancer classification. Experimental
results show that the performance of the proposed method is superior to the standard version of particle swarm optimization
(PSO) and other previous related work in terms of classification accuracy and the number of selected genes.
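For orientation, the sketch below shows the update step of standard binary PSO, in which each bit of a particle marks whether a gene is selected. It is not the improved variant proposed in the article, whose details the abstract does not give. The inertia weight, acceleration constants, and velocity clamp are hypothetical values chosen for illustration.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(position, velocity, personal_best, global_best, rng,
                    inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One standard binary PSO step; returns (new_position, new_velocity)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Pull the velocity toward the personal and global best bits.
        v = inertia * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))  # clamp so sigmoid(v) never saturates
        # The velocity sets the probability that the gene is selected (bit = 1).
        bit = 1 if rng.random() < sigmoid(v) else 0
        new_velocity.append(v)
        new_position.append(bit)
    return new_position, new_velocity


# Toy particle over five genes (made up).
rng = random.Random(0)
position = [1, 0, 1, 0, 0]
velocity = [0.0] * 5
new_position, new_velocity = update_particle(
    position, velocity, [1, 0, 0, 0, 1], [1, 1, 0, 0, 0], rng
)
```

In a full gene-selection loop, each new position would be scored by a classifier's accuracy on the selected genes, and the personal and global bests would be updated from those scores.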
14.
Mohd Saberi Mohamad Sigeru Omatu Safaai Deris Muhammad Faiz Misman Michifumi Yoshioka 《Artificial Life and Robotics》2009,13(2):414-417
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes
simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient
cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the
number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small
subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset
of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods
has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January
31–February 2, 2008
15.
Fernando Díaz Florentino Fdez-Riverola Juan M. Corchado 《Computational Intelligence》2006,22(3-4):254-268
Gene expression profiles are composed of thousands of genes at the same time, representing the complex relationships between them. One of the well-known constraints specifically related to microarray data is the large number of genes in comparison with the small number of available experiments or cases. In this context, the ability to design methods capable of overcoming current limitations of state-of-the-art algorithms is crucial to the development of successful applications. This paper presents gene-CBR, a hybrid model that can perform cancer classification based on microarray data. The system employs a case-based reasoning model that incorporates a set of fuzzy prototypes, a growing cell structure network and a set of rules to provide an accurate diagnosis. The hybrid model has been implemented and tested with microarray data belonging to bone marrow cases from forty-three adult patients with cancer plus a group of six cases corresponding to healthy persons.
16.
Abstract: Cancer classification, through gene expression data analysis, has produced remarkable results, and has indicated that gene expression assays could significantly aid in the development of efficient cancer diagnosis and classification platforms. However, cancer classification, based on DNA array data, remains a difficult problem. The main challenge is the overwhelming number of genes relative to the number of training samples, which implies that there are a large number of irrelevant genes to be dealt with. Another challenge is from the presence of noise inherent in the data set. It makes accurate classification of data more difficult when the sample size is small. We apply genetic algorithms (GAs) with an initial solution provided by t statistics, called t‐GA, for selecting a group of relevant genes from cancer microarray data. The decision‐tree‐based cancer classifier is built on the basis of these selected genes. The performance of this approach is evaluated by comparing it to other gene selection methods using publicly available gene expression data sets. Experimental results indicate that t‐GA has the best performance among the different gene selection methods. The Z‐score figure also shows that some genes are consistently preferentially chosen by t‐GA in each data set.
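How a t statistic might seed a genetic algorithm can be sketched as follows. The abstract does not say which form of the t statistic is used or how the initial solution is encoded, so this example assumes Welch's two-sample statistic and a binary mask that switches on the k genes with the largest absolute t value. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's two-sample t statistic for one gene."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0  # assumption: constant genes are treated as uninformative
    return (mean(a) - mean(b)) / se


def seed_chromosome(expr_a, expr_b, k):
    """Binary mask selecting the k genes with the largest |t|."""
    genes = list(expr_a)
    by_score = sorted(
        genes, key=lambda g: abs(t_statistic(expr_a[g], expr_b[g])), reverse=True
    )
    top = set(by_score[:k])
    return [1 if g in top else 0 for g in genes]


# Toy data (made up): three genes measured on three tumor and three normal samples.
tumor = {"g1": [8.0, 9.0, 10.0], "g2": [4.0, 5.0, 6.0], "g3": [1.0, 2.0, 3.0]}
normal = {"g1": [1.0, 2.0, 3.0], "g2": [4.0, 6.0, 5.0], "g3": [2.0, 3.0, 4.0]}
seed = seed_chromosome(tumor, normal, k=2)
```

Starting the population from such a mask, rather than from random bits, gives the search a starting point that already favors differentially expressed genes.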
17.
Mohd Saberi Mohamad Sigeru Omatu Safaai Deris Muhammad Faiz Misman Michifumi Yoshioka 《Artificial Life and Robotics》2009,13(2):410-413
A microarray machine offers the capacity to measure the expression levels of thousands of genes simultaneously. It is used
to collect information from tissue and cell samples regarding gene expression differences that could be useful for cancer
classification. However, the urgent problems in the use of gene expression data are the availability of a huge number of genes
relative to the small number of available samples, and the fact that many of the genes are not relevant to the classification.
It has been shown that selecting a small subset of genes can lead to improved accuracy in the classification. Hence, this
paper proposes a solution to the problems by using a multiobjective strategy in a genetic algorithm. This approach was tried
on two benchmark gene expression data sets. It obtained encouraging results on those data sets as compared with an approach
that used a single-objective strategy in a genetic algorithm.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January
31–February 2, 2008
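A multiobjective strategy typically compares candidate solutions by Pareto dominance instead of a single fitness value. The abstract does not name its objectives; the sketch below assumes two plausible ones, classification accuracy (higher is better) and number of selected genes (fewer is better). The function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on at least one.

    Each solution is a tuple (accuracy, n_genes).
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Toy candidates (made up): (accuracy, number of selected genes).
candidates = [(0.95, 40), (0.95, 25), (0.90, 10), (0.85, 30)]
front = pareto_front(candidates)
```

Two candidates survive here: one has the best accuracy and the other uses the fewest genes, and neither dominates the other. A single-objective genetic algorithm would instead have to collapse both criteria into one score.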
18.
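Classifying tissue samples based on gene expression profiles is a very important research topic in disease diagnosis. In gene expression data, the number of genes (several thousand) is usually large relative to the number of samples (a few dozen); that is, the dimensionality of the data is high compared with the number of data points (the undersampling problem). Excessive dimensionality (the number of features, or genes) poses a great challenge for classification. The paper combines uncorrelated linear discriminant analysis (ULDA) with a support vector machine (SVM) classifier to classify colon cancer tissue samples and compares the approach with other methods. Classification performance improved, and the results demonstrate the feasibility and effectiveness of the method.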
19.
Cancer classification using ensemble of neural networks with multiple significant gene subsets (total citations: 2; self-citations: 1; citations by others: 2)
Molecular level diagnostics based on microarray technologies can offer the methodology of precise, objective, and systematic
cancer classification. Genome-wide expression patterns generally consist of thousands of genes. It is desirable to extract
some significant genes for accurate diagnosis of cancer because not all genes are associated with a cancer. In this paper,
we have used representative gene vectors that are highly discriminatory for cancer classes and extracted multiple significant
gene subsets based on those representative vectors respectively. Also, an ensemble of neural networks learned from the multiple
significant gene subsets is proposed to classify a sample into one of several cancer classes. The performance of the proposed
method is systematically evaluated using three different cancer types: Leukemia, colon, and B-cell lymphoma.