Similar Literature
 Found 20 similar documents; search time: 15 ms
1.
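An ensemble classification method based on microarray data*   Total citations: 1, self-citations: 0, citations by others: 1
To address the low classification accuracy of existing ensemble classification methods for microarray data, a Bagging-PCA-SVM method is proposed. The method first uses the bootstrap technique to resample the training set repeatedly, producing a large number of training subsets. It then performs feature selection and principal component analysis on each subset to eliminate noisy and redundant genes. Finally, it uses support vector machines as classifiers and predicts the class of each sample by majority voting. Tests on three data sets demonstrate the effectiveness and feasibility of the method.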
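The bootstrap-and-vote structure described in entry 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-subset feature selection, PCA, and SVM steps are replaced by a simple nearest-centroid learner (an assumption made to keep the example dependency-free), and the function names `fit_centroids`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(samples, labels):
    """Nearest-centroid stand-in for the PCA + SVM base learner (assumption)."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def bagging_fit(samples, labels, n_models=25, seed=0):
    """Train one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([samples[i] for i in idx],
                                    [labels[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

In a fuller version, each bootstrap subset would pass through gene selection and PCA before a real SVM is trained on it; only the resampling and voting logic is shown here.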

2.
Multimedia Tools and Applications - Data is undoubtedly one of the most significant assets in the current competitive era and to ensure its value is retained, data safety emerges as a principle...

3.
Abstract: In this work an entropic filtering algorithm (EFA) for feature selection is described, as a workable method to generate a relevant subset of genes. This is a fast feature selection method based on finding feature subsets that jointly maximize the normalized multivariate conditional entropy with respect to the classification ability of tumours. The EFA is tested in combination with several machine learning algorithms on five public domain microarray data sets. It is found that this combination offers subsets yielding similar or much better accuracies than using the full set of genes. The solutions obtained are of comparable quality to previous results, but they are obtained in a maximum of half an hour computing time and use a very low number of genes.

4.
Microarray technology presents a challenge due to the large dimensionality of the data, which can be difficult to interpret. To address this challenge, the article proposes a feature extraction-based cancer classification technique coupled with the artificial bee colony (ABC) optimization algorithm. The ABC-support vector machine (SVM) method is used to classify lung cancer datasets and is compared with existing techniques in terms of precision, recall, F-measure, and accuracy. The proposed ABC-SVM has the advantage of dealing with complex nonlinear data while providing good flexibility. Simulation analysis was conducted with 30% of the data reserved for testing the proposed method. The results indicate that the proposed attribute classification technique, which uses fewer genes, performs better than other modalities. Classifiers such as naïve Bayes, multi-class SVM, and linear discriminant analysis were also compared, and the proposed method outperformed these classifiers and state-of-the-art techniques. Overall, this study demonstrates the potential of using intelligent algorithms and feature extraction techniques to improve the accuracy of cancer diagnosis using microarray gene expression data.

5.
MEMS (micro-electro-mechanical-system) IMU (inertial measurement unit) sensors are characteristically noisy and this presents a serious problem to their effective use. The Kalman filter assumes zero-mean Gaussian process and measurement noise variables, and then recursively computes optimal state estimates. However, establishing the exact noise statistics is a non-trivial task. Additionally, this noise often varies widely in operation. Addressing this challenge is the focus of adaptive Kalman filtering techniques. In the covariance scaling method, the process and measurement noise covariance matrices Q and R are uniformly scaled by a scalar-quantity attenuating window. This study proposes a new approach where individual elements of Q and R are scaled element-wise to ensure more granular adaptation of noise components and hence improve accuracy. In addition, the scaling is performed over a smoothly decreasing window to balance aggressiveness of response and stability in steady state. Experimental results show that the root mean square errors for both pitch and roll axes are significantly reduced compared to the conventional noise adaptation method, albeit at a slightly higher computational cost. Specifically, the root mean square pitch errors are 1.1° under acceleration and 2.1° under rotation, which are significantly less than the corresponding errors of the adaptive complementary filter and conventional covariance scaling-based adaptive Kalman filter tested under the same conditions.

6.
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

7.

Microarray gene expression profiles can be exploited for the efficient and effective classification of cancers. This is a computationally challenging task because of the large number of genes and the relatively small number of experiments in gene expression data. The aim of this work is to devise a framework of techniques based on supervised machine learning for discriminating acute lymphoblastic leukemia from acute myeloid leukemia using microarray gene expression profiles. An artificial neural network (ANN) technique was employed for this classification. Moreover, ANN was compared with five other machine learning techniques. These methods were assessed on eight different classification performance measures. This article reports a classification accuracy of 98% using ANN, with no error in identification of acute lymphoblastic leukemia and only one error in identification of acute myeloid leukemia, on tenfold cross-validation and the leave-one-out approach. Furthermore, models were validated on independent test data, and all samples were correctly classified.


8.
Cancer classification is one of the major applications of microarray technology. When standard machine learning techniques are applied to cancer classification, they face the small sample size (SSS) problem of gene expression data. The SSS problem arises from the large dimensionality of the feature space (due to the large number of genes) compared to the small number of samples available. To overcome the SSS problem, the dimensionality of the feature space is reduced either through feature selection or through feature extraction. Linear discriminant analysis (LDA) is a well-known technique for feature extraction-based dimensionality reduction. However, this technique cannot be applied to cancer classification because of the singularity of the within-class scatter matrix caused by the SSS problem. In this paper, we use the Gradient LDA technique, which avoids the singularity problem associated with the within-class scatter matrix, and show its usefulness for cancer classification. The technique is applied to three gene expression datasets, namely acute leukemia, small round blue-cell tumour (SRBCT) and lung adenocarcinoma. It achieves lower misclassification error than several previous techniques.

9.
Microarray data are often characterized by high dimension and small sample size. There is a need to reduce their dimension for better classification performance and computational efficiency of the learning model. The minimum redundancy and maximum relevance (mRMR) method, which is widely used to reduce the dimension of the data, requires discretization and the setting of external parameters. We propose an incremental formulation of the trace of ratio of the scatter matrices to determine a relevant set of genes, which involves neither discretization nor external parameter setting. It is analytically shown that the proposed incremental formulation is computationally efficient in comparison to its batch formulation. Extensive experiments on 14 well-known microarray cancer datasets demonstrate that the proposed method performs better than the well-known mRMR method. Statistical tests also show that the proposed method is significantly better than the mRMR method.

10.
In this paper, the problem of classifying the quality of microarray data spots is addressed, using concepts derived from supervised learning theory. The proposed method, after extracting spots from the microarray image, computes several features, which take into account shape, color and variability. The features are classified using support vector machines, a recent statistical classification technique that is widely employed. The proposed method does not make any assumptions about the problem and does not require any a priori information. The proposed system has been tested in a real case, for several different parameter configurations. Experimental results show the effectiveness of the proposed approach, also in comparison with state-of-the-art methods.

11.
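Application of SVM to the classification of gene microarray cancer data   Total citations: 1, self-citations: 0, citations by others: 1
Building on a summary of applications of binary-classification support vector machines, a new method is proposed for analyzing gene microarray cancer data: features are selected using the t-test and the Wilcoxon test, and a support vector machine (SVM) serves as the classifier. Classification experiments on a leukemia data set and a colon cancer data set show that the proposed method not only achieves a high recognition rate but also requires only a small feature subset and classifies quickly, improving both classification accuracy and speed.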
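The t-test-based feature selection step mentioned in entry 11 could look roughly like the sketch below. It ranks genes by the absolute value of a two-sample t-statistic and keeps the top few. The Welch (unequal-variance) form, the 0/1 label encoding, and the names `welch_t` and `rank_genes` are assumptions for illustration; the abstract does not say which variant of the t-test the authors used, and the Wilcoxon step and the SVM classifier are not shown.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic between two groups of expression values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: infinite separation unless the means match.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

def rank_genes(expr, labels, top_k):
    """Return indices of the top_k genes with the largest |t|.

    expr   : list of samples, each a list of per-gene expression values
    labels : class label (0 or 1) for each sample
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, y in zip(expr, labels) if y == 0]
        b = [row[g] for row, y in zip(expr, labels) if y == 1]
        scores.append((abs(welch_t(a, b)), g))
    # Largest |t| first; ties broken by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]
```

The selected gene indices would then be used to build the reduced feature vectors passed to the classifier.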

12.
A novel ensemble of classifiers for microarray data classification   Total citations: 1, self-citations: 0, citations by others: 1
Yuehui  Yaou   Applied Soft Computing, 2008, 8(4): 1664-1669
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes and only a few hundred samples. Such extreme asymmetry between the dimensionality of genes and samples presents several challenges to conventional clustering and classification methods. In this paper, a novel ensemble method is proposed. First, in order to extract useful features and reduce dimensionality, different feature selection methods such as correlation analysis and the Fisher ratio are used to form different feature subsets. Then a pool of candidate base classifiers is generated to learn the subsets, which are re-sampled from the different feature subsets with the PSO (Particle Swarm Optimization) algorithm. Finally, appropriate classifiers are selected to construct the classification committee using EDAs (Estimation of Distribution Algorithms). Experiments show that the proposed method produces the best recognition rates on four benchmark databases.

13.
In this paper, multiplier-less nearly perfect reconstruction tree-structured non-uniform filter banks (NUFB) are proposed. When sharp transition width filter banks are to be implemented, the order of the filters, and hence the complexity, becomes very high. The filter banks employ an iterative algorithm which adjusts the cutoff frequencies of the prototype filter to reduce the amplitude distortion. It is found that the proposed design method, in which the prototype filter is designed by the frequency response masking method, gives better results than earlier reported results in terms of the number of multipliers when sharp transition width filter banks are needed. To reduce the complexity and power consumption for hardware realization, a design method which makes the NUFB totally multiplier-less is also proposed in this paper. The NUFB is made multiplier-less by converting the continuous filter bank coefficients to finite precision coefficients in the signed power of two space. The filter bank with finite precision coefficients may suffer performance degradation. This calls for the use of suitable optimization techniques. Classical gradient-based optimization techniques cannot be deployed here, because the search space consists of only integers. In this context, a meta-heuristic algorithm is a good choice, as it can be tailor-made to suit the problem under consideration. Thus, this design method results in near perfect NUFBs which are simple and multiplier-less and have linear phase and sharp transition width with very low aliasing. Also, different non-uniform bands can be obtained from the tree-structured filter bank by rearranging the branches.
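To make the "signed power of two space" in entry 13 concrete, the sketch below approximates one real coefficient by a short sum of terms of the form ±2^e, so that multiplication can be done with shifts and adds. The greedy rounding used here is only an illustrative baseline and is an assumption: the abstract states that the authors optimize the quantized coefficients with a meta-heuristic, which is not reproduced. The function name `spt_quantize` and its parameters are hypothetical.

```python
import math

def spt_quantize(coeff, max_terms=3, min_exp=-8, max_exp=0):
    """Greedy signed power-of-two (SPT) approximation of one coefficient.

    Returns (terms, value), where terms is a list of (sign, exponent) pairs
    and value is the sum of sign * 2**exponent over those pairs.
    """
    terms = []
    residual = coeff
    for _ in range(max_terms):
        if residual == 0:
            break
        mag = abs(residual)
        # Candidate exponents bracketing the residual, clamped to the word length.
        lo = max(min_exp, min(max_exp, math.floor(math.log2(mag))))
        hi = min(max_exp, lo + 1)
        exp = lo if abs(mag - 2.0 ** lo) <= abs(mag - 2.0 ** hi) else hi
        # Stop when another term would not reduce the error.
        if abs(mag - 2.0 ** exp) >= mag:
            break
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    value = sum(s * 2.0 ** e for s, e in terms)
    return terms, value
```

For example, `spt_quantize(0.375, max_terms=2)` represents 0.375 as 2^-2 + 2^-3. A greedy result like this could serve as a starting point for the integer-space search the abstract describes.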

14.
Knowledge gained through classification of microarray gene expression data is increasingly important as it is useful for phenotype classification of diseases. Unlike black box methods, a fuzzy expert system can produce an interpretable classifier with knowledge expressed in terms of if-then rules and membership functions. This paper proposes a novel Genetic Swarm Algorithm (GSA) for obtaining a near optimal rule set and tuning membership functions. Advanced and problem-specific genetic operators are proposed to improve the convergence of GSA and classification accuracy. The performance of the proposed approach is evaluated using six gene expression data sets. The simulation study shows that the proposed approach generates a compact fuzzy system with high classification accuracy for all the data sets when compared with other approaches.

15.
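In bioinformatics, an important problem is classifying tumors into different categories based on microarray technology. Compared with many traditional classification problems, the main difficulty is that the dimensionality of the gene space is very high while the number of samples to be classified is very small. Non-negative matrix factorization (NMF) has already handled this difficulty successfully in microarray data clustering. Extending NMF to data classification, and to tumor classification in particular, yields very good results. The NMF-based method has three advantages: good classification performance, no parameters, and good interpretability.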

16.
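Genetic algorithm-based gene selection and sample classification for colon cancer   Total citations: 2, self-citations: 1, citations by others: 1
A new method based on two rounds of genetic algorithms is proposed for gene selection and sample classification on colon cancer microarray data. The method first filters out most genes that are irrelevant to classification according to each gene's Bhattacharyya distance. It then uses GA/CFS, which combines a genetic algorithm with CFS (Correlation-based Feature Selection), to select good gene subsets, and archives these subsets. Candidate subsets for further search are chosen according to how frequently each gene is selected in the archived subsets. Finally, GA/SVM, which combines a genetic algorithm with SVM, selects the classification feature subset from the candidate gene subsets. The GA/CFS-GA/SVM method was applied to colon cancer microarray data, and the experimental results and a comparison with the literature show that the method performs well.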
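The first filtering stage in entry 16 scores each gene by a Bhattacharyya distance between the two classes. A minimal sketch follows, assuming each class's expression values for a gene are modelled as a univariate Gaussian with non-zero variance, in which case the distance is D = (1/4)(m1 - m2)^2 / (v1 + v2) + (1/2) ln((v1 + v2) / (2 sqrt(v1 v2))). The Gaussian model, the use of population variance, and the names `bhattacharyya_gaussian` and `filter_genes` are assumptions; the abstract does not give the exact form the authors used.

```python
import math
from statistics import mean, pvariance

def bhattacharyya_gaussian(a, b):
    """Bhattacharyya distance between two univariate Gaussians fitted to a and b.

    Assumes both groups have non-zero variance.
    """
    m1, m2 = mean(a), mean(b)
    v1, v2 = pvariance(a), pvariance(b)
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2)))
    return mean_term + var_term

def filter_genes(expr, labels, keep):
    """Keep the indices of the `keep` genes with the largest distance.

    expr   : list of samples, each a list of per-gene expression values
    labels : class label (0 or 1) for each sample
    """
    scored = []
    for g in range(len(expr[0])):
        a = [row[g] for row, y in zip(expr, labels) if y == 0]
        b = [row[g] for row, y in zip(expr, labels) if y == 1]
        scored.append((bhattacharyya_gaussian(a, b), g))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scored[:keep]]
```

Genes surviving this filter would then be handed to the GA/CFS and GA/SVM search stages, which are not sketched here.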

17.
Isometric mapping (Isomap) is a popular nonlinear dimensionality reduction technique which has shown high potential in visualization and classification. However, it appears sensitive to noise or scarcity of observations. This inadequacy may hinder its application to the classification of microarray data, in which the expression levels of thousands of genes in a few normal and tumor sample tissues are measured. In this paper we propose a double-bounded tree-connected variant of Isomap, aimed at being more robust to noise and outliers when used for classification and also computationally more efficient. It differs from the original Isomap in the way the neighborhood graph is generated: in the first stage we apply a double-bounding rule that confines the search to at most k nearest neighbors contained within an ε-radius hypersphere; the resulting subgraphs are then joined by computing a minimum spanning tree among the connected components. We therefore achieve a connected graph without unnaturally inflating the values of k and ε. Computational experiments show that the new method performs significantly better in terms of accuracy than Isomap, k-edge-connected Isomap and the direct application of support vector machines to data in the input space, consistently across the seven microarray datasets considered in our tests.
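The two-stage neighborhood graph construction described in entry 17 can be sketched as below. The first function applies the double bound (at most k nearest neighbors, each within radius ε). The second joins the resulting connected components by adding the shortest inter-component edges in Kruskal fashion, which yields a minimum spanning tree over the components. Treating the graph as undirected, using Euclidean distance, and the function names are assumptions; the remaining Isomap steps (geodesic distances and embedding) are omitted.

```python
import math
from itertools import combinations

def double_bounded_graph(points, k, eps):
    """Edges linking each point to at most k nearest neighbors within radius eps."""
    n = len(points)
    edges = set()
    for i in range(n):
        near = sorted((math.dist(points[i], points[j]), j)
                      for j in range(n) if j != i)
        for d, j in near[:k]:
            if d <= eps:
                edges.add((min(i, j), max(i, j)))  # store undirected edge once
    return edges

def connect_components(points, edges):
    """Join connected components with the shortest available bridging edges."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)

    joined = set(edges)
    candidates = sorted((math.dist(points[i], points[j]), i, j)
                        for i, j in combinations(range(n), 2))
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge bridges two different components
            parent[ri] = rj
            joined.add((i, j))
    return joined
```

Because bridging edges are added only between distinct components, the graph becomes connected without raising k or ε for every point.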

18.
In this paper a new framework for feature selection consisting of an ensemble of filters and classifiers is described. Five filters, based on different metrics, were employed. Each filter selects a different subset of features which is used to train and to test a specific classifier. The outputs of these five classifiers are combined by simple voting. In this study three well-known classifiers were employed for the classification task: C4.5, naive Bayes and IB1. The rationale of the ensemble is to reduce the variability of the features selected by filters in different classification domains. Its adequacy was demonstrated by employing 10 microarray data sets.

19.
Gene expression microarray is a rapidly maturing technology that provides the opportunity to assay the expression levels of thousands or tens of thousands of genes in a single experiment. We present a new heuristic to select relevant gene subsets for subsequent use in the classification task. Our method is based on the statistical significance of adding a gene from a ranked list to the final subset. The efficiency and effectiveness of our technique are demonstrated through extensive comparisons with other representative heuristics. Our approach shows excellent performance, not only at identifying relevant genes, but also with respect to the computational cost.

20.
Microarray data have significant potential in clinical medicine, but they typically contain far more genes than samples. Finding a subset of discriminatory genes (features) through intelligent algorithms has become a trend. Building a disease prognosis expert system on this basis would greatly benefit clinical medicine. In addition, the fewer genes are selected, the lower the cost of the disease prognosis expert system. A small gene set with high classification accuracy is therefore what is needed. In this paper, a multi-objective model is built according to the analytic hierarchy process (AHP), which treats classification accuracy as absolutely more important than the number of selected genes. A multi-objective heuristic algorithm called MOEDA, an improvement of the Univariate Marginal Distribution Algorithm, is proposed to solve the model. Two main rules are designed: the 'Higher and Fewer Rule', which is used for evaluating and sorting individuals, and the 'Forcibly Decrease Rule', which is used to generate potential individuals with high classification accuracy and fewer genes. The proposed method is tested on both binary-class and multi-class microarray datasets. The results show that the gene set selected by MOEDA not only yields higher accuracies but also remains small, which both saves computational time and improves the interpretability and applicability of the result with a simple classification model. The proposed MOEDA opens up a new way of applying heuristic algorithms to microarray gene expression data.
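One plausible reading of the sorting rule in entry 20, given that accuracy is treated as absolutely more important than gene count, is a lexicographic order: higher accuracy first, and fewer genes only as a tie-breaker. The sketch below illustrates that reading. It is an assumption, since the abstract does not define the rule precisely, and the `Individual` type and the function name `higher_and_fewer_sort` are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple

class Individual(NamedTuple):
    genes: FrozenSet[int]   # indices of the selected genes
    accuracy: float         # classification accuracy obtained with those genes

def higher_and_fewer_sort(population: List[Individual]) -> List[Individual]:
    """Sort by accuracy (descending), then by number of genes (ascending)."""
    return sorted(population, key=lambda ind: (-ind.accuracy, len(ind.genes)))
```

Under this ordering, a three-gene individual with 0.90 accuracy still ranks above a one-gene individual with 0.80 accuracy, which matches the stated priority of accuracy over subset size.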
