Similar literature
 Found 20 similar documents (search time: 31 ms)
1.

Cancer classification is one of the main steps in a patient's healing process. This compels modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data obtained in microarray experiments and advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a big challenge. In this study, a two-step classification paradigm that merges genetic algorithm feature selection and machine learning classifiers is utilized. The genetic algorithm is built in the MapReduce programming spirit, which makes it highly scalable on a Hadoop cluster. To improve the performance of the proposed algorithm, it is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in the cloud environment. In addition, the Hadoop MapReduce framework demonstrates a substantial decrease in computation time.


2.
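Building a tumor classification model with effective predictive power from gene expression profiles is of great importance for the clinical diagnosis and treatment of tumors. For tumor subtype identification, a key problem is to find the subset of feature genes that determines the tumor subtype. This paper proposes a combined strategy for selecting informative tumor genes: first, from the viewpoint of the information content of individual genes, the Relief-F algorithm is used to eliminate genes irrelevant to classification; second, taking the relationships between genes into account, the K-means algorithm is used to filter out redundant genes; finally, an artificial neural network is used as the classifier to test and evaluate the classification ability of the selected informative genes. Experiments were carried out on an acute leukemia gene expression data set with seven subtypes, where the leave-one-out accuracy reached 100%, indicating that the proposed informative gene selection method is very effective for identifying multiple tumor subtypes.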

3.
Cancer classification is one of the major applications of microarray technology. When standard machine learning techniques are applied to cancer classification, they face the small sample size (SSS) problem of gene expression data. The SSS problem arises from the large dimensionality of the feature space (due to the large number of genes) compared to the small number of samples available. To overcome the SSS problem, the dimensionality of the feature space is reduced either through feature selection or through feature extraction. Linear discriminant analysis (LDA) is a well-known technique for feature extraction-based dimensionality reduction. However, this technique cannot be applied directly to cancer classification because the within-class scatter matrix is singular due to the SSS problem. In this paper, we use the Gradient LDA technique, which avoids the singularity problem associated with the within-class scatter matrix, and show its usefulness for cancer classification. The technique is applied to three gene expression datasets: acute leukemia, small round blue-cell tumour (SRBCT) and lung adenocarcinoma. It achieves lower misclassification error than several previous techniques.
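The singularity mentioned in this abstract follows from a standard rank argument, sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper: n samples, c classes, d genes, and class means \mu_k.

```latex
% Within-class scatter matrix (d x d), summed over the c classes
S_W = \sum_{k=1}^{c} \sum_{x_i \in \text{class } k} (x_i - \mu_k)(x_i - \mu_k)^{\top},
\qquad
\operatorname{rank}(S_W) \le n - c .
% Each class contributes at most (n_k - 1) linearly independent directions,
% and \sum_k (n_k - 1) = n - c.
% When d > n - c (thousands of genes, tens of samples), S_W is singular,
% so S_W^{-1} S_B in classical LDA cannot be formed directly.
```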

4.

Evolutionary computing algorithms are computationally intelligent systems that are used in a wide range of research applications, primarily for optimization. In this paper, an artificial neural network (ANN), a machine learning technique, is used to classify the data. The weights associated with each neuron and the architecture of the neural network always bias the output of the network model. With prior knowledge or trial-and-error techniques, different metrics or objectives can be used to optimise these weights. The optimization of weights using multiple objectives is referred to as a "multi-objective optimization problem." In this paper, an evolutionary cultural algorithm is used to optimise the weights of the ANN, and the results are reported with improved accuracy. Three benchmark datasets of autism screening data are used to train and test the model for classification accuracy: toddlers (1054,19), children (292,21), and adults (704,21). With the support of a domain expert, real-time data were collected from parents and caregivers, totalling over 1000 records, with a moderate difference in attributes based on CARS-2 (Childhood Autism Rating Scale, 2nd Edition) for ASD screening. The proposed model is compared using a curve-fitting mathematical technique. The proposed model is trained and tested, and the results show that it outperforms other algorithms in terms of precision, accuracy, sensitivity, and specificity.


5.

Cataracts are the leading cause of visual impairment and blindness globally. Over the years, researchers have achieved significant progress in developing state-of-the-art machine learning techniques for automatic cataract classification and grading, aiming to prevent cataracts early and improve clinicians’ diagnosis efficiency. This paper provides a comprehensive survey of recent advances in machine learning techniques for cataract classification/grading based on ophthalmic images. We summarize the existing literature along two research directions: conventional machine learning methods and deep learning methods. This survey also provides insights into both the merits and the limitations of existing works. In addition, we discuss several challenges of automatic cataract classification/grading based on machine learning techniques and present possible solutions to these challenges for future research.


6.

This study investigates the ability of wavelet-artificial neural networks (WANN) for the prediction of short-term daily river flow. The WANN model is improved by conjunction of two methods, discrete wavelet transform and artificial neural networks (ANN) based on regression analyses, respectively. The proposed WANN models are applied to the daily flow data of Vanyar station, on the Ajichai River in the northwest region of Iran, and compared with the ANN and support vector machine (SVM) techniques. Mean square error (MSE), mean absolute error (MAE) and correlation coefficient (R) statistics are used for evaluating precision of the WANN, ANN and SVM models. Comparison results demonstrate that the WANN model performs better than the ANN and SVM models in short-term (1-, 2- and 3-day ahead) daily river flow prediction.
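The three statistics named in this abstract can be written down directly. The sketch below is a minimal illustration using the textbook definitions of MSE, MAE and the Pearson correlation coefficient; the function names and the toy numbers in the usage note are assumptions made here, not values from the study.

```python
import math


def mse(observed, predicted):
    """Mean square error between observed and predicted series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted series."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (R) between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (std_o * std_p)
```

For a hypothetical flow series `[1, 2, 3]` and forecast `[1, 2, 4]`, both `mse` and `mae` come out at one third. A lower MSE or MAE and an R closer to 1 indicate a better forecast.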


7.
For a non-idealized machine tool, each point in the workspace is associated with a tool point positioning error vector. If this error map can be determined, then it is possible to substantially improve the positioning performance of the machine by introducing suitable compensation into the control loop. This paper explores the possibility of using an artificial neural network (ANN) to compute this mapping. The training set for the ANN is obtained by mounting a physical artifact whose dimensions are precisely known in the machine's workspace. The machine, equipped with a touch trigger probe, measures the positions of features on the artifact. The difference between the machine reading and the known dimension is the machine error at that point in the workspace. Using standard modeling techniques, the kinematic error model for a CNC turning center was developed. This model was parameterized by measurement of the parametric error functions using a laser interferometer, electronic levels and a precision square. The kinematic model was then used to simulate the artifact-measuring process and develop the ANN training set. The effect of changing artifact geometry was explored and a machining operation was simulated using the ANN output to provide compensation. The results show that the ANN is capable of learning the error map of a real machine, and that ANN-based compensation can significantly reduce part-dimensional errors.

8.

Stream-flow forecasting is a crucial task in hydrological science. Throughout the literature, traditional and artificial intelligence models have been applied to this task. Exploring and developing better expert models is an ongoing endeavor for this hydrological application. In addition, the accuracy, confidence and practicality of the model are other significant problems that need to be considered. Accordingly, this study investigates a modern non-tuned, data-driven machine learning approach, namely the extreme learning machine (ELM). This approach consists of a single-layer feedforward neural network that selects the input variables randomly and determines the output weights systematically. To demonstrate its reliability and effectiveness, one-step-ahead stream-flow forecasting based on three time-scale patterns (daily, mean weekly and mean monthly) was implemented for the Johor River, Malaysia. An artificial neural network (ANN) model is used for comparison and evaluation. The results indicate that the ELM approach is superior to the ANN model in accuracy and computation time, in addition to forecasting precision in the tropical zone. In measurable terms, the dominance of the ELM model over the ANN model is indicated by the coefficient of determination (R²), root-mean-square error (RMSE) and mean absolute error (MAE). For example, at the daily time scale the results were R² = 0.94 and 0.90, RMSE = 2.78 and 11.63, and MAE = 0.10 and 0.43 for the ELM and ANN models, respectively.
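As a rough illustration of the ELM idea, the sketch below follows the usual textbook formulation, in which the hidden-layer weights are drawn at random and only the output weights are solved for by regularized least squares. It is not the authors' implementation: the function names, the sigmoid activation, the weight range [-1, 1], the ridge term and the sine-curve data in the usage note are all assumptions made here.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def hidden_layer(x, weights, biases):
    """Sigmoid activations of the random hidden layer for one input vector."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)))
        for w, b in zip(weights, biases)
    ]


def elm_fit(xs, ys, n_hidden=10, ridge=1e-6, seed=0):
    """Draw random hidden weights, then solve for the output weights only."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_layer(x, weights, biases) for x in xs]
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
    hth = [
        [sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
         for j in range(n_hidden)]
        for i in range(n_hidden)
    ]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve_linear(hth, hty)
    return weights, biases, beta


def elm_predict(model, x):
    """Predict one output value from a fitted (weights, biases, beta) model."""
    weights, biases, beta = model
    return sum(b * h for b, h in zip(beta, hidden_layer(x, weights, biases)))
```

A toy usage would be `model = elm_fit([[i * 0.1] for i in range(31)], [math.sin(i * 0.1) for i in range(31)])` followed by `elm_predict(model, [1.0])`. Because no iterative tuning of the hidden layer takes place, training reduces to a single linear solve, which is consistent with the short training time the abstract attributes to ELM.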


9.
ABSTRACT

Sea Surface Salinity (SSS) is a pre-eminent parameter in oceanology causing extreme climate and weather events such as floods and droughts. Therefore, knowledge discovery of SSS has become an increasingly fundamental problem in recent years. However, not only does the inadequacy of in-situ SSS data in large ocean basins hamper detailed analyses of SSS variation patterns, but conventional data-gathering techniques for SSS estimation are also often too expensive and time-consuming to supply the amount of data required in SSS estimation studies. By contrast, the new Soil Moisture Active-Passive (SMAP) mission can provide validated SSS data alongside its main objective of soil moisture retrieval. Accordingly, after collecting a candidate data set of surface parameters as inputs to SSS with the aid of Pearson correlation and Boruta feature selection techniques, this paper studies the predictive skill of machine learning approaches for estimating SMAP radiometer SSS in the Persian Gulf region from April 2015 to April 2017. Four machine learning methods, namely Support Vector Regression (SVR), artificial neural network (ANN), random forest (RF) and gradient boosting machine (GBM), were adopted to model the SSS. The GBM and RF approaches provided nearly equivalent predictions for both the calibration and validation data sets, as substantiated by experimental results and simulations; nonetheless, slightly superior results were attained with the GBM model, with correlation coefficient (r) = 0.734, root mean squared error (RMSE) = 0.906 and mean absolute error (MAE) = 0.627. The findings demonstrate promising SSS estimation from SMAP, which could provide a baseline for perceiving large-scale changes in SSS.

10.
Elmidaoui  Sara  Cheikhi  Laila  Idri  Ali  Abran  Alain 《Journal of Computer Science and Technology》2020,35(5):1147-1174

Maintaining software once implemented on the end-user side is laborious and, over its lifetime, is most often considerably more expensive than the initial software development. The prediction of software maintainability has emerged as an important research topic to address industry expectations for reducing costs, in particular, maintenance costs. Researchers and practitioners have been working on proposing and identifying a variety of techniques ranging from statistical to machine learning (ML) for better prediction of software maintainability. This review has been carried out to analyze the empirical evidence on the accuracy of software product maintainability prediction (SPMP) using ML techniques. This paper analyzes and discusses the findings of 77 selected studies published from 2000 to 2018 according to the following criteria: maintainability prediction techniques, validation methods, accuracy criteria, overall accuracy of ML techniques, and the techniques offering the best performance. The review process followed the well-known systematic review process. The results show that ML techniques are frequently used in predicting maintainability. In particular, artificial neural network (ANN), support vector machine/regression (SVM/R), regression & decision trees (DT), and fuzzy & neuro fuzzy (FNF) techniques are more accurate in terms of PRED and MMRE. The N-fold and leave-one-out cross-validation methods, and the MMRE and PRED accuracy criteria are frequently used in empirical studies. In general, ML techniques outperformed non-machine learning techniques, e.g., regression analysis (RA) techniques, while FNF outperformed SVM/R, DT, and ANN in most experiments. However, while many techniques were reported superior, no specific one can be identified as the best.


11.
In the conventional backpropagation (BP) learning algorithm used for training the connecting weights of an artificial neural network (ANN), a fixed slope-based sigmoidal activation function is used. This limitation leads to slower training of the network because only the weights of the different layers are adjusted by the conventional BP algorithm. To accelerate the rate of convergence during the training phase of the ANN, in addition to updating the weights, the slope of the sigmoid function associated with each artificial neuron can also be adjusted by using a newly developed learning rule. To achieve this objective, in this paper, new BP learning rules for slope adjustment of the activation function associated with the neurons have been derived. The combined rules for both the connecting weights and the slopes of the sigmoid functions are then applied to the ANN structure to achieve faster training. In addition, two benchmark problems, classification and nonlinear system identification, are solved using the trained ANN. The results of simulation-based experiments demonstrate that, in general, the proposed new BP learning rules for slope and weight adjustments of the ANN provide superior convergence performance during the training phase as well as improved performance in terms of root mean square error and mean absolute deviation for classification and nonlinear system identification problems.
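The abstract does not state the derived rules, so the following is only a sketch of what a slope update could look like under plain gradient descent on a squared error for a single neuron. The function names, the learning rate and the single-neuron setting are assumptions made here, and the paper's actual rules may differ.

```python
import math


def sigmoid(x, slope):
    """Sigmoid activation with an adjustable slope parameter."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def sigmoid_grads(x, slope):
    """Return (d/dx, d/dslope) of sigmoid(slope * x)."""
    s = sigmoid(x, slope)
    return slope * s * (1.0 - s), x * s * (1.0 - s)


def update_slope(x, slope, target, lr=0.1):
    """One gradient-descent step on 0.5 * (output - target)^2 w.r.t. the slope."""
    s = sigmoid(x, slope)
    _, ds_dslope = sigmoid_grads(x, slope)
    return slope - lr * (s - target) * ds_dslope
```

The derivative with respect to the slope has the same form as the derivative with respect to the input, with the roles of `x` and `slope` swapped, so a slope update can reuse quantities that ordinary backpropagation already computes.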

12.

Tailoring the muckpile shape and its fragmentation to the requirements of the excavating equipment in surface mines can significantly improve the efficiency and savings through increased production, machine life and reduced maintenance. Considering the various blast parameters together to predict the throw is subtle and can lead to wrong conclusions. In this paper, a different approach was followed to combine the representational power of multilayer neural networks and various machine learning techniques to predict the throw of a bench blast using the data from a limestone mine located in central India. Then, using various analysis techniques, the training parameters have been adjusted to reduce the cross-validation error and increase the accuracy. Here, four different architectures of neural networks have been trained by different techniques, and the best model has been selected. The different machine learning techniques have been implemented on the basis of accuracy of the output. The sensitivity analysis has been done to get the relative importance of the variables in prediction of the output.


13.
The Resourcesat-2 is a highly suitable satellite for crop classification studies with its improved features and capabilities. Data from one of its sensors, the linear imaging and self-scanning (LISS IV) sensor, which has a spatial resolution of 5.8 m, were used to compare the relative accuracies achieved by support vector machine (SVM), artificial neural network (ANN), and spectral angle mapper (SAM) algorithms for the classification of various crop and non-crop classes covering part of Varanasi district, Uttar Pradesh, India. Separability analysis was performed between categories using the transformed divergence (TD) method to assess the quality of the training samples. The outcome of the present study indicates better performance of the SVM and ANN algorithms in comparison to SAM for classification using LISS IV sensor data. The overall accuracies obtained by SVM and ANN through error matrix analysis were 93.45% and 92.32%, respectively, whereas a lower accuracy of 74.99% was achieved using the SAM algorithm. Results derived from the SVM, ANN, and SAM classification algorithms were validated with ground truth information acquired by a field visit on the same day as the satellite data acquisition.
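Of the three classifiers compared, the spectral angle mapper is simple enough to sketch in a few lines: each pixel spectrum is assigned to the reference spectrum with which it forms the smallest angle. The function names, class labels and reflectance values below are hypothetical and are not taken from the study.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def classify_sam(pixel, references):
    """Return the label of the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))
```

With hypothetical references `{"wheat": [0.1, 0.4, 0.6], "water": [0.3, 0.1, 0.05]}`, the pixel `[0.2, 0.8, 1.2]` is labelled `"wheat"`. The angle depends only on the direction of the spectrum, not its magnitude, so SAM is insensitive to overall brightness differences.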

14.
Vapnik  Vladimir  Izmailov  Rauf 《Machine Learning》2019,108(3):381-423

This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In the classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to construct the desired classification function, a learning machine computes statistical invariants that are specific to the problem, and then minimizes the expected error in a way that preserves these invariants; it is thus both data- and invariant-driven learning. From a mathematical point of view, methods of the classical paradigm employ mechanisms of strong convergence of approximations to the desired function, whereas methods of the new paradigm employ both strong and weak convergence mechanisms. This can significantly increase the rate of convergence.


15.
Zhang  Yong  Liu  Bo  Cai  Jing  Zhang  Suhua 《Neural computing & applications》2016,28(1):259-267

The extreme learning machine for single-hidden-layer feedforward neural networks has been extensively applied in imbalanced data learning due to its fast learning capability. An ensemble approach can effectively improve classification performance by combining several weak learners according to a certain rule. In this paper, a novel ensemble approach based on the weighted extreme learning machine is proposed for the imbalanced data classification problem. The weight of each base learner in the ensemble is optimized by a differential evolution algorithm. Experimental results on 12 datasets show that the proposed method achieves better classification performance than the simple vote-based ensemble method and the non-ensemble method.


16.
Bootstrap estimated true and false positive rates and ROC curve   (cited 1 time in total: 0 self-citations, 1 citation by others)
Diagnostic studies and new biomarkers are assessed by the estimated true and false positive rates of the classification rule. One diagnostic rule is considered for high-dimensional predictor data. Cross-validation and the leave-one-out bootstrap are discussed to estimate true and false positive rates of classifiers by the machine learning methods Adaboost, Bagging, Random Forest, (penalized) logistic regression and support vector machines. The .632+ bootstrap estimation of the misclassification error has been previously proposed to adjust the overfitting of the apparent error. This idea is generalized to the estimation of true and false positive rates. Tree-based simulation models with 8 and 50 binary non-informative variables are analysed to examine the properties of the estimators. Finally, a bootstrap estimation of receiver operating characteristic (ROC) curves is suggested and a .632+ bootstrap estimation of ROC curves is discussed. This approach is applied to high-dimensional gene expression data of leukemia and predictors of image data for glaucoma diagnosis.
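For orientation, the sketch below shows the commonly stated form of the .632 and .632+ estimators for the misclassification error, which is the starting point the abstract generalizes. It assumes that the apparent error, the leave-one-out bootstrap error and the no-information error rate have already been computed; the function and argument names are chosen here, and the paper's versions for true and false positive rates are not reproduced.

```python
def bootstrap_632(apparent, loo_boot):
    """.632 estimator: fixed-weight blend of apparent and leave-one-out bootstrap error."""
    return 0.368 * apparent + 0.632 * loo_boot


def bootstrap_632_plus(apparent, loo_boot, no_info):
    """.632+ estimator: the blend weight grows with the relative overfitting rate."""
    # Cap the bootstrap error at the no-information error rate.
    loo_capped = min(loo_boot, no_info)
    if loo_capped > apparent and no_info > apparent:
        overfit = (loo_capped - apparent) / (no_info - apparent)
    else:
        overfit = 0.0
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * apparent + weight * loo_capped
```

With no overfitting the weight stays at 0.632 and the two estimators agree; with complete overfitting the weight reaches 1 and the estimate falls back on the leave-one-out bootstrap error alone.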

17.
Many attempts have been made to analyze gene expression data. Typical goals of such analysis include discovery of subclasses, designing predictors/classifiers for diseases, identifying marker genes, and trying to get a deeper understanding of the underlying biological process. Success in each of these tasks strongly depends on the features used to solve the problem. The high-dimensional nature of expression profiles makes the task very difficult. Consequently, many researchers have used feature selection criteria to reduce the dimensionality of the problem. These approaches are off-line in nature, as feature selection is done in a separate phase from the system design phase. These approaches ignore the fact that the utility of features depends on both the problem that is solved and the tool that is used to solve it. We here propose a novel neural scheme that picks up the necessary features on-line while the system learns the classification task. Because it considers all the features at one go, it does not miss any subtle combination of these features. We demonstrate the effectiveness of our on-line feature selection (OFS) scheme in distinguishing between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) in a cancer expression data set. Our scheme could identify only five genes that can produce results as good as or even better than what is reported in the literature on this data set. It identifies an important marker gene that alone has very good discriminating power. This analysis method is quite general in nature and can be effectively used in other areas of bioinformatics. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 453–467, 2006.

18.
In this paper, the classification of two binary bioinformatics datasets, leukemia and colon tumor, is further studied using the recently developed neural network-based finite impulse response extreme learning machine (FIR-ELM). A time series analysis of the microarray samples is first performed to determine the filtering properties of the hidden layer of the neural classifier with FIR-ELM for feature identification. The linear separability of the data patterns in the microarray datasets is then studied. To improve the robustness of the neural classifier against noise and errors, a frequency domain gene feature selection algorithm is also proposed. The simulation results show that the FIR-ELM algorithm performs excellently in the classification of bioinformatics data in comparison with many existing classification algorithms.

19.
We present learning of figures, nonempty compact sets in Euclidean space, based on Gold’s learning model aiming at a computable foundation for binary classification of multivariate data. Encoding real vectors with no numerical error requires infinite sequences, resulting in a gap between each real vector and its discretized representation used for the actual machine learning process. Our motivation is to provide an analysis of machine learning problems that explicitly tackles this aspect which has been glossed over in the literature on binary classification as well as in other machine learning tasks such as regression and clustering. In this paper, we amalgamate two processes: discretization and binary classification. Each learning target, the set of real vectors classified as positive, is treated as a figure. A learning machine receives discretized vectors as input data and outputs a sequence of discrete representations of the target figure in the form of self-similar sets, known as fractals. The generalization error of each output is measured by the Hausdorff metric. Using this learning framework, we reveal a hierarchy of learnable classes under various learning criteria in the track of traditional analysis based on Gold’s learning model, and show a mathematical connection between machine learning and fractal geometry by measuring the complexity of learning using the Hausdorff dimension and the VC dimension. Moreover, we analyze computability aspects of learning of figures using the framework of Type-2 Theory of Effectivity (TTE).
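The Hausdorff metric used here as the generalization error is defined for compact sets. As a small illustration, the sketch below computes it for finite point sets only, which is a simplifying assumption made here; the function names are likewise chosen for this example, and `math.dist` requires Python 3.8 or later.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Hausdorff distance between two finite, nonempty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

For `a = [(0.0, 0.0)]` and `b = [(0.0, 0.0), (3.0, 4.0)]`, the directed distance from `a` to `b` is 0 while the distance from `b` to `a` is 5, so the Hausdorff distance is 5. Taking the maximum of both directions is what makes the measure symmetric.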

20.
This paper describes a novel system based on machine vision and machine learning techniques for fully automated, real-time identification of constituent elements in a sample specimen using laser-induced breakdown spectroscopy (LIBS) images. The proposed system is developed as a compact spectrum analyzer for rapid element detection using a commercially available video camera. We propose a correlation-based pattern matching algorithm for analyzing single-element spectra. However, the use of a high-speed laser and the presence of numerous imperfections in the experimental setup require advanced techniques for analyzing multi-element spectra. We cast the element detection problem as a multi-label classification problem that uses support vector machines and artificial neural networks for multi-element classification. The proposed algorithms were evaluated using actual LIBS images. The machine learning approaches identified elements correctly with an accuracy of 99%. Our system is useful in instances where a qualitative analysis is sufficient and a quantitative element analysis is not required.
