
1.

Prediction of novel and selective TNF-alpha converting enzyme (TACE) inhibitors and characterization of correlative molecular descriptors by machine learning approaches

Yong Cong, Xue-gang Yang, Wei Lv, Ying Xue 《Journal of molecular graphics & modelling》2009,28(3):236-244


2.

Land-cover classification of partly missing data using support vector machines

Arnt-Børre Salberg, Robert Jenssen 《International journal of remote sensing》2013,34(14):4471-4481

Land-cover classification based on multi-temporal satellite images for scenarios where parts of the data are missing due to, for example, clouds, snow or sensor failure has received little attention in the remote-sensing literature. The goal of this article is to introduce support vector machine (SVM) methods capable of handling missing data in land-cover classification. The novelty of this article consists of combining the powerful SVM regularization framework with a recent statistical theory of missing data, resulting in a new method where an SVM is trained for each missing data pattern, and a given incomplete test vector is classified by selecting the corresponding SVM model. The SVM classifiers are evaluated on Landsat Enhanced Thematic Mapper Plus (ETM+) images covering a scene of Norwegian mountain vegetation. The results show that the proposed SVM-based classifier improves the classification accuracy by 5–10% compared with single image classification. The proposed SVM classifier also outperforms recent non-parametric k-nearest neighbours (k-NN) and Parzen window density-based classifiers for incomplete data by about 3%. Moreover, since the resulting SVM classifier may easily be implemented using existing SVM libraries, we consider the new method to be an attractive choice for classification of incomplete data in remote sensing.
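The dispatch idea in this abstract (one classifier per missing-data pattern, selected by the pattern of the incomplete test vector) can be illustrated with a short Python sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that only the standard library is needed, missing values are assumed to be encoded as `None`, training vectors are assumed to be fully observed, and the names `NearestCentroid`, `pattern_of` and `PatternwiseClassifier` are hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Stand-in for an SVM: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))


def pattern_of(x):
    """Missing-data pattern: indices of the observed (non-None) features."""
    return tuple(j for j, v in enumerate(x) if v is not None)


class PatternwiseClassifier:
    """One model per missing-data pattern; dispatch on the test vector's pattern."""

    def fit(self, X, y, patterns):
        # Assumes fully observed training vectors: for each pattern, train on
        # only the features that the pattern keeps.
        self.models = {
            p: NearestCentroid().fit([[x[j] for j in p] for x in X], y)
            for p in patterns
        }
        return self

    def predict_one(self, x):
        p = pattern_of(x)
        # Raises KeyError if no model was trained for this pattern.
        return self.models[p].predict_one([x[j] for j in p])
```

For example, with training vectors `[[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]` labelled `water, water, forest, forest` and patterns `[(0, 1), (0,), (1,)]`, the vector `[None, 0.95]` is routed to the model trained on feature 1 alone and classified as `forest`. With a real SVM library, each entry of `models` would hold a separately trained SVM instead.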

3.

Assessing machine-learning algorithms and image- and lidar-derived variables for GEOBIA classification of mining and mine reclamation

A.E. Maxwell, T.A. Warner, M.P. Strager, J.F. Conley, A.L. Sharp 《International journal of remote sensing》2013,34(4):954-978


4.

A novel ship classification approach for high resolution SAR images based on the BDA-KELM classification model

Jun Wu, Yu Zhu, Zhicheng Wang, Zhengji Song, Wenhai Wang 《International journal of remote sensing》2017,38(23):6457-6476

Ship classification based on synthetic aperture radar (SAR) images is a crucial component of maritime surveillance. In this article, feature selection and classifier design, the two key factors in traditional ship classification, are combined, and a novel ship classification model combining a kernel extreme learning machine (KELM) with a dragonfly algorithm in binary space (BDA), named BDA-KELM, is proposed. The model performs automatic feature selection and simultaneously searches for the optimal classifier parameters (the kernel parameter and the penalty factor). A series of ship classification experiments is carried out on high-resolution TerraSAR-X SAR imagery. Four other widely used classification models, namely k-nearest neighbour (k-NN), Bayes, back-propagation neural network (BP neural network) and support vector machine (SVM), are tested on the same dataset. The experimental results show that the proposed model achieves better classification performance than these four models, with a classification accuracy as high as 97% and encouraging results on three other multi-class classification evaluation metrics.
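The KELM part of BDA-KELM has a closed-form training step, sketched below in plain Python under the assumption of the standard KELM formulation with an RBF kernel: the output weights β solve (I/C + Ω)β = T, where Ω is the training kernel matrix and T holds one-hot class targets, and a new sample gets the class with the largest entry of k(x)ᵀβ. Here `C` and `gamma` play the roles of the penalty factor and kernel parameter that the abstract says BDA tunes; the binary dragonfly search and feature mask are not shown, and all names and class labels are hypothetical.

```python
from math import exp


def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def solve(A, B):
    """Solve A X = B (B may have several columns) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + brow[:] for row, brow in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [rv - f * cv for rv, cv in zip(M[r], M[col])]
    return [row[n:] for row in M]


class KELM:
    def __init__(self, C=10.0, gamma=1.0):
        self.C, self.gamma = C, gamma  # penalty factor, kernel parameter

    def fit(self, X, y):
        self.X = X
        self.classes = sorted(set(y))
        # One-hot target matrix T (N x number_of_classes).
        T = [[1.0 if label == c else 0.0 for c in self.classes] for label in y]
        n = len(X)
        # A = I / C + Omega, with Omega_ij = K(x_i, x_j).
        A = [[rbf(X[i], X[j], self.gamma) + (1.0 / self.C if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.beta = solve(A, T)
        return self

    def predict_one(self, x):
        k = [rbf(x, xi, self.gamma) for xi in self.X]
        scores = [sum(k[i] * self.beta[i][c] for i in range(len(k)))
                  for c in range(len(self.classes))]
        return self.classes[max(range(len(scores)), key=scores.__getitem__)]
```

Because the RBF kernel matrix is positive semidefinite, adding I/C makes the system positive definite and always solvable, which is why training needs only one linear solve rather than an iterative optimizer.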

5.

SVOIS: Support Vector Oriented Instance Selection for text classification

Chih-Fong Tsai, Che-Wei Chang 《Information Systems》2013

Automatic text classification is usually based on models constructed by learning from training examples. However, as text document repositories grow rapidly, the storage requirements and computational cost of model learning are becoming ever higher. Instance selection is one way to overcome this limitation: the aim is to reduce the amount of data by filtering noisy data out of a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods were developed for the k-nearest neighbor (k-NN) classifier, and their performance has not been examined in the text classification domain, where the dimensionality of the dataset is usually very high. Support vector machines (SVMs) are core text classification techniques. In this study, a novel instance selection method, called Support Vector Oriented Instance Selection (SVOIS), is proposed. First, a regression plane in the original feature space is identified by using a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane, is used to decide on the support vectors for the selected instances. Experimental results on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers to perform better than without instance selection.

6.

Applying 1-norm SVM with squared loss to gene selection for cancer classification

Li Zhang, Weida Zhou, Bangjun Wang, Zhao Zhang, Fanzhang Li 《Applied Intelligence》2018,48(7):1878-1890

Available gene selection methods have high computational complexity. This paper applies a 1-norm support vector machine with squared loss (1-norm SVMSL) to implement fast gene selection for cancer classification. The 1-norm SVMSL, a variant of the 1-norm support vector machine (1-norm SVM), has been proposed before. In principle, the 1-norm SVMSL can perform gene selection and classification at the same time. However, to improve classification performance, we use the 1-norm SVMSL only as a gene selector and adopt a subsequent classifier to classify samples using the selected genes. We perform extensive experiments on four DNA microarray data sets. Experimental results indicate that the 1-norm SVMSL selects genes very quickly compared with other methods. For example, the 1-norm SVMSL is almost an order of magnitude faster than the 1-norm SVM, and at least four orders of magnitude faster than SVM-RFE (recursive feature elimination), a state-of-the-art method.

7.

Feature selection for support vector machines with RBF kernel

Quanzhong Liu, Chihau Chen, Yang Zhang, Zhengguo Hu 《Artificial Intelligence Review》2011,36(2):99-115

Linear-kernel Support Vector Machine Recursive Feature Elimination (SVM-RFE) is known as an excellent feature selection algorithm. A nonlinear SVM, however, is a black-box classifier for which the mapping function $\Phi$ is not known explicitly, so the weight vector $w$ cannot be computed explicitly. In this paper, we propose a feature selection algorithm based on recursive feature elimination with an RBF-kernel support vector machine (SVM-RBF-RFE), which expands the nonlinear RBF kernel into its Maclaurin series and then computes the weight vector $w$ from the series according to each feature's contribution to the classification hyperplane. Using $w_i^2$ as the ranking criterion, SVM-RBF-RFE starts with all the features and eliminates the feature with the smallest squared weight at each step until all the features are ranked. We use SVM and KNN classifiers to evaluate nested subsets of the features selected by SVM-RBF-RFE. Experimental results on 3 UCI and 3 microarray datasets show that SVM-RBF-RFE generally performs better than information gain and SVM-RFE.
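The recursive elimination loop with the $w_i^2$ ranking criterion can be sketched as follows. This is a sketch under stated assumptions, not the paper's method: the weight vector comes from a regularized least-squares linear fit (no intercept) standing in for the SVM weight vector, and the paper's Maclaurin-series expansion of the RBF kernel, which is what makes $w$ computable in the nonlinear case, is not reproduced. `ridge_weights` and `rfe_rank` are hypothetical names.

```python
def ridge_weights(X, y, lam=1e-3):
    """Weights w solving (X^T X + lam*I) w = X^T y.
    Stand-in for the SVM weight vector; no intercept term."""
    d = len(X[0])
    # Augmented matrix [X^T X + lam*I | X^T y].
    M = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(d):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][d] for i in range(d)]


def rfe_rank(X, y, weight_fn=ridge_weights):
    """Recursive feature elimination: repeatedly refit on the remaining
    features and drop the one with the smallest squared weight w_i^2.
    Returns feature indices from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        Xs = [[x[j] for j in remaining] for x in X]
        w = weight_fn(Xs, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # The last feature eliminated is the most important one.
    return eliminated[::-1]
```

Refitting after every elimination is what distinguishes RFE from ranking once by the initial weights: the weights of correlated features are redistributed each time one of them is removed.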

8.

An Evolutionary Algorithm Approach to Optimal Ensemble Classifiers for DNA Microarray Data Analysis

Kyung-Joong Kim, Sung-Bae Cho 《IEEE Transactions on Evolutionary Computation》2008,12(3):377-388

In general, the analysis of microarray data requires two steps: feature selection and classification. Given the variety of feature selection methods and classifiers, it is difficult to find optimal ensembles composed of feature-classifier pairs. This paper proposes a novel method based on an evolutionary algorithm (EA) to form sophisticated ensembles of features and classifiers that can be used to obtain high classification performance. Despite the exponential number of possible ensembles of individual feature-classifier pairs, an EA can produce the best ensemble in a reasonable amount of time. The chromosome is encoded with real values that determine the weight of each feature-classifier pair in an ensemble. Experimental results on two well-known microarray datasets, in terms of time and classification rate, indicate that the proposed method produces ensembles that are superior to individual classifiers, as well as to other ensembles optimized by random and greedy strategies.
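A real-valued chromosome that weights each feature-classifier pair can be sketched as below. The operators are assumptions rather than the paper's: fitness is the accuracy of a weighted vote on precomputed member predictions, and the EA is a simple elitist (μ + λ) scheme with clipped Gaussian mutation. `weighted_vote_accuracy` and `evolve_weights` are hypothetical names, and the defaults are illustrative.

```python
import random


def weighted_vote_accuracy(weights, predictions, y_true):
    """Accuracy of a weighted vote; predictions[m][i] is the label that
    member m (one feature-classifier pair) assigns to sample i."""
    correct = 0
    for i, truth in enumerate(y_true):
        tally = {}
        for w, preds in zip(weights, predictions):
            tally[preds[i]] = tally.get(preds[i], 0.0) + w
        correct += max(tally, key=tally.get) == truth
    return correct / len(y_true)


def evolve_weights(predictions, y_true, generations=50, pop_size=20, sigma=0.3, seed=0):
    """Elitist (mu + lambda) evolution of real-valued weight chromosomes."""
    rng = random.Random(seed)
    m = len(predictions)

    def fitness(w):
        return weighted_vote_accuracy(w, predictions, y_true)

    # Start from uniform weights plus random chromosomes in [0, 1).
    population = [[1.0] * m] + [[rng.random() for _ in range(m)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Gaussian mutation, clipped so that weights stay non-negative.
        children = [[max(0.0, g + rng.gauss(0.0, sigma)) for g in parent] for parent in population]
        # Keep the best pop_size chromosomes among parents and children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    best = population[0]
    return best, fitness(best)
```

Because the uniform-weight chromosome is in the initial population and selection is elitist, the returned ensemble is never worse than an unweighted vote on the data used for fitness; in practice that data should be a validation split rather than the test set.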

9.

Hyperspectral remote sensing image classification based on dynamic SVM ensembles

Niu Peng, Wei Wei 《Journal of Computer Applications》2010,30(6):1590-1593

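Building on Bagging support vector machines (SVMs), dynamic classifier ensemble selection was applied to SVM ensemble learning, and the use of dynamic SVM ensembles for hyperspectral remote sensing image classification was studied. Taking the characteristics of hyperspectral data into account, the Bagging SVM method was improved through random feature-subspace selection and feedback learning; the computation of the K-nearest-neighbour local region was improved by introducing an additive composite distance; and the representativeness of the validation set was strengthened by adding misclassified training samples to it. Experimental results show that, compared with a single optimized SVM and other common SVM ensemble methods, the improved dynamic SVM ensemble achieves the highest classification accuracy and can effectively improve the classification accuracy of hyperspectral remote sensing images.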
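The dynamic-selection step can be sketched with the common local-accuracy rule: for each test sample, find its K nearest validation samples and use the ensemble member that classifies that neighbourhood best. Assumptions: plain Euclidean distance replaces the paper's additive composite distance, whose definition the abstract does not give; ensemble members are arbitrary callables (for example SVMs trained on bootstrap samples and random feature subspaces); and `dcs_local_accuracy` is a hypothetical name.

```python
from math import dist


def dcs_local_accuracy(x, classifiers, X_val, y_val, k=3):
    """Dynamic classifier selection by local accuracy: pick the member that is
    most accurate on the k validation samples nearest to x, and return its
    prediction for x."""
    # Indices of the k nearest validation samples (Euclidean distance here).
    neighbours = sorted(range(len(X_val)), key=lambda i: dist(x, X_val[i]))[:k]

    def local_accuracy(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in neighbours) / len(neighbours)

    best = max(classifiers, key=local_accuracy)
    return best(x)
```

Adding misclassified training samples to `X_val`, as the abstract describes, makes these neighbourhoods denser exactly where the members tend to disagree with the ground truth.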

10.

Metaheuristic approach for an enhanced mRMR filter method for classification using drug response microarray data

《Expert systems with applications》2017

Quality data mining analysis based on microarray gene expression data is a good approach for disease classification and for other fields, such as pharmacology, as well as a useful tool for medical innovation. One of the challenges in classification is that microarrays involve high dimensionality and a large number of redundant and irrelevant features. Feature selection is the most popular method for determining the optimal number of features to use for classification, and it accelerates learning because the data are then represented only by the optimal feature subset. The current practice in filter-based microarray feature selection is simply to select the top-ranked genes, i.e., to keep the 50 or 100 best-ranked genes. However, this cutoff is determined by human intuition; it requires trial and error and is therefore time-consuming. Accordingly, this study proposes a metaheuristic approach for selecting the top n relevant genes in drug microarray data to enhance the minimum redundancy–maximum relevance (mRMR) filter method. Three metaheuristics are applied, namely particle swarm optimization (PSO), cuckoo search (CS), and artificial bee colony (ABC). Subsequently, k-nearest neighbor and support vector machine classifiers are used to evaluate classification performance. The experiment used a microarray gene dataset of liver xenobiotic and pharmacological responses. Experimental results show that the metaheuristic approaches are more efficient and reduce the complexity of the classifier. Furthermore, mRMR-CS exhibits the best performance compared with mRMR-PSO and mRMR-ABC.
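The mRMR filter that this study enhances can be sketched for discrete (for example, discretized expression) features using the difference form of the criterion: relevance I(f; target) minus the mean mutual information with already-selected features. This sketch covers only the greedy mRMR ranking; choosing how many genes to keep, which the paper delegates to PSO, CS or ABC, is left to the caller through `n_select`. `mutual_information` and `mrmr` are hypothetical names.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def mrmr(features, target, n_select):
    """Greedy mRMR (difference form): at each step add the feature that
    maximizes relevance minus mean redundancy with the selected set."""
    n_select = min(n_select, len(features))
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    while len(selected) < n_select:
        candidates = [j for j in range(len(features)) if j not in selected]

        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        selected.append(max(candidates, key=score))
    return selected
```

For instance, if a feature and an exact copy of it are both highly relevant, mRMR picks one of them and then prefers an independent but equally relevant feature over the copy, which is the redundancy penalty at work.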

11.

Identification of cancerous gene groups from microarray data by employing adaptive genetic and support vector machine technique

Alok Kumar Shukla 《Computational Intelligence》2020,36(1):102-131

Microarray gene expression data now play a vital role in tumor classification. However, because only a limited number of tissue samples are available compared with the large number of genes in genomic data, various existing methods have failed to identify a small subset of discriminative genes. To overcome this limitation, this paper develops a new hybrid gene selection technique, called ensemble multipopulation adaptive genetic algorithm (EMPAGA), that can disregard irrelevant genes and classify cancer accurately. The proposed hybrid gene selection algorithm comprises two phases. In the first phase, an ensemble gene selection (EGS) method filters out noisy and redundant genes in high-dimensional datasets by combining multilayer and F-score approaches. Then, an adaptive genetic algorithm based on a multipopulation strategy, with support vector machine and naïve Bayes (NB) classifiers as the fitness function, is applied to select the most relevant genes from the reduced datasets. The performance of the proposed method is evaluated on 10 microarray datasets covering numerous tumor types. The comprehensive results and various comparisons show that EGS has a remarkable impact on the efficacy of the multipopulation adaptive genetic algorithm and enhances the capability of the proposed approach in terms of convergence rate and solution quality. The experimental results demonstrate the superiority of the proposed method over other standard wrappers in terms of classification accuracy and optimal number of genes.
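The F-score component of the EGS filter can be sketched with the widely used two-class F-score: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The abstract does not state the exact definition the authors use, so this formula is an assumption, and `f_score` and `rank_genes` are hypothetical names.

```python
from statistics import mean, variance


def f_score(values, labels, positive=1):
    """Two-class F-score of one gene (needs at least two samples per class)."""
    pos = [v for v, l in zip(values, labels) if l == positive]
    neg = [v for v, l in zip(values, labels) if l != positive]
    m_all, m_pos, m_neg = mean(values), mean(pos), mean(neg)
    numerator = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # statistics.variance is the sample variance (divides by n - 1).
    denominator = variance(pos) + variance(neg)
    return numerator / denominator if denominator else float("inf")


def rank_genes(X, labels, top=None):
    """Rank genes (columns of X, rows are samples) by descending F-score."""
    scores = [f_score([row[j] for row in X], labels) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:top] if top is not None else order
```

A filter like this is cheap enough to run on every gene before the genetic-algorithm wrapper, which then searches only the reduced gene set.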

12.

Software defect prediction using ensemble learning on selected features

《Information and Software Technology》2015

Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance and missing samples. It is also hard to ensure a balanced distribution between data pertaining to defective and non-defective software; in most experimental cases, data related to the latter class dominate the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on defect classification performance. Results: Forward selection showed that only a few features contribute to a high area under the receiver operating characteristic curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson's correlation. This suggests that features are highly unstable. However, ensemble learners such as random forests and the proposed algorithm, average probability ensemble (APE), are less affected by poor features than weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets PC2, PC4, and MC1. Conclusion: This paper shows that the features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
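The two named ingredients, averaging class probabilities across learners and greedy forward selection, can be sketched generically. Assumptions: each learner is represented only by its class-probability vector for one sample, the APE members and their weighting in the paper may differ, and `evaluate` is a caller-supplied scoring function (in practice, for example, cross-validated AUC of a model trained on the subset). All names are hypothetical.

```python
def average_probability_ensemble(prob_vectors):
    """Average the class-probability vectors produced by several learners for
    one sample; return (predicted class index, averaged probabilities)."""
    n = len(prob_vectors)
    avg = [sum(p[c] for p in prob_vectors) / n for c in range(len(prob_vectors[0]))]
    return max(range(len(avg)), key=avg.__getitem__), avg


def greedy_forward_selection(n_features, evaluate):
    """Add, one at a time, the feature that most improves evaluate(subset);
    stop when no remaining feature improves the score."""
    selected, best = [], evaluate([])
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: evaluate(selected + [j]) for j in candidates}
        j_best = max(candidates, key=scores.__getitem__)
        if scores[j_best] <= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best
```

With learner outputs `[[0.55, 0.45], [0.55, 0.45], [0.1, 0.9]]`, a majority vote would choose class 0, but the averaged probabilities are `[0.4, 0.6]`, so the ensemble predicts class 1: one confident learner can outweigh two marginal ones.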

13.

Systematic benchmarking of microarray data feature extraction and classification

《International Journal of Computer Mathematics》2012,89(5):803-811

A combination of microarrays with classification methods is a promising approach to supporting clinical management decisions in oncology. The aim of this paper is to systematically benchmark the role of classification models. Each classification model is a combination of one feature extraction method and one classification method. We consider four feature extraction methods and five classification methods, from which 20 classification models can be derived. The feature extraction methods are t-statistics, non-parametric Wilcoxon statistics, ad hoc signal-to-noise statistics, and principal component analysis (PCA), and the classification methods are Fisher linear discriminant analysis (FLDA), the support vector machine (SVM), the k nearest-neighbour classifier (kNN), diagonal linear discriminant analysis (DLDA), and diagonal quadratic discriminant analysis (DQDA). Twenty randomizations of each of three binary cancer classification problems derived from publicly available datasets are examined. PCA plus FLDA is found to be the optimal classification model.
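The benchmark design (4 feature extraction methods × 5 classifiers = 20 models) and one of the named feature-extraction statistics can be sketched as follows. Assumptions: the signal-to-noise statistic is taken in its common form (μ₁ − μ₂)/(σ₁ + σ₂) with sample standard deviations, although the abstract calls its version "ad hoc", so the paper's exact definition may differ; genes are ranked by its absolute value; the Wilcoxon, t-statistic and PCA extractors are not implemented; and all names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev

FEATURE_EXTRACTORS = ["t-statistic", "Wilcoxon", "signal-to-noise", "PCA"]
CLASSIFIERS = ["FLDA", "SVM", "kNN", "DLDA", "DQDA"]
# Every (extractor, classifier) pair is one classification model: 4 x 5 = 20.
MODELS = list(product(FEATURE_EXTRACTORS, CLASSIFIERS))


def signal_to_noise(values, labels):
    """(mean_1 - mean_2) / (sd_1 + sd_2) for one gene; class 1 vs. the rest."""
    a = [v for v, l in zip(values, labels) if l == 1]
    b = [v for v, l in zip(values, labels) if l != 1]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_genes(X, labels, k):
    """Indices of the k genes (columns of X) with the largest |signal-to-noise|."""
    scores = [abs(signal_to_noise([row[j] for row in X], labels)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Under this design, the reported best model corresponds to the pair `("PCA", "FLDA")` in `MODELS`.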

14.

Dynamic Adaboost learning with feature selection based on parallel genetic algorithm for image annotation

Ran Li, Jianjiang Lu, Yafei Zhang, Tianzhong Zhao 《Knowledge》2010,23(3):195-201

Image annotation can be formulated as a classification problem. Recently, Adaboost learning with feature selection has been used to create accurate ensemble classifiers. We propose dynamic Adaboost learning with feature selection based on a parallel genetic algorithm for image annotation in the MPEG-7 standard. In each iteration of Adaboost learning, a genetic algorithm (GA) is used to dynamically generate and optimize a set of feature subsets on which the weak classifiers are constructed, so that an ensemble member is selected. We investigate two GA feature selection methods: a binary-coded chromosome GA that performs optimal feature subset selection, and a bi-coded chromosome GA that performs optimal weighted feature subset selection, i.e. it simultaneously selects the optimal feature subset and the corresponding optimal weight subset. To improve the computational efficiency of our approach, a master-slave GA, a parallel implementation of the GA, is used. A k-nearest neighbor classifier is used as the base classifier. Experiments are performed on 2000 classified Corel images to validate the performance of the approaches.

15.

Exploring the potential role of feature selection in global land-cover mapping

Le Yu, Haohuan Fu, Bo Wu, Nicolas Clinton, Peng Gong 《International journal of remote sensing》2016,37(23):5491-5504

Global land cover has been acknowledged as a fundamental variable in several global-scale studies of environment and climate change. Recent developments in global land-cover mapping have focused on improving spatial resolution with more heterogeneous features that integrate spatial, spectral, and temporal information. Although high-dimensional input features as a whole provide the discriminatory power to produce more accurate land-cover maps, this comes at the cost of increased classification complexity. Feature selection has therefore become a necessity for dimensionality reduction in classification with large numbers of input features. In this study, the potential of feature selection in global land-cover mapping is explored. A total of 63 features derived from Landsat Thematic Mapper (TM) spectral bands, Moderate Resolution Imaging Spectroradiometer (MODIS) time-series enhanced vegetation index (EVI) data, a digital elevation model (DEM), and many climate-ecological variables, together with global training samples, are input to k-nearest neighbours (k-NN) and Random Forest (RF) classifiers. Two filter feature selection algorithms, i.e. ReliefF and max-min-associated (MNA), were employed to select the optimal feature subsets for the whole world and for different biomes. The mapping accuracies with and without feature selection were evaluated with a global validation sample set. Overall, the results indicate no significant accuracy improvement in global land-cover mapping after dimensionality reduction. Nevertheless, feature selection can identify useful features in different biomes and improves computational efficiency, which is valuable in global-scale computing.
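The ReliefF filter named here can be sketched with its usual weight update: for each instance, subtract the range-normalized differences to its k nearest same-class neighbours (hits) and add the differences to its k nearest neighbours from every other class (misses), each other class weighted by its prior. Assumptions: numeric features only, Manhattan distance on range-normalized features, and every instance used once instead of m randomly sampled instances. `relieff` is a hypothetical name.

```python
from collections import Counter


def relieff(X, y, k=1):
    """ReliefF feature weights for numeric features (higher = more relevant)."""
    n, d = len(X), len(X[0])
    # Range-normalize each feature so that diff() lies in [0, 1].
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    def nearest(i, cls):
        # k nearest neighbours of X[i] among the instances of class `cls`.
        idx = [j for j in range(n) if j != i and y[j] == cls]
        return sorted(idx, key=lambda j: distance(X[i], X[j]))[:k]

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        hits = nearest(i, y[i])
        for f in range(d):
            w[f] -= sum(diff(f, X[i], X[j]) for j in hits) / (n * k)
        for c in prior:
            if c == y[i]:
                continue
            # Misses from each other class, weighted by that class's prior.
            scale = prior[c] / (1.0 - prior[y[i]])
            misses = nearest(i, c)
            for f in range(d):
                w[f] += scale * sum(diff(f, X[i], X[j]) for j in misses) / (n * k)
    return w
```

A feature that separates the classes accumulates large miss differences and small hit differences, so its weight is positive; a noise feature tends toward zero or negative weight, and a filter keeps the top-weighted features.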

16.

Modified nearest neighbour classifier for hyperspectral data classification

Mahesh Pal 《International journal of remote sensing》2013,34(24):9207-9217

A modified k-nearest neighbour (k-NN) classifier is proposed for supervised remote sensing classification of hyperspectral data. Its classification accuracy and computational cost were compared with those of k-NN and a back-propagation neural network classifier. The proposed classifier achieved a classification accuracy of 91.2% on the data set used. Results from this study suggest that the accuracy achieved with this classifier is significantly better than that of k-NN and comparable to that of a back-propagation neural network. A comparison of computational cost also suggests that the modified k-NN classifier is effective for hyperspectral data classification. A fuzzy entropy-based filter approach was used for feature selection to compare the performance of the modified and standard k-NN classifiers on a reduced data set. The results suggest that, with the selected features, the modified k-NN classifier achieves a significant increase in classification accuracy over the standard k-NN classifier.

17.

Modified linear discriminant analysis approaches for classification of high-dimensional microarray data

Ping Xu 《Computational statistics & data analysis》2009,53(5):1674-1687

Linear discriminant analysis (LDA) is one of the most popular classification methods. For high-dimensional microarray data classification, because of the small number of samples and the large number of features, classical LDA performs sub-optimally owing to the singularity and instability of the within-group covariance matrix. Two modified LDA approaches (MLDA and NLDA) were applied to microarray classification, and their performance was compared with that of other popular classification algorithms across a range of feature set sizes (numbers of genes) using both simulated and real datasets. The results showed that the overall performance of the two modified LDA approaches was competitive with support vector machines and other regularized LDA approaches, and better than diagonal linear discriminant analysis, k-nearest neighbor, and classical LDA. It was concluded that the modified LDA approaches can be used as an effective classification tool in high-dimensional microarray classification problems with limited sample sizes.
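Diagonal LDA, one of the baselines in this comparison, shows the simplest way around the singular within-group covariance matrix: keep only its diagonal, so that each feature is scaled by its own pooled within-class variance. The sketch below assumes equal class priors and is not MLDA or NLDA, which the abstract does not define; `DiagonalLDA` is a hypothetical name.

```python
from statistics import mean


class DiagonalLDA:
    """Diagonal LDA: class means plus a pooled, diagonal-only covariance.
    Assumes equal class priors."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        d = len(X[0])
        groups = {c: [x for x, t in zip(X, y) if t == c] for c in self.classes}
        self.means = {c: [mean(x[j] for x in g) for j in range(d)] for c, g in groups.items()}
        # Pooled within-class variance of each feature (n - K degrees of freedom);
        # a tiny floor avoids division by zero for constant features.
        dof = len(X) - len(self.classes)
        self.var = [
            max(sum((x[j] - self.means[c][j]) ** 2 for c, g in groups.items() for x in g) / dof, 1e-12)
            for j in range(d)
        ]
        return self

    def predict_one(self, x):
        def score(c):
            # Variance-scaled squared distance to the class mean.
            return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, self.means[c], self.var))
        return min(self.classes, key=score)
```

With 5 features and only 4 training samples, for example, the full pooled covariance matrix has rank at most 2 and cannot be inverted, yet the diagonal version above still fits and predicts, at the price of ignoring correlations between genes.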

18.

Knowledge discovery using neural approach for SME’s credit risk analysis problem in Turkey

Gülnur Derelioğlu, Fikret Gürgen 《Expert systems with applications》2011,38(8):9313-9318

This study proposes a knowledge discovery method that uses a multilayer perceptron (MLP) based neural rule extraction (NRE) approach for credit risk analysis (CRA) of real-life small and medium enterprises (SMEs) in Turkey. A feature selection and extraction stage is followed by neural classification that produces accurate rule sets. In the first stage, feature selection is performed with decision tree (DT) and recursive feature elimination with support vector machines (RFE-SVM) methods, and feature extraction is performed with factor analysis (FA) and principal component analysis (PCA). The RFE-SVM approach gave the best results in terms of classification accuracy and minimal input dimension. The k-NN, MLP and SVM classifiers are compared in classification experiments. Then, the Continuous/Discrete Rule Extractor via Decision Tree Induction (CRED) algorithm is used to extract rules from the hidden units of an MLP for knowledge discovery. Here, the MLP classifies customers as “good” or “bad” and reveals the rules behind the final decision. The Turkish SME database used in the experiments has 512 samples. The results support the claim that the proposed approach is a viable alternative to other methods for knowledge discovery.

19.

Feature subset selection Filter–Wrapper based on low quality data

José M. Cadenas, M. Carmen Garrido, Raquel Martínez 《Expert systems with applications》2013,40(16):6241-6252

Today, feature selection is an active research area in machine learning. The main idea of feature selection is to choose a subset of the available features by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated. There are many approaches to feature selection, but most of them can only work with crisp data, and until now few approaches could work directly with both crisp and low quality (imprecise and uncertain) data. We therefore propose a new feature selection method that can handle both crisp and low quality data. The proposed approach is based on a Fuzzy Random Forest and integrates filter and wrapper methods into a sequential search procedure that improves the classification accuracy of the selected features. The approach consists of the following main steps: (1) scaling and discretization of the feature set, and feature pre-selection using the discretization process (filter); (2) ranking of the pre-selected features using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of this approach are demonstrated through several experiments using both high-dimensional and low quality datasets. The approach shows good performance (not only in classification accuracy but also in the number of features selected) and behaves well with both high-dimensional datasets (microarray datasets) and low quality datasets.

20.

Toward feature selection in big data preprocessing based on hybrid cloud-based model

Shehab Noha Badawy Mahmoud Ali H Arafat 《The Journal of supercomputing》2022,78(3):3226-3265

Big data have recently attracted wide attention in many fields, such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial to converting data into more specific information that is fed to decision-making systems. With diverse and complex types of datasets, knowledge discovery becomes more difficult. One solution is feature subset selection preprocessing, which reduces this complexity so that computation and analysis become convenient. Preprocessing produces a reliable and suitable source for any data-mining algorithm. Effective feature selection can improve a model's performance and help us understand the characteristics and underlying structure of complex data. This study introduces a novel hybrid cloud-based feature selection model for imbalanced data based on the k nearest neighbor algorithm. The proposed model combines the firefly distance metric with the Euclidean distance used in the k nearest neighbor algorithm, and it performed well compared with the simple weighted nearest neighbor. The experimental results showed good insights into both time usage and feature weights compared with the weighted nearest neighbor, and the classification accuracy improved by 12% compared with the weighted nearest neighbor algorithm. Using the cloud-distributed model also reduced the processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.
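The weighted nearest neighbour baseline mentioned here can be sketched as k-NN with per-feature weights inside the Euclidean distance, where the weights would come from a feature-selection step (a weight of 0 drops a feature entirely). This is only an illustration under that assumption: the abstract does not define the firefly distance metric or how it is combined with the Euclidean distance, so neither is shown, and `weighted_distance` and `knn_predict` are hypothetical names.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, w):
    """Euclidean distance with a non-negative weight per feature."""
    return sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))


def knn_predict(x, X, y, w, k=3):
    """Majority vote among the k training samples nearest to x under the weighted distance."""
    order = sorted(range(len(X)), key=lambda i: weighted_distance(x, X[i], w))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with training points `[[0.0, 0.0], [0.1, 0.0], [1.0, 10.0], [1.1, 10.0]]` labelled `a, a, b, b`, the query `[0.2, 9.0]` is classified as `b` with weights `[1.0, 1.0]` but as `a` with weights `[1.0, 0.0]`, because zeroing the second weight removes that feature from the distance.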
