Similar literature
 Found 20 similar documents; search took 531 ms
1.
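Text classification suffers from the curse of dimensionality, dataset noise, and feature terms that contribute unequally to classification, all of which reduce classification accuracy. To improve accuracy, a new data-processing method is proposed. The method first denoises the dataset, then reduces dimensionality by combining a feature extraction algorithm with semantic analysis, and then uses the semantic relatedness between words to assign a different weight to each feature term in the text feature vector. A classifier is then trained on the text data processed in this way. Experimental results show that this text-processing method effectively improves text classification accuracy.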

2.
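A multi-feature vague fusion model for image classification*   Total citations: 1, self-citations: 0, citations by others: 1
To address the huge feature space, and even the curse of dimensionality, produced by traditional feature fusion in image classification, an image classification model based on vague fusion is proposed. By giving evidence both for and against, the truth- and false-membership functions of vague sets are used to perform decision fusion on the outputs of multiple feature classifiers. These outputs are thereby optimized and integrated, yielding more accurate and more stable classification decisions. Experimental results show that the decision fusion model is feasible and that image classification accuracy improves markedly.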

3.
Since most cancer treatments come with a certain degree of toxicity, it is essential to identify a cancer type correctly and then administer the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is slowly changing from morphological properties to molecular signatures. Several recent studies have demonstrated a marked improvement in the prediction accuracy of tumor types based on gene expression microarray measurements over clinical markers. The main challenge in working with gene expression microarrays is the huge number of genes involved, of which only a small fraction are actually relevant for differentiating between types of cancer. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model is able to classify different cancer types accurately and simultaneously identify the relevant or important genes. The proposed model is completely automatic in the sense that it adaptively picks the neighborhood size and the important covariates. The method is successfully applied to three simulated data sets and four well-known real data sets. To demonstrate the competitiveness of the method, a comparative study is also done with several other popular “off the shelf” classification methods. For all the simulated and real data sets, the proposed method produced highly competitive, if not better, results. While the standard approach is two-step model building, with gene selection followed by tumor prediction, this novel adaptive gene selection technique automatically selects the relevant genes along with tumor class prediction in one go. The biological relevance of the selected genes is also discussed to validate the claim.

4.
5.
6.
Hyperspectral imaging is gaining a significant role in agricultural remote sensing applications. Its data unit is the hyperspectral cube, which holds spatial information in two dimensions and the spectral band information of each pixel in the third. The classification accuracy of hyperspectral images (HSI) increases significantly when both spatial and spectral features are employed. For this work, the data was acquired using an airborne hyperspectral imager system, which collected HSI in the visible and near-infrared (VNIR) range of 400 to 1000 nm in 180 spectral bands. The dataset was collected for nine different crops on agricultural land with a spectral resolution of 3.3 nm for each pixel. The data was cleaned of geometric distortions and stored with the class labels and the global localization annotations obtained using the inertial navigation system. In this study, a unique pixel-based approach was designed to improve crop classification accuracy by using edge-preserving features (EPF) and principal component analysis (PCA) in conjunction. The preliminary processing generated the high-dimensional EPF stack by applying edge-preserving filters to the acquired HSI. In the second step, PCA was applied to this high-dimensional stack for dimensionality reduction without losing significant spectral information. The resultant feature space (PCA-EPF) demonstrated enhanced class separability for improved crop classification with reduced dimensionality and computational cost. A support vector machine classifier was employed for multiclass classification of target crops using PCA-EPF. Classification performance was measured in terms of individual class accuracy, overall accuracy, average accuracy, and Cohen's kappa coefficient. The proposed scheme achieved greater than 90% on all the performance evaluation metrics.
PCA-EPF proved to be an effective attribute for crop classification using hyperspectral imaging in the VNIR range. The proposed scheme is well suited for practical applications of crop and landfill estimation using agricultural remote sensing methods.
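The four evaluation measures named in this abstract (individual class accuracy, overall accuracy, average accuracy, and Cohen's kappa) can all be read off a confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function name and the example matrix are hypothetical, and it assumes every class has at least one sample.

```python
def classification_metrics(confusion):
    """Compute standard accuracy measures from a confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Assumes every class has at least one
    sample and that chance agreement is below 1 (otherwise a division
    by zero occurs).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    row_sums = [sum(row) for row in confusion]  # true-class counts
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]      # predicted-class counts

    # Individual class accuracy: fraction of each true class labeled correctly.
    per_class = [confusion[i][i] / row_sums[i] for i in range(n_classes)]
    # Overall accuracy: fraction of all samples labeled correctly.
    overall = correct / total
    # Average accuracy: unweighted mean of the individual class accuracies.
    average = sum(per_class) / n_classes
    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (overall - expected) / (1 - expected)
    return per_class, overall, average, kappa


# Hypothetical two-class example, not data from the paper.
per_class, overall, average, kappa = classification_metrics([[8, 2], [1, 9]])
```

Overall accuracy weights classes by their size, while average accuracy treats every class equally, which is why results on imbalanced data are usually reported with both.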

7.
Genetic algorithms (GAs) have been used as conventional methods for classifiers to adaptively evolve solutions for classification problems. Feature selection plays an important role in finding relevant features in classification. In this paper, feature selection is explored with modular GA-based classification. A new feature selection technique, relative importance factor (RIF), is proposed to find less relevant features in the input domain of each class module. Removing these features is intended to reduce both the classification error and the dimensionality of classification problems. Benchmark classification data sets are used to evaluate the proposed approach. The experimental results show that RIF can be used to find less relevant features and helps achieve lower classification error with a reduced feature space dimension.

8.
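Ding Jian (丁剑), Wang Shuying (王树英). Computer Science (《计算机科学》), 2016, 43(5): 257-260, 293
Based on the characteristics of time series data, such as high dimensionality, ordered real values, and autocorrelation between data points, the time series classification process is studied. Currently popular time series classification methods are reviewed, and, from an image-processing perspective, the ITTS method is proposed for converting image information into time series data. Shapelets are the subsequences that best represent a time series, and such a characteristic subsequence may change dynamically over time. Based on this idea, IPST, an incremental time series classification algorithm based on dynamic shapelet discovery, is proposed. The algorithm can dynamically discover the current best k shapelets, thereby improving time series classification accuracy. The resulting shapelet set can also be combined with several traditional classifiers to obtain better classification results.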

9.
Facial expression recognition generally requires that faces be described in terms of a set of measurable features. The selection and quality of the features representing each face have a considerable bearing on the success of subsequent facial expression classification. Feature selection is the process of choosing a subset of features in order to increase classifier efficiency and allow higher classification accuracy. Many current dimensionality reduction techniques used for facial expression recognition involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. In this paper, we present a methodology for feature selection that uses the nondominated sorting genetic algorithm II (NSGA-II), one of the latest genetic algorithms developed for solving multiobjective problems with high accuracy. In the proposed feature selection process, NSGA-II optimizes a vector of feature weights, which increases discrimination by means of class separation. The proposed methodology is evaluated using the 3D facial expression database BU-3DFE. Classification results validate the effectiveness and flexibility of the proposed approach when compared with results reported in the literature using the same experimental settings.

10.
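To address the high dimensionality of the morphological features that software extracts from raw pathology images, and the small sample sizes typical of the medical field, the ReliefF-HEPSO feature selection algorithm for head and neck cancer pathology images is proposed. The algorithm builds a multi-level dimensionality reduction framework. First, based on the correlation between features and classes, the ReliefF algorithm determines the feature weights and achieves a preliminary dimensionality reduction. Second, an evolutionary neural strategy (ENS) is used to enrich the population diversity of binary particle swarm optimization (BPSO), giving a hybrid binary evolutionary particle swarm optimization algorithm (HEPSO) that automatically searches the candidate feature subsets for the best one. Experimental comparison with seven feature selection algorithms shows that the algorithm selects highly relevant morphological features of pathology images more effectively, reduces dimensionality quickly, and achieves higher classification performance with fewer features.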

11.
Classification of intrusion attacks and normal network traffic is a challenging and critical problem in pattern recognition and network security. In this paper, we present a novel intrusion detection approach to extract both accurate and interpretable fuzzy IF-THEN rules from network traffic data for classification. The proposed fuzzy rule-based system is evolved from an agent-based evolutionary framework and multi-objective optimization. In addition, the proposed system can also act as a genetic feature selection wrapper to search for an optimal feature subset for dimensionality reduction. To evaluate the classification and feature selection performance of our approach, it is compared with some well-known classifiers as well as feature selection filters and wrappers. The extensive experimental results on the KDD-Cup99 intrusion detection benchmark data set demonstrate that the proposed approach produces interpretable fuzzy systems and outperforms other classifiers and wrappers by providing the highest detection accuracy for intrusion attacks and a low false alarm rate for normal network traffic with a minimized number of features.

12.
13.
Goyal  Neha  Kumar  Nitin  Kapil 《Multimedia Tools and Applications》2022,81(22):32243-32264

Automated plant recognition based on leaf images is a challenging task among the researchers from several fields. This task requires distinguishing features derived from leaf images for assigning class label to a leaf image. There are several methods in literature for extracting such distinguishing features. In this paper, we propose a novel automated framework for leaf identification. The proposed framework works in multiple phases i.e. pre-processing, feature extraction, classification using bagging approach. Initially, leaf images are pre-processed using image processing operations such as boundary extraction and cropping. In the feature extraction phase, popular nature inspired optimization algorithms viz. Spider Monkey Optimization (SMO), Particle Swarm Optimization (PSO) and Gray Wolf Optimization (GWO) have been exploited for reducing the dimensionality of features. In the last phase, a leaf image is classified by multiple classifiers and then output of these classifiers is combined using majority voting. The effectiveness of the proposed framework is established based on the experimental results obtained on three datasets i.e. Flavia, Swedish and self-collected leaf images. On all the datasets, it has been observed that the classification accuracy of the proposed method is better than the individual classifiers. Furthermore, the classification accuracy for the proposed approach is comparable to deep learning based method on the Flavia dataset.
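The last phase described above, combining the outputs of multiple classifiers by majority voting, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function names and the species labels are hypothetical, and the tie-breaking rule (the first label encountered wins) is an assumption, since the abstract does not specify one.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most classifiers.

    Assumption: ties are broken in favor of the label that appears
    first in `labels` (the insertion-order behavior of Counter).
    """
    return Counter(labels).most_common(1)[0][0]


def vote_per_sample(predictions_by_classifier):
    """Combine per-classifier prediction lists into one label per sample.

    predictions_by_classifier[c][s] is the label that classifier c
    assigned to sample s.
    """
    return [majority_vote(votes) for votes in zip(*predictions_by_classifier)]


# Hypothetical outputs of three classifiers on three leaf images.
combined = vote_per_sample([
    ["oak", "maple", "birch"],
    ["oak", "birch", "birch"],
    ["maple", "maple", "birch"],
])
```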


14.
In classification problems, a large number of features are typically used to describe the problem’s instances. However, not all of these features are useful for classification. Feature selection is usually an important pre-processing step to overcome the “curse of dimensionality”. Feature selection aims to choose a small number of features that achieve similar or better classification performance than using all features. This paper presents a particle swarm optimization (PSO)-based multi-objective feature selection approach to evolving a set of non-dominated feature subsets that achieve high classification performance. The proposed algorithm uses local search techniques to improve a Pareto front and is compared with a pure multi-objective PSO algorithm, three well-known evolutionary multi-objective algorithms, and a current state-of-the-art PSO-based multi-objective feature selection approach. Their performance is examined on 12 benchmark datasets. The experimental results show that in most cases the proposed multi-objective algorithm generates better Pareto fronts than all other methods.
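The notion of a set of non-dominated feature subsets can be made concrete with a small sketch. Each candidate subset is summarized here by two objectives to minimize, the number of features and the classification error; this pairing is an assumption for illustration, as are the function names and the candidate values. The code only filters a given list of candidates and does not implement the PSO search itself.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (number of features, classification error).
candidates = [(10, 0.20), (5, 0.25), (5, 0.30), (12, 0.20), (3, 0.40)]
front = non_dominated(candidates)
```

Here (5, 0.30) is dropped because (5, 0.25) has the same size with lower error, and (12, 0.20) is dropped because (10, 0.20) reaches the same error with fewer features.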

15.
Over the last few years, the dimensionality of datasets involved in data mining applications has increased dramatically. In this situation, feature selection becomes indispensable as it allows for dimensionality reduction and relevance detection. The research proposed in this paper broadens the scope of feature selection by taking into consideration not only the relevance of the features but also their associated costs. A new general framework is proposed, which consists of adding a new term to the evaluation function of a filter feature selection method so that the cost is taken into account. Although the proposed methodology could be applied to any feature selection filter, in this paper the approach is applied to two representative filter methods: Correlation-based Feature Selection (CFS) and Minimal-Redundancy-Maximal-Relevance (mRMR), as an example of use. The behavior of the proposed framework is tested on 17 heterogeneous classification datasets, employing a Support Vector Machine (SVM) as a classifier. The results of the experimental study show that the approach is sound and that it allows the user to reduce the cost without compromising the classification error.
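The idea of adding a cost term to a filter's evaluation function can be illustrated with a deliberately simplified sketch. The abstract does not give the form of the added term, so the linear penalty below, the weight `lam`, the function name, and the example values are all assumptions. CFS and mRMR also account for redundancy between features when scoring, which this per-feature ranking omits.

```python
def cost_aware_ranking(relevance, cost, lam):
    """Rank features by relevance minus a weighted cost penalty.

    relevance and cost map feature names to numbers. lam (hypothetical)
    controls the trade-off: lam = 0 reproduces the plain relevance
    ranking, and larger values favor cheaper features.
    """
    scores = {name: relevance[name] - lam * cost[name] for name in relevance}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical relevance and cost values for three features.
relevance = {"a": 0.9, "b": 0.8, "c": 0.3}
cost = {"a": 1.0, "b": 0.1, "c": 0.0}

plain = cost_aware_ranking(relevance, cost, lam=0.0)
penalized = cost_aware_ranking(relevance, cost, lam=0.5)
```

With the penalty switched on, the expensive feature "a" falls behind the nearly as relevant but much cheaper feature "b".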

16.
Computer-aided detection/diagnosis (CAD) systems can enhance the diagnostic capabilities of physicians and reduce the time required for accurate diagnosis. The objective of this paper is to review recently published segmentation and classification techniques, and the state of the art, for human brain magnetic resonance images (MRI). The review reveals that CAD systems for human brain MRI remain an open problem. In light of this review, we propose a hybrid intelligent machine learning technique for a computer-aided detection system that automatically detects brain tumors in magnetic resonance images. The proposed technique is based on the following computational methods: the feedback pulse-coupled neural network for image segmentation, the discrete wavelet transform for feature extraction, principal component analysis for reducing the dimensionality of the wavelet coefficients, and the feed-forward back-propagation neural network to classify inputs as normal or abnormal. The experiments were carried out on 101 images, consisting of 14 normal and 87 abnormal (malignant and benign tumors), from a real human brain MRI dataset. The classification accuracy on both training and test images is 99%. Moreover, the proposed technique demonstrates its effectiveness compared with other recently published machine learning techniques. The results revealed that the proposed hybrid approach is accurate, fast, and robust. Finally, possible future directions are suggested.

17.
Quality data mining analysis based on microarray gene expression data is a good approach for disease classification and other fields, such as pharmacology, as well as a useful tool for medical innovation. One of the challenges in classification is that microarrays involve high dimensionality and a large number of redundant and irrelevant features. Feature selection is the most popular method for determining the optimal number of features that will be used for classification. Feature selection is important to accelerate learning, which is represented only by the optimal feature subset. The current approach to microarray feature selection with the filter method is to simply select the top-ranked genes, i.e., to keep the 50 or 100 best-ranked genes. However, this approach is determined by human intuition; it requires trial and error and is thus time-consuming. Accordingly, this study proposes a metaheuristic approach for selecting the top n relevant genes in drug microarray data to enhance the minimum redundancy–maximum relevance (mRMR) filter method. Three metaheuristics are applied, namely, particle swarm optimization (PSO), cuckoo search (CS), and artificial bee colony (ABC). Subsequently, k-nearest neighbor and support vector machine are used as classifiers to evaluate classification performance. The experiment used a microarray gene dataset of liver xenobiotic and pharmacological responses. Experimental results show that the metaheuristics are more efficient approaches that reduce the complexity of the classifier. Furthermore, the results show that mRMR-CS exhibits the best performance compared with mRMR-PSO and mRMR-ABC.

18.
Gait recognition using multi-bipolarized contour vector   Total citations: 2, self-citations: 0, citations by others: 2
Gait recognition has recently attracted increasing interest from the biometric community. In this paper, we propose a simple yet powerful new feature called multi-bipolarized contour vector (MBCV) for gait recognition. The proposed MBCV feature consists of four components: (1) the Vertical Positive Contour Vector, (2) the Vertical Negative Contour Vector, (3) the Horizontal Positive Contour Vector, and (4) the Horizontal Negative Contour Vector. We furthermore develop a gait recognition system based on the proposed MBCV feature. The system consists of three steps: image preprocessing including background subtraction and silhouette normalization, extraction of the MBCV feature, and classification. To reduce the dimensionality of MBCV, we use principal component analysis (PCA). To solve the classification problem, we use the Euclidean distance and a nearest neighbor (NN) approach. Finally, we fuse the proposed gait features at all levels to improve recognition performance. The proposed recognition system is applied to the well-known NLPR gait database and its effectiveness is demonstrated via comparison with previous works.
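The classification step named in this abstract, a nearest neighbor rule under the Euclidean distance, can be sketched as follows. This is a generic illustration, not the authors' system: it assumes the feature vectors have already been extracted and reduced (for example by PCA), and the function name, the gallery layout, and the subject labels are hypothetical.

```python
import math


def nearest_neighbor(query, gallery):
    """Return the label of the gallery vector closest to `query`.

    gallery is a non-empty list of (feature_vector, label) pairs whose
    vectors have the same length as `query`. Distance is Euclidean
    (math.dist, available from Python 3.8).
    """
    _, label = min(gallery, key=lambda item: math.dist(query, item[0]))
    return label


# Hypothetical low-dimensional features for two enrolled subjects.
gallery = [((0.0, 0.0), "subject_A"), ((1.0, 1.0), "subject_B")]
predicted = nearest_neighbor((0.9, 0.8), gallery)
```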

19.
Selecting highly discriminative genes from gene expression data has become an important research topic. Not only can this improve the performance of cancer classification, but it can also cut the cost of medical diagnoses when a large number of noisy, redundant genes are filtered out. In this paper, a hybrid Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) method is used for gene selection, and a Support Vector Machine (SVM) is adopted as the classifier. The proposed approach is tested on three benchmark gene expression datasets: leukemia, colon, and breast cancer data. Experimental results show that the proposed method can reduce the dimensionality of the dataset, identify the most informative gene subset, and improve classification accuracy.

20.
This paper combines a powerful algorithm, called Dongguang Li (DGL) global optimization, with the methods of cancer diagnosis through gene selection and microarray analysis. A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is proposed and applied to two test cancer cases, colon and leukemia. The study attempts to analyze multiple sets of genes simultaneously, for an overall global solution to the genes' joint discriminative ability in assigning tumors to known classes. With the workable concepts and methodologies described here, an accurate classification of the type and seriousness of cancer can be made. Using orthogonal arrays for sampling and a search space reduction process, a computer program has been written that can operate on a personal laptop computer. Both the colon cancer and the leukemia microarray data can be classified 100% correctly without previous knowledge of their classes. The classification processes are automated once the gene expression data has been input. Instead of examining a single gene at a time, the DGL method can find the global optimum solutions and construct a multi-subset pyramidal hierarchy class predictor containing up to 23 gene subsets from a given microarray gene expression data collection within a period of several hours. An automatically derived class predictor makes reliable cancer classification and accurate tumor diagnosis in clinical practice possible.
