首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Software defects, produced inevitably in software projects, seriously affect the efficiency of software testing and maintenance. An appealing solution is the software defect prediction (SDP) that has achieved good performance in many software projects. However, the difference between features and the difference of the same feature between training data and test data may degrade defect prediction performance if such differences violate the model's assumption. To address this issue, we propose a SDP method based on feature transfer learning (FTL), which performs a transformation sequence for each feature in order to map the original features to another feature space. Specifically, FTL first uses the reinforcement learning scheme that automatically learns a strategy for transferring the potential feature knowledge from the training data. Then, we use the learned feature knowledge to inspire the transformation of the test data. The classifier is trained by the transformed training data and predicts defects for transformed test data. We evaluate the validity of FTL on 43 projects from PROMISE and NASA MDP using three classifiers, logistic regression, random forest, and Naive Bayes (NB). Experimental results indicate that FTL is better than the original classifiers and has the best performance on the NB classifier. For PROMISE, after using FTL, the average results of F1-score, AUC, MCC are 0.601, 0.757, and 0.350 respectively, which are 24.9%, 2.6%, and 16.7% higher than the original NB classifier results. The number of projects with improved performance accounts for 83.87%, 83.87%, and 64.52%. Similarly, FTL performs well on NASA MDP. Besides, compared with four feature engineering (FE) methods, FTL achieves an excellent improvement on most projects and the average performance is also better than or close to the FE methods.  相似文献   

2.
3.
樊康新 《计算机工程》2009,35(24):191-193
针对朴素贝叶斯(NB)分类器在分类过程中存在诸如分类模型对样本具有敏感性、分类精度难以提高等缺陷,提出一种基于多种特征选择方法的NB组合文本分类器方法。依据Boosting分类算法,采用多种不同的特征选择方法建立文本的特征词集,训练NB分类器作为Boosting迭代过程的基分类器,通过对基分类器的加权投票生成最终的NB组合文本分类器。实验结果表明,该组合分类器较单NB文本分类器具有更好的分类性能。  相似文献   

4.
A Novel Bayes Model: Hidden Naive Bayes   总被引:1,自引:0,他引:1  
Because learning an optimal Bayesian network classifier is an NP-hard problem, learning-improved naive Bayes has attracted much attention from researchers. In this paper, we summarize the existing improved algorithms and propose a novel Bayes model: hidden naive Bayes (HNB). In HNB, a hidden parent is created for each attribute which combines the influences from all other attributes. We experimentally test HNB in terms of classification accuracy, using the 36 UCI data sets selected by Weka, and compare it to naive Bayes (NB), selective Bayesian classifiers (SBC), naive Bayes tree (NBTree), tree-augmented naive Bayes (TAN), and averaged one-dependence estimators (AODE). The experimental results show that HNB significantly outperforms NB, SBC, NBTree, TAN, and AODE. In many data mining applications, an accurate class probability estimation and ranking are also desirable. We study the class probability estimation and ranking performance, measured by conditional log likelihood (CLL) and the area under the ROC curve (AUC), respectively, of naive Bayes and its improved models, such as SBC, NBTree, TAN, and AODE, and then compare HNB to them in terms of CLL and AUC. Our experiments show that HNB also significantly outperforms all of them.  相似文献   

5.
Bayesian Network Classifiers   总被引:154,自引:0,他引:154  
Friedman  Nir  Geiger  Dan  Goldszmidt  Moises 《Machine Learning》1997,29(2-3):131-163
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.  相似文献   

6.
朴素Bayes分类器是一种简单有效的机器学习工具.本文用朴素Bayes分类器的原理推导出"朴素Bayes组合"公式,并构造相应的分类器.经过测试,该分类器有较好的分类性能和实用性,克服了朴素Bayes分类器精确度差的缺点,并且比其他分类器更加快速而不会显著丧失精确度.  相似文献   

7.
基于特征加权的朴素贝叶斯分类器   总被引:13,自引:0,他引:13  
程克非  张聪 《计算机仿真》2006,23(10):92-94,150
朴素贝叶斯分类器是一种广泛使用的分类算法,其计算效率和分类效果均十分理想。但是,由于其基础假设“朴素贝叶斯假设”与现实存在一定的差异,因此在某些数据上可能导致较差的分类结果。现在存在多种方法试图通过放松朴素贝叶斯假设来增强贝叶斯分类器的分类效果,但是通常会导致计算代价大幅提高。该文利用特征加权技术来增强朴素贝叶斯分类器。特征加权参数直接从数据导出,可以看作是计算某个类别的后验概率时,某个属性对于该计算的影响程度。数值实验表明,特征加权朴素贝叶斯分类器(FWNB)的效果与其他的一些常用分类算法,例如树扩展朴素贝叶斯(TAN)和朴素贝叶斯树(NBTree)等的分类效果相当,其平均错误率都在17%左右;在计算速度上,FWNB接近于NB,比TAN和NBTree快至少一个数量级。  相似文献   

8.
基于多重判别分析的朴素贝叶斯分类器   总被引:4,自引:1,他引:4  
通过分析朴素贝叶斯分类器的分类原理,并结合多重判别分析的优点,提出了一种基于多重判别分析的朴素贝叶斯分类器DANB(Discriminant Analysis Naive Bayesian classifier).将该分类方法与朴素贝叶斯分类器(Naive Bayesian classifier, NB)和TAN分类器(Tree Augmented Naive Bayesian classifier)进行实验比较,实验结果表明在大多数数据集上,DANB分类器具有较高的分类正确率.  相似文献   

9.
In this paper, an Automated Brain Image Analysis (ABIA) system that classifies the Magnetic Resonance Imaging (MRI) of human brain is presented. The classification of MRI images into normal or low grade or high grade plays a vital role for the early diagnosis. The Non-Subsampled Shearlet Transform (NSST) that captures more visual information than conventional wavelet transforms is employed for feature extraction. As the feature space of NSST is very high, a statistical t-test is applied to select the dominant directional sub-bands at each level of NSST decomposition based on sub-band energies. A combination of features that includes Gray Level Co-occurrence Matrix (GLCM) based features, Histograms of Positive Shearlet Coefficients (HPSC), and Histograms of Negative Shearlet Coefficients (HNSC) are estimated. The combined feature set is utilized in the classification phase where a hybrid approach is designed with three classifiers; k-Nearest Neighbor (kNN), Naive Bayes (NB) and Support Vector Machine (SVM) classifiers. The output of individual trained classifiers for a testing input is hybridized to take a final decision. The quantitative results of ABIA system on Repository of Molecular Brain Neoplasia Data (REMBRANDT) database show the overall improved performance in comparison with a single classifier model with accuracy of 99% for normal/abnormal classification and 98% for low and high risk classification.  相似文献   

10.
Due to being fast, easy to implement and relatively effective, some state-of-the-art naive Bayes text classifiers with the strong assumption of conditional independence among attributes, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, have received a great deal of attention from researchers in the domain of text classification. In this article, we revisit these naive Bayes text classifiers and empirically compare their classification performance on a large number of widely used text classification benchmark datasets. Then, we propose a locally weighted learning approach to these naive Bayes text classifiers. We call our new approach locally weighted naive Bayes text classifiers (LWNBTC). LWNBTC weakens the attribute conditional independence assumption made by these naive Bayes text classifiers by applying the locally weighted learning approach. The experimental results show that our locally weighted versions significantly outperform these state-of-the-art naive Bayes text classifiers in terms of classification accuracy.  相似文献   

11.
Bayesian networks are important knowledge representation tools for handling uncertain pieces of information. The success of these models is strongly related to their capacity to represent and handle dependence relations. Some forms of Bayesian networks have been successfully applied in many classification tasks. In particular, naive Bayes classifiers have been used for intrusion detection and alerts correlation. This paper analyses the advantage of adding expert knowledge to probabilistic classifiers in the context of intrusion detection and alerts correlation. As examples of probabilistic classifiers, we will consider the well-known Naive Bayes, Tree Augmented Naïve Bayes (TAN), Hidden Naive Bayes (HNB) and decision tree classifiers. Our approach can be applied for any classifier where the outcome is a probability distribution over a set of classes (or decisions). In particular, we study how additional expert knowledge such as “it is expected that 80 % of traffic will be normal” can be integrated in classification tasks. Our aim is to revise probabilistic classifiers’ outputs in order to fit expert knowledge. Experimental results show that our approach improves existing results on different benchmarks from intrusion detection and alert correlation areas.  相似文献   

12.
一种限定性的双层贝叶斯分类模型   总被引:28,自引:1,他引:28  
朴素贝叶斯分类模型是一种简单而有效的分类方法,但它的属性独立性假设使其无法表达属性变量间存在的依赖关系,影响了它的分类性能.通过分析贝叶斯分类模型的分类原则以及贝叶斯定理的变异形式,提出了一种基于贝叶斯定理的新的分类模型DLBAN(double-level Bayesian network augmented naive Bayes).该模型通过选择关键属性建立属性之间的依赖关系.将该分类方法与朴素贝叶斯分类器和TAN(tree augmented naive Bayes)分类器进行实验比较.实验结果表明,在大多数数据集上,DLBAN分类方法具有较高的分类正确率.  相似文献   

13.
Reports of traffic accidents show that a considerable percentage of the accidents are caused by human factors. Human-centric driver assistance systems, with integrated sensing, processing and networking, aim to find solutions to this problem and other relevant issues. The key technology in such systems is the capability to automatically understand and characterize driver behaviors. In this paper, we propose a novel, efficient feature extraction approach for driving postures from a video camera, which consists of Homomorphic filter, skin-like regions segmentation, canny edge detection, connected regions detection, small connected regions deletion and spatial scale ratio calculation. With features extracted from a driving posture dataset we created at Southeast University (SEU), holdout and cross-validation experiments on driving posture classification are then conducted using Bayes classifier. Compared with a number of commonly used classification methods including naive Bayes classifier, subspace classifier, linear perception classifier and Parzen classifier, the holdout and cross-validation experiments show that the Bayes classifier offers better classification performance than the other four classifiers. Among the four predefined classes, i.e., grasping the steering wheel, operating the shift gear, eating a cake and talking on a cellular phone, the class of talking on a cellular phone is the most difficult to classify. With Bayes classifier, the classification accuracies of talking on a cellular phone are over 90 % in holdout and cross-validation experiments, which shows the effectiveness of the proposed feature extraction method and the importance of Bayes classifier in automatically understanding and characterizing driver behaviors towards human-centric driver assistance systems.  相似文献   

14.
朴素贝叶斯分类器是一种简单而高效的分类器,但是其属性独立性假设限制了对实际数据的应用。提出一种新的算法,该算法为避免数据预处理时,训练集的噪声及数据规模使属性约简的效果不太理想,并进而影响分类效果,在训练集上通过随机属性选取生成若干属性子集,并以这些子集构建相应的贝叶斯分类器,进而采用遗传算法进行优选。实验表明,与传统的朴素贝叶斯方法相比,该方法具有更好的分类精度。  相似文献   

15.
NB方法条件独立性假设和BAN方法小训练集难以建模。为此,提出一种基于贝叶斯学习的集成流量分类方法。构造单独的NB和BAN分类器,在此基础上利用验证集得到各分类器的权重,通过加权平均组合各分类器的输出,实现网络流量分类。以Moore数据集为实验数据,并与NB方法和BAN方法相比较,结果表明,该方法具有更高的分类准确率和稳定性。  相似文献   

16.
Skin cancer is usually classified as melanoma and non-melanoma. Melanoma now represents 75% of humans passing away worldwide and is one of the most brutal types of cancer. Previously, studies were not mainly focused on feature extraction of Melanoma, which caused the classification accuracy. However, in this work, Histograms of orientation gradients and local binary patterns feature extraction procedures are used to extract the important features such as asymmetry, symmetry, boundary irregularity, color, diameter, etc., and are removed from both melanoma and non-melanoma images. This proposed Efficient Classification Systems for the Diagnosis of Melanoma (ECSDM) framework consists of different schemes such as preprocessing, segmentation, feature extraction, and classification. We used Machine Learning (ML) and Deep Learning (DL) classifiers in the classification framework. The ML classifier is Naïve Bayes (NB) and Support Vector Machines (SVM). And also, DL classification framework of the Convolution Neural Network (CNN) is used to classify the melanoma and benign images. The results show that the Neural Network (NNET) classifier’ achieves 97.17% of accuracy when contrasting with ML classifiers.  相似文献   

17.
Some Effective Techniques for Naive Bayes Text Classification   总被引:3,自引:0,他引:3  
While naive Bayes is quite effective in various data mining tasks, it shows a disappointing result in the automatic text classification problem. Based on the observation of naive Bayes for the natural language text, we found a serious problem in the parameter estimation process, which causes poor results in text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well in the standard benchmark collections, competing with state-of-the-art text classifiers based on a highly complex learning method such as SVM  相似文献   

18.
The presence of complex distributions of samples concealed in high-dimensional, massive sample-size data challenges all of the current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, revealing the class dispersion. Current classifiers applied to such complex data inherently suffer from either high complexity or weak classification ability, due to the imbalance between flexibility and generalization ability of the discriminant functions used by these classifiers. To address this concern, we propose a novel representation of discriminant functions in Bayesian inference, which allows multiple Bayesian decision boundaries per class, each in its individual subspace. For this purpose, we design a learning algorithm that incorporates the naive Bayes and feature weighting approaches into structural risk minimization to learn multiple Bayesian discriminant functions for each class, thus combining the simplicity and effectiveness of naive Bayes and the benefits of feature weighting in handling high-dimensional data. The proposed learning scheme affords a recursive algorithm for exploring class density distribution for Bayesian estimation, and an automated approach for selecting powerful discriminant functions while keeping the complexity of the classifier low. Experimental results on real-world data characterized by millions of samples and features demonstrate the promising performance of our approach.  相似文献   

19.
In statistical image classification it is usually assumed that feature observations given class labels are independently distributed. Even in the case when training sample is formed by dependent feature observations, the feature observations to be classified are usually assumed to be independent from training sample. In this paper we propose the original method of the incorporation of spatial information into the per-pixel classifiers. Our approach is based on the retraction of the independence assumption by proposing stationary Gaussian random field (GRF) model for features. The conditional distribution of class label of observation to be classified is assumed to be dependent on its spatial adjacency within the spatial framework of the training sample. For a given training sample, plug-in version of the Bayes discriminant function (PBDF) is proposed for classification. Performance of the proposed PBDF is tested and compared with ones ignoring dependence among feature observations to be classified and training sample. For illustration the image of figure corrupted by the additive GRF is analyzed. The advantage of the proposed classifier against the competing one is shown visually and numerically in the first example. In the second example, three spatial sampling designs for training data are compared on the basis of the actual error rate values of the proposed PBDF. For the remotely sensed image, the advantage of the proposed classification method against popular unsupervised classification method is shown in terms of visual evaluation and empirical errors of misclassification.  相似文献   

20.
Web page classification has become a challenging task due to the exponential growth of the World Wide Web. Uniform Resource Locator (URL)‐based web page classification systems play an important role, but high accuracy may not be achievable as URL contains minimal information. Nevertheless, URL‐based classifiers along with rejection framework can be used as a first‐level filter in a multistage classifier, and a costlier feature extraction from contents may be done in later stages. However, noisy and irrelevant features present in URL demand feature selection methods for URL classification. Therefore, we propose a supervised feature selection method by which relevant URL features are identified using statistical methods. We propose a new feature weighting method for a Naive Bayes classifier by embedding the term goodness obtained from the feature selection method. We also propose a rejection framework to the Naive Bayes classifier by using posterior probability for determining the confidence score. The proposed method is evaluated on the Open Directory Project and WebKB data sets. Experimental results show that our method can be an effective first‐level filter. McNemar tests confirm that our approach significantly improves the performance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号