Similar literature
 20 similar documents found (search time: 15 ms)
1.
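Based on an analysis of the naive Bayes classifier (NBC) and the conventional tree-augmented naive Bayes (TAN) classifier, this paper improves the TAN classifier and proposes the CTAN classifier. The naive Bayes classifier assumes that the non-class attributes are completely independent, whereas conventional TAN relaxes the independence assumption for all attributes. The proposed CTAN instead controls the number of attribute pairs that TAN retains, so that the assumption is relaxed selectively, only for some of the correlated attributes. The improvement in CTAN lies mainly in using only part of the attribute dependency tree. Experiments show that CTAN outperforms the conventional TAN classifier.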
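The complete-independence assumption that item 1 attributes to naive Bayes can be illustrated with a minimal sketch. This is not code from the paper: the function names `train_nb` and `predict_nb`, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for illustration. Each attribute contributes its own factor P(value | class), which is exactly the assumption that TAN-style classifiers relax.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model (hypothetical helper).

    alpha is a Laplace smoothing constant, an assumption of this sketch.
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, value_counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    class_counts, value_counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / n)  # log prior
        for j, v in enumerate(row):
            # Independence assumption: one factor per attribute, given the class.
            num = value_counts[(j, y)][v] + alpha
            den = cy + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))
```

A TAN or CTAN model would add, for some attributes, a second parent besides the class, so the corresponding factor becomes P(value | class, parent value) instead.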

2.
Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population, and the disease is expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental concept is to compare the accuracy of various Machine Learning (ML) algorithms and to use boosting algorithms to improve the models' prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are also used to improve the prediction, along with Artificial Neural Networks (ANN). This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML. TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN had an accuracy of 90%; and LR had an accuracy of 88.33%.

3.
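To address the DDoS attacks that the Hadoop platform may suffer, detection algorithms for Hadoop DDoS attacks need to be studied, in particular the commonly used SVM, KNN, neural network, Decision Tree, and Naive Bayes algorithms. This paper collects resource-usage information from hosts during normal operation and under attack as a data set. After applying the above algorithms, it finds that SVM detects Hadoop DDoS attacks with an accuracy of 91.75%. The experimental results indicate that SVM is the most suitable algorithm for DDoS attack detection on the Hadoop platform.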

4.
Bayesian networks are important knowledge representation tools for handling uncertain pieces of information. The success of these models is strongly related to their capacity to represent and handle dependence relations. Some forms of Bayesian networks have been successfully applied in many classification tasks. In particular, naive Bayes classifiers have been used for intrusion detection and alert correlation. This paper analyses the advantage of adding expert knowledge to probabilistic classifiers in the context of intrusion detection and alert correlation. As examples of probabilistic classifiers, we consider the well-known Naive Bayes, Tree Augmented Naive Bayes (TAN), Hidden Naive Bayes (HNB) and decision tree classifiers. Our approach can be applied to any classifier whose outcome is a probability distribution over a set of classes (or decisions). In particular, we study how additional expert knowledge such as "it is expected that 80% of traffic will be normal" can be integrated in classification tasks. Our aim is to revise probabilistic classifiers' outputs in order to fit expert knowledge. Experimental results show that our approach improves existing results on different benchmarks from the intrusion detection and alert correlation areas.
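The abstract in item 4 does not say how the classifier outputs are revised. As a rough illustration of the general idea only, the sketch below applies a standard prior-shift correction, which is a swapped-in technique and not the paper's method: each class probability is rescaled by the ratio of the expert's expected class proportion to the training proportion, then renormalized. The function name `reweight_posterior` and all numbers are hypothetical.

```python
def reweight_posterior(posterior, train_prior, expert_prior):
    """Rescale a class distribution toward expert-supplied class proportions.

    Standard prior-shift correction (an assumption of this sketch):
    p'(c | x) is proportional to p(c | x) * expert_prior[c] / train_prior[c].
    """
    weighted = {
        c: posterior[c] * expert_prior[c] / train_prior[c] for c in posterior
    }
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical classifier output for one connection record.
posterior = {"normal": 0.5, "attack": 0.5}
train_prior = {"normal": 0.5, "attack": 0.5}
# Expert knowledge of the kind quoted in the abstract: 80% of traffic is normal.
expert_prior = {"normal": 0.8, "attack": 0.2}

revised = reweight_posterior(posterior, train_prior, expert_prior)
print(revised)
```

This only works for classifiers that output a full probability distribution over classes, which matches the applicability condition stated in the abstract.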

5.
This study presents the Forest Fire Decision Support System (FOFDESS), a multi-agent decision support system for forest fires. Based on the current meteorological state and environmental observations, FOFDESS rates fire danger by predicting forest fires; it can also approximate the fire spread speed and quickly detect a fire that has started. Data fusion algorithms such as Artificial Neural Network (ANN), Naive Bayes Classifier (NBC), Fuzzy Switching (FS) and image processing are used for these operations in FOFDESS. These algorithms are brought together in a purpose-designed data fusion framework, and a novel hybrid algorithm called NABNEF (Naive Bayes Aided Neural-Fuzzy Algorithm) has been developed for fire danger rating in FOFDESS. In this form, FOFDESS is an integrated system that covers prediction, detection and management. The experiments showed that FOFDESS helped determine the most accurate firefighting strategy by producing effective results.

6.
As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Because it is more economical to combine commodity hardware (a medium server or PC) with a virtual environment, use it as a single server resource, and filter malicious email using machine learning techniques, we used the Hadoop MapReduce framework and, among machine learning methods, Naïve Bayes for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using the Naïve Bayes technique, a supervised machine learning method, in a Hadoop framework with optimized performance; and with a Python program applying the Naïve Bayes technique in a bare-metal server environment without Hadoop. A comparison of the accuracy and prediction error rates of the two methods shows that the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification 1.11 times and the prediction error rate 14.13 times compared to the non-Hadoop Python Naïve Bayes method.

7.
The purpose of this study is to develop a clinical decision support system based on machine learning (ML) algorithms to assist the diagnosis of chronic obstructive pulmonary disease (COPD) using forced oscillation (FO) measurements. To this end, the performances of classification algorithms based on the Linear Bayes Normal Classifier, K nearest neighbor (KNN), decision trees, artificial neural networks (ANN) and support vector machines (SVM) were compared in order to find the best classifier. Four feature selection methods were also used to identify a reduced set of the most relevant parameters. The available dataset consists of 7 possible input features (FO parameters) from 150 measurements made in 50 volunteers (COPD, n = 25; healthy, n = 25). The performance of the classifiers and reduced data sets was evaluated by determining sensitivity (Se), specificity (Sp) and area under the ROC curve (AUC). Among the studied classifiers, KNN, SVM and ANN were the most adequate, reaching values that allow a very accurate clinical diagnosis (Se > 87%, Sp > 94%, and AUC > 0.95). Using correlation analysis as a ranking index of the FOT parameters allowed us to simplify the analysis of the FOT parameters while still maintaining a high degree of accuracy. In conclusion, the results of this study indicate that the proposed classifiers may help ease the diagnosis of COPD using forced oscillation measurements.

8.
Detection of malicious software (malware) using machine learning methods has been explored extensively to enable fast detection of newly released malware. The performance of these classifiers depends on the induction algorithms being used. In order to benefit from multiple different classifiers and exploit their strengths, we suggest using an ensemble method that combines the results of the individual classifiers into one final result to achieve higher overall detection accuracy. In this paper we evaluate several combining methods using five different base inducers (C4.5 Decision Tree, Naïve Bayes, KNN, VFI and OneR) on five malware datasets. The main goal is to find the best combining method for the task of detecting malicious files in terms of accuracy, AUC and execution time.

9.
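Existing methods for estimating a person's movement speed from foot-mounted inertial sensor data are effective only at low walking speeds. To recognize speed during fast walking and running from foot inertial sensor data, this paper proposes a speed recognition method based on single-step statistical features. The method uses a foot-mounted inertial sensor to collect inertial data while a person moves at different speeds, segments the data into single steps by peak detection, and finally extracts 65-dimensional statistical features from each step. Four common machine learning classification methods, namely least squares (LS), support vector machine (SVM), K-nearest neighbor (KNN), and the linear Bayes normal classifier (LDC), are then applied to recognize the movement speed. Experiments show that the proposed method reaches a recognition rate of 96.3% with the SVM classifier, so the method can effectively recognize a person's movement speed.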

10.
This paper evaluates the effect on the predictive accuracy of different models of two recently proposed imputation methods, namely missForest (MF) and Multiple Imputation based on Expectation-Maximization (MIEM), along with two other imputation methods: Sequential Hot-deck and Multiple Imputation based on Logistic Regression (MILR). Their effect is assessed on the classification accuracy of four different models, namely Tree Augmented Naive Bayes (TAN), which has received little attention, Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. Experiments are conducted on fourteen binary datasets with large feature sets, and across a wide range of missing data rates (between 5% and 50%). The results from 10-fold cross-validation show that the performance of the imputation methods varies substantially between classifiers and at different rates of missing values. The MIEM method is shown to generally give the best results for all the classifiers across all rates of missing data. While the NB model does not benefit much from imputation compared to a no-imputation baseline, LR and TAN benefit substantially from the imputation methods at higher rates of missing values. The results also show that MF works best with TAN, and that Hot-deck degrades the predictive performance of the SVM and NB models at high rates of missing values (over 30%). A detailed analysis of the imputation methods over the different datasets is reported. Implications of these findings for the choice of an imputation method are discussed.

11.
With the widespread use of social networks, forums and blogs, customer reviews have emerged as a critical factor in customers' purchase decisions. Since the early 2000s, researchers have focused on these reviews in order to automatically categorize them into polarity levels such as positive, negative, and neutral. This research problem is known as sentiment classification. The objective of this study is to investigate the potential benefit of the multiple classifier systems concept for the Turkish sentiment classification problem and to propose a novel classification technique. The Vote algorithm was used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Bagging. The parameters of the SVM were optimized when it was used as an individual classifier. Experimental results showed that multiple classifier systems increase the performance of individual classifiers on Turkish sentiment classification datasets and that meta classifiers contribute to the power of these multiple classifier systems. The proposed approach achieved better performance than Naive Bayes, which was reported as the best individual classifier for these datasets, and Support Vector Machines. Multiple classifier systems (MCS) are a good approach for sentiment classification, and parameter optimization of individual classifiers must be taken into account when developing MCS-based prediction systems.

12.
A detailed and up-to-date land use map of the urban environment is required in many applications. Very high-resolution (VHR), Multispectral Scanner System (MSS) Worldview-3 (WV-3) satellite imagery provides detailed information on urban characteristics, which should be professionally mined. In this research, WV-3 imagery was processed by machine learning (ML) methods to extract the most accurate urban features. Fuze-Go panchromatic sharpening in conjunction with atmospheric and topographic correction was initially used to increase the image quality and colour contrast. Three image analysis approaches, namely the current pixel-based image analysis (PBIA), object-based image analysis (OBIA) and a new feature-based image analysis (FBIA), were implemented on the WV-3 image. The k-nearest neighbour (k-NN), Naive Bayes (NB), and support vector machine (SVM) classifiers were used for PBIA, the Decision Tree (DT) classifier was examined for OBIA, and the Dempster–Shafer (DS) fusion classifier was applied for the first time as FBIA. In order to use DS as FBIA, four types of belief masses, namely Precision, Recall, Overall Accuracy, and the kappa coefficient (κ), were implemented and compared to assign the most likely urban features. All the applied classifiers were also trained on the first site and then tested on another site to examine their transferability. The accuracy, reliability, and computational time of all classifiers were examined by confusion matrix and McNemar assessment. Results show improvements in detailed urban extraction using the proposed FBIA, with 92.2% overall accuracy, compared with PBIA and OBIA. The FBIA result of urban extraction is more consistent when transferred to another study area and consumes much less time than OBIA. Also, the precision belief mass measurement achieved the highest efficiency regarding the receiver operating characteristic (ROC) curve rate.

13.
Boosting has been shown to improve the predictive performance of unstable learners such as decision trees, but not of stable learners like Support Vector Machines (SVM), k-nearest neighbors and Naive Bayes classifiers. In addition to the model stability problem, the high time complexity of some stable learners such as SVM prohibits them from generating multiple models to form an ensemble for large data sets. This paper introduces a simple method that not only enables Boosting to improve the predictive performance of stable learners, but also significantly reduces the computational time to generate an ensemble of stable learners such as SVM for large data sets that would otherwise be infeasible. The method proposes to build local models instead of global models; it is the first method, to the best of our knowledge, to solve the two problems in Boosting stable learners at the same time. We implement the method by using a decision tree to define local regions and build a local model for each local region. We show that this implementation of the proposed method enables successful Boosting of three types of stable learners: SVM, k-nearest neighbors and Naive Bayes classifiers.

14.
Gestational Diabetes Mellitus (GDM) is an illness characterized by a certain degree of glucose intolerance with onset or first recognition during pregnancy. In the past few decades, numerous investigations have been conducted on the early identification of GDM. Machine Learning (ML) methods have been found to be efficient prediction techniques with a significant advantage over statistical models. In this view, the current research paper presents an ensemble of ML-based GDM prediction and classification models. The presented model involves three steps: preprocessing, classification, and an ensemble voting process. First, the input medical data is preprocessed in four stages, namely format conversion, class labeling, replacement of missing values, and normalization. Then four ML models, namely Logistic Regression (LR), k-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF), are used for classification. Finally, the RF, LR, KNN and SVM classifiers are integrated through a voting classifier to perform the final classification. In order to investigate the proficiency of the proposed model, the authors conducted an extensive set of simulations and examined the results under distinct aspects. In particular, the ensemble model outperformed the classical ML models with a precision of 94%, recall of 94%, accuracy of 94.24%, and F-score of 94%.
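The voting step in item 14 can be sketched as a plain majority vote over the labels predicted by the individual classifiers. The abstract does not say whether the paper uses hard or soft voting, so hard voting is an assumption here, and the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions by hard (majority) voting.

    predictions: one list of labels per classifier, all of equal length.
    Ties go to the label seen first among the tied ones, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    voted = []
    for labels in zip(*predictions):  # labels for one sample across classifiers
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on three samples (1 = GDM).
predictions = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(majority_vote(predictions))
```

With an even number of base classifiers, as in the four-model ensemble described above, ties are possible, so a real system would need an explicit tie-breaking rule or probability-based (soft) voting.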

15.
Extended Naive Bayes classifier for mixed data   (total citations: 2; self-citations: 0; citations by others: 2)
The Naive Bayes induction algorithm is very popular in the classification field. The traditional method for dealing with numeric data is to discretize numeric attribute data into symbols. Differences between discretization criteria have a significant effect on performance. Moreover, several studies have recently employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimation. Therefore, research on the classification of mixed data using Naive Bayes classifiers has not been very successful. In this paper, we propose a classification method, Extended Naive Bayes (ENB), which is capable of handling mixed data. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT and MLP.

16.
Bayesian Network Classifiers   (total citations: 154; self-citations: 0; citations by others: 154)
Nir Friedman, Dan Geiger, Moises Goldszmidt. Machine Learning, 1997, 29(2-3): 131-163
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
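TAN, as described in item 16, builds its tree over the attributes without search. In the standard construction, candidate attribute pairs are scored by the conditional mutual information I(X; Y | C) given the class, and a maximum-weight spanning tree is built over those scores. The sketch below covers only the scoring step, estimated from raw counts; the function name and the toy data are hypothetical, and no smoothing is applied.

```python
from collections import Counter
import math


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three aligned lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), k in n_xyc.items():
        # P(x, y, c) * log( P(x, y | c) / (P(x | c) * P(y | c)) ),
        # with every probability replaced by its count ratio.
        cmi += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi


cs = [0, 0, 0, 0]
# Two attributes that always agree: maximally dependent given the class.
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], cs)
# Two attributes that are independent given the class.
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], cs)
print(dependent, independent)
```

Pairs with a high score are the ones worth linking in the tree; pairs with a score near zero add a parameter cost without capturing any dependence.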

17.
Wikipedia has become the de facto source for information on the web, and it has experienced exponential growth since its inception. Text classification with Wikipedia has seen limited research in the past with the goal of studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers in the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally across all categories, and the accuracy of Naive Bayes is inferior in some categories, whereas its generalizing capability is on par with SVM.

18.
The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, resting on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft's Anti-Virus software. A short version of the paper appeared in [33]. The work is partially supported by NSF IIS-0546280 and an IBM Faculty Research Award. The authors would also like to thank the members of the anti-virus laboratory at KingSoft Corporation for their helpful discussions and suggestions.

19.
To learn a Bayesian network classifier, continuous attributes usually need to be discretized. But discretizing continuous attributes may cause information loss, introduce noise, and reduce sensitivity to changes in the attributes with respect to the class variable. In this paper, we use the Gaussian kernel function with a smoothing parameter to estimate the density of attributes. A Bayesian network classifier with continuous attributes is established by the dependency extension of Naive Bayes classifiers. We also analyze the information that each attribute provides about the class as a basis for the dependency extension of Naive Bayes classifiers. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function provide good classification accuracy compared with other approaches when dealing with continuous attributes.
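The density estimate mentioned in item 19 can be sketched as a one-dimensional Gaussian kernel density estimator, where the bandwidth `h` plays the role of the smoothing parameter. The abstract does not give the paper's exact estimator or how `h` is chosen, so the textbook form below, the function name `gaussian_kde`, and the sample values are assumptions.

```python
import math


def gaussian_kde(samples, h):
    """Return a density function estimated with Gaussian kernels of bandwidth h.

    Textbook form: f(x) = 1 / (n * h * sqrt(2 * pi)) * sum_i exp(-((x - x_i) / h)^2 / 2).
    """
    norm = len(samples) * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density


# Hypothetical values of one continuous attribute within one class.
density = gaussian_kde([4.9, 5.1, 5.0, 6.2], h=0.5)
print(density(5.0), density(8.0))
```

In a naive Bayes style classifier, one such estimator per attribute and class would stand in for the discretized conditional probability table; a larger `h` gives a smoother but less detailed density.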

20.
Predictive Maintenance is a type of condition-based maintenance that assesses the equipment's state and estimates its failure probability and when maintenance should be performed. Although machine learning techniques have frequently been applied in this area, the existing studies disregard the natural order among the target attribute values of the historical sensor data. These methods therefore lose the inherent order of the data, which positively affects prediction performance. To deal with this problem, a novel approach, named Ordinal Multi-dimensional Classification (OMDC), is proposed for estimating the conditions of a hydraulic system's four components by taking into account the natural order of the class values. To demonstrate the prediction ability of the proposed approach, eleven different multi-dimensional classification algorithms (traditional Binary Relevance (BR), Classifier Chain (CC), Bayesian Classifier Chain (BCC), Monte Carlo Classifier Chain (MCC), Probabilistic Classifier Chain (PCC), Classifier Dependency Network (CDN), Classifier Trellis (CT), Classifier Dependency Trellis (CDT), Label Powerset (LP), Pruned Sets (PS), and Random k-Labelsets (RAKEL)) were implemented using the Ordinal Class Classifier (OCC) algorithm. In addition, seven different classification algorithms (Multilayer Perceptron (MLP), Support Vector Machine (SVM), k-Nearest Neighbour (kNN), Decision Tree (C4.5), Bagging, Random Forest (RF), and Adaptive Boosting (AdaBoost)) were chosen as base learners for the OCC algorithm. The experimental results show that the proposed OMDC approach using binary relevance multi-dimensional classification methods predicts the conditions of a hydraulic system's multiple components with high accuracy. The results also show that the OMDC models that use ensemble-based classification algorithms give more reliable prediction performance, with an average Hamming score of 0.853, than those that use traditional algorithms as base learners.
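Item 20 reports an average Hamming score but does not define it. The sketch below assumes the common definition in multi-dimensional classification: the fraction of individual class variables predicted correctly, pooled over all instances and all dimensions. The function name `hamming_score` and the example labels are hypothetical.

```python
def hamming_score(y_true, y_pred):
    """Fraction of correctly predicted class variables over all instances.

    y_true, y_pred: lists of equal-length label vectors, one per instance.
    Assumed definition; the abstract itself does not give one.
    """
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return correct / total


# Two instances, four class variables each (e.g. four component conditions).
y_true = [[1, 2, 0, 3], [0, 2, 1, 3]]
y_pred = [[1, 2, 1, 3], [0, 0, 1, 3]]
print(hamming_score(y_true, y_pred))
```

Unlike exact-match accuracy, this measure gives partial credit when only some of the components are predicted correctly, which suits a setting with four separately assessed components.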
