Similar literature
Found 20 similar documents; search time: 15 ms
1.
With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Intrusion detection systems often use techniques such as pattern recognition and data mining of network events to classify those events as either normal or attack events. Our research study claims that the Hidden Naïve Bayes (HNB) model can be applied to intrusion detection problems that suffer from high dimensionality, highly correlated features and high network data stream volumes. HNB is a data mining model that relaxes the Naïve Bayes method’s conditional independence assumption. Our experimental results show that the HNB model exhibits superior overall performance in terms of accuracy, error rate and misclassification cost compared with the traditional Naïve Bayes model, leading extended Naïve Bayes models and the Knowledge Discovery and Data Mining (KDD) Cup 1999 winner. Our model also outperformed other leading state-of-the-art models, such as SVM, in predictive accuracy. The results further indicate that our model significantly improves the accuracy of detecting denial-of-service (DoS) attacks.

2.
Each type of classifier has its own advantages as well as certain shortcomings. In this paper, we combine the associative classifier and the Naïve Bayes classifier so that each offsets the shortcomings of the other, thus improving the accuracy of text classification. We first classify the training cases with the Naïve Bayes classifier and then set a different confidence threshold for the class association rules (CARs) of each class, based on the classification accuracy that the Naïve Bayes classifier obtains for that class. Since the accuracy of every selected CAR of a class is higher than that obtained by the Naïve Bayes classifier, the classification result can be further optimized through these selected CARs. Cases that the CARs leave unclassified are classified with the Naïve Bayes classifier. The experimental results show that combining the advantages of these two classifiers yields better classification results than either classifier alone.

3.
Classification problems have a long history in the machine learning literature. One of the simplest, and yet most consistently well-performing, sets of classifiers is the family of Naïve Bayes models. However, an inherent problem with these classifiers is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated (which is often the case in practice), it can reduce classification accuracy due to “information double-counting” and interaction omission. In this paper we focus on a relatively new set of models, termed Hierarchical Naïve Bayes models. Hierarchical Naïve Bayes models extend the modeling flexibility of Naïve Bayes models by introducing latent variables to relax some of the independence statements in these models. We propose a simple algorithm for learning Hierarchical Naïve Bayes models in the context of classification. Experimental results show that the learned models can significantly improve classification accuracy as compared to other frameworks.

4.
This paper proposes a novel centralized hardware fault detection approach for a structured Wireless Sensor Network (WSN) based on the Naïve Bayes framework. For most WSNs, power supply is the main constraint of the network because most applications operate in harsh environments and the sensors are powered by batteries only. In other words, the battery’s life is the network’s life. To maximize the network’s life, the proposed method, the Centralized Naïve Bayes Detector (CNBD), analyzes the end-to-end transmission time collected at the sink. Thus no computation is performed on the individual sensor nodes, which places no additional power burden on the battery of each sensor node. We have conducted a thorough performance evaluation. The results show that better performance can be obtained in simulations of a 100-node WSN under various network traffic conditions and with different numbers of faulty nodes.

5.
Boosting-based ensemble learning can improve classification accuracy by using multiple classification models, each constructed to cope with the errors made in the preceding steps. This paper proposes a method to improve boosting-based ensemble learning with penalty profiles, applied to automatic unknown word recognition in the Thai language. Because a sequential problem is treated as a non-sequential one, unknown word recognition must include a process that ranks a set of generated candidates for a potential unknown word position. To strengthen the recognition process with ensemble classification, penalty profiles are defined to make it more efficient to construct a succeeding classification model that tends to re-rank a set of ranked candidates into a suitable order. As an evaluation, a number of alternative penalty profiles are introduced and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using Naïve Bayes as the base classifier for ensemble learning, the proposed method with the best setting achieves an accuracy of 90.19%, an accuracy gap of 12.88, 10.59, and 6.05 over conventional Naïve Bayes, the non-ensemble version, and the flat-penalty profile, respectively.

6.
Hierarchical feature selection is a new research area in machine learning/data mining, which consists of performing feature selection by exploiting dependency relationships among hierarchically structured features. This paper evaluates four hierarchical feature selection methods, i.e., HIP, MR, SHSEL and GTD, used together with four types of lazy learning-based classifiers, i.e., Naïve Bayes, Tree Augmented Naïve Bayes, Bayesian Network Augmented Naïve Bayes and k-Nearest Neighbors classifiers. These four hierarchical feature selection methods are compared with each other and with a well-known “flat” feature selection method, i.e., Correlation-based Feature Selection. The adopted bioinformatics datasets consist of aging-related genes used as instances and Gene Ontology terms used as hierarchical features. The experimental results reveal that the HIP (Select Hierarchical Information Preserving Features) method performs best overall, in terms of predictive accuracy and robustness when coping with data where the instances’ classes have a substantially imbalanced distribution. This paper also reports a list of the Gene Ontology terms that were most often selected by the HIP method.

7.
This paper proposes an estimation method for the confidence level of feedback information (CLFI), namely the confidence level of reported information in computer integrated manufacturing (CIM) architecture for logic diagnosis. This confidence estimation provides a diagnosis module with precise reported information to quickly identify the origin of equipment failure. We studied the factors affecting CLFI, such as measurement system reliability, production context, position of sensors in the acquisition chains, type of products, reference metrology, preventive maintenance and corrective maintenance, based on historical data and feedback information generated by production equipment. We introduced the new ‘CLFI’ concept based on the Dynamic Bayesian Network approach and the Tree Augmented Naïve Bayes model. Our contribution includes an on-line confidence computation module for production equipment data, and an algorithm to compute CLFI. We suggest applying it to the semiconductor manufacturing industry.

8.
Naïve Bayes learners are widely used, efficient, and effective supervised learning methods for labeled datasets in noisy environments. It has been shown that naïve Bayes learners produce reasonable performance compared with other machine learning algorithms. However, the conditional independence assumption of naïve Bayes learning imposes restrictions on the handling of real-world data. To relax the independence assumption, we propose a smooth kernel to augment weights for the likelihood estimation. We then select an attribute weighting method that uses the mutual information metric to cooperate with the proposed framework. A series of experiments are conducted on 17 UCI benchmark datasets to compare the accuracy of the proposed learner against that of other methods that employ a relaxed conditional independence assumption. The results demonstrate the effectiveness and efficiency of our proposed learning algorithm. The overall results also indicate the superiority of attribute-weighting methods over those that attempt to determine the structure of the network.

9.
Previous studies have shown that the classification accuracy of a Naïve Bayes classifier in the domain of text classification can often be improved using binary decompositions such as error-correcting output codes (ECOC). The key contribution of this short note is the realization that ECOC and, in fact, all class-based decomposition schemes can be efficiently implemented in a Naïve Bayes classifier, so that, because of the additive nature of the classifier, all binary classifiers can be trained in a single pass through the data. In contrast to the straightforward implementation, which has a complexity of O(n·t·g), the proposed approach improves the complexity to O((n+t)·g). Large-scale learning of ensemble approaches with Naïve Bayes can benefit from this approach, as the experimental results shown in this paper demonstrate.
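The additive property mentioned in this abstract can be illustrated with a small sketch. It is not the paper's algorithm, only one way to read "trained in a single pass": word counts are collected once per original class, and the counts for each binary problem are then obtained by summing the counts of the classes on each side. The function names, the toy documents, and the `code_columns` layout are all assumptions made for illustration.

```python
from collections import defaultdict

def count_per_class(examples):
    """Single pass over the data: accumulate word counts per original class."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, label in examples:
        for word in words:
            counts[label][word] += 1
    return counts

def binary_counts(counts, positive_classes):
    """Derive counts for one binary problem by summing per-class counts.

    No training example is revisited; only the stored counts are combined.
    """
    pos, neg = defaultdict(int), defaultdict(int)
    for label, word_counts in counts.items():
        target = pos if label in positive_classes else neg
        for word, c in word_counts.items():
            target[word] += c
    return pos, neg

# Hypothetical toy corpus: (tokens, class label).
examples = [
    (["goal", "match"], "sport"),
    (["vote", "goal"], "politics"),
    (["stock"], "finance"),
]
counts = count_per_class(examples)

# Hypothetical code columns: the set of classes treated as positive
# in each binary classifier of the decomposition.
code_columns = [{"sport"}, {"sport", "finance"}]
binary = [binary_counts(counts, column) for column in code_columns]
```

Each entry of `binary` holds the positive and negative word counts that a binary Naïve Bayes classifier would need, so adding another code column costs only another summation over the stored counts.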

10.
The Naïve–Bayes Classifier (NBC) is widely used for classification in machine learning. It is considered the first choice for many classification problems because of its simplicity and its classification accuracy compared with other supervised learning methods. However, for high-dimensional data such as gene expression data, it does not perform well due to two major limitations, namely underflow and overfitting. To address underflow, the existing approach is to add the logarithms of probabilities rather than multiplying the probabilities, and the estimate approach is used to address overfitting. However, in practice these approaches do not perform well for gene expression data. In this paper, a novel approach is proposed to overcome these limitations using a robust function for estimating probabilities in the Naïve–Bayes Classifier. The proposed method not only resolves the limitations of NBC but also improves the classification accuracy for gene expression data. The method has been tested on several high-dimensional benchmark gene expression datasets. Comparative results for the proposed Robust Naïve–Bayes Classifier (R-NBC) and the existing NBC on gene expression data are presented to highlight the effectiveness of the R-NBC. A simulation study has also been performed to show the robustness of the R-NBC over the existing approaches.
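The underflow problem and the log-sum workaround described in this abstract can be seen in a few lines. This is a minimal sketch of the existing workaround only, not of the proposed R-NBC; the feature count and the probability values are hypothetical.

```python
import math

def joint_product(prior, likelihoods):
    """Multiply probabilities directly; underflows when there are many small factors."""
    result = prior
    for p in likelihoods:
        result *= p
    return result

def joint_log_sum(prior, likelihoods):
    """Add log-probabilities instead; the score stays finite for the same inputs."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Hypothetical high-dimensional case: 5,000 features, each with
# likelihood 0.01 under some class, and a class prior of 0.5.
likelihoods = [0.01] * 5000
direct = joint_product(0.5, likelihoods)      # underflows to 0.0 in double precision
log_score = joint_log_sum(0.5, likelihoods)   # a finite negative number
```

Because the logarithm is monotonic, comparing `log_score` values across classes selects the same class as comparing the products would, had they been representable.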

11.
Hand gesture recognition offers many devices an alternative means of human-computer interaction. In this work, we have developed a classifier-fusion-based dynamic free-air hand gesture recognition system to identify isolated gestures. Different users gesticulate at different speeds for the same gesture. Hence, when comparing different samples of the same gesture, variations due to differences in gesturing speed should not contribute to the dissimilarity score. We have therefore introduced a two-level speed normalization procedure using DTW and Euclidean distance-based techniques. Three features, namely ‘orientation between consecutive points’, ‘speed’ and ‘orientation between first and every trajectory points’, were used for the speed normalization. Moreover, in the feature extraction stage, 44 features were selected from the existing literature. Using the full feature set could lead to overfitting and information redundancy, and may increase the computational complexity because of the higher dimension. We have tried to overcome this difficulty by selecting an optimal set of features using analysis of variance and incremental feature selection techniques. The performance of the system was evaluated using this optimal set of features for different individual classifiers such as ANN, SVM, k-NN and Naïve Bayes. Finally, the decisions of the individual classifiers were combined using a classifier fusion model. Based on the experimental results, it may be concluded that classifier fusion provides satisfactory results compared to the individual classifiers. An accuracy of 94.78% was achieved using the classifier fusion technique, compared to the baseline CRF (85.07%) and HCRF (89.91%) models.

12.
Recent changes to plant architectural traits that influence the canopy have produced high-yielding cultivars in rice, wheat and maize. In breeding programs, rapid assessments of the crop canopy and other structural traits are needed to facilitate the development of advanced cultivars in other crops such as canola. LiDAR has the potential to provide insights into plant structural traits such as canopy height, aboveground biomass, and light penetration. These parameters all depend heavily on classifying LiDAR returns as ground or vegetation, because they are computed from the number of ground returns and the number of vegetation returns. The aim of this study is to propose a point classification method for canola using a machine learning approach. The training and testing datasets were clusters sampled from field plots for flower, plant and ground. The supervised learning algorithms chosen are Decision Tree, Random Forest, Support Vector Machine, and Naïve Bayes. K-means Clustering was also used as an unsupervised learning algorithm. The results show that Random Forest models (error rate = 0.006%) are the most accurate for canola point classification, followed by Support Vector Machine (0.028%) and Decision Tree (0.169%). Naïve Bayes (2.079%) and K-means Clustering (48.806%) are not suitable for this purpose. Rather than determining ground points via a fixed height, which relies on the accuracy of the point clouds, this method identifies the true ground and canopy in the point clouds and therefore gives more representative measurements of the crop canopy.

13.
Studies on tool condition monitoring, combined with digital signal processing, can be used to prevent damage to cutting tools and workpieces when the tool condition becomes faulty. These studies have become more relevant in today’s context, where order realization dates are compressed and deadlines must be met in order to keep up with the competition. Based on continuous acquisition of signals with sensor systems, it is possible to classify certain wear parameters through feature extraction. Data mining approaches are extensively used to probe the structural health of the tool and the process. This paper discusses condition monitoring of a carbide-tipped tool using Support Vector Machines and compares the classification efficiency of C-SVC and ν-SVC. It further compares the results with other classifiers such as Decision Tree, Naïve Bayes and Bayes Net. The vibration signals are acquired for various tool conditions such as tool-good condition, tip-breakage, etc. The aim is to identify the best feature-classifier combination.

14.
15.
Over a decade ago, Friedman et al. introduced the Tree Augmented Naïve Bayes (TAN) classifier, with experiments indicating that it significantly outperformed Naïve Bayes (NB) in terms of classification accuracy, whereas general Bayesian network (GBN) classifiers performed no better than NB. This paper challenges those claims, using a careful experimental analysis to show that GBN classifiers significantly outperform NB on the datasets analyzed, and are comparable to TAN in performance. It is found that the poor performance reported by Friedman et al. is not attributable to the GBN per se, but rather to their use of simple empirical frequencies to estimate GBN parameters, whereas basic parameter smoothing (used in their TAN analyses but not their GBN analyses) improves GBN performance significantly. It is concluded that, while GBN classifiers may have some limitations, they deserve greater attention, particularly in domains where insight into classification decisions, as well as good accuracy, is required.
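The contrast between simple empirical frequencies and smoothed parameter estimates can be sketched as follows. The abstract does not say which smoothing scheme was used, so add-alpha (Laplace) smoothing is assumed here purely for illustration; the attribute values and the domain are hypothetical.

```python
from collections import Counter

def empirical_estimate(values, value):
    """Raw relative frequency; exactly zero for any value unseen in training."""
    return Counter(values)[value] / len(values)

def laplace_estimate(values, value, domain, alpha=1.0):
    """Add-alpha smoothing over a known attribute domain (assumed scheme)."""
    counts = Counter(values)
    return (counts[value] + alpha) / (len(values) + alpha * len(domain))

# Hypothetical attribute values observed for one class; "c" never occurs.
observed = ["a", "a", "b", "a"]
domain = ["a", "b", "c"]

raw_c = empirical_estimate(observed, "c")            # 0.0
smooth_c = laplace_estimate(observed, "c", domain)   # (0 + 1) / (4 + 3)
```

A zero estimate such as `raw_c` wipes out the whole product of probabilities for that class, which is one plausible reason why unsmoothed parameters hurt accuracy on sparse data.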

16.
Despite the online availability of data, analysis of this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML, classifiers are used to categorize and code binary data. Based on a case study of Dutch employees’ work-related tweets, this paper compares the coding performance of three classifiers: Linear Support Vector Machine, Naïve Bayes, and logistic regression. The performance of these classifiers is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorff’s alpha. These indices are obtained by comparing the coding decisions of the classifier to manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naïve Bayes classifiers outperform the logistic regression classifier. This study also compared the performance of these classifiers based on stratified random samples and random samples of training data. The findings indicate that in smaller training sets stratified random training samples perform better than random training samples, whereas in large training sets (n = 4000) random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees.

17.

With rapid development in wireless sensor networks and continuous improvements in artificial intelligence-based scientific solutions, the concept of ambient assisted living has been encouraged and adopted, owing to its widespread applications in smart homes and healthcare. In this regard, human activity recognition (HAR) and classification has drawn the attention of numerous researchers, as it improves quality of life. However, before this concept is used in real-time scenarios, its performance on activities of daily living must be analysed using a benchmark data set. To this end, this work adopts activity classification algorithms and aims to improve their accuracy further. These algorithms can be used as a benchmark to analyse the performance of others. The raw 3-axis accelerometer data is first preprocessed to remove noise and make it suitable for training and classification. For this purpose, the sliding window algorithm and linear and Gaussian filters are applied to the raw data. Then Naïve Bayes (NB) and Decision Tree (DT) classification algorithms are used to classify human activities such as sitting, standing, walking, sitting down and standing up. The results show that maximum accuracies of 89.5% and 99.9% are achieved using the NB and DT classifiers, respectively, with the Gaussian filter. Furthermore, we have compared the obtained results with those of counterpart algorithms in order to demonstrate the effectiveness of the approach.
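The preprocessing steps named in this abstract (Gaussian filtering followed by sliding-window segmentation) can be sketched for a single accelerometer axis. The abstract gives no kernel width, window size, or step, so the values below are hypothetical, and the edge handling (replicating the boundary samples) is likewise an assumption.

```python
import math

def gaussian_kernel(radius, sigma):
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Filter the signal, replicating edge samples so the length is preserved."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the edges
            acc += w * signal[j]
        out.append(acc)
    return out

def sliding_windows(signal, size, step):
    """Split the signal into fixed-size, possibly overlapping windows."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]

# Hypothetical x-axis readings with a noise spike at index 2.
x_axis = [0.0, 0.1, 5.0, 0.2, 0.1, 0.0, 0.1, 0.2]
smoothed = smooth(x_axis, gaussian_kernel(radius=2, sigma=1.0))
windows = sliding_windows(smoothed, size=4, step=2)
```

Each window would then be turned into a feature vector (for example, per-window mean and variance) before being passed to a classifier such as NB or DT.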


18.
Cancer class prediction and discovery can help improve imperfect, non-automated cancer diagnoses, which affect patients’ cancer treatments. Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling automatic, precise and early diagnosis. A promising application of SAGE gene expression data is the classification of cancers. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE gene expression profiles. The event-model-based methods are compared with the standard Naïve Bayes method. Both binary classification and multicategory classification are investigated. Experimental results on several SAGE datasets show that the event models are generally better than standard Naïve Bayes. Normalized Information Gain (NIG), an extension of Information Gain (IG), is proposed for gene selection. The impact of gene correlation on classification performance is also investigated.
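The difference between the first two event models named in this abstract can be sketched with their class-conditional log-likelihoods. This follows the standard textbook definitions of the multivariate Bernoulli and multinomial models rather than the paper's implementation; the normalized multinomial model is not defined in the abstract and is omitted. All counts and probabilities are hypothetical.

```python
import math

def bernoulli_log_likelihood(counts, presence_probs):
    """Multivariate Bernoulli: only presence or absence of each tag matters."""
    total = 0.0
    for n, p in zip(counts, presence_probs):
        total += math.log(p) if n > 0 else math.log(1.0 - p)
    return total

def multinomial_log_likelihood(counts, tag_probs):
    """Multinomial: every occurrence of a tag contributes.

    The multinomial coefficient is omitted because it does not depend on
    the class and therefore does not change which class scores highest.
    """
    return sum(n * math.log(q) for n, q in zip(counts, tag_probs))

# Hypothetical expression profile over three tags and one class's parameters.
counts = [3, 0, 1]
presence_probs = [0.8, 0.3, 0.5]   # P(tag present | class)
tag_probs = [0.6, 0.1, 0.3]        # P(tag | class), sums to 1

bern = bernoulli_log_likelihood(counts, presence_probs)
multi = multinomial_log_likelihood(counts, tag_probs)

# Doubling every count leaves the Bernoulli score unchanged
# but changes the multinomial score.
doubled = [2 * n for n in counts]
```

The comparison with `doubled` shows why the choice of event model matters for count data such as SAGE profiles: the Bernoulli model discards abundance information that the multinomial model uses.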

19.
Recently, activity recognition using the built-in sensors of a mobile phone has become an important issue. It can help provide context-aware services such as health care, content recommendation suited to a user’s activity, and user-adaptive interfaces. This paper proposes a layered hidden Markov model (HMM) to recognize both short-term and long-term activity in real time. The first layer of HMMs detects short, primitive activities from acceleration, magnetic field, and orientation data, while the second layer exploits the inference of the previous layer and other sensor values to recognize goal-oriented activities over longer time periods. Experimental results demonstrate the superior performance of the proposed method over the alternatives in classifying long-term as well as short-term activities. The performance improvement is up to 10% in the experiments, depending on the models compared.

20.

Recently, healthcare data analysis has become an attractive research topic. Data gathering is the first step in data analysis and processing. During data collection, errors may occur due to human mistakes, device errors, or noise in the transmission process. Correct treatment of missing data and outliers preserves the data size and improves the model’s performance. This paper provides two enhanced algorithms to handle missing values and outliers in big datasets. The main idea is to divide the dataset into its different classes, or to cluster it using k-means++, then calculate the average value of each part, and finally replace the missing data and outliers with the mean value of the corresponding part. The proposed imputation and outlier handling algorithms are tested on a dataset called Pima Indian diabetic, which contains 2768 patients divided into 952 diabetic patients and 1816 controls. Four classifiers (Random Forest, Decision Tree, Support Vector Machine, and Naïve Bayes) are used to evaluate the effect of the proposed algorithms. The results show that the proposed algorithms improve classification accuracy by 8% and decrease the RMSE by 17% over Deep Learning (DL). DL is the most powerful algorithm used in repairing missing data.
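The main idea stated in this abstract, replacing missing values and outliers with the mean of the corresponding class, can be sketched for a single feature. The abstract does not say how outliers are detected, so the caller supplies that rule; the readings, labels, and the cutoff of 400 are hypothetical, and the k-means++ variant is not shown.

```python
def impute_by_class_mean(values, labels, is_bad):
    """Replace entries flagged by is_bad with the mean of the valid entries
    that share the same class label.

    Raises KeyError if a class has no valid entries at all.
    """
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        if not is_bad(v):
            sums[y] = sums.get(y, 0.0) + v
            counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return [means[y] if is_bad(v) else v for v, y in zip(values, labels)]

# Hypothetical readings: None marks a missing value, and anything
# above 400 is treated as an outlier (an assumed rule).
values = [100.0, None, 120.0, 150.0, 999.0, 170.0]
labels = [0, 0, 0, 1, 1, 1]

def is_bad(v):
    return v is None or v > 400

cleaned = impute_by_class_mean(values, labels, is_bad)
# cleaned == [100.0, 110.0, 120.0, 150.0, 160.0, 170.0]
```

Computing the means from valid entries only keeps an outlier such as 999.0 from distorting the value that replaces it.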

