Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Privacy-preserving Naïve Bayes classification   Total citations: 1 (self-citations: 0, citations by others: 1)
Privacy-preserving data mining, that is, developing models without seeing the data, is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model but must not disclose their data to others. The problem of secure distributed classification is an important one. In many situations, data is split between multiple organizations. These organizations may want to use all of the data to create more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier, consistently providing reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.

2.
The generalized Dirichlet distribution has been shown to be a more appropriate prior than the Dirichlet distribution for naïve Bayesian classifiers. When the dimension of a generalized Dirichlet random vector is large, the computational effort for calculating the expected value of a random variable can be high. In document classification, the number of distinct words, which is the dimension of the prior for naïve Bayesian classifiers, generally exceeds ten thousand. Generalized Dirichlet priors can therefore be impractical for document classification from the viewpoint of computational efficiency. In this paper, some properties of the generalized Dirichlet distribution are established to accelerate the calculation of the expected values of random variables. Those properties are then used to construct noninformative generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models. Our experimental results on two document sets show that generalized Dirichlet priors can achieve significantly higher prediction accuracy while preserving the computational efficiency of naïve Bayesian classifiers.

3.
Fingerprint classification reduces the number of possible matches in automated fingerprint identification systems by categorizing fingerprints into predefined classes. Support vector machines (SVMs) are widely used in pattern classification and have produced high accuracy when performing fingerprint classification. In order to effectively apply SVMs to multi-class fingerprint classification systems, we propose a novel method in which the SVMs are generated with the one-vs-all (OVA) scheme and dynamically ordered with naïve Bayes classifiers. This is necessary to break the ties that frequently occur when working with multi-class classification systems that use OVA SVMs. More specifically, it uses representative fingerprint features such as the FingerCode, singularities and pseudo ridges to train the OVA SVMs and naïve Bayes classifiers. The proposed method has been validated on the NIST-4 database and produced a classification accuracy of 90.8% for five-class classification with statistical significance. The results show the benefits of integrating different fingerprint features as well as the usefulness of the proposed method in multi-class fingerprint classification.

4.
Multimedia Tools and Applications - Twitter is a social media platform which has been proven to be a great tool for insights of emotions about products, policies etc. through a 280-character...

5.
Each type of classifier has its own advantages as well as certain shortcomings. In this paper, we combine the strengths of the associative classifier and the Naïve Bayes classifier so that each compensates for the other's shortcomings, thus improving the accuracy of text classification. We first classify the training cases with the Naïve Bayes classifier, then set a separate confidence threshold for the class association rules (CARs) of each class, based on the accuracy the Naïve Bayes classifier obtains on that class. Because the accuracy of every selected CAR for a class is higher than that obtained by the Naïve Bayes classifier, these selected CARs can further improve the classification result. Cases that remain unclassified are then classified with the Naïve Bayes classifier. The experimental results show that combining the advantages of these two classifiers yields better classification results than a single classifier.

6.
The Naïve Bayes (NB) learning algorithm is simple and effective in many domains, including text classification. However, its performance depends on the accuracy of the estimated conditional probability terms. These terms are sometimes hard to estimate accurately, especially when training data is scarce. This work transforms the probability estimation problem into an optimization problem and exploits three metaheuristic approaches to solve it: Genetic Algorithms (GA), Simulated Annealing (SA), and Differential Evolution (DE). We also propose a novel DE algorithm that uses multi-parent mutation and crossover operations (MPDE) and three different methods to select the final solution. We create an initial population by manipulating the solution generated by a method used for fine-tuning the NB. We evaluate the proposed methods by using their resulting solutions to build NB classifiers and compare their results with those obtained from classical NB and the Fine-Tuning Naïve Bayesian (FTNB) algorithm, using 53 UCI benchmark data sets. We name the resulting classifiers NBGA, NBSA, NBDE, and NB-MPDE, respectively. We also evaluate the performance of NB-MPDE for text classification using 18 text-classification data sets and compare its results with those obtained from FTNB, BNB, and MNB. The experimental results show that DE in general, and the proposed MPDE algorithm in particular, are more convenient for fine-tuning NB than all other methods, including the other two metaheuristic methods (GA and SA). They also indicate that NB-MPDE achieves superiority over classical NB, FTNB, NBDE, NBGA, NBSA, MNB, and BNB.

7.
Aljemely, Anas H.; Xuan, Jianping; Xu, Long; Jawad, Farqad K. J.; Al-Azzawi, Osama. 《Applied Intelligence》, 2021, 51(10): 6932-6950
Applied Intelligence - Fault identification is a vital task to ensure the integrity and reliability of rotating machinery. The vibration signals produced by the defective system components...

8.
Previous studies have shown that the classification accuracy of a Naïve Bayes classifier in the domain of text classification can often be improved using binary decompositions such as error-correcting output codes (ECOC). The key contribution of this short note is the realization that ECOC and, in fact, all class-based decomposition schemes can be efficiently implemented in a Naïve Bayes classifier: because of the additive nature of the classifier, all binary classifiers can be trained in a single pass through the data. In contrast to the straightforward implementation, which has a complexity of O(n·t·g), the proposed approach improves the complexity to O((n+t)·g). Large-scale learning of ensemble approaches with Naïve Bayes can benefit from this approach, as the experimental results in this paper demonstrate.
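The additive property mentioned above can be illustrated with a minimal sketch. It assumes a multinomial-style Naïve Bayes whose sufficient statistics are per-class word counts; the function names `class_counts` and `binary_counts` and the toy data are hypothetical and are not taken from the paper. Per-class counts are collected in one pass over the data, and the statistics of any binary (class-group versus rest) classifier are then obtained by adding those counts, without revisiting the training examples.

```python
from collections import Counter, defaultdict

def class_counts(docs, labels):
    """One pass over the data: per-class word counts and document counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        doc_counts[label] += 1
    return word_counts, doc_counts

def binary_counts(word_counts, doc_counts, positive_classes):
    """Build one binary classifier's statistics by summing per-class counts."""
    pos_words, neg_words = Counter(), Counter()
    pos_docs = neg_docs = 0
    for label, counts in word_counts.items():
        if label in positive_classes:
            pos_words.update(counts)   # Counter.update adds counts
            pos_docs += doc_counts[label]
        else:
            neg_words.update(counts)
            neg_docs += doc_counts[label]
    return (pos_words, pos_docs), (neg_words, neg_docs)

# Hypothetical toy data: three classes, four tiny documents.
docs = [["a", "b"], ["a"], ["c"], ["b", "c"]]
labels = ["x", "y", "z", "x"]
wc, dc = class_counts(docs, labels)

# One column of a code matrix: classes x and y form the positive group.
(pos_w, pos_d), (neg_w, neg_d) = binary_counts(wc, dc, {"x", "y"})
```

Each additional binary classifier costs only a loop over the classes, which is the intuition behind replacing a product of terms in the complexity with a sum.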

9.
Pumps are heavily used in most industries related to mechanical engineering, so a system that ensures the continuous running of the pump becomes essential. In this paper, a vibration-based condition monitoring system is presented for monoblock centrifugal pumps, as they play a relatively critical role in most industries. The approach has three main steps: feature extraction, classification, and comparison of classification. Although various efficient algorithms are available for fault detection, this study uses wavelet analysis for feature extraction and compares the Naïve Bayes and Bayes net algorithms for classification. This paper presents the use of the Naïve Bayes and Bayes net algorithms for fault diagnosis through discrete wavelet features extracted from vibration signals of good and faulty conditions of the components of a centrifugal pump. The classification accuracies of different discrete wavelet families were calculated and compared to find the best wavelet for the fault diagnosis of the centrifugal pump.

10.
In a large number of experimental problems, high dimensionality of the search area and economic constraints can severely limit the number of experimental points that can be tested. Within these constraints, classical optimization techniques perform poorly, in particular when little a priori knowledge is available. In this work we investigate the possibility of combining approaches from statistical modeling and bio-inspired algorithms to effectively explore a huge search space while sampling only a limited number of experimental points. To this end, we introduce a novel approach that combines ant colony optimization (ACO) and the naïve Bayes classifier (NBC): the naïve Bayes ant colony optimization (NACO) procedure. We compare NACO with other similar approaches in a simulation study. We then derive the NACO procedure with the goal of designing artificial enzymes with no sequence homology to the extant one. Our final aim is to mimic the natural fold of the 200-amino-acid 1AGY serine esterase from Fusarium solani.

11.
It is well known that call centers suffer from high levels of employee turnover; however, call centers are services that have excellent operational records of telemarketing activities performed by each employee. With this information, we propose to use the Random Forest and the naïve Bayes algorithms to build classifiers and predict turnover of the sales agents. The results on 2407 sales agents’ operational performance records showed that, although naïve Bayes is much simpler than Random Forest, both classifiers performed similarly, achieving interesting accuracy rates in turnover prediction. Moreover, evidence was found that incorporating performance differences over time significantly increases the accuracy of the predictive models, up to 85%, with naïve Bayes being quite competitive with the Random Forest classifier when the amount of information is increased. The results obtained in this study could be useful for management decision-making to monitor and identify potential turnover due to poor performance and, therefore, to take preventive action.

12.
13.
Since naïve Bayesian classifiers are suitable for processing discrete attributes, many methods have been proposed for discretizing continuous ones. However, none of the previous studies applies more than one discretization method to the continuous attributes in a data set for naïve Bayesian classifiers. Different approaches employ different information embedded in continuous attributes to determine the boundaries for discretization. It is likely that discretizing the continuous attributes in a data set using different methods can utilize the information embedded in the attributes more thoroughly and thus improve the performance of naïve Bayesian classifiers. In this study, we propose a nonparametric measure to evaluate the dependence level between a continuous attribute and the class. The nonparametric measure is then used to develop a hybrid method for discretizing continuous attributes so that the accuracy of the naïve Bayesian classifier can be enhanced. This hybrid method is tested on 20 data sets, and the results demonstrate that discretizing the continuous attributes in a data set by various methods generally yields higher prediction accuracy.

14.
Multimedia Tools and Applications - In recent years, researchers have been trying to create recommender systems. There are many different recommender systems. Point of Interest (POI) is a new type...

15.
With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify network events as either normal events or attack events. Our research study claims that the Hidden Naïve Bayes (HNB) model can be applied to intrusion detection problems that suffer from high dimensionality, highly correlated features and high network data stream volumes. HNB is a data mining model that relaxes the Naïve Bayes method’s conditional independence assumption. Our experimental results show that the HNB model exhibits superior overall performance in terms of accuracy, error rate and misclassification cost compared with the traditional Naïve Bayes model, leading extended Naïve Bayes models and the Knowledge Discovery and Data Mining (KDD) Cup 1999 winner. Our model performed better than other leading state-of-the-art models, such as SVM, in predictive accuracy. The results also indicate that our model significantly improves the accuracy of detecting denial-of-service (DoS) attacks.

16.
The Naïve Bayes Classifier (NBC) is widely used for classification in machine learning. It is considered the first choice for many classification problems because of its simplicity and its classification accuracy compared with other supervised learning methods. However, for high-dimensional data such as gene expression data, it does not perform well because of two major limitations: underflow and overfitting. To address underflow, the existing approach is to add the logarithms of probabilities rather than multiplying the probabilities, and the estimate approach is used to address overfitting. In practice, however, these approaches do not perform well for gene expression data. In this paper, a novel approach is proposed to overcome these limitations using a robust function for estimating probabilities in the Naïve Bayes Classifier. The proposed method not only resolves the limitations of NBC but also improves the classification accuracy for gene expression data. The method has been tested on several high-dimensional benchmark gene expression datasets. Comparative results of the proposed Robust Naïve Bayes Classifier (R-NBC) and the existing NBC for gene expression data are also presented to highlight the effectiveness of the R-NBC. A simulation study has also been performed to show the robustness of the R-NBC over the existing approaches.
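The underflow problem and the standard logarithm workaround described above can be seen in a few lines. This is only a sketch of the existing log-sum approach that the abstract refers to, not of the proposed R-NBC; the function names and the likelihood values (20,000 hypothetical features, each contributing 0.01) are assumptions chosen for illustration.

```python
import math

def joint_product(probs):
    """Multiply per-feature likelihoods directly."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def joint_log_sum(probs):
    """Add log-likelihoods instead of multiplying likelihoods."""
    return sum(math.log(p) for p in probs)

# Hypothetical high-dimensional case: 20,000 features, likelihood 0.01 each.
likelihoods = [0.01] * 20000

product = joint_product(likelihoods)    # underflows to 0.0 in double precision
log_score = joint_log_sum(likelihoods)  # stays a finite negative number
print(product, log_score)
```

Because the logarithm is monotonic, comparing classes by their log scores gives the same ranking as comparing the products would if they could be represented.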

17.
Since approximately 90% of people with Parkinson’s disease (PD) suffer from speech disorders, including disorders of laryngeal, respiratory and articulatory function, voice analysis allows the disease to be diagnosed remotely at an early stage, more reliably and more economically. Previous work has focused on distinguishing healthy people from people with Parkinson’s disease (PWP). In this paper, we propose to go further with multiclass classification into three classes of Parkinson stages and healthy control. We use a dataset with 40 features; all the features are analyzed and 9 are selected to classify PWP subjects into four classes, based on the Unified Parkinson’s Disease Rating Scale (UPDRS). Various classifiers are used and compared to find out which one gives the best results. Results show that the subspace discriminant reaches more than 93% overall classification accuracy.

18.
The abundance of unlabelled data alongside limited labelled data has provoked significant interest in semi-supervised learning methods. “Naïve labelling” refers to the following simple strategy for using unlabelled data in on-line classification. A new data point is first labelled by the current classifier and then added to the training set together with the assigned label. The classifier is updated before seeing the subsequent data point. Although the danger of a run-away classifier is obvious, versions of naïve labelling are pervasive in on-line adaptive learning. We study the asymptotic behaviour of naïve labelling in the case of two Gaussian classes and one variable. The analysis shows that if the classifier model correctly assumes the underlying distribution of the problem, naïve labelling will drive the parameters of the classifier towards their optimal values. However, if the model is not guessed correctly, the benefits are outweighed by the instability of the labelling strategy (run-away behaviour of the classifier). The results are based on exact calculations of the point of convergence, simulations, and experiments with 25 real data sets. The findings in our study are consistent with concerns about general use of unlabelled data, flagged up in the recent literature.
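The label-then-update loop described above can be sketched as follows. This is a simplified illustration, not the authors' analysis: it assumes a nearest-mean rule for two classes in one variable (equal variances and priors), and the function name `naive_labelling`, the starting means and the simulated stream are all hypothetical.

```python
import random

def naive_labelling(stream, mean0, mean1, n0=1, n1=1):
    """Label each point with the current nearest-mean rule, then update that class mean."""
    for x in stream:
        if abs(x - mean0) <= abs(x - mean1):
            n0 += 1
            mean0 += (x - mean0) / n0   # running mean of points assigned to class 0
        else:
            n1 += 1
            mean1 += (x - mean1) / n1   # running mean of points assigned to class 1
    return mean0, mean1

# Hypothetical unlabelled stream: equal mixture of N(-2, 1) and N(2, 1).
rng = random.Random(0)
stream = [rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
          for _ in range(5000)]

# Start from rough initial estimates and let the classifier label its own data.
m0, m1 = naive_labelling(stream, mean0=-1.0, mean1=1.0)
print(m0, m1)
```

No true labels are ever used after initialization, which is why a poorly matched model or a bad starting point can let the estimates drift.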

19.
Biometric-based approaches, including keystroke dynamics on keyboards, mice, and mobile devices, have incorporated machine learning algorithms to learn users’ typing behavior for authentication systems. Among the machine learning algorithms, one-class naïve Bayes (ONENB) has been shown to be effective when it is applied to anomaly tests; however, there have been few studies on applying the ONENB algorithm to keystroke dynamics-based authentication. We applied the ONENB algorithm to calculate the likelihood of attributes in keystroke dynamics data. Additionally, we propose the speed inspection in typing skills (SITS) algorithm, designed from the observation that every person has a different typing speed on specific keys. These specific characteristics, also known as the keystroke’s index order, can be used as essential patterns for authentication systems to distinguish between a genuine user and an imposter. To further evaluate the effectiveness of the SITS algorithm and examine the quality of each attribute type (e.g., dwell time and flight time), we investigated the influence of attribute types on the keystroke’s index order. From the experimental results of the proposed algorithms and their combination, we observed that the shortest/longest time attributes and separation of the attributes are useful for enhancing the performance of the proposed algorithms.

20.
The prior distribution of an attribute in a naïve Bayesian classifier is typically assumed to be a Dirichlet distribution, and this is called the Dirichlet assumption. The variables in a Dirichlet random vector can never be positively correlated and must have the same confidence level as measured by normalized variance. Both the generalized Dirichlet and the Liouville distributions include the Dirichlet distribution as a special case. These two multivariate distributions, also defined on the unit simplex, are employed to investigate the impact of the Dirichlet assumption in naïve Bayesian classifiers. We propose methods to construct appropriate generalized Dirichlet and Liouville priors for naïve Bayesian classifiers. Our experimental results on 18 data sets reveal that the generalized Dirichlet distribution has the best performance among the three distribution families. Not only is the Dirichlet assumption inappropriate, but also forcing the variables in a prior to be all positively correlated can deteriorate the performance of the naïve Bayesian classifier.
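For readers unfamiliar with what the Dirichlet assumption implies in practice, the sketch below shows the standard conjugate update: with a Dirichlet prior on the value probabilities of a discrete attribute and observed counts, the posterior mean of each probability is (count + alpha) / (total count + sum of alphas). This covers only the ordinary Dirichlet case, not the generalized Dirichlet or Liouville priors studied in the abstract; the function name and the example numbers are hypothetical.

```python
import math

def dirichlet_posterior_mean(counts, alphas):
    """Posterior mean of attribute-value probabilities under a Dirichlet prior."""
    total = sum(counts) + sum(alphas)
    return [(n + a) / total for n, a in zip(counts, alphas)]

# Hypothetical attribute with three values observed 3, 0 and 1 times in one class,
# with a uniform Dirichlet(1, 1, 1) prior (equivalent to Laplace smoothing).
estimates = dirichlet_posterior_mean([3, 0, 1], [1.0, 1.0, 1.0])
print(estimates)
```

The unseen value still receives a nonzero probability, which is the practical reason a prior is placed on the attribute in the first place.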
