Found 20 similar documents; search took 15 ms
1.
Extended Naive Bayes classifier for mixed data (cited 2 times: 0 self-citations, 2 by others)
Chung-Chian Hsu Yan-Ping Huang Keng-Wei Chang 《Expert systems with applications》2008,35(3):1080-1083
The naive Bayes induction algorithm is very popular in the classification field. The traditional method for dealing with numeric data is to discretize numeric attribute values into symbols, and the choice among distinct discretization criteria has a significant effect on performance. Moreover, several recent studies have employed the normal distribution to handle numeric data, but using only a single value to estimate the population easily leads to incorrect estimates. As a result, naive Bayes classifiers have had limited success in classifying mixed data. In this paper, we propose a classification method, Extended Naive Bayes (ENB), which is capable of handling mixed data. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT and MLP.
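To make the mixed-data setting concrete, here is a minimal sketch (not the authors' ENB algorithm itself) of a naive Bayes classifier that handles numeric attributes with per-class Gaussian estimates and categorical attributes with smoothed frequency counts; the function names and the Laplace-smoothing choice are illustrative:

```python
import math
from collections import defaultdict

def train_mixed_nb(rows, labels):
    """Fit a naive Bayes model over rows mixing categorical and numeric
    attributes: frequency counts for categorical columns, a per-class
    Gaussian (mean, variance) for numeric columns."""
    classes = set(labels)
    model, n = {}, len(labels)
    for c in classes:
        subset = [r for r, l in zip(rows, labels) if l == c]
        cols = []
        for j in range(len(rows[0])):
            vals = [r[j] for r in subset]
            if isinstance(vals[0], (int, float)):
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
                cols.append(('num', mu, var))
            else:
                counts = defaultdict(int)
                for v in vals:
                    counts[v] += 1
                cols.append(('cat', counts, len(vals)))
        model[c] = (len(subset) / n, cols)
    return model

def predict_mixed_nb(model, row):
    """Return the class with the highest log-posterior under the model."""
    best, best_lp = None, -math.inf
    for c, (prior, cols) in model.items():
        lp = math.log(prior)
        for x, col in zip(row, cols):
            if col[0] == 'num':
                _, mu, var = col
                lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            else:
                _, counts, total = col
                lp += math.log((counts[x] + 1) / (total + len(counts) + 1))  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```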
2.
Applying machine-learning theory and methods to weather forecasting, this work uses a naive Bayes classifier, grounded in Bayesian inference, to cast rainfall prediction as a classification problem. It proposes a naive Bayes algorithm for rainfall prediction, learn-and-classify-rainfall: each predictor and the prediction target are discretized into levels according to meteorological grading standards; historical weather data serve as the training set, from which the prior probability of each target level and the conditional probabilities of the predictors are learned; the classifier then outputs the maximum a posteriori hypothesis as the predicted value. The algorithm is robust and easy to implement, and proves practical and effective: experiments show that its prediction accuracy is clearly higher than that of regression analysis, cluster analysis and the other methods currently used in short-term climate prediction. It also offers guidance on the predictor-selection problem that has long troubled meteorologists.
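The discretize-then-count scheme the abstract describes can be sketched as follows; the cut points and the Laplace smoothing are hypothetical stand-ins for the meteorological grading standards and estimation details of the actual learn-and-classify-rainfall algorithm:

```python
from collections import Counter, defaultdict

def discretize(value, cutpoints):
    """Map a numeric value to a level index using ordered cut points,
    e.g. rainfall bands from a grading standard (hypothetical values)."""
    for i, c in enumerate(cutpoints):
        if value < c:
            return i
    return len(cutpoints)

def learn_and_classify(train_x, train_y, query):
    """MAP prediction: priors and per-attribute conditional probabilities
    are estimated by counting on the discretized training set."""
    priors = Counter(train_y)
    n = len(train_y)
    cond = defaultdict(Counter)   # (attribute index, target) -> value counts
    for xs, y in zip(train_x, train_y):
        for j, v in enumerate(xs):
            cond[(j, y)][v] += 1
    best, best_p = None, -1.0
    for y, cnt in priors.items():
        p = cnt / n
        for j, v in enumerate(query):
            p *= (cond[(j, y)][v] + 1) / (cnt + 2)   # Laplace smoothing
        if p > best_p:
            best, best_p = y, p
    return best
```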
3.
A leaders set derived with the leaders clustering method can be used in place of a large training set to reduce the computational burden of a classifier. Recently, we showed that a leader-based classifier, the weighted k-nearest leader-based classifier, is efficient and fast. However, there exists some uncertainty in calculating the relative importance (weight) of the prototypes. This paper proposes a generalization of the earlier k-nearest leader-based classifier in which a novel soft-computing approach is used to resolve that uncertainty. Combined principles of rough set theory and fuzzy set theory are used to analyze the proposed method. The proposed method, called the rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC), uses a two-level hierarchy of prototypes along with their relative importance. On some standard data sets, RF-wk-NLC is shown to improve performance, and it is compared with the earlier related methods.
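The leaders clustering step that produces the prototype set can be sketched in a few lines; the threshold tau and the distance function are user-supplied, and this shows only the single-pass condensation idea, not the weighted classifier built on top of it:

```python
def leaders(points, tau, dist):
    """Single-pass leaders clustering: a point joins the first existing
    leader within distance tau, otherwise it becomes a new leader.  The
    leader set is a condensed stand-in for the full training set."""
    lead = []
    for p in points:
        for q in lead:
            if dist(p, q) <= tau:
                break          # p is absorbed by leader q
        else:
            lead.append(p)     # no leader close enough: p leads a new cluster
    return lead
```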
4.
5.
Due to its simplicity, efficiency and efficacy, naive Bayes (NB) continues to be one of the top 10 data mining algorithms. Many improved approaches to NB have been proposed to weaken its conditional independence assumption. However, there has been little work to date on instance-weighting filter approaches to NB. In this paper, we propose a simple, efficient, and effective instance-weighting filter approach to NB. We call it attribute (feature) value frequency-based instance weighting and denote the resulting improved model attribute value frequency weighted naive Bayes (AVFWNB). In AVFWNB, the weight of each training instance is defined as the inner product of its attribute value frequency vector and the attribute value number vector. Experimental results on 36 widely used classification problems show that AVFWNB significantly outperforms NB while maintaining the computational simplicity that characterizes NB.
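A literal reading of the weight definition in the abstract can be sketched as follows; the paper's exact normalization may differ, so treat this only as an illustration of the inner-product idea:

```python
from collections import Counter

def avf_weights(rows):
    """Per-instance weights per the abstract's definition: the inner
    product of (a) the instance's attribute value frequency vector --
    how often each of its values occurs in the training data -- and
    (b) the attribute value number vector -- how many distinct values
    each attribute takes.  Any normalization used in the paper is
    omitted here."""
    m = len(rows[0])
    freq = [Counter(r[j] for r in rows) for j in range(m)]   # per-attribute value counts
    nvals = [len(freq[j]) for j in range(m)]                 # distinct values per attribute
    return [sum(freq[j][r[j]] * nvals[j] for j in range(m)) for r in rows]
```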
6.
An intrusion detection method based on an improved Bayesian algorithm (cited 2 times: 0 self-citations, 2 by others)
The Bayesian classification model is a powerful tool for classifying attack types in intrusion detection. Building on previous work, this paper proposes an improved Bayesian model that relaxes the strong independence assumption of the naive Bayes algorithm and improves the classification accuracy of intrusion detection; the algorithm is validated and its performance analyzed through experiments. Directions for further research are also indicated.
7.
In automatic text classification systems, feature selection is an effective way to reduce the dimensionality of text vectors, and the naive Bayes model is a simple and efficient text classification model. This paper proposes a new evaluation function, the mutual-information difference, and applies it to an improved Bayesian text classification model, the "stump network". Results show that the method classifies well on most data sets.
8.
Obstacle avoidance for mobile robots based on a Bayesian classifier (cited 2 times: 1 self-citation, 2 by others)
Bayesian classification is a statistical classification method based on Bayes' theorem: it combines prior information with sample information to compute posterior probabilities. This paper presents a Bayesian-classification approach to obstacle avoidance for mobile robots. Images captured by a camera are segmented to extract obstacle edge information, and the left and right boundaries of each obstacle are calibrated from the resulting contour. The operation of a naive Bayes classifier when the prior probabilities are unknown is described. The classifier assigns class labels to unlabeled samples, yielding the robot's motion control commands. Experimental results demonstrate the effectiveness and feasibility of the method.
9.
Toh Koon Charlie Neo Dan Ventura 《Pattern recognition letters》2012,33(1):92-102
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.
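The distance-weighted voting that the boosted ensemble locally modifies can be sketched as a plain weighted k-NN; the 1/(d + eps) weighting below is one common choice, not necessarily the paper's:

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k=3):
    """Distance-weighted k-NN vote: each of the k nearest neighbours
    contributes 1/(d + eps) to its class, so closer neighbours count
    more.  'train' is a list of (point, label) pairs."""
    neigh = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = defaultdict(float)
    for x, y in neigh:
        votes[y] += 1.0 / (math.dist(x, query) + 1e-9)
    return max(votes, key=votes.get)
```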
10.
A k-means clustering algorithm for designing binary tree classifiers is introduced for the classification of cervical cells. At each nonterminal node of the designed binary tree classifier, two sets of effective features are selected: one based on the Bhattacharyya distance, a measure of the separability between two classes; the other based on classification accuracy. The classification results show the effectiveness of the features and of the binary tree classifier used.
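The Bhattacharyya-distance feature score used at each nonterminal node has a closed form when each class is modeled by a univariate Gaussian; this standard formula is a simplification of whatever class models the paper actually fits:

```python
import math

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians, the
    kind of two-class separability measure used to pick features at a
    tree node: larger means the classes are easier to separate."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
```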
11.
WANG Jun 《数字社区&智能家居》2008,(7)
China's current curriculum system follows the principles of broad majors, a wide disciplinary foundation, and multiple tracks. Faced with so many courses, however, students are unclear about the relationships among them and are often at a loss when choosing a track of professional electives. This paper makes full use of prior knowledge from earlier cohorts' elective-track choices to build a classification model suited to track selection; it classifies and predicts for students who need guidance based on their academic records, helping them choose a track of professional electives sensibly.
12.
Stochastic volatility (SV) models usually assume that the distribution of asset returns conditional on the latent volatility is normal. This article analyzes SV models with a mixture-of-normal distributions in order to compare them with other heavy-tailed distributions such as the Student-t distribution and the generalized error distribution (GED). A Bayesian method via Markov-chain Monte Carlo (MCMC) techniques is used to estimate parameters, and Bayes factors are calculated to compare the fit of the distributions. The method is illustrated on daily data from the Yen/Dollar exchange rate and the Tokyo stock price index (TOPIX). According to the Bayes factors, we find that while the t distribution fits the TOPIX better than the normal, the GED and the normal mixture, the mixture-of-normal distributions give a better fit to the Yen/Dollar exchange rate than the other models. The effects of the specification of the error distribution on Bayesian confidence intervals of future returns are also examined. A comparison of SV with GARCH models shows that there are cases in which the SV model with the normal distribution is less effective at capturing leptokurtosis than a GARCH model with heavy-tailed distributions.
13.
This article outlines a Bayesian bootstrap method for case-based imprecision estimates in Bayes classification. We argue that this approach is an important complement to methods such as k-fold cross-validation that are based on overall error rates. It is shown how case-based imprecision estimates may be used to improve Bayes classifiers under asymmetric loss functions. Other approaches to making use of case-based imprecision estimates are also discussed and illustrated on two real-world data sets. Contrary to the common assumption, Bayesian bootstrap simulations indicate that the uncertainty associated with the output of a Bayes classifier is often far from normally distributed.
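The core of the Bayesian bootstrap, reweighting the observations with Dirichlet(1,...,1) weights rather than resampling them with replacement, can be sketched as follows; tracking the reweighted mean is illustrative only, since the article applies the idea to classifier outputs:

```python
import random

def bayesian_bootstrap_means(data, draws=1000, seed=0):
    """Bayesian bootstrap: each replicate draws Dirichlet(1,...,1)
    weights (normalized Exp(1) draws) over the observations and records
    the reweighted mean, giving a posterior sample for the mean."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        g = [rng.expovariate(1.0) for _ in data]   # Exp(1) draws
        s = sum(g)
        out.append(sum(w / s * x for w, x in zip(g, data)))
    return out
```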
14.
To address the shortcomings of intrusion detection systems in real-time detection and adaptability, an improved Bayesian classifier is proposed. A sliding-window technique is introduced to improve the real-time performance of intrusion detection, and a purpose-built performance regulator dynamically sets the classifier's parameters, making the intrusion detection system adaptive. The improved Bayesian classifier effectively achieves real-time, proactive and adaptive intrusion detection.
15.
Sotirios P. Chatzis, Dimitrios I. Kosmopoulos 《Pattern recognition》2011,44(2):295-306
The Student's-t hidden Markov model (SHMM) has recently been proposed as an outlier-robust form of conventional continuous-density hidden Markov models, trained by means of the expectation-maximization algorithm. In this paper, we derive a tractable variational Bayesian inference algorithm for this model. Our approach provides an efficient and more robust alternative to EM-based methods, tackling their proneness to singularities and overfitting while allowing the optimal model size to be determined automatically without cross-validation. We highlight the superiority of the proposed model over the competition using synthetic and real data, and demonstrate the merits of our methodology in applications from diverse research fields such as human-computer interaction, robotics and semantic audio analysis.
16.
Classification of tea leaves and tea stems based on a minimum-risk Bayesian classifier (cited 1 time: 0 self-citations, 1 by others)
In current tea production and processing, automated sorting of tea leaves and stems remains immature: the accuracy and efficiency of sorting machinery fall short of expectations, and a second round of manual sorting is required, which greatly increases time and labor costs. From digital images of tea leaves and stems captured with a digital camera, color and shape features are extracted after preprocessing and modeled with a multivariate Gaussian; a minimum-risk Bayesian classifier then performs the classification. Experiments confirm that the minimum-risk Bayesian classification method is feasible and achieves good classification results.
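The minimum-risk decision rule at the heart of the classifier can be sketched independently of the image features; the loss values in the test below are hypothetical, chosen to show how an asymmetric loss can override the maximum-posterior class:

```python
def min_risk_decision(posteriors, loss):
    """Minimum-risk Bayes rule: pick the action whose expected loss,
    averaged over the class posteriors, is smallest.  loss[a][c] is the
    cost of taking action a when the true class is c."""
    best, best_risk = None, float('inf')
    for a, row in loss.items():
        r = sum(row[c] * p for c, p in posteriors.items())   # expected loss of action a
        if r < best_risk:
            best, best_risk = a, r
    return best
```

With a symmetric (0-1) loss this reduces to the ordinary maximum-posterior rule; the asymmetric loss is what lets the sorter err on the cheap side.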
17.
Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular solution is to separate stream data into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single "best" classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values: each attribute partitions the evaluation set into disjoint subsets, and the classification accuracy of each base classifier is evaluated on these subsets. Given a test instance, its attribute values determine the subsets to which similar evaluation-set instances belong, and the classifier with the highest classification accuracy on those subsets is selected to classify the instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears promising for mining data streams with dramatic concept drift or a significant amount of noise, where the base classifiers are likely to conflict or to have low confidence.
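A simplified version of the selection step can be sketched as follows; pooling all matching evaluation examples into one accuracy estimate is a simplification of the per-attribute partitioning the paper describes:

```python
def dcs_select(classifiers, eval_set, instance):
    """Dynamic classifier selection in the spirit of the abstract:
    gather the evaluation examples that share at least one attribute
    value with the test instance, score every base classifier on that
    pooled subset, and let the most accurate one classify the instance."""
    pool = [(x, y) for x, y in eval_set
            if any(xv == iv for xv, iv in zip(x, instance))]
    if not pool:            # no similar examples: fall back to the whole set
        pool = eval_set
    def acc(clf):
        return sum(clf(x) == y for x, y in pool) / len(pool)
    best = max(classifiers, key=acc)
    return best(instance)
```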
A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining,
pp 305–312, Brighton, UK
Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with
Microsoft Research Asia, Beijing, China, where he was working on content-based image retrieval with relevance feedback. From
2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN.
He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT.
His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval.
Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings.
Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial
Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems,
and Web information exploration. He has published extensively in these areas in various journals and conferences, including
IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the
Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on
Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP).
He is the 2004 ACM SIGKDD Service Award winner.
Ying Yang received her Ph.D. in Computer Science from Monash University, Australia, in 2003. Following academic appointments at the University of Vermont, USA, she is currently a Research Fellow at Monash University, Australia. Dr. Yang is recognized for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters on adaptive learning, proactive mining, noise cleansing and discretization. Contact her at yyang@mail.csse.monash.edu.au.
18.
Diagnosability plays an important role in the reliability of multiprocessor systems. A strongly t-diagnosable system is (t+1)-diagnosable except when all of the neighbors of a node are simultaneously faulty. In this paper, we discuss in depth the diagnosability properties of t-regular and t-connected networks under the comparison model. We show that a t-regular and t-connected multiprocessor system with at least 2t+6 nodes, for t ≥ 4, is strongly t-diagnosable under the comparison model if the following two conditions hold: (1) the system is triangle-free, and (2) each pair of distinct nodes in the system has at most t−2 common neighbors.
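The two sufficient conditions can be checked directly on an adjacency-set representation of the network; this verifies only the stated conditions, not strong t-diagnosability itself:

```python
def satisfies_conditions(adj, t):
    """Check the two conditions from the abstract on an undirected graph
    given as {node: set(neighbours)}: (1) the graph is triangle-free,
    and (2) every pair of distinct nodes has at most t-2 common
    neighbours."""
    nodes = list(adj)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            common = adj[u] & adj[v]
            if len(common) > t - 2:
                return False
            if v in adj[u] and common:   # an edge u-v plus a common neighbour forms a triangle
                return False
    return True
```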
19.
A simple learning algorithm for maximal margin classifiers (equivalently, support vector machines with a quadratic cost function) is proposed. We build our iterative algorithm on top of the Schlesinger-Kozinec algorithm (S-K-algorithm) from 1981, which finds a maximal margin hyperplane with a given precision for separable data. We generalize the S-K-algorithm (i) to the non-linear case using kernel functions and (ii) to non-separable data. The memory requirement is linear in the size of the data, which allows the proposed algorithm to be used for large training problems. The resulting algorithm is simple to implement and, as the experiments show, competitive with state-of-the-art algorithms. A Matlab implementation of the algorithm is available. We tested the algorithm on the problem of recognizing poor-quality numerals.