Similar literature
20 similar documents found.
1.
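Wang Ying, Wang Hao, Yu Kui, Yao Hongliang. Computer Science (《计算机科学》), 2012, 39(1): 185-189
Existing Bayesian network classifiers based on node ordering ignore the information shared between the variables already selected in the node sequence and the class label, which makes it difficult to raise classification accuracy further. To address this problem, a simple and efficient learning algorithm for Bayesian network classifiers is proposed: the L1-regularized Bayesian network classifier (L1-BNC). By adjusting the constraint value in the Lasso method, the algorithm makes full use of the regression residual information and combines it with the information between the already selected variables in the node sequence and the class label to form a high-quality topological ordering of the variables (the L1 regularization path). Based on this ordering, the K2 algorithm is used to generate a high-quality Bayesian network classifier. Experiments show that L1-BNC outperforms existing Bayesian network classifiers in classification accuracy. L1-BNC is also compared with the SVM, KNN, and J48 classification algorithms, and it outperforms them on most data sets.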

2.
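Wang Zhongfeng, Wang Zhihai. Chinese Journal of Computers (《计算机学报》), 2012, 35(2): 2364-2374
Bayesian network classifiers trained with a discriminative learning strategy usually achieve high accuracy, but the performance of discriminative parameter learning algorithms is limited on network structures that contain redundant edges. To further improve the classification accuracy of Bayesian network classifiers in practical applications, this paper quantitatively describes the relationship between the network structure and the true distribution of the data variables, and proposes a forest-structured Bayesian network classifier without redundant edges together with its learning algorithm, FAN (Forest-Augmented Naive Bayes Algorithm). FAN uses the partial derivatives of the conditional log-likelihood function to optimize structure learning. Experimental results show that commonly used restricted Bayesian network classifiers usually contain some redundant edges, which tend to degrade the performance of discriminative parameter learning algorithms; that the forest-structured Bayesian network classifier reduces the redundant edges in the structure and is better suited to parameters trained with a discriminative learning strategy; and that the FAN algorithm, which applies the partial derivatives of the conditional log-likelihood function, improves classification accuracy on most of the experimental data sets.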

3.
Learning a Bayesian network classifier usually requires continuous attributes to be discretized. Discretization, however, may cause information loss, introduce noise, and reduce sensitivity to how changes in the attributes affect the class variable. In this paper, we use a Gaussian kernel function with a smoothing parameter to estimate the density of attributes. A Bayesian network classifier with continuous attributes is built by dependency extension of naive Bayes classifiers. We also analyze the information that each attribute provides about the class as a basis for this dependency extension. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function achieve good classification accuracy compared with other approaches when dealing with continuous attributes.
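The kernel density idea in this abstract can be illustrated with a minimal sketch. It assumes plain naive Bayes (no dependency extension) with a one-dimensional Gaussian kernel estimate per attribute and class; the function names, the data layout, and the default smoothing value `h=0.5` are hypothetical choices for illustration, not the paper's method.

```python
import math

def gaussian_kernel_density(x, samples, h):
    """Kernel estimate of p(x) from 1-D samples, Gaussian kernel, smoothing parameter h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def classify(x, train, h=0.5):
    """Naive Bayes over continuous attributes.

    train maps each class label to a list of attribute tuples (assumed layout).
    Returns the label with the highest log posterior score.
    """
    total = sum(len(rows) for rows in train.values())
    best_label, best_score = None, -math.inf
    for label, rows in train.items():
        score = math.log(len(rows) / total)  # log prior of the class
        for i, xi in enumerate(x):
            column = [row[i] for row in rows]
            # Small constant guards against log(0) far from all samples.
            score += math.log(gaussian_kernel_density(xi, column, h) + 1e-300)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A smaller `h` follows the training samples more closely, while a larger `h` smooths the estimate; how the paper selects the smoothing parameter is not stated in the abstract.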

4.
Numerous models have been proposed to reduce the classification error of naïve Bayes by weakening its attribute independence assumption, and some have demonstrated remarkable error performance. Considering that ensemble learning is an effective method of reducing the classification error of a classifier, this paper proposes a double-layer Bayesian classifier ensembles (DLBCE) algorithm based on frequent itemsets. DLBCE constructs a double-layer Bayesian classifier (DLBC) for each frequent itemset that the new instance contains and finally combines all the classifiers, assigning different weights to different classifiers according to the conditional mutual information. The experimental results show that the proposed algorithm outperforms other outstanding algorithms.

5.
Boosted Bayesian network classifiers   Cited by: 2 in total (self-citations: 0, citations by others: 2)
The use of Bayesian networks for classification problems has received a significant amount of recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criterion (data likelihood) and the actual goal of classification (label prediction accuracy). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present boosted Bayesian network classifiers, a framework to combine discriminative data-weighting with generative training of intermediate models. We show that boosted Bayesian network classifiers encompass the basic generative models in isolation, but improve their classification performance when the model structure is suboptimal. We also demonstrate that structure learning is beneficial in the construction of boosted Bayesian network classifiers. On a large suite of benchmark data-sets, this approach outperforms generative graphical models such as naive Bayes and TAN in classification accuracy. Boosted Bayesian network classifiers have comparable or better performance in comparison to other discriminatively trained graphical models including ELR and BNC. Furthermore, boosted Bayesian networks require significantly less training time than the ELR and BNC algorithms.

6.
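Constructing an exact Bayesian network classifier has been proven to be NP-hard. A general Bayesian network classifier based on a predator-prey particle swarm optimization (PSO) algorithm is proposed. It effectively avoids the direct impact that attribute reduction during data preprocessing has on classification results, and it enables accurate learning and search of the Bayesian network structure. In addition, the proposed classifier is applied to employment prediction analysis for higher vocational colleges, and it is built and validated on the Weka platform. Comparative experiments against several other Bayesian network classifiers show that the proposed classifier performs better.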

7.
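As a probabilistic graphical model, the general multi-dimensional Bayesian network classifier (GMBNC) is a reduced form of the Bayesian network (BN) for multi-dimensional classification that contains only the local structure useful for prediction. The traditional way to obtain a GMBNC is to learn the global BN first. To avoid global search, a structure learning algorithm that performs only local search, DOS-GMBNC, is proposed. The algorithm inherits the main framework of the previously proposed IPC-GMBNC algorithm and dynamically adjusts the search order based on further mined structural topology information, so as to avoid useless computation. Experimental studies verify the effectiveness and efficiency of DOS-GMBNC: (1) the quality of the network it outputs matches that of IPC-GMBNC and is better than that of the classical PC algorithm; (2) on a problem with 100 nodes, it saves nearly 89% and 45% of the computation relative to the PC and IPC-GMBNC algorithms, respectively.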

8.
In the information retrieval framework, there are problems where the goal is to recover objects of a particular class from large sets of unlabelled objects. In some of these problems, only examples from the class we want to recover are available. For such problems, the machine learning community has developed algorithms that are able to learn binary classifiers in the absence of negative examples. Among them are the positive Bayesian network classifiers, algorithms that induce Bayesian network classifiers from positive and unlabelled examples. The main drawback of these algorithms is that they require some previous knowledge about the a priori probability distribution of the class. In this paper, we propose a wrapper approach to tackle learning when no such information is available, setting this probability at the optimal value in terms of the recovery of positive examples. The evaluation of classifiers in positive unlabelled learning problems is a non-trivial question. We have also worked on this problem and propose a new guiding metric, which we call the pseudo F, to be used in the search for the optimal a priori probability of the positive class. We have empirically tested the proposed metric and the wrapper classifiers on both synthetic and real-life datasets. The results obtained in this empirical comparison show that the wrapper Bayesian network classifiers provide competitive results, particularly when the actual a priori probability of the positive class is high.

9.
In this paper, we treat the problem of combining fingerprint and speech biometric decisions as a classifier fusion problem. By exploiting the specialist capabilities of each classifier, a combined classifier may yield results which would not be possible with a single classifier. The Feedforward Neural Network provides a natural choice for such data fusion as it has been shown to be a universal approximator. However, the training process remains largely a trial-and-error effort, since no learning algorithm can guarantee convergence to an optimal solution within finite iterations. In this work, we propose a network model that generates different combinations of hyperbolic functions to achieve certain approximation and classification properties. This circumvents the iterative training problem seen in neural network learning. In many decision data fusion applications, the individual classifiers or estimators to be combined have already attained a certain level of classification or approximation accuracy, so this hyperbolic functions network can be used to combine these classifiers, taking their decision outputs as the inputs to the network. The proposed hyperbolic functions network model is first applied to a function approximation problem to illustrate its approximation capability. This is followed by some case studies on pattern classification problems. The model is finally applied to combine fingerprint and speaker verification decisions, where it shows either better or comparable results with respect to several commonly used methods.

10.
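To improve the classification performance of Bayesian classifiers, and in view of the structural characteristics of Bayesian network classifiers, a discriminative parameter learning algorithm for Bayesian classifiers based on parameter ensembles, PEBNC, is proposed. The algorithm treats parameter learning of a Bayesian classifier as a regression problem and applies an additive regression model to the parameter learning of Bayesian network classifiers, thereby achieving discriminative parameter learning. Experimental results show that PEBNC clearly improves the classification accuracy of Bayesian classifiers on most of the experimental data sets. Moreover, compared with ordinary Bayesian ensemble classifiers, PEBNC does not need to store the parameters of the member classifiers, so its space complexity is greatly reduced.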

11.
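Constructing Bayesian classifiers in the Matlab language   Cited by: 2 in total (self-citations: 1, citations by others: 2)
Text classification is the foundation and core of text mining, and building the classifier is the key to text classification; Bayesian networks can be used to construct classifiers with good classification performance. In this paper, two kinds of classifiers are constructed in Matlab: the naive Bayes classifier (NBC), and the TANC, which is built using a mutual information measure and a conditional mutual information measure. The classifiers are validated on standard data sets downloaded from UCI. The experimental results show that the performance of the constructed classifiers is on the whole somewhat higher than that reported in the literature, which demonstrates their effectiveness and correctness. The authors optimize the constructed classifiers and apply them to text classification.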
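The conditional mutual information measure mentioned in this abstract can be sketched as follows. This is an illustrative estimate of I(X; Y | C) from discrete samples using empirical frequencies, written in Python rather than the paper's Matlab; the function name and argument layout are assumptions, and the abstract does not say how the measure is then used to build the TANC structure.

```python
from collections import Counter
import math

def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length sequences of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y,c) p(c) / (p(x,c) p(y,c)) ], with the 1/n factors cancelled.
        cmi += (count / n) * math.log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi
```

A value near zero suggests that the two attributes are conditionally independent given the class, while larger values indicate a dependency that a structure such as a tree over the attributes could capture.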

12.
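A traffic classification framework based on ensemble clustering   Cited by: 1 in total (self-citations: 0, citations by others: 1)
Lu Gang, Yu Xiangzhan, Zhang Hongli, Guo Ronghua. Journal of Software (《软件学报》), 2016, 27(11): 2870-2883
Traffic classification is the foundation of, and the key to, optimizing network quality of service. Machine learning algorithms classify traffic using statistical features of data flows, which is important for identifying encrypted and proprietary protocol traffic. However, feature bias and class imbalance are two major challenges facing machine-learning-based traffic classification. Feature bias means that some flow statistical features raise the identification accuracy of some applications while lowering that of others. Class imbalance means that machine learning traffic classifiers identify applications with few samples less accurately. To solve these problems, a traffic classification framework based on ensemble clustering (TCFEC) is proposed. TCFEC consists of multiple base classifiers, each based on clustering in a different feature subspace, and an optimal decision component, and it can improve the accuracy of traffic classification. Specifically, compared with traditional machine learning traffic classifiers, TCFEC improves average flow accuracy by up to 5% and byte accuracy by up to 6%.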

13.
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed–crop competitiveness, using as input categorical variables for the total density of weeds and the corresponding proportions of narrow and broad-leaved weeds. The inferred categorical variable values for the weed–crop competitiveness, along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage, are then used as input for a second Bayesian network classifier to infer categorical variable values for the risk of infestation. Weed biomass and yield loss data samples are used to learn, in a supervised fashion, the probability relationships among the nodes of the first and second Bayesian classifiers, respectively. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naïve Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed.

14.
Bayesian networks are graphical models that describe dependency relationships between variables, and are powerful tools for studying probability classifiers. At present, the causal Bayesian network learning method is used in constructing Bayesian network classifiers, while the contribution of attributes to the class is overlooked. In this paper, a Bayesian network designed specifically for classification, the restricted Bayesian classification network, is proposed. Combining dependency analysis between variables, classification accuracy evaluation criteria and a search algorithm, a learning method for restricted Bayesian classification networks is presented. Experiments and analysis are carried out using data sets from the UCI machine learning repository. The results show that the restricted Bayesian classification network is more accurate than other well-known classifiers.

15.
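Multi-dimensional classification assigns a data instance to classes along multiple dimensions according to its feature vector, and it has broad application prospects. When the model of a multi-dimensional classification algorithm is learned, massive training data mean that accurate classification algorithms need very long training times. To improve the efficiency of multi-dimensional classification while maintaining high prediction accuracy, this paper proposes a multi-dimensional classification learning method based on Bayesian networks. First, the multi-dimensional classification problem is formulated as a conditional probability distribution problem. Second, a conditional tree Bayesian network model is built according to the dependencies among the class vectors. Finally, the structure and parameters of the conditional tree Bayesian network model are learned from the training data set, and a multi-dimensional classification prediction algorithm is proposed. Extensive experiments on real data sets show that, compared with MMOC, currently the best multi-dimensional classification algorithm, the proposed method reduces training time by two orders of magnitude while maintaining high accuracy. The proposed method is therefore better suited to multi-dimensional classification applications on massive data.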

16.
An external context such as weather conditions or lighting influences classification results, but it is frequently omitted from the mathematical model of the problem at hand. Our aim is to propose a mathematical model which extends the Bayesian problem of pattern recognition by incorporating external context variables. They are introduced as functions which influence the parameters of the class distributions. We prove that context variables influence the shape or the position of the optimal class separating surface without enlarging the dimensionality of the pattern space. Thus, one can treat the proposed extended Bayesian model as a fusion of patterns and external context variables, embedded into the same pattern space. Then, learning algorithms for neural network classifiers are proposed which take context variables into account.

17.
Machine learning techniques have been widely applied to solve the classification problem of high-dimensional and complex data in the field of bioinformatics. Among them, the Bayesian regularized neural network (BRNN) has become one of the popular choices due to its robustness and ability to avoid overfitting. On the other hand, the Bayesian approach applied to neural network training imposes a computational burden and increases time complexity. This restricts the use of BRNN in an on-line machine learning system. In this article, a Bayesian regularized neural network decision tree (BrNdT) ensemble model is proposed to combat the high computational time complexity of a classifier model. The key idea behind the proposed ensemble methodology is to weigh and combine several individual classifiers and apply a majority voting decision scheme to obtain an efficient classifier which outperforms each one of them. The simulation results show that the proposed method achieves a significant reduction in time complexity and maintains high accuracy compared with other conventional techniques.

18.
Twitter and Reddit are two of the most popular social media sites used today. In this paper, we study the use of machine learning and WordNet-based classifiers to generate an interest profile from a user’s tweets and use this to recommend loosely related Reddit threads which the reader is most likely to be interested in. We introduce a genre classification algorithm using a similarity measure derived from the WordNet lexical database for English to label genres for nouns in tweets. The proposed algorithm generates a user’s interest profile from their tweets based on a reference taxonomy of genres derived from the genre-tagged Brown Corpus augmented with a technology genre. The top K genres of a user’s interest profile can be used for recommending subreddit articles in those genres. Experiments using real-life test cases collected from Twitter have been carried out to compare the genre classification performance of the WordNet classifier and machine learning classifiers such as SVM, Random Forests, and an ensemble of Bayesian classifiers. Empirically, we have obtained similar results from the two different approaches given a sufficient number of tweets. It seems that machine learning algorithms as well as the WordNet ontology are viable tools for developing a recommendation engine based on genre classification. One advantage of the WordNet approach is its simplicity: no learning is required. However, the WordNet classifier tends to have poor precision on users with very few tweets.

19.
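Generative and discriminative approaches are two different frameworks for solving classification problems, each with its own strengths. To exploit the strengths of both, this paper proposes a linear hybrid classification model that mixes a generative and a discriminative classifier, and designs a genetic-algorithm-based learning algorithm for the model. The algorithm treats learning the mixing parameters of the linear hybrid classifier as an optimization problem: taking the posterior probabilities that the two base classifiers assign to each training instance as its data, it uses a genetic algorithm to find the optimal values of the mixing parameters. Experimental results show that, on most data sets, the classification accuracy of the generative-discriminative linear hybrid classifier is better than or close to that of the better of its two base classifiers.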
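A linear mixture of two classifiers' posteriors might look like the sketch below. It assumes a single mixing weight `lam` and, for brevity, swaps the paper's genetic algorithm for a plain grid search over `lam`; all function names and the data layout (one posterior vector per training instance, integer class labels) are hypothetical.

```python
def mix_posteriors(p_gen, p_disc, lam):
    """Linear mixture: lam * generative posterior + (1 - lam) * discriminative posterior."""
    return [lam * g + (1.0 - lam) * d for g, d in zip(p_gen, p_disc)]

def accuracy(lam, gen_post, disc_post, labels):
    """Fraction of instances whose mixed-posterior argmax equals the true label."""
    correct = 0
    for g, d, y in zip(gen_post, disc_post, labels):
        mixed = mix_posteriors(g, d, lam)
        if max(range(len(mixed)), key=mixed.__getitem__) == y:
            correct += 1
    return correct / len(labels)

def fit_mixing_weight(gen_post, disc_post, labels, steps=20):
    """Grid search over lam in [0, 1]; a stand-in for the genetic algorithm in the abstract."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda lam: accuracy(lam, gen_post, disc_post, labels))
```

With one scalar weight a grid search is enough for illustration; a genetic algorithm becomes more attractive when there are several mixing parameters, which may be the case in the paper.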

20.
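Zhang Dan, Yang Bin, Zhang Ruiyu. Remote Sensing Information (《遥感信息》), 2009, (5): 41-43, 55
In remote sensing image classification, different classifiers achieve different accuracies, and the same classifier achieves different accuracies on different classes. The idea of combining multiple classifiers is to exploit the complementarity among existing classifiers: by combining different classifiers in an appropriate way so that their strengths complement one another, a better classification result than that of any single classifier can often be obtained. This paper studies how to classify images in Matlab using a minimum distance classifier, a Bayesian classifier, and a BP neural network classifier, combines the classifiers by voting for remote sensing image classification, and finally applies post-classification processing. Experimental results show that combining multiple classifiers yields higher remote sensing image classification accuracy than a single classifier.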
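The voting step can be sketched as a per-pixel majority vote over the label maps produced by the individual classifiers. This is an illustration in Python rather than the paper's Matlab; the function names, the flat list representation of a label map, and the tie-breaking rule (the earliest-listed classifier wins) are assumptions, since the abstract does not specify them.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the earliest-listed classifier's label."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def fuse_label_maps(label_maps):
    """Combine several per-pixel label lists (one per classifier) into one fused list."""
    return [majority_vote(list(votes)) for votes in zip(*label_maps)]
```

For example, `fuse_label_maps([[1, 2, 3], [1, 2, 2], [3, 2, 2]])` takes three classifiers' labels for three pixels and keeps, for each pixel, the label that most classifiers agree on.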
