
1.

A Bayesian Method for the Induction of Probabilistic Networks from Data (total citations: 108; self-citations: 3; citations by others: 108)

Gregory F. Cooper, Edward Herskovits. Machine Learning, 1992, 9(4): 309-347

This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
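The scoring idea behind this approach can be sketched as the Cooper-Herskovits (K2) local score: for node i, the product over parent configurations j of (r_i - 1)! / (N_ij + r_i - 1)! times the product over states k of N_ijk!, which assumes uniform priors on the parameters. The Python below is a minimal sketch, assuming complete discrete data stored as dicts with states coded 0..r_i - 1; the function name `k2_log_score` is hypothetical, and the structure search (node ordering, greedy parent addition) is omitted.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits (K2) score of `child` given the parent set `parents`.

    data    : list of dicts mapping variable name -> state in range(arity[name])
    child   : name of the child variable
    parents : tuple of parent variable names (may be empty)
    arity   : dict mapping variable name -> number of states r
    """
    r = arity[child]
    # N_ijk: how often parent configuration j occurs together with child state k
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: how often parent configuration j occurs
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    # Unobserved parent configurations contribute log(1) = 0, so they are skipped
    for j, n_ij in marginal.items():
        score += lgamma(r) - lgamma(n_ij + r)  # log[(r - 1)! / (N_ij + r - 1)!]
        for k in range(r):
            score += lgamma(joint[(j, k)] + 1)  # log N_ijk!
    return score
```

A higher (less negative) score favors a candidate parent set, so a greedy K2-style search would keep adding the parent that raises this score most until no addition helps.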

2.

Research on a Q-Type Clustering Algorithm Based on the Bayesian Method

余丽. 计算机与数字工程 (Computer & Digital Engineering), 2007, 35(7): 16-17

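Depending on the objects being classified, cluster analysis is divided into Q-type clustering (of samples) and R-type clustering (of variables). This paper studies a Q-type clustering algorithm based on the Bayesian method and explains its basic idea and implementation in detail. Experimental results demonstrate the feasibility of the algorithm, which offers some reference value for data mining.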

3.

Bayesian Networks for Data Mining (total citations: 80; self-citations: 0; citations by others: 80)

David Heckerman. Data Mining and Knowledge Discovery, 1997, 1(1): 79-119

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
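For the parameter-learning task mentioned above, the standard Bayesian update with a Dirichlet prior on each conditional distribution can be sketched in Python. This is a minimal sketch assuming complete discrete data and a symmetric Dirichlet(alpha) prior per parent configuration; `dirichlet_cpt` and `alpha` are illustrative names, not taken from the paper, and learning from incomplete data (for example via EM) is not shown.

```python
from collections import Counter


def dirichlet_cpt(data, child, parents, states, alpha=1.0):
    """Posterior-mean estimate of P(child | parents) from complete data.

    Each observed parent configuration j gets a Dirichlet(alpha, ..., alpha) prior,
    giving theta_jk = (N_jk + alpha) / (N_j + r * alpha) with r = len(states).
    Parent configurations never seen in `data` keep the uniform prior mean 1 / r
    and are not stored in the returned dict.
    """
    r = len(states)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    return {
        j: {k: (joint[(j, k)] + alpha) / (n_j + r * alpha) for k in states}
        for j, n_j in marginal.items()
    }
```

The prior acts like alpha pseudo-observations per state, which keeps sparse parent configurations from producing zero probabilities.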

4.

Classification of Multivariate Time Series and Structured Data Using Constructive Induction (total citations: 2; self-citations: 0; citations by others: 2)

Mohammed Waleed Kadous, Claude Sammut. Machine Learning, 2005, 58(2-3): 179-216

We present a method of constructive induction aimed at learning tasks involving multivariate time series data. Using metafeatures, the scope of attribute-value learning is expanded to domains with instances that have some kind of recurring substructure, such as strokes in handwriting recognition, or local maxima in time series data. The types of substructures are defined by the user, but are extracted automatically and are used to construct attributes. Metafeatures are applied to two real domains: sign language recognition and ECG classification. Using metafeatures we are able to generate classifiers that are either comprehensible or accurate, producing results that are comparable to hand-crafted preprocessing and comparable to human experts.
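A user-defined metafeature such as "local maxima" can be sketched as an extractor that returns (time, value) events, followed by a step that turns the variable-length event list into fixed attribute values. This Python sketch is an assumption for illustration: the paper builds attributes from the extracted instances with a more elaborate procedure, whereas the summary attributes here (`n_peaks`, `max_peak`, `first_peak_time`) are hypothetical choices.

```python
def local_maxima(series):
    """Return (time, value) pairs where the series has a strict local maximum."""
    return [
        (t, series[t])
        for t in range(1, len(series) - 1)
        if series[t - 1] < series[t] > series[t + 1]
    ]


def maxima_attributes(series):
    """Summarize a variable number of extracted maxima as fixed-length attributes."""
    peaks = local_maxima(series)
    if not peaks:
        return {"n_peaks": 0, "max_peak": None, "first_peak_time": None}
    return {
        "n_peaks": len(peaks),
        "max_peak": max(value for _, value in peaks),
        "first_peak_time": peaks[0][0],
    }
```

For example, `local_maxima([0, 2, 1, 3, 5, 4, 4, 6, 1])` yields the events at times 1, 4 and 7; the plateau at time 5 to 6 is not counted because the comparison is strict.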

5.

An Online Learning Algorithm for Bayesian Network Parameters and Its Application

张少中, 杨南海, 王秀坤. 小型微型计算机系统 (Journal of Chinese Computer Systems), 2004, 25(10): 1799-1801

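Building on the EM algorithm and assuming the Bayesian network structure is given, this paper analyzes the Voting EM algorithm and applies it to online parameter learning for a Bayesian network that supports flood-control decision-making. Comparing its learning results with those of the EM algorithm shows that Voting EM not only supports online parameter learning but also achieves high learning accuracy.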
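The core of online (Voting) EM is an exponentially weighted update that moves each conditional probability table row toward the posterior computed from the newest case. The Python below sketches a commonly cited form of that rule, assumed here rather than taken from this paper: a row whose parent configuration has zero posterior probability under the new case is left unchanged. The names `voting_em_update`, `posterior_row`, `parent_prob` and the default `eta` are hypothetical, and the Bayesian-network inference that produces the posteriors is omitted.

```python
def voting_em_update(theta_row, posterior_row, parent_prob, eta=0.05):
    """One online Voting EM step for the CPT row of a single parent configuration j.

    theta_row     : current estimates P(X = k | pa = j), summing to 1
    posterior_row : P(X = k | pa = j, y_t) for the new case y_t, from inference
    parent_prob   : P(pa = j | y_t); if 0, the case carries no information for this row
    eta           : learning rate in (0, 1]
    """
    if parent_prob <= 0.0:
        return list(theta_row)
    # Convex combination of two distributions, so the result still sums to 1
    return [(1 - eta) * t + eta * p for t, p in zip(theta_row, posterior_row)]
```

With a fully observed case the posterior is a one-hot vector, so repeated updates pull the row toward the observed state at a rate set by `eta`; a larger `eta` adapts faster but forgets older cases sooner.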

6.

Research on Bayesian Network Structure Learning from Complete Data Sets

杜一平. 计算机光盘软件与应用 (Computer CD Software and Application), 2011, (14)

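A Bayesian network is a graphical model that describes the latent dependencies among uncertain variables. Learning Bayesian networks from complete data sets is an active research topic. This paper analyzes common theoretical methods for constructing Bayesian networks from complete data sets.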

7.

Interactive Concept-Learning and Constructive Induction by Analogy (total citations: 1; self-citations: 0; citations by others: 1)

Luc De Raedt, Maurice Bruynooghe. Machine Learning, 1992, 8(2): 107-150

The available concept-learners only partially fulfill the needs imposed by the learning apprentice generation of learners. We present a novel approach to interactive concept-learning and constructive induction that better fits the requirements imposed by the learning apprentice paradigm. The approach is incorporated in the system Clint-Cia, which integrates several user-friendly features into one working whole: it is interactive, generates examples, shifts its bias, identifies concepts in the limit, copes with indirect relevance, recovers from errors, performs constructive induction and invents new concepts by analogy to previously learned ones.

8.

Bayesian Clustering by Dynamics (total citations: 4; self-citations: 0; citations by others: 4)

Marco Ramoni, Paola Sebastiani, Paul Cohen. Machine Learning, 2002, 47(1): 91-121

This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.
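The merge step of a Bayesian agglomerative clustering of Markov chains can be sketched by comparing marginal likelihoods: two clusters are merged when one shared transition matrix explains their pooled transition counts better than two separate matrices do. This Python sketch assumes first-order chains over states 0..n-1, a symmetric Dirichlet(alpha) prior on each transition row, and no prior over partitions; the function names are hypothetical, and the paper's entropy-based heuristic for choosing which pairs to try is not reproduced.

```python
from math import lgamma


def transition_counts(seq, n_states):
    """Count first-order transitions s_t -> s_{t+1} in a discrete state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts


def log_marginal_likelihood(counts, alpha=1.0):
    """log P(counts) when every transition row has a Dirichlet(alpha, ..., alpha) prior."""
    total = 0.0
    for row in counts:
        k, n = len(row), sum(row)
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in row)
    return total


def add_counts(c1, c2):
    """Pool the transition counts of two clusters."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]


def should_merge(c1, c2, alpha=1.0):
    """True if one shared chain is more probable than two separate chains."""
    merged = log_marginal_likelihood(add_counts(c1, c2), alpha)
    separate = log_marginal_likelihood(c1, alpha) + log_marginal_likelihood(c2, alpha)
    return merged > separate
```

Two alternating sequences such as 0, 1, 0, 1, ... would be merged, while an alternating sequence and a "sticky" sequence that stays in one state for long runs would be kept apart.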

9.

Data Mining by Means of Binary Representation: A Model for Similarity and Clustering (total citations: 1; self-citations: 0; citations by others: 1)

Zippy Erlich, Roy Gelbard, Israel Spiegler. Information Systems Frontiers, 2002, 4(2): 187-197

In this paper we outline a new method for clustering that is based on a binary representation of data records. The binary database relates each entity to all possible attribute values (domain) that entity may assume. The resulting binary matrix allows for similarity and clustering calculation by using the positive (1 bits) of the entity vector. We formulate two indexes: Pair Similarity Index (PSI) to measure similarity between two entities and Group Similarity Index (GSI) to measure similarity within a group of entities. A threshold factor for each attribute domain is defined that is dependent on the domain but independent of the number of entities in the group. The similarity measure provides simplicity of storage and efficiency of calculation. A comparison of our similarity index to other indexes is made. Experiments with sample data indicate a 48% improvement of group similarity over standard methods pointing to the potential and merit of the binary approach to clustering and data mining.
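The binary representation can be sketched as one bit per (attribute, value) pair, so that agreement between two entities shows up as shared 1 bits. The similarity below (shared 1 bits divided by the number of attributes) is an assumed stand-in for illustration only; it is not the paper's PSI or GSI formula, and `binarize` and `pair_similarity` are hypothetical names.

```python
def binarize(record, domains):
    """One bit per (attribute, value) pair: 1 if the record takes that value."""
    bits = []
    for attr, values in domains.items():
        bits.extend(1 if record[attr] == v else 0 for v in values)
    return bits


def pair_similarity(bits_a, bits_b, n_attributes):
    """Fraction of attributes on which two entities agree, read off their shared 1 bits.

    Illustrative stand-in only; not the paper's PSI definition.
    """
    shared = sum(a & b for a, b in zip(bits_a, bits_b))
    return shared / n_attributes
```

Because each attribute sets exactly one bit, a bitwise AND of two entity vectors counts matching attribute values directly, which is what makes the binary layout cheap to store and compare.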

10.

Applying Clustering Ideas to the Bayesian Algorithm

余瑞康, 施润身. 计算机工程与应用 (Computer Engineering and Applications), 2006, 42(28): 159-160, 163

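Missing data are unavoidable in data mining, so data preprocessing is an indispensable prerequisite. In traditional data preprocessing, the naive Bayes algorithm is the most commonly used algorithm for repairing missing data. However, real-world data often violate its attribute-independence assumption, which leads to unsatisfactory classification results. Based on the idea of cluster analysis, this paper proposes an improved Bayesian algorithm. Computations on a large amount of data show that this method is more reasonable and credible than the naive Bayes algorithm.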
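For reference, the baseline that this paper improves on, filling a missing categorical value with naive Bayes, can be sketched as follows. This Python sketch assumes the other attributes are complete and uses Laplace smoothing controlled by `alpha`; the function name and data layout are hypothetical, and the paper's clustering-based improvement is not shown.

```python
from collections import Counter


def naive_bayes_impute(rows, target, features, alpha=1.0):
    """Return a copy of `rows` in which missing (None) `target` values are filled
    with the most probable class under naive Bayes.

    Assumes categorical `features` with no missing values of their own.
    """
    complete = [row for row in rows if row[target] is not None]
    class_counts = Counter(row[target] for row in complete)
    # feature_counts[f][(c, v)]: complete rows with target == c and feature f == v
    feature_counts = {
        f: Counter((row[target], row[f]) for row in complete) for f in features
    }
    n_values = {f: len({row[f] for row in rows}) for f in features}

    def posterior_score(c, row):
        # Unnormalized P(c) * prod_f P(f = v | c), with Laplace smoothing
        score = class_counts[c] / len(complete)
        for f in features:
            score *= (feature_counts[f][(c, row[f])] + alpha) / (
                class_counts[c] + alpha * n_values[f]
            )
        return score

    filled = []
    for row in rows:
        if row[target] is None:
            best = max(class_counts, key=lambda c: posterior_score(c, row))
            row = {**row, target: best}  # new dict, so the input rows are not mutated
        filled.append(row)
    return filled
```

The product over features is exactly where the independence assumption enters: correlated attributes are counted as if they were separate pieces of evidence.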

11.

A Coevolutionary Approach to Data Clustering

邹丽珊, 郑金华. 计算机工程与应用 (Computer Engineering and Applications), 2004, 40(18): 77-79, 101

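The coevolutionary algorithm is a new kind of evolutionary algorithm. Because it encodes the solution space in separate parts, it can effectively overcome the premature convergence inherent in ordinary evolutionary algorithms. This paper addresses data clustering, an important topic in current data mining and exploratory data analysis: it abstracts the clustering problem as a partitioning problem on a weighted graph and solves it with a coevolutionary algorithm, so that the clustering result does not depend on the initial cluster centers. The algorithm's performance is analyzed, and experiments comparing it with an ordinary genetic algorithm demonstrate its superior performance.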

12.

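A Semi-Supervised K-Means Clustering Algorithm for Multi-Relational Data (total citations: 3; self-citations: 1; citations by others: 3)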

高滢, 刘大有, 齐红, 刘赫. 软件学报 (Journal of Software), 2008, 19(11)

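This paper proposes a semi-supervised K-means clustering algorithm for multi-relational data. Building on K-means, the algorithm extends both the selection of initial clusters and the measurement of object similarity so that they can be used for semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes, and the various kinds of relationship information during clustering. Experimental results on the multi-relational database Movie verify the effectiveness of the algorithm.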
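The semi-supervised seeding idea can be sketched on a single flat table: labeled objects fix the initial cluster centers, and ordinary K-means iterations follow. This Python sketch is an assumption for illustration; it does not implement the paper's multi-relational similarity measure, it requires at least one labeled seed per cluster, and `seeded_kmeans` and its parameters are hypothetical.

```python
def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def _sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points, seeds, k, max_iter=100):
    """K-means whose initial centers are the means of labeled seed points.

    points : list of coordinate lists
    seeds  : dict mapping point index -> cluster id in range(k)
    """
    centers = [
        _mean([points[i] for i, c in seeds.items() if c == cid]) for cid in range(k)
    ]
    assign = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every point
        new_assign = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each non-empty cluster's center to its members' mean
        for cid in range(k):
            members = [p for p, a in zip(points, assign) if a == cid]
            if members:
                centers[cid] = _mean(members)
    return assign, centers
```

Seeding from labeled objects removes the dependence on random initialization and ties cluster ids to the known labels.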

13.

A Feature-Weighted Clustering Algorithm for Imbalanced Data

蒋盛益, 苗邦, 王连喜. 小型微型计算机系统 (Journal of Chinese Computer Systems), 2013, 34(8)

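The class distribution of imbalanced data sets is severely skewed. Because traditional clustering algorithms aim to improve overall learning performance, they tend to favor the majority class and neglect the more valuable rare class. This paper proposes an iterative feature-weighted clustering algorithm: feature weights are determined from the characteristics of the clusters produced in the current round and a feature-importance measure, the resulting weights are used in the next round of clustering, and iteration stops once the weights stabilize. Experiments on several imbalanced UCI data sets show that the algorithm identifies important features and increases their weights, keeps the clustering from being overly biased toward the majority class, and effectively improves clustering performance.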
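The iterate-until-weights-stabilize loop can be sketched as weighted K-means alternating with a feature-weight update. Here the feature-importance measure is assumed to be each feature's between-cluster to within-cluster variance ratio, normalized to sum to 1; the paper defines its own importance function, and the function names below are hypothetical.

```python
def weighted_kmeans(points, init_centers, weights, max_iter=100):
    """K-means with a feature-weighted squared Euclidean distance."""
    centers = [list(c) for c in init_centers]
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)),
                key=lambda c: sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def feature_weights(points, assign, k):
    """Assumed importance: between-/within-cluster variance ratio per feature, normalized."""
    n, d = len(points), len(points[0])
    overall = [sum(p[f] for p in points) / n for f in range(d)]
    ratios = []
    for f in range(d):
        between = within = 0.0
        for c in range(k):
            vals = [p[f] for p, a in zip(points, assign) if a == c]
            if not vals:
                continue
            m = sum(vals) / len(vals)
            between += len(vals) * (m - overall[f]) ** 2
            within += sum((v - m) ** 2 for v in vals)
        ratios.append(between / (within + 1e-12))  # epsilon avoids division by zero
    total = sum(ratios)
    return [r / total for r in ratios] if total > 0 else [1.0 / d] * d


def iterative_feature_weighting(points, init_centers, tol=1e-6, max_rounds=20):
    """Alternate clustering and re-weighting until the weights stop changing."""
    d = len(points[0])
    w = [1.0 / d] * d  # start from equal weights
    assign = None
    for _ in range(max_rounds):
        assign = weighted_kmeans(points, init_centers, w)
        new_w = feature_weights(points, assign, len(init_centers))
        if max(abs(a - b) for a, b in zip(w, new_w)) < tol:
            break
        w = new_w
    return assign, w
```

On data where one feature separates the groups and another is noise, the loop drives almost all weight onto the separating feature.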

14.

Learning Bayesian network parameters under incomplete data with domain knowledge

Wenhui Liao, Qiang Ji. Pattern Recognition, 2009, 42(11): 3046-3056

Bayesian networks (BNs) have gained increasing attention in recent years. One key issue in Bayesian networks is parameter learning. When training data is incomplete or sparse or when multiple hidden nodes exist, learning parameters in Bayesian networks becomes extremely difficult. Under these circumstances, the learning algorithms are required to operate in a high-dimensional search space and they could easily get trapped among copious local maxima. This paper presents a learning algorithm to incorporate domain knowledge into the learning to regularize the otherwise ill-posed problem, to limit the search space, and to avoid local optima. Unlike the conventional approaches that typically exploit the quantitative domain knowledge such as prior probability distribution, our method systematically incorporates qualitative constraints on some of the parameters into the learning process. Specifically, the problem is formulated as a constrained optimization problem, where an objective function is defined as a combination of the likelihood function and penalty functions constructed from the qualitative domain knowledge. Then, a gradient-descent procedure is systematically integrated with the E-step and M-step of the EM algorithm, to estimate the parameters iteratively until it converges. The experiments with both synthetic data and real data for facial action recognition show our algorithm improves the accuracy of the learned BN parameters significantly over the conventional EM algorithm.
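The constrained M-step can be sketched for a single conditional probability table row: maximize the (expected) log-likelihood minus a penalty for violating a qualitative rule such as theta_a >= theta_b, taking gradient steps (ascent on the objective, equivalently descent on its negative). In this Python sketch the squared-hinge penalty, the softmax parameterization that keeps theta on the probability simplex, and the `lam`, `lr` and `n_iter` defaults are all assumptions; the counts would come from an EM E-step, which is omitted.

```python
import math


def _softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]  # shift by max for numerical stability
    s = sum(e)
    return [v / s for v in e]


def penalized_mstep(counts, constraints, lam=100.0, lr=0.01, n_iter=5000):
    """Maximize sum_k N_k log(theta_k) - lam * sum max(0, theta_b - theta_a)^2.

    counts      : expected counts N_k for one CPT row (e.g. from an E-step)
    constraints : list of (a, b) index pairs encoding theta_a >= theta_b
    """
    r = len(counts)
    z = [0.0] * r  # theta = softmax(z) always lies on the simplex
    for _ in range(n_iter):
        theta = _softmax(z)
        # Gradient of the objective with respect to theta
        g = [counts[k] / theta[k] for k in range(r)]
        for a, b in constraints:
            violation = theta[b] - theta[a]
            if violation > 0:
                g[b] -= 2 * lam * violation
                g[a] += 2 * lam * violation
        # Chain rule through the softmax: dF/dz_j = theta_j * (g_j - sum_k g_k theta_k)
        avg = sum(gk * tk for gk, tk in zip(g, theta))
        z = [zj + lr * tj * (gj - avg) for zj, tj, gj in zip(z, theta, g)]
    return _softmax(z)
```

With counts [8, 2] and no constraint the row converges to the maximum-likelihood estimate [0.8, 0.2]; adding the rule theta_1 >= theta_0 pulls it to roughly [0.51, 0.49], since a finite penalty weight allows a small residual violation.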

15.

A Comparative Study of Clustering Methods in Data Mining (total citations: 4; self-citations: 0; citations by others: 4)

王鑫, 王洪国, 王珺, 王金枝. 微机发展 (Microcomputer Development), 2006, 16(10): 20-22

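Data mining has been a very active research direction in the information industry in recent years, and cluster analysis is one of its core techniques. Clustering algorithms have been studied widely and in depth, producing many different algorithms suited to data mining, but each is suitable only for particular problems and users. To make better use of these algorithms, this paper analyzes cluster-analysis methods and representative algorithms in data mining, sets out the typical requirements that data mining places on clustering, and compares commonly used clustering algorithms against those requirements, so that an algorithm suited to a specific problem can be chosen more easily and quickly.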

16.

A Brief Study of Clustering Algorithms in Data Mining

覃艳, 王洪, 周全华. 网络安全技术与应用 (Network Security Technology & Application), 2014, (1): 65-66

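With the continuing development of information technology, data mining is applied ever more widely in work and daily life, and clustering algorithms are currently a hot research area within data mining. This paper examines several relatively mature clustering algorithms in depth, summarizes their advantages, disadvantages, and scope of application, and proposes indicators for evaluating the performance of clustering algorithms, which also serve as a starting point for future research on clustering.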

17.

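An Undersampling Algorithm for Imbalanced Data Based on One-Pass Clustering (total citations: 1; self-citations: 0; citations by others: 1)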

蒋盛益, 苗邦, 余雯. 小型微型计算机系统 (Journal of Chinese Computer Systems), 2012, 33(2): 232-236

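Sampling is a common way to handle imbalanced data sets; its main idea is to change the class distribution and narrow the gap between the rare class and the majority class. This paper proposes an undersampling method based on one-pass clustering. The sampling ratio is determined from the characteristics of the resulting clusters and the degree of data skew, and each cluster is sampled according to its own ratio: dense clusters are sampled less, while sparse clusters are sampled more or kept entirely. This compresses the data set while preserving the number of minority-class samples. Experimental results show that the proposed method makes the imbalanced data samples more representative and improves both clustering and classification performance.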
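One-pass clustering followed by per-cluster undersampling can be sketched as below: a single scan assigns each majority-class point to the nearest existing cluster within `radius` or opens a new cluster, and then every cluster contributes an equal quota, so large dense clusters are thinned while small sparse clusters are kept whole. The equal-quota rule is an assumed stand-in for the paper's ratio based on cluster characteristics and data skew; the function names and parameters are hypothetical.

```python
import math
import random


def one_pass_clustering(points, radius):
    """Single scan: join the nearest cluster if its center is within `radius`, else start a new one.

    Returns a list of clusters, each a list of point indices.
    """
    centers, members = [], []
    for idx, p in enumerate(points):
        best, best_d = None, None
        for c, center in enumerate(centers):
            d = math.dist(p, center)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= radius:
            members[best].append(idx)
            n = len(members[best])
            # Incremental update of the cluster mean
            centers[best] = [m + (x - m) / n for m, x in zip(centers[best], p)]
        else:
            centers.append(list(p))
            members.append([idx])
    return members


def cluster_undersample(majority, n_keep, radius, seed=0):
    """Undersample majority-class points with an equal quota per cluster."""
    rng = random.Random(seed)
    clusters = one_pass_clustering(majority, radius)
    quota = max(1, n_keep // len(clusters))
    kept = []
    for idx in clusters:
        # Small (sparse) clusters are kept whole; large (dense) ones are sampled down
        kept.extend(idx if len(idx) <= quota else rng.sample(idx, quota))
    return [majority[i] for i in sorted(kept)]
```

Minority-class points are not passed in and stay untouched, which matches the goal of shrinking the data set without losing rare-class samples. The result of one-pass clustering depends on the input order, a known trade-off for its single scan.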

18.

A MapReduce-Based Distributed Clustering Algorithm for Network Data

陈东明, 刘健, 王冬琦, 徐晓伟. 计算机工程 (Computer Engineering), 2013, 39(7)

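Because of their high time and space complexity and the limited memory of physical machines, traditional clustering algorithms cannot effectively analyze large-scale network data. To address this problem, this paper proposes a distributed clustering algorithm for network data based on the MapReduce model. Guided by MRC theory, the algorithm uses a bounded number of MapReduce rounds to control the time spent in the shuffle phase, applies in-map combining to control network traffic, and, when merging intermediate results, merges only communities without considering the nodes inside them, to control memory overhead. Experiments on a cluster with simulated data show that the algorithm achieves good speedup and scalability as the data size and cluster size grow.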

19.

A Unit-Region-Based Clustering Algorithm for High-Dimensional Data (total citations: 1; self-citations: 0; citations by others: 1)

谢坤武, 毕晓玲, 叶斌. 计算机研究与发展 (Journal of Computer Research and Development), 2007, 44(9): 1618-1623

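High-dimensional data spaces have many dimensions, and their data points are sparse and fairly uniformly distributed, which makes clusters hard to discover; moreover, in distance-based clustering of high-dimensional data, more dimensions increase the time needed to compute distances between objects. The CAHD (clustering algorithm of high-dimensional data) algorithm first uses a bidirectional search strategy to find unit regions that are dense in data points within the given n-dimensional space or its subspaces, and then clusters these dense unit regions using bitwise AND. The bidirectional search strategy effectively reduces the search space and thus improves efficiency; in addition, clustering the dense unit regions requires only two machine instructions, bitwise AND and shift, which improves efficiency further. CAHD can therefore handle clustering of high-dimensional data effectively, and experiments on data sets show that the algorithm is effective.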
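The role of bitwise AND in grid-based density clustering can be sketched as follows: each interval of each dimension stores a bitmap of the points it contains, and the points inside a grid cell are the AND of its per-dimension bitmaps. This Python sketch assumes every coordinate lies in [lo, hi] with the same number of bins per dimension, and it enumerates all cells by brute force rather than using CAHD's bidirectional search; merging adjacent dense cells into clusters is omitted, and the function names are hypothetical.

```python
import itertools


def interval_bitmaps(points, dim, n_bins, lo, hi):
    """One bitmap per interval of dimension `dim`; bit i is set if point i lies in it."""
    width = (hi - lo) / n_bins
    bitmaps = [0] * n_bins
    for i, p in enumerate(points):
        b = min(int((p[dim] - lo) / width), n_bins - 1)  # clamp p[dim] == hi into last bin
        bitmaps[b] |= 1 << i
    return bitmaps


def dense_cells(points, n_bins, lo, hi, min_pts):
    """Map each grid cell holding at least `min_pts` points to its point bitmap."""
    d = len(points[0])
    per_dim = [interval_bitmaps(points, k, n_bins, lo, hi) for k in range(d)]
    all_points = (1 << len(points)) - 1
    dense = {}
    for cell in itertools.product(range(n_bins), repeat=d):
        mask = all_points
        for k, b in enumerate(cell):
            mask &= per_dim[k][b]  # keep only points inside interval b of dimension k
            if not mask:
                break  # empty cell: no need to check further dimensions
        if bin(mask).count("1") >= min_pts:
            dense[cell] = mask
    return dense
```

Because the membership test for a cell reduces to ANDing integers and counting set bits, no pairwise distances are computed, which is the efficiency argument the abstract makes.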

20.

An Improved Data Clustering Algorithm and Its Application

席泓. 电脑开发与应用 (Computer Development & Applications), 2006, 19(11): 7-8

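Based on an analysis and comparison of the k-means partitioning algorithm and the hierarchical agglomerative algorithm, this paper proposes a new improved algorithm (the NQ algorithm). Using four years of student examination data from Guizhou University for Nationalities as test data, it compares the performance of the NQ algorithm with that of the k-means partitioning and hierarchical agglomerative algorithms; the results show that the NQ algorithm is effective, reliable, and fast.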