Similar literature
 20 similar documents found (search time: 171 ms)
1.
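Knowledge discovery in erroneous datasets is currently a hot topic in data mining research. Earlier algorithms often require a large amount of prior knowledge or assumptions, and they also waste data. Building on a summary of the general methods previously used to preprocess erroneous data, this paper analyzes the common error types in relational databases, proposes handling suspicious data during preprocessing by constructing a fuzzy database from data possibilities, and compares the advantages and disadvantages of this algorithm with those of traditional algorithms. The comparison shows that the proposed algorithm largely resolves the lack of prior knowledge and the waste of data, and that handling suspicious data in a database through fuzzification is a valuable research direction.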

2.
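Association rule mining algorithm based on dynamic pruning   Total citations: 13 (self: 0, others: 13)
The paper reviews current research on association rule mining and presents, in two parts, an association rule discovery method based on dynamic pruning. It discusses how to carry out dynamic pruning, gives a tree-shaped storage structure built on triples, and on that basis describes a knowledge discovery algorithm for transaction databases. The proposed method is compared with Apriori, the landmark algorithm in association rule mining, and the results of the analysis are given. Experiments show that the method can effectively discover association rules from datasets.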

3.
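Research on knowledge discovery in event sequences   Total citations: 3 (self: 1, others: 2)
Event sequences are a common form of data, and knowledge discovery in them has been an active area of KDD research in recent years. This paper first gives a formal description of, and a framework algorithm for, knowledge discovery in event sequences. It then discusses the problem of finding large sequences in databases and describes in detail an algorithm for finding them.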

4.
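Association rule models can be used to discover valuable knowledge in massive databases, but every algorithm for finding association rules requires substantial computation. Data in a database changes over time; once previously discovered rules are out of date, rules must be mined from the dataset again. This paper studies how the association rules latent in a database change as data is continually appended to it. Drawing on FUP and similar algorithms and making full use of information obtained in the previous mining run, it proposes SuperFUP, an incremental association rule algorithm for re-mining. The core idea is to focus more on the newly added data and to reuse information from the previous run, so that a single scan of the whole database is enough to update the association rules, which improves the efficiency of incremental association rule mining.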
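The idea of reusing counts from an earlier mining run can be illustrated with a minimal sketch. This is not the SuperFUP or FUP algorithm itself: it handles single items only, and the names `count_items` and `update_frequent_items` are hypothetical helpers introduced for illustration. It assumes the counts for every item were kept from the previous run, so only the newly appended transactions need to be scanned.

```python
from collections import Counter


def count_items(transactions):
    """Count in how many transactions each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def update_frequent_items(old_counts, old_size, increment, min_support):
    """Merge stored counts with counts from the increment only.

    Assumption: old_counts holds the count of every item seen in the
    previous run, so the old transactions are not scanned again.
    """
    new_counts = old_counts + count_items(increment)
    total = old_size + len(increment)
    return {item for item, n in new_counts.items() if n / total >= min_support}


OLD = [{"a", "b"}, {"a"}, {"b", "c"}]
NEW = [{"c"}, {"a", "c"}]
old_counts = count_items(OLD)
frequent = update_frequent_items(old_counts, len(OLD), NEW, min_support=0.5)
```

In this toy data, item `b` was frequent in the old database but falls below the threshold once the increment is merged, which is the kind of rule change an incremental algorithm has to detect.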

5.
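Research on an attribute reduction algorithm based on core attribute dependency   Total citations: 1 (self: 0, others: 1)
Lu Songfeng (路松峰), Hu Bo (胡波). Computer Simulation (《计算机仿真》), 2007, 24(4): 69-71, 107
Data in databases often contains many redundant or unnecessary attributes, which seriously degrade the time efficiency and result quality of data mining algorithms. Removing redundant and irrelevant attributes, that is, attribute reduction, has therefore become a main task in data preprocessing, and rough set theory is a very practical theoretical tool for it. Based on an in-depth study of rough set theory and combined with knowledge of database operations, a new attribute reduction method based on core attribute dependency is given. The algorithm filters irrelevant and redundant attributes out of the attribute set to obtain a satisfactory reduct, and its complexity is low. Experimental results show that the algorithm is effective.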

6.
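Rough set theory has outstanding advantages for classification analysis and knowledge acquisition from imprecise, uncertain, and incomplete data. Starting from a theoretical introduction to attribute reduction, a basic component of rough set theory, the paper discusses how to implement the core attribute reduction algorithm. The algorithm is then implemented in Visual C++ and applied in a forestry information management system to preprocess a site-factor database, reducing data dimensionality and improving the efficiency of data analysis. Finally, the paper points out the shortcomings of this implementation when the data is very large.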

7.
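Incremental rule reduction algorithm based on rough sets and decision trees   Total citations: 2 (self: 0, others: 2)
Rough set methods are an important tool for handling uncertain or vague knowledge. Traditional rough set models study minimal rule sets only for static data and cannot cope with dynamic data. In practice, however, the data in databases often changes dynamically, so incremental algorithms for rule reduction are one of the urgent problems in knowledge discovery. The article gives an incremental rule reduction algorithm based on rough sets and decision trees and compares it with traditional algorithms and the RRIA algorithm. Experimental results show that the proposed algorithm's approach and results are better.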

8.
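Knowledge discovery in databases is an important topic in artificial intelligence. To address complex patterns in time-series data, this paper proposes a new logical representation of temporal sequence patterns and designs a new algorithm for modeling temporal sequences.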

9.
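Partitioning of continuous data and association rule discovery   Total citations: 2 (self: 1, others: 1)
Association rule mining is an important data mining problem. Current algorithms mainly study association rule mining under the support-confidence framework, but such rules apply only to transaction-type databases. Real databases contain a great deal of continuous data, to which classical association rules do not apply. This paper introduces a preprocessing procedure for continuous datasets, namely distance-based partitioning of the data items in the database, and gives the design idea of an algorithm based on clustering.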
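For readers unfamiliar with the support-confidence framework mentioned above, the two measures can be sketched as follows. This is a generic textbook illustration, not code from the paper; the function names and the toy transactions are assumptions made for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Toy transaction database (hypothetical data).
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(TRANSACTIONS, {"bread", "milk"})
rule_confidence = confidence(TRANSACTIONS, {"bread"}, {"milk"})
```

Both measures count set membership, which is why they fit transaction data directly; a continuous attribute has to be partitioned into intervals first before it can be treated as an item.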

10.
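To automatically classify and build ontology models for the large amounts of isolated data lacking semantic descriptions in heterogeneous databases, and thereby acquire knowledge from that data, a knowledge acquisition framework for heterogeneous databases based on ontologies and Web services is proposed. An access mechanism that wraps heterogeneous databases with Web services is given, and a Bayesian classifier is designed and used to map the acquired heterogeneous data automatically to the relevant ontology. The method classifies heterogeneous data automatically with the Bayesian classifier and achieves interactive knowledge acquisition from heterogeneous databases.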

11.
This paper deals with an approach to knowledge discovery in databases applied to identify a dynamic model of a real machine. The problem considered is how to identify dynamic models suitable for model-based diagnosis of a physical object. Special attention is paid to unsupervised identification when large databases collected by a SCADA system are handled. A method for identifying dynamic models of objects and processes is presented, and its usefulness in technical diagnostics is shown. The method for analyzing quantitative dynamic data is based on available methods of knowledge discovery in databases. Its essence is to project the values of the considered set of attributes into the so-called multidimensional space of regressors. A genetic algorithm was used to select the subset of relevant features. Knowledge was induced using the support vector machine (SVM) method. The AIC measure and our own heuristic function were applied as evaluation criteria. The method was applied to discover a model of temperature changes in a pump. Within the framework of the research, data gathered by an industrial system recording data on a specific object, a deep-well pumping station, was analyzed.

12.
Knowledge discovery in time series databases   Total citations: 13 (self: 0, others: 13)
Adding the dimension of time to databases produces time series databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. In this correspondence, we introduce a general methodology for knowledge discovery in TSDB. The process of knowledge discovery in TSDB includes cleaning and filtering of time series data, identifying the most important predicting attributes, and extracting a set of association rules that can be used to predict the time series behavior in the future. Our method is based on signal processing techniques and the information-theoretic fuzzy approach to knowledge discovery. The computational theory of perception (CTP) is used to reduce the set of extracted rules by fuzzification and aggregation. We demonstrate our approach on two types of time series: stock-market data and weather data.

13.
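Clustering algorithms can discover meaningful cluster structures directly from spatial databases without background knowledge, and are an important means of spatial data mining and knowledge discovery. Based on an analysis of existing clustering algorithms, a clustering algorithm based on mathematical morphology is proposed. It can handle clusters of arbitrary shape and uses a heuristic to determine the optimal number of clusters automatically. The algorithm can also be implemented in vector spatial databases. Experiments show that it is feasible and effective and can handle noisy data.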

14.
This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.

15.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH   Total citations: 49 (self: 0, others: 49)
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.

16.
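Research on a data mining model based on rough sets   Total citations: 13 (self: 0, others: 13)
The main purpose of this work is to show how rough-set-based data mining techniques can be implemented effectively. In this paper we discuss rough set theory in detail. To discover new rules from rough-set-based databases, we study an object-oriented software architecture suited to data mining and give a data mining algorithm, a rule discovery algorithm, and a rule reduction algorithm. Starting from the information in the initial database, the discernibility matrix, the reduct table, and the rule table are built in turn. Finally, a simulated example shows that our model and algorithms are feasible.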
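The first step named above, building a discernibility matrix, can be sketched in a few lines. This is a generic rough-set illustration under stated assumptions, not the paper's implementation: the decision table, the attribute names, and the helper names `discernibility_matrix` and `core_attributes` are all hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, condition_attrs, decision_attr):
    """For each pair of objects with different decisions, record the
    condition attributes on which the two objects differ."""
    matrix = {}
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if a[decision_attr] != b[decision_attr]:
            matrix[(i, j)] = frozenset(c for c in condition_attrs if a[c] != b[c])
    return matrix


def core_attributes(matrix):
    """Attributes that appear as singleton entries cannot be dropped."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}


# Toy decision table (hypothetical data).
TABLE = [
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "temp": "normal", "flu": "no"},
    {"headache": "no", "temp": "high", "flu": "yes"},
    {"headache": "no", "temp": "normal", "flu": "no"},
]

matrix = discernibility_matrix(TABLE, ["headache", "temp"], "flu")
core = core_attributes(matrix)
```

In this table only `temp` is needed to separate the two decision classes, so it is the single core attribute and `headache` is a candidate for removal during reduction.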

17.
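Ruan Jianguo (阮建国), Li Lujun (李陆军). Computer Engineering (《计算机工程》), 2010, 36(12): 232-233
To address the parsing of multiple video protocols in the design of digital video decoder chips, a design method for a dedicated microcontroller is proposed. The method adopts an instruction set oriented to video parsing, with instructions specially optimized for the characteristics of the parsing process, and uses a video parsing model matched to the dedicated microcontroller. It achieves good compatibility with video protocols such as MPEG1/2, AVS, and H.264, and maintains decoding efficiency without increasing chip area or power consumption.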

18.
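Analysis of approximation sets under coarsening and refining of attribute values in dominance relations   Total citations: 2 (self: 1, others: 1)
The dominance-based rough set model reflects preferences among attributes, and in practice the data in most databases changes dynamically. Using existing information to update approximation sets is important for improving the efficiency of knowledge discovery. The paper defines coarsening and refining of attribute values under the dominance relation in incomplete information systems, discusses how the approximation sets change when attribute values are coarsened or refined, and compares the rough approximation accuracy and rough approximation quality before and after coarsening and refining. A worked example verifies the effectiveness of the method.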

19.
Databases developed independently in a common open distributed environment may be heterogeneous with respect to both data schema and the embedded semantics. Managing schema and semantic heterogeneities brings considerable challenges to learning from distributed data and to support applications involving cooperation between different organisations. In this paper, we are concerned mainly with heterogeneous databases that hold aggregates on a set of attributes, which are often the result of materialised views of native large-scale distributed databases. A model-based clustering algorithm is proposed to construct a mixture model where each component corresponds to a cluster which is used to capture the contextual heterogeneity among databases from different populations. Schema heterogeneity, which can be recast as incomplete information, is handled within the clustering process using Expectation-Maximisation estimation and integration is carried out within a clustering iteration. Our proposed algorithm resolves the schema heterogeneity as part of the clustering process, thus avoiding transformation of the data into a unified schema. Results of algorithm evaluation on classification, scalability and reliability, using both real and synthetic data, demonstrate that our algorithm can achieve good performance by incorporating all of the information from available heterogeneous data. Our clustering approach has great potential for scalable knowledge discovery from semantically heterogeneous databases and for applications in an open distributed environment, such as the Semantic Web.

20.
The main purpose of this paper is to make clear the connection between Lipski's approach to incomplete information databases and Zadeh's possibility theory which both appeared recently, quite simultaneously, but in different contexts. Lipski's approach is extended by introducing [0, 1]-valued levels of possibility in order to take into account the fact that our incomplete knowledge about properties of objects in databases may be based on soft (non-binary) information or on statistical-like data. Moreover, possibility theory, enlarged by the introduction of the concept of necessity, seems easier to manipulate than the semantics of usual modal logic used by Lipski. The problems of dependencies, of “yes-no” queries or of queries involving the cardinalities of specified sets when available information is incomplete are considered. Two illustrative examples are dealt with.
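The [0, 1]-valued possibility levels and the companion notion of necessity can be made concrete with a small sketch. It uses the standard definitions from possibility theory (possibility of a set is the maximum level over its members; necessity is one minus the possibility of the complement) and is not taken from the paper. The attribute values, the levels, and the assumption that the keys of the distribution form the whole universe are illustrative.

```python
def possibility(dist, event):
    """Possibility of an event: the highest level among its members."""
    return max((dist.get(x, 0.0) for x in event), default=0.0)


def necessity(dist, event):
    """Necessity of an event: 1 minus the possibility of its complement.

    Assumption: the keys of dist are the whole universe of values.
    """
    complement = set(dist) - set(event)
    return 1.0 - possibility(dist, complement)


# Hypothetical incomplete knowledge about one person's age bracket.
AGE = {"20s": 1.0, "30s": 0.7, "40s": 0.2}

poss_over_30 = possibility(AGE, {"30s", "40s"})
nec_under_40 = necessity(AGE, {"20s", "30s"})
```

A "yes-no" query then has two answers instead of one: how possible it is that the stored object satisfies the condition, and how certain (necessary) that is given what is known.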
