Similar literature
 20 similar documents found; search took 0 ms
1.
We consider the problem of finding association rules in a database with binary attributes. Most algorithms for finding such rules assume that all the data is available at the start of the data mining session. In practice, the data in the database may change over time, with records being added and deleted. At any given time, the rules for the current set of data are of interest. The naive, and highly inefficient, solution would be to rerun the association generation algorithm from scratch following the arrival of each new batch of data. This paper describes the Borders algorithm, which provides an efficient method for generating associations incrementally from dynamically changing databases. Experimental results show that the new algorithm outperforms previous solutions to the problem.

2.
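Knowledge Discovery Based on Search Engines   Total citations: 3, self-citations: 0, citations by others: 3
Data mining is generally applied to large, highly structured databases to discover the knowledge they contain. As online text grows, the knowledge it contains becomes ever richer, yet it remains difficult to analyze and exploit. Developing an effective scheme for discovering the knowledge contained in text is therefore important and is an active research topic. This paper uses the Google search engine to retrieve relevant Web pages, filters and cleans them to obtain the relevant text, then performs text clustering, uses Episode for event recognition and information extraction, and carries out data integration and data mining to achieve knowledge discovery. Finally, a prototype system is presented that tests the knowledge discovery approach in practice, with good results.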

3.
This work analyses the cognitive process of domain theories in terms of measurement theory in order to develop a computational machine learning approach that implements it. As a result, the relational data mining approach that the authors proposed in their preceding books has been improved. We present the approach as an implementation of the cognitive process as understood by measurement theory. We analyse the cognitive process in the first part of the paper and present a theory and method for discovering the logically most powerful empirical theory in the second. The theory is based on the notion of law-like rules, which conform to all the properties of laws of nature, namely generality, simplicity, maximum refutability and a minimum number of parameters. This notion is defined for both the deterministic and the probabilistic case. A discovery system has been developed on the basis of the method and has been applied successfully to many practical tasks.

4.
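Research on Knowledge Discovery Based on a Knowledge-Element Semantic Grid Platform   Total citations: 6, self-citations: 0, citations by others: 6
The article discusses the profound changes in the objects and methods of scientific research brought about by the growth and spread of the Internet. In cross-disciplinary collaborative research, scientists place ever higher demands on the semantics and complexity of data, and finding ways of association through which to discover knowledge has become the most severe challenge they face. To meet this challenge, the article proposes building a knowledge-element semantic grid platform that realizes a knowledge discovery service architecture with the knowledge element as the unit of knowledge. The article studies the concept of grid-based knowledge discovery, the knowledge service structure of the grid, and a basic model of knowledge-element-based semantic grid knowledge discovery.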

5.
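Application of Knowledge Discovery in CRM   Total citations: 7, self-citations: 0, citations by others: 7
Enterprises widely use CRM (Customer Relationship Management) to manage customer information; how to further improve the utilization of CRM system data resources is the central topic of this paper. Using knowledge discovery techniques together with a database query language, the paper describes algorithms for discovering the knowledge hidden in large volumes of customer information and, on that basis, presents the knowledge discovery solution process and its results in the context of a CRM system.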

6.
Large databases are accumulating records of objects such as stock market fluctuations, medical treatments, and weather changes in a given area, where each record consists of multiple attributes whose values change over time. Our work is motivated by prediction, which distinguishes it from the work in [4, 5, 8, 11]. We want to help learn from past data and make informed decisions for the future. This paper contributes to completing the theory and advancing the development of temporal data mining.

7.
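Knowledge Discovery from Stock Time Series Using Rough Set Theory   Total citations: 3, self-citations: 0, citations by others: 3
This paper proposes applying rough set theory to knowledge discovery in time series. The knowledge discovery process comprises three parts: time series data preprocessing, attribute reduction, and rule extraction. Data preprocessing mainly uses signal processing techniques to clean the data; the cleaned time series is then segmented according to the trend of a chosen variable so that the trend is constant within each segment. This converts the time series into a sequence of static patterns (each pattern representing one behavioral trend) and thereby removes its time dependence. The relevant attributes that determine each pattern are extracted to form an information table suitable for rough set theory, and rough set theory is then applied to the table for attribute reduction and rule extraction. The resulting rules can be used to predict the future behavior of the time series. Finally, the method is applied to stock trend prediction with good results.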
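
The segmentation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `trend` and `segment_by_trend`, the tolerance `eps`, and the three trend labels are assumptions made for illustration, and the signal-cleaning and rough set steps are omitted.

```python
from typing import List, Sequence, Tuple


def trend(delta: float, eps: float = 0.0) -> str:
    """Label one step of the series as 'up', 'down' or 'flat'."""
    if delta > eps:
        return "up"
    if delta < -eps:
        return "down"
    return "flat"


def segment_by_trend(series: Sequence[float], eps: float = 0.0) -> List[Tuple[str, int, int]]:
    """Split a series into maximal runs whose step direction is constant.

    Returns (trend, start_index, end_index) triples with inclusive indices;
    neighbouring segments share their boundary point.
    """
    segments: List[Tuple[str, int, int]] = []
    if len(series) < 2:
        return segments
    start = 0
    current = trend(series[1] - series[0], eps)
    for i in range(1, len(series) - 1):
        nxt = trend(series[i + 1] - series[i], eps)
        if nxt != current:
            # The direction changes at point i: close the running segment.
            segments.append((current, start, i))
            start, current = i, nxt
    segments.append((current, start, len(series) - 1))
    return segments
```

Each returned triple corresponds to one static pattern; attributes computed per segment (duration, net change, and so on) could then form the rows of an information table.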

8.
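Building on an analysis of existing research results, this paper proposes UMKDSS, a unified model of the knowledge discovery state space, which links structured data mining with complex-type data mining. It serves as a unified framework theory for the knowledge discovery field and provides theoretical guidance for complex-type data mining.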

9.
Knowledge Discovery in Complex Objects   Total citations: 1, self-citations: 0, citations by others: 1
Learning concepts and rules from structured (complex) objects is a challenging but highly relevant problem in machine learning and knowledge discovery. In order to take into account and exploit the semantic relationships that hold between the atomic components of structured objects, we propose a knowledge discovery process whose first step starts from a set of complex objects and produces a set of related atomic objects (called contexts). The second step of the process uses the concatenation product to obtain a global context in which the binary relations of the individual contexts coexist with relations produced by applying operators to those contexts. The last step discovers concepts and implication rules, using the concept lattice as a framework for discovering and interpreting nontrivial concepts and rules that may relate different components of complex objects. This paper focuses on two main steps of the knowledge discovery process, namely data mining and interpretation.

10.
Parallel Algorithms for Discovery of Association Rules   Total citations: 2, self-citations: 0, citations by others: 2
Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost. In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms make use of efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice: once in the set-up phase and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithms on various databases and compare them against a well-known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
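
The idea of computing itemset support by intersection over a vertical database layout can be sketched in a few lines of Python. This is a single-process illustration only, not the parallel algorithms of the paper; the helper names `to_vertical` and `support` are hypothetical, and clustering, replication and lattice traversal are left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Set


def to_vertical(transactions: Sequence[Iterable[Hashable]]) -> Dict[Hashable, Set[int]]:
    """Convert a horizontal database (one item set per transaction) into a
    vertical layout: item -> set of ids of the transactions containing it."""
    tidlists: Dict[Hashable, Set[int]] = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support(itemset: Sequence[Hashable], tidlists: Dict[Hashable, Set[int]]) -> int:
    """Count the transactions containing every item of a non-empty itemset
    by intersecting the items' transaction-id sets."""
    if not itemset:
        raise ValueError("itemset must contain at least one item")
    tids = set(tidlists.get(itemset[0], set()))
    for item in itemset[1:]:
        tids &= tidlists.get(item, set())
    return len(tids)
```

With this layout the support of a candidate itemset needs only set intersections over data already in memory, so no hash tree has to be maintained and no further database pass is required.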

11.
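This paper summarizes the latest theoretical and applied advances in data mining and knowledge discovery methods, reviews in detail some key techniques in research and application, and finally offers an outlook on future theoretical and application trends in data mining and knowledge discovery.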

12.
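Data mining and knowledge discovery is a new interdisciplinary field built on four pillar technologies (databases, artificial intelligence, mathematical statistics, and visualization) and formed through the intersection, interpenetration, and fusion of multiple disciplines; its research scope is very broad. Starting from the concepts of data mining and knowledge discovery, the paper classifies and explains the common data mining methods and compares the strengths and weaknesses of the different methods.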

13.
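As society develops, the study of causal relationships is receiving increasing attention, but so far causal relationships have mostly been sought within existing knowledge, an approach that cannot uncover deeper relationships and regularities. We therefore exploit the fact that finite automata can precisely characterize the behavior of a software system or its subsystems. Starting from finite automata and applying knowledge discovery methods, we mine deeper causal relationships and regularities for the problem at hand, combine knowledge discovery theory with the study of causality, develop a fairly systematic theory and method of causal relationships, establish a causal state space, and form a preliminary theoretical framework of knowledge-discovery-based causal automata, in order to resolve and discover causal relationships in different forms.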

14.
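Knowledge Discovery Techniques in Case-Based Reasoning   Total citations: 6, self-citations: 0, citations by others: 6
Case-based reasoning involves many kinds of related knowledge and, correspondingly, a knowledge acquisition process, which suffers to some degree from the knowledge acquisition bottleneck. This paper explores introducing a series of applicable knowledge discovery techniques into case-based reasoning systems in order to raise the degree of automation of their knowledge acquisition. Experiments on the two classes of algorithms proposed are reported and discussed.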

15.
16.
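The experiment proposes a knowledge discovery method for protein relations based on word frequency statistics. The method first identifies protein entities using biomedical named entity recognition, then counts co-occurrence frequencies to form candidate entity pairs, and thereby discovers the most likely entity associations.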
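
The co-occurrence counting step can be sketched as follows. This is a hedged illustration rather than the method of the paper: it assumes named entity recognition has already been run, that each sentence is given as a list of recognized protein names, and that the sentence is the co-occurrence window. The function name `cooccurrence_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Tuple


def cooccurrence_counts(sentences: Iterable[Sequence[str]]) -> Counter:
    """Count how often each unordered pair of entities shares a sentence.

    Each sentence is a sequence of already-recognized entity names.
    Pairs are stored as alphabetically sorted tuples, and an entity that is
    repeated inside one sentence is counted once for that sentence.
    """
    counts: Counter = Counter()
    for entities in sentences:
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts
```

Calling `most_common(n)` on the result ranks the candidate pairs, so the most frequently co-occurring pairs come out first as the most likely associations.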

18.
Discovery of frequent DATALOG patterns   Total citations: 19, self-citations: 0, citations by others: 19
Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and in particular novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight into the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.
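
The simplest form of the task mentioned in this abstract, finding all itemsets that occur in a sufficient number of examples, can be sketched with a level-wise search. The code below is a plain propositional illustration in the style of Apriori, not WARMR and not a DATALOG formulation; the function name `frequent_itemsets` and the absolute threshold `min_count` are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable


def frequent_itemsets(
    transactions: Iterable[Iterable[Hashable]], min_count: int
) -> Dict[FrozenSet[Hashable], int]:
    """Return every itemset contained in at least min_count transactions,
    mapped to its count, using a level-wise (Apriori-style) search."""
    db = [frozenset(t) for t in transactions]
    result: Dict[FrozenSet[Hashable], int] = {}
    # Level 1 candidates: all single items.
    level = {frozenset([item]) for t in db for item in t}
    k = 1
    while level:
        counts = {c: sum(1 for t in db if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (anti-monotonicity of support).
        k += 1
        level = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                level.add(union)
    return result
```

The pruning step relies on the fact that an itemset cannot be frequent unless all of its subsets are.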

19.
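Research Status and Prospects of Rough Set Knowledge Discovery   Total citations: 18, self-citations: 4, citations by others: 14
By reviewing the historical development of rough set knowledge discovery theory and discussing the current state of research, and with reference to the main existing rough set knowledge discovery systems, the paper points out the open problems in rough set knowledge discovery and offers an outlook on research over the next few years.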

20.
Knowledge discovery from image data is a multi-step iterative process. This paper describes the procedure we have used to develop a knowledge discovery system that classifies regions of the ocean floor based on textural features extracted from acoustic imagery. The image is subdivided into rectangular cells called texture elements (texels); a gray-level co-occurrence matrix (GLCM) is computed for each texel in four directions. Secondary texture features are then computed from the GLCM, resulting in a feature vector representation of each texel instance. Alternatively, a region-growing approach is used to identify irregularly shaped regions of varying size which have a homogeneous texture and for which the texture features are computed. The Bayesian classifier Autoclass is used to cluster the instances. Feature extraction is one of the major tasks in knowledge discovery from images. The initial goal of this research was to identify regions of the image characterized by sand waves. Experiments were designed to use expert judgements to select the most effective set of features, to identify the best texel size, and to determine the number of meaningful classes in the data. The region-growing approach has proven to be more successful than the texel-based approach. This method provides a fast and accurate way of identifying provinces in the ocean floor that are of interest to geologists.
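
A gray-level co-occurrence matrix for one texel and one direction can be computed as in the sketch below. It is an illustration under stated assumptions: the texel is a small list-of-lists image with integer gray levels in `range(levels)`, the matrix is non-symmetric and unnormalized, the pixel distance is 1, and contrast is used as an example of a secondary feature (the abstract does not say which features the authors computed). The names `glcm`, `contrast` and `DIRECTIONS` are hypothetical.

```python
from typing import Dict, List, Tuple

# (row offset, column offset) for the four usual directions at distance 1.
DIRECTIONS: Dict[int, Tuple[int, int]] = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image: List[List[int]], dr: int, dc: int, levels: int) -> List[List[int]]:
    """Count, for every pair of gray levels (i, j), how many pixels with
    level i have a neighbour with level j at offset (dr, dc)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix: List[List[int]]) -> float:
    """Contrast feature: mean of (i - j)^2 weighted by the pair counts."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum((i - j) ** 2 * n for i, row in enumerate(matrix) for j, n in enumerate(row))
    return weighted / total
```

Computing `contrast(glcm(texel, *DIRECTIONS[angle], levels))` for each of the four angles gives four entries of a texel's feature vector; further features would be appended in the same way.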
