Similar literature
 Found 20 similar documents; search took 109 ms
1.
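This paper studies the problem of discovering episodes in event sequences and proposes an incremental algorithm for finding frequent serial episodes in an event sequence. Once the frequent episodes and their frequencies of occurrence are known, episode rules can be generated that describe or predict the behavior of the sequence.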

2.
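This paper studies the discovery of frequent episodes in event sequences and proposes an incremental algorithm for finding frequent serial episodes in an event sequence. Once the frequent episodes and their frequencies of occurrence are known, episode rules can be generated that describe or predict the behavior of the sequence.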

3.
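A Study of General Algorithms for Association Rule Discovery   Total citations: 4, self-citations: 0, citations by others: 4
Discovering association rules in large transaction databases is an important problem in KDD. This paper describes a general algorithm for association rule discovery, examines its core problems comprehensively and in some depth, and proposes several methods for improving the algorithm's efficiency.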

4.
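This paper describes how to build a new intelligent resolution system. The system flexibly integrates chemometrics-based resolution algorithms with an expert system, and uses knowledge discovery in databases (KDD) and expert database (ED) techniques as the core tools for building the expert system's knowledge base. The resolution algorithms simplify complex chemical systems; combined with the improved expert system, they make qualitative analysis, quantitative analysis, and structure elucidation of chemical systems intelligent.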

5.
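Research on Pattern Discovery Algorithms for Time-Series Data   Total citations: 1, self-citations: 1, citations by others: 0
Knowledge discovery in databases is an important topic in artificial intelligence. Addressing the problem of complex patterns in time-series data, this paper proposes a new logical representation for temporal sequence patterns and designs a new modeling algorithm for temporal sequences.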

6.
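Knowledge discovery in databases is an important topic in artificial intelligence. Addressing the problem of complex patterns in time-series data, this paper proposes a new logical representation for temporal sequence patterns and designs a new modeling algorithm for temporal sequences.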

7.
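Time series are an important class of data objects stored in information systems, and computing the distance between series lies at the core of many time-series data mining and data extraction problems. Because current sequence distance models are insensitive to fine-grained, non-global correlation features, this paper proposes a new time-series distance model, the fine-grained distance MD(X,Y). It also proposes the FDD algorithm, which maps time series from the time domain to the frequency domain and separates the different forms of sequence variation there in order to determine how finely two series differ. The FDD algorithm is efficient and can ... serve as a baseline value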

8.
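The regularized least-squares learning algorithm for feedforward networks is applied to knowledge discovery in time series. The algorithm combines regularization networks with node pruning, which greatly improves the generalization performance of feedforward networks. It is applied to discovering transient rules in a stock time-series database. The discovery process consists of two parts: preprocessing of the time-series database and data mining (rule discovery). Experimental results show good prediction performance.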

9.
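A KDD Method Based on Set Theory   Total citations: 3, self-citations: 1, citations by others: 2
This paper describes KDD in terms of set theory, introduces a set-theoretic method for discovering classification rules in databases, and gives an implementation algorithm and an application example.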

10.
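Sequential pattern discovery, that is, finding all frequent subsequences in a sequence database, is an active branch of data mining. This paper introduces the basic concepts of sequential pattern mining, then describes the classic PrefixSpan algorithm and CloSpan, a closed sequential pattern algorithm built on the PrefixSpan framework. It analyzes and compares how the two algorithms execute and what characterizes them, and summarizes their respective strengths and weaknesses. PrefixSpan is suited to mining short sequences, whereas CloSpan outperforms PrefixSpan on long sequences or at low support thresholds and performs better on large databases. The results are a useful reference for the design of sequential pattern mining.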
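The prefix-projection idea behind PrefixSpan can be illustrated with a minimal sketch. It assumes each sequence is a list of single items (no itemset elements), and it represents each projected database by (sequence index, offset) pairs rather than copied suffixes. The function name `prefixspan` and its parameters are illustrative and do not come from the surveyed paper.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent subsequence.

    Simplifying assumption: every sequence is a list of single items.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, so the
        # projected database is never physically copied.
        counts = defaultdict(int)
        for sid, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[sid][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Project on the first occurrence of item after the offset.
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                next_projected.append((sid, pos + 1))
            mine(pattern, next_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results
```

Calling `prefixspan([list("abc"), list("acb"), list("ab")], 2)` yields five patterns, among them `("a", "b")` with support 3. CloSpan would additionally prune patterns that are not closed, which this sketch does not attempt.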

11.
We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor, requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.

12.
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new transactions may be added to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce the need for rescanning original databases. Pre-large sequences are defined by a lower support threshold and an upper support threshold that act as gaps to avoid the movement of sequences directly from large to small and vice versa. The proposed algorithm does not require rescanning original databases until the accumulated number of newly added customer sequences exceeds a safety bound, which depends on database size. Thus, as databases grow larger, the number of new transactions allowed before database rescanning is required also grows. The proposed approach thus becomes increasingly efficient as databases grow.
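The two-threshold definition above can be made concrete with a short sketch. It only classifies a sequence by its support ratio; the safety-bound computation is left out because the abstract does not give its formula. The function name and the threshold values in the usage note are assumptions chosen for illustration.

```python
def classify_sequence(count, total_customers, lower, upper):
    """Classify a sequence as 'large', 'pre-large', or 'small'.

    lower and upper are the two support thresholds as fractions,
    with 0 < lower <= upper <= 1 (hypothetical parameter names).
    """
    if not 0 < lower <= upper <= 1:
        raise ValueError("thresholds must satisfy 0 < lower <= upper <= 1")
    if total_customers <= 0:
        raise ValueError("total_customers must be positive")

    ratio = count / total_customers
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"  # buffer zone between small and large
    return "small"
```

With thresholds 0.3 and 0.5 over 10 customer sequences, a count of 4 is classified as pre-large. Such a sequence is tracked alongside the large ones, so a few new customer sequences cannot move any sequence from small to large without passing through the tracked buffer zone.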

13.
Mining sequential patterns from large databases has been recognized by many researchers as an attractive task in data mining and knowledge discovery. Previous algorithms scan the database many times, which is often prohibitively expensive for very large databases. In this paper, the authors introduce an effective algorithm for mining sequential patterns from large databases. In the algorithm, the original database is not used at all for counting the support of sequences after the first pass. Rather, a tidlist structure generated in the previous pass is employed for this purpose, based on set intersection operations, avoiding multiple scans of the database.
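Counting support by intersecting id lists can be sketched as follows. This is a deliberate simplification: it treats a candidate as an unordered set of items and ignores the ordering constraints that real sequential pattern mining must also check. The names `build_tidlists` and `support` are illustrative, not the authors'.

```python
from collections import defaultdict


def build_tidlists(database):
    """Single pass over the database: map each item to the ids containing it."""
    tidlists = defaultdict(set)
    for tid, items in database.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support(candidate, tidlists):
    """Support of a candidate, computed from tidlists only (no database scan)."""
    lists = [tidlists.get(item, set()) for item in candidate]
    if not lists:
        return 0
    # Ids containing every item of the candidate.
    return len(set.intersection(*lists))
```

For the database `{1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}`, `support(("a", "b"), tidlists)` is 2, obtained without touching the original transactions again.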

14.
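An Incremental Updating Algorithm for Association Rules Based on the Frequent Pattern Tree   Total citations: 48, self-citations: 1, citations by others: 48
This paper studies the incremental updating of association rules in large transaction databases. It proposes an incremental updating algorithm based on the frequent pattern tree that updates the association rules after the minimum support or the transaction database changes, and analyzes the algorithm's performance.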

15.
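Clustering Methods for Spatial Databases   Total citations: 4, self-citations: 0, citations by others: 4
1 Introduction. In recent years, both the number of databases and the size of individual databases have grown greatly. For example, databases of objects in space contain billions of telescope images, and NASA's Earth Observing System generates 50 GB of data every hour. Data volumes of this size are far beyond what manual analysis and interpretation can handle. Knowledge discovery in databases (KDD), which identifies valuable, novel, potentially useful, and understandable patterns in data, is a ...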

16.
Chien-Yu  Wen-Chin  Chung-Tsai   Pattern Recognition, 2006, 39(12): 2356-2369
In the field of proteomics, protein hierarchies based on sequence analysis have been extensively applied to automate the annotation of new proteins and to facilitate the discovery and analysis of protein families. However, the presence of ambiguous similarities in large databases makes it harder to deliver protein family hierarchies with favorable sensitivity and specificity. This work develops the HomoClust algorithm, which exploits the homogeneity of protein sequences in generating protein family hierarchies. HomoClust improves the clustering quality of traditional hierarchical clustering algorithms by adopting different clustering mechanisms for different levels of sequence similarity. By incorporating homogeneity detection into the clustering process, HomoClust increases the sensitivity of protein clusters without a drop in its high specificity.

17.
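A Sequential Pattern Mining Algorithm That Avoids Scanning Duplicate Projected Databases   Total citations: 5, self-citations: 0, citations by others: 5
Sequential pattern mining is widely applied in Web clickstream analysis, natural disaster prediction, and pattern discovery in DNA and protein sequences. PrefixSpan, which is based on frequent pattern growth, is among the best-performing sequential pattern mining algorithms. On dense datasets and when mining long sequential patterns, however, large numbers of duplicate projected databases arise and degrade the performance of this class of algorithms. The SPMDS algorithm applies a one-way hash function such as MD5 to the pseudo-projection of each projected database to check whether a duplicate already exists, which avoids scanning many duplicate databases. It also uses several necessary conditions to simplify the search of projected databases, further improving performance. Both experiments and analysis show that SPMDS outperforms PrefixSpan.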
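The duplicate check described above can be sketched under stated assumptions: a pseudo-projection is taken to be a collection of (sequence id, offset) pairs, and the canonical string fed to MD5 is an illustrative choice, not the encoding used by SPMDS. `ProjectionRegistry` and `projection_digest` are hypothetical names.

```python
import hashlib


def projection_digest(pseudo_projection):
    """MD5 hex digest of a pseudo-projection given as (sequence_id, offset) pairs."""
    # Sort so that the same projection always yields the same string.
    canonical = ";".join(
        f"{sid}:{offset}" for sid, offset in sorted(pseudo_projection)
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class ProjectionRegistry:
    """Remembers digests of projected databases that were already scanned."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, pseudo_projection):
        digest = projection_digest(pseudo_projection)
        if digest in self._seen:
            return True  # same digest seen before: skip the scan
        self._seen.add(digest)
        return False
```

A matching digest only indicates that two projections are very probably identical; an implementation that cannot tolerate hash collisions would compare the pairs themselves before skipping a scan.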

18.
Approaches for scaling DBSCAN algorithm to large spatial databases   Total citations: 7, self-citations: 0, citations by others: 7
The huge amount of information stored in databases owned by corporations (e.g. retail, financial, telecom) has spurred tremendous interest in the area of knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and many clustering algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, the DBSCAN algorithm shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires a large volume of memory support and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced consecutively. Following that, based on the above-proposed algorithms, a synthetic algorithm is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
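For reference, the original DBSCAN procedure that these approaches start from can be sketched as below. This is a plain in-memory version with brute-force neighborhood queries, so it shows exactly the cost the paper sets out to reduce; it is not any of the fast, sampling-based, partitioning-based, or parallel variants. The function name and the noise label -1 are illustrative choices.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force region query over the whole dataset, O(n) per call.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue

        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels
```

Every point triggers at least one scan of the full dataset here, which is why the surveyed work turns to faster region queries, sampling, partitioning, and parallel execution for large spatial databases.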

19.
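Association analysis in data mining aims to discover interesting associations among large sets of data items; its core problem is finding frequent itemsets. In traditional matrix-based association mining algorithms, the matrix size depends on the size of the transaction database, so a memory bottleneck remains when processing very large transaction databases. To address this, the paper proposes a frequent itemset discovery algorithm whose matrix size is independent of the transaction database size and which pre-mines under matrix constraints and then verifies the results. Experimental results show that the algorithm speeds up frequent itemset mining.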

20.
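Design and Implementation of a Prefix-Tree Algorithm for Mining Spatial Association Rules   Total citations: 5, self-citations: 0, citations by others: 5
Spatial association rule mining is an important class of knowledge discovery problems in spatial databases. This paper proposes a two-phase strategy for mining spatial association rules: through multiple rounds of single-level Boolean association rule mining, the granularity of the spatial predicates is refined top-down step by step, which greatly reduces the amount of spatial predicate computation. It also designs a prefix-tree-based algorithm for single-level Boolean association rule mining (FPT-Generate), which neither scans the database repeatedly nor generates candidate pattern sets, and which achieves breakthroughs in key optimization techniques. Experiments show that a spatial association rule discovery system using FPT-Generate as its mining engine far outperforms a system built on the classic Apriori algorithm in both time efficiency and space scalability.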
