Similar literature
 19 similar articles found (search time: 154 ms)
1.
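Sequential patterns have important applications in gene analysis, financial forecasting, and other areas, and sequential pattern mining is a major branch of data mining. In view of the growing number of data stream applications, and building on a study of traditional sequential pattern mining algorithms, this paper proposes a data-stream-oriented sequential pattern mining algorithm based on an extensible sliding window and Bayesian probability filtering (the BMSP-DS algorithm). The aim is to simplify the intermediate results of sequential pattern discovery and improve mining efficiency, so that frequent sequential patterns in stream data can be found quickly with little storage and low computation time. The algorithm also reduces the negative effect on pattern discovery of a poorly chosen subjective support threshold. Experimental results show that the algorithm is feasible and performs comparatively well.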

2.
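Application of data mining in anomaly intrusion detection systems   Total citations: 4, self-citations: 0, citations by others: 4
Based on an analysis of existing intrusion detection techniques and systems, this paper proposes an anomaly detection model based on data mining and a sliding window. The model combines association rule and sequential pattern algorithms to mine network data thoroughly, gives time-window-based mining algorithms for the training phase and the detection phase, and builds a Bayesian network to further assess suspicious behavior found during rule mining, improving detection accuracy.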

3.
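Data mining for intrusion detection is currently one of the most active international research directions in network security, databases, and information decision-making. When mining sequential patterns for intrusion detection, frequent network patterns and frequent system activity patterns can only be obtained from a single audit data stream of the network or operating system, so traditional algorithms that extract a single sequential pattern from event stream data, as well as algorithms that extract multiple sequential patterns from multiple different data sequences, are no longer applicable. This paper studies the characteristics of intrusion data, proposes a sequential pattern mining framework and a real-time sequential pattern mining model for network intrusion detection, and designs a new intrusion-detection-oriented sequential pattern mining algorithm, SPM-ID (Sequential Patterns Mining for Intrusion Detection), based on axis attributes, reference attributes, and related support. Finally, the algorithm is implemented and its performance analyzed on the KDD Cup99 data set.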

4.
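Anomaly detection is an important means of intrusion detection. The key to anomaly detection is characterizing the normal pattern, and the quality of the normal pattern depends on the quality of the data. For clean (noise-free) data, the normal pattern is relatively accurate; for less clean data, some genuine user features may be lost, which increases the false alarm rate. On this basis, the ASM algorithm for mining user behavior sequence features is proposed. It combines sequence mining methods from data mining with fuzzy matching to mine the user behavior sequences hidden behind the noise. Experiments show that using fuzzy matching to extract normal sequential patterns for intrusion detection is feasible and effective.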

5.
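Sequential pattern mining emerged to better analyze market basket data and uncover potential customers. It is an important research topic in data mining and has been widely applied in many fields in recent years. This paper surveys the current state of sequential pattern mining and examines the classic mining algorithms of the basic mining framework and the mining algorithms for extended models. It pays particular attention to sequential pattern mining for new forms of data that have emerged in recent years and to mining algorithms based on the zero-suppressed binary decision diagram (ZBDD) structure, and closes with an outlook on development trends in sequential pattern mining.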

6.
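A survey of sequential pattern mining   Total citations: 4, self-citations: 0, citations by others: 4
This paper surveys the state of research on sequential pattern mining. It first introduces the background and related concepts, then summarizes the general methods and introduces and analyzes the most representative sequential pattern mining algorithms, and finally looks ahead to research directions. The survey is intended to help researchers improve existing algorithms and propose new sequential pattern mining algorithms with better performance.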
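
As a minimal illustration of the basic concepts such a survey introduces, the sketch below computes the support of a sequential pattern, that is, the fraction of database sequences that contain the pattern as a (not necessarily contiguous) subsequence. It rests on assumptions made here for illustration only: each sequence element is a single item rather than an itemset, and the names `is_subsequence` and `support` are hypothetical and not taken from the surveyed work.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


# Illustrative toy database (assumed data, not from any cited paper).
db = [["a", "b", "c"], ["a", "c"], ["c", "a"]]
print(support(["a", "c"], db))
```

In the toy database, two of the three sequences contain `a` followed later by `c`, so the pattern is frequent for any minimum support threshold of at most 2/3.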

7.
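A noise-resistant algorithm for mining asynchronous coincidence patterns in multiple data streams   Total citations: 1, self-citations: 0, citations by others: 1
Mining asynchronous coincidence patterns in multiple data streams is challenging. The main contributions are: (1) studying the application of Haar wavelet filtering to mining asynchronous coincidence patterns in stream data; (2) introducing wavelet coefficient sequences to measure the asynchronous local coincidence degree of data streams, and proving a series of theorems that guarantee the correctness of the measure; (3) designing a circular sliding window and a noise-resistant incremental algorithm for mining asynchronous coincidence patterns, with time complexity below O(n^2); (4) running simulation experiments on real data to verify the effectiveness of the algorithm.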
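
To make the idea of a wavelet coefficient sequence more concrete, the sketch below performs a single level of the orthonormal Haar transform on a window of stream values. It is only an illustration of the standard Haar step and not the paper's algorithm: the coincidence measure, the circular sliding window, and the incremental update are not reproduced, and the name `haar_step` is hypothetical.

```python
import math


def haar_step(window):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of `window`. The window length must be even.
    """
    if len(window) % 2 != 0:
        raise ValueError("window length must be even")
    scale = math.sqrt(2.0)
    approx = [(window[i] + window[i + 1]) / scale for i in range(0, len(window), 2)]
    detail = [(window[i] - window[i + 1]) / scale for i in range(0, len(window), 2)]
    return approx, detail


# Assumed toy window: a flat pair followed by a pair with a sharp change.
approx, detail = haar_step([2.0, 2.0, 4.0, 0.0])
print(approx, detail)
```

The approximation coefficients carry the smoothed trend and the detail coefficients carry local fluctuations, which is why shrinking small detail coefficients is a common way to filter noise before comparing streams.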

8.
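When a data set contains sensitive information, directly publishing frequent sequential patterns and their support counts may leak users' private information. To address this, a frequent sequential pattern mining algorithm satisfying differential privacy (DP), named DP-FSM, is proposed. The algorithm uses the downward closure property to generate the set of candidate sequential patterns, selects the frequent sequential patterns from the candidates with a smart truncation method, and finally perturbs the true supports of the selected patterns by adding noise with the geometric mechanism. In addition, to improve the utility of the mining results, a threshold correction strategy is designed to reduce truncation error and propagation error during mining. Theoretical analysis proves that the algorithm satisfies ε-differential privacy. Experimental results show that the algorithm achieves clearly lower values than the comparison algorithm PFS2 on two metrics, false negative rate (FNR) and relative support error (RSE), effectively improving the accuracy of the mining results.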
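
The final perturbation step can be sketched with the standard two-sided geometric mechanism, which adds integer noise Z with P(Z = z) proportional to α^|z|, where α = exp(-ε / sensitivity). The sketch below is written under assumptions and is not the DP-FSM implementation: the function names are hypothetical, the sensitivity is assumed to be 1 per support count, and how the privacy budget ε is split across patterns (and across the truncation step) is not specified in the abstract, so `epsilon` here is simply the budget assumed for one count.

```python
import math
import random


def two_sided_geometric(epsilon, sensitivity=1.0, rng=random):
    """Sample integer noise from the two-sided geometric distribution.

    Uses the fact that the difference of two i.i.d. geometric variables
    (failure counts, success probability 1 - alpha) is two-sided geometric.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    alpha = math.exp(-epsilon / sensitivity)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / math.log(alpha)))

    return geometric() - geometric()


def perturb_supports(supports, epsilon, rng=random):
    """Add independent geometric noise to each true support count."""
    return {pattern: count + two_sided_geometric(epsilon, rng=rng)
            for pattern, count in supports.items()}


# Assumed toy supports; pattern names are illustrative only.
noisy = perturb_supports({"a>b": 120, "a>c": 95}, epsilon=0.5,
                         rng=random.Random(0))
print(noisy)
```

Integer-valued noise keeps the published supports as counts, which is one reason the geometric mechanism is often preferred over Laplace noise for count queries.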

9.
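To reduce candidate sequence generation and the number of scans of the sequence database in the AprioriAll algorithm, and thereby improve mining efficiency, a Web sequential pattern mining method based on an improved AprioriAll algorithm is proposed. The data are first preprocessed, and the improved AprioriAll algorithm is then used to mine patterns. The algorithm is improved in two ways: first, the way candidate sequences are joined is changed to reduce the number of candidates generated; second, unnecessary database scans are eliminated to improve efficiency. Experiments verify the efficiency and correctness of the improved algorithm for Web sequential pattern mining.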

10.
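Protein sequences are an important part of biological sequence data, and their analysis has become an important research direction in bioinformatics. Pattern mining makes it possible to study protein sequences or the sequences of a particular protein family, so pattern mining has become an important task in protein sequence research. MBioPM is a recent biological sequence pattern mining algorithm that improves efficiency by introducing the concept of pattern partitioning, but its efficiency is still inadequate and its mining results are redundant. An optimized algorithm, BioPMMH, is therefore proposed. It uses a hash linked-list structure with pattern partitioning to optimize the search space and search strategy, and filters duplicate patterns during mining. Experiments show that BioPMMH effectively improves the efficiency of pattern mining and resolves the redundancy of the results.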

11.
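Discretization of continuous data is an important preprocessing step for classification methods in data mining. This paper proposes a balanced discretization method based on the minimum description length principle. The method derives a balanced discretization function from minimum description length theory that captures well the relationship between discrete intervals and classification error. Based on the balanced function, an effective heuristic algorithm is also proposed to find the best sequence of cut points. Simulation results show that the proposed algorithm yields good classification learning performance with the C5.0 decision tree and the naive Bayes classifier.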

12.
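To obtain more reliable classification rules when handling noisy data, a rough rule mining algorithm is proposed. Using an uncertainty measure for rough rule sets, and building on the analysis of approximate reduction in variable precision rough set theory, information entropy is introduced to establish a measure for decision tables in the variable precision sense. Using the discrete particle swarm algorithm, an approximate reduction algorithm for rough set knowledge based on particle swarm optimization is proposed, from which the rough rule set is derived. A worked example shows that the algorithm tolerates a certain level of noise and that the rules it produces have high accuracy and coverage, which ensures accurate classification.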

13.
A recommender system is an approach used in e-commerce to give users a smoother experience. Sequential pattern mining is a data mining technique used to identify co-occurrence relationships while taking the order of transactions into account. This work presents an implementation of sequential pattern mining for recommender systems in the e-commerce domain. It uses the systolic tree algorithm to mine frequent patterns and derive feasible rules for the recommender system. The objective of feature selection is to pick a feature subset with the least similarity among features and the highest relevance to the target class, which reduces the dimensionality of the feature vector by eliminating redundant, irrelevant, or noisy data. This work presents a new hybrid recommender system based on optimized feature selection and the systolic tree. Features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), and feature selection used River Formation Dynamics (RFD) and the Particle Swarm Optimization (PSO) algorithm. The systolic tree is used for pattern mining, and recommendations are given on that basis. The proposed methods were evaluated on the MovieLens dataset, and the experimental outcomes confirmed the efficiency of the techniques. RFD feature selection combined with systolic tree frequent pattern mining and collaborative filtering achieved a precision of 0.89.

14.
Given the explosive growth of data collected from the current business environment, data mining can potentially discover new knowledge to improve managerial decision making. This paper proposes a novel data mining approach that employs an evolutionary algorithm to discover knowledge represented in Bayesian networks. The approach is applied successfully to handle the business problem of finding response models from direct marketing data. Learning Bayesian networks from data is a difficult problem. There are two different approaches to the network learning problem. The first one uses dependency analysis, while the second one searches for good network structures according to a metric. Unfortunately, both approaches have their own drawbacks. Thus, we propose a novel hybrid algorithm of the two approaches, which consists of two phases, namely, the conditional independence (CI) test and the search phases. In the CI test phase, dependency analysis is conducted to reduce the size of the search space. In the search phase, good Bayesian network models are generated by using an evolutionary algorithm. A new operator is introduced to further enhance the search effectiveness and efficiency. In a number of experiments and comparisons, the hybrid algorithm outperforms MDLEP, our previous algorithm which uses evolutionary programming (EP) for network learning, and other network learning algorithms. We then apply the approach to two data sets of direct marketing and compare the performance of the evolved Bayesian networks obtained by the new algorithm with those by MDLEP, the logistic regression models, the naïve Bayesian classifiers, and the tree-augmented naïve Bayesian network classifiers (TAN). In the comparison, the new algorithm outperforms the others.

15.
This paper presents a fully Bayesian way to solve the simultaneous localization and spatial prediction problem using a Gaussian Markov random field (GMRF) model. The objective is to simultaneously localize robotic sensors and predict a spatial field of interest using sequentially collected noisy observations by robotic sensors. The set of observations consists of the observed noisy positions of robotic sensing vehicles and noisy measurements of a spatial field. To be flexible, the spatial field of interest is modeled by a GMRF with uncertain hyperparameters. We derive an approximate Bayesian solution to the problem of computing the predictive inferences of the GMRF and the localization, taking into account observations, uncertain hyperparameters, measurement noise, kinematics of robotic sensors, and uncertain localization. The effectiveness of the proposed algorithm is illustrated by simulation results as well as by experiment results. The experiment results successfully show the flexibility and adaptability of our fully Bayesian approach in a data-driven fashion.

16.
A graphical model for audiovisual object tracking   Total citations: 3, self-citations: 0, citations by others: 3
We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.

17.
Wireless sensor networks (WSNs) consist of small sensors with limited computational and communication capabilities. Reading data in WSN is not always reliable due to open environmental factors such as noise, weakly received signal strength, and intrusion attacks. The process of detecting highly noisy data is called anomaly or outlier detection. The challenging aspect of noise detection in WSN is related to the limited computational and communication capabilities of sensors. The purpose of this research is to design a local time-series-based data noise and anomaly detection approach for WSN. The proposed local outlier detection algorithm (LODA) is a decentralized noise detection algorithm that runs on each sensor node individually with three important features: reduction mechanism that eliminates the noneffective features, determination of the memory size of data histogram to accomplish the effective available memory, and classification for predicting noisy data. An adaptive Bayesian network is used as the classification algorithm for prediction and identification of outliers in each sensor node locally. Results of our approach are compared with four well-known algorithms using benchmark real-life datasets, which demonstrate that LODA can achieve higher (up to 89%) accuracy in the prediction of outliers in real sensory data.

18.
For surface reconstruction problems with noisy and incomplete range data, a Bayesian estimation approach can improve the overall quality of the surfaces. The Bayesian approach to surface estimation relies on a likelihood term, which ties the surface estimate to the input data, and the prior, which ensures surface smoothness or continuity. This paper introduces a new high-order, nonlinear prior for surface reconstruction. The proposed prior can smooth complex, noisy surfaces, while preserving sharp, geometric features, and it is a natural generalization of edge-preserving methods in image processing, such as anisotropic diffusion. An exact solution would require solving a fourth-order partial differential equation (PDE), which can be difficult with conventional numerical techniques. Our approach is to solve a cascade system of two second-order PDEs, which resembles the original fourth-order system. This strategy is based on the observation that the generalization of image processing to surfaces entails filtering the surface normals. We solve one PDE for processing the normals and one for refitting the surface to the normals. Furthermore, we implement the associated surface deformations using level sets. Hence, the algorithm can accommodate very complex shapes with arbitrary and changing topologies. This paper gives the mathematical formulation and describes the numerical algorithms. We also show results using range and medical data.

19.
Biao Qin  Yuni Xia  Shan Wang  Xiaoyong Du 《Knowledge》2011,24(8):1151-1158
Data uncertainty can be caused by numerous factors such as measurement precision limitations, network latency, data staleness and sampling errors. When mining knowledge from emerging applications such as sensor networks or location based services, data uncertainty should be handled cautiously to avoid erroneous results. In this paper, we apply probabilistic and statistical theory on uncertain data and develop a novel method to calculate conditional probabilities of Bayes theorem. Based on that, we propose a novel Bayesian classification algorithm for uncertain data. The experimental results show that the proposed method classifies uncertain data with potentially higher accuracies than the Naive Bayesian approach. It also has a more stable performance than the existing extended Naive Bayesian method.
