Similar Literature
 Found 20 similar documents (search time: 15 ms)
1.
Mining With Noise Knowledge: Error-Aware Data Mining   Total citations: 1, self-citations: 0, citations by others: 1
Real-world data mining deals with noisy information sources, where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are either to adopt data cleansing approaches to enhance data consistency or simply to treat noisy data as quality sources and feed them into the data mining algorithms. Either way may substantially sacrifice mining performance. In this paper, we consider an error-aware (EA) data mining design, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume that such noise knowledge is available in advance, and we propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore the original data distributions, which are then used to rectify the model built from noise-corrupted data. We materialize this concept in the proposed EA naive Bayes classification algorithm. Experimental comparisons on real-world datasets demonstrate the effectiveness of this design.
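
The idea of restoring an original distribution from known noise statistics can be illustrated with a minimal sketch. It is not the paper's EA naive Bayes algorithm: it assumes a single binary attribute whose value is flipped independently with a known probability, and the function name `restore_probability` and its parameters are hypothetical. Under that assumption the observed frequency q relates to the true frequency p by q = (1 - e) p + e (1 - p), which can be inverted.

```python
def restore_probability(observed_p, flip_rate):
    """Estimate the noise-free P(attribute = 1) for a binary attribute.

    Assumes symmetric, independent flip noise with a known rate
    (an illustrative assumption, not the paper's noise model).
    """
    if not 0.0 <= flip_rate < 0.5:
        raise ValueError("flip_rate must be in [0, 0.5)")
    # Invert q = (1 - e) * p + e * (1 - p)  =>  p = (q - e) / (1 - 2e)
    restored = (observed_p - flip_rate) / (1.0 - 2.0 * flip_rate)
    # Sampling error can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, restored))


# A true frequency of 0.3 observed through 10% flip noise appears as 0.34.
estimate = restore_probability(0.34, 0.1)
```

A classifier could then use such restored frequencies in place of the raw, noise-corrupted counts when estimating its conditional probabilities.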

2.
Attack mitigation schemes actively throttle attack traffic generated in distributed denial-of-service (DDoS) attacks. This paper presents attack diagnosis (AD), a novel attack mitigation scheme that adopts a divide-and-conquer strategy. AD combines the concepts of pushback and packet marking, and its architecture is in line with the ideal DDoS attack countermeasure paradigm: attack detection is performed near the victim host, and packet filtering is executed close to the attack sources. AD is a reactive defense mechanism that is activated by a victim host after an attack is detected. By instructing its upstream routers to mark packets deterministically, the victim can trace back one attack source and command an AD-enabled router close to the source to filter the attack packets. This process isolates one attacker and throttles it, and is repeated until the attack is mitigated. We also propose an extension to AD called parallel attack diagnosis (PAD) that is capable of throttling traffic coming from a large number of attackers simultaneously. AD and PAD are analyzed and evaluated using the Skitter Internet map, Lumeta's Internet map, and the 6-degree complete tree topology model. Both schemes are shown to be robust against IP spoofing and to incur low false positive ratios.

3.
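Building Data Mining Solutions with SQL Server 2000   Total citations: 3, self-citations: 0, citations by others: 3
Microsoft SQL Server 2000 is the first release of the product to include data mining features. Microsoft's data mining solution is built on the OLE DB for Data Mining specification. OLE DB is an industry standard defined by Microsoft and supported by a range of data mining ISVs. The specification introduces a new SQL-like language for data mining that helps database developers build data mining applications. This paper presents an example of building a data mining application with SQL Server 2000.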

4.
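A Study on Selecting Multidimensional Data Mining Spaces with Neural Networks   Total citations: 1, self-citations: 0, citations by others: 1
Multidimensional data mining is an important decision support technology based on data warehouse systems. Unfortunately, the complexity of multidimensional data makes such mining inefficient and impractical. By analyzing the multidimensional data mining model, this paper shows that selecting the mining space is the key step that determines whether the result succeeds or fails. On this basis, it proposes a neural network model for mining-space selection, demonstrates through a practical application that the model finds the correct mining space, and finally discusses the model's advantages and disadvantages.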

5.
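High quality and efficiency in data analysis and prediction are very important, especially in complex data environments. The strategy of constructing mining models with the analytic hierarchy process (AHP Construct Mining Component, ACMC) makes data mining more intuitive, which is a clear advantage. The ACMC strategy also extends the original analytic hierarchy concept. Set in a complex data environment, this paper studies and analyzes the practicality of ACMC.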

6.
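Raw datasets contain a large amount of noisy data and are very large, so mining association rules from them directly hurts both accuracy and efficiency. This paper proposes a mining strategy that first clusters the raw data and then extracts association rules, which can reduce interference from noisy data to some extent, eliminate redundant attributes of data objects, and improve the effectiveness of rule mining.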

7.
In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using $\Delta\mathrm{BIC}$, a widely adopted distance measure of two audio segments. We compare our approaches to three popular distance-based approaches, namely, Chen and Gopalakrishnan's window-growing-based approach, Siegler's fixed-size sliding window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experiment results show that a more effective segmentation approach leads to better diarization accuracy.
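
For readers unfamiliar with the measure, $\Delta\mathrm{BIC}$ is commonly written as follows when each segment is modeled by a single full-covariance Gaussian. This is the widely used form from the BIC segmentation literature and may differ in notation from the paper itself; the symbols $N$, $N_1$, $N_2$, $\Sigma$, $\Sigma_1$, $\Sigma_2$, $d$, and $\lambda$ are introduced here and do not appear in the abstract. A window of $N$ feature vectors of dimension $d$ with sample covariance $\Sigma$ is split into two segments of $N_1$ and $N_2$ vectors ($N = N_1 + N_2$) with sample covariances $\Sigma_1$ and $\Sigma_2$:

$$
\Delta\mathrm{BIC} = \frac{N}{2}\log\lvert\Sigma\rvert - \frac{N_1}{2}\log\lvert\Sigma_1\rvert - \frac{N_2}{2}\log\lvert\Sigma_2\rvert - \frac{\lambda}{2}\left(d + \frac{d(d+1)}{2}\right)\log N
$$

Under this convention, a positive value favors modeling the two segments separately (a speaker change), while a non-positive value favors merging them; $\lambda$ is a tunable penalty weight, nominally 1.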

8.
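Techniques and Strategies for Implementing Data Mining with SQL Server 2000   Total citations: 2, self-citations: 0, citations by others: 2
OLE DB is an industry standard defined by Microsoft and supported by many third-party data mining vendors. OLE DB for Data Mining represents data mining model objects as tables and also provides a SQL-like DDL that helps developers build data mining applications. The paper presents an example of building a data mining application with SQL Server 2000.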

9.
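Cao Jie (曹洁). 《电脑开发与应用》 (Computer Development & Applications), 2010, 23(5): 44-46, 49
Broadening the user base of data mining systems, so that ordinary users can operate them easily, is the main research goal of search strategies for data mining algorithms. A case base is built to store expert experience; cases are represented with an object-oriented method; a fuzzy quotient space describes how the case base is organized; and statistical heuristic search is used for case retrieval, which narrows the search range, speeds up problem solving, and improves running efficiency and accuracy. Taking a bank account manager's analysis of churning customer groups as an example, the corresponding operations verify the accuracy of the case-based-reasoning search strategy for data mining algorithms.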

10.
11.
This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.

12.
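Data Mining Technology   Total citations: 4, self-citations: 0, citations by others: 4
Data mining is the process of extracting implicit, previously unknown, yet potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. This paper briefly introduces the data mining analysis process, analysis models, applications of data mining in telecommunications enterprises, and development trends in data mining.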

13.
In many application domains, the amount of available data has increased so much that humans need help from automatic computerized methods for extracting relevant information. Moreover, it is becoming more and more common to store data that possess inherently structural or relational characteristics. These types of data are best represented by graphs, which can very naturally represent entities, their attributes, and their relationships to other entities. In this article, we review the state of the art in graph mining, and we present advances in processing trees and graphs by two Computational Intelligence classes of methods, namely Neural Networks and Kernel Methods.

14.
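Data Mining on the Web   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the basic tasks of Web page data mining, including content, structure, and usage mining. Because Web data is complex and unusual, only a small part of it, such as logs, can be handled with conventional data mining methods. Web pages must otherwise be preprocessed to meet the requirements of structured data mining, or XML must be used to construct a semi-structured data schema before mining.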

15.
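Data mining is currently a hot topic in the IT industry and can be seen everywhere. Data mining technology has been applied well in many industries and has been particularly successful in marketing, giving an early indication of its strengths and potential. This paper analyzes basic concepts such as data mining, data warehousing, and online analytical processing (OLAP) and the relationships among them, and briefly introduces data mining tools and application areas.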

16.
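This paper describes the basic tasks of Web page data mining, including content, structure, and usage mining. Because Web data is complex and unusual, only a small part of it, such as logs, can be handled with conventional data mining methods. Web pages must otherwise be preprocessed to meet the requirements of structured data mining, or XML must be used to construct a semi-structured data schema before mining.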

17.
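An Ontology-Based Data Mining Method   Total citations: 16, self-citations: 1, citations by others: 15
Data mining is an interactive human-computer process in which domain knowledge plays an important role. This paper proposes an ontology-based data mining algorithm that connects domain knowledge seamlessly with the database and discovers meaningful multi-level rules more effectively.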

18.
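Data Preprocessing in Web Data Mining   Total citations: 11, self-citations: 0, citations by others: 11
Web data mining is the main means of analyzing Web usage. Its data source is usually Web server logs, but log records are messy, incomplete, inaccurate, and unstructured, so data preprocessing is required. This paper divides preprocessing into three stages (data cleaning, user identification, and session identification) and proposes WLP, an efficient preprocessing architecture for Web data mining, together with corresponding algorithms.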
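
The three preprocessing stages named in this abstract can be sketched with common heuristics. This is not the paper's WLP architecture or its algorithms; the record layout, the static-resource suffix list, the use of (IP, user agent) as a user key, and the 30-minute timeout are all assumptions made for illustration, and `preprocess` is a hypothetical name.

```python
from collections import defaultdict

# Assumed noise: requests for embedded static resources.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed inactivity threshold (seconds) that ends a session.
SESSION_TIMEOUT = 30 * 60


def preprocess(records, timeout=SESSION_TIMEOUT):
    """records: iterable of (ip, user_agent, timestamp_seconds, url)."""
    # Stage 1: data cleaning - drop requests for static resources.
    cleaned = [r for r in records
               if not r[3].lower().endswith(STATIC_SUFFIXES)]

    # Stage 2: user identification - treat (ip, user_agent) as one user.
    hits_by_user = defaultdict(list)
    for ip, agent, ts, url in cleaned:
        hits_by_user[(ip, agent)].append((ts, url))

    # Stage 3: session identification - split on gaps longer than timeout.
    sessions = []
    for user, hits in hits_by_user.items():
        hits.sort()
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions


log = [
    ("1.1.1.1", "UA", 0, "/index.html"),
    ("1.1.1.1", "UA", 1, "/logo.gif"),
    ("1.1.1.1", "UA", 100, "/a.html"),
    ("1.1.1.1", "UA", 5000, "/b.html"),
    ("2.2.2.2", "UA", 50, "/index.html"),
]
result = preprocess(log)
```

Each returned item pairs a user key with the time-ordered page requests of one session, a form that usage-mining algorithms can consume directly.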

19.
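This paper introduces the definition of data mining, its process, and its main techniques, as well as the definition, basic structural framework, processing flow, and supporting technologies of spatial data warehouses, and analyzes the characteristics of data mining based on spatial data warehouses.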

20.
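As data mining technology develops, new data mining tools keep appearing, and it is very difficult to keep track of their functions, mining techniques, and future trends. Using data mining techniques, this paper proposes a multidimensional cube classification model for data mining software tools, gives a concrete classification example, summarizes the technical development path and future trends of data mining tools, and further verifies its conclusions through an in-depth comparison of data mining tools from three different stages.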
