20 similar documents found
1.
Mining With Noise Knowledge: Error-Aware Data Mining  Total citations: 1, self-citations: 0, citations by others: 1
Xindong Wu Xingquan Zhu 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2008,38(4):917-932
Real-world data mining deals with noisy information sources where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are to adopt either data cleansing approaches to enhance the data consistency or simply take noisy data as quality sources and feed them into the data mining algorithms. Either way may substantially sacrifice the mining performance. In this paper, we consider an error-aware (EA) data mining design, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume that such noise knowledge is available in advance, and we propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore original data distributions, which are further used to rectify the model built from noise-corrupted data. We materialize this concept by the proposed EA naive Bayes classification algorithm. Experimental comparisons on real-world datasets demonstrate the effectiveness of this design.
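The restore-then-rectify idea in entry 1 can be illustrated with a minimal sketch. It assumes a specific noise model that the abstract does not state: each attribute value is replaced, with known probability p < 1, by a value drawn uniformly from the attribute's k possible values, and class labels are noise-free. Under that assumption the observed frequency is q(v) = (1 - p)·π(v) + p/k, which can be inverted to estimate the clean distribution π. The function names are hypothetical; the paper's own EA naive Bayes may work differently.

```python
import math
from collections import Counter

def restore_distribution(observed, values, noise_level):
    """Invert the ASSUMED uniform-noise model q(v) = (1 - p) * pi(v) + p / k.
    observed: Counter of noisy value counts; noise_level p must be < 1."""
    k = len(values)
    total = sum(observed.get(v, 0) for v in values)
    restored = {}
    for v in values:
        q = observed.get(v, 0) / total if total else 1.0 / k
        # Clip negative estimates caused by sampling error.
        restored[v] = max((q - noise_level / k) / (1 - noise_level), 0.0)
    norm = sum(restored.values())
    return {v: (restored[v] / norm if norm else 1.0 / k) for v in values}

def train_ea_nb(rows, labels, domains, noise_levels):
    """rows: attribute tuples; labels: class labels (assumed clean);
    domains[i]: possible values of attribute i; noise_levels[i]: known noise level."""
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {}
    for c in class_counts:
        for i, domain in enumerate(domains):
            observed = Counter(row[i] for row, y in zip(rows, labels) if y == c)
            # Rectify each class-conditional distribution with the noise knowledge.
            cond[(c, i)] = restore_distribution(observed, domain, noise_levels[i])
    return priors, cond

def predict(model, row):
    priors, cond = model
    def log_score(c):
        # Floor probabilities to avoid log(0) after clipping.
        return math.log(priors[c]) + sum(
            math.log(max(cond[(c, i)].get(v, 0.0), 1e-9)) for i, v in enumerate(row))
    return max(priors, key=log_score)
```

For example, with p = 0.2 and two values observed 70/30, the restored distribution is 0.75/0.25: the noise had pulled the observed frequencies toward uniform.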
2.
Chen Ruiliang Park Jung-Min Marchany Randolph 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(5):577-588
Attack mitigation schemes actively throttle attack traffic generated in distributed denial-of-service (DDoS) attacks. This paper presents attack diagnosis (AD), a novel attack mitigation scheme that adopts a divide-and-conquer strategy. AD combines the concepts of pushback and packet marking, and its architecture is in line with the ideal DDoS attack countermeasure paradigm: attack detection is performed near the victim host and packet filtering is executed close to the attack sources. AD is a reactive defense mechanism that is activated by a victim host after an attack is detected. By instructing its upstream routers to mark packets deterministically, the victim can trace back one attack source and command an AD-enabled router close to the source to filter the attack packets. This process isolates one attacker and throttles it, which is repeated until the attack is mitigated. We also propose an extension to AD called parallel attack diagnosis (PAD) that is capable of throttling traffic coming from a large number of attackers simultaneously. AD and PAD are analyzed and evaluated using the Skitter Internet map, Lumeta's Internet map, and the 6-degree complete tree topology model. Both schemes are shown to be robust against IP spoofing and to incur low false positive ratios.
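The one-attacker-per-round loop that entry 2 describes can be simulated with a toy sketch. The sketch makes simplifying assumptions that are not from the paper: each attacker's router path is known once traced (a stand-in for deterministic packet marking), the heaviest remaining attacker is traced first, and the attack counts as mitigated once the residual rate is at or below a threshold. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Attacker:
    name: str
    rate: float   # attack packets per second reaching the victim
    path: list    # router IDs from the attacker's edge router to the victim

@dataclass
class Network:
    attackers: list
    filters: set = field(default_factory=set)  # (router, attacker) pairs being filtered

    def _active(self):
        return [a for a in self.attackers if (a.path[0], a.name) not in self.filters]

    def attack_rate(self):
        return sum(a.rate for a in self._active())

    def trace_one(self):
        """Stand-in for deterministic marking: reveal one unfiltered attacker
        (ASSUMPTION: the heaviest one), or None if none remain."""
        active = self._active()
        return max(active, key=lambda a: a.rate) if active else None

def attack_diagnosis(net, threshold):
    """Isolate and throttle one attacker per round until the attack is mitigated."""
    rounds = []
    while net.attack_rate() > threshold:
        attacker = net.trace_one()
        if attacker is None:
            break
        edge_router = attacker.path[0]   # AD-enabled router closest to the source
        net.filters.add((edge_router, attacker.name))
        rounds.append((attacker.name, edge_router))
    return rounds
```

PAD, which throttles many attackers at once, would replace the single `trace_one` call per round with a batch; that variant is not sketched here.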
3.
4.
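Research on Selecting the Multidimensional Data Mining Space with Neural Networks  Total citations: 1, self-citations: 0, citations by others: 1
Multidimensional data mining is an important decision-support technique built on data warehouse systems. Unfortunately, the complexity of multidimensional data makes such mining inefficient and of limited practical use. By analyzing the multidimensional data mining model, this paper shows that selecting the mining space is the key step that determines whether multidimensional mining succeeds. On this basis, it proposes a neural network model for mining-space selection and uses an application example to show that the model finds the correct mining space. Finally, the paper discusses the model's strengths and weaknesses.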
5.
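High quality and efficiency in data analysis and prediction are very important, especially in complex data environments. Building mining models with the analytic hierarchy process (AHP Construct Mining Component, ACMC) makes data mining more intuitive and offers clear advantages, and the ACMC strategy refines the original AHP concept. Focusing on complex data environments, this paper studies and analyzes the practicality of ACMC in depth.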
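Entry 5 does not say how ACMC uses AHP internally, but the standard AHP step it builds on, deriving criterion weights from a pairwise comparison matrix and checking its consistency, can be sketched as follows. This is generic AHP (principal-eigenvector weights via power iteration, consistency ratio with Saaty's random index values), not the paper's ACMC procedure; function names are hypothetical.

```python
# Saaty's random consistency index RI for matrix sizes 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix
    (matrix[i][j] = importance of criterion i relative to j) by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    # Estimate lambda_max from A w ~= lambda_max * w.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max

def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual rule of thumb."""
    if n <= 2:
        return 0.0
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

A perfectly consistent matrix built from weights 0.5, 0.3, 0.2 (entry i, j equal to w_i / w_j) returns exactly those weights with lambda_max = 3 and a consistency ratio of 0.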
6.
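The raw dataset contains a large amount of noisy data and is very large, so mining association rules from it directly degrades both accuracy and efficiency. This article proposes a mining strategy that first clusters the raw data and then extracts association rules. The strategy reduces the interference of noisy data to some extent, eliminates redundant attributes in data objects, and makes rule mining more effective.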
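The cluster-first, mine-second strategy of entry 6 can be sketched as below. The abstract names neither the clustering algorithm nor the rule miner, so both are assumptions: a one-pass leader clustering on Jaccard similarity stands in for the clustering step, clusters smaller than a minimum size are discarded as noise, and only single-item rules (a → b) are mined inside each cluster. Function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def leader_cluster(transactions, threshold):
    """One-pass leader clustering (a stand-in for the paper's clustering step)."""
    leaders, clusters = [], []
    for t in transactions:
        for idx, leader in enumerate(leaders):
            if jaccard(t, leader) >= threshold:
                clusters[idx].append(t)
                break
        else:  # no leader is similar enough: start a new cluster
            leaders.append(t)
            clusters.append([t])
    return clusters

def pair_rules(transactions, min_support, min_confidence):
    """Mine rules lhs -> rhs between single items; returns (lhs, rhs, support, confidence)."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in t)
    pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return rules

def cluster_then_mine(transactions, threshold=0.5, min_cluster_size=2,
                      min_support=0.5, min_confidence=0.8):
    rules = []
    for cluster in leader_cluster(transactions, threshold):
        if len(cluster) < min_cluster_size:  # tiny clusters are treated as noise
            continue
        rules.extend(pair_rules(cluster, min_support, min_confidence))
    return rules
```

Because support is computed within each cluster, a pattern that is frequent in one coherent group is not diluted by unrelated transactions, while isolated outlier transactions never reach the rule miner.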
7.
《IEEE transactions on audio, speech, and language processing》2010,18(1):141-157
8.
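Techniques and Strategies for Implementing Data Mining with SQL Server 2000  Total citations: 2, self-citations: 0, citations by others: 2
OLE DB is an industry standard defined by Microsoft and supported by many third-party data mining product vendors. OLE DB for Data Mining represents data mining model objects as tables and provides an SQL-like DDL that makes it easier for developers to build data mining applications. The paper gives an example of building a data mining application with SQL Server 2000.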
9.
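The main research goal of search strategies for data mining algorithms is to broaden the user base of data mining systems so that ordinary users can operate them easily. A case base is built to store expert experience: cases are represented in an object-oriented way, and the organization of the case base is described with a fuzzy quotient space. Case retrieval combines statistical heuristic search techniques, which narrows the retrieval scope, speeds up problem solving, and improves efficiency and accuracy. Using a bank account manager analyzing customer churn groups as an example, the paper walks through the corresponding operations and verifies the accuracy of this case-based reasoning search strategy for data mining algorithms.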
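The narrowing effect described in entry 9 (search a coarse level first, then only the chosen part of the case base) can be sketched as a two-level nearest-neighbour retrieval. This is a simplification, not the paper's method: a crisp grouping of cases replaces the fuzzy quotient space, group centroids replace statistical heuristic search, and a weighted Euclidean distance is assumed. The `Case` fields and feature choices are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Case:
    features: tuple  # hypothetical numeric problem description, e.g. (age, balance, months_inactive)
    group: str       # coarse class used to narrow the search (stand-in for a quotient-space block)
    solution: str    # stored expert conclusion

def distance(a, b, weights):
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def centroid(cases):
    n = len(cases)
    return tuple(sum(c.features[i] for c in cases) / n for i in range(len(cases[0].features)))

def retrieve(case_base, query, weights):
    groups = {}
    for case in case_base:
        groups.setdefault(case.group, []).append(case)
    # Coarse step: choose the group whose centroid is closest to the query.
    best_group = min(groups.values(), key=lambda cs: distance(centroid(cs), query, weights))
    # Fine step: nearest neighbour inside that group only.
    return min(best_group, key=lambda c: distance(c.features, query, weights))
```

Only the cases in one group are compared in full, which is where the speed-up comes from; the price is that the true nearest case can be missed if it sits in a different group.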
10.
11.
This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.
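The core of entry 11, fitting a mixture model on a sample and then applying it to the whole data set, can be sketched with EM for a one-dimensional Gaussian mixture. Assumptions not taken from the paper: the data are 1-D, there are k = 2 components, and a single random subsample stands in for the paper's sampling-and-subsampling scheme. Function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k=2, iterations=50):
    """EM for a 1-D Gaussian mixture; initial means are spread over the sorted data."""
    s = sorted(xs)
    n = len(xs)
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances

def classify(x, model):
    """Assign x to the component with the highest weighted likelihood."""
    weights, means, variances = model
    return max(range(len(means)), key=lambda j: weights[j] * normal_pdf(x, means[j], variances[j]))

def cluster_large(data, sample_size, seed=0):
    """Fit on a random sample only, then label every point in the full data set."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    model = fit_gmm_1d(sample)
    return model, [classify(x, model) for x in data]
```

The expensive EM iterations touch only the sample; the full data set is scanned once for labelling, which is what makes the approach practical for very large catalogues.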
12.
13.
In many application domains, the amount of available data has increased so much that humans need help from automatic computerized methods for extracting relevant information. Moreover, it is becoming more and more common to store data that possess inherently structural or relational characteristics. These types of data are best represented by graphs, which can very naturally represent entities, their attributes, and their relationships to other entities. In this article, we review the state of the art in graph mining, and we present advances in processing trees and graphs by two Computational Intelligence classes of methods, namely Neural Networks and Kernel Methods.
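As one concrete example of the kernel methods that entry 13 surveys, the Weisfeiler-Lehman (WL) subtree kernel compares two labelled graphs by repeatedly relabelling each node with its own label plus the sorted labels of its neighbours, then taking the dot product of label counts. The abstract does not say which graph kernels the review covers; WL is chosen here only as a well-known instance, and the function names and graph encoding (adjacency dict plus label dict) are assumptions.

```python
from collections import Counter

def wl_features(graphs, iterations=2):
    """graphs: list of (adjacency, labels) pairs, where adjacency maps node -> neighbours
    and labels maps node -> string label. Returns one label-count Counter per graph."""
    compress = {}  # shared across graphs so identical neighbourhoods get identical labels
    feats = [Counter(labels.values()) for _, labels in graphs]
    current = [dict(labels) for _, labels in graphs]
    for _ in range(iterations):
        for g, (adj, _) in enumerate(graphs):
            new = {}
            for node, label in current[g].items():
                signature = (label, tuple(sorted(current[g][nb] for nb in adj[node])))
                new[node] = compress.setdefault(signature, f"wl{len(compress)}")
            current[g] = new
            feats[g].update(new.values())
    return feats

def wl_kernel(g1, g2, iterations=2):
    """Dot product of the WL label histograms of two graphs."""
    f1, f2 = wl_features([g1, g2], iterations)
    return sum(count * f2[label] for label, count in f1.items())
```

For three carbon-labelled nodes, a triangle and a path share only the original labels and the "degree-2 node" label from the first iteration, so their kernel value (12) is lower than either graph's similarity to itself.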
14.
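Data Mining on the Web  Total citations: 1, self-citations: 0, citations by others: 1
李健 (Li Jian) 《数字社区&智能家居》 (Digital Community & Smart Home) 2006,(32)
This article describes the basic tasks of Web page data mining: content, structure, and usage mining. Because Web data are complex and distinctive, only a small part of them, such as logs, can be mined with conventional data mining methods. For the rest, Web pages must first be processed to meet the requirements of structured-data mining, or XML technology must be used to build a semi-structured data schema before mining.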
15.
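Data mining is currently a hot topic in the IT industry and appears in many settings. Data mining techniques have been applied successfully in many industries, especially marketing, which gives an early indication of their advantages and potential. This article analyzes the basic concepts of data mining, data warehouses, and online analytical processing (OLAP) and the relationships among them, and briefly introduces data mining tools and application areas.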
16.
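李健 (Li Jian) 《数字社区&智能家居》 (Digital Community & Smart Home) 2006,(11):20-20,90
This article describes the basic tasks of Web page data mining: content, structure, and usage mining. Because Web data are complex and distinctive, only a small part of them, such as logs, can be mined with conventional data mining methods. For the rest, Web pages must first be processed to meet the requirements of structured-data mining, or XML technology must be used to build a semi-structured data schema before mining.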
17.
18.
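Data Preprocessing in Web Data Mining  Total citations: 11, self-citations: 0, citations by others: 11
Web data mining is the main means of analyzing Web applications. Its data source is usually the Web server log, but log records are messy, incomplete, inaccurate, and unstructured, so they must be preprocessed. The article divides preprocessing into three stages (data cleaning, user identification, and session identification) and proposes WLP, an efficient preprocessing structure for Web data mining, together with corresponding algorithms.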
19.
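The article introduces the definition of data mining, the data mining process, and its main techniques, as well as the definition, basic architecture, processing flow, and supporting technologies of spatial data warehouses, and analyzes the characteristics of data mining based on spatial data warehouses.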
20.
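As data mining technology has developed, a wide variety of data mining tools has appeared, and keeping track of their functions, mining techniques, and future directions is very difficult. Using data mining techniques, the paper proposes a multidimensional cube model for classifying data mining software tools, gives a concrete classification example, and summarizes the technical development path and future trends of data mining tools. An in-depth comparison of data mining tools from three different stages further supports the paper's conclusions.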