Similar Documents
1.
In a cloud computing environment, the centralized storage of user data makes data mining convenient but also challenges the protection of user privacy. To address the privacy protection of cloud data under data mining, a privacy protection model for cloud computing environments is proposed. The model is built on a public cloud and adds a classification preprocessing module with defined classification criteria; it discusses in detail how the classified data are handled, as well as data retrieval and restoration, protection of the runtime environment, and destruction of data in the cloud. Finally, a theoretical comparison of the model's complexity and security shows that it effectively protects the privacy of cloud data under data mining.

2.
Web logs are currently an important research direction in Web data mining, and data preprocessing is a key technique in Web log mining. This paper describes the preprocessing stage of Web log mining in detail: data cleaning, user identification, session identification, frame-page cleaning, and path completion. After user identification, frame pages lower mining efficiency, and filtering them out greatly reduces the number of invalid pages produced.
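
A minimal Python sketch of the cleaning, user-identification, and session-identification steps listed above (not the paper's implementation); the common-log-format regex, the 30-minute session timeout, and the resource-suffix filter are assumptions, and frame-page cleaning and path completion are not shown.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed common-log-format line: ip - - [timestamp] "GET /path HTTP/1.1" status size
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) [^"]*" (\d{3}) \d+')
SESSION_TIMEOUT = timedelta(minutes=30)                   # assumed inactivity gap
NOISE_SUFFIXES = ('.gif', '.jpg', '.png', '.css', '.js')  # cleaning: drop embedded resources

def parse(lines):
    """Data cleaning: keep only successful page requests."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, ts, path, status = m.groups()
        if status != '200' or path.lower().endswith(NOISE_SUFFIXES):
            continue
        yield ip, datetime.strptime(ts, '%d/%b/%Y:%H:%M:%S %z'), path

def sessionize(records):
    """User identification by IP, session identification by inactivity timeout."""
    sessions = defaultdict(list)                  # (ip, session_index) -> [paths]
    last_seen, session_id = {}, {}
    for ip, ts, path in sorted(records, key=lambda r: (r[0], r[1])):
        if ip not in last_seen or ts - last_seen[ip] > SESSION_TIMEOUT:
            session_id[ip] = session_id.get(ip, -1) + 1
        last_seen[ip] = ts
        sessions[(ip, session_id[ip])].append(path)
    return sessions
```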

3.
To achieve efficient, low-cost mining of massive data and support enterprise decision-making, a massive-data mining model based on cloud computing is proposed. In this model, both the processing and the storage of massive data take place in the cloud: the data are first preprocessed into a uniform structure, then processed in parallel with the MapReduce model on a cloud computing platform, and finally the desired mining results are obtained. Mining massive data in the cloud is markedly more efficient than traditional data mining, the accuracy of the mining results improves to some extent, and the model's advantage grows as the data volume increases.
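
To make the MapReduce step concrete, here is a minimal local sketch of the map/shuffle/reduce pattern in Python; it stands in for a real cloud MapReduce platform, and the item-frequency job is an illustrative example rather than the paper's actual mining task.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_phase(record):
    # emit (key, 1) pairs; here the "mining" is simple item frequency counting
    return [(item, 1) for item in record.split(',')]

def reduce_phase(kv):
    key, values = kv
    return key, sum(values)

def mapreduce(records, workers=4):
    with Pool(workers) as pool:
        mapped = pool.map(map_phase, records)         # parallel map
    groups = defaultdict(list)                        # shuffle: group values by key
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)
    return dict(map(reduce_phase, groups.items()))    # reduce

if __name__ == '__main__':
    data = ['milk,bread', 'bread,beer', 'milk,bread,beer']
    print(mapreduce(data))   # {'milk': 2, 'bread': 3, 'beer': 2}
```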

4.
An intrusion detection model framework based on data mining is proposed. In this model, data mining runs throughout the detection process: through data collection, data preprocessing, data mining, and knowledge discovery, intrusion detection becomes a complete knowledge discovery process.

5.
An application-layer DDoS attack detection method based on user behavior analysis
Compared with traditional denial-of-service attacks, application-layer DDoS attacks are more destructive and harder to detect and defend against. Based on an analysis of user browsing behavior, this paper proposes a method that detects application-layer DDoS attacks with an autoregressive (AR) model. The AR model, combined with Kalman filtering, learns and predicts normal user access and flags anomalies; once an anomalous source is located, the result is fed back to the front-end router for rate limiting or filtering. Tests in a real telecom IDC network show that the method is effective.
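
A minimal sketch of the underlying idea of predicting normal request rates and flagging large deviations, using a scalar Kalman filter over per-interval request counts; the noise variances and the 3-sigma threshold are assumptions, and the AR-model fitting from the paper is not reproduced.

```python
import numpy as np

def kalman_anomalies(rates, q=1e-2, r=1.0, k_sigma=3.0):
    """Flag intervals whose observed request rate deviates strongly
    from the one-step Kalman prediction (random-walk state model)."""
    x, p = rates[0], 1.0            # state estimate and its variance
    flags = []
    for z in rates[1:]:
        p_pred = p + q              # predict: state assumed to persist
        innov = z - x               # innovation: observation minus prediction
        s = p_pred + r              # innovation variance
        flags.append(abs(innov) > k_sigma * np.sqrt(s))
        k = p_pred / s              # Kalman gain
        x = x + k * innov           # update state with the observation
        p = (1 - k) * p_pred
    return flags

# usage: requests per 10-second window, with a burst at the end
rates = np.array([20, 22, 19, 21, 23, 20, 180], dtype=float)
print(kalman_anomalies(rates))      # last window flagged as anomalous
```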

6.
Current data mining research focuses mainly on core mining algorithms and neglects data preprocessing. This paper integrates data preprocessing seamlessly into the construction of a data warehouse and proposes a data preprocessing process model, offering a useful exploration for enterprises seeking to implement data mining applications successfully.

7.
Xu Wen. Software, 2012(1): 39-41, 45
User-generated service (UGS) means that ordinary users, without programming, can create personalized services through simple operations; the technique has seen some adoption on the Internet. The popularity of social networks supplies rich data on the links between people, from which user similarity can be computed so that services can be recommended to similar users. This paper studies the traditional social-network-based UGS framework and its shortcomings when applied in Internet of Things environments, and proposes a UGS system framework based on dynamic social networks that improves the performance of UGS systems in IoT service environments.

8.
Qin Yongjun. Computer Measurement & Control, 2017, 25(1): 111-113, 118
In a mobile computing environment, optimized mining of remote users' experience data helps meet their individual needs and improves the QoS delivered to them. Traditional mining methods extract salient-feature association information and lose accuracy when the differences between remote users' experience data are not pronounced. This paper proposes a mining model for remote user experience data in mobile computing environments based on associated-user adaptive link-tracking compensation. After the overall design of the model and an analysis of the data's structural features, the collected experience data are decomposed as nonlinear time series; directional information is extracted through autocorrelation feature matching and feature compression, and associated-user adaptive link-tracking compensation is used to control and compensate for mining errors, improving the accuracy and effectiveness of the mining. Simulation results show that the method mines remote user experience data accurately and with good real-time performance, meets the individual needs of mobile remote users, and makes user services more targeted.

9.
The development of large-scale networks and big-data technologies poses new challenges for traditional data fusion and analysis. To address the poor flexibility and low efficiency of current multi-source data fusion, a parallel preprocessing method for multi-source data based on similarity joins is proposed, following a divide-and-conquer, parallel design. First, flexibility is improved by a preprocessing step that unifies similar semantics across sources while preserving source-specific semantics; second, an improved parallel MapReduce framework raises the efficiency of the similarity join. Experiments show that the method reduces the total data volume by 32% while preserving data integrity, and that the improved framework cuts elapsed time by 43.91% compared with the traditional MapReduce framework, so the method effectively improves the efficiency of multi-source data fusion and analysis.
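
A minimal illustration of the similarity-join idea (pairing records from different sources whose values are near-duplicates); it uses single-process Jaccard similarity on token sets rather than the paper's improved MapReduce framework, and the 0.8 threshold is an assumption.

```python
from itertools import combinations

def jaccard(a, b):
    """Token-set Jaccard similarity between two record strings."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_join(records, threshold=0.8):
    """Return record pairs similar enough to be unified during fusion."""
    return [(x, y) for x, y in combinations(records, 2)
            if jaccard(x, y) >= threshold]

sources = ['IBM Corp New York', 'ibm corp new york', 'Oracle Corp Austin']
print(similarity_join(sources))   # the two IBM variants are matched
```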

10.
This paper studies machine-learning-based mining of seismic anomaly data. Because seismic monitoring signals are time-varying and the monitoring environment is unstable, traditional mining methods achieve low accuracy. A machine-learning-based method is therefore proposed: standard equations and the minimum mean-square error are obtained from machine-learning theory to build an optimal mining model; feature vectors are computed to build a feature library of seismic monitoring data; and the monitoring data are judged correctly according to the resulting probability values, so that anomalous seismic data are mined effectively. Experimental results show that the method improves both the accuracy and the efficiency of seismic anomaly mining and keeps the seismic monitoring system effective.

11.
In Web log mining, data preprocessing is the foundation of the whole process and directly affects the quality of the mining results. Most websites now use frame-based pages, yet traditional preprocessing does not filter frame pages; even when it does, it disorders the page structure and cannot supply correct information for path completion. This paper therefore proposes a Web log preprocessing method based on reconstructing the website structure, together with a path-completion method built on it.

12.
For bearing fault diagnosis in large rotating machinery, where traditional fault-recognition techniques usually suffer from low diagnostic accuracy, a new data mining framework based on frequency-domain analysis is proposed: the Associated Frequency Patterns Mining Framework (AFPMF), consisting of data preprocessing, associated frequent pattern mining, and fault-state monitoring. First, in preprocessing, AFPMF partitions the mechanical vibration data stream into blocks with time windows in the time domain and applies the Fourier transform to extract fault frequency features. Second, a sliding-window-based associated frequent pattern tree is used to build a compressed tree and derive the associated frequent pattern set, completing the mining stage. Finally, potential faults are identified from the vibration frequencies that appear in the mining results, enabling fault-state monitoring. Experiments comparing AFPMF with traditional methods on bearing fault diagnosis show that AFPMF achieves better fault-recognition performance.
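
A minimal NumPy sketch of the preprocessing stage described above: the vibration stream is split into fixed-length time windows and each window is transformed with the FFT to pick out dominant frequencies. The window length, sampling rate, and the simple top-peak feature are assumptions; the associated frequent pattern tree mining stage is not shown.

```python
import numpy as np

def dominant_frequencies(signal, fs, window=1024, top_k=3):
    """Split the vibration stream into time windows and return, for each
    window, the top_k spectral peak frequencies (Hz) found by the FFT."""
    feats = []
    for start in range(0, len(signal) - window + 1, window):
        block = signal[start:start + window]
        spectrum = np.abs(np.fft.rfft(block * np.hanning(window)))
        freqs = np.fft.rfftfreq(window, d=1.0 / fs)
        peaks = freqs[np.argsort(spectrum[1:])[-top_k:] + 1]   # skip the DC bin
        feats.append(sorted(peaks.tolist()))
    return feats

# usage: synthetic signal with a 120 Hz "fault" component buried in noise
fs = 8192
t = np.arange(0, 2.0, 1.0 / fs)
sig = np.sin(2 * np.pi * 120 * t) + 0.3 * np.random.randn(len(t))
print(dominant_frequencies(sig, fs)[:2])
```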

13.
Fuzzy data mining is used to extract fuzzy knowledge from linguistic or quantitative data. It is an extension of traditional data mining and the derived knowledge is relatively meaningful to human beings. In the past, we proposed a mining algorithm to find suitable membership functions for fuzzy association rules based on ant colony systems. In that approach, precision was limited by the use of binary bits to encode the membership functions. This paper elaborates on the original approach to increase the accuracy of results by adding multi-level processing. A multi-level ant colony framework is thus designed and an algorithm based on the structure is proposed to achieve the purpose. The proposed approach first transforms the fuzzy mining problem into a multi-stage graph, with each route representing a possible set of membership functions. The new approach then extends the previous one, using multi-level processing to solve the problem in which the maximum quantities of item values in the transactions may be large. The membership functions derived in a given level will be refined in the subsequent level. The final membership functions in the last level are then outputted to the rule-mining phase to find fuzzy association rules. Experiments are also performed to show the performance of the proposed approach. The experimental results show that the proposed multi-level ant colony systems mining approach can obtain improved results.

14.
Unemployment rate prediction has become critically significant because it can help governments make decisions and design policies. In previous studies, traditional univariate time series models and econometric methods for unemployment rate prediction attracted much attention from governments, organizations, research institutes, and scholars. Recently, novel methods using search engine query data were proposed to forecast the unemployment rate. In this paper, a data mining framework using search engine query data for unemployment rate prediction is presented. Under the framework, a set of data mining tools including neural networks (NNs) and support vector regressions (SVRs) is developed to forecast the unemployment trend. In the proposed method, search engine query data related to employment activities is first extracted. Second, a feature selection model is suggested to reduce the dimension of the query data. Third, various NNs and SVRs are employed to model the relationship between unemployment rate data and query data, and a genetic algorithm is used to optimize the parameters and refine the features simultaneously. Fourth, an appropriate data mining method is selected as the selective predictor by using cross-validation. Finally, the selective predictor with the best feature subset and proper parameters is used to forecast the unemployment trend. The empirical results show that the proposed framework clearly outperforms traditional forecasting approaches, and that support vector regression with a radial basis function (RBF) kernel is dominant for unemployment rate prediction. These findings imply that the data mining framework is efficient for unemployment rate prediction and can strengthen a government's quick responses and service capability.
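
A hedged scikit-learn sketch of the modelling step (SVR with an RBF kernel selected by cross-validation); the synthetic query-data features, the parameter grid, and the use of grid search in place of the paper's genetic algorithm are all assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: monthly search-engine query volumes for employment-related terms (hypothetical)
# y: unemployment rate for the same months (hypothetical)
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(60)

model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
grid = GridSearchCV(
    model,
    param_grid={'svr__C': [1, 10, 100], 'svr__gamma': ['scale', 0.1, 1.0]},
    cv=5, scoring='neg_mean_absolute_error')       # cross-validation selects the predictor
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)        # chosen parameters and CV error
```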

15.
In the fully weighted data model, item weights are distributed across transaction records and vary from record to record, so existing weighted negative association rule mining algorithms do not apply. This paper proposes a novel algorithm for mining fully weighted positive and negative association rules based on probability ratio and interest measures, and explores its application to educational informatization data. The algorithm replaces the traditional confidence measure with a probability ratio and evaluates fully weighted positive and negative rules within a support, probability-ratio, and interest framework, achieving good mining results. On real educational data and text data, the proposed algorithm is more effective and more reasonable than existing positive and negative association rule mining algorithms, and has high theoretical value and application prospects.

16.
To facilitate the analysis of reservoir data features and the exploration and development of petroleum, this paper analyzes reservoir data with the Spark parallel computing framework, uses data mining algorithms to uncover latent relationships among reservoir attributes, and classifies and predicts the different intervals of a reservoir. The main work includes: building a Spark distributed cluster and a data processing and analysis platform (Spark, a popular big-data parallel computing framework, completes mining tasks faster and more accurately than some traditional methods and tools); constructing a multi-dimensional anomaly-detection function tailored to the characteristics of reservoir data and adding a permeability-porosity ratio discriminant attribute Pr; and, for imbalanced data, proposing a cross-recall training model with an optimized cost function for logistic regression and KR-SMOTE oversampling of minority-class samples for decision trees, both of which handle class imbalance effectively and improve classification accuracy.
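
A hedged scikit-learn sketch of the class-imbalance handling described above, using plain random oversampling of the minority class and a class-weighted logistic regression as generic stand-ins for the paper's KR-SMOTE and cross-recall training; the Spark platform and the actual reservoir attributes (including Pr) are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def oversample_minority(X, y):
    """Duplicate minority-class rows until both classes are the same size
    (a simple stand-in for SMOTE-style synthetic oversampling)."""
    minority = np.bincount(y).argmin()
    X_min, X_maj = X[y == minority], X[y != minority]
    X_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
    X_bal = np.vstack([X_maj, X_up])
    y_bal = np.concatenate([y[y != minority], np.full(len(X_up), minority)])
    return X_bal, y_bal

# hypothetical reservoir samples: 95 non-pay intervals vs 5 pay-zone intervals
rng = np.random.default_rng(1)
X = rng.random((100, 4))
y = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = oversample_minority(X, y)
clf = LogisticRegression(class_weight='balanced').fit(X_bal, y_bal)
print(clf.predict(X[:3]))
```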

17.
A multi-dimensional view of the data mining process
Data mining (DM) is highly challenging, and the DM process is a decision problem in which many factors are coupled. This paper discusses the differences, strengths, and weaknesses of the two currently popular DM processes, CRISP-DM and SEMMA, and examines mining effectiveness from the perspectives of machine learning, statistics, and data quality. A truly efficient process should be algorithm-oriented and exploration-driven, aim at mining highly reliable knowledge of business value, and keep pace with technical developments. A multi-dimensional view of the DM process is then given, decomposing algorithms into component, model, and process dimensions, and a new DM process framework is proposed on this basis.

18.
This paper presents an informatics framework that applies a feature-based engineering concept to cost estimation, supported with data mining algorithms. The purpose of this research work is to provide a practical procedure for more accurate cost estimation using the manufacturing process data commonly available in ERP systems. The proposed method combines linear regression and data-mining techniques, leverages the unique strengths of both, and creates a mechanism to discover cost features. The final estimation function takes the user's confidence level in each member technique into consideration, so that the method can be phased in gradually in practice by building up the data mining capability. A case study demonstrates the proposed framework and compares the results from empirical cost prediction and data mining. The case study results indicate that the combined method is flexible and promising for determining the costs of the example welding features. In the comparison between the empirical prediction and five different data mining algorithms, the ANN algorithm proves the most accurate for welding operations.
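
A minimal sketch of the combination idea, blending a linear-regression estimate with a neural-network estimate according to a user-supplied confidence weight; the welding-feature data, the 0.7 weight, and the specific models are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# hypothetical welding features: [seam length mm, plate thickness mm, passes] -> cost
X = np.array([[100, 3, 1], [250, 5, 2], [400, 8, 3], [600, 10, 4], [800, 12, 5]], dtype=float)
y = np.array([12.0, 30.0, 55.0, 90.0, 130.0])

reg = LinearRegression().fit(X, y)                                    # empirical/parametric estimate
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                 random_state=0)).fit(X, y)           # data-mining estimate

def blended_cost(x, w_reg=0.7):
    """Combine the two estimators with the user's confidence weight w_reg."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    return w_reg * reg.predict(x)[0] + (1 - w_reg) * ann.predict(x)[0]

print(round(blended_cost([500, 9, 3]), 2))
```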

19.
A decision-theoretic approach to data mining
In this paper, we develop a decision-theoretic framework for evaluating data mining systems, which employ classification methods, in terms of their utility in decision-making. The decision-theoretic model provides an economic perspective on the value of "extracted knowledge", in terms of its payoff to the organization, and suggests a wide range of decision problems that arise from this point of view. The relation between the quality of a data mining system and the amount of investment that the decision maker is willing to make is formalized. We propose two ways by which independent data mining systems can be combined and show that the combined data mining system can be used in the decision-making process of the organization to increase payoff. Examples are provided to illustrate the various concepts, and several ways by which the proposed framework can be extended are discussed.
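
A small worked sketch of the decision-theoretic evaluation idea: the expected payoff of a classification-based mining system computed from its joint confusion probabilities and an organizational payoff matrix. The numbers are invented for illustration and do not come from the paper.

```python
import numpy as np

# rows: true class (respond / not respond), columns: predicted class
confusion = np.array([[0.18, 0.02],      # P(true, predicted) joint probabilities
                      [0.10, 0.70]])
payoff = np.array([[50.0, -20.0],        # payoff of acting on each (true, predicted) cell
                   [-5.0,   0.0]])

expected_payoff = float(np.sum(confusion * payoff))
print(expected_payoff)    # 8.1 monetary units per scored customer
```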

20.
Data mining can be defined as a process for finding trends and patterns in large data. An important technique for extracting useful information, such as regularities, from usually historical data is association rule mining. Most research on data mining is concentrated on the traditional relational data model. On the other hand, the query flocks technique, which extends the concept of association rule mining with a 'generate-and-test' model for different kinds of patterns, can also be applied to deductive databases. In this paper, the query flocks technique is extended with view definitions, including recursive views. Although in our system the query flocks technique can be applied to a database schema including both the intensional database (IDB), or rules, and the extensional database (EDB), or tabled relations, we have designed an architecture to compile query flocks from Datalog into SQL in order to be able to use commercially available database management systems (DBMS) as the underlying engine of our system. However, since recursive Datalog views (IDBs) cannot be converted directly into SQL statements, they are materialized before the final compilation operation. On this architecture, optimizations suitable for the extended query flocks are also introduced. Using the prototype system, which was developed on a commercial database environment, the advantages of the new architecture, together with the optimizations, are also presented.
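
A minimal SQLite sketch of the architectural point above: a recursive Datalog-style view (the transitive closure of an edge relation) is materialized first, and the remaining non-recursive query then runs as ordinary SQL. The schema and the reachability example are assumptions; the actual query-flocks compiler is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE edge(src TEXT, dst TEXT);               -- EDB relation
    INSERT INTO edge VALUES ('a','b'), ('b','c'), ('c','d');

    -- materialize the recursive IDB view (reach = transitive closure of edge)
    CREATE TABLE reach AS
    WITH RECURSIVE r(src, dst) AS (
        SELECT src, dst FROM edge
        UNION
        SELECT r.src, e.dst FROM r JOIN edge e ON r.dst = e.src
    )
    SELECT src, dst FROM r;
""")

# the rest of the (non-recursive) query compiles to plain SQL over the materialized view
print(conn.execute("SELECT dst FROM reach WHERE src = 'a' ORDER BY dst").fetchall())
# [('b',), ('c',), ('d',)]
```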
