Similar Documents
20 similar documents found (search time: 93 ms)
1.
With the development of computer and network technology, obtaining relevant data has become very easy. For data that are large in volume and wide in scope, however, traditional statistical methods cannot complete the analysis. Hence an intelligent technology known as data mining (Data Mining) emerged, which integrates various statistical analyses, databases, and intelligent languages to analyze massive data. This paper first gives a fairly comprehensive review of the state of research on data mining technology, then describes in detail some current application examples, and finally raises some open problems in data mining.

2.
A Survey of Data Mining Technology    (Cited by 8: 0 self-citations, 8 by others)
With the development of computer and network technology, obtaining relevant data has become very easy. For data that are large in volume and wide in scope, however, traditional statistical methods cannot complete the analysis. Hence data mining (Data Mining), an intelligent technology that integrates various statistical analyses, databases, and intelligent languages to analyze massive data, emerged. This paper mainly introduces the basic concepts of data mining and its methods, and also describes the applications of data mining and its development prospects.

3.
Research on the Application of Quantitative Association Rule Mining in Network Intrusion Detection Systems    (Cited by 3: 0 self-citations, 3 by others)
于枫  王敏  高翔 《计算机应用与软件》2006,23(11):52-53,107
Intrusion detection based on data mining has been a research hotspot in recent years, and many intrusion detection systems already employ the data mining methods of association analysis and clustering. Many attacks, however, are hard to identify from a single network connection, and analyzing multiple connections inevitably produces a large amount of statistical information. This paper introduces a quantitative association rule mining method that incorporates such statistical information, reports experimental results of applying the method to intrusion detection, analyzes those results, and suggests directions for further improvement.
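As an illustration of the quantitative association rule idea this abstract describes, the minimal Python sketch below discretizes numeric connection statistics into categorical items and scores one candidate rule by support and confidence; the feature names, thresholds, and records are invented, not taken from the paper.

```python
import pandas as pd

# Toy connection records; the attributes (duration, bytes, label) are
# hypothetical stand-ins for the statistical features the paper mines.
conns = pd.DataFrame({
    "duration": [0.2, 0.1, 5.0, 4.8, 0.3, 6.1],
    "bytes":    [120, 90, 90000, 85000, 150, 91000],
    "label":    ["normal", "normal", "attack", "attack", "normal", "attack"],
})

# Step 1: discretize numeric attributes into categorical "items".
conns["duration_bin"] = pd.cut(conns["duration"], bins=[0, 1, 10], labels=["short", "long"])
conns["bytes_bin"] = pd.cut(conns["bytes"], bins=[0, 1000, 10**6], labels=["small", "large"])

# Step 2: score a candidate rule {long, large} -> attack by support/confidence.
antecedent = (conns["duration_bin"] == "long") & (conns["bytes_bin"] == "large")
both = antecedent & (conns["label"] == "attack")
support = both.mean()
confidence = both.sum() / antecedent.sum()
print(f"support={support:.2f}, confidence={confidence:.2f}")
```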

4.
In the Internet information age, how to perform targeted data mining on complex data is a central problem in many fields. Most current approaches to this problem are machine learning methods based on statistical learning theory, and granular computing is widely applied in data mining. This paper combines granular computing with statistical learning and proposes an improved granular-computing-based statistical learning method. A granular-computing-based statistical classification algorithm is given and compared with the support vector machine (SVM) and the covering algorithm; experiments show that classifying with the support vectors obtained through granulation yields better results.
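The following sketch illustrates the granulate-then-classify idea under stated assumptions: k-means granulation and a scikit-learn SVM stand in for the paper's specific granular computing method, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for the paper's experiments.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# Granulate: summarize each class by a few cluster centers ("granules").
centers, labels = [], []
for cls in (0, 1):
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[y == cls])
    centers.append(km.cluster_centers_)
    labels += [cls] * 5

# Train an SVM on the granules instead of all raw points.
clf = SVC(kernel="rbf").fit(np.vstack(centers), labels)
print("accuracy on raw data:", clf.score(X, y))
```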

5.
Data mining (Data Mining) is currently a hot topic in the IT industry and can be seen everywhere. It has been applied well in many industries, and has been especially successful in marketing, initially demonstrating its advantages and development potential. This paper mainly analyzes the basic concepts of data mining, data warehousing, and online analytical processing (OLAP) and the relationships among them, and briefly introduces data mining tools and application areas.
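As a minimal illustration of the OLAP concept this abstract surveys, the pandas sketch below performs an OLAP-style roll-up over a toy fact table; it is a generic example, not any specific tool from the paper.

```python
import pandas as pd

# Tiny fact table; regions, quarters, products, and amounts are made up.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q1"],
    "product": ["A", "A", "A", "B", "B", "B"],
    "amount":  [100, 120, 80, 95, 60, 70],
})

# An OLAP-style roll-up: aggregate along two dimensions, with
# margins=True adding the grand totals of a cube roll-up.
cube = pd.pivot_table(sales, values="amount", index="region",
                      columns="quarter", aggfunc="sum", margins=True)
print(cube)
```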

6.
A Pre-Class Primer on BI
Business intelligence (Business Intelligence, BI) mainly refers to collecting, integrating, and analyzing scattered data, extracting useful information from it, and applying the information obtained to business or government decision-making. BI comprises advanced information technologies for statistically analyzing business data, including data warehousing (Data Warehousing), online analytical processing (On-line Analytical Processing, OLAP), and data mining (Data Mining).

7.
Analyzing when and where offenses occur most frequently is a very important part of public safety work. In spatial data mining, hot spot analysis identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots), locating where high- or low-value features cluster in space. Using the hot spot analysis tool in ArcGIS 9.1, this paper analyzes data on several categories of property crime in Shanghai in 2009 and locates where offenses concentrated at different times. Such spatial data mining provides intelligence, command, and frontline units with references for decision-making and crime prevention.
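A simplified version of the statistic behind such tools (the Getis-Ord Gi* z-score computed by ArcGIS hot spot analysis) can be sketched as follows; the offense counts and neighborhood weights are invented, and a real analysis would use properly constructed spatial weights.

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Simplified Getis-Ord Gi* statistic.

    x: (n,) attribute values (e.g., offense counts per grid cell)
    w: (n, n) binary spatial weights, with w[i, i] = 1 for Gi*.
    Returns a z-score per location; large positive = hot spot.
    """
    n = len(x)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wx = w @ x                      # sum_j w_ij * x_j
    w_sum = w.sum(axis=1)           # sum_j w_ij
    w_sq = (w ** 2).sum(axis=1)     # sum_j w_ij^2
    denom = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return (wx - x_bar * w_sum) / denom

# Five cells on a line; each cell neighbors itself and adjacent cells.
counts = np.array([2.0, 3.0, 20.0, 18.0, 1.0])
w = np.eye(5)
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1
print(getis_ord_gi_star(counts, w).round(2))
```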

8.
Application Design of Data Mining in CRM    (Cited by 6: 0 self-citations, 6 by others)
This paper discusses the data mining techniques used for customer segmentation and customer profiling in customer relationship management (CRM). It first presents the concept and classification of CRM, then analyzes several data mining methods, and finally proposes an application design of data mining oriented toward CRM.

9.
As enterprise accounting informatization develops, enterprise data processing is progressing from simple query and statistics functions toward decision support. Drawing on the concept of data mining, this paper analyzes the significance of data mining in financial analysis, lists concrete applications of data mining in enterprise finance, and further presents the methods and steps for applying data mining to financial analysis.

10.
Improvement and Implementation of an Association Rule Data Mining Method    (Cited by 9: 0 self-citations, 9 by others)
This paper first introduces the definition of association rules, then describes and compares two typical association rule mining algorithms, Apriori and MAQA. On this basis it proposes an improvement tailored to sales data warehouses, the statistical association rule mining method SMAR, and discusses the method's principle, advantages, and implementation. The paper closes with an outlook on the development of data mining.
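Since MAQA and SMAR are not documented here, the sketch below shows only the baseline Apriori idea the abstract starts from: level-wise generation of frequent itemsets; the transaction baskets are toy data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining (the Apriori idea in miniature)."""
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items
             if sum(s <= t for t in transactions) / n >= min_support}
    k = 1
    while level:
        for s in level:
            frequent[s] = sum(s <= t for t in transactions) / n
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) / n >= min_support}
        k += 1
    return frequent

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "beer"}, {"milk", "bread", "beer"}, {"bread"}]]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(sup, 2))
```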

11.
There have been many studies, mainly using statistical modeling techniques, on predicting quality characteristics in machining operations, where a large number of process variables need to be considered. In conventional metal removal processes, however, an exact prediction of surface roughness is not possible, or very difficult to achieve, due to the stochastic nature of machining processes. In this paper, a novel approach is proposed to solve the quality assurance problem of predicting the acceptance of computer numerical control (CNC) machined parts, rather than focusing on the prediction of precise surface roughness values. One of the data mining techniques, rough set theory, is applied to derive rules for the process variables that contribute to surface roughness. The proposed rule-composing algorithm and rule-validation procedure have been tested on the historical data the company has collected over the years. The results indicate a higher accuracy than statistical approaches in terms of predicting the acceptance level of surface roughness.
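The rough-set machinery this abstract relies on can be illustrated with a minimal sketch of indiscernibility classes and lower/upper approximations over a toy decision table; the attribute names and rows are hypothetical, not the company's data.

```python
from collections import defaultdict

# Toy decision table: condition attributes -> decision (accept/reject).
table = [
    ({"speed": "high", "feed": "low"},  "accept"),
    ({"speed": "high", "feed": "low"},  "accept"),
    ({"speed": "low",  "feed": "high"}, "reject"),
    ({"speed": "high", "feed": "high"}, "accept"),
    ({"speed": "high", "feed": "high"}, "reject"),  # inconsistent with row above
]

# Indiscernibility classes: objects identical on the condition attributes.
classes = defaultdict(list)
for i, (cond, dec) in enumerate(table):
    classes[tuple(sorted(cond.items()))].append(i)

target = {i for i, (_, dec) in enumerate(table) if dec == "accept"}
lower = {i for ids in classes.values() if set(ids) <= target for i in ids}
upper = {i for ids in classes.values() if set(ids) & target for i in ids}

# Certain rules come from the lower approximation; possible rules from the upper.
print("lower approximation of 'accept':", sorted(lower))
print("upper approximation of 'accept':", sorted(upper))
```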

12.
Landslide incidence can be affected by a variety of environmental factors. Past studies have focused on identifying these environmental factors, but most are based on statistical analysis. In this paper, spatial information techniques were applied to a case study of landslide occurrence in China by combining remote sensing and geographical information systems with an innovative data mining approach (rough set theory) and statistical analyses. The core and reducts of the data attributes were obtained by data mining based on rough set theory. Rules for the impact factors that can contribute to landslide occurrence were generated from the landslide knowledge database; the 11 resulting rules can be classified into exact and approximate rules. In terms of importance, three main rules were then extracted as the key decision-making rules for landslide prediction. Meanwhile, the relationship between landslide occurrence and environmental factors was statistically analyzed to validate the accuracy of the rules extracted by the rough-set-based method. It was shown that the rough-set-based approach is useful for analyzing environmental factors affecting landslide occurrence, and thus facilitates the decision-making process for landslide prediction.

13.
Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source adds value in terms of profit, by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records and credit and debit account information of customers, is used to create scorecards for credit card applicants. Call-detail records are used to build call networks, and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured by AUC. In terms of profit, the best model is the one built with only calling-behavior features. In addition, the calling-behavior features are the most predictive in other models, in terms of both statistical and economic performance. The results have implications for the ethical use of call-detail records, regulation, financial inclusion, and data sharing and privacy.
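The influence propagation step resembles a personalized random walk; the sketch below uses networkx's personalized PageRank as a plausible stand-in for the paper's social network analytics, with an invented call graph and defaulter set.

```python
import networkx as nx

# Toy call graph: nodes are customers, edges mean "called each other".
# The edge list and the defaulter set are invented for illustration.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "e")])
defaulters = {"a"}

# Personalized PageRank restarted at known defaulters: each node's score
# is read as exposure ("influence") from prior defaulters in the network.
personalization = {n: (1.0 if n in defaulters else 0.0) for n in G}
influence = nx.pagerank(G, alpha=0.85, personalization=personalization)
for node, score in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```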

14.
Inclusion Degree and Measures in Rough Set Data Analysis    (Cited by 18: 1 self-citation, 17 by others)
Rough set theory is a new soft computing tool for handling vague and uncertain knowledge. Rough set data analysis is one of its main application techniques; it is used to analyze properties of data, perform rough classification, analyze attribute dependency and attribute significance, and extract decision rules, and it has important applications in artificial intelligence and cognitive science. By introducing the concept of inclusion degree into rough set theory, this paper establishes the relationship between inclusion degree and the measures used in rough set data analysis, and proves that these measures can all be reduced to inclusion degrees. These conclusions help in understanding the essence of rough set data analysis, and can serve as the principal basis for constructing its measures.
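In the finite, crisp case the inclusion degree in question is usually defined as D(B/A) = |A ∩ B| / |A|; the sketch below states this definition and shows how the rough membership measure reduces to it, using toy sets.

```python
def inclusion_degree(A, B):
    """D(B / A) = |A ∩ B| / |A|: the degree to which set A is included in B."""
    return len(A & B) / len(A) if A else 1.0

# A rough-set example: the rough membership of an object's
# indiscernibility class E in a target concept X is exactly D(X / E).
E = {1, 2, 3, 4}          # an indiscernibility (equivalence) class
X = {2, 3, 7, 9}          # the target concept
print(inclusion_degree(E, X))   # 0.5 -> E is "half included" in X
```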

15.
Classification in imbalanced domains is a recent challenge in data mining. Classification is imbalanced when the data contain many examples of one class and few of the other, and the less-represented class is the one of greater interest for the learning task. One of the most widely used techniques to tackle this problem is to preprocess the data before learning, either by under-sampling (removing examples, mainly from the majority class) or by over-sampling (replicating or generating new minority examples). In this paper, we propose an under-sampling procedure guided by evolutionary algorithms that performs training set selection to enhance the decision trees obtained by the C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal has been compared with other under-sampling and over-sampling techniques; the results indicate that the new approach is very competitive in accuracy compared with over-sampling, and that it outperforms standard under-sampling. Moreover, the obtained models are smaller in terms of the number of leaves or rules generated and can be considered more interpretable. The results have been contrasted through non-parametric statistical tests over multiple data sets.
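A minimal sketch of the preprocessing idea, under stated assumptions: random under-sampling stands in for the paper's evolutionary training set selection, scikit-learn's CART stands in for C4.5, and the data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Imbalanced synthetic data: 900 majority (class 0) vs. 100 minority (class 1).
X = np.vstack([rng.normal(0, 1, (900, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 900 + [1] * 100)

# Random under-sampling of the majority class down to the minority size.
maj = rng.choice(np.where(y == 0)[0], size=(y == 1).sum(), replace=False)
keep = np.concatenate([maj, np.where(y == 1)[0]])
X_bal, y_bal = X[keep], y[keep]

# Compare tree size with and without balancing (smaller = more interpretable).
for name, (Xt, yt) in {"raw": (X, y), "under-sampled": (X_bal, y_bal)}.items():
    tree = DecisionTreeClassifier(random_state=0).fit(Xt, yt)
    print(name, "leaves:", tree.get_n_leaves())
```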

16.
Most traditional clustering methods are based on distances or on similarities between samples, which requires the analyzed data to be quantitative. Data mining, however, involves a large amount of qualitative data, for which traditional clustering is no longer feasible, so a clustering method that can handle qualitative data effectively is needed. Rough sets are an effective tool for handling qualitative data. After elaborating the relevant concepts of rough sets, this paper uses the notion of attribute significance to propose a clustering method that handles qualitative data effectively, and validates the method empirically on data with good results.
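One common way to compute the attribute significance the abstract mentions is as the drop in the rough-set dependency degree when an attribute is removed; the sketch below uses that definition on an invented qualitative table, which may differ in detail from the paper's formulation.

```python
from collections import defaultdict

# Toy qualitative data table; attribute names and values are invented.
rows = [
    {"color": "red",   "size": "big",   "class": "yes"},
    {"color": "red",   "size": "small", "class": "no"},
    {"color": "blue",  "size": "big",   "class": "yes"},
    {"color": "blue",  "size": "small", "class": "yes"},
    {"color": "green", "size": "small", "class": "no"},
]

def gamma(attrs):
    """Dependency degree: fraction of rows whose equivalence class under
    `attrs` is consistent (all members share the same decision)."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[a] for a in attrs)].append(r["class"])
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)

full = ["color", "size"]
for a in full:
    rest = [b for b in full if b != a]
    print(f"significance of {a}: {gamma(full) - gamma(rest):.2f}")
```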

17.
The use of functional size measurement (FSM) methods in software development organizations has been growing over the years. Object-oriented (OO) techniques have also become something of a standard for software design, and, in particular, Use Cases is one of the most widely used techniques to specify functional requirements. The main FSM methods do not include specific rules to measure software functionality from its Use Case analysis; to deal with this issue, other methods such as Kramer's functional measurement method have been developed. Therefore, one of the main issues for organizations willing to adopt an OO functional measurement method, in order to facilitate the use case count procedure, is how to convert their portfolio's functional size from the previously adopted FSM method to the new one. The objective of this research is to find a statistical relationship for converting software functional size units measured by the International Function Point Users Group (IFPUG) function point analysis (FPA) method into Kramer-Smith's use case points (UCP) method and vice versa. Methodologies for correct data gathering are proposed, and the results obtained are analyzed to derive the linear and non-linear equations for this correlation. Finally, a conversion factor and corresponding conversion intervals are given to establish the statistical relationship.
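The kind of conversion the abstract describes can be sketched as simple regression on paired measurements; the FPA/UCP values below are invented, and the linear, power-law, and mean-ratio fits merely illustrate the shape of the analysis.

```python
import numpy as np

# Invented paired measurements of the same projects in both units;
# real conversion data would come from an organization's portfolio.
fpa = np.array([120, 250, 310, 450, 620, 800], dtype=float)   # IFPUG FPA
ucp = np.array([95, 210, 240, 380, 500, 660], dtype=float)    # Kramer-Smith UCP

# Linear fit: UCP ≈ a * FPA + b.
a, b = np.polyfit(fpa, ucp, 1)
print(f"linear: UCP = {a:.3f} * FPA + {b:.1f}")

# Log-log (power-law) fit: UCP ≈ c * FPA^k, a common non-linear alternative.
k, log_c = np.polyfit(np.log(fpa), np.log(ucp), 1)
print(f"power:  UCP = {np.exp(log_c):.3f} * FPA^{k:.3f}")

# A single conversion factor, as in the paper's final step: mean UCP/FPA ratio.
print("conversion factor (mean UCP/FPA):", (ucp / fpa).mean().round(3))
```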

18.
朱建平  曾玉钰 《微机发展》2007,17(12):89-91
Most traditional clustering methods are based on distances or on similarities between samples, which requires the analyzed data to be quantitative. Data mining, however, involves a large amount of qualitative data, for which traditional clustering is no longer feasible, so a clustering method that can handle qualitative data effectively is needed. Rough sets are an effective tool for handling qualitative data. After elaborating the relevant concepts of rough sets, this paper uses the notion of attribute significance to propose a clustering method that handles qualitative data effectively, and validates the method empirically on data with good results.

19.
Over the last three decades, Network Intrusion Detection Systems (NIDSs), and in particular Anomaly Detection Systems (ADSs), have become more significant for detecting novel attacks than Signature Detection Systems (SDSs). Evaluating NIDSs on the existing benchmark data sets KDD99 and NSL-KDD does not yield satisfactory results, due to three major issues: (1) their lack of modern low-footprint attack styles, (2) their lack of modern normal traffic scenarios, and (3) the different distributions of their training and testing sets. To address these issues, the UNSW-NB15 data set has recently been generated. This data set covers nine types of modern attacks and new patterns of normal traffic, and it contains 49 attributes, comprising flow-based features between hosts and features from network packet inspection, to discriminate between normal and abnormal observations. In this paper, we demonstrate the complexity of the UNSW-NB15 data set in three respects. First, a statistical analysis of the observations and the attributes is presented. Second, an examination of feature correlations is provided. Third, five existing classifiers are used to evaluate the complexity in terms of accuracy and false alarm rates (FARs), and the results are compared with the KDD99 data set. The experimental results show that UNSW-NB15 is more complex than KDD99 and can be considered a new benchmark data set for evaluating NIDSs.
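The two evaluation metrics used here follow directly from a confusion matrix; the sketch below computes accuracy and false alarm rate on invented labels, not the paper's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented labels: 0 = normal, 1 = attack (stand-ins for a NIDS evaluation).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
far = fp / (fp + tn)   # false alarm rate: normal traffic flagged as attack
print(f"accuracy={accuracy:.2f}, FAR={far:.2f}")
```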

20.
With the rapid development of the economy and the frequent occurrence of air pollution incidents, air pollution has become an issue of broad public concern. Air quality big data are generally characterized by multi-source heterogeneity, dynamic mutability, and spatial-temporal correlation, and are usually analyzed with big data techniques after data fusion. In recent years, various models and algorithms using big data techniques have been proposed. To summarize these methodologies for air quality study, in this paper we first classify air quality monitoring by big data techniques into three categories: the spatial model, the temporal model, and the spatial-temporal model. Second, we summarize the typical big data methods needed in air quality forecasting into three categories, statistical forecasting models, deep neural network models, and hybrid models, presenting representative scenarios for some of them. Third, we analyze and compare some representative air pollution traceability methods in detail, classifying them into two categories: traditional models combined with big data techniques, and data-driven models. Finally, we provide an outlook on the future of air quality analysis with some promising and challenging ideas.
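As one concrete instance of the statistical forecasting family the survey covers, the sketch below fits a simple autoregressive model by least squares to an invented pollutant series; real systems would use richer spatio-temporal models.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model, the simplest member of the
    statistical forecasting family described above."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(y))]), y, rcond=None)
    return coef  # p lag weights followed by an intercept

# Invented hourly PM2.5-like readings; real inputs would come from stations.
rng = np.random.default_rng(1)
pm25 = 50 + 10 * np.sin(np.arange(200) / 10) + rng.normal(0, 2, 200)

coef = fit_ar(pm25, p=3)
last = pm25[-3:]
forecast = last @ coef[:3] + coef[3]
print("one-step-ahead forecast:", round(float(forecast), 1))
```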
