Similar Literature
20 similar documents found.
1.
Today’s security threats, such as malware, are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. Various approaches have been introduced to deal with them. One of them is signature-based detection, an effective and widely used method for detecting malware; however, it has a substantial problem with new instances: it is only useful against malware that has been seen before. Given the rapid proliferation of malware and the considerable human effort needed to extract signatures, this approach is a tedious solution, so an intelligent malware detection system is required to deal with new malware threats. Most intelligent detection systems use data mining techniques to distinguish malware from benign programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to build a learning model. This phase, called “malware analysis”, plays a significant role in such systems. Since the API call sequence is an effective feature for recognising unknown malware, this paper focuses on extracting this feature from executable files. There are two major kinds of approach for analysing an executable file. The first is static analysis, which analyses a program at the source code level. The second is dynamic analysis, which extracts features by observing the program’s activities, such as system requests, during execution. Static analysis has to traverse the program’s execution paths to find the called APIs; because it lacks sufficient information about the decision-making points in the executable, it cannot extract the real sequence of called APIs. Dynamic analysis does not have this drawback, but it suffers from execution overhead, so the feature extraction phase takes noticeable time. In this paper, a novel hybrid approach, HDM-Analyser, is presented that combines dynamic and static analysis to increase speed while keeping accuracy at a reasonable level. HDM-Analyser can predict the majority of decision-making points using statistical information gathered by dynamic analysis, so it incurs no execution overhead at scan time. The main contribution of this paper is taking the accuracy advantage of dynamic analysis and incorporating it into static analysis in order to improve the latter's accuracy. In effect, the execution overhead is paid once in the learning phase and is not imposed on the feature extraction phase performed during scanning. The experimental results demonstrate that HDM-Analyser attains better overall accuracy and time complexity than static and dynamic analysis methods.
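The paper itself does not include code; a minimal sketch of the core idea (resolving branch directions during a static walk using branch statistics collected from dynamic runs) might look like the following, where the control-flow-graph encoding, the node numbering and the statistics table are all assumptions:

```python
from collections import defaultdict

# Hypothetical control-flow graph: node -> ("call", api_name, next_node),
# ("branch", taken_node, fallthrough_node), or ("ret",).
CFG = {
    0: ("call", "CreateFileW", 1),
    1: ("branch", 2, 3),
    2: ("call", "WriteFile", 4),
    3: ("call", "ReadFile", 4),
    4: ("ret",),
}

# Per-branch statistics gathered by dynamic analysis in the learning phase:
# branch node -> probability that the branch was taken.
taken_prob = defaultdict(lambda: 0.5, {1: 0.8})

def predict_api_sequence(entry, max_steps=10_000):
    """Statically walk the CFG, resolving each branch with the dynamic statistics."""
    seq, node = [], entry
    for _ in range(max_steps):
        kind, *rest = CFG[node]
        if kind == "call":
            api, nxt = rest
            seq.append(api)
            node = nxt
        elif kind == "branch":
            taken, fallthrough = rest
            node = taken if taken_prob[node] >= 0.5 else fallthrough
        else:                         # "ret" ends the walk
            break
    return seq

print(predict_api_sequence(0))        # ['CreateFileW', 'WriteFile']
```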

2.
A SNORT network intrusion detection system based on data mining
This paper reviews current intrusion detection and data mining techniques and provides an in-depth analysis of the Snort network intrusion detection system. On top of Snort, a data-mining-based network intrusion detection system model is then constructed. The design and implementation focus on the k-means-based anomaly detection engine and the clustering analysis module, and the k-means algorithm is improved to make it better suited to network intrusion detection systems.
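The details of the modified k-means engine are not given in the abstract; a minimal sketch of the general idea (cluster features of normal connections, then flag connections far from every centroid), with the feature matrices and the distance threshold assumed, might look like this:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def fit_normal_profile(normal_features, n_clusters=8):
    """Cluster feature vectors of known-normal connections."""
    scaler = StandardScaler().fit(normal_features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(scaler.transform(normal_features))
    return scaler, km

def flag_anomalies(scaler, km, features, threshold=3.0):
    """Mark a connection as anomalous if it is far from every centroid."""
    X = scaler.transform(features)
    dists = np.min(km.transform(X), axis=1)   # distance to nearest centroid
    return dists > threshold                  # boolean anomaly mask

# Usage (hypothetical per-connection feature matrices):
# scaler, km = fit_normal_profile(train_normal)
# alerts = flag_anomalies(scaler, km, live_traffic)
```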

3.
Comparative analysis of data mining methods for bankruptcy prediction
A great deal of research has been devoted to bankruptcy prediction, including applications of data mining. Neural networks, support vector machines, and other algorithms often fit the data well, but because they lack comprehensibility they are considered black-box technologies. Decision trees, by contrast, are more comprehensible to human users; however, far too many rules can result in another form of incomprehensibility. The number of rules obtained from decision tree algorithms can be controlled to some degree by setting different minimum support levels. This study applies a variety of data mining tools to bankruptcy data with the purpose of comparing accuracy and number of rules. For this data, decision trees were found to be relatively more accurate than neural networks and support vector machines, but produced more rule nodes than desired. Adjusting the minimum support yielded more tractable rule sets.
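The study used its own selection of data mining tools; purely as an illustration of the minimum-support trade-off it describes, a scikit-learn sketch (where min_samples_leaf stands in for minimum support and the data are synthetic) could look like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a bankruptcy dataset (features + failed/healthy label).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Raising the minimum number of samples per leaf plays the role of a higher
# minimum support: fewer, more general rules, at some cost in accuracy.
for min_leaf in (1, 10, 50, 200):
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"min_samples_leaf={min_leaf:4d}  "
          f"rules(leaves)={tree.get_n_leaves():4d}  "
          f"accuracy={tree.score(X_te, y_te):.3f}")
```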

4.
In this paper, we propose a new method for automatically constructing concept maps for adaptive learning systems based on data mining techniques. First, we calculate the counter values between any two questions, where the counter values indicate the answer consistency between the two questions. Then, we consider four kinds of association rules between two questions to mine this information. Finally, we calculate the relevance degree between two concepts derived from the association rules to construct concept maps for adaptive learning systems. The proposed method overcomes the drawbacks of the methods of Chen and Bai (2010) and Lee et al. (2009), and provides a useful way to construct concept maps for adaptive learning systems based on data mining techniques.
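The exact counter and relevance-degree formulas are defined in the paper itself; a generic sketch of computing answer-consistency counters for a question pair from a binary correct/incorrect matrix (encoding and definitions assumed) might look like this:

```python
import numpy as np

# answers[s, q] = 1 if student s answered question q correctly, else 0 (assumed encoding).
answers = np.random.randint(0, 2, size=(200, 10))

def pair_counters(a, qi, qj):
    """Four co-answer counters for questions qi and qj: both right, i right / j wrong,
    i wrong / j right, both wrong (assumed stand-ins for the paper's counter values)."""
    ri, rj = a[:, qi] == 1, a[:, qj] == 1
    return (np.sum(ri & rj), np.sum(ri & ~rj),
            np.sum(~ri & rj), np.sum(~ri & ~rj))

def answer_consistency(a, qi, qj):
    """Fraction of students who answered the two questions consistently."""
    both_right, _, _, both_wrong = pair_counters(a, qi, qj)
    return (both_right + both_wrong) / a.shape[0]

# Example: consistency between question 0 and question 1.
print(answer_consistency(answers, 0, 1))
```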

5.
Considering the business characteristics of full-cost analysis in banks and the application characteristics of various data mining algorithms, this paper proposes an analysis model for a bank full-cost analysis system based on a classification algorithm built on association rules. The model is experimentally compared with other machine learning classification algorithms, showing that this algorithm gives the best results in this domain, and the mined rules were confirmed by bank staff.

6.
The human visual sense has a unique status, offering a very broad-band channel for information flow. Visual approaches to analysis and mining attempt to take advantage of our ability to perceive pattern and structure in visual form and to make sense of, or interpret, what we see. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have high potential for mining large databases. In this work, we investigate and expand the area of visual data mining by proposing new visual data mining techniques for the visualization of mining outcomes.

7.
Current trends clearly indicate that online learning has become an important learning mode. However, no effective assessment mechanism for learning performance yet exists for e-learning systems. Learning performance assessment aims to evaluate what learners have learned during the learning process. Traditional summative evaluation only considers final learning outcomes, without considering the learning processes of learners. With the evolution of learning technology, learning portfolios in a web-based learning environment can be used to record the learning process, evaluate the learning performance of learners, and produce feedback that enhances their learning. Accordingly, this study presents a mobile formative assessment tool using data mining, which involves six computational intelligence techniques, i.e. statistical correlation analysis, fuzzy clustering analysis, grey relational analysis, K-means clustering, fuzzy association rule mining and fuzzy inference, in order to identify the key formative assessment rules from the web-based learning portfolios of an individual learner and thereby promote web-based learning performance. In other words, the proposed method can help teachers precisely assess the learning performance of an individual learner using only the learning portfolios in a web-based learning environment. Hence, teachers can devote themselves to teaching and designing courseware, since they save a lot of time in measuring learning performance. More importantly, teachers can understand the main factors influencing learning performance in a web-based learning environment from the interpretable learning performance assessment rules obtained. Experimental results indicate that the evaluation results of the proposed scheme are very close to those of summative assessment, and that the factor analysis provides simple and clear learning performance assessment rules. Furthermore, the proposed learning feedback with formative assessment clearly promotes the learning performance and interest of learners.

8.
To address the shortcomings of existing intrusion detection systems, and based on the differences in how intrusions and normal access patterns appear in network data as well as the occurrence patterns of specific packets, this paper proposes a protocol-layered intrusion detection model in which different data mining methods are applied at each protocol layer to extract intrusion features. The goals are to improve modelling accuracy and detection speed and to overcome the subjectivity of manually extracted intrusion features. The data mining algorithms used mainly include association mining, sequence mining, classification algorithms and clustering algorithms.

9.
Normal user behaviour changes over time, so an anomaly analysis system must adapt to this change by updating its normal behaviour model in order to avoid false alarms. This paper studies incremental update algorithms: linear regression is used to estimate the similarity score, and if the difference between the actual similarity and the estimated value exceeds a threshold, an alarm is raised; otherwise, an improved sliding-window incremental mining method is used to update the normal activity model. The feasibility of the approach is verified on the DARPA-MIT 1999 dataset.
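A minimal sketch of the alarm/update loop described above, with the window size and threshold assumed, might look like this:

```python
import numpy as np
from collections import deque

class SimilarityMonitor:
    """Alarm when the observed similarity deviates from a linear-regression
    forecast; otherwise slide the window forward (a sketch of the idea,
    with window size and threshold chosen arbitrarily)."""

    def __init__(self, window=50, threshold=0.15):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def step(self, similarity):
        if len(self.window) >= 2:
            t = np.arange(len(self.window))
            slope, intercept = np.polyfit(t, np.array(self.window), deg=1)
            predicted = slope * len(self.window) + intercept
            if abs(similarity - predicted) > self.threshold:
                return "ALARM"                   # deviation too large
        self.window.append(similarity)           # update the normal model
        return "UPDATE"

# monitor = SimilarityMonitor()
# for s in similarity_stream: print(monitor.step(s))
```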

10.
A securities customer analysis system based on data mining techniques
A securities customer analysis system was designed and implemented based on data mining techniques. After detailed analysis and preprocessing of the data, models were built with the K-means and C5.0 algorithms in the data mining tool SPSS Clementine 8.0 and used to predict the most promising customers; practical application verified the accuracy of the models.

11.
The continuous growth of biodiversity databases has led to a search for techniques that can assist researchers. This paper presents a method for analysing occurrences of pairs and groups of species that aims to identify patterns in co-occurrence through the application of association rule mining. We propose, implement and evaluate a tool to help ecologists formulate and validate hypotheses regarding co-occurrence between two or more species. To validate our approach, we analysed species occurrences with a dataset from the 50-ha Forest Dynamics Project on Barro Colorado Island (BCI). Three case studies were developed on this tropical forest to evaluate patterns of positive and negative correlation. Our tool can point out co-occurrences at multiple scales and for multiple species simultaneously, accelerating the identification process of spatial point pattern analysis. This paper demonstrates that data mining, which has been used successfully in applications such as business and consumer profile analysis, can be a useful resource in ecology.
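The abstract does not state which association rule implementation was used; a small sketch of the general workflow with the mlxtend library (an assumption, applied to a hypothetical presence/absence table) could look like this:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical presence/absence table: one row per sampling plot,
# one boolean column per species.
plots = pd.DataFrame({
    "species_A": [True, True, False, True, True],
    "species_B": [True, True, False, False, True],
    "species_C": [False, True, True, True, False],
})

# Frequent species sets, then rules such as {A} -> {B} with lift > 1
# (positive co-occurrence) or lift < 1 (negative association).
frequent = apriori(plots, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```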

12.
The present study proposes an algorithm for fault detection in the context of condition-based maintenance using data mining techniques. The proposed algorithm is applied to an aircraft turbofan engine using flight data and consists of two main sections. In the first section, the relationship between engine exhaust gas temperature (EGT), the main engine health monitoring criterion, and other operational and environmental parameters of the engine is modelled using data-driven models. In the second section, a data set of EGT residuals is created, that is, the differences between the actual EGT of the system and the EGT estimated by the model developed under healthy engine conditions. Finally, faults occurring in each flight are detected by identifying abnormal events with a one-class support vector machine trained on the healthy-condition EGT residual data set. The results indicate that the proposed algorithm is an effective approach for inspecting aircraft engine condition and detecting faults, with no need for technical knowledge of the interior characteristics of the aircraft engine.
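The abstract does not name the data-driven EGT model; the sketch below uses a gradient-boosting regressor and scikit-learn's OneClassSVM as stand-ins, with all parameter values assumed:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def train_fault_detector(X_healthy, egt_healthy):
    """Model EGT from operating/environmental parameters on healthy flights,
    then learn the distribution of the healthy residuals."""
    egt_model = GradientBoostingRegressor(random_state=0).fit(X_healthy, egt_healthy)
    residuals = (np.asarray(egt_healthy) - egt_model.predict(X_healthy)).reshape(-1, 1)
    scaler = StandardScaler().fit(residuals)
    detector = OneClassSVM(nu=0.05, kernel="rbf").fit(scaler.transform(residuals))
    return egt_model, scaler, detector

def flag_flight(egt_model, scaler, detector, X_flight, egt_flight):
    """Return True for samples the one-class SVM marks as abnormal (-1)."""
    residuals = (np.asarray(egt_flight) - egt_model.predict(X_flight)).reshape(-1, 1)
    return detector.predict(scaler.transform(residuals)) == -1
```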

13.
Under traditional quality management concepts, the quality management systems of discrete manufacturing enterprises are fragmented, management activities are scattered, and large-scale quality data analysis is lacking. To address these problems, this paper analyses and organises quality data from the perspective of the product life cycle, discusses data acquisition, data preprocessing and data transformation prior to data mining, and builds an ETL data preprocessing model. Specific data mining application methods are then studied, an overall architecture for product life cycle quality data mining is proposed, and the data layer, method layer, knowledge layer and application layer contained in the architecture are analysed and described.

14.
Monitoring water quality on a near-real-time basis to address water resource management and public health concerns in coupled natural systems and the built environment is by no means an easy task. Total organic carbon (TOC) in surface waters is a known precursor of disinfection by-products in drinking water treatment, such as total trihalomethanes (TTHMs), which are suspected carcinogens and have been linked to birth defects when water treatment plants cannot remove them. In this paper, an early warning system using integrated data fusion and mining (IDFM) techniques is proposed to estimate spatiotemporal distributions of TOC on a daily basis for monitoring water quality in a lake that serves as the source of a drinking water treatment plant. Landsat satellite images have high spatial resolution, but their use suffers from a long overpass interval of 16 days. On the other hand, coarse-resolution sensors with frequent revisit times, such as MODIS, are incapable of providing detailed water quality information because of their low spatial resolution. This issue can be resolved by using data or sensor fusion techniques such as IDFM, in which the high-spatial-resolution Landsat and the high-temporal-resolution MODIS images are fused and analysed by a suite of regression models to produce synthetic images with both high spatial and high temporal resolution. Analysis of the results using four statistical indices confirmed that the genetic programming model can accurately estimate the spatial and temporal variations of TOC concentrations in a small lake. The model shows a slight bias towards overestimating TOC, and it requires cloud-free input data for the lake. The IDFM effort leads to the reconstruction of spatiotemporal TOC distributions in a lake in support of drinking water treatment.

15.
Existing data-mining-based intrusion detection research focuses mainly on misuse detection. This paper proposes a network anomaly detection scheme based on data mining techniques and analyses the implementation of its core modules in detail. First, a static association rule mining algorithm and a domain-level mining algorithm are used to characterise the system's normal network activity profile; then, a dynamic association rule mining algorithm and a domain-level mining algorithm output suspicious rule sets that characterise attacks on the system. These rule sets, combined with the network behaviour features extracted by the feature selection module, serve as classifier input to further reduce the false alarm rate. Experiments on the DARPA 1998 intrusion detection evaluation dataset demonstrate the effectiveness of the method. Finally, existing research on data mining techniques in intrusion detection is summarised.

16.
The quality of tactical decision-making in tennis matches has a very important influence on match results. How to find players' tactical characteristics and patterns in large amounts of tactical data, so as to compensate for the shortcomings of traditional statistical methods and provide a scientific basis for correct tactical decisions during matches, is a problem that urgently needs to be solved. Using association analysis from data mining theory and the Weka data mining platform, this paper builds an association rule mining model linking shot placement with points won and lost, carries out a concrete case study, and provides objective, scientific decision support for shot placement decisions in tennis tactics.

17.
Sequential fuzzy association rule algorithms are inconvenient for processing massive flight data because of their poor scalability and long response times. This paper therefore adopts a parallel fuzzy association mining algorithm: a parallel fuzzy c-means algorithm is first used to partition quantitative attributes into several fuzzy sets, with the fuzzy sets softening the partition boundaries of the attributes; an improved parallel Boolean association rule mining algorithm is then used to discover frequent fuzzy attribute sets. Validation on a flight database shows that the parallel algorithm has good scalability, size-up and speed-up performance.
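As an illustration of the first step only (the parallelisation and the improved Boolean rule miner are not sketched), a plain single-process fuzzy c-means partitioning of one quantitative attribute could look like the following, with the fuzzifier m = 2 and the sample values assumed:

```python
import numpy as np

def fuzzy_cmeans_1d(values, c=3, m=2.0, n_iter=100, seed=0):
    """Partition a quantitative attribute into c fuzzy sets (plain fuzzy c-means,
    m is the fuzzifier). Returns cluster centres and the membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(values))      # memberships sum to 1
    x = np.asarray(values, dtype=float)
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ x) / w.sum(axis=0)              # membership-weighted means
        dist = np.abs(x[:, None] - centres[None, :]) + 1e-12
        u = 1.0 / (dist ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)                # renormalise each row
    return centres, u

# Example: turn an "altitude" attribute into fuzzy LOW/MEDIUM/HIGH memberships,
# which can then feed a Boolean-style association rule miner via a cut threshold.
alt = np.array([1200, 1500, 8000, 9000, 30000, 31000, 29500], dtype=float)
centres, u = fuzzy_cmeans_1d(alt, c=3)
print(np.round(centres), np.round(u, 2))
```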

18.
刘博  彭宏  郑启伦 《计算机应用》2006,26(6):1406-1408
This paper studies data preprocessing methods and proposes an algorithm based on Non-Linear Correlation Analysis (NLCA) and quantisation. NLCA is a reduction tool based on aggregating multiple edges in a multigraph, and it includes both edge aggregation and vertex aggregation. The algorithm represents the global correlation of real-time data well and improves on the existing single computation method based on joint probability. The algorithm was validated on a large amount of real data, showing that it outperforms existing data preprocessing methods.

19.
The classic two-step approach of the Apriori algorithm and its descendants, which consists of finding all large itemsets and then using these itemsets to generate all association rules, has worked well for certain categories of data. Nevertheless, for many other data types this approach shows highly degraded performance and proves rather inefficient.

We argue that we need not search the entire space of candidate itemsets, but should rather let the database unveil its secrets as the customers use it. We propose a system that does not merely scan all possible combinations of itemsets, but instead acts like a search engine specifically implemented for making recommendations to customers, using techniques borrowed from Information Retrieval.
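The system itself is not described in detail in this abstract; a minimal sketch of an IR-flavoured recommender in the same spirit (TF-IDF profiles of customer purchase histories plus cosine similarity, with all data and names hypothetical) might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical purchase histories: one "document" per customer, items as tokens.
histories = [
    "milk bread butter",
    "milk bread jam",
    "beer chips salsa",
    "beer chips bread",
]

vectorizer = TfidfVectorizer()
profiles = vectorizer.fit_transform(histories)       # customers x items matrix

def recommend(query_basket, top_customers=2):
    """Score the current basket against stored customer profiles, IR-style,
    and suggest items the most similar customers bought."""
    q = vectorizer.transform([query_basket])
    sims = cosine_similarity(q, profiles).ravel()
    best = sims.argsort()[::-1][:top_customers]
    have = set(query_basket.split())
    suggestions = set()
    for idx in best:
        suggestions |= set(histories[idx].split()) - have
    return sorted(suggestions)

print(recommend("milk bread"))   # ['butter', 'jam']
```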


20.
Gene expression profiling using DNA microarray technology has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Recently, many computational methods have been used to discover marker genes and to perform class prediction and class discovery based on gene expression data from cancer tissue. However, these techniques fall short in some critical areas, including (a) interpretation of the solution and the extracted knowledge, (b) integration of data from various sources and incorporation of prior knowledge into the system, and (c) provision of a global understanding of complex biological systems through a complete knowledge discovery framework. This paper proposes a multiple-kernel SVM based data mining system. Multiple tasks, including feature selection, data fusion, class prediction, decision rule extraction, association rule extraction and subclass discovery, are incorporated into an integrated framework. The ALL-AML leukemia dataset is used to demonstrate the performance of this system.
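The paper's kernel set and weighting scheme are not given here; the sketch below illustrates the general multiple-kernel idea with a fixed-weight combination of a linear and an RBF kernel passed to a precomputed-kernel SVM, on synthetic data standing in for expression profiles:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes).
X, y = make_classification(n_samples=72, n_features=500, n_informative=30,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def combined_kernel(A, B, weights=(0.5, 0.5), gamma=1e-3):
    """Fixed-weight combination of a linear and an RBF kernel
    (a simple stand-in for learned multiple-kernel weights)."""
    w_lin, w_rbf = weights
    return w_lin * linear_kernel(A, B) + w_rbf * rbf_kernel(A, B, gamma=gamma)

clf = SVC(kernel="precomputed").fit(combined_kernel(X_tr, X_tr), y_tr)
print("test accuracy:", clf.score(combined_kernel(X_te, X_tr), y_te))
```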
