Similar Documents
20 similar documents found.
1.
This paper describes data mining and data warehousing techniques that can improve the performance and usability of Intrusion Detection Systems (IDS). Current IDS do not provide support for historical data analysis and data summarization. This paper presents techniques to model network traffic and alerts using a multi-dimensional data model and star schemas. This data model was used to perform network security analysis and detect denial-of-service attacks. Our data model can also handle heterogeneous data sources (e.g. firewall logs, system calls, net-flow data) and enables up to two orders of magnitude faster query response times for analysts compared to the current state of the art. We have used our techniques to implement a prototype system that is being used successfully at the Army Research Labs. Our system has helped security analysts detect intrusions and perform historical data analysis for trend-analysis reports. Recommended by: Ashfaq Khokhar
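As a rough illustration of the multi-dimensional idea (a pandas sketch with hypothetical column names, not the authors' schema): roll IDS alerts up along star-schema dimensions such as time, source IP and signature, the kind of summarization such a model makes cheap.

```python
import pandas as pd

# Hypothetical fact table of alerts; column names are illustrative only.
alerts = pd.DataFrame({
    "ts":        pd.to_datetime(["2024-01-01 10:00", "2024-01-01 10:05",
                                 "2024-01-01 11:00", "2024-01-02 09:30"]),
    "src_ip":    ["10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.5"],
    "signature": ["SYN flood", "SYN flood", "port scan", "SYN flood"],
})

# "Drill down" by hour, source and signature: counts per cell of the data cube.
cube = (alerts
        .groupby([alerts["ts"].dt.floor("h"), "src_ip", "signature"])
        .size()
        .rename("n_alerts")
        .reset_index())
print(cube)
```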

2.
Research on the Application of Data Mining to Web Intelligence
This paper analyzes the characteristics of Web information and the limitations of how it is currently exploited, and proposes applying data mining techniques to the Web, i.e. Web mining, to advance Web intelligence. It surveys several important applications of Web mining in making the Web intelligent, and argues that Web mining is an important research area within Web technology: it is the key to discovering the knowledge hidden in the Web, distinguishing authoritative links, and understanding user access patterns and the semantic structure of web pages. Web mining makes it possible to fully exploit the vast amount of genuinely valuable information on the Web and lays the foundation for an intelligent Web.

3.
A Data-Mining-Based Snort Network Intrusion Detection System
This paper reviews current intrusion detection and data mining techniques and analyzes the Snort network intrusion detection system in depth. Building on Snort, it constructs a model of a network intrusion detection system based on data mining, focusing on the design and implementation of an anomaly detection engine and a cluster analysis module based on the k-means algorithm. The k-means algorithm itself is also improved to make it better suited to network intrusion detection.
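As a rough illustration of a k-means anomaly detection engine (a minimal sketch on synthetic data, not the paper's improved algorithm): cluster normal traffic and flag points far from every centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))   # simulated normal connections
attack = rng.normal(6.0, 1.0, size=(10, 4))    # simulated outliers
X = np.vstack([normal, attack])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(normal)
dist = np.min(km.transform(X), axis=1)         # distance to the closest centroid

# Threshold: 99th percentile of distances seen on normal traffic.
threshold = np.percentile(np.min(km.transform(normal), axis=1), 99)
alerts = np.where(dist > threshold)[0]
print(f"{len(alerts)} connections flagged as anomalous")
```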

4.
In this paper we give evidence that it is possible to characterize and detect potential users of false invoices in a given year, based on the information in their tax payments, their historical performance and their characteristics, using different types of data mining techniques. First, clustering algorithms like SOM and neural gas are used to identify groups of similar behaviour in the universe of taxpayers. Then decision trees, neural networks and Bayesian networks are used to identify the variables that are related to fraudulent and/or non-fraudulent conduct, detect patterns of associated behaviour, and establish to what extent cases of fraud and/or no fraud can be detected with the available information. This helps identify patterns of fraud and generate knowledge that can be used in the audit work performed by the Tax Administration of Chile (in Spanish, Servicio de Impuestos Internos (SII)) to detect this type of tax crime.
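A sketch of the two-stage idea under loud assumptions: k-means stands in for SOM/neural gas (neither ships with scikit-learn), and an interpretable decision tree is then fit per behavioural segment; features, names and labels are entirely synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))              # e.g. declared VAT, credits, ratios
y = (X[:, 0] - X[:, 1] > 1.0).astype(int)   # synthetic "fraud" label

# Stage 1: segment taxpayers by behaviour (stand-in for SOM / neural gas).
segment = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# Stage 2: one interpretable tree per segment relating features to fraud.
for s in np.unique(segment):
    tree = DecisionTreeClassifier(max_depth=3).fit(X[segment == s], y[segment == s])
    print(f"segment {s}:")
    print(export_text(tree, feature_names=["vat", "credits", "ratio"]))
```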

5.

While the Internet and World Wide Web have put a huge volume of low-quality information within easy reach of information gathering systems, filtering out irrelevant information has become a big challenge. In this paper, a Web data mining and cleaning strategy for information gathering is proposed. A data-mining model is presented for data that come from multiple agents. Using the model, a data-cleaning algorithm is then presented to eliminate irrelevant data. To evaluate the data-cleaning strategy, an interpretation is given for the mining model according to evidence theory. An experiment is also conducted to evaluate the strategy using Web data. The experimental results show that the proposed strategy is efficient and promising.
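Since the interpretation rests on evidence theory, a generic sketch of Dempster's rule of combination may help; this is the standard rule for fusing evidence from multiple agents, not the paper's cleaning algorithm itself.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions over frozenset focal elements."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two agents' evidence that a page is relevant (R) or irrelevant (I).
m1 = {frozenset({"R"}): 0.7, frozenset({"R", "I"}): 0.3}
m2 = {frozenset({"R"}): 0.6, frozenset({"I"}): 0.2, frozenset({"R", "I"}): 0.2}
print(combine(m1, m2))
```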

6.
Research on Data Integration in Web Data Mining
Based on an analysis of the characteristics of data sources in the Web environment, this paper studies the data integration problem in Web data mining in depth and presents an integration scheme based on XML technology. The scheme uses a Web data access approach to integrate heterogeneous data sources, providing a unified and effective data set for Web data mining and thereby solving the difficult problem of integrating heterogeneous Web data sources. The Web data integration process is illustrated with a concrete example.
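A minimal sketch of the integration idea (tag and field names are hypothetical): records from two heterogeneous sources wrapped into one common XML document with Python's standard ElementTree, so a downstream miner sees a single uniform data set.

```python
import xml.etree.ElementTree as ET

source_a = [{"name": "Alice", "age": "30"}]   # e.g. a relational table
source_b = [("Bob", "25")]                    # e.g. a CSV/legacy feed

root = ET.Element("records")
for rec in source_a:
    ET.SubElement(root, "record", name=rec["name"], age=rec["age"])
for name, age in source_b:
    ET.SubElement(root, "record", name=name, age=age)

print(ET.tostring(root, encoding="unicode"))
```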

7.
In recent years, the deep web has become extremely popular. Like any other data source, data mining on the deep web can produce important insights or summaries of results. However, data mining on the deep web is challenging because the databases cannot be accessed directly, and therefore, data mining must be performed by sampling the datasets. The samples, in turn, can only be obtained by querying deep web databases with specific inputs. In this paper, we target two related data mining problems, association mining and differential rule mining. These are proposed to extract high-level summaries of the differences in data provided by different deep web data sources in the same domain. We develop stratified sampling methods to perform these mining tasks on a deep web source. Our contributions include a novel greedy stratification approach, which recursively processes the query space of a deep web data source and considers both the estimation error and the sampling costs. We have also developed an optimized sample allocation method that integrates estimation error and sampling costs. Our experimental results show that our algorithms effectively and consistently reduce sampling costs compared with a stratified sampling method that only considers estimation error. In addition, compared with simple random sampling, our algorithm has higher sampling accuracy and lower sampling costs.
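The paper's greedy stratification is not reproduced here, but the textbook building block it optimizes over can be sketched: cost-aware Neyman allocation, which gives more samples to high-variance strata and fewer to strata that are expensive to query.

```python
import numpy as np

def allocate(sizes, stds, costs, total_budget):
    """n_h proportional to N_h * S_h / sqrt(c_h) (Neyman allocation with costs)."""
    sizes, stds, costs = map(np.asarray, (sizes, stds, costs))
    weight = sizes * stds / np.sqrt(costs)
    return np.round(total_budget * weight / weight.sum()).astype(int)

# Three query-space strata of a deep web source: sizes, std devs, unit costs.
print(allocate(sizes=[1000, 5000, 500], stds=[2.0, 0.5, 4.0],
               costs=[1.0, 1.0, 4.0], total_budget=300))
```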

8.
Journal of Computer Virology and Hacking Techniques - The introduction of new memory-based crypto-mining techniques and the rise of new web technologies like WebAssembly made the use of browsers...

9.
To meet users' needs for precise and personalized information access, this paper analyzes the characteristics of Deep Web information and proposes a crawler framework capable of searching Deep Web information across different topics. For the two difficult problems in the framework, discovering Deep Web databases and choosing the Deep Web crawler's crawling strategy, it proposes, respectively, using a general-purpose search engine to speed up the discovery of Deep Web databases on different topics, and using the most common terms to download as much Deep Web information as possible. Experimental results show that the techniques adopted by the framework are feasible.

10.
11.
With the rapid development of network communication technology and its falling cost, more and more information is being published on the Web. However, because Web data mining is far more complex than mining a single data warehouse, Web-oriented data mining has become a new research topic. This paper introduces the classification of Web data mining and its current state of development, applies XML technology to Web data mining, and presents a model for automatic mining, which is applied to an automatic stock-information collection system to demonstrate the feasibility and advantages of automatic Web data mining. The remaining shortcomings of automatic Web data mining and its prospects are also discussed.

12.
The paper proposes an adaptive web system, that is, a website capable of changing its original design to fit user requirements. To remedy shortcomings of the website and make it easier for users to access information, the system analyzes user browsing patterns from their access records. This paper concentrates on the operating efficiency of a website, that is, the efficiency with which a group of users browse it. With high efficiency, users spend less operating cost to accomplish a desired goal. Based on user access data, we analyze each user's operating activities as well as their browsing sequences, and from this data we calculate a measure of the efficiency of the user's browsing sequences. The paper develops an algorithm to accurately calculate this efficiency and to suggest how to increase the efficiency of user operations. This can be achieved in two ways: (i) by adding a new link between two web pages, or (ii) by suggesting that designers reconsider existing inefficient links so as to allow users to arrive at their target pages more quickly. Using this algorithm, we develop a prototype to prove the concept of efficiency. The implementation is an adaptive website system that automatically changes the website architecture according to user browsing activities and improves website usability from the viewpoint of efficiency.
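A minimal sketch of the efficiency idea under simplifying assumptions (unit cost per click, toy site graph; not the paper's algorithm or cost model): compare an observed browsing sequence with the shortest path the site graph allows, and suggest a direct link when the gap is large.

```python
from collections import deque

site = {"home": ["news", "products"], "products": ["detail"],
        "news": ["home"], "detail": ["home"]}

def shortest(graph, src, dst):
    """Breadth-first search: minimum number of clicks from src to dst."""
    seen, q = {src}, deque([(src, 0)])
    while q:
        node, d = q.popleft()
        if node == dst:
            return d
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return None

observed = ["home", "news", "home", "products", "detail"]   # a user's clicks
optimal = shortest(site, observed[0], observed[-1])
print(f"user took {len(observed) - 1} clicks, optimum is {optimal}")
if optimal is not None and len(observed) - 1 > optimal:
    print(f"consider a direct link {observed[0]} -> {observed[-1]}")
```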

13.
14.
Today’s security threats like malware are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. Various approaches have been introduced to deal with them. One of them is signature-based detection, an effective and widely used method for detecting malware; however, it has a substantial problem in detecting new instances. In other words, it is only useful against malware that has already been seen. Due to the rapid proliferation of malware and the heavy human effort required to extract signatures, this approach is a tedious solution; thus, an intelligent malware detection system is required to deal with new malware threats. Most intelligent detection systems utilise data mining techniques in order to distinguish malware from benign programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to build a learning model. This phase is called “malware analysis” and plays a significant role in these systems. Since the API call sequence is an effective feature for recognising unknown malware, this paper focuses on extracting this feature from executable files. There are two major kinds of approach to analysing an executable file. The first is “static analysis”, which analyses a program at the source code level. The second is “dynamic analysis”, which extracts features by observing a program’s activities, such as system requests, during its execution. Static analysis has to traverse the program’s execution paths in order to find called APIs; because it lacks sufficient information about the decision-making points in the given executable file, it cannot extract the real sequence of called APIs. Although dynamic analysis does not have this drawback, it suffers from execution overhead, so the feature extraction phase takes noticeable time. In this paper, a novel hybrid approach, HDM-Analyser, is presented which takes advantage of both dynamic and static analysis methods to raise speed while preserving accuracy at a reasonable level. HDM-Analyser is able to predict the majority of decision-making points by utilising statistical information gathered during dynamic analysis; therefore, there is no execution overhead. The main contribution of this paper is taking the accuracy advantage of dynamic analysis and incorporating it into static analysis in order to augment the accuracy of static analysis. In fact, the execution overhead is tolerated in the learning phase; thus, it is not imposed on the feature extraction phase performed during scanning. The experimental results demonstrate that HDM-Analyser attains better overall accuracy and time complexity than static and dynamic analysis methods.
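HDM-Analyser itself is not reproduced here, but the common baseline it builds on can be sketched: API call sequences turned into n-gram features for a classifier that separates malware from benign programs. Sequences and labels below are toy data, illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

sequences = [
    "CreateFile WriteFile RegSetValue CreateRemoteThread",   # malicious-ish
    "CreateFile ReadFile CloseHandle",                       # benign-ish
    "RegSetValue CreateRemoteThread VirtualAllocEx",
    "ReadFile ReadFile CloseHandle",
]
labels = [1, 0, 1, 0]   # 1 = malware, 0 = benign

# Unigrams and bigrams over whitespace-separated API names.
vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vec.fit_transform(sequences)
clf = MultinomialNB().fit(X, labels)

test = ["CreateFile RegSetValue CreateRemoteThread"]
print(clf.predict(vec.transform(test)))   # -> [1]
```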

15.
The present study proposes an algorithm for fault detection in terms of condition-based maintenance with data mining techniques. The proposed algorithm is applied to an aircraft turbofan engine using flight data and consists of two main sections. In the first section, the relationship between engine exhaust gas temperature (EGT), the main engine health monitoring criterion, and the engine's other operational and environmental parameters was modelled using data-driven models. In the second section, a data set of EGT residuals, that is, the differences between the actual EGT of the system and the EGT estimated by the developed model under healthy engine conditions, was created. Finally, faults occurring in each flight were detected by identifying abnormal events with a one-class support vector machine trained on the healthy-condition EGT residual data set. The results indicated that the proposed algorithm is an effective approach for inspecting aircraft engine conditions and detecting faults, with no need for technical knowledge of the interior characteristics of the aircraft engine.
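A minimal sketch of the second stage on synthetic residuals (not the authors' tuned model), using scikit-learn's one-class SVM: learn the region of healthy residuals, then flag flights that fall outside it.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
healthy_residuals = rng.normal(0.0, 3.0, size=(300, 1))   # synthetic, in deg C
new_flights = np.array([[1.5], [2.0], [25.0]])            # last one is faulty

ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(healthy_residuals)
print(ocsvm.predict(new_flights))   # +1 = healthy, -1 = fault; typically [ 1  1 -1]
```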

16.
The defacement of web sites has become a widespread problem. Reaction to these incidents is often quite slow and triggered by occasional checks or even feedback from users, because organizations usually lack systematic, round-the-clock surveillance of the integrity of their web sites. A more systematic approach is certainly desirable. An attractive option in this respect consists in augmenting availability and performance monitoring services with defacement detection capabilities. Motivated by these considerations, in this paper we assess the performance of several anomaly detection approaches when faced with the problem of detecting web defacements automatically. All these approaches construct a profile of the monitored page automatically, based on machine learning techniques, and raise an alert when the page content does not fit the profile. We assessed their performance in terms of false positives and false negatives on a dataset composed of 300 highly dynamic web pages that we observed for 3 months, including a set of 320 real defacements.

17.
Classification of intrusion attacks and normal network traffic is a challenging and critical problem in pattern recognition and network security. In this paper, we present a novel intrusion detection approach to extract both accurate and interpretable fuzzy IF-THEN rules from network traffic data for classification. The proposed fuzzy rule-based system is evolved from an agent-based evolutionary framework and multi-objective optimization. In addition, the proposed system can also act as a genetic feature selection wrapper to search for an optimal feature subset for dimensionality reduction. To evaluate the classification and feature selection performance of our approach, it is compared with some well-known classifiers as well as feature selection filters and wrappers. The extensive experimental results on the KDD-Cup99 intrusion detection benchmark data set demonstrate that the proposed approach produces interpretable fuzzy systems, and outperforms other classifiers and wrappers by providing the highest detection accuracy for intrusion attacks and a low false alarm rate for normal network traffic with a minimized number of features.

18.
The human visual sense has a unique status, offering a very high-bandwidth channel for information flow. Visual approaches to analysis and mining attempt to take advantage of our ability to perceive pattern and structure in visual form and to make sense of, or interpret, what we see. Visual data mining techniques have proven to be of high value in exploratory data analysis and they also have high potential for mining large databases. In this work, we investigate and expand the area of visual data mining by proposing new visual data mining techniques for the visualization of mining outcomes.

19.
The classic two-step approach of the Apriori algorithm and its descendants, which consists of finding all large itemsets and then using these itemsets to generate all association rules, has worked well for certain categories of data. Nevertheless, for many other data types this approach shows highly degraded performance and proves rather inefficient.

We argue that we need not search the entire space of candidate itemsets, but rather let the database unveil its secrets as the customers use it. We propose a system that does not merely scan all possible combinations of the itemsets, but rather acts like a search engine specifically implemented for making recommendations to the customers, using techniques borrowed from Information Retrieval.
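As a concrete reference point for the two-step approach the abstract critiques, here is a minimal level-wise Apriori sketch on toy baskets (illustrative only): it finds all itemsets meeting a minimum support by generating and counting candidates level by level.

```python
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    freq, k, current = {}, 1, [frozenset([i]) for i in sorted(items)]
    while current:
        # Count support of each candidate k-itemset.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(level)
        # Candidate generation: unions of frequent k-itemsets of size k+1.
        keys = list(level)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return freq

baskets = [{"beer", "chips"}, {"beer", "chips", "salsa"}, {"chips", "salsa"}]
print(apriori(baskets, min_support=2))
```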


20.
Computers in Industry, 2007, 58(8-9): 772-782
Practically every major company with a retail operation has its own web site and online sales facilities. This paper describes a toolset that exploits web usage data mining techniques to identify customer Internet browsing patterns. These patterns are then used to underpin a personalised product recommendation system for online sales. Within the architecture, a Kohonen neural network or self-organizing map (SOM) has been trained for use both offline, to discover user group profiles, and in real time, to examine active user click-stream data, match it to a specific user group, and recommend a unique set of product browsing options appropriate to an individual user. Our work demonstrates that this approach can overcome the scalability problem that is common among systems of this type. Our results also show that a personalised recommender system powered by the SOM predictive model is able to produce consistent recommendations.
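A rough illustration of the offline SOM profiling step (not the authors' system), using the third-party minisom package (pip install minisom) on toy click-frequency vectors: users are mapped onto a small grid, and an active user is matched to the cell whose members' favourite categories would drive the recommendation.

```python
import numpy as np
from minisom import MiniSom

# Rows: users; columns: click counts per product category (toy data).
clicks = np.array([[9, 1, 0], [8, 2, 1], [0, 1, 9], [1, 0, 8]], dtype=float)

som = MiniSom(2, 2, input_len=3, sigma=0.5, learning_rate=0.5, random_seed=0)
som.train_random(clicks, num_iteration=500)

active_user = np.array([7.0, 2.0, 0.0])   # live click-stream summary
cell = som.winner(active_user)            # best-matching grid unit
peers = [u for u in clicks if som.winner(u) == cell]
print(f"user maps to cell {cell}; recommend top categories of {len(peers)} peers")
```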
