1.
Because of the damage it does to Internet security, malware (e.g., viruses, worms, trojans) and its detection have held the attention of both the anti-malware industry and researchers for decades. To protect legitimate users from attacks, the most significant line of defense against malware is anti-malware software, which mainly uses signature-based detection. However, this method fails to recognize new, unseen malicious executables. To solve this problem, in this paper we propose an effective sequence mining algorithm that discovers malicious sequential patterns from the instruction sequences extracted from the file sample set, and we then construct an All-Nearest-Neighbor (ANN) classifier for malware detection based on the discovered patterns. The resulting data mining framework, composed of the proposed sequential pattern mining method and the ANN classifier, characterizes the malicious patterns in the collected file sample set well enough to detect previously unseen malware samples effectively. A comprehensive experimental study on a real data collection is performed to evaluate our detection framework. Promising experimental results show that our framework outperforms alternative data-mining-based detection methods in identifying new malicious executables.
2.
Scalability is a primary issue for existing sequential pattern mining algorithms when dealing with large amounts of data. Previous work, namely sequential pattern mining on the cloud (SPAMC), has already addressed the scalability problem. It uses the MapReduce cloud computing architecture to mine frequent sequential patterns on large datasets. However, this existing algorithm does not address the iterative mining problem, namely that reloading data incurs additional costs. Furthermore, it does not address the load balancing problem. To remedy these problems, we devised a sequential pattern mining algorithm, the sequential pattern mining in the cloud-uniform distributed lexical sequence tree algorithm (SPAMC-UDLT), which exploits MapReduce and streaming processes. SPAMC-UDLT dramatically improves overall performance without launching multiple MapReduce rounds and provides perfect load balancing across machines in the cloud. The results show that SPAMC-UDLT significantly reduces execution time, achieves extremely high scalability, and provides much better load balancing than existing algorithms in the cloud.
3.
Applied Intelligence - Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms...
4.
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.
This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
5.
Marlies Vanhulsel Davy Janssens Geert Wets Koen Vanhoof 《Expert systems with applications》2009,36(4):8032-8039
The present study aims at contributing to the current state of the art of activity-based travel demand modelling by presenting a framework to simulate sequential data. To this end, the suitability of a reinforcement learning approach to reproduce sequential data is explored. Additionally, traditional reinforcement learning techniques are not capable of learning efficiently in large state and action spaces with respect to memory and computational time requirements on the one hand, and of generalizing based on infrequent visits of all state-action pairs on the other hand; the reinforcement learning technique as used in most applications is therefore enhanced by means of regression tree function approximation. Three reinforcement learning algorithms are implemented to validate their applicability: the traditional Q-learning and Q-learning with bucket-brigade updating are tested against the improved reinforcement learning approach with a CART function approximator. These methods are applied to data of 26 diary days. The results are promising and show that the proposed techniques offer great opportunity for simulating sequential data. Moreover, the reinforcement learning approach improved by introducing a regression tree function approximator learns a better solution much faster than the two traditional Q-learning approaches.
6.
World Wide Web - The performance of the existing parallel sequential pattern mining algorithms is often unsatisfactory due to high IO overhead and imbalanced load among the computing nodes. To...
7.
Liu Zhuang Ma Yunpu Hildebrandt Marcel Ouyang Yuanxin Xiong Zhang 《Knowledge and Information Systems》2022,64(8):2239-2265
Knowledge and Information Systems - Sequential recommendations play a crucial role in many real-world applications. Due to the sequential nature, reinforcement learning has been employed to...
8.
This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
9.
10.
Scientific progress in recent years has led to the generation of huge amounts of biological data, most of which remains unanalyzed. Mining the data may provide insights into various realms of biology, such as finding co-occurring biosequences, which are essential for biological data mining and analysis. Data mining techniques like sequential pattern mining may reveal implicitly meaningful patterns among DNA or protein sequences. If biologists hope to unlock the potential of sequential pattern mining in their field, it is necessary to move away from traditional sequential pattern mining algorithms, because they have difficulty handling the small number of distinct items and the long sequences found in biological data, such as gene and protein sequences. To address the problem, we propose an approach called the Depth-First SPelling (DFSP) algorithm for mining sequential patterns in biological sequences. The algorithm's processing speed is faster than that of PrefixSpan, its leading competitor, and it is superior to other sequential pattern mining algorithms for biological sequences.
11.
Autonomous Robots - This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The...
12.
We present a neural method that computes the inverse kinematics of any kind of robot manipulator, both redundant and non-redundant. Inverse kinematics solutions are obtained through the inversion of a neural network that has been previously trained to approximate the manipulator forward kinematics. The inversion provides difference vectors in the joint space from difference vectors in the workspace. Our differential inverse kinematics (DIV) approach can be viewed as a neural network implementation of the Jacobian transpose method for arm kinematic control that does not require previous knowledge of the arm forward kinematics. Redundancy can be exploited to obtain a special inverse kinematic solution that meets a particular constraint (e.g. joint limit avoidance) by inverting an additional neural network. The usefulness of our DIV approach is further illustrated with sensor-based multilink manipulators that learn collision-free reaching motions in unknown environments. For this task, the neural controller has two modules: a reinforcement-based action generator (AG) and a DIV module that computes goal vectors in the joint space. The actions given by the AG are interpreted with regard to those goal vectors.
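For readers unfamiliar with the Jacobian transpose method mentioned in this abstract, the following is a minimal sketch of the classical, analytic version for a planar two-link arm. It is not the neural DIV approach of the paper: here the forward kinematics and the Jacobian are written out by hand, whereas the abstract describes obtaining the joint-space difference vectors by inverting a trained network. The link lengths, step size, starting angles, and function names are all assumptions chosen for illustration.

```python
import math


def forward(t1, t2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm (assumed example arm)."""
    x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y


def jacobian_transpose_ik(target, theta=(0.3, 0.3), l1=1.0, l2=1.0,
                          step=0.1, iters=2000, tol=1e-4):
    """Move the joints along J^T * e until the workspace error e is small."""
    t1, t2 = theta
    for _ in range(iters):
        x, y = forward(t1, t2, l1, l2)
        ex, ey = target[0] - x, target[1] - y  # workspace difference vector
        if math.hypot(ex, ey) < tol:
            break
        # Analytic Jacobian of the end-effector position w.r.t. (t1, t2).
        j11 = -l1 * math.sin(t1) - l2 * math.sin(t1 + t2)
        j12 = -l2 * math.sin(t1 + t2)
        j21 = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
        j22 = l2 * math.cos(t1 + t2)
        # Joint-space difference vector: step * J^T * e.
        t1 += step * (j11 * ex + j21 * ey)
        t2 += step * (j12 * ex + j22 * ey)
    return t1, t2


if __name__ == "__main__":
    angles = jacobian_transpose_ik((1.0, 1.0))
    print(angles, forward(*angles))
```

The update is gradient descent on half the squared workspace error, which is why no matrix inverse is needed; the price is slower convergence than pseudo-inverse methods and a step size that has to be kept small.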
13.
Since Agrawal and Srikant proposed sequential pattern mining in 1995, many scholars have worked to improve the efficiency and reduce the processing time of the algorithms. This study proposes a fuzzy AprioriSome algorithm for fuzzy sequential pattern mining, integrated with a clustering technique, the K-means algorithm. Two experiments, performed using transaction data provided by a securities firm and foodmarket data from SQL Server 2000, demonstrate the strength of fuzzy AprioriSome sequential pattern mining in mining large quantities of transaction data.
14.
State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
This paper deals with a new approach based on Q-learning for solving the problem of mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning algorithms have been widely used for intelligent robot control, especially in the field of autonomous mobile robots. However, the learning process is slow and cumbersome. For practical applications, rapid rates of convergence are required. Aiming at the problem of slow convergence and long learning time for Q-learning based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the searching process. After one action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. With the increasing number of Q-values updated after one action, the number of actual steps for convergence decreases and thus the learning time decreases, where a step is a state transition. Extensive simulations validate the efficiency of the newly proposed approach for mobile robot path planning in complex environments. The results show that the new approach has a high convergence speed and that the robot can find the collision-free optimal path in complex unknown static environments in much shorter time, compared with the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
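The state-chain idea described in this abstract can be illustrated with a small sketch. The abstract does not give the update order, the environment, or any parameters, so everything concrete below is an assumption: a one-dimensional corridor stands in for the robot's map, the chain is replayed from the newest transition back to the oldest, and `state_chain_q_learning` is a hypothetical name, not the authors' implementation.

```python
import random
from collections import defaultdict


def state_chain_q_learning(n_states=6, episodes=50, alpha=0.5,
                           gamma=0.9, epsilon=0.2, seed=0):
    """Q-learning on a toy corridor with state-chain feedback updates.

    States are 0..n_states-1, the goal is the rightmost state, and the
    actions are 0 (left) and 1 (right). All of this is an assumed setup.
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    goal = n_states - 1
    actions = (0, 1)

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        return s2, (1.0 if s2 == goal else 0.0)

    for _ in range(episodes):
        s = 0
        chain = []  # transitions (s, a, r, s2) visited in this episode
        while s != goal:
            # Epsilon-greedy action choice with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            s2, r = step(s, a)
            chain.append((s, a, r, s2))
            # Sequential feedback: after every action, re-apply one-step
            # Q-learning to each pair on the chain, newest first (assumed
            # order), so a reward propagates back in a single pass.
            for cs, ca, cr, cs2 in reversed(chain):
                future = 0.0 if cs2 == goal else max(q[(cs2, b)] for b in actions)
                q[(cs, ca)] += alpha * (cr + gamma * future - q[(cs, ca)])
            s = s2
    return q


if __name__ == "__main__":
    q = state_chain_q_learning()
    for s in range(5):
        print(s, round(q[(s, 0)], 3), round(q[(s, 1)], 3))
```

With plain one-step Q-learning only the last state-action pair is updated per step, so a reward needs many episodes to reach the start state; replaying the chain trades extra computation per step for fewer real state transitions, which matches the motivation given in the abstract.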
15.
Data classification is an important topic in the field of data mining due to its wide applications. A number of related methods have been proposed based on well-known learning models such as decision trees or neural networks. Although data classification has been widely discussed, relatively few studies have explored the topic of temporal data classification. Most existing research has focused on improving the accuracy of classification by using statistical models, neural networks, or distance-based methods. However, these methods cannot explain the results of classification to users. In many research cases, such as microarray gene expression, users prefer interpretable classification information over a classifier that offers only high accuracy. In this paper, we propose a novel pattern-based data mining method, namely classify-by-sequence (CBS), for classifying large temporal datasets. The main methodology behind CBS is integrating sequential pattern mining with probabilistic induction. CBS has the merit of simplicity in implementation, and its pattern-based architecture can supply clear classification information to users. Through experimental evaluation, CBS was shown to deliver classification results with high accuracy on two real time series datasets. In addition, we designed a simulator to evaluate the performance of CBS on datasets with different characteristics. The experimental results show that CBS can discover the hidden patterns and classify data effectively by utilizing the mined sequential patterns.
16.
An active research topic in data mining is the discovery of sequential patterns, which finds all frequent subsequences in a sequence database. The generalized sequential pattern (GSP) algorithm was proposed to solve the mining of sequential patterns with time constraints, such as time gaps and sliding time windows. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. However, the capabilities to mine sequential patterns with time constraints were previously available only within the Apriori framework. Therefore, we propose the DELISP (delimited sequential pattern) approach to provide these capabilities within the pattern-growth methodology. DELISP reduces the size of projected databases through bounded and windowed projection techniques. Bounded projection keeps only time-gap valid subsequences, and windowed projection saves nonredundant subsequences satisfying the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the pattern growing process. The comprehensive experiments conducted show that DELISP has good scalability and outperforms the well-known GSP algorithm in the discovery of sequential patterns with time constraints.
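To make the time-gap constraint in this abstract concrete, here is a minimal sketch of what it means for a pattern to occur in a timestamped sequence under minimum and maximum gap bounds, and how support is counted over a database. This only illustrates the constraint semantics; it is neither DELISP nor GSP, it ignores sliding time windows, and it assumes single-item elements. The function names and the toy database are hypothetical.

```python
def occurs_with_gaps(sequence, pattern, min_gap, max_gap):
    """Check whether `pattern` occurs in `sequence` under time-gap bounds.

    `sequence` is a list of (timestamp, item) pairs sorted by timestamp;
    `pattern` is a list of items. The gap between consecutive matched
    events must lie in [min_gap, max_gap].
    """
    def search(p_idx, prev_time, start):
        if p_idx == len(pattern):
            return True
        for i in range(start, len(sequence)):
            t, item = sequence[i]
            if item != pattern[p_idx]:
                continue
            if prev_time is not None:
                gap = t - prev_time
                if gap < min_gap:
                    continue  # too close; a later event may still fit
                if gap > max_gap:
                    break  # sorted by time, so later events are even farther
            # Backtracking is needed: the earliest match is not always the
            # one that lets the rest of the pattern satisfy max_gap.
            if search(p_idx + 1, t, i + 1):
                return True
        return False

    return search(0, None, 0)


def support(database, pattern, min_gap, max_gap):
    """Number of sequences in `database` containing a valid occurrence."""
    return sum(occurs_with_gaps(seq, pattern, min_gap, max_gap)
               for seq in database)


if __name__ == "__main__":
    db = [
        [(1, "a"), (2, "b"), (10, "c")],            # b -> c gap too large
        [(1, "a"), (3, "b"), (5, "c")],             # valid
        [(1, "a"), (9, "a"), (10, "b"), (12, "c")],  # valid via second "a"
    ]
    print(support(db, ["a", "b", "c"], min_gap=1, max_gap=3))  # prints 2
```

The third sequence shows why a greedy left-to-right match is not enough once a maximum gap is imposed: the first "a" is too far from "b", but the second one works.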
17.
《Behaviour & Information Technology》2012,31(4):281-291
More and more universities and colleges are providing online courses not only for on-campus students but also for off-campus students. Tutors have to consider the differences between on- and off-campus students in order to make instruction more effective. This paper compares on- and off-campus performance in online learning in four areas: learning time, path of browsing courseware, intercommunication, and adaptability towards online learning. The last two areas are emphasized. Multiple approaches were adopted to collect data, including questionnaires, posted documents, online logs, interviews and observations. This study shows that the peak times of online learning, the paths of browsing courseware and the favourite means of intercommunication of on- and off-campus students are similar. But there are also some differences between these two groups, such as competence in self-learning, enthusiasm for interpersonal exchange, dependence on tutors, feeling of learning stress, etc.
19.
Dulac-Arnold Gabriel Levine Nir Mankowitz Daniel J. Li Jerry Paduraru Cosmin Gowal Sven Hester Todd 《Machine Learning》2021,110(9):2419-2468
Machine Learning - Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research...