Similar literature
 20 similar articles found; search time 140 ms
1.
An abstraction resilient to common malware obfuscation techniques is the call-graph. A call-graph is the representation of an executable file as a directed graph with labeled vertices, where the vertices correspond to functions and the edges to function calls. Unfortunately, most of the interesting graph comparison problems, including full-graph comparison and computing the largest common subgraph, belong to the NP-hard class. This makes the study and use of graphs in large scale systems difficult. Existing work has focused only on offline clustering and has not addressed the issue of clustering streams of graphs. In this paper we present Classy, a scalable distributed system that clusters streams of large call-graphs for purposes including automated malware classification and facilitating malware analysts. Since algorithms aimed at clustering sets are not suitable for clustering streams of objects, we propose the use of a clustering algorithm that relies on the notion of candidate clusters and reference samples therein. We demonstrate via thorough experimentation that this approach yields results very close to the offline optimal. Graph similarity is determined by computing a graph edit distance (GED) of pairs of graphs using an adapted version of simulated annealing. Furthermore, we present a novel lower bound for the GED. We also study the problem of approximating statistics of clusters of graphs when the distances of only a fraction of all possible pairs have been computed. Finally, we present results and statistics from a real production-side system that has clustered and contains more than 0.8 million graphs.
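The idea of clustering a stream against one reference sample per candidate cluster can be sketched as follows. This is a minimal illustration, not the Classy algorithm itself: the class name `StreamClusterer`, the threshold rule, and the stand-in distance `edge_set_distance` (a Jaccard distance on edge sets rather than a graph edit distance) are all assumptions made to keep the example runnable.

```python
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]  # toy call-graph: a set of (caller, callee) edges


def edge_set_distance(a: Graph, b: Graph) -> float:
    """Stand-in distance in [0, 1]: Jaccard distance between edge sets.

    This is NOT a graph edit distance; it only keeps the sketch runnable.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class StreamClusterer:
    """Hypothetical streaming clusterer with one reference sample per cluster."""

    def __init__(self, distance: Callable[[Graph, Graph], float], threshold: float):
        self.distance = distance
        self.threshold = threshold
        self.references: List[Graph] = []     # one reference sample per cluster
        self.members: List[List[Graph]] = []  # graphs assigned to each cluster

    def add(self, graph: Graph) -> int:
        """Assign an arriving graph to the nearest cluster or open a new one."""
        best_idx, best_dist = -1, float("inf")
        for idx, ref in enumerate(self.references):
            d = self.distance(graph, ref)
            if d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx >= 0 and best_dist <= self.threshold:
            self.members[best_idx].append(graph)
            return best_idx
        # No reference is close enough: the graph becomes a new reference.
        self.references.append(graph)
        self.members.append([graph])
        return len(self.references) - 1
```

Each arriving graph is compared only against the current references, so the cost per graph grows with the number of clusters rather than with the number of graphs seen so far.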

2.

Many malicious applications appear every day, threatening numerous users. Therefore, a surge of studies has been conducted to protect users from newly emerging malware by using machine learning algorithms. Although existing machine learning or deep learning-based Android malware detection approaches achieve high accuracy by combining multiple features, they cannot be deployed on mobile devices because of their high resource cost. In this paper, we propose MAPAS, a malware detection system that achieves high accuracy and adaptable use of computing resources. MAPAS analyzes the behaviors of malicious applications based on their API call graphs by using convolutional neural networks (CNNs). However, MAPAS does not use a classifier model generated by the CNN; it uses the CNN only to discover common features of the API call graphs of malware. To detect malware efficiently, MAPAS employs a lightweight classifier that calculates the similarity between API call graphs used for malicious activities and the API call graphs of the applications to be classified. To demonstrate the effectiveness and efficiency of MAPAS, we implement a prototype and thoroughly evaluate it. We also compare MAPAS with a state-of-the-art Android malware detection approach, MaMaDroid. Our evaluation results demonstrate that MAPAS classifies applications 145.8% faster and uses roughly ten times less memory than MaMaDroid. MAPAS also achieves higher accuracy (91.27%) than MaMaDroid (84.99%) in detecting unknown malware. In addition, MAPAS can generally detect any type of malware with high accuracy.


3.
Graph-based malware detection using dynamic analysis
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
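Estimating a Markov chain from an instruction trace can be sketched as below. This is only an illustration of the general idea, assuming a maximum-likelihood estimate from consecutive-instruction counts; the function name `transition_probabilities` and the toy trace are hypothetical, and the graph kernels and SVM stage of the paper are not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate Markov transition probabilities from an instruction trace.

    Vertices are instruction mnemonics; the weight of edge (a, b) is the
    fraction of occurrences of `a` that are immediately followed by `b`.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs


# Toy trace (hypothetical): "mov" is followed by "add" twice and "jmp" once.
graph = transition_probabilities(["mov", "add", "mov", "jmp", "mov", "add"])
```

Every row of outgoing probabilities sums to 1, so the resulting weighted digraph is a valid Markov chain over the instructions that have a successor in the trace.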

4.
Large graphs are ubiquitous, scale-free, and have irregular relationships. Clustering is used to find similar patterns in graphs and thus helps in gaining useful insights. In the real world, nodes may belong to more than one cluster; thus, it is essential to analyze the fuzzy cluster membership of nodes. Traditional centralized fuzzy clustering algorithms incur high communication cost and produce poor-quality clusters when used for large graphs. Thus, scalable solutions are needed to handle huge amounts of data in less computational time with minimal disk access. In this paper, we propose a parallel fuzzy clustering algorithm named ‘PGFC’ for handling large-scale graph data. From the viewpoint of expert systems, it is advantageous to develop a clustering algorithm that can assure scalability along with better cluster quality for handling large graphs. The algorithm is parallelized using the bulk synchronous parallel (BSP) based Pregel model. The cluster centers are initialized using a degree centrality measure, resulting in fewer iterations. The performance of PGFC is compared with other state-of-the-art clustering algorithms using synthetic graphs and real-world networks. The experimental results reveal that the proposed PGFC scales up linearly to handle large graphs and produces better-quality clusters than other graph clustering counterparts.

5.
Clustering is an important problem in malware research, as the number of malicious samples that appear every day makes manual analysis impractical. Although these samples belong to a limited number of malware families, it is difficult to categorize them automatically as obfuscation is involved. By extracting relevant features we can apply clustering algorithms and then analyze only a couple of representatives from each cluster. However, classic clustering algorithms that compute the similarity between each pair of samples are slow when a large collection is involved. In this paper, the features are strings of operation codes extracted from the binary code of each sample. With a modified suffix tree data structure we can find sufficiently long substrings that correspond to portions of a program’s code. These substrings must be filtered against a database of known substrings so that common library code is ignored. Items whose common substrings exceed a certain threshold are grouped into the same cluster. Our algorithm was tested with data extracted from real-world malware and produced quality clusters.

6.
We explore how formal methods and tools of the verification trade could be used for malware detection and analysis. In particular, we propose a new approach to learning and generalizing from observed malware behaviors based on tree automata inference. Our approach infers k-testable tree automata from system call dataflow dependency graphs. We show how inferred automata can be used for malware recognition and classification.

7.
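To address the continuing sharp growth of Android malware and the insufficient capability of existing malware detection, a static detection method based on non-user operation sequences is proposed. First, the application programming interface (API) call information of the malware is extracted through reverse engineering. Then, a breadth-first traversal algorithm is used to build the function call flow graph of the malware. Next, the non-user operation sequences are extracted from the function flow graph to form a malicious behavior library. Finally, an edit distance algorithm is used to compute the similarity between the sample under test and the non-user operation sequences in the malicious behavior library in order to identify malware. In a test on 360 malicious samples and 300 benign samples, the proposed method achieved a recall of 90.8% and an accuracy of 90.3%. Compared with the Android malware detection system Androguard, the proposed method improved recall on malicious samples by 30 percentage points; compared with the FlowDroid method, it improved precision on benign samples by 11 percentage points and recall on malicious samples by 4.4 percentage points. The experimental results show that the proposed method raises the recall of malware detection and effectively improves detection performance.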
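The edit-distance comparison between operation sequences can be sketched as follows. This is a generic Levenshtein distance over sequences of API names with a length-normalized similarity; the normalization, the function names, and the example API labels are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,          # delete x
                           cur[j - 1] + 1,       # insert y
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; the normalization by max length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical API-call sequences: one extra call separates the two.
known = ["read_contacts", "send_sms"]
sample = ["read_contacts", "open_socket", "send_sms"]
score = similarity(known, sample)
```

A sample would then be flagged when its best score against the behavior library exceeds some threshold; the choice of threshold is not specified here.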

8.
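Binary code comparison techniques are widely used in many fields, such as virus variant analysis, security patch analysis, and version information extraction. Building on a graph-based description of binary code, similar binaries are compared at two levels, functions and basic blocks, to identify their common parts and their differences. The selection of graph-based features of binary files is discussed, and feature comparison together with a fixed-point propagation algorithm is used to establish correspondences between the two pieces of code at both the function level and the basic-block level. This paper presents an implementation framework for this feature-extraction-based binary code comparison technique and gives examples of its use in malware variant analysis and in locating publicly disclosed vulnerabilities.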

9.
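Similarity analysis of malicious code is an important part of current automated malware analysis. A malware similarity analysis method based on function call graphs is proposed. The similarity of two malware function call graphs is measured with the function-call-graph similarity distance SDMFG, from which the similarity of the malicious code is derived. The method improves the accuracy of malware similarity analysis and provides strong support for research on the homology and evolution of malicious code and for malware detection and prevention.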

10.
Previous work has shown that cluster analysis can be used to effectively classify malware into meaningful families. In this research, we apply cluster analysis to the challenging problem of classifying previously unknown malware. We perform several experiments involving malware clustering. We compare our clustering results to those obtained when a support vector machine (SVM) is trained on the malware family. Using clustering, we are able to classify malware with an accuracy comparable to that of an SVM. An advantage of the clustering approach is that a new malware family can be classified before a model has been trained specifically for the family.

11.
Most malware detectors are based on syntactic signatures that identify known malicious programs. Up to now this architecture has been sufficiently efficient to overcome most malware attacks. Nevertheless, the complexity of malicious code continues to increase. As a result, the time required to reverse engineer malicious programs and to forge new signatures keeps growing. This study proposes an efficient construction of a morphological malware detector, that is, a detector that combines syntactic and semantic analysis. It aims at facilitating the task of malware analysts by providing some abstraction in the signature representation, which is based on control flow graphs. We build an efficient signature matching engine on tree automata techniques. Moreover, we describe a generic graph rewriting engine in order to deal with classic mutation techniques. Finally, we provide a preliminary evaluation of the detection strategy by carrying out experiments on a malware collection.

12.
We say a vertex v in a graph G covers a vertex w if v=w or if v and w are adjacent. A subset of vertices of G is a dominating set if it collectively covers all vertices in the graph. The dominating set problem, which is NP-hard, consists of finding a smallest possible dominating set for a graph. The straightforward greedy strategy for finding a small dominating set in a graph consists of successively choosing vertices which cover the largest possible number of previously uncovered vertices. Several variations on this greedy heuristic are described and the results of extensive testing of these variations are presented. A more sophisticated procedure for choosing vertices, which takes into account the number of ways in which an uncovered vertex may be covered, appears to be the most successful of the algorithms which are analyzed. For our experimental testing, we used both random graphs and graphs constructed by test case generators which produce graphs with a given density and a specified size for the smallest dominating set. We found that these generators were able to produce challenging graphs for the algorithms, thus helping to discriminate among them, and allowing a greater variety of graphs to be used in the experiments. Received October 27, 1998; revised March 25, 2001.
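The straightforward greedy strategy described in this abstract can be sketched as follows. This covers only the basic heuristic, not the paper's variations; the adjacency-dictionary representation, the function name `greedy_dominating_set`, and the tie-breaking by dictionary order are assumptions.

```python
from typing import Dict, Hashable, Set


def greedy_dominating_set(adj: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Greedy heuristic for a small (not necessarily minimum) dominating set.

    `adj` maps every vertex to the set of its neighbours (undirected graph).
    A vertex covers itself and its neighbours.
    """
    uncovered = set(adj)
    chosen: Set[Hashable] = set()
    while uncovered:
        # Pick the vertex whose closed neighbourhood covers the most
        # still-uncovered vertices; ties go to the first vertex in dict order.
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.add(best)
        uncovered -= {best} | adj[best]
    return chosen


# Path 1-2-3-4-5: the greedy choice picks vertex 2, then vertex 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
dominating = greedy_dominating_set(path)
```

The loop always terminates because any uncovered vertex covers at least itself, so each iteration removes at least one vertex from `uncovered`.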

13.
The explosive growth of malware variants poses a major threat to information security. Traditional anti-virus systems based on signatures fail to classify unknown malware into their corresponding families and to detect new kinds of malware programs. Therefore, we propose a machine learning based malware analysis system, which is composed of three modules: data processing, decision making, and new malware detection. The data processing module deals with gray-scale images, opcode n-grams, and import functions, which are employed to extract the features of the malware. The decision-making module uses the features to classify the malware and to identify suspicious malware. Finally, the detection module uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware families. Our approach is evaluated on more than 20 000 malware instances, which were collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify the unknown malware with a best accuracy of 98.9%, and successfully detects 86.7% of the new malware.

14.
A metamorphic virus is a type of malware that modifies its code using a morphing engine. Morphing engines are used to generate a large number of metamorphic malware variants by performing different obfuscation techniques. Since each metamorphic malware has its own unique structure, signature-based anti-virus programs are ineffective at detecting these metamorphic variants. Therefore, detection of this kind of virus is becoming an increasingly important task. Recently, many researchers have focused on extracting common patterns of metamorphic variants that can be used as micro-signatures to identify metamorphic malware executables. With a similar motivation, in this work we propose a novel metamorphic malware identification method, named HLES-MMI (Higher-level Engine Signature based Metamorphic Malware Identification). The proposed method first constructs a unique graph structure, called a co-opcode graph, for each metamorphic family, then extracts engine-specific opcode patterns from the graphs. Finally, it generates a higher-level signature for each family by representing the extracted opcode patterns with a binary vector. Experimental results on four datasets produced by different morphing engines demonstrate the effectiveness and efficiency of the proposed method in comparison with several existing malware identification methods.

15.
We introduce three new families of stochastic algorithms to generate progressive 2D sample point sequences. This opens a general framework that researchers and practitioners may find useful when developing future sample sequences. Our best sequences have the same low sampling error as the best known sequence (a particular randomization of the Sobol’ (0,2) sequence). The sample points are generated using a simple, diagonally alternating strategy that progressively fills in holes in increasingly fine stratifications. The sequences are progressive (hierarchical): any prefix is well distributed, making them suitable for incremental rendering and adaptive sampling. The first sample family is only jittered in 2D; we call it progressive jittered. It is nearly identical to existing sample sequences. The second family is multi‐jittered: the samples are stratified in both 1D and 2D; we call it progressive multi‐jittered. The third family is stratified in all elementary intervals in base 2, hence we call it progressive multi‐jittered (0,2). We compare sampling error and convergence of our sequences with uniform random, best candidates, randomized quasi‐random sequences (Halton and Sobol'), Ahmed's ART sequences, and Perrier's LDBN sequences. We test the sequences on function integration and in two settings that are typical for computer graphics: pixel sampling and area light sampling. Within this new framework we present variations that generate visually pleasing samples with blue noise spectra, and well‐stratified interleaved multi‐class samples; we also suggest possible future variations.

16.
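Malware detection methods based on data flow graphs usually consider only the data flow information in API (application programming interface) calls and ignore the order in which the APIs are called. To solve this problem, the proposed method integrates the temporal information of API calls into the traditional data flow graph, introduces the concept of a malware temporal dual data flow graph, and gives a method for mining the model. Finally, it proposes a method that classifies temporal dual data flow graphs with an optimized graph convolutional network and uses the result for malware detection and classification. Experimental results show that the proposed method achieves better malware recognition accuracy than traditional methods based on data flow graphs.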

17.
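The amount of malicious code is growing explosively, yet truly new malicious code is rare and most samples are variants of existing code. By studying the behavioral characteristics of malicious code, a method for determining the homology of malicious code is proposed. Starting from behavioral features, the method considers sensitive dangerous behaviors together with the code flow and function calls that produce them, extracts concrete features with disassembly tools, computes similarity measures between different pieces of malicious code, and performs homology analysis and comparison. The DBSCAN clustering algorithm is then used to group malicious code with identical or similar features into different malware families. A prototype system was designed and implemented, and the experimental results show that the proposed method can effectively analyze and determine the homology of different malicious code and its variants.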

18.

Graphs are commonly used to express the communication of various data. Faced with uncertain data, we have probabilistic graphs. As a fundamental problem of such graphs, clustering has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate ensemble clusters, we develop a set of probable possible worlds of the initial probabilistic graph. Then, we present a probabilistic co-association matrix as a consensus function to integrate base clustering results. It relies on co-occurrences of node pairs based on the probability of the corresponding common cluster graphs. We also apply two improvements, one before and one after ensemble generation. Before ensemble generation, we append neighborhood information based on node features to the initial graph to achieve a more accurate estimation of the probability between the nodes. After ensemble generation, we use supervised metric learning based on the Mahalanobis distance to automatically learn a metric from the ensemble clusters. It aims to capture crucial features of the base clustering results. We evaluate our work using five real-world datasets and three clustering evaluation metrics, namely the Dunn index, Davies–Bouldin index, and Silhouette coefficient. The results show the impressive performance of clustering large probabilistic graphs.
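The role of a co-association matrix as a consensus function can be sketched with the classic unweighted version below. This is an assumption-laden simplification: the paper's matrix is probabilistic and weights co-occurrences by the probability of the corresponding cluster graphs, whereas this sketch treats every base clustering equally. The function name `co_association` is hypothetical.

```python
from typing import List


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """Fraction of base clusterings in which each node pair shares a cluster.

    `labelings[r][i]` is the cluster label of node i in base clustering r.
    Unweighted variant: every base clustering counts equally.
    """
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    runs = len(labelings)
    return [[value / runs for value in row] for row in matrix]


# Two base clusterings of three nodes; nodes 0 and 1 agree in only one of them.
consensus = co_association([[0, 0, 1], [0, 1, 1]])
```

A final clustering can then be obtained by running any similarity-based clustering algorithm on the resulting matrix.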


19.
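Anti-virus vendors receive thousands of virus samples every day, and grouping this mass of samples into families quickly and effectively is a pressing problem. A scalable clustering method is proposed. Given the vectorized feature set of a massive number of input virus samples, it first uses a locality-sensitive hashing index for fast initial clustering and then uses an extended K-means algorithm for a second, finer clustering. Experiments show that the method greatly improves the time efficiency of virus clustering at a limited cost in accuracy.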

20.
Attributed graphs describe nodes via attribute vectors and also relationships between different nodes via edges. To partition nodes into clusters with tighter correlations, an effective way is to apply clustering techniques to attributed graphs based on various criteria such as node connectivity and/or attribute similarity. Even though clusters typically form around nodes with tight edges and similar attributes, existing methods have focused on only one of these two data modalities. In this paper, we treat each node as an autonomous agent and develop an accurate and scalable multiagent system for extracting overlapping clusters in attributed graphs. First, a kernel function with a tunable bandwidth factor δ is introduced to measure the influence of each agent, and the agents with the highest local influence can be viewed as the “leader” agents. Then, a novel local expansion strategy is proposed, which can be applied by each leader agent to absorb the most relevant followers in the graph. Finally, we design the cluster-aware multiagent system (CAMAS), in which agents communicate with each other freely under an efficient communication mechanism. Using the proposed multiagent system, we are able to uncover the optimal overlapping cluster configuration, i.e. nodes within one cluster are not only closely connected with each other but also have similar attributes. Our method is highly efficient, and the computational time is shown to depend nearly linearly on the number of edges when δ ∈ [0.5, 1). Applications of the proposed method to a variety of synthetic benchmark graphs and real-life attributed graphs are demonstrated to verify its performance.
