首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A large number of today’s botnets leverage the HTTP protocol to communicate with their botmasters or perpetrate malicious activities. In this paper, we present a new scalable system for network-level behavioral clustering of HTTP-based malware that aims to efficiently group newly collected malware samples into malware family clusters. The end goal is to obtain malware clusters that can aid the automatic generation of high quality network signatures, which can in turn be used to detect botnet command-and-control (C&C) and other malware-generated communications at the network perimeter.We achieve scalability in our clustering system by simplifying the multi-step clustering process proposed in [31], and by leveraging incremental clustering algorithms that run efficiently on very large datasets. At the same time, we show that scalability is achieved while retaining a good trade-off between detection rate and false positives for the signatures derived from the obtained malware clusters. We implemented a proof-of-concept version of our new scalable malware clustering system and performed experiments with about 65,000 distinct malware samples. Results from our evaluation confirm the effectiveness of the proposed system and show that, compared to [31], our approach can reduce processing times from several hours to a few minutes, and scales well to large datasets containing tens of thousands of distinct malware samples.  相似文献   

2.
Dynamic behavior-based malware analysis and detection is considered to be one of the most promising ways to combat with the obfuscated and unknown malwares. To perform such analysis, behavioral feature abstraction plays a fundamental role, because how to specify program formally to a large extend determines what kind of algorithm can be used. In existing research, graph-based methods keep a dominant position in specifying malware behaviors. However, they restrict the detection algorithm to be chosen from graph mining algorithm. In this paper, we build a complete virtual environment to capture malware behaviors, especially that to stimulate network behaviors of a malware. Then, we study the problem of abstracting constant behavioral features from API call sequences and propose a minimal security-relevant behavior abstraction way, which absorbs the advantages of prevalent graph-based methods in behavior representation and has the following advantages: first API calls are aggregated by data dependence, therefore it is resistent to redundant data and is a kind of more constant feature. Second, API call arguments are also abstracted particularly, this further contributes to common and constant behavioral features of malware variants. Third, it is a moderate degree aggregation of a small group of API calls with a constructing criterion that centering on an independent operation on a sensitive resource. Fourth, it is very easy to embed the extracted behaviors in a high dimensional vector space, so that it can be processed by almost all of the prevalent statistical learning algorithms. We then evaluate these minimal security-relevant behaviors in three kinds of test, including similarity comparison, clustering and classification. The experimental results show that our method has a capacity in distinguishing malwares from different families and also from benign programs, and it is useful for many statistical learning algorithms.  相似文献   

3.
The advent of microarray technology enables us to monitor an entire genome in a single chip using a systematic approach. Clustering, as a widely used data mining approach, has been used to discover phenotypes from the raw expression data. However traditional clustering algorithms have limitations since they can not identify the substructures of samples and features hidden behind the data. Different from clustering, biclustering is a new methodology for discovering genes that are highly related to a subset of samples. Several biclustering models/methods have been presented and used for tumor clinical diagnosis and pathological research. In this paper, we present a new biclustering model using Binary Matrix Factorization (BMF). BMF is a new variant rooted from non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF. Two different algorithms to implement the model and their comparison are then presented. We show that the microarray data biclustering problem can be formulated as a BMF problem and can be solved effectively using our proposed algorithms. Unlike the greedy strategy-based algorithms, our proposed algorithms for BMF are more likely to find the global optima. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides the attractive clustering performance, BMF can generate sparse results (i.e., the number of genes/features involved in each biclustering structure is very small related to the total number of genes/features) that are in accordance with the common practice in molecular biology.  相似文献   

4.
Malware classification based on call graph clustering   总被引:1,自引:0,他引:1  
Each day, anti-virus companies receive tens of thousands samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, enabling the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate implementation of generic detection schemes.  相似文献   

5.
A metamorphic virus is a type of malware that modifies its code using a morphing engine. Morphing engines are used to generate a large number of metamorphic malware variants by performing different obfuscation techniques. Since each metamorphic malware has its own unique structure, signature based anti-virus programs are ineffective to detect these metamorphic variants. Therefore, detection of these kind of viruses becomes an increasingly important task. Recently, many researchers have focused on extracting common patterns of metamorphic variants that can be used as micro-signatures to identify the metamorphic malware executables. With the similar motivation, in this work, we propose a novel metamorphic malware identification method, named HLES-MMI (Higher-level Engine Signature based Metamorphic Malware Identification). The proposed method firstly constructs a unique graph structure, called as co-opcode graph, for each metamorphic family, then extracts engine-specific opcode patterns from the graphs. Finally, it generates higher-level signature belonging to each family by representing the extracted opcode-patterns with a binary vector. Experimental results on four datasets produced by different morphing engines demonstrate the effectiveness and efficiency of the proposed method by comparing with several existing malware identification methods.  相似文献   

6.
Cybersecurity has become a major concern for society, mainly motivated by the increasing number of cyber attacks and the wide range of targeted objectives. Due to the popularity of smartphones and tablets, Android devices are considered an entry point in many attack vectors. Malware applications are among the most used tactics and tools to perpetrate a cyber attack, so it is critical to study new ways of detecting them. In these detection mechanisms, machine learning has been used to build classifiers that are effective in discerning if an application is malware or benignware. However, training such classifiers require big amounts of labelled data which, in this context, consist of categorised malware and benignware Android applications represented by a set of features able to describe their behaviour. For that purpose, in this paper we present OmniDroid, a large and comprehensive dataset of features extracted from 22,000 real malware and goodware samples, aiming to help anti-malware tools creators and researchers when improving, or developing, new mechanisms and tools for Android malware detection. Furthermore, the characteristics of the dataset make it suitable to be used as a benchmark dataset to test classification and clustering algorithms or new representation techniques, among others. The dataset has been released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and was built using AndroPyTool, our automated framework for dynamic and static analysis of Android applications. Finally, we test a set of ensemble classifiers over this dataset and propose a malware detection approach based on the fusion of static and dynamic features through the combination of ensemble classifiers. The experimental results show the feasibility and potential usability (for the machine learning, soft computing and cyber security communities) of our automated framework and the publicly available dataset.  相似文献   

7.
This article presents Andromaly—a framework for detecting malware on Android mobile devices. The proposed framework realizes a Host-based Malware Detection System that continuously monitors various features and events obtained from the mobile device and then applies Machine Learning anomaly detectors to classify the collected data as normal (benign) or abnormal (malicious). Since no malicious applications are yet available for Android, we developed four malicious applications, and evaluated Andromaly’s ability to detect new malware based on samples of known malware. We evaluated several combinations of anomaly detection algorithms, feature selection method and the number of top features in order to find the combination that yields the best performance in detecting new malware on Android. Empirical results suggest that the proposed framework is effective in detecting malware on mobile devices in general and on Android in particular.  相似文献   

8.
Research and development efforts have recently started to compare malware variants, as it is believed that malware authors are reusing code. A number of these projects have focused on identifying functions through the use of signature-based classifiers. We introduce three new classifiers that characterize a function’s use of global data. Experiments on malware show that we can meaningfully correlate functions on the basis of their global data references even when their functions share little code. We also present an algorithm that combines existing classifiers and our new ones into an ensemble for correlating functions in two binary programs. For testing, we developed a model for comparing our work to previous signature based classifiers. We then used that model to show how our new combined ensemble classifier dominates the previously reported classifiers. The resulting ensemble can be used by malware analysts when they are comparing two binaries. This technique will allow them to correlate both functions and global data references between the two and will lead to a quick identification of any sharing that is occurring.  相似文献   

9.
恶意代码分类是一种基于特征进行恶意代码自动家族类别划分的分析方法。恶意代码的多维度特征融合与深度处理,是恶意代码分类研究的一种发展趋势,也是恶意代码分类研究的一个难点问题。本文提出了一种适用于恶意代码分类的高维特征融合方法,对恶意代码的静态二进制文件和反汇编特征等进行提取,借鉴SimHash的局部敏感性思想,对多维特征进行融合分析和处理,最后基于典型的机器学习方法对融合后的特征向量进行学习训练。实验结果和分析表明,该方法能够适应于样本特征维度高而样本数量较少的恶意代码分类场景,而且能够提升分类学习的时间性能。  相似文献   

10.
谢娟英  丁丽娟  王明钊 《软件学报》2020,31(4):1009-1024
基因表达数据具有高维小样本特点,包含了大量与疾病无关的基因,对该类数据进行分析的首要步骤是特征选择.常见的特征选择方法需要有类标的数据,但样本类标获取往往比较困难.针对基因表达数据的特征选择问题,提出基于谱聚类的无监督特征选择思想FSSC(feature selection by spectral clustering).FSSC对所有特征进行谱聚类,将相似性较高的特征聚成一类,定义特征的区分度与特征独立性,以二者之积度量特征重要性,从各特征簇选取代表性特征,构造特征子集.根据使用的不同谱聚类算法,得到FSSC-SD(FSSC based on standard deviation) FSSCMD(FSSC based on mean distance)和FSSC-ST(FSSC based on self-tuning)这3种无监督特征选择算法.以SVMs(support vector machines)和KNN(K-nearest neighbours)为分类器,在10个基因表达数据集上进行实验测试.结果表明,FSSC-SD、FSSC-MD和FSSC-ST算法均能选择到具有强分类能力的特征子集.  相似文献   

11.
The sheer volume of new malware samples presents some big data challenges for antivirus vendors. Not only does the metadata for tens (or even hundreds) of millions of samples need to be stored, but all this data also needs to be clustered - mined to find groups of related samples. Existing techniques cannot easily scale to the magnitudes of samples already arriving today, yet alone those that we expect to receive in the future. This paper proposes the use of a data structure called an aggregation overlay graph to simplify these problems. By exploiting the similarities shared between most malware variants, we can reduce the total volume of metadata by more than an entire magnitude without any loss of information. Furthermore, by including a wide variety of features from each sample, this process of reduction also creates groups of similar samples, a clustering technique that is capable of handling extremely high volumes. The versatility of this approach is demonstrated by applying it not only to large corpuses of Windows PE metadata, but also for Android APK files.  相似文献   

12.
基于环境敏感分析的恶意代码脱壳方法   总被引:1,自引:0,他引:1  
王志  贾春福  鲁凯 《计算机学报》2012,35(4):693-702
加壳技术是软件的常用保护手段,但也常被恶意代码用于躲避杀毒软件的检测.通用脱壳工具根据加壳恶意代码运行时的行为特征或统计特征进行脱壳,需要建立监控环境,因此易受环境敏感技术的干扰.文中提出了一种基于环境敏感分析的恶意代码脱壳方法,利用动静结合的分析技术检测并清除恶意代码的环境敏感性.首先,利用中间语言对恶意代码的执行轨迹进行形式化表示;然后,分析执行轨迹中环境敏感数据的来源和传播过程,提取脱壳行为的环境约束;最后,求解环境约束条件,根据求解结果对恶意代码进行二进制代码插装,清除其环境敏感性.基于此方法,作者实现了一个通用的恶意代码脱壳工具:MalUnpack,并对321个最新的恶意代码样本进行了对比实验.实验结果表明MalUnpack能有效对抗恶意代码的环境敏感技术,其脱壳率达到了89.1%,显著高于现有基于动态监控的通用脱壳工具的35.5%和基于特征的定向脱壳工具的28.0%.  相似文献   

13.
As the number and impact of online threats increases exponentially, the automatic classification of malware becomes increasingly important in the antivirus business. The heavy use of machine learning in this field raises the following question: ??How much will a trained machine-learning model resist in time against the ever-changing malware binary code??? In this paper we present a study of proactivity in malware detection using Perceptron derived algorithms. We gathered an industrial quantity of both malicious and benign files, then we trained a series of classifiers on nine months?? worth of data and discuss the behavior of the obtained models tested on the next fourteen weeks. We conclude with the result analysis and recommendations for the practical use of this technique in real life scenarios.  相似文献   

14.
Detection of malware using data mining techniques has been explored extensively. Techniques used for detecting malware based on structural features rely on being able to identify anomalies in the structure of executable files. The structural attributes of an executable that can be extracted include byte ngrams, Portable Executable (PE) features, API call sequences and Strings. After a thorough analysis we have extracted various features from executable files and applied it on an ensemble of classifiers to efficiently detect malware. Ensemble methods combine several individual pattern classifiers in order to achieve better classification. The challenge is to choose the minimal number of classifiers that achieve the best performance. An ensemble that contains too many members might incur large storage requirements and even reduce the classification performance. Hence the goal of ensemble pruning is to identify a subset of ensemble members that performs at least as good as the original ensemble and discard any other members.  相似文献   

15.
An abstraction resilient to common malware obfuscation techniques is the call-graph. A call-graph is the representation of an executable file as a directed graph with labeled vertices, where the vertices correspond to functions and the edges to function calls. Unfortunately, most of the interesting graph comparison problems, including full-graph comparison and computing the largest common subgraph, belong to the \(NP\) -hard class. This makes the study and use of graphs in large scale systems difficult. Existing work has focused only on offline clustering and has not addressed the issue of clustering streams of graphs. In this paper we present Classy, a scalable distributed system that clusters streams of large call-graphs for purposes including automated malware classification and facilitating malware analysts. Since algorithms aimed at clustering sets are not suitable for clustering streams of objects, we propose the use of a clustering algorithm that relies on the notion of candidate clusters and reference samples therein. We demonstrate via thorough experimentation that this approach yields results very close to the offline optimal. Graph similarity is determined by computing a graph edit distance (GED) of pairs of graphs using an adapted version of simulated annealing. Furthermore, we present a novel lower bound for the GED. We also study the problem of approximating statistics of clusters of graphs when the distances of only a fraction of all possible pairs have been computed. Finally, we present results and statistics from a real production-side system that has clustered and contains more than 0.8 million graphs.  相似文献   

16.
随着移动终端恶意软件的种类和数量不断增大,本文针对Android系统恶意软件单特征检测不全面、误报率高等技术问题,提出一种基于动静混合特征的移动终端恶意软件检测方法,以提高检测的覆盖率、准确率和效率.该方法首先采用基于改进的CHI方法和凝聚层次聚类算法优化的K-Means方法构建高危权限和敏感API库,然后分别从静态分...  相似文献   

17.
由于变种和多态技术的出现,恶意代码的数量呈爆发式增长。然而涌现的恶意代码只有小部分是新型的,大部分仍是已知病毒的变种。针对这种情况,为了从海量样本中筛选出已知病毒的变种,从而聚焦新型未知病毒,提出一种改进的判定恶意代码所属家族的方法。从恶意代码的行为特征入手,使用反汇编工具提取样本静态特征,通过单类支持向量机筛选出恶意代码的代表性函数,引入聚类算法的思想,生成病毒家族特征库。通过计算恶意代码与特征库之间的相似度,完成恶意代码的家族判定。设计并实现了系统,实验结果表明改进后的方法能够有效地对各类家族的变种进行分析及判定。  相似文献   

18.
目前移动恶意软件数量呈爆炸式增长,变种层出不穷,日益庞大的特征库增加了安全厂商在恶意软件样本处理方面的难度,传统的检测方式已经不能及时有效地处理软件行为样本大数据。基于机器学习的移动恶意软件检测方法存在特征数量多、检测准确率低和不平衡数据的问题。针对现存的问题,文章提出了基于均值和方差的特征选择方法,以减少对分类无效的特征;实现了基于不同特征提取技术的集合分类方法,包括主成分分析、Kaehunen-Loeve 变换和独立成分分析,以提高检测的准确性。针对软件样本的不平衡数据,文章提出了基于决策树的多级分类集成模型。实验结果表明,文章提出的三种检测方法都可以有效地检测 Android平台中的恶意软件样本,准确率分别提高了6.41%、3.96%和3.36%。  相似文献   

19.
One of the major problems concerning information assurance is malicious code. To evade detection, malware has also been encrypted or obfuscated to produce variants that continue to plague properly defended and patched networks with zero day exploits. With malware and malware authors using obfuscation techniques to generate automated polymorphic and metamorphic versions, anti-virus software must always keep up with their samples and create a signature that can recognize the new variants. Creating a signature for each variant in a timely fashion is a problem that anti-virus companies face all the time. In this paper we present detection algorithms that can help the anti-virus community to ensure a variant of a known malware can still be detected without the need of creating a signature; a similarity analysis (based on specific quantitative measures) is performed to produce a matrix of similarity scores that can be utilized to determine the likelihood that a piece of code under inspection contains a particular malware. Two general malware detection methods presented in this paper are: Static Analyzer for Vicious Executables (SAVE) and Malware Examiner using Disassembled Code (MEDiC). MEDiC uses assembly calls for analysis and SAVE uses API calls (Static API call sequence and Static API call set) for analysis. We show where Assembly can be superior to API calls in that it allows a more detailed comparison of executables. API calls, on the other hand, can be superior to Assembly for its speed and its smaller signature. Our two proposed techniques are implemented in SAVE) and MEDiC. We present experimental results that indicate that both of our proposed techniques can provide a better detection performance against obfuscated malware. We also found a few false positives, such as those programs that use network functions (e.g. PuTTY) and encrypted programs (no API calls or assembly functions are found in the source code) when the thresholds are set 50% similarity measure. However, these false positives can be minimized, for example by changing the threshold value to 70% that determines whether a program falls in the malicious category or not.  相似文献   

20.
Next-generation malware will adopt self-mutation to circumvent current malware detection techniques. The authors propose a strategy based on code normalization that reduces different instances of the same malware into a common form that can enable accurate detection.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号