Malware Detection Using Nonparametric Bayesian Clustering and Classification Techniques
Authors: Yimin Kao, Brian Reich, Curtis Storlie, Blake Anderson
Affiliation: 1. North Carolina State University, Raleigh, NC (kiddnumber5ykao@gmail.com); 2. North Carolina State University, 2311 Stinson Drive, 4264 SAS Hall, Box 8203, Raleigh, NC 27695 (brian_reich@ncsu.edu); 3. Los Alamos National Laboratory, Los Alamos, NM (storlie@lanl.gov); 4. Los Alamos National Laboratory, Los Alamos, NM (banderson@lanl.gov)
Abstract: Computer security requires statistical methods to quickly and accurately flag malicious programs. This article proposes a nonparametric Bayesian approach for classifying programs as benign or malicious and simultaneously clustering malicious programs. The analysis is based on the dynamic trace (DT) of instructions under the first-order Markov assumption. Each row of the trace’s transition matrix is modeled using the Dirichlet process mixture (DPM) model. The DPM model clusters programs within each class (malicious or benign) and produces the posterior probability that a program is malware, which is used for classification. The novelty of the model lies in using this clustering algorithm to improve classification accuracy. The simulation study shows that the DPM model outperforms elastic net logistic (ENL) regression and the support vector machine (SVM) in classification performance in most scenarios, and also outperforms the spectral clustering method for grouping similar malware. In an analysis of real malicious and benign programs, the DPM model gives significantly better classification performance than the ENL model and results competitive with the SVM. More importantly, the DPM model identifies clusters of programs during the classification procedure, which is useful for reverse engineering.
Keywords: Classification, Clustering, Dirichlet process mixture, Dynamic trace
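
The abstract's starting point, a transition matrix for a dynamic trace under the first-order Markov assumption, can be illustrated with a minimal sketch. This is not the authors' DPM model: it only counts consecutive instruction pairs and row-normalizes them. The function name `transition_matrix`, the toy trace, the instruction list, and the symmetric pseudo-count `alpha` are all assumptions made for illustration.

```python
import numpy as np

def transition_matrix(trace, instructions, alpha=1.0):
    """Estimate a first-order Markov transition matrix from one trace.

    trace        : sequence of instruction names in execution order
    instructions : list of all instruction names (defines row/column order)
    alpha        : hypothetical symmetric pseudo-count added to every cell
                   (an illustrative smoothing choice, not taken from the paper)
    """
    index = {name: i for i, name in enumerate(instructions)}
    k = len(instructions)
    counts = np.zeros((k, k))
    # Count each transition from one instruction to the next.
    for prev, curr in zip(trace, trace[1:]):
        counts[index[prev], index[curr]] += 1
    # Row-normalize the smoothed counts so each row is a probability vector.
    smoothed = counts + alpha
    probs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return counts, probs

# Toy trace, purely illustrative.
instructions = ["mov", "push", "call", "pop"]
trace = ["mov", "push", "call", "mov", "push", "pop", "mov"]
counts, probs = transition_matrix(trace, instructions)
print(counts)
print(probs)
```

Each row of `probs` is the estimated distribution over the next instruction given the current one. In the approach the abstract describes, rows of this kind are what the Dirichlet process mixture models and clusters.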