Big Data Framework for Zero-Day Malware Detection |
| |
Authors: | Deepak Gupta Rinkle Rani |
| |
Affiliation: | Department of Computer Science and Engineering, Thapar University, Patiala, Punjab, India |
| |
Abstract: | Malware has already been recognized as one of the most dominant cyber threats on the Internet today. It is growing exponentially in terms of volume, variety, and velocity, and thus overwhelms the traditional approaches used for malware detection and classification. Moreover, with the advent of Internet of Things, there is a huge growth in the volume of digital devices and in such scenario, malicious binaries are bound to grow even faster making it a big data problem. To analyze and detect unknown malware on a large scale, security analysts need to make use of machine learning algorithms along with big data technologies. These technologies help them to deal with current threat landscape consisting of complex and large flux of malicious binaries. This paper proposes the design of a scalable architecture built on the top of Apache Spark which uses its scalable machine learning library (MLlib) for detecting zero-day malware. The proposed platform is tested and evaluated on a dataset comprising of 0.2 million files consisting of 0.05 million clean files and 0.15 million malicious binaries covering a large number of malware families over a period of 7 years starting from 2010. |
| |
Keywords: | Apache Spark big data machine learning malware detection MLlib |
|
|