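Automated Detection and Classification of Third-Party Libraries in Large-Scale Mobile Apps
Cite this article: WANG Hao-Yu, GUO Yao, MA Zi-Ang, CHEN Xiang-Qun. Automated detection and classification of third-party libraries in large-scale mobile apps[J]. Journal of Software (软件学报), 2017, 28(6): 1373-1388.
Authors: WANG Hao-Yu  GUO Yao  MA Zi-Ang  CHEN Xiang-Qun
Affiliations: Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876; Key Laboratory of High-Confidence Software Technologies (Ministry of Education), Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; Key Laboratory of High-Confidence Software Technologies (Ministry of Education), Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; Key Laboratory of High-Confidence Software Technologies (Ministry of Education), Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
Funding: Science Fund for Creative Research Groups of the National Natural Science Foundation of China (61421061, 61421091); National High Technology Research and Development Program of China (863 Program) (2015AA017202)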
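Abstract: Third-party libraries are widely used in mobile apps to ease development and extend functionality. Much research on mobile app analysis and access control needs to detect or filter out third-party libraries, or classify them by function, before the analysis proper. Most existing work relies on whitelists to detect third-party libraries or to classify their functionality. However, whitelist-based detection is incomplete and inaccurate, because (1) third-party libraries are numerous and varied, and (2) common techniques such as code obfuscation and library masquerading prevent a whitelist from identifying libraries accurately. This paper proposes an automated approach to detecting and classifying third-party libraries: multi-level clustering identifies the libraries, and machine learning classifies them by function. Experiments on more than 130,000 Android apps validate the approach. In total, 4,916 distinct third-party libraries were detected. On a manually labeled dataset, classification accuracy under 10-fold cross-validation reaches 84.28%. When the trained classifier is applied to all 4,916 detected libraries, manual verification of a sample shows an accuracy of 75%.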

Keywords: Android  third-party library  advertisement library  mobile apps  machine learning
Received: 2016-05-08
Revised: 2016-07-15

Automated Detection and Classification of Third-Party Libraries in Large Scale Android Apps
WANG Hao-Yu, GUO Yao, MA Zi-Ang and CHEN Xiang-Qun. Automated Detection and Classification of Third-Party Libraries in Large Scale Android Apps[J]. Journal of Software, 2017, 28(6): 1373-1388.
Authors: WANG Hao-Yu  GUO Yao  MA Zi-Ang and CHEN Xiang-Qun
Affiliation: Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; Key Laboratory of High-Confidence Software Technologies (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Key Laboratory of High-Confidence Software Technologies (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; and Key Laboratory of High-Confidence Software Technologies (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
Abstract: Third-party libraries are widely used in mobile applications such as Android apps. Much research on app analysis or access control needs to detect or classify third-party libraries first in order to provide accurate results. Most previous studies use a whitelist to identify third-party libraries and categorize them manually. However, it is impossible to build a complete whitelist of third-party libraries and classify them, because (1) there are too many of them, and (2) widely used techniques such as library obfuscation and library masquerading cannot be handled with a whitelist. In this paper, we propose an automated approach to detect and classify frequently used third-party libraries in Android apps. We propose a multi-level clustering based method to identify third-party libraries, and a machine learning based technique to classify them. Experiments on more than 130,000 apps show that we could detect 4,916 third-party libraries without prior knowledge. The classification accuracy of 10-fold cross-validation on sampled libraries is 84.28%. With the trained classifier, we are able to classify more than 75% of the 4,916 libraries into six categories with an accuracy of 75%.
Keywords: Android  third-party library  advertisement library  mobile apps  machine learning
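
Note on the evaluation protocol: the abstract reports accuracy under 10-fold cross-validation. The sketch below shows what that protocol computes in general; it is not the authors' classifier or feature set. The token-voting classifier, the API-name tokens, and the two category labels are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, k=10, seed=0):
    """Hold out each fold once, train on the rest, return mean accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        predict = train_fn([samples[i] for i in train],
                           [labels[i] for i in train])
        correct = sum(predict(samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

def train_token_voter(samples, labels):
    """Hypothetical classifier: each token votes for the labels it co-occurred with."""
    votes = {}
    for tokens, label in zip(samples, labels):
        for t in tokens:
            votes.setdefault(t, Counter())[label] += 1
    default = Counter(labels).most_common(1)[0][0]

    def predict(tokens):
        total = Counter()
        for t in tokens:
            total.update(votes.get(t, {}))
        return total.most_common(1)[0][0] if total else default

    return predict

# Toy data (invented): each "library" is a list of API-name tokens.
samples = [["loadAd", "showBanner"]] * 10 + [["trackEvent", "logSession"]] * 10
labels = ["advertisement"] * 10 + ["analytics"] * 10

folds = k_fold_indices(len(samples), 10)
accuracy = cross_validate(samples, labels, train_token_voter, k=10)
print(f"mean 10-fold accuracy: {accuracy:.2%}")
```

Every sample is held out exactly once, so the reported figure is an average over ten train/test splits rather than the result of a single split. That is why a cross-validated score on labeled data (84.28% in the abstract) can differ from accuracy measured later on unlabeled libraries by manual sampling (75%).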