Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method
Authors: Bijan Raahemi, Weicai Zhong, Jing Liu
Affiliation: (1) Telfer School of Management, University of Ottawa, 55 Laurier Ave., Ottawa, ON, K1N 6N5, Canada; (2) Institute of Intelligent Information Processing, Xidian University, No.2 South Taibai Road, Xi’an, Shaanxi, 710071, P.R. China
Abstract: Unlabeled training examples are readily available in many applications, but labeled examples are fairly expensive to obtain. For instance, in our previous work on classifying peer-to-peer (P2P) Internet traffic, we observed that only about 25% of examples could be labeled as “P2P” or “NonP2P” using a port-based heuristic rule. We also expect that even fewer examples will be labelable in the future, as more and more P2P applications use dynamic ports. This motivates us to investigate techniques that improve the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, Internet data arrive continuously and in large volumes (streaming data). In P2P applications, new communities of peers often join and old ones leave, so classifiers must be able to update their models incrementally and to handle concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm. We tested our approach on a real data stream with 7.2 million labeled examples and 20.4 million unlabeled examples. The results show that the iTT algorithm improves the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, it effectively handles the dynamic nature of streaming data and detects changes in communities of peers. We extracted attributes only from the IP layer, eliminating the privacy concerns associated with techniques that use deep packet inspection.
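The abstract does not spell out the iTT algorithm itself. As a rough illustration only, the following sketch shows the general tri-training idea (Zhou and Li) applied to fixed-size windows of a flow stream: three classifiers are trained on bootstrap samples of the labeled flows, and each is retrained with the unlabeled flows on which the other two agree. Everything here is an assumption, not the authors' method: the nearest-centroid base learner, the fixed number of rounds (the original tri-training error-rate checks are omitted), the "fresh ensemble per window" drift handling, the toy two-feature tuples, and all names (`NearestCentroid`, `tri_train`, `windowed_tri_training`, `predict`) are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in base learner (nearest class centroid).
    The abstract does not say which base classifier iTT uses."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        # Label whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def bootstrap(X, y, rng):
    # Draw len(X) labeled examples with replacement.
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    """Simplified tri-training: fixed rounds, no error-rate checks."""
    rng = random.Random(seed)
    # Three classifiers, each trained on its own bootstrap sample.
    models = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng)) for _ in range(3)]
    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Flows on which the other two classifiers agree get that label
            # and are added to classifier i's training set.
            extra_X, extra_y = [], []
            for x in X_unlab:
                label = models[j].predict_one(x)
                if label == models[k].predict_one(x):
                    extra_X.append(x)
                    extra_y.append(label)
            new_models.append(NearestCentroid().fit(X_lab + extra_X, y_lab + extra_y))
        models = new_models
    return models


def predict(models, x):
    # Final decision: majority vote of the three classifiers.
    return Counter(m.predict_one(x) for m in models).most_common(1)[0][0]


def windowed_tri_training(stream, window_size):
    """Hypothetical windowing: each full window of (features, label) pairs,
    with label None for unlabeled flows, trains a fresh ensemble, so older
    windows stop influencing the model (one simple response to drift)."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            X_lab = [x for x, y in window if y is not None]
            y_lab = [y for x, y in window if y is not None]
            X_unlab = [x for x, y in window if y is None]
            if X_lab:  # skip windows that contain no labeled flows
                yield tri_train(X_lab, y_lab, X_unlab)
            window = []


# Toy usage: two well-separated clusters, 8 labeled and 12 unlabeled flows each.
p2p = [(10 + 0.1 * i, 10 - 0.1 * i) for i in range(20)]
non = [(0.1 * i, 0.2 * i) for i in range(20)]
stream = ([(x, "P2P") for x in p2p[:8]] + [(x, "NonP2P") for x in non[:8]]
          + [(x, None) for x in p2p[8:] + non[8:]])
ensembles = list(windowed_tri_training(stream, window_size=40))
print(predict(ensembles[0], (9.5, 9.8)))  # expected: P2P
```

Discarding each finished window is the bluntest way to forget outdated peer communities; a real incremental scheme would more likely update or replace classifiers selectively as drift is detected.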
Contact Information: Jing Liu, Email:

Bijan Raahemi is an assistant professor at the Telfer School of Management, University of Ottawa, Canada, with a cross-appointment to the School of Information Technology and Engineering. He received his Ph.D. in Electrical and Computer Engineering from the University of Waterloo, Canada, in 1997. Before joining the University of Ottawa, Dr. Raahemi held several research positions in the telecommunications industry, including at Nortel Networks and Alcatel-Lucent, focusing on Computer Network Architectures and Services, Dynamics of Internet Traffic, Systems Modeling, and Performance Analysis of Data Networks. His current research interests include Knowledge Discovery and Data Mining, Information Systems, and Data Communications Networks. Dr. Raahemi’s work has appeared in several peer-reviewed journals and conference proceedings. He also holds 10 patents in Data Communications. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and a member of the Association for Computing Machinery (ACM).

Weicai Zhong is a post-doctoral fellow at the Telfer School of Management, University of Ottawa, Canada. He received a B.S. degree in computer science and technology from Xidian University, Xi’an, China, in 2000 and a Ph.D. in pattern recognition and intelligent systems from Xidian University in 2004. Before joining the University of Ottawa, Dr. Zhong was a senior statistician at SPSS Inc. from Jan. 2005 to Dec. 2007. His current research interests include Internet Traffic Identification, Data Mining, and Evolutionary Computation. He is a member of the Institute of Electrical and Electronics Engineers (IEEE).

Jing Liu is an Associate Professor with Xidian University, China. She received a B.S. degree in computer science and technology from Xidian University, Xi’an, China, in 2000, and a Ph.D. in circuits and systems from Xidian University in 2004. Her research interests include Data Mining, Evolutionary Computation, and Multiagent Systems. She is a member of the Institute of Electrical and Electronics Engineers (IEEE).
Keywords: Stream data mining; Concept drift; Windowing technique; Tri-training; Unlabeled data; Peer-to-peer traffic; IP traffic identification
This article is indexed in SpringerLink and other databases.