Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method |
| |
Authors: | Bijan Raahemi, Weicai Zhong, Jing Liu |
| |
Affiliation: | (1) Telfer School of Management, University of Ottawa, 55 Laurier Ave., Ottawa, ON, K1N 6N5, Canada; (2) Institute of Intelligent Information Processing, Xidian University, No. 2 South Taibai Road, Xi’an, Shaanxi, 710071, P.R. China |
| |
Abstract: | Unlabeled training examples are readily available in many applications, but labeled examples are fairly expensive to obtain.
For instance, in our previous work on the classification of peer-to-peer (P2P) Internet traffic, we observed that only about
25% of examples can be labeled as “P2P” or “NonP2P” using a port-based heuristic rule. We also expect that even fewer examples
can be labeled in the future as more and more P2P applications use dynamic ports. This motivates us to investigate
techniques that enhance the accuracy of P2P traffic classification by exploiting the unlabeled examples. In addition,
Internet data flows dynamically and in large volumes (streaming data). In P2P applications, new communities of peers often join
and old communities of peers often leave, so classifiers must be capable of updating the model incrementally and
dealing with concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm.
We tested our approach on a real data stream with 7.2 million labeled examples and 20.4 million unlabeled examples. The results
show that the iTT algorithm can enhance the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition,
it can effectively deal with the dynamic nature of streaming data to detect changes in communities of peers. We extracted
attributes only from the IP layer, eliminating the privacy concerns associated with techniques that use deep packet inspection.
Bijan Raahemi is an assistant professor at the Telfer School of Management, University of Ottawa, Canada, with a cross-appointment to the
School of Information Technology and Engineering. He received his Ph.D. in Electrical and Computer Engineering from the University
of Waterloo, Canada, in 1997. Prior to joining the University of Ottawa, Dr. Raahemi held several research positions in the telecommunications
industry, including at Nortel Networks and Alcatel-Lucent, focusing on Computer Networks Architectures and Services, Dynamics
of Internet Traffic, Systems Modeling, and Performance Analysis of Data Networks. His current research interests include Knowledge
Discovery and Data Mining, Information Systems, and Data Communications Networks. Dr. Raahemi’s work has appeared in several
peer-reviewed journals and conference proceedings. He also holds 10 patents in Data Communications. He is a Senior Member
of the Institute of Electrical and Electronics Engineers (IEEE), and a member of the Association for Computing Machinery
(ACM).
Weicai Zhong is a post-doctoral fellow at the Telfer School of Management, University of Ottawa, Canada. He received a B.S. degree in computer
science and technology from Xidian University, Xi’an, China, in 2000 and a Ph.D. in pattern recognition and intelligent systems
from Xidian University in 2004. Prior to joining the University of Ottawa, Dr. Zhong was a senior statistician at SPSS Inc.
from Jan. 2005 to Dec. 2007. His current research interests include Internet Traffic Identification, Data Mining, and Evolutionary
Computation. He is a member of the Institute of Electrical and Electronics Engineers (IEEE).
Jing Liu is an Associate Professor at Xidian University, China. She received a B.S. degree in computer science and technology from
Xidian University, Xi’an, China, in 2000, and a Ph.D. in circuits and systems from Xidian University in 2004. Her research
interests include Data Mining, Evolutionary Computation, and Multiagent Systems. She is a member of the Institute of Electrical
and Electronics Engineers (IEEE).
|
| |
Keywords: | Stream data mining; Concept drift; Windowing technique; Tri-training; Unlabeled data; Peer-to-peer traffic; IP traffic identification |
This article is indexed in SpringerLink and other databases. |
|
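The abstract builds on tri-training, in which three classifiers label unlabeled examples for one another. As a rough illustration only, the Python sketch below shows one round of the generic tri-training agreement step: each classifier is retrained on the labeled set plus the unlabeled examples on which the other two classifiers agree. It is not the paper's iTT algorithm: the incremental updates, windowing, and concept-drift handling are not shown, and the error-rate checks of standard tri-training are omitted for brevity. The `NearestCentroid` stand-in classifier, the two-dimensional feature vectors, the sample data, and all function names are assumptions made for this example.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for d, value in enumerate(features):
                acc[d] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def bootstrap(X, y, rng):
    """Sample len(X) examples with replacement so the three models differ."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def tri_training_round(X_lab, y_lab, X_unlab, seed=0):
    """One simplified round: pseudo-label where the two *other* models agree."""
    rng = random.Random(seed)
    models = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng)) for _ in range(3)]
    retrained = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        extra_X, extra_y = [], []
        for x in X_unlab:
            label = models[j].predict(x)
            if label == models[k].predict(x):  # agreement => pseudo-label for model i
                extra_X.append(x)
                extra_y.append(label)
        # Retrain model i on all labeled data plus its pseudo-labeled examples.
        retrained.append(NearestCentroid().fit(X_lab + extra_X, y_lab + extra_y))
    return retrained


def majority_vote(models, x):
    """Final prediction: the label chosen by most of the three models."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


# Hypothetical flow features (made-up numbers, not the paper's attributes).
X_lab = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
y_lab = ["NonP2P"] * 4 + ["P2P"] * 4
X_unlab = [[0.5, 0.5], [10.5, 10.5], [9, 9], [2, 2]]

models = tri_training_round(X_lab, y_lab, X_unlab)
prediction = majority_vote(models, [10, 9])
```

In a streaming setting such a round would presumably be repeated as new windows of data arrive, but how iTT does this is described in the paper itself, not here.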