Similar Documents
1.
Offline/realtime traffic classification using semi-supervised learning   (total citations: 4; self-citations: 0; citations by others: 4)
Jeffrey, Anirban, Martin, Ira, Carey. Performance Evaluation, 2007, 64(9-12): 1194-1213
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows. We consider pragmatic classification issues such as longevity of classifiers and the need for retraining of classifiers. Our performance evaluation using empirical Internet traffic traces that span a 6-month period shows that: (1) high flow and byte classification accuracy (i.e., greater than 90%) can be achieved using training data that consists of a small number of labeled and a large number of unlabeled flows; (2) presence of “mice” and “elephant” flows in the Internet complicates the design of classifiers, especially of those with high byte accuracy, and necessitates the use of weighted sampling techniques to obtain training flows; and (3) retraining of classifiers is necessary only when there are non-transient changes in the network usage characteristics. As a proof of concept, we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach.
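The cluster-then-label idea in this abstract can be illustrated with a minimal sketch (not the authors' code): flows are clustered on statistical features, each cluster is mapped to the application that dominates among the few labeled flows it contains, and clusters with no labeled members remain "unknown". The flow features and application names below are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy flow features: [mean packet size, flow duration (s), bytes transferred]
# (hypothetical values; a real system would extract these from packet traces)
rng = np.random.default_rng(0)
flows = np.vstack([
    rng.normal([1400, 300, 5e6], [100, 50, 1e6], size=(200, 3)),   # bulk-transfer-like
    rng.normal([120, 2, 3e3], [30, 1, 1e3], size=(200, 3)),        # web-like
])
labels = np.full(len(flows), None)          # mostly unlabeled flows
labels[:5] = "bulk"                          # a few labeled examples
labels[200:205] = "web"

# Step 1: cluster all flows (labeled and unlabeled) on their statistics.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(flows)

# Step 2: map each cluster to the majority application among its labeled flows.
cluster_app = {}
for c in range(km.n_clusters):
    labs = [l for l, m in zip(labels, km.labels_ == c) if m and l is not None]
    cluster_app[c] = max(set(labs), key=labs.count) if labs else "unknown"

# Step 3: classify a new flow by its nearest cluster.
new_flow = np.array([[1350, 280, 4.5e6]])
print(cluster_app[int(km.predict(new_flow)[0])])
```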

2.
Streaming multimedia with UDP has become increasingly popular over distributed systems like the Internet. Scientific applications that stream multimedia include remote computational steering of visualization data and video-on-demand teleconferencing over the Access Grid. However, UDP does not possess a self-regulating congestion-control mechanism, and most best-effort traffic is served by congestion-controlled TCP. Consequently, UDP steals bandwidth from TCP such that TCP flows starve for network resources. With the volume of Internet traffic continuing to increase, the perpetuation of UDP-based streaming will cause the Internet to collapse as it did in the mid-1980s due to the use of non-congestion-controlled TCP. To address this problem, we introduce the counter-intuitive notion of inter-packet spacing with control feedback to enable UDP-based applications to perform well in the next-generation Internet and computational grids. When compared with traditional UDP-based streaming, we illustrate that our approach can reduce packet loss by over 50% without adversely affecting delivered throughput.
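A minimal sketch of the inter-packet spacing idea, under assumed parameters (receiver address, packet size, feedback format): the sender inserts a delay between datagrams and widens or narrows that gap according to the loss rate the receiver reports. It only shows the shape of the feedback loop, not the paper's actual protocol.

```python
import socket, struct, time

DEST = ("127.0.0.1", 9999)        # hypothetical receiver address
PKT_SIZE = 1024
gap = 0.001                        # initial inter-packet spacing in seconds

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setblocking(False)
payload = b"x" * PKT_SIZE

for seq in range(1000):
    sock.sendto(struct.pack("!I", seq) + payload, DEST)
    time.sleep(gap)                # spacing enforced between datagrams

    # Non-blocking check for receiver feedback: a single float loss rate.
    try:
        report, _ = sock.recvfrom(64)
        loss = struct.unpack("!f", report[:4])[0]
        if loss > 0.01:            # too much loss: widen the gap (slow down)
            gap = min(gap * 1.5, 0.1)
        else:                      # little loss: cautiously narrow the gap
            gap = max(gap * 0.95, 1e-4)
    except OSError:
        pass                       # no feedback (or destination unreachable) this round
```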

3.
A comparative analysis of TCP and UDP network traffic   (total citations: 12; self-citations: 1; citations by others: 11)
As network bandwidth keeps growing, audio/video, online gaming, and similar applications have come to dominate network traffic. For real-time performance, these emerging application protocols mostly choose UDP as their underlying transport protocol, so UDP traffic is trending upward, yet previous traffic measurement work has generally focused on TCP and neglected UDP. We performed 12 hours of continuous online measurement on a domestic backbone network and, at both the transport and application layers, analyzed in detail the total number of flows, the flow length distribution, the duration distribution, and the flow rate distribution for TCP and UDP and their application-layer protocols, and classified TCP and UDP application-layer flows in detail by size, length, and speed. The results provide data support for flow classification techniques, network behavior discovery, and network design.
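The transport-layer breakdown described above can be sketched roughly as follows: packets are grouped into flows by 5-tuple and simple per-flow statistics (packet count, bytes, duration) are aggregated separately for TCP and UDP. The packet records are invented for illustration.

```python
from collections import defaultdict

# Each record: (timestamp, proto, src_ip, src_port, dst_ip, dst_port, size_bytes)
packets = [
    (0.00, "TCP", "10.0.0.1", 40000, "1.2.3.4", 80, 1500),
    (0.05, "TCP", "10.0.0.1", 40000, "1.2.3.4", 80, 1500),
    (0.01, "UDP", "10.0.0.2", 5004, "5.6.7.8", 5004, 200),
    (0.30, "UDP", "10.0.0.2", 5004, "5.6.7.8", 5004, 200),
]

flows = defaultdict(lambda: {"pkts": 0, "bytes": 0, "first": None, "last": None})
for ts, proto, sip, sp, dip, dp, size in packets:
    f = flows[(proto, sip, sp, dip, dp)]          # flow key = 5-tuple
    f["pkts"] += 1
    f["bytes"] += size
    f["first"] = ts if f["first"] is None else min(f["first"], ts)
    f["last"] = ts if f["last"] is None else max(f["last"], ts)

for proto in ("TCP", "UDP"):
    sel = {k: v for k, v in flows.items() if k[0] == proto}
    total_bytes = sum(v["bytes"] for v in sel.values())
    durations = [v["last"] - v["first"] for v in sel.values()]
    print(proto, "flows:", len(sel), "bytes:", total_bytes, "durations:", durations)
```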

4.
As the number of satellite-borne synthetic aperture radar (SAR) systems increases, both the availability and the length of multi-temporal (MT) sequences of SAR images have also increased. Previous research on MT SAR sequences suggests that they increase the classification accuracy for all applications over single date images. Yet the presence of speckle noise remains a problem and all images in the sequence must be speckle filtered before acceptable classification accuracy can be attained. Several speckle filters designed specifically for MT sequences have been reported in the literature. Filtering in the spatial domain, as is usually done, reduces the effective spatial resolution of the filtered image. MT speckle filters operate in both the spatial and temporal dimensions, thus the reduction in resolution is not likely to be as severe (although a comparison between MT and spatial filters has not been reported). While this advantage may be useful when extracting spatial features from the image sequence, it is not quite as apparent for classification applications. This research explores the relative performance of spatial and MT speckle filtering for a particular classification application: mapping boreal forest types. We report filter performance using the radiometric resolution as measured by the equivalent number of looks (NL), and classification performance as measured by the classification accuracy. We chose representative spatial and MT filters and found that spatial speckle filters offer the advantage of higher radiometric resolution and higher classification accuracy with lower algorithm complexity. Thus, we confirm that MT filtering offers no advantage for classification applications; spatial speckle filters yield higher overall performance.
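The equivalent number of looks used as the radiometric-resolution measure is commonly estimated over a homogeneous region as the squared mean divided by the variance of intensity; the sketch below (illustrative only, with simulated speckle rather than SAR data) compares a simple spatial boxcar filter with a plain temporal average over a multi-temporal stack.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(1)
# Simulated single-look intensity speckle over a homogeneous area:
# exponential distribution with unit mean, for 8 acquisition dates.
stack = rng.exponential(1.0, size=(8, 128, 128))

def enl(img):
    """Equivalent number of looks = mean^2 / variance over a homogeneous region."""
    return img.mean() ** 2 / img.var()

spatial = uniform_filter(stack[0], size=5)   # 5x5 boxcar on a single date
temporal = stack.mean(axis=0)                # pixel-wise average over dates

print("ENL single-look :", round(enl(stack[0]), 2))   # about 1 for single-look speckle
print("ENL 5x5 spatial :", round(enl(spatial), 2))
print("ENL temporal avg:", round(enl(temporal), 2))
```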

5.
Classifying traffic into specific network applications is essential for application-aware network management, and it becomes more challenging as modern applications complicate their network behaviors. While port number-based classifiers work only for some well-known applications and signature-based classifiers are not applicable to encrypted packet payloads, researchers tend to classify network traffic based on behaviors observed in network applications. In this paper, a session level flow classification (SLFC) approach is proposed to classify network flows as a session, which comprises the flows in the same conversation. SLFC first classifies flows into the corresponding applications by packet size distribution (PSD) and then groups flows into sessions by port locality. With PSD, each flow is transformed into a set of points in a two-dimensional space and the distances between each flow and the representatives of pre-selected applications are computed. The flow is recognized as the application having the minimum distance. Meanwhile, port locality is used to group flows into sessions because an application often uses consecutive port numbers within a session. If flows of a session are classified into different applications, an arbitration algorithm is invoked to make the correction. The evaluation shows that SLFC achieves high accuracy rates on both flow and session classifications, namely 99.9% and 99.98%, respectively. When SLFC is applied to online classification, it is able to make decisions quickly by checking at most 300 packets for long-lasting flows. Based on our test data, an average of 72% of packets in long-lasting flows can be skipped without reducing the classification accuracy rates.
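The packet-size-distribution matching step can be sketched as a nearest-representative classifier: each flow is summarized as a normalized histogram of its packet sizes and assigned to the application whose representative histogram is closest. The application names, histograms, and distance choice below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

BINS = np.arange(0, 1600, 100)     # packet-size histogram bins (bytes)

def psd(packet_sizes):
    """Normalized packet size distribution of one flow."""
    hist, _ = np.histogram(packet_sizes, bins=BINS)
    return hist / max(hist.sum(), 1)

# Hypothetical pre-selected application representatives (mean PSD of training flows).
representatives = {
    "voip":  psd([160] * 50 + [200] * 10),
    "video": psd([1400] * 40 + [1200] * 20),
    "web":   psd([60] * 10 + [1500] * 30 + [600] * 5),
}

def classify(packet_sizes):
    p = psd(packet_sizes)
    # Assign the flow to the application with the minimum distance to its representative.
    return min(representatives, key=lambda app: np.linalg.norm(p - representatives[app]))

print(classify([150, 170, 160, 200, 180]))      # -> voip
print(classify([1400, 1200, 1300, 1450]))       # -> video
```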

6.
This paper provides algorithms for adding and subtracting eigenspaces, thus allowing for incremental updating and downdating of data models. Importantly, and unlike previous work, we keep an accurate track of the mean of the data, which allows our methods to be used in classification applications. The result of adding eigenspaces, each made from a set of data, is an approximation to that which would obtain were the sets of data taken together. Subtracting eigenspaces yields a result approximating that which would obtain were a subset of data used. Using our algorithms, it is possible to perform ‘arithmetic’ on eigenspaces without reference to the original data. Eigenspaces can be constructed using either eigenvalue decomposition (EVD) or singular value decomposition (SVD). We provide addition operators for both methods, but subtraction for EVD only, arguing there is no closed-form solution for SVD. The methods and discussion surrounding SVD provide the principal novelty in this paper. We illustrate the use of our algorithms in three generic applications, including the dynamic construction of Gaussian mixture models.
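The mean-tracking "addition" of eigenspaces can be sketched exactly for the full-rank case: combine the means, reassemble the two covariances, and add a correction term accounting for the difference between the two means before re-decomposing. The paper's contribution also covers truncated (reduced-rank) eigenspaces; this short version merges full covariances so the result can be checked against a direct computation.

```python
import numpy as np

def eigmodel(X):
    """Eigenspace model of a data set: sample count, mean, eigenvalues, eigenvectors."""
    n, mean = len(X), X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)        # population covariance
    w, V = np.linalg.eigh(cov)
    return n, mean, w, V

def add_eigenspaces(m1, m2):
    """Combine two eigenspace models while keeping track of the mean."""
    n1, mu1, w1, V1 = m1
    n2, mu2, w2, V2 = m2
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d = (mu1 - mu2)[:, None]
    # Reassemble the covariances and add the between-means correction term.
    cov = (n1 * (V1 * w1) @ V1.T + n2 * (V2 * w2) @ V2.T) / n \
          + (n1 * n2 / n**2) * (d @ d.T)
    w, V = np.linalg.eigh(cov)
    return n, mu, w, V

rng = np.random.default_rng(2)
A, B = rng.normal(0, 1, (300, 4)), rng.normal(3, 2, (200, 4))
merged = add_eigenspaces(eigmodel(A), eigmodel(B))
direct = eigmodel(np.vstack([A, B]))
print(np.allclose(merged[2], direct[2]))   # eigenvalues agree -> True
```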

7.
This paper describes the nascent field of big health data classification and disease probability prediction on a multi-GPU cluster MapReduce platform. First, we present a novel optimization-based multi-GPU cluster MapReduce system (gcMR), which is general purpose and suitable for processing big health data. Second, we propose a new method, IVP-SVM, to address big health data classification and the inaccuracy of disease probability prediction. To illustrate the power and flexibility of the gcMR platform for big health data, applications of a broad class of health big data using IVP-SVM on the gcMR platform are described. Experimental results show that the gcMR platform yields an average computing-efficiency gain over other multi-GPU MapReduce platforms ranging from 1.8- to 13.5-fold across different health applications, and that the accuracy of the proposed IVP-SVM on different health applications ranges from 85% to 100%. This motivates the use of gcMR and IVP-SVM as a big health data analytical platform and tool, respectively.

8.
Because traditional TCP uses a congestion-backoff mechanism that halves the sending rate, it tends to produce large rate fluctuations during data transfer, while UDP has no congestion-backoff mechanism at all: in a congested network, UDP flows seize much of the bandwidth of competing TCP flows while their own packet loss rises rapidly, carrying a potential risk of congestion collapse. Consequently, neither TCP nor UDP satisfies the needs of real-time streaming media services well. This paper studies a transport protocol, TCP-Friendly Rate Control (TFRC), which has a congestion-backoff mechanism, exhibits small throughput fluctuations, and can share bandwidth fairly with TCP, and applies it to a real-time multimedia streaming application. Test results show that real-time multimedia playback with TFRC is much smoother than with TCP.
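TFRC's smoothness comes from setting the sending rate with the TCP throughput equation rather than halving on every loss event. A sketch of that equation (in the form given in RFC 5348, with b packets acknowledged per ACK and the retransmission timeout approximated as four times the RTT) is shown below with made-up network parameters.

```python
from math import sqrt

def tfrc_rate(s, rtt, p, b=1):
    """TCP-friendly sending rate in bytes/s (throughput equation from RFC 5348).

    s   -- segment size in bytes
    rtt -- round-trip time in seconds
    p   -- loss event rate (0 < p <= 1)
    b   -- packets acknowledged per ACK
    """
    t_rto = 4 * rtt                      # simplification recommended in RFC 5348
    denom = rtt * sqrt(2 * b * p / 3) + \
            t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return s / denom

# Illustrative numbers: 1460-byte segments, 100 ms RTT.
for loss in (0.001, 0.01, 0.05):
    print(f"loss={loss:.3f}  rate={tfrc_rate(1460, 0.1, loss) / 1e3:.1f} kB/s")
```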

9.
Accurate identification of network applications is important for many network activities. The traditional port-based technique has become much less effective since many new applications no longer use well-known fixed port numbers. In this paper, we propose a novel profile-based approach to identifying traffic flows belonging to the target application. In contrast to the method used in previous studies, of classifying traffic based on statistics of individual flows, we build behavioral profiles of the target application, which describe dominant patterns in the application. Based on the behavioral profiles, a two-level matching method is used to identify new traffic. We first determine whether a host participates in the target application by comparing its behavior with the profiles. Subsequently, we compare each flow of the host with those patterns in the application profiles to determine which flows belong to this application. We demonstrate the effectiveness of our method on campus traffic traces. Our results show that one can identify popular P2P applications with very high accuracy.

10.
Many organisms rely on reedbed habitats for their existence, yet over the past century there has been a drastic reduction in the area and quality of reedbeds in the UK due to intensified human activities. In order to develop management plans for conserving and expanding this threatened habitat, accurate up-to-date information is needed concerning its current distribution and status. This information is difficult to collect using field surveys because reedbeds exist as small patches that are sparsely distributed across landscapes. Hence, this study was undertaken to develop a methodology for accurately mapping reedbeds using very high resolution QuickBird satellite imagery. The objectives were to determine the optimum combination of textural and spectral measures for mapping reedbeds; to investigate the effect of the spatial resolution of the input data upon classification accuracy; to determine whether the maximum likelihood classifier (MLC) or artificial neural network (ANN) analysis produced the most accurate classification; and to investigate the potential of refining the reedbed classification using slope suitability filters produced from digital terrain data. The results indicate an increase in the accuracy of reedbed delineations when grey-level co-occurrence textural measures were combined with the spectral bands. The most effective combination of texture measures was entropy and angular second moment. Optimal reedbed and overall classification accuracies were achieved using a combination of pansharpened multispectral and texture images that had been spatially degraded from 0.6 to 4.8 m. Using the 4.8 m data set, the MLC produced higher classification accuracy for reedbeds than the ANN analysis. The application of slope suitability filters increased the classification accuracy of reedbeds from 71% to 79%. Hence, this study has demonstrated that it is possible to derive accurate maps of reedbeds from high resolution multispectral satellite imagery through appropriate analysis of image texture, judicious selection of input bands, spatial resolution and classification algorithm, and post-classification refinement using terrain data.
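The two grey-level co-occurrence measures named above, entropy and angular second moment, can be computed with scikit-image; the toy example below builds a GLCM for a small quantized patch and derives both statistics (ASM comes from graycoprops, entropy is computed from the normalized matrix).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Toy 4-level image patch (standing in for a quantized pansharpened band).
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]], dtype=np.uint8)

# Co-occurrence matrix for horizontal neighbours at distance 1, normalized.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)

asm = graycoprops(glcm, "ASM")[0, 0]             # angular second moment
p = glcm[:, :, 0, 0]
entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # GLCM entropy

print("ASM    :", round(float(asm), 4))
print("entropy:", round(float(entropy), 4))
```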

11.
The Internet has significantly evolved in the number and variety of applications. Network operators need mechanisms to constantly monitor and study these applications. Modern routers employ a passive measurement solution called Sampled NetFlow to collect basic statistics on a per-flow basis (for a small subset of flows), which could provide valuable information for application monitoring. Given that modern applications routinely consist of several flows, potentially to many different destinations, only a few flows are sampled per application session using Sampled NetFlow. To address this issue, in this paper, we introduce related sampling, which allows network operators to give a higher probability to flows that are part of the same application session. Given the lack of application semantics in the middle of the network, our architecture, RelSamp, treats flows that share the same source IP address as related. Our heuristic works well in practice, as hosts typically run few applications at any given instant, as observed in a measurement study on real traces. In our evaluation using real traces, we show that RelSamp achieves 5–10× more flows per application session compared to Sampled NetFlow for the same effective number of sampled packets. We also show that behavioral and statistical classification approaches such as BLINC, SVM and C4.5 achieve up to 50% better classification accuracy compared to Sampled NetFlow, while causing only modest impairment to existing management tasks such as volume estimation.
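The related-sampling idea can be sketched as a per-flow sampling decision whose probability is boosted once any flow from the same source host has already been sampled, so later flows of the same application session are more likely to be kept. The probabilities below are made up; RelSamp's actual staged probabilities are more involved.

```python
import random

BASE_P = 0.01       # ordinary flow sampling probability (hypothetical)
RELATED_P = 0.5     # boosted probability once the source host is already sampled

sampled_hosts = set()
sampled_flows = []

def maybe_sample(flow):
    """flow = (src_ip, dst_ip, src_port, dst_port, proto)"""
    src = flow[0]
    p = RELATED_P if src in sampled_hosts else BASE_P
    if random.random() < p:
        sampled_hosts.add(src)       # future flows from this host count as related
        sampled_flows.append(flow)

random.seed(0)
# One host opening many flows (an application session) among background traffic.
flows = [("10.0.0.7", f"1.2.3.{i}", 50000 + i, 80, "TCP") for i in range(40)]
flows += [(f"10.0.1.{i}", "8.8.8.8", 40000, 53, "UDP") for i in range(40)]
for f in flows:
    maybe_sample(f)

print(len([f for f in sampled_flows if f[0] == "10.0.0.7"]), "of 40 session flows kept")
```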

12.
Variation in leaf optical properties imposed by variation in genetics and location has been addressed in recent literature, but variation stemming from forest seasonality and phenology has been less well explored. Here, we explore the effect of inter-seasonal spectral variation on the potential for automated classification methods to accurately discern species of trees and lianas from high-resolution spectral data collected at the leaf level at two tropical forest sites. Through the application of a set of data reduction techniques and classification methods to leaf-level spectral data collected at both rainforest and seasonally dry sites in Panama, we found that in all cases the structure and organization of spectrally-derived taxonomies varied substantially between seasons. Using principal component analysis and a non-parametric classifier, we found at both sites that species-level classification was possible with a high level of accuracy within a given season. Classification across seasons, however, was not; accuracy dropped on average by a factor of 10. This study represents one of the first systematic investigations of leaf-level spectro-temporal variability, an appreciation of which is crucial to the advancement of species classification methods, with broad applications within the environmental sciences.

13.
In this paper we focus on the application of a higher-order finite volume method to the resolution of Computational Aeroacoustics problems. In particular, we present the application of a finite volume method based on Moving Least Squares approximations in the context of a hybrid approach for low Mach number flows. In this case, the acoustic and aerodynamic fields can be computed separately. We focus on two kinds of computations: turbulent flow and aeroacoustics in complex geometries. Both fields require very accurate methods to capture the fine features of the flow: small scales in the case of turbulent flows and very low-amplitude acoustic waves in the case of aeroacoustics. On the other hand, the use of unstructured grids is attractive for real engineering applications, but unfortunately the accuracy and efficiency of the numerical methods developed for unstructured grids are far from reaching the performance of those developed for structured grids. In this context, we propose the FV-MLS method as a tool for accurate CAA computations on unstructured grids.
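Moving Least Squares approximation itself is compact enough to sketch in one dimension: at each evaluation point a low-order polynomial is fitted to nearby scattered samples under a kernel weight, and the fitted value at that point is taken as the approximation. The basis order, kernel, and smoothing length below are illustrative choices, not those of the FV-MLS scheme.

```python
import numpy as np

def mls_eval(x_star, x_nodes, u_nodes, h=0.15):
    """Moving Least Squares estimate of u(x_star) from scattered samples.

    Quadratic basis centred at x_star, Gaussian weight with smoothing length h.
    """
    dx = x_nodes - x_star
    w = np.exp(-(dx / h) ** 2)                           # kernel weights
    P = np.vstack([np.ones_like(dx), dx, dx ** 2]).T     # local polynomial basis
    # Weighted least-squares fit: minimise sum_i w_i * (P_i a - u_i)^2
    A = P.T @ (w[:, None] * P)
    b = P.T @ (w * u_nodes)
    a = np.linalg.solve(A, b)
    return a[0]                                          # value of the local fit at x_star

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 60))            # unstructured point cloud
u = np.sin(2 * np.pi * x)                     # sampled field

for xs in (0.25, 0.5, 0.75):
    print(xs, round(mls_eval(xs, x, u), 4), "exact:", round(np.sin(2 * np.pi * xs), 4))
```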

14.
Sentinel-2 satellite sensors acquire optical remote sensing images at three different spatial resolutions. How to improve the spatial resolution of the lower-resolution bands by fusion is one of the problems faced in Sentinel-2 applications. Taking a Sentinel-2B image as the data source, a high spatial resolution band was generated or selected from the four 10 m bands by four methods: maximum correlation coefficient, central-wavelength nearest neighbour, pixel maximum, and principal component analysis. We fused this high-resolution band with the six 20 m multispectral bands using five fusion methods (PCA, HPF, WT, GS and Pansharp) to produce six multispectral bands at 10 m resolution, and evaluated the fusion results from three aspects: qualitative inspection, quantitative indices (information entropy, average gradient, spectral correlation coefficient, root mean square error and general image quality index), and the classification accuracy of the fused images. Results show that the fusion quality of Pansharp with the maximum-correlation-coefficient band is better than that of the other fusion methods; its classification accuracy is slightly lower than that of GS with the pixel-maximum band, which gave the highest classification accuracy, and far higher than that of the original four 10 m multispectral bands. According to the classification accuracy on the experimental data, different fusion methods have different advantages in extracting different ground objects, so in practice an appropriate scheme should be selected according to actual research needs. This research can provide a reference for processing and applying Sentinel-2 and similar satellite data.
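The quantitative indices used to score the fused bands (information entropy, average gradient, spectral correlation coefficient, RMSE) are all short computations; the sketch below applies them to random arrays standing in for a fused band and its reference.

```python
import numpy as np

def entropy(img, levels=256):
    """Information entropy of an 8-bit band."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    return -np.sum(p[p > 0] * np.log2(p[p > 0]))

def average_gradient(img):
    """Mean magnitude of the local gradient (a sharpness proxy)."""
    gx, gy = np.gradient(img.astype(float))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))

def correlation(a, b):
    """Spectral correlation coefficient between fused and reference bands."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def rmse(a, b):
    return np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2))

rng = np.random.default_rng(3)
reference = rng.integers(0, 256, (200, 200)).astype(np.uint8)   # stand-in band
fused = np.clip(reference + rng.normal(0, 5, reference.shape), 0, 255).astype(np.uint8)

print("entropy :", round(entropy(fused), 3))
print("avg grad:", round(average_gradient(fused), 3))
print("corr    :", round(correlation(fused, reference), 4))
print("RMSE    :", round(rmse(fused, reference), 3))
```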

15.
A spatial feature extraction method was applied to increase the accuracy of land-cover classification for forest type information extraction. Traditional spatial feature extraction applications use high-resolution images. However, improving the classification accuracy is difficult when using medium-resolution images, such as a 30 m resolution Enhanced Thematic Mapper Plus (ETM+) image. In this study, we demonstrate a novel method that uses the vegetation local difference index (VLDI), derived from the normalized difference vegetation index (NDVI) calculated from the topographically corrected ETM+ image, to delineate spatial features. A simple maximum likelihood classifier and two different ways of using spatial information were introduced in this study as the frameworks to incorporate both spectral and spatial information for analysis. The experiments, which used Landsat ETM+ and digital elevation model (DEM) images together with ground truth data acquired in the study area, show that combining the spatial information extracted from medium-resolution images with spectral information improved both classification accuracy and visual quality. Moreover, the use of spatial information extracted through the proposed method greatly improved the classification performance for particular forest types, such as sparse woodlands.
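NDVI is standard, but the abstract does not give the VLDI formula, so the local-difference step below is only a hypothetical illustration (NDVI minus its neighbourhood mean); the actual index in the paper may be defined differently.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(4)
# Stand-in reflectance bands for a topographically corrected ETM+ scene.
red = rng.uniform(0.02, 0.15, (100, 100))
nir = rng.uniform(0.20, 0.50, (100, 100))

ndvi = (nir - red) / (nir + red + 1e-9)          # standard NDVI

# Hypothetical "local difference" step: NDVI relative to its neighbourhood mean,
# which emphasises stands that differ from their surroundings (the paper's VLDI
# definition may differ).
window = 7
vldi = ndvi - uniform_filter(ndvi, size=window)

print("NDVI range:", ndvi.min().round(3), "to", ndvi.max().round(3))
print("local-difference range:", vldi.min().round(3), "to", vldi.max().round(3))
```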

16.
The problem of classifying traffic flows in networks has become more and more important in recent times, and much research has been dedicated to it. In recent years, there has been a lot of interest in classifying traffic flows by application, based on the statistical features of each flow. Information about the applications that are being used on a network is very useful in network design, accounting, management, and security. In our previous work we proposed an algorithm for Internet traffic flow classification based on Artificial Immune Systems (AIS). We also applied the algorithm to an available data set, and found that it performed as well as other algorithms and was insensitive to input parameters, which makes it valuable for embedded systems. It is also very simple to implement, and generalizes well from small training data sets. In this research, we expand on the previous work by introducing several optimizations in the training and classification phases of the algorithm. We improve the design of the original algorithm in order to make it more predictable. We also give the asymptotic complexity of the optimized algorithm and draw a bound on its generalization error. Lastly, we experiment with several different distance formulas to improve the classification performance. In this paper we show that the changes and optimizations applied to the original algorithm do not change its function, while making its execution 50–60% faster. We also show that the classification accuracy of the Euclidean distance is surpassed by the Manhattan distance for this application, giving 1–2% higher accuracy and making the accuracy of the algorithm comparable to that of a Naïve Bayes classifier in previous research that uses the same data set.
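The distance-formula comparison at the end of the abstract is easy to reproduce in miniature with a nearest-centroid classifier switched between Euclidean and Manhattan metrics (toy centroids and features below; the AIS classifier itself is not reproduced here).

```python
import numpy as np

def nearest_centroid(x, centroids, metric="euclidean"):
    """Return the label of the closest centroid under the chosen metric."""
    dists = {
        label: np.linalg.norm(x - c) if metric == "euclidean"
               else np.abs(x - c).sum()                      # Manhattan distance
        for label, c in centroids.items()
    }
    return min(dists, key=dists.get)

# Toy per-flow feature centroids (e.g. mean packet size, mean inter-arrival time).
centroids = {"web": np.array([600.0, 0.05]), "voip": np.array([160.0, 0.02])}
flow = np.array([500.0, 0.04])

print("euclidean:", nearest_centroid(flow, centroids, "euclidean"))
print("manhattan:", nearest_centroid(flow, centroids, "manhattan"))
```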

17.
This study deals with the evaluation of the accuracy benefits offered by a fuzzy classifier as compared to hard classifiers using satellite imagery for thematic mapping applications. When a crisp classifier approach is adopted to classify moderate resolution data, the presence of mixed coverage pixels implies that the final product will have errors, either of omission or commission, which are not avoidable and are solely due to the spatial resolution of the data. Theoretically, a soft classifier is not affected by such errors, and in principle can produce a classification that is more accurate than any hard classifier. In this study we use the Pareto boundary of optimal solutions as a quantitative method to compare the performance of a fuzzy statistical classifier to that of two hard classifiers, and to determine the highest accuracy which could be achieved by hard classifiers. As an application, the method is applied to a case of snow mapping from Moderate-Resolution Imaging Spectroradiometer (MODIS) data on two alpine sites, validated with contemporaneous fine-resolution Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. The results for this case study showed that the soft classifier not only outperformed the two crisp classifiers, but also yielded higher accuracy than the maximum theoretical accuracy of any crisp classifier on the study areas. While providing a general assessment framework for the performance of soft classifiers, the results obtained in this inter-comparison exercise showed that soft classifiers can be an effective solution for overcoming errors which are intrinsic to the classification of coarse and moderate resolution data.

18.
Security of session initiation protocol (SIP) servers is a serious concern of Voice over IP (VoIP) vendors. The main contribution of our paper is an accurate and real-time attack classification system that detects: (1) application-layer SIP flood attacks that result in denial of service (DoS) and distributed DoS attacks, and (2) Spam over Internet Telephony (SPIT). The major advantage of our framework over existing schemes is that it performs packet-based analysis using a set of spatial and temporal features. As a result, we do not need to transform network packet streams into traffic flows, and thus save the significant processing and memory overheads associated with flow-based analysis. We evaluate our framework on real-world SIP traffic, collected from the SIP server of a VoIP vendor, by injecting a number of application-layer anomalies into it. The results of our experiments show that our proposed framework achieves significantly greater detection accuracy compared with existing state-of-the-art flooding and SPIT detection schemes.
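One flavour of the packet-based, window-oriented features such a framework can use: per-source counts of SIP INVITE messages inside a sliding time window, which spike under a flood. The window length, threshold, and message representation below are illustrative assumptions, not the paper's feature set.

```python
from collections import defaultdict, deque

WINDOW = 1.0        # window length in seconds (hypothetical)
THRESHOLD = 20      # INVITEs per source per window flagged as anomalous (hypothetical)

recent = defaultdict(deque)     # src_ip -> timestamps of recent INVITEs

def observe(ts, src_ip, sip_method):
    """Update the per-source INVITE rate and flag a possible flood."""
    if sip_method != "INVITE":
        return False
    q = recent[src_ip]
    q.append(ts)
    while q and ts - q[0] > WINDOW:      # drop timestamps outside the window
        q.popleft()
    return len(q) > THRESHOLD

# Simulated packet stream: one host sending a burst of INVITEs.
alerts = set()
for i in range(100):
    if observe(i * 0.01, "198.51.100.7", "INVITE"):
        alerts.add("198.51.100.7")
print("flooding sources:", alerts)
```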

19.
R. Wilson, M. Spann. Pattern Recognition, 1990, 23(12): 1413-1425
Estimation theory is used to derive a new approach to the clustering problem. The new method is a unification of centroid and mode estimation, achieved by considering the effect of spatial scale on the estimator. The result is a multiresolution method which spans a range of spatial scales, giving enhanced robustness both to noise in the data and to changes of scale in the data, by using comparison between scales as a test of cluster validity. Iterative and non-iterative algorithms based on the new estimator are presented and are shown to be more accurate than simple scale-space filtering in identifying and locating the cluster centres from noisy test data. Results from a wide range of applications are used to illustrate the power and versatility of the new method.

