Similar Documents
20 similar documents found (search time: 46 ms)
1.
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and the coverage and scaling issues raised by the enormous dimension of the World Wide Web. Hence, more capable algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not, by itself, a solution for identifying the best search answer set. A link-independent Web page importance metric is therefore needed to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted boundary Web zone of our heavily visited UTM University Web site show the efficiency of the proposed metric.
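The abstract leaves the priority rule abstract; as a minimal illustrative sketch (not the paper's architecture), a crawl frontier can be ordered by a clickstream-derived score. Here `clickstream_score` and the visit counts are hypothetical:

```python
import heapq

def clickstream_score(visits, total_visits):
    """Hypothetical link-independent importance score: the fraction of all
    observed clickstream visits that landed on this page."""
    return visits / total_visits if total_visits else 0.0

class CrawlFrontier:
    """Priority queue of fetched URLs, ordered by descending importance."""
    def __init__(self):
        self._heap = []

    def push(self, url, score):
        # heapq is a min-heap, so negate the score for max-first ordering.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

frontier = CrawlFrontier()
frontier.push("http://example.edu/a", clickstream_score(120, 1000))
frontier.push("http://example.edu/b", clickstream_score(30, 1000))
print(frontier.pop())  # ('http://example.edu/a', 0.12)
```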

2.
A Resource-Sensitive Approach to Performance Diagnosis for Web Applications   Cited: 1 (self-citations: 0, external citations: 1)
王伟, 张文博, 魏峻, 钟华, 黄涛. 《软件学报》 (Journal of Software), 2010, 21(2): 194-208
We propose a resource-sensitive performance diagnosis approach. For Web application transactions, the approach builds performance characteristic chains by exploiting the observation that resource service times remain relatively stable across different workload characteristics, and it detects, locates, and diagnoses performance anomalies based on runtime deviations in resource service times. Experimental results show that the approach adapts to changes in system workload characteristics and can diagnose a variety of resource-usage-related performance anomalies.
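As a hedged sketch of the underlying idea (the performance characteristic chains themselves are not reproduced here), and assuming per-resource service-time baselines are available, runtime service-time deviations can be flagged as follows:

```python
import statistics

def detect_service_time_anomalies(baseline, observed, k=3.0):
    """Flag resources whose runtime service time deviates from its baseline
    by more than k standard deviations, exploiting the stability of service
    times across workload characteristics. `baseline` maps a resource name
    to a list of historical service times (ms); `observed` maps a resource
    name to the service time measured for the current transaction."""
    anomalies = {}
    for resource, history in baseline.items():
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        t = observed.get(resource)
        if t is not None and sigma > 0 and abs(t - mu) > k * sigma:
            anomalies[resource] = {"observed": t, "expected": mu}
    return anomalies

baseline = {"cpu": [2.0, 2.1, 1.9, 2.0, 2.1], "disk": [5.0, 5.2, 4.9, 5.1, 5.0]}
print(detect_service_time_anomalies(baseline, {"cpu": 2.05, "disk": 9.8}))
# -> only 'disk' is flagged; its service time is far above the baseline
```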

3.
In this work, we present a novel approach for the efficient materialization of dynamic web pages in e-commerce applications such as an online retail store with millions of items, hundreds of HTTP requests per second, and tens of dynamic web page types. In such applications, user satisfaction, as measured in terms of response time (QoS) and content freshness (QoD), determines success, especially under heavy workload. The novelty of our materialization approach over existing ones is that it considers the data dependencies between content fragments of a dynamic web page. We introduce two new semantic-based data freshness metrics that capture the content dependencies and propose two materialization algorithms that balance QoS and QoD. In our evaluation, we use a real-world experimental system that resembles an online bookstore and show that our approach outperforms existing QoS-QoD balancing approaches in terms of server-side response time (throughput), data freshness, and scalability.

4.
Resource sharing is an inherent characteristic of cloud data centers. Virtual Machines (VMs) and/or containers that are co-located on the same physical server often compete for resources, leading to interference. The noisy neighbor effect refers to an anomaly caused by a VM/container limiting the resources accessed by another one. Our main contribution is an online, lightweight, and application-agnostic solution for anomaly detection that follows an unsupervised approach. It is based on comparing models for different lags: Dirichlet Process Gaussian Mixture Models to characterize the resource usage profile of the application, and distance measures to score the similarity among models. An alarm is raised when there is an abrupt change in the short-term lag (i.e., a high distance score for short-term models) while the long-term state remains constant. We test the algorithm on different cloud workloads: websites, periodic batch applications, Spark-based applications, and a Memcached server. We are able to detect anomalies in CPU and memory resource usage with up to 82-96% accuracy (recall), depending on the scenario. Compared to other baseline methods, our approach detects anomalies successfully while raising a low number of false positives, even for applications with unusual normal behavior (e.g., periodic). Experiments show that the proposed algorithm is a lightweight and effective solution for detecting the noisy neighbor effect without any historical information about the application, and it could potentially be applied to other kinds of anomalies as well.
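A minimal sketch of the short-vs-long-lag comparison using scikit-learn's Dirichlet Process mixture (`BayesianGaussianMixture`); the distance measure, window sizes, and threshold here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_dpgmm(X):
    """Characterize a resource-usage window with a Dirichlet Process GMM."""
    return BayesianGaussianMixture(
        n_components=5, weight_concentration_prior_type="dirichlet_process",
        max_iter=200, random_state=0).fit(X)

def lag_distance(usage, short=60, long=600, threshold=1.0):
    """Compare short-term and long-term models: score their divergence as
    the mean log-likelihood gap on the recent samples, and raise an alarm
    when the short-term model drifts away from the long-term one.
    `usage` is an (n_samples, n_resources) array, e.g. [cpu%, mem%] ticks."""
    recent, history = usage[-short:], usage[-long:-short]
    m_short, m_long = fit_dpgmm(recent), fit_dpgmm(history)
    score = np.mean(np.abs(m_short.score_samples(recent)
                           - m_long.score_samples(recent)))
    return score, score > threshold

rng = np.random.default_rng(0)
usage = rng.normal([50.0, 30.0], 2.0, size=(600, 2))  # steady cpu%, mem%
usage[-60:, 0] += 25.0                                # a neighbor steals CPU
score, alarm = lag_distance(usage)
print(alarm)  # True: abrupt short-term change against a stable long term
```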

5.
We propose and motivate an alternative to the traditional error-based or cost-based evaluation metrics for the goodness of speaker detection performance. The metric that we propose is an information-theoretic one, which measures the effective amount of information that the speaker detector delivers to the user. We show that this metric is appropriate for the evaluation of what we call application-independent detectors, which output soft decisions in the form of log-likelihood-ratios rather than hard decisions. The proposed metric is constructed via analysis and generalization of cost-based evaluation metrics. This construction forms an interpretation of the metric as an expected cost, or as a total error-rate, over a range of different application types. We further show how the metric can be decomposed into a discrimination component and a calibration component. We conclude with an experimental demonstration of the proposed technique by evaluating three speaker detection systems submitted to the NIST 2004 Speaker Recognition Evaluation.
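The metric described matches the well-known log-likelihood-ratio cost, Cllr; a minimal sketch, assuming natural-log llr inputs:

```python
import numpy as np

def cllr(tar_llrs, non_llrs):
    """Log-likelihood-ratio cost (Cllr), in bits: the lower it is, the more
    effective information the detector's soft llr outputs deliver. It also
    reads as an expected cost averaged over a range of application types."""
    tar = np.asarray(tar_llrs, dtype=float)   # llrs from target trials
    non = np.asarray(non_llrs, dtype=float)   # llrs from non-target trials
    c_tar = np.mean(np.log2(1.0 + np.exp(-tar)))  # miss-side penalty
    c_non = np.mean(np.log2(1.0 + np.exp(non)))   # false-alarm-side penalty
    return 0.5 * (c_tar + c_non)

# A reasonably calibrated detector: positive llrs on target trials,
# negative on non-target trials, gives Cllr well below 1 bit.
print(round(cllr([2.3, 3.1, 1.8], [-2.7, -1.9, -3.3]), 3))
```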

6.
Detection of cross-channel anomalies   Cited: 1 (self-citations: 1, external citations: 0)
The data deluge has created a great challenge for data mining applications, wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common among the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis establishes theoretical results on the reduction of an impurity index, showing that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
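A sketch of the first (single-channel) stage under a common spectral formulation: score each observation by its residual energy outside the top principal subspace. The paper's exact spectral method and the cross-channel amalgamation stage are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA

def spectral_anomaly_scores(X, n_components=2):
    """Score each observation in one channel by its residual energy outside
    the top principal subspace; large residuals are candidate single-channel
    anomalies to be passed on to the cross-channel stage."""
    pca = PCA(n_components=n_components).fit(X)
    residual = X - pca.inverse_transform(pca.transform(X))
    return np.linalg.norm(residual, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[42] += 6.0  # inject an anomalous observation
print(int(np.argmax(spectral_anomaly_scores(X))))  # 42
```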

7.
We investigated the relationships between 11 phenological metrics, topographic shade, and anomalous temperature patterns detected using wavelet analysis in seasonal deciduous forests of south Brazil. To obtain the metrics, we applied the TIMESAT algorithm to the enhanced vegetation index (EVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS)/Terra. MODIS acquires data from the study area under a large seasonal amplitude in the solar zenith angle (SZA). We evaluated the effect of topography on the phenological metrics by correlating the metrics with shaded relief values. To analyse the inter-annual variation of the phenological metrics under anomalous and regular temperature patterns, we calculated standard anomalies for each metric. Finally, we established relationships between the metrics and the minimum, maximum, and mean temperatures of growing seasons spanning 10 seasonal cycles between 2002 and 2012. The correlations with shaded relief showed that the left derivative (LD), right derivative (RD), small integral (SInt), seasonal amplitude (SA), base level (BL), and maximum VI value (MV) were sensitive to topographic effects. The seasonal cycles with the highest growing-season temperatures (2006/2007 and 2009/2010) exhibited a delay at the end of the cycle and a longer duration and higher productivity, indicated by positive standard anomalies for end of season (EOS), length of season (LOS), large integral (LInt), and SInt. We observed a contrasting result for the lowest-temperature cycle (2003/2004). The means of these metrics in anomalous seasons differed significantly from those of the regular cycles at the 0.05 significance level using paired t-tests. Statistically significant correlations were observed between the metrics and the minimum and mean temperature values of the 10 seasonal cycles.
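The standard anomalies are straightforward to compute; a sketch with hypothetical end-of-season (EOS) day-of-year values:

```python
import numpy as np

def standard_anomalies(metric_by_season):
    """Standardized anomaly of one phenological metric across seasonal
    cycles: z_t = (x_t - mean(x)) / std(x). Cycles with strongly positive
    or negative z stand out as anomalous."""
    x = np.asarray(metric_by_season, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical end-of-season (EOS) day-of-year values for 10 cycles;
# the two late seasons yield clearly positive anomalies.
eos = [252, 250, 255, 270, 249, 251, 253, 268, 250, 252]
print(np.round(standard_anomalies(eos), 2))
```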

8.
Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done on detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these anomaly detection approaches have considered an important property of evolutionary networks: their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm using a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the community-based anomaly detection algorithm by comparison to a non-representative-based algorithm on synthetic networks, and our experiments on synthetic datasets show that our algorithm achieves a runtime speedup of 11-46 over the baseline algorithm. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email. Significant and informative community-based anomaly dynamics have been detected in both cases.
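A simplified sketch of typing community changes between two snapshots via Jaccard matching; the thresholds are assumptions, and the paper's representative-based technique is parameter-free, unlike this sketch. Merged communities would be typed symmetrically from the successor side:

```python
def classify_community_events(prev, curr, match=0.5, grow=1.5):
    """Type community changes between two snapshots. Communities are sets
    of node ids; a Jaccard overlap >= `match` pairs a community with its
    successor(s). A merged community would be detected from the successor
    side by counting its predecessors."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)

    events = []
    for i, c in enumerate(prev):
        hits = [j for j, d in enumerate(curr) if jaccard(c, d) >= match]
        if not hits:
            events.append(("vanished", i))
        elif len(hits) > 1:
            events.append(("split", i))
        elif len(curr[hits[0]]) >= grow * len(c):
            events.append(("grown", i))
        elif len(curr[hits[0]]) * grow <= len(c):
            events.append(("shrunken", i))
    matched = {j for c in prev
               for j, d in enumerate(curr) if jaccard(c, d) >= match}
    events += [("born", j) for j in range(len(curr)) if j not in matched]
    return events

prev = [{1, 2, 3}, {4, 5}]
curr = [{1, 2, 3, 6, 7, 8}, {9, 10}]
print(classify_community_events(prev, curr))
# [('grown', 0), ('vanished', 1), ('born', 1)]
```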

9.
The emergence of the deep Web has given a new connotation to the concept of ranking database query results. Earlier approaches to ranking resorted either to analyzing frequencies of database values and query logs or to establishing user profiles. In contrast, an integrated approach, based on the notion of a similarity model, for holistically supporting user- and query-dependent ranking has recently been proposed (Telang et al. in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011). An important component of this framework is a workload consisting of ranking functions, wherein each function represents an individual user's preferences towards the results of a specific query. When answering a query for which no prior ranking function exists, the similarity model is employed and is expected to ensure a good quality of ranking as long as a ranking function for a very similar user-query pair exists in this workload. In this paper, we address the problem of determining an appropriate set of user-query pairs to form a workload of ranking functions to support user- and query-dependent ranking for Web databases. We propose a novel metric, termed workload goodness, that quantifies the notion of a "good" workload as an absolute value. Finding a workload of optimal goodness is a combinatorially explosive problem; therefore, we propose a heuristic solution and advance three approaches for determining an acceptable workload, in both a static and a dynamic environment. We discuss the effectiveness of our proposal analytically as well as experimentally over two Web databases.

10.
A pattern is a model or template used to summarize and describe the behavior (or trend) of data that generally exhibits recurrent events. Patterns have received considerable attention in recent years and have been widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, and scientific data processing. In these different contexts, discovered patterns have proven useful for detecting anomalies, predicting data behavior (or trends), and, more generally, simplifying data processing or improving system performance. However, to the best of our knowledge, patterns have never been used in the context of Web archiving, the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools for efficiently archiving Websites. We first define our pattern model, which describes the importance of page changes. Then, we present the strategy used to (i) extract the temporal evolution of page changes, (ii) discover patterns, and (iii) exploit them to improve Web archives. The archive of the French public TV group France Télévisions is chosen as a case study to validate our approach. Our experimental evaluation based on real Web pages shows the utility of patterns for improving archive quality and optimizing indexing and storage.
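A toy sketch of how a change pattern could drive archiving priority; the weighting scheme and all values are purely illustrative, not the paper's pattern model:

```python
def archive_priority(change_pattern, importance_weights):
    """Score a page for (re)crawling by weighting its observed change counts
    per period (e.g. per hour of the day) with the importance of changes in
    that period."""
    return sum(w * c for w, c in zip(importance_weights, change_pattern))

# A news page changes all day; a static page never does. Changes during
# daytime hours are (hypothetically) weighted as more important.
news = [3, 2, 1, 0, 0, 1, 4, 6, 8, 7, 6, 5, 5, 6, 7, 8, 7, 6, 5, 4, 4, 3, 3, 3]
static = [0] * 24
weights = [1.0 if 7 <= h <= 19 else 0.3 for h in range(24)]
print(archive_priority(news, weights), archive_priority(static, weights))
```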

11.
We characterize usage and problems for Web applications, evaluate their reliability, and examine the potential for reliability improvement. Based on the characteristics of Web applications and the overall Web environment, we classify Web problems and focus on the subset of source content problems. Using information about Web accesses, we derive various measurements that can characterize Web site workload at different levels of granularity and from different perspectives. These workload measurements, together with failure information extracted from recorded errors, are used to evaluate the operational reliability for source contents at a given Web site and the potential for reliability improvement. We applied this approach to the Web sites www.seas.smu.edu and www.kde.org. The results demonstrated the viability and effectiveness of our approach.
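A minimal sketch of a Nelson-style operational reliability estimate from access-log counts, assuming hit-level granularity (the paper derives measurements at several granularities):

```python
def operational_reliability(total_hits, failed_hits):
    """Nelson-style reliability estimate from access logs: R = 1 - f/n,
    where n is the number of hits in the measured workload and f the number
    of hits that failed due to source content problems (e.g. missing files).
    The reciprocal failure rate n/f gives a mean-hits-between-failures view."""
    return 1.0 - failed_hits / total_hits

# e.g. 30,000 hits in one day's access log, 240 of them failed:
print(operational_reliability(30_000, 240))  # 0.992
```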

12.
Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates obtained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. We propose a new metric, the stratified Brier score, to capture class-specific calibration, analogous to the per-class metrics widely used to assess the discriminative performance of classifiers in imbalanced scenarios. We propose a simple, effective method to mitigate the bias of probability estimates for imbalanced data that bags estimators independently calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We extend our previous work in this direction by providing ample additional empirical evidence for the utility of this strategy, using both support vector machines and boosted decision trees as base learners. Finally, we show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
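A hedged sketch of both ideas with scikit-learn: a per-class (stratified) Brier score, and bagging of estimators independently calibrated on balanced bootstrap samples. Parameter choices and the decision-tree base learner are illustrative:

```python
import numpy as np
from sklearn.base import clone
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def stratified_brier(y_true, y_prob, cls):
    """Brier score restricted to instances of one class: the mean squared
    error between the predicted positive-class probability and the label,
    computed over that class only (so minority-class calibration is visible)."""
    mask = np.asarray(y_true) == cls
    return np.mean((np.asarray(y_prob)[mask] - cls) ** 2)

def balanced_bagged_probs(base, X, y, X_test, n_bags=25, seed=0):
    """Bag probability estimators, each independently calibrated on a
    balanced bootstrap: an equal-sized resample of each class. Averaging
    the calibrated probabilities mitigates the minority-class bias."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    n = min(len(pos), len(neg))
    probs = []
    for _ in range(n_bags):
        idx = np.concatenate([rng.choice(pos, n), rng.choice(neg, n)])
        clf = CalibratedClassifierCV(clone(base), method="sigmoid", cv=3)
        clf.fit(X[idx], y[idx])
        probs.append(clf.predict_proba(X_test)[:, 1])
    return np.mean(probs, axis=0)

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
p = balanced_bagged_probs(DecisionTreeClassifier(max_depth=4), X, y, X)
print(round(stratified_brier(y, p, 1), 3))  # Brier score on the minority class
```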

13.
Video surveillance applications need video data centers to provide elastic virtual machine (VM) provisioning. However, the workloads of the VMs are hard to predict for an online video surveillance service, and the unknown arrival workloads easily lead to workload skew among VMs. In this paper, we study how to balance workload skew in an online video surveillance system. First, we design the system framework for the online surveillance service, which consists of video capturing and analysis tasks. Second, we propose StreamTune, an online resource scheduling approach for workload balancing, to deal with irregular video analysis workloads using the minimum number of VMs. We aim to balance the workload skew across video analyzers in a timely manner, without depending on any workload prediction method. Furthermore, we evaluate the performance of the proposed approach using a traffic surveillance application. The experimental results show that our approach adapts well to workload variation and achieves workload balance with fewer VMs.
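As an illustrative baseline only (not StreamTune's policy, which additionally minimizes the VM count), incoming analysis tasks can be greedily assigned to the least-loaded VM:

```python
import heapq

def assign_streams(stream_loads, n_vms):
    """Greedily assign each incoming video analysis task to the currently
    least-loaded VM, using a min-heap keyed by accumulated load."""
    vms = [(0.0, i, []) for i in range(n_vms)]  # (load, vm id, task ids)
    heapq.heapify(vms)
    for task_id, load in enumerate(stream_loads):
        total, i, tasks = heapq.heappop(vms)
        heapq.heappush(vms, (total + load, i, tasks + [task_id]))
    return sorted(vms, key=lambda v: v[1])

for total, i, tasks in assign_streams([5, 3, 8, 2, 7, 4], n_vms=3):
    print(f"VM{i}: load={total}, tasks={tasks}")
```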

14.
Ontology languages such as OWL are being widely used as the Semantic Web movement gains momentum. With the proliferation of the Semantic Web, more and more large-scale ontologies are being developed in real-world applications to represent and integrate knowledge and data. There is an increasing need to measure the complexity of these ontologies so that people can better understand, maintain, reuse, and integrate them. In this paper, inspired by the concept of software metrics, we propose a suite of ontology metrics, at both the ontology level and the class level, to measure the design complexity of ontologies. The proposed metrics are analytically evaluated against Weyuker's criteria. We have also performed an empirical analysis on public domain ontologies to show the characteristics and usefulness of the metrics. We point out possible applications of the proposed metrics to ontology quality control. We believe that the proposed metric suite is useful for managing ontology development projects.

15.
Evaluation of clustering has significant importance in various applications of expert and intelligent systems. Clusters are evaluated in terms of quality and accuracy. Measuring quality is an unsupervised approach that depends entirely on edges, whereas measuring accuracy is a supervised approach that measures the similarity between the real clustering and the predicted clustering. Accuracy cannot be measured for most real-world networks since the real clustering is unavailable. Thus, it would be advantageous from the viewpoint of expert systems to develop a quality metric that can assure a certain level of accuracy along with the quality of clustering. In this paper we propose a set of three quality metrics for graph clustering that have the ability to ensure accuracy along with quality. The effectiveness of the metrics has been evaluated on benchmark graphs as well as on real-world networks and compared with existing metrics. The results indicate the competence of the suggested metrics in capturing accuracy, which can improve decision-making in expert and intelligent systems. We have also shown that our metrics satisfy all six quality-related properties.
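For context, the classic edge-only quality metric of this kind is modularity; a minimal example with networkx (the paper's three proposed metrics are not reproduced here):

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Modularity depends only on edges, so it measures quality but says nothing
# about accuracy; here it scores the classic two-faction split of Zachary's
# karate club network.
G = nx.karate_club_graph()
parts = [set(range(17)), set(range(17, 34))]
print(round(modularity(G, parts), 3))
```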

16.
The problem of anomaly detection in time series has received a lot of attention in the past two decades. However, existing techniques cannot locate where the anomalies are within an anomalous time series, or they require users to provide the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series. In addition, for multivariate time series, anomaly detection is difficult due to the following challenges. First, anomalies may occur in only a subset of dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies may look normal in each individual dimension but anomalous in combinations of dimensions. To mitigate these problems, we introduce a multivariate anomaly detection algorithm which detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel. We demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
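A simple per-dimension baseline conveying the localization idea (not the paper's self-learning algorithm): flag (dimension, timestep) pairs that deviate strongly from a trailing window:

```python
import numpy as np

def locate_anomalies(X, window=20, k=5.0):
    """Flag (dimension, timestep) pairs whose value deviates from the mean
    of a trailing window by more than k of that window's standard
    deviations; anomalies are localized per dimension and per timestep."""
    hits = []
    n, d = X.shape
    for j in range(d):
        for t in range(window, n):
            w = X[t - window:t, j]
            mu, sigma = w.mean(), w.std()
            if sigma > 0 and abs(X[t, j] - mu) > k * sigma:
                hits.append((j, t))
    return hits

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[150, 2] += 8.0  # anomaly confined to dimension 2 at t = 150
print(locate_anomalies(X))  # expected: [(2, 150)]
```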

17.
Embedded systems run applications whose use of the hardware varies with the computation task, generating time-varying workloads. Energy can be minimized by selecting a suitably low central processing unit (CPU) frequency for each workload. We propose an autonomous, online approach capable of reducing energy consumption by adapting to workload variations, even in an unknown environment. In this approach, we extend the AEWMA algorithm into a new algorithm called AEWMA-MSE, adding functionality to detect workload changes and demonstrating why statistical analysis is preferable for real user cases in a mobile environment. We also propose and validate a new power model for mobile devices based on the k-NN regression algorithm, which proves to offer a better trade-off between execution time and precision than neural-network and linear-regression-based models. AEWMA-MSE and the proposed power model are integrated into a novel reinforcement-learning-based energy management algorithm that selects the appropriate CPU frequency based on workload predictions to minimize energy consumption. The proposed approach is validated through simulation using real smartphone data from an ARM Cortex A7 processor used in a commercial smartphone. Our proposal improves the Q-learning cost function and reduces average energy consumption by 21%, and by up to 29%, compared with existing approaches.
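A hedged sketch of EWMA-based workload change detection in the spirit of AEWMA-MSE; the update rules and threshold below are assumptions, not the paper's algorithm:

```python
def ewma_change_points(workload, alpha=0.2, k=4.0):
    """Track the workload with an EWMA and keep an EWMA of the squared
    prediction error; flag a change when the current squared error jumps
    above k times its smoothed average."""
    ewma, mse, changes = workload[0], 1.0, []
    for t, x in enumerate(workload[1:], start=1):
        err2 = (x - ewma) ** 2
        if err2 > k * mse:
            changes.append(t)  # workload phase change detected here
        mse = alpha * err2 + (1 - alpha) * mse
        ewma = alpha * x + (1 - alpha) * ewma
    return changes

load = [10.0] * 50 + [40.0] * 50            # a workload step at t = 50
noisy = [x + 0.1 * (i % 3) for i, x in enumerate(load)]
print(ewma_change_points(noisy))            # [50]
```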

18.
In spite of several decades of software metrics research and practice, there is little understanding of how software metrics relate to one another, nor is there any established methodology for comparing them. We propose a novel experimental technique, based on search-based refactoring, to 'animate' metrics and observe their behaviour in a practical setting. Our aim is to promote metrics to the level of active, opinionated objects that can be compared experimentally to uncover where they conflict, and to better understand the underlying cause of the conflict. Our experimental approaches include semi-random refactoring, refactoring for increased metric agreement/disagreement, refactoring to increase/decrease the gap between a pair of metrics, and targeted hypothesis testing. We apply our approach to five popular cohesion metrics using ten real-world Java systems, involving 330,000 lines of code and the application of over 78,000 refactorings. Our results demonstrate that cohesion metrics disagree with each other in a remarkable 55% of cases, that Low-level Similarity-based Class Cohesion (LSCC) is the best representative of the set of metrics we investigate while Sensitive Class Cohesion (SCOM) is the least representative, and we discover several hitherto unknown differences between the examined metrics. We also use our approach to investigate the impact of including inheritance in a cohesion metric definition and find that doing so dramatically changes the metric.

19.
Context: Software quality attributes are assessed by employing appropriate metrics. However, the choice of such metrics is not always obvious and is further complicated by the multitude of available metrics. To assist metric selection, several properties have been proposed. However, although metrics are often used to assess successive software versions, there is no property that assesses their ability to capture structural changes along evolution. Objective: We introduce a property, Software Metric Fluctuation (SMF), which quantifies the degree to which a metric score varies due to changes occurring between successive versions of a system. With respect to SMF, metrics can be characterized as sensitive (changes induce high variation in the metric score) or stable (changes induce low variation in the metric score). Method: The SMF property was evaluated by (a) a case study on 20 OSS projects, to assess the ability of SMF to characterize different metrics differently, and (b) a case study on 10 software engineers, to assess SMF's usefulness in the metric selection process. Results: The results of the first case study suggest that different metrics quantifying the same quality attributes differ in their fluctuation. We also provide evidence that an additional factor related to metric fluctuation is the function used to aggregate a metric from the micro to the macro level. The outcome of the second case study suggests that SMF can help practitioners in metric selection, since (a) different practitioners have different perceptions of metric fluctuation, and (b) this perception is less accurate than the systematic approach that SMF offers. Conclusions: SMF is a useful metric property that can improve the accuracy of metric selection. Based on SMF, we can differentiate metrics by their degree of fluctuation. These results can inform the metric selection processes of researchers and practitioners.
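As a sketch of the property's flavor (the paper's exact formula may differ), fluctuation can be quantified as the mean absolute change between successive versions, normalized by the mean score; all scores below are hypothetical:

```python
import statistics

def metric_fluctuation(scores_per_version):
    """Mean absolute change of a metric between successive versions,
    normalized by the mean score: high values mark a sensitive metric,
    low values a stable one."""
    s = scores_per_version
    deltas = [abs(b - a) for a, b in zip(s, s[1:])]
    return statistics.mean(deltas) / statistics.mean(s)

stable_metric = [12.0, 12.1, 11.9, 12.0, 12.2]     # barely reacts to changes
sensitive_metric = [30.0, 45.0, 22.0, 51.0, 28.0]  # swings with every version
print(round(metric_fluctuation(stable_metric), 3),
      round(metric_fluctuation(sensitive_metric), 3))
```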

20.
The objective of this study was to evaluate the effects of complexity on cognitive workload in a simulated air traffic control conflict detection task by means of eye movement recording. We manipulated two complexity factors, convergence angle and aircraft minimum distance at closest approach, within a multidimensional workload assessment method based on psychophysiological, performance, and subjective measures. Conflict trials proved more complex and time-consuming than no-conflict trials, requiring more frequent fixations and saccades. Moreover, large saccades showed reduced burst power under higher task complexity. Based on the analysis of ocular metrics, a motion-based strategy was suggested for conflict trials and a ratio-based strategy for no-conflict trials: aircraft differential speed and distance to the convergence point at trial start were considered determinant for strategy adoption. Relevance to industry: Measuring eye metrics for online workload assessment enables better identification of workload-inducing scenarios and of the strategy adopted for traffic management. System design, as well as training programs for air traffic control operators, might benefit from online workload measurement.

